Session 08: Intel x86. Course: H0162 / Microprocessors. Year: 2006. Version: 1/0


Upload: damian-ford

Post on 27-Dec-2015


1

Session 08: Intel x86

Course: H0162 / Microprocessors

Year: 2006

Version: 1/0

2

Learning Outcomes

By the end of this session, students are expected to be able to:

• explain the architecture of the Pentium family of microprocessors (C2)

3

Outline

• Types of Pentium

• Additional features

• Branch Prediction

• Dual Independent Bus

• SpeedStep

4

Enhancing CPU Operation

• Local Bus (system bus or front side bus): connects the CPU to RAM and the video card slot; it is the fastest bus in the system.

• L1 and L2 cache: Cache inside the CPU is called L1 cache and is divided into a data cache and an instruction cache. L2 cache is larger and slower than L1 and is mounted close to the processor. Both levels are built from static RAM, which is faster than the dynamic RAM used for main memory. The Pentium III incorporates both caches into the CPU.

• Math Coprocessor: Beginning with the 486 (486DX), the math coprocessor (floating point unit) was integrated into the CPU.

5

Processor Descriptive Features

• RISC (Reduced Instruction Set Computer) – a small set of simple instructions yields a CPU that is faster and cheaper, but more of the work shifts to the software.

• CISC (Complex Instruction Set Computer) – the opposite of RISC – used by most PC processors, including the x86 family.

• MMX (Multimedia Extensions) – 57 additional instructions for multimedia and graphics.

• Multiple Branch Prediction – guesses which instructions will be needed next, which speeds up CPU operation – over 90% accurate.

6

Processor Descriptive Features

• Superscalar Technology – processing more than one instruction at a time – Pentium onwards – 2 pipelines (parallel paths for executing instructions inside the CPU).

• Dynamic Execution – enhanced superscalar and multiple branch prediction features – introduced with the Pentium Pro and carried into the Pentium II onwards.

• Dual Independent Bus (DIB) – Pentium Pro and Pentium II onwards – one bus to main memory and other to L2 cache.

7

Real, Protected and Virtual Modes

• Real Mode – the processor acts like an 8088 or 8086: it operates within the first 1 MB of memory and does not support multitasking. The 286 and later start up in this mode.

• Protected Mode – the processor supports multitasking and accesses more than 1 MB of memory. In protected mode each program is given its own section of memory.

• Virtual Mode (virtual 8086 mode) – 386 and later – the processor can run several real mode programs at once and access memory above 1 MB.
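The 1 MB limit of real mode follows from the 20-bit physical address that the 8086/8088 forms from a 16-bit segment and a 16-bit offset. The sketch below illustrates that calculation; the function name `real_mode_address` is hypothetical and chosen only for this example, and wrapping the result to 20 bits models the 8086/8088 behaviour.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address for an 8086-style segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep 20 bits, as on the 8086/8088 with its 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


# 20 address bits give 2**20 bytes, which is the 1 MB real mode limit.
REAL_MODE_BYTES = 1 << 20

print(hex(real_mode_address(0xB800, 0x0000)))  # 0xb8000
print(REAL_MODE_BYTES // 1024, "KB addressable in real mode")
```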

8

Types of Pentium

• The Pentium family comprises the Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4.

9

Progress @ Intel

Name          Date  Transistors  Microns  Clock speed  Data width              MIPS
8080          1974  6,000        6        2 MHz        8 bits                  0.64
8088          1979  29,000       3        5 MHz        16 bits, 8-bit bus      0.33
80286         1982  134,000      1.5      6 MHz        16 bits                 1
80386         1985  275,000      1.5      16 MHz       32 bits                 5
80486         1989  1,200,000    1        25 MHz       32 bits                 20
Pentium       1993  3,100,000    0.8      60 MHz       32 bits, 64-bit bus     100
Pentium II    1997  7,500,000    0.35     233 MHz      32 bits, 64-bit bus     ~300
Pentium III   1999  9,500,000    0.25     450 MHz      32 bits, 64-bit bus     ~510
Pentium 4     2000  42,000,000   0.18     1.5 GHz      32 bits, 64-bit bus     ~1,700
Pentium 4 HT  2002  55,000,000   0.13     2.4 GHz      2x32 bits, 64-bit bus

From http://www.howstuffworks.com/microprocessor1.htm
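As a rough check of the growth shown in the table above, the following sketch estimates how many years it took the transistor count to double between two rows. The years and transistor counts are copied from the table; the helper name `doubling_time_years` is hypothetical, and steady exponential growth between the two chosen rows is an assumption of the calculation.

```python
import math

# (name, year, transistors) copied from the table above
CHIPS = [
    ("8080", 1974, 6_000),
    ("8088", 1979, 29_000),
    ("80286", 1982, 134_000),
    ("80386", 1985, 275_000),
    ("80486", 1989, 1_200_000),
    ("Pentium", 1993, 3_100_000),
    ("Pentium II", 1997, 7_500_000),
    ("Pentium III", 1999, 9_500_000),
    ("Pentium 4", 2000, 42_000_000),
]


def doubling_time_years(first, last):
    """Years per doubling, assuming steady exponential growth between two rows."""
    _, year0, count0 = first
    _, year1, count1 = last
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


overall = doubling_time_years(CHIPS[0], CHIPS[-1])
print(f"8080 to Pentium 4: one doubling about every {overall:.1f} years")
```

Over the whole span from the 8080 to the Pentium 4 the estimate comes out at roughly two years per doubling.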

10

Pentium Family

• Pentium – 1993 – 200 MHz – 64-bit bus – pipelined, superscalar architecture.

• Pentium OverDrive – 1995 – 100 MHz – upgrade for the 486.

• Pentium Pro – 1995 – 200 MHz.

• Pentium with MMX – 1997 – 200 MHz – designed for multimedia applications.

• MMX OverDrive – 1997 – 200 MHz – upgrade for the Pentium.

11

Pentium Family

• Pentium II – 1997 – 450 MHz – Single Edge Contact (SEC) processor package.

• Pentium III – 1999 – 1 GHz – 70 new instructions for graphics, video, audio and speech recognition – up to 64 GB of memory.

• Pentium 4 – 2000 – 1.3 GHz and up – 32-bit processor – 400 MHz front side bus – 144 new instructions to improve video, audio and 3D applications.

• More detail on each speed variant: http://www.xbitlabs.com/news/cpu/display/20030327134746.html

12

Pentium Pro Die Photo

5.5 million transistors

13

Pentium Die Photo

14

Pentium 4 Die Photo

42 million transistors

15

The Heat Problem

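[Figure: power density in Watts/cm2 (log scale, 1 to 1000) plotted against process generation (1.5 down to 0.07 microns), with frequency increasing along the same axis. Processors plotted: i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4 (Willamette), Pentium 4 (Prescott). Reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle. Courtesy of Bob Colwell.]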

16

Microarchitecture Trends

Adapted from Johan De Gelas, Quest for More Processing Power, AnandTech, Feb. 8, 2005.

17

Moore’s Law Still Holds

[Figure: transistors per die (log scale, 10^0 to 10^11) plotted against year (1960 to 2010). Memory series: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M, 1G, 2G, 4G. Microprocessor series: 4004, 8080, 8086, 80286, i386, i486, Pentium, Pentium II, Pentium III, Pentium 4, Itanium. Source: Intel.]

18

19

Pentium

• 100% binary compatible with ancestors.

• Enhancements and additions to the i486:
  – Superscalar Architecture
  – Dynamic Branch Prediction
  – Pipelined Floating-Point Unit
  – Improved Instruction Execution Time
  – Separate 8K Code and Data Caches
  – Writeback MESI Protocol (Data Caches)
  – 64-Bit Data Bus
  – Bus Cycle Pipelining
  – Address Parity
  – Internal Parity Checking
  – Functional Redundancy Checking
  – Execution Tracking
  – Performance Monitoring
  – IEEE 1149.1 Boundary Scan
  – System Management Mode
  – Virtual Mode Extensions

• New instructions were added to accommodate the additional functionality.

• The MMU is fully compatible with the i386 and i486.

• The floating-point unit was completely redesigned compared with the i486.

20

Pentium

21

Pentium Block Diagram

22

Pentium Pro

• Additional features:
  – out-of-order execution engine,
  – dual integer pipelines, and
  – improved floating-point unit

23

Pentium Pro Features

• Superpipelining: The Pentium Pro dramatically increases the number of execution steps, to 14, from the Pentium's 5.

• Integrated Level 2 Cache: The Pentium Pro features a dramatically higher-performance secondary cache compared to all earlier processors. Instead of using motherboard-based cache running at the speed of the memory bus, it uses an integrated level 2 cache with its own bus, running at full processor speed, typically three times the speed that the cache runs at on the Pentium. The Pentium Pro's cache is also non-blocking, which allows the processor to continue without waiting on a cache miss.

• 32-Bit Optimization: The Pentium Pro is optimized for running 32-bit code (which most modern operating systems and applications use) and so gives a greater performance improvement over the Pentium when using the latest software.

• Wider Address Bus: The address bus on the Pentium Pro is widened to 36 bits, giving it a maximum addressability of 64 GB of memory.
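The 64 GB figure follows directly from the 36 address bits, since each extra address bit doubles the addressable memory. A small sketch of that arithmetic, with the hypothetical helper name `addressable_bytes` and 1 GB taken as 2^30 bytes:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given bus width."""
    if address_bits < 0:
        raise ValueError("address_bits must be non-negative")
    return 1 << address_bits


GB = 1 << 30  # 1 GB taken as 2**30 bytes

for bits in (32, 36):
    print(f"{bits} address bits -> {addressable_bytes(bits) // GB} GB")
```

With 32 address bits the same calculation gives 4 GB, so the four extra bits multiply the limit by 16.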

24

Pentium Pro Features

• Greater Multiprocessing: Quad processor configurations are supported with the Pentium Pro compared to only dual with the Pentium.

• Out of Order Completion: Instructions flowing down the execution pipelines can complete out of order.

• Superior Branch Prediction Unit: The branch target buffer is double the size of the Pentium's and its accuracy is increased.

• Register Renaming: This feature improves parallel performance of the pipelines.

• Speculative Execution: The Pro uses speculative execution to reduce pipeline stall time in its RISC core.

25

Pentium II

• The Pentium II utilizes features of P6 microarchitecture (namely a multi-transaction bus, Dynamic Execution performance, and Intel MMX)

• Dual Independent Bus architecture

• 66MHz or 100MHz system bus

• Single Edge Contact Cartridge packaging technology

• 512K unified, non-blocking L2 Cache

• 233MHz through 450MHz clock speeds

26

Pentium III

• Pentium III = Pentium II + SSE

• SSE: Internet Streaming SIMD Extensions

• Seventy new instructions

• Three categories:
  – SIMD Floating-Point instructions
  – New Media instructions
  – Streaming Memory instructions

27

SIMD: Single Instruction Multiple Data

• MMX and SSE instructions
  – provide a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements contained in the 64-bit MMX or the 128-bit XMM registers.
  – enable increased performance on a wide variety of multimedia and communications applications.

28

SIMD-FP Instruction

• The SIMD-FP feature introduces a new register file containing eight 128-bit registers
  – Each register can hold a vector of four IEEE single precision FP data elements
  – This allows four single precision FP operations to be carried out within a single instruction
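To make the register layout concrete, the sketch below packs four single precision values into 16 bytes (128 bits) and adds two such vectors element by element, which is the effect of one packed add such as the SSE `ADDPS` instruction. The function names `pack_ps` and `add_ps` are hypothetical. Python computes each sum in double precision and then rounds to single precision, so this models the data layout rather than the exact hardware arithmetic.

```python
import struct

LANES = 4  # four single precision elements per 128-bit register


def pack_ps(values):
    """Pack four floats into 16 bytes, the size of one 128-bit XMM register."""
    if len(values) != LANES:
        raise ValueError("exactly four values are required")
    return struct.pack("<4f", *values)


def add_ps(a: bytes, b: bytes) -> bytes:
    """Element-wise add of two packed vectors, in the manner of a packed SIMD add."""
    xs = struct.unpack("<4f", a)
    ys = struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(xs, ys)))


a = pack_ps([1.0, 2.0, 3.0, 4.0])
b = pack_ps([10.0, 20.0, 30.0, 40.0])
result = struct.unpack("<4f", add_ps(a, b))

print(len(a) * 8, "bits per register")  # 128 bits per register
print(result)  # (11.0, 22.0, 33.0, 44.0)
```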

29

SIMD

30

Pentium 4 Features

• 400MHz System Bus

• Hyper-Pipelined Technology

• Rapid Execution Engine

• Execution Trace Cache

• Advanced Transfer Cache

• Advanced Dynamic Execution

• Enhanced Floating Point

• Streaming SIMD Extensions 2 (SSE2) Instructions

32

Pentium Pipeline

• The Pentium's basic integer pipeline is five stages long, with the stages broken down as follows:

– Prefetch/Fetch: Instructions are fetched from the instruction cache and aligned in prefetch buffers for decoding.

– Decode1: Instructions are decoded into the Pentium's internal instruction format. Branch prediction also takes place at this stage.

– Decode2: Same as above, and microcode ROM kicks in here, if necessary. Also, address computations take place at this stage.

– Execute: The integer hardware executes the instruction.

– Write-back: The results of the computation are written back to the register file.
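The five stages listed above can be laid out cycle by cycle to show how instructions overlap. This is a minimal sketch under simplifying assumptions: a single pipe, one instruction entering per clock, and no stalls or branches. The stage abbreviations (PF, D1, D2, EX, WB) and the function name `pipeline_schedule` are chosen for this example.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]  # prefetch, decode1, decode2, execute, write-back


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name to instruction index."""
    total_cycles = n_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for position, name in enumerate(stages):
            instr = cycle - position  # instruction that has reached this stage
            if 0 <= instr < n_instructions:
                row[name] = instr
        schedule.append(row)
    return schedule


sched = pipeline_schedule(4)
for cycle, row in enumerate(sched, start=1):
    cells = [f"{name}:I{row[name]}" if name in row else f"{name}:--" for name in STAGES]
    print(f"cycle {cycle}: " + "  ".join(cells))

print(len(sched), "cycles for 4 instructions")  # 8 cycles for 4 instructions
```

Without overlap the same four instructions would need 4 × 5 = 20 cycles; with the pipeline full, one instruction completes every cycle.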

33

Pentium Pipeline

34

P5 Architecture

35

P6 Architecture

36

NetBurst Architecture

http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/pentium4/optimization/44015.htm?page=1

37

Branch Prediction

38

Branch Prediction

• Imagine a simple microprocessor where all instructions are handled in two steps: decoding and execution. The microprocessor can save time by decoding one instruction while the preceding instruction is executing. This assembly line-principle is called pipelining. In advanced microprocessors, the pipeline may have many steps so that many consecutive instructions are underway in the assembly line at the same time, one at each stage in the pipeline.

39

Branch Prediction

• The problem now occurs when we meet a branch instruction. A branch instruction is the implementation of an if-then-else construct. If a condition is true then jump to some other location; if false then continue with the next instruction. This gives a break in the flow of instructions through the pipeline because the processor doesn't know which instruction comes next until it has finished executing the branch instruction. The longer the pipeline, the longer time it will have to wait until it knows which instruction to feed next into the pipeline. As modern microprocessors tend to have longer and longer pipelines, there has been a growing need for doing something about this problem.
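The cost of waiting can be estimated with a simple model. The sketch below assumes that a branch is only resolved in the last pipeline stage, so every branch leaves the other stages idle, and that 20% of instructions are branches. Both are illustrative assumptions, not measured values, and the function name `cpi_when_waiting` is hypothetical. The stage counts of 5 and 14 are the Pentium and Pentium Pro figures given earlier in these slides.

```python
def cpi_when_waiting(pipeline_stages: int, branch_fraction: float) -> float:
    """Average cycles per instruction if the pipeline waits on every branch.

    Assumes a branch is resolved in the last stage, so each branch costs
    (pipeline_stages - 1) idle cycles. This is a deliberate simplification.
    """
    if pipeline_stages < 1 or not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("invalid pipeline depth or branch fraction")
    stall_cycles = pipeline_stages - 1
    return 1.0 + branch_fraction * stall_cycles


BRANCH_FRACTION = 0.2  # assumed share of branch instructions, for illustration only

for name, stages in (("two-step pipeline", 2), ("Pentium", 5), ("Pentium Pro", 14)):
    cpi = cpi_when_waiting(stages, BRANCH_FRACTION)
    print(f"{name} ({stages} stages): {cpi:.1f} cycles per instruction")
```

Under these assumptions the deeper pipeline loses far more time per branch, which is why longer pipelines make prediction more important.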

40

Branch Prediction

• The solution is branch prediction. The microprocessor tries to predict whether the branch instruction will jump or not, based on a record of what this branch has done previously. If it has jumped the last four times then chances are high that it will also jump this time. The microprocessor decides which instruction to load next into the pipeline based on this prediction, before it knows for sure. This is called speculative execution. If the prediction turns out to be wrong, then it has to flush the pipeline and discard all calculations that were based on this prediction. But if the prediction was correct, then it has saved a lot of time.
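One common textbook way to keep "a record of what this branch has done previously" is a 2-bit saturating counter per branch. The sketch below shows that scheme as an illustration only; it is not claimed to be the exact mechanism of any particular Intel processor, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.counter = 1  # start in the "weakly not taken" state

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes) -> float:
    """Fraction of outcomes predicted correctly by a fresh predictor."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop-closing branch: taken 9 times, then not taken once; the loop runs 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(f"accuracy on the loop branch: {accuracy(loop_branch):.2f}")
```

After the first pass, the only misprediction in each run of the loop is the final exit, because a single not-taken outcome moves the counter only one step away from "strongly taken".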

41

2 Level Branch Prediction

42

Branch Prediction

• The Branch Target Buffer (BTB) is used to predict the outcome of branch instructions.

• The current address in D1 is applied to the BTB.

• If hit, the assumption is that the branch will be taken (if the assumption is correct, execution continues without stalls or flushes).

• If miss, the assumption is that the branch will not be taken.

• A mispredicted branch (whether a BTB hit or a miss) causes the pipeline to be flushed.

• The number of delay clocks depends on the branch type.
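The hit and miss rules above can be sketched with a small lookup table. This is a simplified model: a real BTB has a fixed size and keeps history bits per entry, whereas here an entry is added when a branch is taken and dropped when it is not taken, which is an assumption made for this example. The class name, method names and addresses are hypothetical.

```python
class BranchTargetBuffer:
    """Simplified BTB: maps a branch address to its last known target."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_addr, fallthrough_addr):
        """Return (predicted_taken, next_fetch_address)."""
        if branch_addr in self.entries:  # hit: assume the branch is taken
            return True, self.entries[branch_addr]
        return False, fallthrough_addr   # miss: assume it is not taken

    def resolve(self, branch_addr, taken, target_addr, predicted_taken):
        """Update the table after execution; return True if a flush is needed."""
        if taken:
            self.entries[branch_addr] = target_addr
        else:
            self.entries.pop(branch_addr, None)
        return predicted_taken != taken


btb = BranchTargetBuffer()
flushes = 0
# One branch at a hypothetical address, taken three times and then not taken.
for taken in [True, True, True, False]:
    predicted, _next_fetch = btb.predict(0x1000, 0x1002)
    if btb.resolve(0x1000, taken, 0x0F00, predicted):
        flushes += 1

print("pipeline flushes:", flushes)  # pipeline flushes: 2
```

The two flushes are the first encounter (a miss on a branch that is taken) and the final iteration (a hit on a branch that is not taken).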

43

Dual Independent Bus (DIB)

• DIB is a bus architecture that is part of Intel's Pentium Pro and Pentium II microprocessors.

• As its name implies, DIB uses two buses: one from the processor to main memory, and the other from the processor to the L2 cache.

• The processor can access both buses simultaneously, which increases throughput.

44

Dual Independent Bus

• The Pentium II processor bus architecture addresses processor-to-memory bus bandwidth limitations, offering up to three times the performance bandwidth of the single-bus, "socket 7" generation processors, such as the Pentium processor. This translates into overall faster system performance.

• Two buses make up the Dual Independent Bus architecture: the L2 cache bus and the processor-to-main-memory system bus. The speed of the dedicated L2 cache bus on the Pentium II processor scales with the speed of the processor.

45

Dual Independent Bus

• The cache bus on the 300-MHz processor, for instance, runs at 150 MHz, more than twice as fast as the L2 cache on a Pentium processor, which runs at a fixed 66 MHz.

• The processor-to-main-memory system bus enables simultaneous parallel transactions instead of single, sequential transactions of previous generation processors, further increasing performance.

46

Dual Independent Bus

• Two buses make up the Dual Independent Bus architecture: the L2 cache bus and the system bus. Each is 8-bytes wide, thus doubling the available channels for data.

• As the L2 cache bus is integrated into the Single Edge Contact cartridge, it is not limited in speed by the constraints of motherboard routing.

• Therefore the L2 cache bus is designed to run at 1/2 the processor core frequency on the Pentium II processor. Peak bandwidth for a Pentium II processor with Dual Independent Bus Architecture can be calculated as 533 MB/sec for the system bus (8 bytes × 66 MHz) plus 8 bytes times the L2 cache bus frequency.
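The peak bandwidth rule stated above can be worked through for a few core speeds. In this sketch the 533 MB/sec system bus figure and the 8-byte bus width come from the slide, the L2 bus is taken as half the core frequency, and 1 MHz on an 8-byte bus is counted as 8 MB/sec. The function names are hypothetical, and only 66 MHz system bus parts are shown because the 533 MB/sec figure applies to that bus speed.

```python
BUS_WIDTH_BYTES = 8          # each bus is 8 bytes wide
SYSTEM_BUS_MB_PER_SEC = 533  # system bus figure quoted on the slide (66 MHz bus)


def l2_bus_mhz(core_mhz: float) -> float:
    """The Pentium II L2 cache bus runs at half the core frequency."""
    return core_mhz / 2


def peak_bandwidth_mb_per_sec(core_mhz: float) -> float:
    """System bus bandwidth plus 8 bytes times the L2 cache bus frequency."""
    return SYSTEM_BUS_MB_PER_SEC + BUS_WIDTH_BYTES * l2_bus_mhz(core_mhz)


for core in (233, 266, 300):
    total = peak_bandwidth_mb_per_sec(core)
    print(f"{core} MHz core: L2 bus {l2_bus_mhz(core):.1f} MHz, peak {total:.0f} MB/sec")
```

For the 300 MHz part this gives 533 + 8 × 150 = 1733 MB/sec, compared with 533 MB/sec for a single 66 MHz bus of the same width.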

47

Intel SpeedStep Technology

• SpeedStep first appeared in Pentium III notebook computers, where it helps to preserve battery life by reducing the speed (and hence the power drain) of the processor when it has less work to do.

• The design allows notebooks to drop from 600MHz or 650MHz to 500MHz when running on battery power.

• Enhanced Intel SpeedStep Technology (EIST) dynamically scales the speed of the processor between its default clock setting and a minimum speed (2.8GHz at the time of writing), based on how much processing power is needed at that moment.