et4508_review 2005.ppt

02/05/2005 ET4508_review (KR) 1

Exam

General Comments:

Provisional date/time/venue: 16/05/2005, 9am, C1058-1060 (make sure to re-check at http://www.timetable.ul.ie/ !)

Duration: 2½ h


ED5532 Instructions

INSTRUCTIONS TO CANDIDATES: Use of a non-programmable pocket calculator is permitted.

Exam is composed of two parts: Part A and Part B. Both parts must be completed.

Part A is composed of 60 multiple choice questions worth a total of 60 marks. One mark is given for each correct answer. Do not tick any answers on the exam paper. Write the answers on your script (for example 1d, 2a, 3c, and so on). Negative marking does not apply to this section.

Part B: Answer two questions out of B1, B2 and B3. Questions B1, B2 and B3 carry the same number of marks (40 marks each). Attempt no more than two questions from this section.

DETERMINATION OF FINAL MARK: Examination: 80% (60 marks + 80 marks=140 marks) Lab/Assignment: 20% (35 marks)

Total: 100% (175 marks)


ET4508 Instructions

INSTRUCTIONS TO CANDIDATES: Use of a non-programmable pocket calculator is permitted.

Exam is composed of two parts: Part A and Part B. Both parts must be completed.

Part A is composed of 60 multiple choice questions worth a total of 60 marks. One mark is given for each correct answer. Do not tick any answers on the exam paper. Write the answers on your script (for example 1d, 2a, 3c, and so on). Negative marking does not apply to this section.

Part B: Answer two questions out of B1, B2 and B3. Questions B1, B2 and B3 carry the same number of marks (30 marks each). Attempt no more than two questions from this section.

DETERMINATION OF FINAL MARK: Examination: 80% (60 marks + 60 marks=120 marks) Lab/Assignment: 20% (30 marks)

Total: 100% (150 marks)


General recommendations

Recommendations for exam preparation
  Go through slides (on the web)
  Use lecture notes as the reference material
  Look at past exam papers (in particular: 2003-2004)
  Selected exercises taken from tutorial sheets 1-3


Focus Part A

Focus of Part A – Multiple choice

MPU fundamentals
  Buses, address decoding, read and write cycles, memory-mapped I/O vs separate I/O mapping, stacks, interrupts, DMA, etc…

Processors
  8086 (main features, register set, functional units, bus interface, minimum system…)
  Processor evolution (86, 286, 386, 486, Pentium, MMX, P6, P7), main features, register sets, bus widths, bus interface, burst mode…
  Modes of operation (real mode vs protected mode) – main differences…
  Instruction queues, instruction pipelines, super-scalar architecture, RISC vs CISC

Memory devices (static/dynamic RAM, ROM, EPROM, Flash), width of data bus and address bus

ISA, EISA, VESA: main features
PCI bus: main features
AGP bus: main features


Focus Part B

Processors:
  RISC vs CISC
  Cache memories
  Pipelines
  Memory management (segmentation and paging): look at exercises

PC Architectures and Expansion Buses
Legacy ports
USB

Excluded topics:
  Sections/details skipped during classes
  PC-Card Interface (L13-x)


Tutorial

Selected questions from ET4508 / ED5532 Tutorial Sheets 1-3 (http://www.ul.ie/~rinne/et4508.htm)


Tutorial #1 – Q4

[Figure: the two Pentium integer pipelines, Pipeline u and Pipeline v, each with the stages IF, D1, D2, EX and WB]

Superscalar processors use more than one pipeline

Under best case conditions the Pentium can complete two instructions in every clock cycle

IA-32 instructions have to be ‘paired’ according to Intel rules
  Pipeline u can execute any IA-32 instruction
  Pipeline v can execute ‘simple’ instructions
  Pipeline u gets filled first
  If the second instruction is NOT part of a pair, it waits for the next slot
  All pairing & decoding decisions are done in hardware – software support is not required, but helps performance
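The issue rules above can be sketched in a few lines of Python. This is only a simplified model built from the bullets on this slide: the function name `issue_schedule` and the `is_simple` flags are hypothetical, and the real Intel pairing rules contain further conditions that are not modelled here.

```python
def issue_schedule(is_simple):
    """Group instructions into issue slots for a two-pipeline (u, v) model.

    is_simple[i] is True when instruction i is 'simple' enough for pipeline v.
    Returns one (u_index, v_index) tuple per clock cycle; v_index is None
    when pipeline v stays empty in that cycle.
    """
    schedule = []
    i = 0
    while i < len(is_simple):
        u = i      # pipeline u gets filled first and accepts any instruction
        v = None
        # The following instruction joins in pipeline v only if it is 'simple'
        if i + 1 < len(is_simple) and is_simple[i + 1]:
            v = i + 1
        schedule.append((u, v))
        # An instruction that could not be paired waits for the next slot
        i += 1 if v is None else 2
    return schedule


if __name__ == "__main__":
    # Hypothetical instruction stream: True = simple, False = complex
    for cycle, slot in enumerate(issue_schedule([True, True, False, True, True, False])):
        print(cycle, slot)
```

In this model a stream of simple instructions completes two per cycle, while every complex instruction that follows another instruction costs an empty v slot.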

[Figure: contents of the u and v pipelines in cycles n to n+4. In each cycle a new instruction pair (k and k+1, then k+2 and k+3, and so on) enters the IF stage, the earlier pairs advance through D1, D2 and EX, and a pair of results leaves the WB stage.]


Tutorial #1 – Q4

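[Figure: Pentium integer and floating point pipelines. Instruction Fetch (IF) feeds D1, D2, EX and WB in the u-pipe and the v-pipe. The floating point pipeline branches off the u-pipe with the stages X1 (in place of WB), X2, WF and ER. The Floating Point Unit contains an adder, a multiplier, a divider and the register stack ST(0)-ST(7).]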

FP pipeline has 8 stages
Shares its first 4 stages with the u integer pipeline
WB of the u pipeline is the first execution stage of the FP pipeline


Tutorial #1 – Q5

CISC = Complex Instruction Set Computer
  Complex instructions. Code-size efficient
  Micro-encoding of the machine instructions
  Extensive addressing capabilities for memory operations
  Few, but very useful CPU registers…

CISC drawback:
  Most instructions are so complicated, they have to be broken into a sequence of micro-steps
  These steps are called Micro-Code
  Stored in a ROM in the processor core
  Micro-code ROM: Access-time and size...
  They require extra ROM and decode logic


Tutorial #1 – Q5

RISC = Reduced Instruction Set Computer
  Sometimes executing a sequence of simple instructions runs quicker than a single complex machine instruction that has the same effect
  Reduce the instruction set to simplify the decoding
  Smaller Instruction Set -> Simpler Logic -> Smaller Logic -> Faster Execution
  Eliminate microcode – hardwire all instruction execution
  Pipeline instruction decoding and executing – do more operations in parallel


Tutorial #1 – Q5

Load/Store Architecture – only the load and store instructions can access memory
  All other instructions work with the processor internal registers
  This is necessary for single-cycle execution – the execution unit can’t wait for data to be read/written
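A toy illustration of the load/store idea, as a sketch only: the opcodes LOAD, STORE and ADD, the register names and the function `run` are all hypothetical and do not belong to any real instruction set. The point is that a memory-to-memory addition has to be written as two loads, a register-only add and a store.

```python
def run(program, memory):
    """Execute a toy load/store program; only LOAD and STORE touch memory."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr: register <- memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE rs, addr: memory <- register
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, ra, rb: works on registers only
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# memory[2] = memory[0] + memory[1], expressed in load/store style
PROGRAM = [
    ("LOAD", "r1", 0),
    ("LOAD", "r2", 1),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", 2),
]

if __name__ == "__main__":
    print(run(PROGRAM, [5, 7, 0]))
```

The sequence needs three registers for one addition, which is one reason why load/store machines provide more internal registers.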


Tutorial #1 – Q5

Increased number of internal registers due to the Load/Store Architecture

Registers are also more general purpose and less associated with specific functions

The compiler is designed along with the RISC processor. It has to be aware of the processor architecture to produce code that can be executed efficiently


Tutorial #1 – Q5

Problem of given code sequence: Register dependency

Resulting in pipeline flush, temporary loss of performance

Problem avoided by smart compilers
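The code sequence from the tutorial sheet is not reproduced on this slide, so the following sketch uses a hypothetical three-instruction sequence. It shows one way a compiler could detect a register dependency between neighbouring instructions and remove it by reordering; the helper `raw_hazards` and the `(destination, sources)` encoding are assumptions made for illustration, and only adjacent instructions are checked.

```python
def raw_hazards(program):
    """Return every index i where instruction i+1 reads the register written by i."""
    hazards = []
    for i in range(len(program) - 1):
        dest, _ = program[i]
        _, sources = program[i + 1]
        if dest in sources:
            hazards.append(i)
    return hazards


# Each instruction is (destination register, source registers)
DEPENDENT = [
    ("r1", ("r2", "r3")),   # r1 = r2 op r3
    ("r4", ("r1", "r5")),   # needs r1 straight away: register dependency
    ("r6", ("r7", "r8")),   # independent of the other two
]

# A compiler may move the independent instruction in between
REORDERED = [DEPENDENT[0], DEPENDENT[2], DEPENDENT[1]]

if __name__ == "__main__":
    print(raw_hazards(DEPENDENT), raw_hazards(REORDERED))
```

Both orderings compute the same values because the moved instruction neither reads nor writes a register used by the other two.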


Tutorial #2 – Q1

[Figure: CPU, SRAM cache with cache controller, and banks of DRAM main memory]

The fast SRAM cache is placed between the CPU and the slower DRAM. The SRAM holds the most frequently or recently accessed data and makes it available very quickly. The cache controller controls the process. It has to keep the data in the cache and the main memory the same (cache coherency) and uses various strategies for this, such as write-through and write-back.

Unified cache for code and data – e.g. i486: More efficient use of resources

Separate (Harvard) code and data caches – e.g. Pentium: Faster because you can access code and data in the same clock cycle


Tutorial #2 – Q1

Cache Hit: if data required by the CPU is in the cache we have a cache hit, otherwise a cache miss

Cache Hit Rate: Proportion of memory accesses satisfied by the cache. The Miss Rate (1 minus the hit rate) is more commonly referred to

To prevent memory bottlenecks the cache miss rate needs to be no more than a few percent

Cache Line: A block of data held in the cache. It’s the smallest unit of storage that can be allocated in a cache. Processor always reads or writes entire cache lines. Popular cache line size: 16-32 bytes

Cache Line Fill: occurs when a block of data is read from main memory into a cache line
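A rough worked example of why the miss rate has to stay at a few percent. The latencies of 1 cycle for the cache and 20 cycles for main memory are hypothetical values chosen for illustration, and the model assumes that a miss costs the cache lookup plus one main memory access.

```python
def effective_access_time(miss_rate, t_cache, t_memory):
    """Average access time when a miss costs the cache time plus the memory time."""
    hit_rate = 1.0 - miss_rate
    return hit_rate * t_cache + miss_rate * (t_cache + t_memory)


if __name__ == "__main__":
    # Hypothetical latencies in clock cycles
    for miss_rate in (0.02, 0.20):
        print(miss_rate, effective_access_time(miss_rate, t_cache=1, t_memory=20))
```

Under these assumptions a 2% miss rate keeps the average close to the cache speed, while a 20% miss rate makes the average access several times slower than a cache hit.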


Tutorial #2 – Q1

Direct-mapped cache: two different memory locations sharing the same set address cannot be held in the cache at the same time. They will contend…

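[Figure: 8 KByte direct-mapped cache with 16 byte cache lines. The address is split into Tag (A31-A13), Set (A12-A4) and Byte Addr (A3-A0). The set address A12-A4 drives the decoder that selects one entry in the Tag RAM and the Data RAM. The stored tag is compared with the address tag to produce Hit, and a mux delivers the Data.]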
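The address split of the 8 KByte direct-mapped cache (Tag A31-A13, Set A12-A4, Byte Addr A3-A0) can be checked with a short sketch. The function names and the two example addresses are hypothetical.

```python
def split_address(addr):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    byte = addr & 0xF               # A3-A0: byte within the 16 byte line
    set_index = (addr >> 4) & 0x1FF  # A12-A4: one of 512 sets
    tag = addr >> 13                # A31-A13: stored in the Tag RAM
    return tag, set_index, byte


def contend(addr_a, addr_b):
    """True if both addresses need the same set but carry different tags."""
    tag_a, set_a, _ = split_address(addr_a)
    tag_b, set_b, _ = split_address(addr_b)
    return set_a == set_b and tag_a != tag_b


if __name__ == "__main__":
    # Two hypothetical addresses exactly 8 KByte apart
    print(split_address(0x1234), split_address(0x3234), contend(0x1234, 0x3234))
```

Any two addresses that differ by a multiple of 8 KByte share the set address, so in a direct-mapped cache each one evicts the other.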


Tutorial #2 – Q1

Two-way Set-associative cache: two different memory locations sharing the same set address can be held in the cache at the same time

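[Figure: two-way set-associative cache with 16 byte cache lines. The address is split into Tag (A31-A12), Set (A11-A4) and Byte Addr (A3-A0). The set address A11-A4 drives two decoders, one per way, each with its own Tag RAM, Data RAM, comparator and mux. Together the two ways produce Hit and Data.]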
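A minimal simulation of the two-way organisation with 256 sets (A11-A4) and 16 byte lines. The class `TwoWayCache` is a hypothetical sketch that tracks tags only, and it assumes LRU replacement within each set.

```python
class TwoWayCache:
    """Tag-only model of a two-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets=256, line_bytes=16):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # Each set holds at most two tags, least recently used first
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a cache hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)     # move the tag to the most recently used position
            ways.append(tag)
            return True
        if len(ways) == 2:
            ways.pop(0)          # evict the least recently used way
        ways.append(tag)
        return False


if __name__ == "__main__":
    cache = TwoWayCache()
    a, b = 0x1230, 0x2230        # hypothetical addresses with the same set address
    print([cache.access(x) for x in (a, b, a, b)])
```

Two locations with the same set address now hit alternately after their first access. A third location with that set address would still evict one of them.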


Tutorial #2 – Q1

[Figure: four write strategies, each drawn as Processor, Cache and Main Memory: Write Through; Write Buffering (modified Write Through) with a Write Buffer (FIFO); Write Back; No caching of Invalid Write cycles (cache line marked INVALID)]


Tutorial #2 – Q1 MESI Protocol

Formal Mechanism for controlling cache consistency using snooping

Every cache line is in 1 of 4 MESI states (encoded in 2 bits)

Modified: An M-state line is available in only one cache and it is also MODIFIED (different from main memory). An M-state line can be accessed (read/written to) without sending a cycle out on the bus

Exclusive: An E-state line is also available in only one cache in the system, but the line is not MODIFIED (i.e., it is the same as main memory). An E-state line can be accessed (read/written to) without generating a bus cycle. A write to an E-state line causes the line to become MODIFIED

Shared: This state indicates that the line is potentially shared with other caches (i.e., the same line may exist in more than one cache). A read to an S-state line does not generate bus activity, but a write to a SHARED line generates a write-through cycle on the bus. The write-through cycle may invalidate this line in other caches. A write to an S-state line updates the cache

Invalid: This state indicates that the line is not available in the cache. A read to this line will be a MISS and may cause the processor to execute a LINE FILL (fetch the whole line into the cache from main memory). A write to an INVALID line causes the processor to execute a write-through cycle on the bus
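The four state descriptions can be collected into a lookup table for accesses made by the local processor. This is a sketch restricted to what the slide states: the table name, the function `local_access` and the marker strings are hypothetical, and where the slide does not give the resulting state the entry is left as "not stated" rather than guessed. Snoop cycles from other bus masters are not covered.

```python
NO_BUS = "no bus cycle"
NOT_STATED = "not stated"    # the slide does not give the resulting state

# (current state, operation) -> (bus activity, next state)
LOCAL_ACCESS = {
    ("M", "read"):  (NO_BUS, "M"),
    ("M", "write"): (NO_BUS, "M"),
    ("E", "read"):  (NO_BUS, "E"),
    ("E", "write"): (NO_BUS, "M"),            # the line becomes MODIFIED
    ("S", "read"):  (NO_BUS, "S"),
    ("S", "write"): ("write-through", NOT_STATED),
    ("I", "read"):  ("line fill", NOT_STATED),   # a miss; the line fill is optional
    ("I", "write"): ("write-through", NOT_STATED),
}


def local_access(state, operation):
    """Return (bus activity, next state) for a processor access to a cache line."""
    return LOCAL_ACCESS[(state, operation)]


if __name__ == "__main__":
    for key in sorted(LOCAL_ACCESS):
        print(key, local_access(*key))
```

Only the M and E states allow writes without any bus activity, which is why they are reserved for lines that exist in a single cache.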


Tutorial #2 – Q1

Data cache: 2 bits required for encoding of 4 possible states (MESI)

Code cache: inherently write protected. Only 1 bit required for 2 possible states (SI)

[Figure: one set of the two-way Data Cache holds a Tag Address and a MESI State for WAY 0 and for WAY 1, plus an LRU entry. One set of the two-way Code Cache holds a Tag Address and a State Bit (S or I) for WAY 0 and for WAY 1, plus an LRU entry.]
