et4508_review 2005.ppt
02/05/2005 ET4508_review (KR) 1
Exam
General comments: provisional date/time/venue: 16/05/2005, 9am, C1058-1060 (make sure to re-check at http://www.timetable.ul.ie/ !)
Duration: 2½ h
ED5532 Instructions
INSTRUCTIONS TO CANDIDATES: Use of a non-programmable pocket calculator is permitted.
Exam is composed of two parts: Part A and Part B. Both parts must be completed.
Part A is composed of 60 multiple choice questions worth a total of 60 marks. One mark is given for each correct answer. Do not tick any answers on the exam paper. Write the answers on your script (for example 1d, 2a, 3c, and so on). Negative marking does not apply to this section.
Part B: Answer two questions out of B1, B2 and B3. Questions B1, B2 and B3 carry the same number of marks (40 marks each). Attempt no more than two questions from this section.
DETERMINATION OF FINAL MARK: Examination: 80% (60 marks + 80 marks=140 marks) Lab/Assignment: 20% (35 marks)
Total: 100% (175 marks)
ET4508 Instructions
INSTRUCTIONS TO CANDIDATES: Use of a non-programmable pocket calculator is permitted.
Exam is composed of two parts: Part A and Part B. Both parts must be completed.
Part A is composed of 60 multiple choice questions worth a total of 60 marks. One mark is given for each correct answer. Do not tick any answers on the exam paper. Write the answers on your script (for example 1d, 2a, 3c, and so on). Negative marking does not apply to this section.
Part B: Answer two questions out of B1, B2 and B3. Questions B1, B2 and B3 carry the same number of marks (30 marks each). Attempt no more than two questions from this section.
DETERMINATION OF FINAL MARK: Examination: 80% (60 marks + 60 marks=120 marks) Lab/Assignment: 20% (30 marks)
Total: 100% (150 marks)
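Both schemes weight the written exam at 80% and the lab/assignment at 20% of the total, which follows directly from the raw totals: 120/150 = 140/175 = 0.8. A minimal sketch of that arithmetic, assuming the final mark is simply the sum of the raw marks divided by the module maximum (the helper names are hypothetical, not part of any official marking tool):

```python
# Maximum raw marks per component, copied from the two instruction slides
MAXIMA = {
    "ET4508": {"part_a": 60, "part_b": 60, "lab": 30},  # 150 marks in total
    "ED5532": {"part_a": 60, "part_b": 80, "lab": 35},  # 175 marks in total
}


def final_percentage(part_a, part_b, lab, module="ET4508"):
    """Sum the raw marks and express them as a percentage of the module maximum."""
    maxima = MAXIMA[module]
    for name, value in (("part_a", part_a), ("part_b", part_b), ("lab", lab)):
        if not 0 <= value <= maxima[name]:
            raise ValueError(f"{name} must be between 0 and {maxima[name]}")
    return 100.0 * (part_a + part_b + lab) / sum(maxima.values())


def exam_weight(module="ET4508"):
    """Fraction of the total carried by the written exam (Part A + Part B)."""
    m = MAXIMA[module]
    return (m["part_a"] + m["part_b"]) / sum(m.values())


print(final_percentage(45, 30, 24))  # raw 99 out of 150
```

Because both modules keep the same 80/20 split, the same function covers either one by changing `module`.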
General recommendations
Recommendations for exam preparation:
- Go through the slides (on the web)
- Use the lecture notes as the reference material
- Look at past exam papers (in particular: 2003-2004)
- Selected exercises taken from tutorial sheets 1-3
Focus Part A
Focus of Part A: multiple choice
- MPU fundamentals: buses, address decoding, read and write cycles, memory-mapped I/O vs separate I/O mapping, stacks, interrupts, DMA, etc…
- Processors: 8086 (main features, register set, functional units, bus interface, minimum system…)
- Processor evolution (86, 286, 386, 486, Pentium, MMX, P6, P7): main features, register sets, bus widths, bus interface, burst mode…
- Modes of operation (real mode vs protected mode): main differences…
- Instruction queues, instruction pipelines, super-scalar architecture, RISC vs CISC
- Memory devices (static/dynamic RAM, ROM, EPROM, Flash), width of data bus and address bus
- ISA, EISA, VESA: main features
- PCI bus: main features
- AGP bus: main features
Focus Part B
- Processors: RISC vs CISC, cache memories, pipelines, memory management (segmentation and paging); look at the exercises
- PC architectures and expansion buses
- Legacy ports
- USB
Excluded topics:
- Sections/details skipped during classes
- PC-Card Interface (L13-x)
Tutorial
Selected questions from ET4508 / ED5532 Tutorial Sheets 1-3 (http://www.ul.ie/~rinne/et4508.htm)
Tutorial #1 – Q4
Figure: two parallel integer pipelines, Pipeline u and Pipeline v, each with the stages IF, D1, D2, EX and WB.
- Superscalar processors use more than one pipeline.
- Under best-case conditions, the Pentium can complete two instructions in every clock.
- IA-32 instructions have to be 'paired' according to Intel rules:
  - Pipeline u can execute any IA-32 instruction; pipeline v can execute only 'simple' instructions.
  - Pipeline u gets filled first.
  - If the second instruction is NOT part of a pair, it waits for the next slot.
- All pairing and decoding decisions are made in hardware: software support is not required, but it helps performance.
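The pairing behaviour can be illustrated with a small issue-slot simulator. This is a sketch under a deliberately simplified subset of the real Intel rules (assumed here: the v instruction must be marked simple and must not read or overwrite the register written by the u instruction); the `Instr` type, its `simple` flag and the example instructions are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str
    simple: bool                    # hypothetical flag: allowed in the v pipe
    dest: Optional[str] = None      # register written by the instruction
    srcs: Tuple[str, ...] = ()      # registers read by the instruction


def depends(first: Instr, second: Instr) -> bool:
    """True if `second` reads or overwrites the register written by `first`."""
    return first.dest is not None and (
        first.dest in second.srcs or first.dest == second.dest
    )


def issue_slots(program: List[Instr]) -> List[Tuple[str, Optional[str]]]:
    """Group a program into per-clock (u, v) issue slots."""
    slots = []
    i = 0
    while i < len(program):
        u = program[i]  # the u pipe is always filled first
        v = program[i + 1] if i + 1 < len(program) else None
        if v is not None and v.simple and not depends(u, v):
            slots.append((u.text, v.text))  # pair issued in the same clock
            i += 2
        else:
            slots.append((u.text, None))    # second instruction waits for the next slot
            i += 1
    return slots


program = [
    Instr("mov eax, [x]", True, "eax"),
    Instr("mov ebx, [y]", True, "ebx"),
    Instr("add eax, ebx", True, "eax", ("eax", "ebx")),
    Instr("add ecx, eax", True, "ecx", ("ecx", "eax")),
]
for slot in issue_slots(program):
    print(slot)
```

The two independent loads pair, while the last `add` reads `eax` written by the previous one and therefore issues alone, so four instructions need three issue slots instead of two.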
Figure: dual-pipeline operation under best-case pairing; two instructions enter IF and two results leave WB on every clock. Each cell shows the (u pipeline, v pipeline) instructions:

Cycle   IF          D1          D2          EX          WB          Results
n       k, k+1      k-2, k-1    k-4, k-3    k-6, k-5    k-8, k-7    k-8, k-7
n+1     k+2, k+3    k, k+1      k-2, k-1    k-4, k-3    k-6, k-5    k-6, k-5
n+2     k+4, k+5    k+2, k+3    k, k+1      k-2, k-1    k-4, k-3    k-4, k-3
n+3     k+6, k+7    k+4, k+5    k+2, k+3    k, k+1      k-2, k-1    k-2, k-1
n+4     k+8, k+9    k+6, k+7    k+4, k+5    k+2, k+3    k, k+1      k, k+1
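The cycle-by-cycle diagram for this question follows a simple rule: in cycle n+j, the u pipe in stage s (IF = 0 up to WB = 4) holds instruction k + 2j - 2s and the v pipe holds the next instruction. A small sketch that regenerates the diagram, assuming the best case in which every instruction pairs (the function name is illustrative):

```python
STAGES = ("IF", "D1", "D2", "EX", "WB")


def occupancy(j):
    """Map each stage to the (u, v) instructions it holds in cycle n+j.

    Instructions are given as offsets from k; assumes every instruction pairs,
    so two new instructions enter IF on every clock.
    """
    table = {}
    for s, stage in enumerate(STAGES):
        u = 2 * j - 2 * s      # each stage further down lags by one pair
        table[stage] = (u, u + 1)
    return table


for j in range(5):
    row = occupancy(j)
    print(f"n+{j}: " + "  ".join(f"{st}=k{row[st][0]:+d}/k{row[st][1]:+d}" for st in STAGES))
```

Any stall that breaks a pair would shift all later rows, which is exactly the performance loss the pairing rules try to avoid.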
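Figure: Pentium integer and floating-point pipelines. A shared Instruction Fetch (IF) stage feeds D1 (u-pipe) and D1 (v-pipe), followed by D2, EX and WB in each pipe; the u-pipe WB stage doubles as X1 and is followed by the FP stages X2, WF and ER. The Floating Point Unit contains an adder, a multiplier, a divider and the register stack ST(0)-ST(7).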
The FP pipeline has 8 stages. It shares the first 4 stages with the u integer pipeline, and the WB stage of the u pipeline is the first execution stage (X1) of the FP pipeline.
Tutorial #1 – Q5
CISC = Complex Instruction Set Computer
- Complex instructions; code-size efficient
- Micro-encoding of the machine instructions
- Extensive addressing capabilities for memory operations
- Few, but very useful, CPU registers…
CISC drawback:
- Most instructions are so complicated that they have to be broken into a sequence of micro-steps
- These steps are called micro-code and are stored in a ROM in the processor core
- Micro-code ROM: access time and size… it requires extra ROM and decode logic
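The micro-code idea can be pictured as a look-up table: the control unit fetches the micro-step sequence for each machine instruction from a ROM and sequences through it. A minimal sketch, assuming one micro-step per clock; the ROM contents and instruction names below are hypothetical illustrations, not real x86 micro-code:

```python
# Hypothetical micro-code ROM contents, for illustration only: each machine
# instruction maps to the sequence of micro-steps the control unit performs.
MICROCODE_ROM = {
    "ADD [mem], reg": [
        "compute effective address",
        "read memory operand into temp",
        "ALU: temp = temp + reg",
        "write temp back to memory",
    ],
    "ADD reg, reg": ["ALU: reg = reg + reg"],
}


def run_microcode(opcode):
    """Step through the micro-steps of one instruction, assuming one per clock."""
    steps = MICROCODE_ROM[opcode]  # ROM look-up: costs access time and chip area
    for clock, step in enumerate(steps, start=1):
        print(f"  clock {clock}: {step}")
    return len(steps)


print("ADD [mem], reg took", run_microcode("ADD [mem], reg"), "clocks")
```

The memory-operand form is compact in code size but needs several micro-steps, which is the trade-off the slide describes.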
RISC = Reduced Instruction Set Computer
- Sometimes executing a sequence of simple instructions runs quicker than a single complex machine instruction that has the same effect
- Reduce the instruction set to simplify the decoding: smaller instruction set -> simpler logic -> smaller logic -> faster execution
- Eliminate microcode: hardwire all instruction execution
- Pipeline instruction decoding and execution: do more operations in parallel
Load/Store architecture: only the load and store instructions can access memory
- All other instructions work with the processor's internal registers
- This is necessary for single-cycle execution: the execution unit can't wait for data to be read/written
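A toy register machine makes the load/store restriction concrete: arithmetic works only on registers, so a CISC-style "add two memory operands" becomes a short sequence of simple instructions. A minimal sketch; the instruction set (`LOAD`, `STORE`, `ADD`) and the `LoadStoreCPU` class are hypothetical:

```python
class LoadStoreCPU:
    """Toy load/store machine (hypothetical ISA): only LOAD and STORE touch memory."""

    def __init__(self, memory, num_regs=8):
        self.mem = memory          # dict: address -> value
        self.r = [0] * num_regs    # general-purpose register file

    def run(self, program):
        for op, *args in program:
            if op == "LOAD":       # LOAD rd, addr
                rd, addr = args
                self.r[rd] = self.mem[addr]
            elif op == "STORE":    # STORE rs, addr
                rs, addr = args
                self.mem[addr] = self.r[rs]
            elif op == "ADD":      # ADD rd, ra, rb: registers only, no memory access
                rd, ra, rb = args
                self.r[rd] = self.r[ra] + self.r[rb]
            else:
                raise ValueError(f"unknown opcode {op}")


# A CISC-style "mem[0x18] = mem[0x10] + mem[0x14]" becomes four simple instructions
cpu = LoadStoreCPU({0x10: 5, 0x14: 7})
cpu.run([("LOAD", 1, 0x10), ("LOAD", 2, 0x14), ("ADD", 3, 1, 2), ("STORE", 3, 0x18)])
print(cpu.mem[0x18])  # 12
```

Because the `ADD` never waits on memory, it can complete in a single execution step, and the extra instructions are why RISC designs need more registers.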
- Increased number of internal registers, due to the load/store architecture
- Registers are also more general-purpose and less associated with specific functions
- Compiler designed along with the RISC processor: the compiler has to be aware of the processor architecture to produce code that can be executed efficiently
Problem with the given code sequence: register dependency
Result: pipeline flush and a temporary loss of performance
The problem can be avoided by smart compilers (instruction reordering)
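The tutorial's code sequence is not reproduced in these slides, so the sketch below uses a hypothetical three-instruction sequence. It flags a register written by one instruction and read by the very next one (a simplification of real hazard-detection logic), and shows how moving an independent instruction between them, the kind of scheduling a compiler can do, removes the adjacent dependency:

```python
def raw_hazards(program):
    """List back-to-back read-after-write dependencies.

    `program` is a list of (dest, srcs) register tuples; only adjacent
    instructions are checked, a simplification of real pipeline hazard logic.
    """
    hazards = []
    for i in range(len(program) - 1):
        dest, _ = program[i]
        _, srcs = program[i + 1]
        if dest in srcs:
            hazards.append((i, i + 1, dest))
    return hazards


# Hypothetical sequence: instruction 1 needs r1 from instruction 0
original = [("r1", ("r2", "r3")), ("r4", ("r1", "r5")), ("r6", ("r7", "r8"))]
# Compiler-style reordering: move the independent instruction between them
reordered = [original[0], original[2], original[1]]
print(raw_hazards(original))   # [(0, 1, 'r1')]
print(raw_hazards(reordered))  # []
```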
Tutorial #2 – Q1
Figure: the CPU connected through an SRAM cache, managed by a cache controller, to a bank of DRAM devices.
The fast SRAM cache is placed between the CPU and the slower DRAM. The SRAM holds the most frequently or recently accessed data and makes it available very quickly. The cache controller controls the process. It has to keep the data in the cache and the main memory the same (cache coherency) and uses various strategies for this, such as write-through and write-back.
- Unified cache for code and data, e.g. i486: more efficient use of resources
- Separate (Harvard) code and data caches, e.g. Pentium: faster, because code and data can be accessed in the same clock cycle
- Cache hit: if the data required by the CPU is in the cache, we have a cache hit; otherwise, a cache miss
- Cache hit rate: the proportion of memory accesses satisfied by the cache; the miss rate is more commonly referred to
- To prevent memory bottlenecks, the cache miss rate needs to be no more than a few percent
- Cache line: a block of data held in the cache. It is the smallest unit of storage that can be allocated in a cache. The processor always reads or writes entire cache lines. Popular cache line sizes: 16-32 bytes
- Cache line fill: occurs when a block of data is read from main memory into a cache line
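Why the miss rate must stay at a few percent can be seen from the usual average-access-time estimate (a standard textbook formula, not one given on the slides). A minimal sketch, where the 10 ns hit time and 60 ns miss penalty are made-up example values:

```python
def miss_rate(hits, misses):
    """Fraction of memory accesses not satisfied by the cache."""
    return misses / (hits + misses)


def avg_access_time(hit_time_ns, rate, miss_penalty_ns):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + rate * miss_penalty_ns


# Hypothetical timings: 10 ns cache hit, 60 ns extra to fetch a line from DRAM
for rate in (0.02, 0.05, 0.20):
    print(f"miss rate {rate:.0%}: {avg_access_time(10, rate, 60):.1f} ns")
```

With these numbers a 2% miss rate costs only about 12% extra on average, while a 20% miss rate more than doubles the average access time.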
Direct-mapped cache: two different memory locations sharing the same set address cannot be held in the cache at the same time. They will contend…
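Figure: 8K byte direct-mapped cache with 16 byte cache lines. Address fields: Tag = A31-A13, Set = A12-A4, Byte = A3-A0. The set address A12-A4 is decoded to select one entry in the Tag RAM and the Data RAM; the stored tag is compared with the address tag to generate Hit, and a mux selects the addressed data from the cache line.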
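The address split shown for the 8K byte direct-mapped cache (Byte = A3-A0, Set = A12-A4, Tag = A31-A13) can be checked with a few bit operations. A minimal sketch of the tag look-up only (no data storage), assuming 32-bit addresses; the class and function names are illustrative:

```python
# 8 KB direct-mapped cache with 16-byte lines:
# byte = A3-A0 (4 bits), set = A12-A4 (9 bits, 512 sets), tag = A31-A13
NUM_SETS = 512


def split_address(addr):
    """Return (tag, set index, byte offset) for a 32-bit address."""
    byte = addr & 0xF
    set_index = (addr >> 4) & 0x1FF
    tag = addr >> 13
    return tag, set_index, byte


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_SETS   # Tag RAM: one tag per set

    def access(self, addr):
        """Return True on a hit; on a miss, the line fill replaces the old line."""
        tag, set_index, _ = split_address(addr)
        hit = self.tags[set_index] == tag
        if not hit:
            self.tags[set_index] = tag
        return hit


# 0x0000 and 0x2000 are 8 KB apart: same set, different tags, so they contend
cache = DirectMappedCache()
print([cache.access(a) for a in (0x0000, 0x2000, 0x0000, 0x2000)])
```

Alternating between the two addresses misses every time, because each line fill evicts the other location from the only slot in set 0.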
Two-way set-associative cache: two different memory locations sharing the same set address can be held in the cache at the same time
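Figure: two-way set-associative cache with 16 byte cache lines. Address fields: Tag = A31-A12, Set = A11-A4, Byte = A3-A0. The set address A11-A4 selects one entry in each of the two ways (each way has its own decoder, Tag RAM and Data RAM); both stored tags are compared in parallel to generate Hit, and a mux selects the data from the matching way.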
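Extending the look-up to two ways shows why two locations with the same set address no longer contend. A minimal sketch using the Tag = A31-A12, Set = A11-A4 split of the two-way figure; the one-bit LRU replacement choice per set is an assumed policy, and the class name is illustrative:

```python
# Two-way set-associative cache with 16-byte lines:
# byte = A3-A0, set = A11-A4 (256 sets), tag = A31-A12
NUM_SETS = 256


class TwoWayCache:
    def __init__(self):
        self.tags = [[None, None] for _ in range(NUM_SETS)]  # way 0 / way 1 Tag RAMs
        self.lru = [0] * NUM_SETS   # per set: which way to replace next

    def access(self, addr):
        """Return True on a hit; on a miss, fill the least recently used way."""
        set_index = (addr >> 4) & 0xFF
        tag = addr >> 12
        ways = self.tags[set_index]
        for w in (0, 1):            # the hardware compares both tags in parallel
            if ways[w] == tag:
                self.lru[set_index] = 1 - w   # the other way is now least recently used
                return True
        victim = self.lru[set_index]
        ways[victim] = tag
        self.lru[set_index] = 1 - victim
        return False


# 0x0000 and 0x1000 share set 0 but can now live in different ways
cache = TwoWayCache()
print([cache.access(a) for a in (0x0000, 0x1000, 0x0000, 0x1000)])
```

After the two initial misses both locations stay resident, so the repeated accesses hit.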
Figure: four Processor / Cache / Main Memory diagrams: Write Through; Write Back; Write Buffering (a modified write-through with a write buffer (FIFO) in the path to main memory); and no caching of write cycles to INVALID lines.
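The difference between the write policies shows up in how often main memory is written. A minimal sketch under a simplified model (assumed here: n writes go to one cached line that is later evicted, and a write-back of the whole line counts as one memory cycle), plus a FIFO illustrating write buffering; all names are illustrative:

```python
from collections import deque


def memory_writes(policy, n_writes):
    """Main-memory write cycles caused by n processor writes to one cached line
    that is later evicted (simplified model: a write-back counts as one cycle)."""
    if policy == "write-through":
        return n_writes                 # every write is also sent to main memory
    if policy == "write-back":
        return 1 if n_writes else 0     # the dirty line is written back once, on eviction
    raise ValueError(f"unknown policy {policy}")


# Write buffering: write-through cycles queue in a FIFO so the processor
# does not wait for main memory; the buffer drains when the bus is free.
write_buffer = deque()
for addr, value in [(0x100, 1), (0x104, 2)]:
    write_buffer.append((addr, value))  # processor continues immediately
main_memory = {}
while write_buffer:
    addr, value = write_buffer.popleft()  # drained in program order
    main_memory[addr] = value

print(memory_writes("write-through", 10), memory_writes("write-back", 10))
```

Write-back saves bus traffic but leaves main memory stale until eviction, which is why a coherency protocol is needed.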
Tutorial #2 – Q1 MESI Protocol
Formal mechanism for controlling cache consistency using snooping. Every cache line is in 1 of 4 MESI states (encoded in 2 bits):
- Modified: An M-state line is available in only one cache and it is also MODIFIED (different from main memory). An M-state line can be accessed (read/written to) without sending a cycle out on the bus.
- Exclusive: An E-state line is also available in only one cache in the system, but the line is not MODIFIED (i.e., it is the same as main memory). An E-state line can be accessed (read/written to) without generating a bus cycle. A write to an E-state line causes the line to become MODIFIED.
- Shared: This state indicates that the line is potentially shared with other caches (i.e., the same line may exist in more than one cache). A read to an S-state line does not generate bus activity, but a write to a SHARED line generates a write-through cycle on the bus. The write-through cycle may invalidate this line in other caches. A write to an S-state line updates the cache.
- Invalid: This state indicates that the line is not available in the cache. A read to this line will be a MISS and may cause the processor to execute a LINE FILL (fetch the whole line into the cache from main memory). A write to an INVALID line causes the processor to execute a write-through cycle on the bus.
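The local-processor side of these rules can be written as a small state-transition function. A minimal sketch: snooping of other processors' bus cycles is not modelled, and two points the slide leaves open are labelled assumptions in the code (the state after a line fill, and the state after a write to an S-state line):

```python
from enum import Enum


class MESI(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def processor_read(state, other_cache_has_copy=False):
    """Return (next state, bus activity) for a read by the local processor."""
    if state is MESI.I:
        # Miss: line fill. Assumption: the line ends up E if no other cache holds it, else S
        return (MESI.S if other_cache_has_copy else MESI.E), "line fill"
    return state, None  # M, E and S reads are hits with no bus cycle


def processor_write(state):
    """Return (next state, bus activity) for a write by the local processor."""
    if state in (MESI.M, MESI.E):
        return MESI.M, None                # E becomes M silently; M stays M
    if state is MESI.S:
        # Write-through may invalidate other copies; keeping S is an assumption,
        # since the slide does not give the resulting state
        return MESI.S, "write-through"
    return MESI.I, "write-through"         # write miss: no line is allocated


state = MESI.I
state, bus = processor_read(state)   # (E, "line fill")
state, bus = processor_write(state)  # (M, None)
print(state, bus)
```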
- Data cache: 2 bits are required to encode the 4 possible states (MESI)
- Code cache: inherently write-protected, so only 1 bit is required for the 2 possible states (SI)
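Figure: cache directory layout for a two-way (WAY 0 / WAY 1) cache. Data cache: each set stores, for each way, a tag address and a MESI state field, plus an LRU bit. Code cache: each set stores, for each way, a tag address and a state bit (S or I), plus an LRU bit.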
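The saving from the code cache's single state bit can be counted per set. A minimal sketch, assuming a 20-bit tag (A31-A12, as in the two-way example) and one LRU bit per two-way set; the tag width is an assumption, since the directory slide does not state it:

```python
import math

TAG_BITS = 20  # assumption: A31-A12 tag, as in the two-way example


def state_bits(num_states):
    """Minimum number of bits to encode num_states distinct states."""
    return math.ceil(math.log2(num_states))


def directory_bits_per_set(tag_bits, num_states):
    """Bits per set in a two-way cache: (tag + state) for each way, plus one LRU bit."""
    ways = 2
    lru_bits = 1  # one bit is enough to say which of two ways was used least recently
    return ways * (tag_bits + state_bits(num_states)) + lru_bits


print("data cache (MESI):", directory_bits_per_set(TAG_BITS, 4))  # 45
print("code cache (SI):  ", directory_bits_per_set(TAG_BITS, 2))  # 43
```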