UC Regents Spring 2014 © UCB CS 152 L16: Midterm I Review
2014-4-29
John Lazzaro (not a prof - “John” is always OK)
CS 152 Computer Architecture and Engineering
www-inst.eecs.berkeley.edu/~cs152/
TA: Eric Love
Lecture 26 -- Midterm II Review Session
Today - Midterm II Review Session
HW 2, problem by problem (if there is time)
Study Tips
HKN
CS152 Midterm II
May 1st, 2014
Name:
SSID:
“All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet.”
Signature:
Please write clearly, and put your name on each page. Please abide by word limits. Good luck!
Eric Love, John Lazzaro

#     Points
1     25
2     25
3     25
4     25
Tot   100
What does it cover? Lectures 9 onward
Focus will be on problems that require you to do a task (write a small program, trace through execution, etc.) that demonstrates that you understand a concept.
[...]
No transistor-level questions (DRAM and SRAM cells, etc.)
Time for a quick walk-through ...
2014-2-18
Lecture 9 -- Memory
UC Regents Spring 2014 © UCB CS 152 L9: Memory
[Figure: DRAM array. A 13-bit row address input drives a 1-of-8192 decoder; the array has 8192 rows and 16384 columns; 16384 bits are delivered by the sense amps; the requested bits are selected and sent off the chip.]
134 217 728 usable bits (tester found good bits in bigger array)
Latency is not the same as bandwidth!
What if we want all of the 16384 bits? In row access time (55 ns) we can do 22 transfers at 400 MT/s. 16-bit chip bus -> 22 x 16 = 352 bits << 16384.
Now the row access time looks fast! Thus, push to faster DRAM interfaces.
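The arithmetic on this slide can be checked with a short sketch. The numbers (55 ns row access, 400 MT/s, 16-bit bus, 16384-bit row) come from the slide; the function and variable names are illustrative only.

```python
def bits_during_row_access(row_access_ns, mega_transfers_per_sec, bus_width_bits):
    """Return (transfers, bits) the chip bus can move in one row access time."""
    # ns * MT/s gives transfers * 1000, so divide by 1000.
    # Integer math avoids floating-point rounding surprises.
    transfers = row_access_ns * mega_transfers_per_sec // 1000
    return transfers, transfers * bus_width_bits


ROW_BITS = 16384  # bits delivered by the sense amps per row access

transfers, bits = bits_during_row_access(55, 400, 16)
print(transfers, bits)            # 22 352
print(round(ROW_BITS / bits, 1))  # 46.5: the row holds far more than the bus moved
```

The ratio is the point of the slide: the array delivers a full row in one access, so the chip interface, not the row access time, limits how quickly those bits leave the chip.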
2014-2-20
Lecture 10 -- Cache I
UC Regents Fall 2008 © UCB CS 194-6 L8: Cache
Latency: A closer look
                  Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size              1K     64K      32K      512K   256M   80G
Latency (cycles)  1      3        3        11     160    1E+07
Latency (sec)     0.6n   1.9n     1.9n     6.9n   100n   12.5m
Hz                1.6G   533M     533M     145M   10M    80
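The rows of the table are related by simple arithmetic, sketched below. The 1.6 GHz clock is an assumption inferred from the Reg column (1 cycle, 1.6G); the disk rate uses the 12.5 ms latency directly, since the cycle count in the table is rounded. All names are illustrative.

```python
CLOCK_HZ = 1.6e9  # assumption: inferred from the Reg column of the table

CYCLES = {"Reg": 1, "L1 Inst": 3, "L1 Data": 3, "L2": 11, "DRAM": 160}


def access_rate_hz(cycles, clock_hz=CLOCK_HZ):
    """Back-to-back random accesses per second if each one costs `cycles`."""
    return clock_hz / cycles


for level, cycles in CYCLES.items():
    latency_ns = cycles / CLOCK_HZ * 1e9
    rate_mhz = access_rate_hz(cycles) / 1e6
    print(f"{level:8s} {latency_ns:6.1f} ns  {rate_mhz:7.1f} M accesses/s")

# Disk: the "Hz" row is just the reciprocal of the 12.5 ms latency.
disk_rate_hz = 1 / 12.5e-3
print(f"Disk     {disk_rate_hz:.0f} accesses/s")
```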
Read latency: Time to return first byte of a random access
Architect’s latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.
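A minimal sketch of item (2), assuming one request can be issued per cycle and requests are independent. The function names and the example numbers (100 requests, 160-cycle latency) are illustrative, not from the slide.

```python
def serial_cycles(num_requests, latency):
    """Each request waits for the previous one to return before issuing."""
    return num_requests * latency


def pipelined_cycles(num_requests, latency):
    """Issue one request per cycle; request i returns at cycle i + latency."""
    if num_requests == 0:
        return 0
    return latency + num_requests - 1


print(serial_cycles(100, 160))     # 16000
print(pipelined_cycles(100, 160))  # 259
```

Pipelining does not shorten the latency of any single request; it hides the latency of all but the first one.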
2014-2-25
Lecture 11 -- Cache II
UC Regents Spring 2014 © UCB CS 152 L11: Cache II
Issue #4: When to write to lower level ...
                                            Write-Through                  Write-Back
Policy                                      Data written to cache block    Write data only to the cache.
                                            also written to lower-level    Update lower level when a
                                            memory                         block falls out of the cache
Do read misses produce writes?              No                             Yes
Do repeated writes make it to lower level?  Yes                            No
Related issue: Do writes to blocks not in the cache get put in the cache (“write-allocate”) or not?
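The two rows of the comparison can be reproduced with a toy model. This is a sketch under stated assumptions: a direct-mapped cache with one-word blocks, counting only writes that reach the lower level. The class and its parameters are hypothetical, not taken from the lecture.

```python
class ToyCache:
    """Direct-mapped, one-word blocks; counts writes sent to the lower level."""

    def __init__(self, num_lines, write_back, write_allocate=True):
        self.num_lines = num_lines
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.lines = {}        # index -> (block address, dirty flag)
        self.lower_writes = 0

    def _hit(self, addr):
        line = self.lines.get(addr % self.num_lines)
        return line is not None and line[0] == addr

    def _install(self, addr, dirty):
        idx = addr % self.num_lines
        victim = self.lines.get(idx)
        if victim is not None and victim[0] != addr and victim[1]:
            self.lower_writes += 1   # dirty block falls out of the cache
        self.lines[idx] = (addr, dirty)

    def read(self, addr):
        if not self._hit(addr):
            self._install(addr, dirty=False)

    def write(self, addr):
        cached = self._hit(addr) or self.write_allocate
        if self.write_back:
            if cached:
                self._install(addr, dirty=True)  # write only to the cache
            else:
                self.lower_writes += 1           # no-allocate miss bypasses cache
        else:
            self.lower_writes += 1               # write-through: always goes down
            if cached:
                self._install(addr, dirty=False)


wt = ToyCache(4, write_back=False)
wb = ToyCache(4, write_back=True)
for _ in range(10):          # repeated writes to the same block
    wt.write(0)
    wb.write(0)
print(wt.lower_writes, wb.lower_writes)  # 10 0

wt.read(4)                   # read miss that maps to the same line as block 0
wb.read(4)
print(wt.lower_writes, wb.lower_writes)  # 10 1
```

The second pair of numbers shows the "Do read misses produce writes?" row: only the write-back cache has a dirty victim to flush.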
2014-2-27
Lecture 12 -- Virtual Memory
UC Regents Fall 2006 © UCB CS 152 L15: Virtual Memory
V=0 pages either reside on disk or have not yet been allocated.
OS handles V=0 “Page fault”.
In this example, physical and virtual pages must be the same size!
The TLB caches page table entries.
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
[Figure: a virtual address (page, off) is translated to a physical address (frame, off) through the TLB, which caches entries of the Page Table. Figure labels: for ASID, Physical frame address.]
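The translation path on this slide can be sketched as follows. Assumptions not in the slide: 4 KB pages, a 4-entry TLB, a page table stored as a dictionary of (valid, frame) pairs, and no ASID tagging. All names are hypothetical.

```python
import random

PAGE_BITS = 12     # assumption: 4 KB pages
TLB_ENTRIES = 4    # assumption: tiny TLB so replacement is easy to see


class PageFault(Exception):
    """Raised when the page table entry has V=0; the OS would handle this."""


def translate(vaddr, tlb, page_table):
    """Map a virtual address to a physical address via the TLB and page table."""
    page = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)   # offset passes through unchanged
    if page in tlb:                            # TLB hit
        frame = tlb[page]
    else:                                      # TLB miss: walk the page table
        valid, frame = page_table.get(page, (False, None))
        if not valid:
            raise PageFault(page)
        if len(tlb) >= TLB_ENTRIES:            # random replacement
            del tlb[random.choice(list(tlb))]
        tlb[page] = frame
    return (frame << PAGE_BITS) | offset


page_table = {2: (True, 5), 3: (False, None)}  # made-up mappings
tlb = {}
print(hex(translate(0x2ABC, tlb, page_table)))  # 0x5abc
```

Because the offset bits are copied straight across, the sketch only works when virtual and physical pages are the same size, as the slide notes.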
2014-3-4
Lecture 13 - Synchronization
CS 152 L24: Multiprocessors UC Regents Fall 2006 © UCB
Non-blocking consumer synchronization
Another atomic read-modify-write instruction:
Compare&Swap(Rt, Rs, m)
  if (Rt == M[m])
  then M[m] = Rs; Rs = Rt; /* do swap */
  else /* do not swap */
Assuming sequential consistency: MEMBARs not shown ...
try:  LW R3, head(R0)               ; Load queue head into R3
spin: LW R4, tail(R0)               ; Load queue tail into R4
      BEQ R4, R3, spin              ; If queue empty, wait
      LW R5, 0(R3)                  ; Read x from queue into R5
      ADDI R6, R3, 4                ; Shift head by one word
      Compare&Swap R3, R6, head(R0) ; Try to update head
      BNE R3, R6, try               ; If not success, try again
If R3 != R6, another thread got here first, so we must try again.
If thread swaps out before Compare&Swap, no latency problem; this code only “holds” the lock for one instruction!
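The consumer loop above can be mimicked in a high-level sketch. Python has no hardware Compare&Swap, so atomicity is emulated here with a lock inside a hypothetical Memory class. Unlike the slide, the sketch returns None on an empty queue instead of spinning, so that it terminates.

```python
import threading

HEAD, TAIL = "head", "tail"   # hypothetical names for the two pointer words


class Memory:
    """Word-addressed memory; compare_and_swap is made atomic with a lock."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()

    def load(self, addr):
        return self._words[addr]

    def store(self, addr, value):
        self._words[addr] = value

    def compare_and_swap(self, expected, new, addr):
        """If M[addr] == expected, write new and report success."""
        with self._lock:
            if self._words[addr] == expected:
                self._words[addr] = new
                return True
            return False


def try_dequeue(mem):
    """Non-blocking consumer: retry if another thread moved head first."""
    while True:
        head = mem.load(HEAD)                 # LW R3, head(R0)
        if head == mem.load(TAIL):            # queue empty
            return None                       # (the slide spins here instead)
        x = mem.load(head)                    # LW R5, 0(R3)
        if mem.compare_and_swap(head, head + 4, HEAD):
            return x                          # we won the race for this item
        # CAS failed: another consumer got here first, so try again


mem = Memory()
for i in range(3):
    mem.store(1000 + 4 * i, i * 10)
mem.store(HEAD, 1000)
mem.store(TAIL, 1012)
print(try_dequeue(mem), try_dequeue(mem), try_dequeue(mem), try_dequeue(mem))
# 0 10 20 None
```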
2014-3-6
Lecture 14 - Cache Design and Coherence
CS 152 L14: Cache Design and Coherency UC Regents Spring 2014 © UCB
Writes from 10,000 feet ... for write-thru L1
[Figure: CPU0 and CPU1, each with a Cache and a Snooper, connect over the Memory bus to the Shared Main Memory Hierarchy.]
1. Writing CPU takes control of bus.
2. Address to be written is invalidated in all other caches.
3. Write is sent to main memory.
Reads will no longer hit in cache and get stale data.
Reads will cache miss, retrieve new value from main memory.
For write-thru caches ... To a first-order, reads will “just work” if write-thru caches implement this policy. A “two-state” protocol (cache lines are “valid” or “invalid”).
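A minimal sketch of the three-step write and the two-state protocol. Assumptions: one-word lines, a line's presence in a dictionary means "valid", the writer also updates its own copy, and bus arbitration (step 1) is implied by the sequential method call. Class names are hypothetical.

```python
class Bus:
    """Shared memory bus: holds main memory and the list of snooping caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class WriteThroughCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}            # addr -> value; present means "valid"
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                       # invalid: cache miss
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Step 1 (taking control of the bus) is implicit in this call.
        for other in self.bus.caches:                    # Step 2: invalidate
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value                         # assumption: update own copy
        self.bus.memory[addr] = value                    # Step 3: write through


bus = Bus()
cpu0, cpu1 = WriteThroughCache(bus), WriteThroughCache(bus)
cpu1.read(8)            # CPU1 caches the old value (0)
cpu0.write(8, 42)       # invalidates CPU1's copy, updates memory
print(cpu1.read(8))     # 42: CPU1 misses and fetches the new value
```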
UC Regents Spring 2014 © UCB CS 152 L15: Superscalars and Scoreboards
2014-3-11
Lecture 15 -- Advanced CPUs
UC Regents Fall 2008 © UCB CS 194-6 L9: Advanced Processors I
Split pipelines: a write-after-write hazard.
The pipeline splits after the RF stage, feeding functional units with different latencies.
WAW Hazard:
DIV R1, R2, R3
SUB R1, R2, R3
If long latency DIV and short latency SUB are sent to parallel pipes, SUB may finish first. DIV's later write would then leave the wrong value in R1.
Solution: SUB detects R1 clash in decode stage and stalls, via a pipe-write scoreboard.
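One way to picture the scoreboard is as a table of pending register writes that the decode stage consults. The sketch below is a simplification under stated assumptions: in-order single issue, one pending-write entry per register, made-up latencies, and no tracking of source-register (RAW) hazards.

```python
def issue_cycles(program, latency):
    """Return the cycle in which each instruction leaves decode.

    program: list of (opcode, destination register) in program order.
    latency: cycles from issue until the destination register is written.
    """
    pending = {}    # scoreboard: dest register -> cycle its pending write lands
    cycle = 0
    issued = []
    for op, dest in program:
        # WAW check: stall in decode while an earlier write to dest is in flight.
        if dest in pending and cycle < pending[dest]:
            cycle = pending[dest]
        issued.append(cycle)
        pending[dest] = cycle + latency[op]
        cycle += 1
    return issued


LATENCY = {"DIV": 10, "SUB": 1}   # assumption: illustrative latencies

print(issue_cycles([("DIV", "R1"), ("SUB", "R1")], LATENCY))  # [0, 10]
print(issue_cycles([("DIV", "R1"), ("SUB", "R4")], LATENCY))  # [0, 1]
```

With the clash on R1, SUB waits until DIV's write has landed, so the final value of R1 is SUB's. With different destinations, the two instructions overlap freely.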
[Figure: Superscalar R machine. Two parallel pipelines, each with IF (Fetch), ID (Decode), EX (ALU), MEM, and WB stages and IR pipeline registers, share a RegFile with four read ports (rs1-rs4, rd1-rd4) and two write ports (ws1/wd1/WE1, ws2/wd2/WE2). Each ALU takes inputs A and B and produces Y, latched into R. Instr Mem (Addr, Data; labels 32 and 64) feeds the PC and Sequencer and the Instruction Issue Logic.]
2014-3-20
Lecture 17 -- Networks, Routers, Google
6 key parameters scale across dimension of “by one server”, “by 80-server rack” and “by array”.
To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because array network is under-provisioned.
Exception: disk latency is roughly scale-independent.
2014-4-1
Lecture 18 -- Dynamic Scheduling I
Thanks to Krste Asanovic ...
UC Regents Spring 2014 © UCB CS 152 L18: Dynamic Scheduling I
Given an endless supply of registers ...
Rename “architected registers” (Ri, Fi) to new “physical registers” (PRi, PFi) on each write.
R1 → PR01, F0 → PF00
       ADDI PR01, PR00, 64
       LD   PF00, 0(PR01)
       ADDD PF04, PF00, PF02
       SD   PF04, 0(PR01)
       SUBI PR11, PR01, 8
       BEQZ PR11, ENDLOOP
ITER2: LD   PF10, 0(PR11)
       ADDD PF14, PF10, PF02
       SD   PF14, 0(PR11)
       SUBI PR21, PR11, 8
       BEQZ PR21, ENDLOOP
ITER3: LD   PF20, 0(PR21)
[...]
What was gained? An instruction may execute once all of its source registers have been written.
ADDI R1,R0,64
F4,0(R1)
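The renaming rule ("a new physical register on each write") can be sketched in a few lines. This is an illustration under assumptions: physical registers are named P0, P1, ... from an unbounded counter rather than with the slide's PRxx/PFxx scheme, and registers never written inside the program keep their architected names.

```python
def rename(program):
    """Rename destinations to fresh physical registers.

    program: list of (opcode, dest or None, [source registers]).
    Returns the same list with registers replaced by physical names.
    """
    mapping = {}      # architected register -> current physical register
    next_free = 0     # "endless supply": just count upward
    renamed = []
    for op, dest, sources in program:
        # Sources are looked up BEFORE the destination is remapped, so an
        # instruction like SUBI R1, R1, 8 reads the old R1 and writes a new one.
        new_sources = [mapping.get(s, s) for s in sources]
        new_dest = None
        if dest is not None:
            new_dest = f"P{next_free}"
            next_free += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


loop = [
    ("LD",   "F0", ["R1"]),
    ("ADDD", "F4", ["F0", "F2"]),
    ("SD",   None, ["F4", "R1"]),
    ("SUBI", "R1", ["R1"]),
    ("LD",   "F0", ["R1"]),      # next iteration reuses F0
]
for instr in rename(loop):
    print(instr)
```

The second LD gets a different physical destination than the first, so it no longer has to wait for the first iteration's uses of F0 to finish.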
2014-4-3
Lecture 19 -- Dynamic Scheduling II
Rename stage close-up:
(1) Allocates new physical registers for destinations.
(2) Looks up physical register numbers for sources.
(3) Handles rename dependences within the 4 issuing instructions in one clock cycle!
Input: 4 instructions specifying architected registers.
Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued.
For mis-speculation recovery.
Time-stamped.
2014-4-8
Lecture 20 -- Dynamic Scheduling III
UC Regents Fall 2006 © UCB CS 152 L20: Dynamic Scheduling III
Micro-op translation example ...
ADC m32, r32: // for a simple m32 address mode
Becomes:
LD T1 0(EBX);    // EBX register points to m32
ADD T1, T1, CF;  // CF is carry flag from EFLAGS
ADD T1, T1, r32; // Add the specified register
ST 0(EBX) T1;    // Store result back to m32
Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops. Translation for these ops is cast into logic gates, often over several pipeline cycles.
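To make the four micro-ops concrete, here is a small sketch that executes them one at a time on a toy machine state. It models only what the slide lists; the EFLAGS update that a real ADC also performs is not modeled. The dictionary-based memory and register file are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF   # keep results to 32 bits


def adc_m32_r32(mem, regs, carry_flag, r32):
    """Execute ADC m32, r32 as the four micro-ops from the slide.

    mem: dict address -> 32-bit value; regs: dict name -> value.
    The m32 operand is addressed through EBX, as in the slide.
    """
    addr = regs["EBX"]
    t1 = mem[addr]                      # LD  T1, 0(EBX)
    t1 = (t1 + carry_flag) & MASK32     # ADD T1, T1, CF
    t1 = (t1 + regs[r32]) & MASK32      # ADD T1, T1, r32
    mem[addr] = t1                      # ST  0(EBX), T1
    return t1


mem = {0x100: 5}
regs = {"EBX": 0x100, "EAX": 7}
print(adc_m32_r32(mem, regs, carry_flag=1, r32="EAX"))  # 13
```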
2014-4-10
Lecture 21 -- Dataflow
Dataflow stages of 21264
Input: Instructions that reference physical registers.
Scoreboard: Tracks writes to physical registers.
Idea: Write dataflow programs that reference physical registers, to execute on this machine.
2014-4-15
Lecture 22 -- GPU + SIMD + Vectors I
Pure data move opcode. Or, part of a math opcode.
2014-4-17
Lecture 23 -- GPU + SIMD + Vectors II
Assume MacBook Air ... 1386 x 768 screen ...
We are all zoomed in on Google Maps
Top pyramid image is 4K x 4K ...
Idea: Keep only a 1386 x 768 window of top images in RAM ...
Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!
Zoom all the way in ...
Graphics hardware displays bottom stack image, which fills MacBook Air display.
Bottom stack image shows the smallest part of the 1 mile sq. patch of the Earth of any stack image.
Hardware interpolation of stack levels.
[Figure labels: units of pixels, units of sq. miles, units of miles]
2014-4-22
Lecture 24 -- Voxel Processing
After processing ...
A 3-D matrix of cubes, in object space (X,Y,Z).
8-bit density value stored for each cube (0 = “air”).
256^3 = 16 MB = 10 inch cube (for 1mm voxels)
0.125 mm voxels? 8 GB
Interesting to computer architects because n^3 grows so quickly!
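The two storage figures follow from cubing the number of voxels per side. A quick check, assuming one byte (8-bit density) per voxel and binary megabytes and gigabytes; the function name is illustrative.

```python
def volume_bytes(side_mm, voxel_mm, bytes_per_voxel=1):
    """Storage for a cube of the given side length at the given voxel size."""
    voxels_per_side = round(side_mm / voxel_mm)
    return voxels_per_side ** 3 * bytes_per_voxel


coarse = volume_bytes(256, 1.0)     # 256 voxels per side
fine = volume_bytes(256, 0.125)     # 2048 voxels per side

print(coarse // 2**20, "MB")        # 16 MB
print(fine // 2**30, "GB")          # 8 GB
print(fine // coarse)               # 512: 8x finer per axis costs 8^3 more storage
```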
2014-4-24
Lecture 25 -- Digital Imaging
UC Regents Fall 2012 © UCB CS 250 L12: CMOS Imagers
Camera interface to the outside world
8-bit Dout Port
54 MHz Clk
1280 x 1024 @ 15 fps
640 x 512 @ 30 fps
YCrCb 4:2:2
Serial port to control the camera.
Simple Power Hookup
AWARE-2: Array of 98 phone camera modules (14 M-pixel)
1.3 G-pixel camera @ 3 frames/sec
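A rough sanity check of these numbers, under stated assumptions: YCrCb 4:2:2 with 8-bit samples averages 16 bits per pixel, the 8-bit port moves one byte per clock, and each resolution is paired with the frame rate listed next to it. The function name is illustrative.

```python
def stream_bytes_per_sec(width, height, fps, bits_per_pixel=16):
    """Raw video data rate; 16 bits/pixel assumes 8-bit YCrCb 4:2:2."""
    return width * height * fps * bits_per_pixel // 8


PORT_BYTES_PER_SEC = 54_000_000   # assumption: one byte per 54 MHz clock

full = stream_bytes_per_sec(1280, 1024, 15)
quarter = stream_bytes_per_sec(640, 512, 30)
print(full, full < PORT_BYTES_PER_SEC)        # 39321600 True
print(quarter, quarter < PORT_BYTES_PER_SEC)  # 19660800 True

# AWARE-2: 98 modules of 14 M-pixel each, in G-pixels
print(98 * 14_000_000 / 1e9)
```

Both modes fit within the port's byte rate with room left over for blanking intervals, and the array total lands a little above the 1.3 G-pixel quoted on the slide.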
On Thursday
Mid-term II ...
Ground rules ...
Mid-term: How to do well ...
Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you’re starting out behind.
Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you.
There will not be “you can only get it if you do the reading” problems ... but the reading helps you understand how to think through the problem.
Mid-term: There may be math ...
No memorization: If we ask about Amdahl’s Law, we will show its definition lecture slide.
Understanding is needed: A problem may require you to apply an equation to a design, etc.
You may need to do: simple algebra and calculus, add a few numbers by hand, etc.
Cannot use electronic devices ...
More administrative info after we do some content.
When is it? Where is it? Ground rules.
9:30 AM sharp, Thursday May 1st, 306 Soda.
Every-other-seat seating, except for the front rows, where every-seat is permitted.
No blue-books needed. We will be handing out a paper test. Pencil is preferred.
Pencils down @ 10:55 AM, so we can collect papers before next class comes in.
When is it? Where is it? Ground rules.
No use of calculators, smartphones, laptops, etc ... during the exam.
Closed-book, closed-notes. Just pencils, erasers. No consulting with students.
Restroom breaks are OK, but you’ll still need to hand in your exam @ 10:55.
Questions are reserved for serious concerns about a bug in the question.
Today - Midterm II Review Session
HW 2, problem by problem (if there is time)
Study Tips
HKN
On Thursday
Mid-term II ...
See you there!