CS250 Section 5, 9/30/10, Yunsup Lee
A4 Processor in the iPhone
• Check out the following link for more details:
• http://www.ifixit.com/Teardown/Apple-A4-Teardown/2204/1
Any questions on lab 2 & lab 3?
Announcements
• Project proposal is due before class (10:30am) on October 11th.
• Any volunteers to help me build the UCB SRAM Model Compiler and the Cache Model Compiler?
Why do we do gate-level simulation?
• Why do we want to do gate-level simulation?
• We would like to figure out whether our netlist is correct.
• Can’t we do this just using formal verification?
• Yes, but we also want activity factors from simulation so we can get energy numbers out.
• What is back-annotated gate-level simulation?
• After synthesis and place and route, you know the exact delays.
• Why? We know the exact load on every net, and now we know the wire delays too.
• We want to simulate the exact timing on each net.
• We can be more confident that our circuit will work.
• This information is saved in the *.sdf file.
Setup Time / Hold Time
[Figure: two DFFs, A and B, clocked by CLK A and CLK B with 250 ps of skew between them, shown with the CLK A and CLK B waveforms]
• How can you detect you have a setup time violation?
• How can you detect you have a hold time violation?
• How can you fix setup time?
• How can you fix hold time?
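A minimal sketch of how the four questions above reduce to two slack numbers. The formulas are the standard flop-to-flop timing equations. Every number except the 250 ps skew is a hypothetical value chosen for illustration, and the sketch assumes CLK B arrives later than CLK A, which the slide does not state.

```python
# All times in picoseconds. Only the 250 ps skew comes from the slide;
# every other number below is a hypothetical illustration value.

def setup_slack(period, skew, clk_to_q, comb_max, t_setup):
    """Slack on the longest path; negative means a setup violation.

    skew is (CLK B arrival) - (CLK A arrival): a late capture clock
    gives the data more time to arrive.
    """
    return (period + skew) - (clk_to_q + comb_max) - t_setup


def hold_slack(skew, clk_to_q, comb_min, t_hold):
    """Slack on the shortest path; negative means a hold violation.

    A late capture clock hurts hold, and the clock period does not appear.
    """
    return (clk_to_q + comb_min) - (skew + t_hold)


SKEW = 250  # from the slide

setup = setup_slack(period=2500, skew=SKEW, clk_to_q=100, comb_max=2000, t_setup=60)
hold = hold_slack(skew=SKEW, clk_to_q=100, comb_min=50, t_hold=40)
# Padding the short path with extra delay is one way to repair hold.
hold_fixed = hold_slack(skew=SKEW, clk_to_q=100, comb_min=200, t_hold=40)

print("setup slack:", setup)
print("hold slack:", hold)
print("hold slack after padding:", hold_fixed)
```

Negative setup slack is usually fixed by lengthening the clock period or shortening the longest path. Negative hold slack is usually fixed by adding delay to the shortest path, because the clock period does not appear in the hold equation.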
Setup Time / Hold Time
• This is how we do setup-hold time checks in the tool
• We have a setup limit and a hold limit
• We encode this case as (10,11)
• $setuphold (posedge clock, data, 10, 11, notify)
• What about negative setup limits?
• What about negative hold limits?
Negative Setup Limits
• We encode this case as (-10,31)
• $setuphold (posedge clock, data, -10, 31, notify)
Negative Hold Limits
• We encode this case as (31, -10)
• $setuphold (posedge clock, data, 31, -10, notify)
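The three encodings above can be read as one rule: the data signal must not change between (clock edge - setup limit) and (clock edge + hold limit). The sketch below is a small Python model of that rule, not the simulator's implementation. The function names and the clock edge time of 100 are hypothetical, and treating both window endpoints as excluded is an assumption.

```python
def violation_window(edge_time, setup_limit, hold_limit):
    """Return (start, end) of the interval in which data must stay stable."""
    return (edge_time - setup_limit, edge_time + hold_limit)


def violates(edge_time, data_time, setup_limit, hold_limit):
    """True if a data change at data_time falls inside the window.

    Assumption: both endpoints are treated as outside the window.
    """
    start, end = violation_window(edge_time, setup_limit, hold_limit)
    return start < data_time < end


EDGE = 100  # hypothetical clock edge time

positive = violation_window(EDGE, 10, 11)         # window straddles the edge
negative_setup = violation_window(EDGE, -10, 31)  # window lies entirely after the edge
negative_hold = violation_window(EDGE, 31, -10)   # window lies entirely before the edge

print(positive, negative_setup, negative_hold)
```

A negative limit simply moves one end of the window to the other side of the clock edge, so the window no longer contains the edge itself.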
Actual Violation
• "/scratch/cad/stdcells/synopsys-90nm/default/verilog/cells.v", 11774: Timing violation in riscvTestHarness.proc.dpath.rfile.waddr_latch_reg_3_
• $setuphold( negedge CLK:44213, posedge D:44162, limits: (58,-43) );
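• D changed 44213 - 44162 = 51 time units before the clock edge. With limits (58,-43), the forbidden window runs from 58 to 43 time units before the edge, and 58 > 51 > 43, so this is a setuphold violation.
• Now let’s look at the waveform. Change your time to 44213.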
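To double-check a message like the one above by hand, a small script can pull out the two timestamps and the limits and redo the subtraction. This is a sketch that assumes the message has exactly the layout shown on this slide; the regular expression and the function name are hypothetical helpers, not part of any tool.

```python
import re

# Message text copied from the slide.
MESSAGE = "$setuphold( negedge CLK:44213, posedge D:44162, limits: (58,-43) );"

# Assumed layout: "CLK:<time>", then "D:<time>", then "limits: (<setup>,<hold>)".
PATTERN = re.compile(r"CLK:(\d+).*?D:(\d+).*?limits:\s*\((-?\d+),\s*(-?\d+)\)")


def explain_violation(message):
    """Recompute the forbidden window and check that D changed inside it."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not have the assumed layout")
    clk_time, d_time, setup_limit, hold_limit = map(int, match.groups())
    window = (clk_time - setup_limit, clk_time + hold_limit)
    return {
        "lead": clk_time - d_time,  # how long before the edge D changed
        "window": window,
        "inside": window[0] < d_time < window[1],
    }


report = explain_violation(MESSAGE)
print(report)
```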
Gate-level Simulation Methodology
• We can’t know the exact clock skew after synthesis, since the tool doesn’t create the clock tree.
• That’s why we were seeing random hold time violations during back-annotated post-synthesis gate-level simulation.
• These hold time violations go away after place and route, and as a result back-annotated post-pnr gate-level simulation works.
• New methodology:
• After synthesis, we only do functional gate-level simulation.
• After place and route, we do back-annotated gate-level simulation.
Infrastructure
• We have 8 machines ...
• icluster16 - icluster22: NX installed on these machines
• icom1: I discourage you from using this machine; it is different from the icluster machines
• Try using NX; your productivity will go straight up.
• When we go into project mode, we will statically partition machines among groups.
More disk space?
• See the following link:
• http://inst.eecs.berkeley.edu/cgi-bin/pub.cgi?file=disk.quotas#1
• In summary,
• Log in to cory.eecs.berkeley.edu
• Run mkhometmpdir
• You now have an NFS-mounted temporary directory that you can access from any machine at /home/tmp/<login name>
Questions for Lab 2
Going from a 2-stage pipe to a 3-stage pipe.
• In what stage do you read registers?
• In what stage do you do ALU ops?
• In what stage do you do branch evaluation? How about jumps? How about jump register targets?
• If a branch is taken, what do you need to do?
• Where did you put bypass muxes? Why do we need bypass muxes?
[Figure: pipeline diagrams for the 2-stage pipe (stages F, X) and the 3-stage pipe (stages F, D, X)]
What’s in Makefrag?
• Take a look at build/Makefrag.
• clock_period = 2.5
• vcs_clock_period = 0.5 * clock_period
• dc_clock_period = 0.9 * clock_period
• comb0_delay = 0.2
• imem_access_delay = 0.4
• comb1_delay = 1.5
• dmem_access_delay = 0.4
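For reference, the two derived periods work out as follows. This is only the arithmetic from the listing above redone in Python; the function name is hypothetical, and the units are whatever build/Makefrag uses (the slide does not state them).

```python
def derived_periods(clock_period):
    """Apply the two scale factors listed in build/Makefrag."""
    return {
        "vcs_clock_period": 0.5 * clock_period,
        "dc_clock_period": 0.9 * clock_period,
    }


periods = derived_periods(2.5)
for name, value in periods.items():
    print(f"{name} = {value:.2f}")
```

With clock_period = 2.5 this gives vcs_clock_period = 1.25 and dc_clock_period = 2.25, so both values change whenever you change clock_period.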
I can’t clock my 3-stage pipeline faster.
• You can change comb0_delay and comb1_delay.
• You should not change imem_access_delay and dmem_access_delay.
[Figure: block diagram with riscvTestHarness, riscvCore, riscvProc, InstructionMemory, and DataMemory, annotated with comb0_delay, imem_access_delay, dmem_access_delay, comb1_delay, and the clock period]
Questions for Lab 3
Dealing with ready/valid signals
• What if imemreq_rdy = 1’b0?
• What if imemresp_val = 1’b0?
• What if dmemreq_rdy = 1’b0?
• What if dmemresp_val = 1’b0?
• What if the branch target address, jump target address, or jump register target address is updated during imemreq_rdy = 1’b0?
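One way to think about the last question is a cycle-by-cycle model. A request is accepted only in a cycle where both val and rdy are high, so while imemreq_rdy = 0 the address you present has not been taken yet. The sketch below is a hypothetical Python model, not the lab solution: it assumes 4-byte instructions, always asserts imemreq_val, and lets a redirect (branch, jump, or jump register target) replace the address that is still waiting.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed 32-bit instructions


def fetch_cycle(pending_addr, redirect_addr, imemreq_rdy):
    """Model one cycle of a hypothetical fetch-request register.

    Returns (fired, addr, next_pending_addr).
    """
    # A redirect overrides the address we were waiting to send.
    addr = pending_addr if redirect_addr is None else redirect_addr
    imemreq_val = True  # this sketch always wants to fetch
    fired = imemreq_val and imemreq_rdy
    # Advance only when the request was actually accepted.
    next_pending = addr + INSTRUCTION_BYTES if fired else addr
    return fired, addr, next_pending


def simulate(start_addr, cycles):
    """cycles is a list of (redirect_addr or None, imemreq_rdy) pairs."""
    sent = []
    pending = start_addr
    for redirect_addr, rdy in cycles:
        fired, addr, pending = fetch_cycle(pending, redirect_addr, rdy)
        if fired:
            sent.append(addr)
    return sent


# The redirect to 0x80 arrives while imemreq_rdy is low.
sent = simulate(0x10, [(None, False), (0x80, False), (None, True), (None, True)])
print([hex(a) for a in sent])
```

In this model the stale address 0x10 is never sent, and the redirect target is not lost even though it arrived during a stall. A real pipeline also has to deal with requests that were already accepted before the redirect, which this sketch does not cover.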
[Figure: pipeline diagram with stages F, D, X, M]
What are tags? Why do we need them?
[Figure: block diagram with riscvTestHarness, riscvCore, riscvProc, InstructionCache, DataCache, and Arbiter. Bus widths shown: 32 and 128.
Signals with imem/dmem prefixes: imemreq_val, imemreq_rdy, imemreq_bits_addr, imemresp_val, imemresp_bits_data, dmemreq_val, dmemreq_rdy, dmemreq_rw, dmemreq_bits_addr, dmemreq_bits_data, dmemresp_val, dmemresp_bits_data.
Signals with ic_mem/dc_mem prefixes: ic_mem_req_val, ic_mem_req_rdy, ic_mem_req_addr, ic_mem_resp_val, dc_mem_req_val, dc_mem_req_rdy, dc_mem_req_rw, dc_mem_req_addr, dc_mem_resp_val.
Signals with mem prefix: mem_req_val, mem_req_rdy, mem_req_rw, mem_req_addr, mem_req_data, mem_req_tag, mem_resp_data, mem_resp_tag.
Other signals: testrig_fromhost, testrig_tohost, log_control, clk, reset, reset_ext.]
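A hedged sketch of one common use of tags, which may or may not match the lab's exact design: when two requesters share one memory port through an arbiter, each request carries a tag naming its source, the memory echoes the tag back, and the arbiter uses it to route the response. The tag values, function names, and toy memory model below are all hypothetical.

```python
# Hypothetical tag values naming the two requesters.
IC_TAG = 0  # instruction cache
DC_TAG = 1  # data cache


def toy_memory(requests):
    """Toy memory: answers each (tag, addr) request and echoes the tag."""
    return [(tag, f"data@{addr:#x}") for tag, addr in requests]


def route_responses(responses):
    """Deliver each response to the requester named by its tag."""
    routed = {IC_TAG: [], DC_TAG: []}
    for tag, data in responses:
        routed[tag].append(data)
    return routed


# Requests from both caches, interleaved on the single memory port.
requests = [(IC_TAG, 0x100), (DC_TAG, 0x200), (IC_TAG, 0x110)]
routed = route_responses(toy_memory(requests))
print(routed)
```

Without the tag, a response arriving on the shared port would carry no indication of which cache asked for it.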
Negative Energy/Instruction
• It’s likely that your energy or instruction counts are wrong.
• Notice that we only record the toggling for the region we are interested in.
• This region is between mtpcr 1, $cr10 and mtpcr 0, $cr10.
• You should only count cycles and instructions during this region.
• pt-pwr gives you a power number, so you need to multiply it by the running time.
• energy = power * cycle count * clock period
• Make sure stats=1 in build/vcs-sim-gl-par/Makefile
• MATLAB sequence:
• A = csvread('A.csv'); b = csvread('b.csv');
• x = regress(b, A);
• Make sure the numbers of rows match. You can check with size(A) and size(b).
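The energy formula above can be sanity-checked with a few lines of Python. All input numbers here are hypothetical; in practice the power comes from pt-pwr, and the cycle and instruction counts come from the region between the two mtpcr instructions.

```python
def energy_joules(power_watts, cycle_count, clock_period_seconds):
    """energy = power * cycle count * clock period"""
    return power_watts * cycle_count * clock_period_seconds


def energy_per_instruction(power_watts, cycle_count, clock_period_seconds, instruction_count):
    """Divide the energy of the measured region by the instructions it retired."""
    total = energy_joules(power_watts, cycle_count, clock_period_seconds)
    return total / instruction_count


# Hypothetical numbers: 10 mW, 2000 cycles, 2.5 ns period, 1000 instructions.
total = energy_joules(10e-3, 2000, 2.5e-9)
per_instruction = energy_per_instruction(10e-3, 2000, 2.5e-9, 1000)

print(f"energy = {total:.3e} J")
print(f"energy/instruction = {per_instruction:.3e} J")
```

If the cycle count and the instruction count come from different regions of the run, the per-instruction numbers will not be consistent, which is one way the regression can produce negative values.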