Use QED check
Ra – original register
Ra’ – corresponding duplicated register
Ra ≠ Ra’ – ERROR DETECTED
QED Effective Post-Silicon Validation and Debug
Eshan Singh, David Lin, PI: Subhasish Mitra, Robust Systems Group, Stanford University
Post-Silicon Validation Critical
Quick Error Detection
Quick Error Detection Highlights
Symbolic QED
Electrical Bugs
Structured and Effective
10^9X quicker detection, 4X coverage
Automatically localize logic bugs
No failure reproduction, no simulation
Broadly applicable
Cores, uncore, power management, logic & electrical, accelerators
[Figure: post-silicon bug count by year. Source: Intel]
Pre-silicon verification inadequate
“Post-silicon cost & complexity rising faster than design cost”
– S. Yerramilli, V.P., Intel
Design → Pre-silicon Verification → Post-silicon Validation → High Volume Fab
Localization Dominates Cost
Detect bugs
Root-cause & fix
Run tests (OS, games)
Debug time: 1-4 weeks per bug
Localize bugs
Long Error Detection Latency Challenge
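[Figure: localization timeline during test execution, from “error occurred” to “error detected”; the gap between the two is the error detection latency.]
Error detection latency, ideal: ~1,000 cycles
Error detection latency, reality: ~billions of cycles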
Intel® 48-Core SCC
Symbolic QED Results
Fast QED using Hardware Support
QED: systematic, automated
Wide variety, diversity
Original tests: Test 1, Test 2, …, Test N
QED family tests: QED Test 1, QED Test 2, …, QED Test N
Error detection latency: guaranteed short
Coverage: improved
Software & hardware approaches
[Figure: detected error count (normalized to QED) versus error detection latency (clock cycles), for QED and No-QED, in two latency bins: 0-10K and 1-10 Billion. Annotations: 10^6X, 4X.]
Software-only QED: no hardware modifications; bugs inside processor cores, bugs inside uncore components, bugs from power-management features
Hybrid QED: non-programmable accelerators, logic bugs and electrical bugs
Symbolic QED: automatically localize logic bugs, no additional hardware
Fast QED: 0.4% area overhead, very low runtimes
QED Transformation Examples
Fully automated logic bug localization using Bounded Model Checking (BMC)
No trace buffers → No area overhead
Effective for large SoCs
No failure reproduction, no simulation
Collaborator: Prof. Clark Barrett (NYU)
Traditional debug | Automatic S-QED
Weeks to months | 20 mins. to 7 hours
Long bug traces | 3- to 22-cycle bug traces
[Figure: Core 1, Core 2, ..., Core N, each with <PLC mem[1..N]> blocks inserted repeatedly into its instruction stream.]
A’=A B’=B C’=C
A = B * 2
A’= B’* 2
Check(A==A’)
D’=D E’=E F’=F
G’=G H’=H
E = F * G
E’= F’* G’
Check(E==E’)
H = D + E
H’= D’+ E’
Check(H==H’)
E’=E I’=I
J’=J K’=K
I = E / 2
I’= E’/ 2
Check(I==I’)
Load J ← mem[7 ]
Load J’← mem[7’]
Check(J==J’)
K = J + 1
K’= J’+ 1
Check(K==K’)
Lock(1,1’)
Store mem[1 ] ← C
Store mem[1’] ← C’
Unlock(1,1’)
Lock(5,5’)
Store mem[5 ] ← H
Store mem[5’] ← H’
Unlock(5,5’)
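The transformation examples above run every instruction a second time on duplicate registers and then compare the two results. A minimal Python sketch of that duplicate-and-check idea follows, assuming a toy register-file model. The names `run_qed`, `QEDError`, `PROGRAM`, the tuple instruction format, and the `fault` hook are hypothetical and are not part of the QED tools.

```python
import operator

class QEDError(Exception):
    """Raised when an original register and its duplicate disagree."""

OPS = {"+": operator.add, "*": operator.mul, "/": operator.floordiv}

# Register-only instructions from the examples above: (dst, src1, op, src2).
PROGRAM = [("A", "B", "*", 2), ("E", "F", "*", "G"),
           ("H", "D", "+", "E"), ("I", "E", "/", 2)]

def operand(bank, src):
    """Look up a register name in a bank, or pass an integer literal through."""
    return bank[src] if isinstance(src, str) else src

def run_qed(program, regs, fault=None):
    """Execute each instruction on the original and duplicate registers, then check.

    fault is an assumed test hook (index, value) -> value that corrupts the
    ORIGINAL result of one instruction, standing in for a hardware bug.
    """
    regs = dict(regs)      # original registers A, B, ...
    shadow = dict(regs)    # duplicate registers A'=A, B'=B, ...
    for index, (dst, src1, op, src2) in enumerate(program):
        value = OPS[op](operand(regs, src1), operand(regs, src2))
        if fault is not None:
            value = fault(index, value)
        dup_value = OPS[op](operand(shadow, src1), operand(shadow, src2))
        regs[dst], shadow[dst] = value, dup_value
        if value != dup_value:  # Check(X == X')
            raise QEDError(f"instruction {index}: {dst} != {dst}'")
    return regs
```

Calling `run_qed(PROGRAM, {"B": 3, "D": 1, "F": 2, "G": 5})` completes without error, while passing `fault=lambda i, v: v + 1 if i == 1 else v` raises `QEDError` at the corrupted instruction rather than many instructions later.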
ALL Cores
ALL Threads
<PLC mem[1..N]>
for ALL i,i’
Lock(i)
Lock(i’)
Load X ← mem[i]
Load X’← mem[i’]
Check (X == X’)
Unlock(i’)
Unlock(i)
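The PLC loop above can be sketched in Python as follows. This is an illustration under stated assumptions: memory is modelled as two plain lists (`mem` for the originals, `dup` for the duplicated locations i’), and a single `threading.Lock` per index guards both copies, which simplifies the separate Lock(i) and Lock(i’) calls shown above. The function names are hypothetical.

```python
import threading

def proactive_load_and_check(mem, dup, locks):
    """Compare every original location with its duplicate.

    mem, dup : lists of equal length; dup[i] plays the role of mem[i']
    locks    : one threading.Lock per index, guarding mem[i] and dup[i] together
    Returns the indices whose two copies differ (empty if no error is detected).
    """
    mismatches = []
    for i in range(len(mem)):
        with locks[i]:          # Lock(i), Lock(i')
            x = mem[i]          # Load X  <- mem[i]
            x_dup = dup[i]      # Load X' <- mem[i']
        if x != x_dup:          # Check(X == X')
            mismatches.append(i)
    return mismatches

def checked_store(mem, dup, locks, i, value):
    """Store to both copies under the lock so a concurrent PLC never sees a half-update."""
    with locks[i]:
        mem[i] = value
        dup[i] = value
```

Running the check from every core and thread at frequent intervals is what bounds the time between a corrupted store and its detection; the locks keep a check from racing with a legitimate duplicated store.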
IEEE TCAD comments (QED paper)
“All reviewers agree this will be a classic paper for years to come.”
“I will personally pay for page charges if you promise to thank me (anonymously) when you win a major award for this paper!”
Intel (Nagib Hakim, PE)
“QED is revolutionary... Intel is in the process of implementing a prototype of QED. This would enable a whole slew of applications.”
AMD (Jeff Rearick, Senior Fellow)
QED: “magical thinking needed” in ETS keynote.
Freescale (Sharad Kumar, Manager)
“We evaluated QED & are adopting in our tools flow for multi-core debug.”
QED is one such promising technique that we have evaluated and are adopting in our tools flow for multi-core debug.
Proactive Load and Check
Control Flow Tracking Using Software Signatures
CFCSS-V Block 5:
    if ((last_signature == #3) or (last_signature == #4)):
        last_signature = #5
    else:
        ERROR_DETECTED!
    <Block 5>
[Figure: control-flow graph of Block 1 to Block 5 with a CFCSS-V check inserted before each block; entering Block 5 from any block other than Block 3 or Block 4 is flagged as ERROR!]
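The signature check for Block 5 generalizes to one table of legal predecessors per block. A minimal Python sketch, assuming a hypothetical control-flow graph: only the entry for block 5 (predecessors 3 and 4) comes from the poster, and the other edges, `SignatureTracker`, and `ControlFlowError` are invented for illustration.

```python
class ControlFlowError(Exception):
    """Raised when a block is entered from an illegal predecessor."""

# block -> set of legal predecessor signatures (None means program entry).
# Only block 5's entry is taken from the poster; the rest is assumed.
LEGAL_PREDECESSORS = {1: {None}, 2: {1}, 3: {2}, 4: {2}, 5: {3, 4}}

class SignatureTracker:
    def __init__(self):
        self.last_signature = None  # no block has executed yet

    def enter(self, block):
        """Run at the top of each block: check the predecessor, then update."""
        if self.last_signature not in LEGAL_PREDECESSORS[block]:
            raise ControlFlowError(
                f"block {block} entered after signature {self.last_signature}")
        self.last_signature = block

def run_path(path):
    """Replay a sequence of block numbers through the tracker."""
    tracker = SignatureTracker()
    for block in path:
        tracker.enter(block)
    return tracker.last_signature
```

With this table, `run_path([1, 2, 3, 5])` succeeds, whereas `run_path([1, 2, 5])` raises `ControlFlowError` on entry to block 5, at the first illegal transition.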
Freescale SoC Logic Bug
Error detection latency (cycles)
Original | QED
15 Billion | 9
[Figure: SoC block diagram driven by a Random Instruction Test Generator: Core 0, Core 1, Core 2, Core 3, ..., Core N; interconnection network; shared caches; memory controllers; accelerators; other uncore components.]
[Figure: cumulative memory bugs detected (0% to 100%) versus error detection latency (cycles; axis marks 100, 1K, 10K, 10 Billion) for QED and the original test. Annotation: 10^6X improved.]
8-Core Industrial Test
QED: Med., Max. EDL: 392, 3k
Original test: Med., Max. EDL: 10M, 100M
[Figure: cumulative bugs detected (0% to 100%) versus error detection latency (clock cycles; 100 to >100M). Annotations: 10^4X, 2X.]
Power Management Bugs
[Figure: error detection latency (cycles; 0 to 20k) versus PLC-H checkers count (0 to 140), annotated with area cost from 0.05% to 0.4%.]
0.05% - 0.4% area impact
Fast QED
10^5X quicker detection
2X coverage
No intrusiveness
Runtime: 1.04X – 6X
MBIST reuse
Core, uncore, power management bugs
Uncore Bugs
No boot
Pass
48 processor cores
0.9V, 800 MHz
QED unique detect
QED enhanced detect
QED quick detect
[Figure: cumulative bugs detected (0% to 100%) versus error detection latency (cycles; 100 to 10M). Annotations: 10^4X, 2X.]
Original: Med., Max. EDL: 241k, 10M
QED: Med., Max. EDL: 675, 8k
Difficult Logic Bugs
QED Techniques
Hybrid QED
[Figure: coverage (percentage, 0% to 100%) versus error detection latency (cycles; 1 to 10M). Annotation: 10^2X improved.]
Hybrid QED: Mean EDL = 705 cycles
Original: Mean EDL = 124k cycles
Accelerator validation and debug
Using high-level synthesis
Collaborator: Prof. Deming Chen (UIUC)
[Figure: cumulative bugs detected (0% to 100%) versus bug trace length (cycles; 0 to >10M). Annotations: 10^6X, 2X.]
Original: Min., Mean, Max.: 722, 1.9M, 11M
Symbolic QED: Min., Mean, Max.: 13, 20, 29
BMC Tool: automatically, overnight
1. “Universal” Property: QED Check + Initial State
2. Partial Instances + QED Modules
Logic Bugs Localized
1. “Universal” Property: QED Check
What property should the BMC tool check?
2. Partial Instantiation
How to ensure the design fits in the BMC tool?
CMP Ra == Ra’
QED checks are Compositional
Not design/implementation specific
Preserved across partial instances
Unlike traditional properties
Systematically instantiate only the modules needed to activate the bug
BMC tool finds a bug trace
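To illustrate how a bounded search against the universal QED check (Ra == Ra’) can yield a short bug trace, here is a toy Python sketch. It is not the Symbolic QED tool and not a real BMC engine: a real tool reasons symbolically over the design, whereas this sketch enumerates short instruction sequences by brute force from one fixed QED-consistent initial state. The two-register machine, the injected bug (an ADD issued right after a MUL is off by one), and all names are assumptions.

```python
from itertools import product

MASK = 0xFF
N = 2  # original registers R0..R1; their duplicates live at index + N

def execute(stream, regs, buggy=True):
    """Toy design under test: runs (op, rd, rs) instructions on a register list."""
    regs = list(regs)
    prev_op = None
    for op, rd, rs in stream:
        value = regs[rd] + regs[rs] if op == "ADD" else regs[rd] * regs[rs]
        if buggy and op == "ADD" and prev_op == "MUL":
            value += 1  # injected toy bug: ADD right after MUL is off by one
        regs[rd] = value & MASK
        prev_op = op
    return regs

def qed_stream(original):
    """Toy QED module: the original sequence followed by its duplicate on R'."""
    duplicate = [(op, rd + N, rs + N) for op, rd, rs in original]
    return list(original) + duplicate

def qed_consistent(regs):
    """The universal property: every Ra equals its duplicate Ra'."""
    return all(regs[i] == regs[i + N] for i in range(N))

def bounded_search(max_len, init=(1, 2, 1, 2), buggy=True):
    """Return the shortest original sequence that violates the QED check, or None."""
    instrs = [(op, rd, rs) for op in ("ADD", "MUL")
              for rd in range(N) for rs in range(N)]
    for length in range(1, max_len + 1):
        for original in product(instrs, repeat=length):
            if not qed_consistent(execute(qed_stream(original), init, buggy)):
                return list(original)
    return None
```

In this toy, `bounded_search(3)` returns a two-instruction trace (an ADD followed by a MUL): the duplicated ADD then executes right after the original MUL and hits the bug, while the original ADD does not, so the two register copies diverge. The property itself never mentions the bug, which is the sense in which the QED check is not design or implementation specific.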
[Figure: full design with Core 0 to Core 7, L2Bank 0 to L2Bank 7, Memory controller 0 to Memory controller 3, I/O controllers, and a crossbar interconnect. Partial instances shown: (a) Core 0, L2Bank 0, crossbar interconnect; (b) Core 0, L2Bank 0, Memory controller 0, crossbar interconnect; (c) Core 0, Core 1, L2Bank 0, Memory controller 0, crossbar interconnect.]
Reduce Instances: keep at least 1 core
Run Each
Outcomes: No Trace Found, Trace Found, Trace Found
Best Localization
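The partial instantiation step (reduce instances, keep at least one core, run each) can be sketched as a search over module subsets. This Python sketch rests on assumptions: `bmc_finds_trace` is a hypothetical callback standing in for a BMC run on one partial instance, and subsets are tried smallest first so that the first hit is the tightest localization. The poster does not specify the enumeration order, and exhaustive enumeration is only practical for a small module list.

```python
from itertools import combinations

def partial_instances(modules, is_core):
    """Yield module subsets, smallest first, that keep at least one core."""
    for size in range(1, len(modules) + 1):
        for subset in combinations(modules, size):
            if any(is_core(m) for m in subset):
                yield frozenset(subset)

def best_localization(modules, is_core, bmc_finds_trace):
    """Return the smallest partial instance on which a bug trace is found, or None."""
    for instance in partial_instances(modules, is_core):
        if bmc_finds_trace(instance):   # hypothetical stand-in for a BMC run
            return instance
    return None

# Illustrative scenario: the bug needs these four modules to be activated.
MODULES = ["Core 0", "Core 1", "L2Bank 0",
           "Memory controller 0", "Crossbar interconnect"]
NEEDED = {"Core 0", "L2Bank 0", "Memory controller 0", "Crossbar interconnect"}

found = best_localization(MODULES,
                          is_core=lambda m: m.startswith("Core"),
                          bmc_finds_trace=lambda inst: NEEDED <= inst)
```

Here `found` contains exactly the four needed modules: smaller instances give no trace, and the larger instance that also includes Core 1 would give a trace but a weaker localization.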