

Quick Error Detection for Effective Post-Silicon Validation

Run tests → Detect bugs → Localize bugs → Root-cause & fix

Debug time: 1-4 weeks per bug

Stanford University, Intel Corporation

David Lin, Christine Cheng, Ted Hong, Yanjing Li, Farzan Fallah, Donald S. Gardner, Nagib Hakim, Subhasish Mitra

QED: Core + Uncore

Diversity: Wide variety

[Figure: 8 Cores, FFT Test from Splash2. Cumulative bugs detected (0% to 100%) versus error detection latency in cycles (100 to >10M), Original versus PLC+QED. Annotations: 10^4X, 2X.]
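As a reading aid for the cumulative plots, the sketch below shows how such a curve can be computed from a list of per-bug detection latencies. The latency values are made up for illustration and are not taken from the poster; the function name `cumulative_detected` is likewise hypothetical.

```python
from bisect import bisect_right

def cumulative_detected(latencies, thresholds):
    """Fraction of bugs whose detection latency is <= each threshold."""
    ordered = sorted(latencies)
    return [bisect_right(ordered, t) / len(ordered) for t in thresholds]

# X-axis ticks similar to the plot (cycles).
thresholds = [100, 1_000, 10_000, 100_000, 1_000_000]

# Made-up latencies (cycles) for five hypothetical bugs; NOT poster data.
made_up = [50, 400, 900, 20_000, 5_000_000]

curve = cumulative_detected(made_up, thresholds)
for t, frac in zip(thresholds, curve):
    print(f"<= {t:>9,} cycles: {frac:.0%} of bugs detected")
```

A curve that reaches 100% further to the left means bugs are caught closer to where they occur, which is what the poster's latency annotations summarize.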

[Figure: 8 Cores, LU Test from Splash2. Cumulative bugs detected (0% to 100%) versus error detection latency in cycles (100 to >10M), Original versus PLC+QED. Annotations: 10^4X, 2X.]

Long Error Detection Latency

[Figure: Timeline of test execution from "Error occurred" to "Error detected"; the interval between the two is the error detection latency.]

Ideal: ~1,000 cycles

Reality: ~billions of cycles
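The timeline defines error detection latency as the number of cycles between the error occurring and the error being detected. A minimal sketch of that arithmetic follows; the cycle counts are hypothetical and chosen only to match the orders of magnitude quoted above.

```python
def detection_latency(error_cycle: int, detect_cycle: int) -> int:
    """Cycles elapsed between an error occurring and its detection."""
    if detect_cycle < error_cycle:
        raise ValueError("an error cannot be detected before it occurs")
    return detect_cycle - error_cycle

# Hypothetical cycle counts, not measurements from the poster.
ideal = detection_latency(error_cycle=5_000, detect_cycle=6_000)
reality = detection_latency(error_cycle=5_000, detect_cycle=2_000_005_000)

ratio = reality // ideal
print(f"ideal: {ideal:,} cycles, reality: {reality:,} cycles, ratio: {ratio:,}x")
```

With latencies of about a thousand cycles versus billions of cycles, the ratio lands around six orders of magnitude.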

QED Tests: QED Test 1, QED Test 2, ..., QED Test N

Original Tests: Test 1, Test 2, ..., Test N

[Figure: Two bar charts of detected error count (normalized to QED) for QED and No-QED, grouped by error detection latency in clock cycles (0-10K and 1-10 Billion). Annotations: 10^6X, 4X.]

[Figure: 8 Cores, Industrial Validation Test on Intel® Core™ i7 hardware. Cumulative memory bugs detected (0% to 100%) versus error detection latency in cycles (100 to 10 Billion), Original versus PLC+QED. Annotation: Improved 10^6X.]

Panel titles: Localization Dominates Cost; Quick Error Detection; QED Core + Uncore; Transformation Example

QED techniques          Code change   Hardware change   Detection latency   Targeted component
EDDI-V                  Some          None              Small               Core
CFCSS-V                 Some          None              Small               Core
SW-RMT-V                Some          None              Small               Core
HW-RMT-V                None          Some              Small               Core
Proactive Load & Check  Some          None              Small               Uncore
...

[Figure: Core 1, Core 2, ..., Core N. Each core's instruction stream is interleaved with <PLC mem[a..z]> blocks.]

A’=A B’=B C’=C
A = B * 2
A’= B’* 2
Check(A==A’)

D’=D E’=E F’=F G’=G H’=H
E = F * G
E’= F’* G’
Check(E==E’)
H = D + E
H’= D’+ E’
Check(H==H’)

E’=E I’=I J’=J K’=K
I = E / 2
I’= E’/ 2
Check(I==I’)
Load J ← mem[z]
Load J’ ← mem[z’]
Check(J==J’)
K = J + 1
K’= J’+ 1
Check(K==K’)

Lock(a); Lock(a’)
Store mem[a] ← C
Store mem[a’] ← C’
Unlock(a’); Unlock(a)

Lock(c); Lock(c’)
Store mem[c] ← H
Store mem[c’] ← H’
Unlock(c’); Unlock(c)
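The transformation example above duplicates each computation on shadow variables and checks the two results right away. The sketch below mimics that pattern on a toy three-instruction program. It is an illustration under stated assumptions, not the authors' tool: the tuple instruction format and the names `qed_transform`, `run`, and `fault_at` are all hypothetical.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "//": operator.floordiv}

def shadow(x):
    """Name of the duplicate register; integer constants are their own shadow."""
    return x if isinstance(x, int) else x + "'"

def qed_transform(program):
    """For each (dst, a, op, b): emit the original op, its duplicate, and a check."""
    out = []
    for dst, a, op, b in program:
        out.append(("op", dst, a, op, b))
        out.append(("op", shadow(dst), shadow(a), op, shadow(b)))
        out.append(("check", dst, shadow(dst)))
    return out

def run(qed_program, regs, fault_at=None):
    """Execute the transformed program.

    If fault_at is given, flip bit 0 of the result of that instruction.
    Returns the index of the first failing check, or None if all checks pass.
    """
    regs = dict(regs)
    regs.update({shadow(k): v for k, v in list(regs.items())})  # A'=A, B'=B, ...
    val = lambda x: x if isinstance(x, int) else regs[x]
    for i, ins in enumerate(qed_program):
        if ins[0] == "op":
            _, dst, a, op, b = ins
            regs[dst] = OPS[op](val(a), val(b))
            if i == fault_at:
                regs[dst] ^= 1  # injected single-bit error (simulated)
        elif val(ins[1]) != val(ins[2]):
            return i
    return None

# A = B * 2; E = F * G; H = D + E  (taken from the example above)
program = [("A", "B", "*", 2), ("E", "F", "*", "G"), ("H", "D", "+", "E")]
qed = qed_transform(program)
init = {"B": 3, "D": 1, "F": 4, "G": 5}  # hypothetical initial values

print(run(qed, init))              # fault-free run
print(run(qed, init, fault_at=3))  # corrupt E right after it is computed
```

In this sketch a fault injected at instruction 3 (the computation of E) is caught by the check two instructions later, before H ever consumes the corrupted value.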

ALL Cores, ALL Threads

<PLC mem[a..z]>

for i in [a..z], i’ in [a’..z’]
    Lock(i)
    Lock(i’)
    Load X ← mem[i]
    Load X’ ← mem[i’]
    Check (X == X’)
    Unlock(i’)
    Unlock(i)
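The Proactive Load & Check loop above can be sketched with ordinary threads and locks. This is a software-only illustration, assuming a small array standing in for mem[a..z] and a second array for the shadow copy mem[a’..z’]; the names `qed_store`, `plc`, and the array size `N` are hypothetical.

```python
import threading

N = 8                                                # hypothetical number of tracked addresses
mem = [0] * N                                        # stands in for mem[a..z]
mem_shadow = [0] * N                                 # stands in for mem[a'..z']
locks = [threading.Lock() for _ in range(N)]         # Lock(i)
locks_shadow = [threading.Lock() for _ in range(N)]  # Lock(i')

def qed_store(i, value):
    """Store to an address and its shadow while holding both locks."""
    with locks[i], locks_shadow[i]:
        mem[i] = value
        mem_shadow[i] = value

def plc():
    """Proactive Load and Check: return the addresses whose two copies differ."""
    mismatches = []
    for i in range(N):
        # Acquires Lock(i) then Lock(i'); releases in the reverse order.
        with locks[i], locks_shadow[i]:
            x, x_shadow = mem[i], mem_shadow[i]
            if x != x_shadow:
                mismatches.append(i)
    return mismatches

seen = []  # mismatches observed by workers during normal operation

def worker(tid):
    for i in range(N):
        qed_store(i, tid * 100 + i)
        seen.extend(plc())  # every thread runs PLC between its own stores

threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

clean = plc()
mem[3] ^= 4          # simulated error corrupting one copy of address 3
after_fault = plc()
print(clean, after_fault)
```

Taking both locks around the paired loads keeps a concurrent store from being observed half-done, so a mismatch indicates a corrupted copy rather than a race.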

Key challenge: Long error detection latency

New technique: Quick Error Detection

Systematic, structured, automated

Error detection latency: 10^6X improved

Coverage: 4X improved

• Software only: readily applicable
