Post on 04-Apr-2018


Quick Error Detection for Effective Post-Silicon Validation

Post-silicon validation flow: run tests → detect bugs → localize bugs → root-cause & fix

Debug time: 1-4 weeks per bug

David Lin, Christine Cheng, Ted Hong, Yanjing Li, Farzan Fallah, Donald S. Gardner, Nagib Hakim, Subhasish Mitra

Stanford University, Intel Corporation

QED
• Core + Uncore
• Wide variety
• Diversity

[Figure: 8 cores, FFT test from Splash2. Cumulative bugs detected (0-100%) vs. error detection latency (cycles, 100 to >10M); Original vs. PLC+QED; annotated improvements: 10^4X and 2X.]

[Figure: 8 cores, LU test from Splash2. Cumulative bugs detected (0-100%) vs. error detection latency (cycles, 100 to >10M); Original vs. PLC+QED; annotated improvements: 10^4X and 2X.]

Long Error Detection Latency

Timeline of test execution: an error occurs and is detected some time later; the gap between the two is the error detection latency.

Ideal: ~1,000 cycles

Reality: ~billions of cycles
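The latency in this timeline, and the cumulative-bugs-detected curves in the poster's plots, can be computed as sketched below. This is a minimal Python illustration, assuming each bug is recorded as a hypothetical (error_cycle, detect_cycle) pair; the function names and sample numbers are made up for illustration and are not the poster's data.

```python
from bisect import bisect_right


def detection_latency(error_cycle: int, detect_cycle: int) -> int:
    """Cycles from the moment an error occurs until a check flags it."""
    if detect_cycle < error_cycle:
        raise ValueError("an error cannot be detected before it occurs")
    return detect_cycle - error_cycle


def cumulative_detected(latencies, thresholds):
    """Fraction of detected bugs whose latency is <= each threshold (an empirical CDF)."""
    ordered = sorted(latencies)
    return [bisect_right(ordered, t) / len(ordered) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical (error_cycle, detect_cycle) pairs, not measurements from the poster.
    events = [(1_000, 1_800), (5_000, 9_500), (20_000, 2_000_020_000)]
    lat = [detection_latency(e, d) for e, d in events]
    print(lat)  # [800, 4500, 2000000000]
    print(cumulative_detected(lat, [1_000, 10_000, 10**10]))
```

Evaluating cumulative_detected over thresholds from 100 to 10^10 cycles yields curves of the same kind as the Original vs. PLC+QED panels; a latency improvement shows up as the curve rising at much smaller thresholds.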

Original tests (Test 1, Test 2, ..., Test N) → QED tests (QED Test 1, QED Test 2, ..., QED Test N)

[Figure (two panels): Detected error count, normalized to QED (0 to 1), vs. error detection latency (clock cycles); bars for QED and No-QED at latency bins 0-10K and 1-10 billion cycles; annotated improvements: 10^6X and 4X.]

[Figure: 8 cores, industrial validation test. Cumulative memory bugs detected (0-100%) vs. error detection latency (cycles, 100 to 10 billion); Original vs. PLC+QED Improved; annotated improvement: 10^6X.]

Intel® Core™ i7 Hardware

Localization Dominates Cost

Quick Error Detection (QED)

QED Core + Uncore Transformation Example

QED technique            Code change   Hardware change   Detection latency   Targeted component
EDDI-V                   Some          None              Small               Core
CFCSS-V                  Some          None              Small               Core
SW-RMT-V                 Some          None              Small               Core
HW-RMT-V                 None          Some              Small               Core
Proactive Load & Check   Some          None              Small               Uncore
...

Core 1: <PLC mem[a..z]> ... <PLC mem[a..z]> ... <PLC mem[a..z]>

Core 2: <PLC mem[a..z]> ... <PLC mem[a..z]>

...

Core N: <PLC mem[a..z]> ... <PLC mem[a..z]> ... <PLC mem[a..z]>

A' = A; B' = B; C' = C
A = B * 2
A' = B' * 2
Check(A == A')

D' = D; E' = E; F' = F; G' = G; H' = H
E = F * G
E' = F' * G'
Check(E == E')

H = D + E
H' = D' + E'
Check(H == H')

E' = E; I' = I; J' = J; K' = K
I = E / 2
I' = E' / 2
Check(I == I')

Load J ← mem[z]
Load J' ← mem[z']
Check(J == J')

K = J + 1
K' = J' + 1
Check(K == K')

Lock(a); Lock(a')
Store mem[a] ← C
Store mem[a'] ← C'
Unlock(a'); Unlock(a)

Lock(c); Lock(c')
Store mem[c] ← H
Store mem[c'] ← H'
Unlock(c'); Unlock(c)
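To make the transformation concrete, here is a minimal Python sketch that replays the example with duplicated registers and a shadow copy of memory. Assumptions not taken from the poster: registers are dict entries, the shadow address x' is x + SHADOW_OFFSET, integer division stands in for E / 2, and the optional fault argument is a hypothetical hook that flips one bit of an original result so the checks can be seen firing. For brevity the duplicates are initialized once from the originals rather than re-synchronized before each block as in the example.

```python
import threading
from collections import defaultdict

SHADOW_OFFSET = 0x1000  # hypothetical: mem[x'] is stored at address x + SHADOW_OFFSET
LOCKS = defaultdict(threading.Lock)  # one lock per memory address


class QEDError(Exception):
    """Raised when an original value and its duplicate disagree."""


def check(orig, dup, name):
    # Check(X == X'): flag the error right after the instruction that produced it.
    if orig != dup:
        raise QEDError(f"mismatch in {name}: {orig!r} != {dup!r}")


def locked_store(mem, addr, value, dup_value):
    # Lock(a); Lock(a'); Store mem[a]; Store mem[a']; Unlock(a'); Unlock(a)
    with LOCKS[addr], LOCKS[addr + SHADOW_OFFSET]:
        mem[addr] = value
        mem[addr + SHADOW_OFFSET] = dup_value


def eddi_v(regs, mem, z, a, c, fault=None):
    """Duplicated version of the example block; returns the original registers."""
    r = dict(regs)  # original registers A, B, C, ...
    d = dict(regs)  # duplicated registers A', B', C', ... (initialized once)

    def step(name, value, dup_value):
        if fault == name:
            value ^= 1  # hypothetical single-bit fault in the original result only
        r[name], d[name] = value, dup_value
        check(r[name], d[name], name)

    step("A", r["B"] * 2, d["B"] * 2)
    step("E", r["F"] * r["G"], d["F"] * d["G"])
    step("H", r["D"] + r["E"], d["D"] + d["E"])
    step("I", r["E"] // 2, d["E"] // 2)  # integer division stands in for E / 2
    step("J", mem[z], mem[z + SHADOW_OFFSET])  # Load J <- mem[z], J' <- mem[z']
    step("K", r["J"] + 1, d["J"] + 1)
    locked_store(mem, a, r["C"], d["C"])
    locked_store(mem, c, r["H"], d["H"])
    return r


if __name__ == "__main__":
    z, a, c = 0x10, 0x20, 0x30
    mem = {z: 42, z + SHADOW_OFFSET: 42}
    regs = {"B": 3, "C": 7, "D": 1, "F": 4, "G": 5}
    print(eddi_v(regs, mem, z, a, c)["K"])  # 43
    try:
        eddi_v(regs, mem, z, a, c, fault="E")
    except QEDError as err:
        print(err)  # mismatch in E: 21 != 20
```

With fault="E", the check immediately following the corrupted multiply raises QEDError, so the error is flagged within one instruction of where it occurred rather than billions of cycles later.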

<PLC mem[a..z]>, executed on all cores and all threads:

for i in [a..z], i' in [a'..z']:
    Lock(i); Lock(i')
    Load X ← mem[i]
    Load X' ← mem[i']
    Check(X == X')
    Unlock(i'); Unlock(i)
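The PLC routine can likewise be sketched in Python, with threads standing in for cores and a dict standing in for memory. Assumptions not taken from the poster: SHADOW_OFFSET, one threading.Lock per address, and the hypothetical helper duplicated_store, which mirrors the locked store pairs of the transformation example. Because every code path acquires the lock for mem[i] before the one for mem[i'], the lock order is consistent and a PLC sweep never observes a half-finished duplicated store.

```python
import threading

SHADOW_OFFSET = 0x1000  # hypothetical: mem[i'] lives at address i + SHADOW_OFFSET


class QEDError(Exception):
    """Raised when a location and its shadow copy disagree."""


def plc(mem, locks, addresses):
    """Proactive Load & Check over mem[a..z] and its shadow mem[a'..z']."""
    for i in addresses:
        i_dup = i + SHADOW_OFFSET
        with locks[i], locks[i_dup]:  # Lock(i); Lock(i')
            x = mem[i]                # Load X  <- mem[i]
            x_dup = mem[i_dup]        # Load X' <- mem[i']
            if x != x_dup:            # Check(X == X')
                raise QEDError(f"mem[{i:#x}] = {x} but mem[{i_dup:#x}] = {x_dup}")
        # Unlock(i'); Unlock(i) happen when the with-block exits.


def duplicated_store(mem, locks, addr, value):
    # Hypothetical helper: the locked store pair from the transformation example.
    with locks[addr], locks[addr + SHADOW_OFFSET]:
        mem[addr] = value
        mem[addr + SHADOW_OFFSET] = value


def run_demo(n_threads=4, addresses=range(0x10, 0x20), iterations=100):
    """Threads stand in for cores; each mixes duplicated stores with PLC sweeps."""
    addrs = list(addresses)
    mem, locks = {}, {}
    for i in addrs:  # create every lock up front so threads never race to add one
        for addr in (i, i + SHADOW_OFFSET):
            mem[addr] = 0
            locks[addr] = threading.Lock()
    errors = []

    def worker(tid):
        try:
            for k in range(iterations):
                duplicated_store(mem, locks, addrs[(tid + k) % len(addrs)], tid * 1000 + k)
                plc(mem, locks, addrs)
        except QEDError as err:
            errors.append(err)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem, locks, errors


if __name__ == "__main__":
    mem, locks, errors = run_demo()
    print(errors)  # []
    mem[0x13] ^= 1  # hypothetical corruption of one original location (e.g., an uncore bug)
    try:
        plc(mem, locks, range(0x10, 0x20))
    except QEDError as err:
        print("PLC detected:", err)
```

Concurrent duplicated stores and PLC sweeps report no mismatch, while corrupting a single original location afterwards is caught by the next sweep.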

• Key challenge: long error detection latency

• New technique: Quick Error Detection (QED), which is systematic, structured, and automated

• Error detection latency: improved by 10^6X

• Coverage: improved by 4X

• Software only: readily applicable


