Page 1: Scaling Formal Methods Toward Hierarchical  Protocols in Shared Memory Processors

Scaling Formal Methods Toward Hierarchical Protocols in Shared Memory Processors

Presenters: Ganesh Gopalakrishnan and Xiaofang Chen
School of Computing, University of Utah, Salt Lake City, UT 84112

{ganesh, xiachen}@cs.utah.edu

http://www.cs.utah.edu/formal_verification

GRC CADTS Review, Berkeley, March 18, 2008

Supported by SRC Contract TJ-1318 (Intel Customization)

Page 2

Multicores are the future! Their caches are visibly central…
(photo courtesy of Intel Corporation.)
> 80% of chips shipped will be multi-core

Page 3

Hierarchical Cache Coherence Protocols will play a major role in multi-core processors
[Figure labels: Chip-level protocols, Inter-cluster protocols, Intra-cluster protocols; dir, mem]
State space grows multiplicatively across the hierarchy!
Verification will become harder

Page 4

Protocol design happens in “the thick of things” (many interfaces, constraints of performance, power, testability).
From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.

Page 5

Future Coherence Protocols
Cache coherence protocols that are tuned for the contexts in which they are operating can significantly increase performance and reduce power consumption [Liqun Cheng]
  Producer-consumer sharing pattern-aware protocol [Cheng et al., HPCA07]
    21% speedup and 15% reduction in network traffic
  Interconnect-aware coherence protocols [Cheng et al., ISCA06]
    Heterogeneous interconnect
    Improve performance AND reduce power
    11% speedup and 22% wire power savings
Bottom line: Protocols are going to get more complex!

Page 6

Main Result #1: Hierarchical
[Figure: Home Cluster, Remote Cluster 1 and Remote Cluster 2, each with two L1 Caches, an L2 Cache + Local Dir, and a RAC; a Global Dir and Main Mem; intra-cluster and inter-cluster protocols]
Developed a way to reduce the verification complexity of hierarchical (CMP) protocols using assume-guarantee (A/G) reasoning

Pages 7 to 10

Main Result #2: Refinement
Developed a way to verify a proposed refinement of ONE unit into its low-level (RTL) implementation
[Figure labels: Murphi, HMurphi]

Page 11

Differences in Modeling: Specs vs. Impls
One step in high-level: home, remote (an atomic guarded command)
Multiple steps in low-level: home, router, buf, remote

Page 12

Our Refinement Check
[Figure: a multi-step Impl transaction from I to I’, shown against the Spec transition from Spec(I) to Spec(I’)]
I is a reachable Impl state
Guard for Spec transition must hold
Observable vars changed by either must match
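
The conditions on this slide can be sketched as a small explicit-state check. This is a minimal illustration in Python, not the Muv implementation: the toy message-transfer models, the `abstract` projection, and every function name below are assumptions made for the example.

```python
# Toy refinement check; all models and names are hypothetical.

def spec_guard(s):
    # Spec rule is enabled when the sender holds a message
    # and the receiver slot is empty.
    return s["out"] is not None and s["inp"] is None

def spec_action(s):
    # One atomic Spec step: move the message from sender to receiver.
    return {"out": None, "inp": s["out"]}

def impl_transaction(i):
    # Multi-step Impl transaction: sender -> buffer, then buffer -> receiver.
    step1 = {"out": None, "buf": i["out"], "inp": i["inp"]}
    step2 = {"out": None, "buf": None, "inp": step1["buf"]}
    return step2

def lossy_transaction(i):
    # Buggy variant: the buffer drops the message.
    return {"out": None, "buf": None, "inp": None}

def abstract(i):
    # Spec(I): project an Impl state onto the observable variables.
    return {"out": i["out"], "inp": i["inp"]}

def refines(impl_states, transaction):
    for i in impl_states:
        s = abstract(i)
        # Guard for the Spec transition must hold at Spec(I).
        if not spec_guard(s):
            return False
        # Observable vars changed by either side must match.
        if abstract(transaction(i)) != spec_action(s):
            return False
    return True

# Impl states from which the transaction starts (assumed reachable).
starts = [{"out": m, "buf": None, "inp": None} for m in ("req", "ack")]
```

Calling `refines(starts, impl_transaction)` accepts the two-step transfer, while `refines(starts, lossy_transaction)` rejects the variant that loses the message.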

Page 13

Workflow of Our Refinement Check
[Workflow diagram: Murphi Spec model and Hardware Murphi Impl model; Product model in Hardware Murphi; Muv; Product model in VHDL; Property check]
Check implementation meets specification

Page 14

Anticipated Future Result
Develop a way to verify a proposed refinement of the ENTIRE hierarchy

Page 15

Anticipated Future Result
Deal with pipelining
[Figure panels: Sequential Interaction, Pipelined Interaction]

Page 16

Anticipated Future Result
Develop ways to “tease apart” protocols that are “blended in”, e.g. for power-down or post-silicon observability enhancement
More protocols… do they interfere?

Page 17

Basics
PI: Ganesh Gopalakrishnan
Industrial Liaisons: Ching Tsun Chou (Intel), Steven M. German (IBM), John W. O’Leary (Intel), Jayanta Bhadra (Freescale), Alper Sen (Freescale), Aseem Maheshwari (TI)
Primary Student: Xiaofang Chen
Graduation Date: Writing PhD dissertation; on the job market
Other Students: Yu Yang (PhD), Guodong Li (PhD), Michael DeLisi (BS/MS)
Anticipated Results:
  Hierarchical: Methodology for Hierarchical (Cache Coherence) Protocol Verification, with Emphasis on Complexity Reduction (was in original SRC proposal)
  Refinement: Methodology for Expressing and Verifying Refinement of Higher Level Protocol Descriptions (not in original SRC proposal)

Page 18

Basics
Deliverables (Papers, Software, Xiaofang’s Dissertation)
Hierarchical:
  Methodology for Applying A/G Reasoning for Complexity Reduction
  Verified Protocol Benchmarks: Inclusive, Non-Inclusive, Snoopy (Large Benchmarks)
  Automatic Abstraction Tool in support of A/G Reasoning
Refinement:
  Muv Language Design (for expressing Designs)
  Refinement Checking Theory and Methodology
  Complete Muv tool implementation

Page 19

What’s Going On
Accomplishments during the past year
Hierarchical:
  Finishing Non-inclusive Hierarchical Protocol Verification
  Developing and Verifying a Hierarchical Protocol with a Snoopy First Level

Page 20

Experimental Results on One Hierarchical Protocol
Approach | Model | # of states | Model check time (sec) | Mem used (GB) | Model check passed
Monolithic approach | Original model | > 438,120,000 | > 125,410 | 18 | Nonconclusive
FMCAD’06 approach | Abs. model 1 | 284,088,425 | 44,978 | 18 | Yes
FMCAD’06 approach | Abs. model 2 | 636,613,051 | 66,249 | 18 | Yes
HLDVT’07 approach | Abs. model 1 | 1,500,621 | 270 | 1.8 | Yes
HLDVT’07 approach | Abs. model 2 | 574,198 | 50 | 1.8 | Yes
HLDVT’07 approach | Abs. model 3 | 198,162 | 21 | 1.8 | Yes

Page 21

A Snoopy Multicore Protocol
Motivation: Snoop protocols are commonly used in the 1st level of caches
Have applied our approach on directory protocols
How about snoop protocols?
[Figure: Cluster 1 and Cluster 2, each with two L1 Caches, an L2 Cache and a RAC; Global Dir and Main Mem]

Page 22

Applying Our Approach
[Figure: abstracted protocols, drawn with L1 Cache, L2 Cache, RAC, Global Dir and Main Mem blocks for Cluster 1 and Cluster 2]
Experimental results
Approach | Model | # of states | Model check time (sec) | Mem used (GB) | Model check passed
Monolithic approach | Original model | 552,375 | 86 | 1.8 | Yes
Our approach | Abs. intra | 474 | 6 | 1.8 | Yes
Our approach | Abs. inter | 15,371 | 7 | 1.8 | Yes

Page 23

What’s Going On
Accomplishments during the past year (contd.)
Refinement:
  HMurphi was fleshed out in great detail
  Most of Muv was implemented (a large portion during an IBM T.J. Watson internship); joint work with Steven German and Geert Janssen

Page 24

What’s Going On
Future directions
Hierarchical + Refinement
  Develop ways to verify hierarchies of interacting HMurphi modules
  Pipelining
  Teasing out protocols supporting non-functional aspects
    Power-down protocols
    Protocols to enhance post-silicon observability
Architectural Characterization
  How do we describe the “ISA” of future multi-core machines?
  How do we make sure that this ISA has no hidden inconsistencies?

Page 25

What’s Going On
Technology Transfer & Industrial Interactions
  With Liaisons
Publications
  FMCAD 06, FMCAD 07, HLDVT 07
  TECHCON 07 (best session paper award)
  Journal paper and Dissertation (under prep)
Request to IBM for open-sourcing Muv has been placed

Page 26

Overview of “Hierarchical”
Given a protocol to verify, create a verification model that models a small number of clusters acting on a single cache line
[Figure: Verification Model with invariant Inv P; Home, Remote, Global directory]

Page 27

2. Exploit Symmetries
Model “home” and the two “remote”s (one remote, in case of symmetry)
[Figure: Verification Model with invariant Inv P]

Page 28

3. Initial abstraction will be extreme; slowly back off from this extreme…
[Flowchart: abstract models with invariants Inv P1, Inv P2, Inv P3]
P1 fails: diagnose failure
  Bug: report to user
  False alarm: diagnose where the guard is overly weak
    Add strengthening guard
    Introduce lemma to ensure soundness of strengthening

Page 29

Overview of Theory Involved
Original model:
rule g1 ==> a1;
rule g2 ==> a2;
invariant P;
Model with rule 2 strengthened:
rule g1 ==> a1;
rule g2 /\ cond2 ==> a2;
invariant P /\ (g1 => cond1);
Model with rule 1 strengthened:
rule g1 /\ cond1 ==> a1;
rule g2 ==> a2;
invariant P /\ (g2 => cond2);
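
The mechanics of checking the two strengthened models can be sketched with a tiny explicit-state search. This is a minimal Python illustration, not the authors' tool: the two-process toy system, the choice of `cond1` and `cond2`, and the `check` helper are all assumptions, and the abstraction step that makes the split worthwhile on real protocols is omitted.

```python
from collections import deque

def check(init, rules, invariant):
    """Breadth-first search; True if `invariant` holds in every reachable state."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return False
        for guard, action in rules:
            if guard(s):
                n = action(s)
                if n not in seen:
                    seen.add(n)
                    frontier.append(n)
    return True

# Hypothetical toy system: state is (x, y, turn), where x / y say whether
# process 1 / process 2 is in its critical section.
def g1(s):
    return s[2] == 1

def a1(s):
    x, y, _ = s
    return (True, y, 1) if not x else (False, y, 2)   # enter, then leave and pass the turn

def g2(s):
    return s[2] == 2

def a2(s):
    x, y, _ = s
    return (x, True, 2) if not y else (x, False, 1)

def P(s):
    return not (s[0] and s[1])        # mutual exclusion

def cond1(s):
    return not s[1]                   # assumed whenever rule 1 fires

def cond2(s):
    return not s[0]                   # assumed whenever rule 2 fires

def implies(a, b):
    return (not a) or b

init = (False, False, 1)

# Rule 2 strengthened with cond2; prove P /\ (g1 => cond1).
model1_ok = check(init,
                  [(g1, a1), (lambda s: g2(s) and cond2(s), a2)],
                  lambda s: P(s) and implies(g1(s), cond1(s)))

# Rule 1 strengthened with cond1; prove P /\ (g2 => cond2).
model2_ok = check(init,
                  [(lambda s: g1(s) and cond1(s), a1), (g2, a2)],
                  lambda s: P(s) and implies(g2(s), cond2(s)))
```

Each model assumes the other model's lemma in a strengthened guard and proves its own lemma as part of the invariant, which mirrors the shape of the three rule sets on the slide.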

Page 30

3. Create Abstract Models (three models in this example)
[Figure: model with Inv P, and three abstract models with Inv P1, Inv P2, Inv P3]

Page 31

Step 1 of Refinement
Inv P1, Inv P2, Inv P3
Inv P1, Inv P2, Inv P3’

Page 32

Step 2 of Refinement
Inv P1, Inv P2, Inv P3
Inv P1, Inv P2, Inv P3’
Inv P1, Inv P2’, Inv P3’

Page 33

Final Step of Refinement
Inv P1, Inv P2, Inv P3
Inv P1, Inv P2, Inv P3’
Inv P1’, Inv P2’, Inv P3’
Inv P1, Inv P2’, Inv P3’’

Page 34

Detailed Presentation of Refinement
Note: Three examples have been presented in full detail at http://www.cs.utah.edu/formal_verification/muv

Page 35

Our Approach of Refinement Check
[Workflow diagram: Murphi Spec model and Hardware Murphi Impl model; Product model in Hardware Murphi; Muv; Product model in VHDL; Property check]
Check implementation meets specification

Page 36

Basic Features of Hardware Murphi vs Murphi
signal s1, s2 …
s1 <= …
chooserule rules; end; …
firstrule rules; end; …
transaction
  rule-1; rule-2; …
end; …

Page 37

Language Extensions to Hardware Murphi (I)
Directives
--include spec.m
Joint variables correspondence
correspondence
  u1[0..7] :: v1[1..8];
  u1 :: v2;
end;

Page 38

Language Extensions to Hardware Murphi (II)
Transactionset
transactionset p1:T1; p2:T2 do
  transaction …
end;
Rules with IDs
rule:id guard ==> action;
ruleset p1:T1; p2:T2 do
  rule:id …
end;

Page 39

Language Extensions to Hardware Murphi (III)
Execute a rule by ID
<< id.guard() >>;
<< id.action() >>;
<< id[v1][v2].guard() >>; …
Fine-grained assignments for write-write conflicts
var[i] <:= data;

Page 40

cache_common.m
const
  num_nodes: 2;
  num_addr: 2;
  ...
type
  cache_state: enum {cache_invalid, cache_shared, cache_exclusive};
  ...
var
  msg_0: message_type;
  ...
signal
  random_data: node_memory_type;
procedure reset_data_type(var param: data_type);
begin
  for i: data_range do
    param[i] := false;
  end;
end;
…
function router_turn_next(turn: node_id): node_id;
var ret: node_id;
begin
  ...
end;
…
...

cache_spec.m
var
  spec_nodes: array [node_id] of spec_node_type;
startstate "initialize"
  for i: node_id do
    reset_memory_type(spec_nodes[i].memory);
    …
  end;
end;
-- Spec rule R1: deliver a message from src to dest over channel ch in one atomic step
ruleset src: node_id; ch: chan_id; dest: node_id do
  rule:R1 "1. Transfer msg from src via ch"
    spec_nodes[src].outchan[ch].valid &
    dest = spec_nodes[src].outchan[ch].msg.dest &
    ! spec_nodes[dest].inchan[ch].valid
  ==>
  begin
    spec_nodes[dest].inchan[ch] := spec_nodes[src].outchan[ch];
    reset_outchan(spec_nodes[src].outchan[ch]);
  endrule;
endruleset;
...

Page 41

cache_impl.m
--include cache_common.m
--include cache_spec.m
var
  router: router_unit_type;
  nodes: array [node_id] of node_unit_type;
signal
  nodes_io: nodes_io_type;
  ...
correspondence "joint vars"
  spec_nodes[0..1].mem :: nodes[0..1].home.mem;
  ...
end;
assign
  nodes_io[0].buf[1].data_in <= router_io.chans_out[0][1].msg;
startstate "initialize"
  router.turn := 0;
  ...
end;
transactionset src: node_id; dest: node_id do
  transaction "transfer msg from src via ch-1"
    rule "buf deliver msg via chan-1"
      nodes_internal[dest].buf[1].update &
      nodes_internal[dest].buf[1].n_valid &
      nodes_internal[dest].buf[1].n_msg.src = src
    ==>
    begin
      nodes[dest].buf[1].msg_buf.msg := nodes_internal[dest].buf[1].n_msg;
      nodes[dest].buf[1].msg_buf.valid := nodes_internal[dest].buf[1].n_valid;
      -- execute spec rule R1 by ID inside the implementation rule
      << R1[src][1][dest].guard() >>;
      << R1[src][1][dest].action() >>;
    endrule;
    rule "local reset src node outchan-1"
      nodes_io[src].local.reset_out1
    ==>
    begin
      reset_message_buf_type(nodes[src].local.buf1);
      reset_transaction;
    endrule;
  endtransaction;
endtransactionset;
...

Page 42

Our Extensions to Muv
Language extensions support
Automatic assertion generation for refinement
  Ensure exclusive write to a var in each clock cycle
  Serializability check for spec rules
  Enableness for spec rules
  Joint vars equivalence when inactive
Many done with static analysis

Page 43

Refinement Extensions to Muv (I)
No write-write conflicts
An assignment
v := d;
is translated to
for i: s1..s2 do
  assert (update_bits[i] = false);
end;
v := d;
for i: s1..s2 do
  update_bits[i] := true;
end;
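
The generated check can be mimicked outside the tool. The sketch below is a hedged Python illustration of the `update_bits` idea; the `CycleVar` class, its methods, and the assumption that the bits are cleared at every clock edge are hypothetical and not taken from Muv.

```python
class CycleVar:
    """A bit-vector variable that records which bits were written this clock cycle."""

    def __init__(self, width):
        self.value = [False] * width
        self.update_bits = [False] * width

    def write(self, lo, hi, data):
        # Generated check: no bit in lo..hi may already be written this cycle.
        for i in range(lo, hi + 1):
            if self.update_bits[i]:
                raise AssertionError(f"write-write conflict on bit {i}")
        # The original assignment v[lo..hi] := data.
        self.value[lo:hi + 1] = data
        # Mark the bits as written.
        for i in range(lo, hi + 1):
            self.update_bits[i] = True

    def next_cycle(self):
        # Assumption: the update bits are cleared at each clock edge.
        self.update_bits = [False] * len(self.update_bits)
```

Two writes to disjoint bit ranges in one cycle pass, while a second write that overlaps an already written bit raises the assertion before the value changes.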

Page 44

Refinement Extensions to Muv (II)
Serializability for specification rules
[Figure: rules t1, t2, t3 taking state S0 to S1, shown next to the same rules applied one at a time through intermediate states S’1 and S’2]
Obtain read and write sets of variables of each rule
Analyze read-write dependency
Check for cycles

Page 45

Check for Dependency Cycles
[Figure: rules t1, t2, t3 taking state S0 to S1, shown next to the same rules applied one at a time through intermediate states S’1 and S’2]
t1 read v1, write v3
t2 write v1, read v2
t3 write v2, read v3
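
The three-rule example can be checked with a small dependency-graph sketch. This is an illustrative Python version under one stated assumption: an edge runs from a rule that reads a variable to a rule that writes it, so a cycle means no one-at-a-time order preserves what every rule read. The function names are hypothetical.

```python
def dependency_graph(rules):
    """rules maps name -> (read_set, write_set).

    An edge r -> w is added when r reads a variable that w writes.
    """
    edges = {name: set() for name in rules}
    for r, (reads, _) in rules.items():
        for w, (_, writes) in rules.items():
            if r != w and reads & writes:
                edges[r].add(w)
    return edges

def has_cycle(edges):
    """Depth-first search with three colors to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GREY
        for m in edges[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in edges)

# Read and write sets from the slide.
rules = {
    "t1": ({"v1"}, {"v3"}),
    "t2": ({"v2"}, {"v1"}),
    "t3": ({"v3"}, {"v2"}),
}
cyclic = has_cycle(dependency_graph(rules))
```

For the slide's read and write sets the graph is t1 to t2 to t3 and back to t1, so `cyclic` is true; dropping any one rule breaks the cycle.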

Page 46

Refinement Extensions to Muv (III)
Enableness of specification rules
A rule with an ID
rule:id guard ==> action;
is translated to
bool function id_guard() {…}
void procedure id_action(…) {…}
A call by ID
<< id.guard() >>;
<< id.action() >>;
is translated to
assert id_guard();
id_action();
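
The translation can be pictured as splitting a rule into a guard function and an action procedure, then asserting the guard before running the action. The Python sketch below is only an illustration; `SpecRule`, `fire_by_id`, and the toy rule `R1` are hypothetical names, not generated Muv output.

```python
class SpecRule:
    """A spec rule split into a guard function and an action procedure."""

    def __init__(self, guard, action):
        self.guard = guard
        self.action = action

def fire_by_id(rule, state):
    # << id.guard() >> becomes an assertion that the spec rule is enabled.
    if not rule.guard(state):
        raise AssertionError("spec rule is not enabled")
    # << id.action() >> becomes a plain call to the action.
    rule.action(state)

# Hypothetical spec rule: move a message from the out-channel to the in-channel.
R1 = SpecRule(
    guard=lambda s: s["out"] is not None and s["inp"] is None,
    action=lambda s: s.update(inp=s["out"], out=None),
)
```

Firing `R1` on a state with a pending message moves it; firing it again fails the enableness assertion because the in-channel is now occupied.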

Page 47

Refinement Extensions to Muv (IV)
Joint variables equivalence when inactive
For each joint variable v: when all transactions that write to v are inactive, v must be equivalent in Impl and Spec
transaction T1 …
transaction T2 …
assert inactive(T1) & inactive(T2) => v = v’;
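
The assertion is an implication over the activity of every transaction that writes the joint variable. A minimal Python sketch, with a hypothetical helper name and made-up trace values, could look like this:

```python
def joint_var_ok(writers_active, impl_value, spec_value):
    """inactive(T1) & ... & inactive(Tn) => impl_value == spec_value."""
    all_inactive = not any(writers_active)
    return (not all_inactive) or impl_value == spec_value

# Hypothetical trace: (activity of T1 and T2, Impl value, Spec value) per cycle.
trace = [
    ([False, False], 0, 0),   # idle: values must agree
    ([True, False], 0, 1),    # T1 in flight: a mismatch is tolerated
    ([False, False], 1, 1),   # T1 finished: values agree again
]
trace_ok = all(joint_var_ok(a, i, s) for a, i, s in trace)
```

A mismatch is flagged only in cycles where no writing transaction is active.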

Page 48

The Cache Coherence Protocol Benchmark
S. German and G. Janssen, IBM Research Tech Report 2006
[Figure: two nodes connected through a Router; each node is drawn with Local, Home and Remote units, Dir, Cache and Mem, and three Bufs]

Home

Page 49: Scaling Formal Methods Toward Hierarchical  Protocols in Shared Memory Processors

49

Details of the Cache Example

Hardware Murphi model ~2500 LOC

15 transactionsets

Generated VHDL ~1000 assertions, of which ~800 are write-write

conflicts check assertions

Took ~16min with SixthSense for all assertions

Took ~13min w/o write-write conflicts check

Page 50

Bugs Found with Refinement Check
Benchmark satisfies cache coherence already
Bugs still found
  Bug 1: router unit loses messages
  Bug 2: home unit replies twice for one request
  Bug 3: cache unit gets updated twice from 1 reply
Refinement check is an automatic way of constructing such checks

Page 51

Model Checking Approaches
Monolithic: straightforward property check
Compositional: divide and conquer

Page 52

Compositional Refinement Check
Reduce the verification complexity
Basic Techniques
  Abstraction: removing details to make verification easier
  Assume guarantee: a simple form of induction which introduces assumptions and justifies them

Page 53

Experimental Results
Configurations: 2 nodes, 2 addresses, SixthSense
[Chart: verification time (axis marks: 30 min, 1 day) against datapath width (1-bit, 10-bit) for the monolithic approach and the compositional approach]

Page 54

A Simple 2-Stage Pipelined Stack
[Figure panels: pipelined pushes, pipelined pops, overlapped pop & push]
Push: increase counter + push data
Pop: decrease counter + pop data

Page 55

Future Work
Muv-like refinement check for interaction modules
  RTL modules interacting via communication protocols
  Interfaces involving buffers and pipelining
Refinement of initial RTL protocols
  Power-down issues
  Post-silicon validation support
  Runtime verification support
Safe augmentation of verified protocols
  Cheap re-verification