ila-mcm: integrating memory consistency models with ...evaluation on real-world socs designs: first,...

10
ILA-MCM: Integrating Memory Consistency Models with Instruction-Level Abstractions for Heterogeneous System-on-Chip Verification Hongce Zhang, Caroline Trippel, Yatin A. Manerkar, Aarti Gupta, Margaret Martonosi, Sharad Malik Princeton University, USA Abstract—Modern Systems-on-Chip (SoCs) integrate hetero- geneous compute elements ranging from non-programmable specialized accelerators to programmable CPUs and GPUs. To ensure correct system behavior, SoC verification techniques must account for inter-component interactions through shared memory, which necessitates reasoning about memory consistency models (MCMs) This paper presents ILA-MCM, a symbolic reasoning framework for automated SoC verification, where MCMs are integrated with Instruction-Level Abstractions (ILAs) that have been recently proposed to model architecture-level program-visible states and state updates in heterogeneous SoC components. ILA-MCM enables reasoning about system-wide properties that depend on functional state updates as well as ordering relations between them. Central to our approach is a novel facet abstraction, where a single program-visible variable is associated with potentially multiple facets that act as auxiliary state vari- ables. Facets are updated by ILA “instructions,” and the required orderings between these updates are captured by MCM axioms. Thus, facets provide a symbolic constraint-based integration between operational ILA models and axiomatic MCM specifica- tions. We have implemented a prototype ILA-MCM framework and use it to demonstrate two verification applications in this paper: (a) finding a known bug in an accelerator-based SoC, plus a new potential bug under a weaker MCM, and (b) checking that a recently proposed low-level GPU hardware implementation is correct with respect to a high-level ILA-MCM specification. I. I NTRODUCTION Systems-on-Chip (SoCs) integrate specialized hardware to meet the power-performance requirements posed by emerging applications. Specialized hardware can be programmable (e.g., Graphics Processing Units or GPUs) or non-programmable (e.g., an AES cryptographic accelerator). They outperform general purpose processors in specific domains like machine learning [1], scientific computation [2], and cryptographic op- erations [3]. The multiple processing units in an SoC typically run concurrently. This concurrency can be difficult to reason about, leading to design and implementation bugs in functional correctness as well as security. Furthermore, when SoC components interact via shared memory or memory-mapped input and output (MMIO), one also needs to reason about memory consistency models (MCMs). Although programmers generally find it easier to think about concurrent code with sequentially consistent (SC) ordering semantics, modern instruction set architectures (ISAs) have weaker MCMs in an effort to achieve better performance and scalability. This work was supported in part by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA. This work was supported in part by the National Science Foundation, XPS Program, Grant No. 1628926. Previous MCM verification efforts have focused on modeling and analyzing MCMs at different levels of the software/hardware stack in parallel systems [4–11]. These approaches typically use small parallel programs, called litmus tests, for reasoning about the MCMs themselves. They focus on ordering relations between simple instructions, rather than on symbolic reasoning of complex control and data flow in programs, which is often needed in SoC verification. Moreover, none of these efforts consider non-programmable hardware accelerators, which may not have an ISA. Recently, an instruction-centric operational model for het- erogeneous SoC components has been proposed, called an Instruction-Level Abstraction (ILA) [12]. Analogous to a pro- cessor ISA, an ILA models a hardware component’s program- visible states and their updates in the form of instructions. This provides a well-defined interface between sequential software and the underlying hardware component. For an accelerator, its ILA instructions correspond to commands at its interface. ILAs have been successfully generated (using semi-automated synthesis-based techniques) for many accelerators in prac- tice [12–14]. In the rest of this paper, we use “instructions” to denote ILA instructions, which correspond to instructions in a processor ISA or to derived instructions for an accelerator. An ILA can uniformly model rich instruction semantics (i.e., including control and data flow) of a single processing unit, e.g., a processor or an accelerator. Although existing MCM specifications and verifiers are well-suited for representing orderings between memory operations of multiple processing units, they lack such rich instruction models. We show that for general SoC verification, it is essential to reason about both rich instructions in heterogeneous components and memory orderings between them. In this paper, we address this central challenge by proposing a general symbolic framework called ILA-MCM, shown in Figure 1. In this framework, each processing unit in an SoC, such as a programmable processor or an accelerator, is uniformly represented by an ILA. The MCM is described using axioms, as in previous efforts [4–11], but is integrated with the ILA operational models. This enables our ILA- MCM framework to reason about functional state updates in instructions as well as the effects of MCMs, thereby supporting expressive properties involving both states and orderings for SoC verification. A novel feature of our ILA-MCM framework is the facet abstraction, where a single program variable in an instruction can be associated with multiple auxiliary state variables called facets in the verification model. Facets are useful for modeling

Upload: others

Post on 20-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

ILA-MCM: Integrating Memory Consistency Models withInstruction-Level Abstractions for Heterogeneous System-on-Chip Verification

Hongce Zhang, Caroline Trippel, Yatin A. Manerkar, Aarti Gupta, Margaret Martonosi, Sharad MalikPrinceton University, USA

Abstract—Modern Systems-on-Chip (SoCs) integrate hetero-geneous compute elements ranging from non-programmablespecialized accelerators to programmable CPUs and GPUs. Toensure correct system behavior, SoC verification techniquesmust account for inter-component interactions through sharedmemory, which necessitates reasoning about memory consistencymodels (MCMs) This paper presents ILA-MCM, a symbolicreasoning framework for automated SoC verification, whereMCMs are integrated with Instruction-Level Abstractions (ILAs)that have been recently proposed to model architecture-levelprogram-visible states and state updates in heterogeneous SoCcomponents.

ILA-MCM enables reasoning about system-wide propertiesthat depend on functional state updates as well as orderingrelations between them. Central to our approach is a novel facetabstraction, where a single program-visible variable is associatedwith potentially multiple facets that act as auxiliary state vari-ables. Facets are updated by ILA “instructions,” and the requiredorderings between these updates are captured by MCM axioms.Thus, facets provide a symbolic constraint-based integrationbetween operational ILA models and axiomatic MCM specifica-tions. We have implemented a prototype ILA-MCM frameworkand use it to demonstrate two verification applications in thispaper: (a) finding a known bug in an accelerator-based SoC, plusa new potential bug under a weaker MCM, and (b) checking thata recently proposed low-level GPU hardware implementation iscorrect with respect to a high-level ILA-MCM specification.

I. INTRODUCTION

Systems-on-Chip (SoCs) integrate specialized hardware tomeet the power-performance requirements posed by emergingapplications. Specialized hardware can be programmable (e.g.,Graphics Processing Units or GPUs) or non-programmable(e.g., an AES cryptographic accelerator). They outperformgeneral purpose processors in specific domains like machinelearning [1], scientific computation [2], and cryptographic op-erations [3]. The multiple processing units in an SoC typicallyrun concurrently. This concurrency can be difficult to reasonabout, leading to design and implementation bugs in functionalcorrectness as well as security. Furthermore, when SoCcomponents interact via shared memory or memory-mappedinput and output (MMIO), one also needs to reason aboutmemory consistency models (MCMs). Although programmersgenerally find it easier to think about concurrent code withsequentially consistent (SC) ordering semantics, moderninstruction set architectures (ISAs) have weaker MCMs in aneffort to achieve better performance and scalability.

This work was supported in part by the Applications Driving Architectures(ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.

This work was supported in part by the National Science Foundation, XPSProgram, Grant No. 1628926.

Previous MCM verification efforts have focused onmodeling and analyzing MCMs at different levels of thesoftware/hardware stack in parallel systems [4–11]. Theseapproaches typically use small parallel programs, called litmustests, for reasoning about the MCMs themselves. They focuson ordering relations between simple instructions, ratherthan on symbolic reasoning of complex control and dataflow in programs, which is often needed in SoC verification.Moreover, none of these efforts consider non-programmablehardware accelerators, which may not have an ISA.

Recently, an instruction-centric operational model for het-erogeneous SoC components has been proposed, called anInstruction-Level Abstraction (ILA) [12]. Analogous to a pro-cessor ISA, an ILA models a hardware component’s program-visible states and their updates in the form of instructions. Thisprovides a well-defined interface between sequential softwareand the underlying hardware component. For an accelerator,its ILA instructions correspond to commands at its interface.ILAs have been successfully generated (using semi-automatedsynthesis-based techniques) for many accelerators in prac-tice [12–14]. In the rest of this paper, we use “instructions” todenote ILA instructions, which correspond to instructions ina processor ISA or to derived instructions for an accelerator.

An ILA can uniformly model rich instruction semantics(i.e., including control and data flow) of a single processingunit, e.g., a processor or an accelerator. Although existingMCM specifications and verifiers are well-suited forrepresenting orderings between memory operations of multipleprocessing units, they lack such rich instruction models. Weshow that for general SoC verification, it is essential to reasonabout both rich instructions in heterogeneous components andmemory orderings between them.

In this paper, we address this central challenge by proposinga general symbolic framework called ILA-MCM, shown inFigure 1. In this framework, each processing unit in anSoC, such as a programmable processor or an accelerator,is uniformly represented by an ILA. The MCM is describedusing axioms, as in previous efforts [4–11], but is integratedwith the ILA operational models. This enables our ILA-MCM framework to reason about functional state updates ininstructions as well as the effects of MCMs, thereby supportingexpressive properties involving both states and orderings forSoC verification.

A novel feature of our ILA-MCM framework is the facetabstraction, where a single program variable in an instructioncan be associated with multiple auxiliary state variables calledfacets in the verification model. Facets are useful for modeling

Page 2: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

load

add

jeq

load

add

jeq

...

read

compute

write

read

compute

write

MCM Axiomsfor memory

ordering

...

ILA

Model Constraints

Bounded program sketch

Bounded program sketch

Property Constraints

SMT Solver

ILA(operationalmodel)

(operationalmodel)

Facetsfor integration

accelerator

load

add

jeq

load

add

jeq

...

read

compute

write

read

compute

write

MCM Axiomsfor memory

ordering

...

ILA

Model Constraints

Bounded program sketch

Bounded program sketch

Property Constraints

SMT Solver

ILA(operationalmodel)

(operationalmodel)

Facetsfor integration

processor

load

add

jeq

load

add

jeq

...

read

compute

write

read

compute

write

MCM Axiomsfor memory

ordering

...

ILA

Model Constraints

Bounded program sketch

Bounded program sketch

Property Constraints

SMT Solver

ILA

Facetsfor integrationA processor

An accelerator

(operationalmodel)

(operationalmodel)

accelerator

load

add

jeq

load

add

jeq

...

read

compute

write

read

compute

write

MCM Axiomsfor memory

ordering

...

ILA

Model Constraints

Bounded program sketch

Bounded program sketch

Property Constraints

SMT Solver

ILA(operationalmodel)

(operationalmodel)

Facetsfor integration

processor

Fig. 1. The ILA-MCM Framework for heterogeneous SoC verification

memory subsystems and consistency effects, where differentobservers in an SoC may see logically distinct values of thesame program-visible variable. The allowed values of facetsare constrained by the operational semantics of the instructionsas well as the memory consistency axioms. Thus, facets forma critical link between operational ILA models and axiomaticMCM specifications.

Another feature is that our verification procedure supportsboth operational and axiomatic models in general. (For ex-ample, our second application uses a low-level operationalmodel for memory consistency.) The executions of operationalmodels (e.g., ILAs) are based on a program sketch [15]1,which depends on the property to be verified. This createssymbolic trace events (events, in short). Each event is guardedby a condition and updates the state in an ILA or a facet.The axioms are then instantiated, which may create additionalevents or impose happens-before [16] ordering relations be-tween events. We refer to these sets of constraints as the modelconstraints. Finally, we add property constraints that refer tostates and ordering requirements for verification.

We use standard theories in first order logic to capture allconstraints, including the semantics of instructions in a pro-gram and happens-before ordering relations between events.The formula comprising all constraints is checked by a Sat-isfiability Modulo Theory (SMT) solver [17]. Our frameworksupports diverse verification tasks formulated as SMT queries,including finding bugs (via falsification) or proving correctness(via verification condition generation). We have implementeda prototype ILA-MCM framework and demonstrate its use intwo challenging SoC verification applications in this paper.

To summarize, this paper makes the following contributions:

• ILA-MCM framework: We propose a framework thatcombines operational models for processing cores (in-cluding accelerators) with axiomatic memory consistencymodels to enable SMT-based reasoning of complex inter-actions between hardware, software, and memory subsys-tems in heterogeneous SoCs.

• Facet abstraction: We propose the facet abstraction,where a single program-visible state variable can be

1Similar to automated program synthesis, the “holes” in our program sketchare filled in by a solver.

associated with multiple logically-distinct variables, torepresent updates on program-visible states with memoryconsistency effects. The facets provide the basis for aconstraint-based integration of ILAs with MCMs.

• Evaluation on real-world SoCs designs: First, we showan application of the ILA-MCM framework for findingsecurity bugs in SoC firmware [18], where our support forexpressive properties enables finding a malicious exploitfrom a program sketch. Second, we show an applicationfor checking correctness of a low-level GPU hardwareimplementation [19] against a high-level ILA-MCM spec-ification, where our instruction-centric approach enablesits decomposition into simpler verification tasks.

An overview of various components in the ILA-MCMframework is shown in Figure 2, annotated by the section num-bers that describe these components. We start by introducingthe relevant background on ILAs and MCMs.

II. BACKGROUND

A. Instruction-Level Abstraction (ILA)

An ILA is a uniform abstraction for hardware acceleratorsas well as general-purpose/specialized programmable proces-sors [12]. It is an operational model that captures updatesby hardware to program-visible states (i.e., the states that areaccessible or observable via a user-facing program instruction).It can be viewed as a generalization of the processor ISA in theheterogeneous context, where the instructions for acceleratorsare defined as the commands on their interface that updateprogram-visible states. In an ILA, each instruction has adecode condition, and the instruction executes only when thiscondition is true. An ILA also supports hierarchy, where aninstruction at a high level can be represented as a sequenceof child instructions at a lower level, as shown in Figure 2for Instr A of ILA1 (under the “ILAs” column). Thus,the granularity of ILA instructions can vary, ranging fromprocessor instructions to software functions. Furthermore, anILA is used for modeling a sequential thread of control, whileparallelism is modeled using multiple such threads.

B. Memory Consistency Model (MCM)

An MCM provides a specification to a programmer of theorder in which memory operations appear to execute [20].Sequential consistency (SC), defined by Lamport [21],specifies that: (1) memory accesses preserve the order withineach thread of a program, and (2) across threads, there is anorder of accesses that every observer agrees upon. Despite theintuition of SC, nearly all modern ISAs adopt MCMs weakerthan SC. A weak MCM allows certain memory accesses tobe reordered within a program, and supplies fences or othersynchronization mechanisms to enforce required orders whennecessary. For example, the Total Store Order (TSO) modelallows a load to be reordered with earlier stores that access adifferent address to allow the store-buffer optimization [22].

Figure 3 illustrates the effects of MCMs on a small multi-threaded program with a proposed outcome, called a litmustest. In this litmus test, each thread executes a store (st)

Page 3: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

(a) ILA+MCM Framework (Components)

Program Sketch (𝑃) ILAs (𝐼)

§ III-A

Instr 1 : op ?,?

Instr 2 : store M[1000], ?

Instr 3 : load r?, M[?]

States: S

Instr A Instr Z…

Child-instr 1

Child-instr 2

Child-instr n

hierarchy (optional)

Facets (𝐹)

Facets: F variable.agentFacet events: Instr.wfe.<attr>

Instr.rfe.<attr>

Write-facet event:Instr.wfe.<attr>

Read-facet event:Instr.rfe.<attr>

Axioms (𝐴)

Ordering relations

e.g. rf, fr,co, ppo

Facet-Axioms

Property (𝜙)

𝜙 𝑆, ��, 𝑅

(b) Illustrated ExampleProc, Device, CE

……

SetLock r?…

…P1 P2 P3

Processor Instruction: SetLock @t1 SetLock.wfe.local @t2

SetLock.wfe.global @t3

Verification Procedure

AxiomTSO_WriteFacetOrder

adds the blueHB relation(See also Fig. 5)

𝑆: states��: facets𝑅: orderings

§ II-A § III-B § III-C

lock := r[?] § III-D

lock.proc := r[?]

lock.dev := lock.proclock.CE := lock.proc

happens-before relation (HB) HB

§ II-B

P2 for ILA2

P1 for ILA1

ILA1:

Fig. 2. Components of the ILA-MCM framework, with example fragments.

st [x], 1

st [y], 1

ld r1, [y]

setp.eq p1, r1, 0

@p1 ld r2, [x]

ppo data

rffr

ctrl

ppo

T2:T1:

st [y], 1

ld r2, [x]

st [x], 1

ld r1, [y]ppo ppo

fr fr

T2:T1:

st [r2], 1

ld r1, [r3]

(b)

r2==r3 : forbiddenr2!=r3 : permitted

Reorder

Instruction Toggle: lock ≔ ¬ 𝑙𝑜𝑐𝑘

lock. proc ≔ ¬ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

lock. dev ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐lock. CE ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

Write-facet-events(wfevt)

(b) Instruction and facet-events

Toggle.wfe.local

Toggle.wfe.global

Processor

Device

lock

CE

lock.dev

lock.CE

lock.proc

(a) Agents and facets of lock

Axiom store_DV_N_WG_RELforall s1:store_DV_N | forall s2:STORE (not s1) |( HB[s2.rfe.WG, s1.rfe.WG]

/\ SameWg[s1,s2] ) => HB[s2.rfe.DV, s1.rfe.DV]

STORE

storeDV,N

Sets of possible environmental

transitions

(a) (b)

Abstract Transition

Abstract Transition

Forbidden under SCr1 == 0 ∧ r2 == 0:

st [y], 1

ld r2, [x]

st [x], 1

ld r1, [y] fr fr

T2:T1:

r1 == 0 ∧ r2 == 0:Permitted under TSO

(a) r1 == 0 ∧ r2 == 0:Observable under TSO

Reordering allowed in TSOif r2!=r3

INVL1 WG @1

FETCHL1 [18] @3LD r1 [18] @4

FETCHL1 [19] @2

Abstract Transition

Abstract Transition

LD r2 [19] @6

loadDV,N r1, [18]

loadna r2, [19]

Environmental Transitions

Problem:Later WG-read-facet can happen before

the earlier one

(b)(a)

Fig. 3. A forbidden outcome in SC can become permitted under a weakerMCM. Arrows show the ordering relations (in blue) between instructions.

and then a load (ld) instruction, where all memory locationsand registers are initially 0. Figure 3(a) assumes the SCMCM, and thus forbids a program outcome where both loadinstructions return 0. This is evident in a cycle of edges thatcomprise the preserved program order between the store andload instructions (shown as ppo edges) and the order betweenthe read in one thread and the write in the other (shown byfrom-read (fr) edges). In contrast, under TSO (Figure 3(b)),the ppo edges are removed (since a read can be reordered withan earlier write), so the proposed outcome is permitted sincethere is no cycle. In general, MCMs also consider the co edge(coherence order between writes to the same address) and therf edge (reads-from order from a write to a load which readsfrom that value).

C. Gaps in Prior Work

Despite a rich history of prior work in MCM verification,they lack some key capabilities described below.Symbolic Reasoning with Conditional Orderings. Our maingoal is to support general verification of SoC software andhardware. However, most prior efforts in MCM verifica-tion rely upon an explicit enumeration over addresses, data,and conditional predicates that may affect orderings betweenmemory operations. Specifically, we consider the followingtwo types of conditional orderings: ¶ relations involvingpredicated instructions or instructions after branches, and ·relations involving address/data-dependent values.

For example, Figure 4(a) shows ¶, with a predicate p1 onthe last load instruction in thread T2. Note that the existenceof the load event and the related fr edge (shown as a dashed

st [x], 1

st [y], 1

ld r1, [y]

setp.eq p1, r1, 0

@p1 ld r2, [x]

ppo data

rffr

ctrl

ppo

T2:T1:

st [y], 1

ld r2, [x]

st [x], 1

ld r1, [y]ppo ppo

fr fr

T2:T1:

st [r2], 1

ld r1, [r3]

(b)

r2==r3 : forbiddenr2!=r3 : permitted

Reorder

Instruction Toggle: lock ≔ ¬ 𝑙𝑜𝑐𝑘

lock. proc ≔ ¬ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

lock. dev ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐lock. CE ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

Write-facet-events(wfevt)

(b) Instruction and facet-events

Toggle.wfe.local

Toggle.wfe.global

Processor

Device

lock

CE

lock.dev

lock.CE

lock.proc

(a) Agents and facets of lock

Axiom store_DV_N_WG_RELforall s1:store_DV_N | forall s2:STORE (not s1) |( HB[s2.rfe.WG, s1.rfe.WG]

/\ SameWg[s1,s2] ) => HB[s2.rfe.DV, s1.rfe.DV]

STORE

storeDV,N

Sets of possible environmental

transitions

(a) (b)

Abstract Transition

Abstract Transition

Forbidden under SCr1 == 0 ∧ r2 == 0:

st [y], 1

ld r2, [x]

st [x], 1

ld r1, [y] fr fr

T2:T1:

r1 == 0 ∧ r2 == 0:Permitted under TSO

(a) r1 == 0 ∧ r2 == 0:Observable under TSO

Reordering allowed in TSOif r2!=r3

INVL1 WG @1

FETCHL1 [18] @3LD r1 [18] @4

FETCHL1 [19] @2

Abstract Transition

Abstract Transition

LD r2 [19] @6

loadDV,N r1, [18]

loadna r2, [19]

Environmental Transitions

Problem:Later WG-read-facet can happen before

the earlier one

(b)(a)

Fig. 4. Examples of conditional orderings. (a) @p1 ld executes iff p1 istrue, i.e., iff r1==0 (setting/using predicate p1 is marked in red). (b) InTSO, store-to-load reordering is allowed if the addresses are different.

arrow) are control-dependent. If this control-dependency isignored, the analysis will incorrectly deduce that the graphis cyclic, i.e., the outcome is unobservable. Figure 4(b) showsan example for case ·, where reordering is allowed only whenthe addresses in registers r2 and r3 are different.

In prior MCM efforts based on relational models, e.g.,using Alloy [5] or Check tools [7–11], the addresses anddata are modeled by relational predicates, e.g., whether twoaddresses are the same. However, such relations have tobe pre-specified and are not explored symbolically in thesolver. Similarly, Herd uses enumeration over all possiblevalues of relevant addresses/data. In contrast, ILA-MCM usessymbolic reasoning to represent ordering relations dependenton complex contro/data flow and avoids explicit enumeration.Rich Instruction-Centric Models. Most previous efforts inMCM verification focus on ordering relations between in-structions, rather than on updates of program-visible states.For example, arithmetic instructions are abstracted away inrelational models [5]. In Herd [4], the instructions are hard-coded and do not model bit-precise hardware (e.g., there is noregister overflow behavior). Our goal is to support SoC veri-fication by modeling rich instruction semantics for processorsas well as non-programmable accelerators, which is requiredfor reasoning about general (not just litmus) programs.Expressive Properties. MCM verification has typically fo-cused on specifying orderings and litmus tests, while pro-gram/processor verification has focused on state-based veri-

Page 4: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

fication or control-oriented properties. We aim to support SoCverification using a wide range of expressive properties thatcan refer to both states and orderings.

III. ILA-MCM FRAMEWORK

We now provide details of the main components of ourILA-MCM framework (shown in Figure 2): program sketches,facets, axioms, and verification procedure.

A. Program Sketch

We leverage existing work on programming bysketches [23, 24] to synthesize a program that wouldexercise a bug or abstractly capture unbounded executions.Our program sketch comprises: (1) a set of partially-specifiedstate updates in instructions (and any child instructions), and(2) a partial order on them. Holes (shown as question marksin Figure 2) are allowed in the sketch. These are filled inby the SMT solver during verification. Examples of holesinclude symbolic values (e.g., content of a memory location)or fields in an instruction encoding (e.g., address/data fieldof the store and load in Figure 2).

The program sketch, which needs to be provided by theuser, typically depends on the correctness property. Althougha program sketch has a bounded number of instructions, onecan use an outer procedure to iteratively increase the bound, toperform a deeper search for bugs or for a proof by inductionusing given invariants. In the first column in Figure 2(b), theexample considers an SoC with a processor, a device, anda cryptographic engine (CE). Thus, there are three programsketches (P1, P2, P3) and a SetLock instruction is illustratedin the program sketch (P1) for the processor. The secondcolumn (under ILA) shows the related event, which updatesthe lock variable by the value of some register (left as a holer?) and associates a symbolic timestamp t1 with the event.

B. The Facet Abstraction

To reason about the interactions between SoC componentsvia shared memory, we need to establish a relation betweenprogram variables in instructions of different ILA models viaaxioms in MCMs. We model this using a novel abstractiondescribed below.

1) State Variables for Facets: Facets are auxiliary variablesassociated with a shared program-visible state variable that canbe observed by an “agent,” which may be a thread, a physicalstructure or a processing core/accelerator, depending on theILA modeling granularity. Facets reflect the fact that differentagents may observe distinct values of the same shared variablein different orders. For example, the store-to-load reorderingin TSO can result in the load seeing the new value from thestore earlier than instructions on another thread. In general,each agent can potentially have its own facet for a sharedvariable. In our experience, this per-agent-facet is generalenough to model weak consistency behaviors. (More facets canbe added if one wishes to model memory consistency at themicroarchitecture level, e.g., with store-buffers or caches, etc.)

We use the notation variable.agent for the facet thatcorresponds to a specific agent’s view of a given programvariable. For the example considered in Figure 2(b), supposethere is an on-chip interconnect between the three components,and that there is a register in the device denoting a lock.The device observes its value by directly reading the register,which is regarded as the facet of the device (denoted lock.dev).The device provides a memory-mapped interface, where otheragents can access the lock register as if accessing a memorylocation. We model the lock register seen by the other agentsas facets, denoted lock.proc and lock.CE, respectively.

2) State Updates for Facets: Continuing with our example,the ILA instruction SetLock on the processor can update thelock by writing to the memory-mapped address of the lockregister in the device. The new value may first appear in theprocessor’s local buffer, then go into a cache, and through theinterconnect, propagate to the device and finally update thedevice’s register. This could result in different agents seeingdifferent values in different orders. We model this by creatingnew events: write-facet events to update facets, and read-facetevents to read facets.

For example, TSO can be modeled such that each agenthas a facet for a shared program variable. A store instructioncreates two write-facet events, one to its own facet (localwrite-facet event) and the other to all other facets (globalwrite-facet event). A load instruction corresponds to one read-facet event, since it only needs to read from its own facet.In general, any instructions or child-instructions accessingshared variables can have associated facet events. The valuesthat facet read/write events use for updates are derived fromthe ILA instruction semantics, while the orderings of facetread/write events are specified by the facet-axioms in theMCM. We use the notation instr.wfe/rfe.<attr> to refer tothe write-facet events (wfe) or read-facet events (rfe), relatedto a given instruction (instr), with a given attribute <attr>. Inthe TSO model, <attr> can be local or global. The examplein Figure 2(b) shows two write-facet events (under Facets)related to the SetLock instruction under the TSO model.

C. Facet-Axioms for Integrating ILAs and MCMs

So far, we have described facets as state variables, and newfacet events associated with ILA instructions that update orread them. The orderings between these events are specifiedby MCM axioms. For SC and TSO, the complete set of facet-axioms can be found in the Appendix. We highlight somefragments of these in Figure 5. Note that we uniformly usehappens-before relations (denoted as HB) to specify order-ings between events. In the SC model (top part), all facetread/write events are synchronous (i.e., these events occurat the same time) with the instructions (lines 1-2). In theTSO model (lower part), the two write-facet events (localor global) of a store instruction happen after the instructionand follow the program order (lines 3-9). These axioms aresimilar to those used in prior work, e.g., in the µspec TSOmodel [7], except that facet-axioms relate instructions withfacet read/write events, while µspec axioms relate instructions

Page 5: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

1: Axiom SC_WriteFacetOrder 2: forall w:WRITE | Sync[ w , w.wfe.global ] … 3: Axiom TSO_WriteFacetOrder 4: forall w:WRITE | HB[ w , w.wfe.local ] /\ 5: HB[ w.wfe.local , w.wfe.global ] 6: Axiom TSO_Store 7: forall w1:WRITE | forall w2: WRITE (not w1) | 8: PO[ w1, w2 ] => HB[ w1.wfe.local, w2.wfe.local] 9: /\ HB[ w1.wfe.global, w2.wfe.global] … 10: Axiom RF_CO_FR 11: forall r:READ | exists w:WRITE | 12: SameAddress[w,r] /\ SameData[w,r]/\ w.decode /\RF[w,r]/\( 13: forall w2:WRITE (not w) | ( SameAddress[w,w2] /\ 14: w2.decode )=> CO[w2,w] \/ FR[r,w2] ) 15: Define RF[ w, r] := … 16: Define CO[w1,w2] := … 17: Define FR[ r, w] := … Fig. 5. SC and TSO axioms (fragments)

with microarchitectural structures like pipeline stages andcaches. Further, axioms for other MCMs can be similarlydefined. We have designed these axioms by hand (similar toprior MCM work); addressing their correctness is beyond thescope of this work.

The main highlight of the facet-axioms is that the relationsover facet events in the MCM are linked with control/dataflow in the ILA instructions via predicates interpreted overILA state variables and facets. Consider the RF CO FR axiom(lines 10-14), which states that: (a) all read events shouldread from some executed write event with the same address,and the data values of read and write should match, (b) ifa read r reads from a write w, any other executed write w2

should not interfere. Here, the predicates SameAddress andSameData are interpreted over ILA state variables and facets.Similarly, the symbolic decode condition of an instruction(denoted instruction.decode) is a predicate over ILA statevariables. Note also that the definitions of rf, fr, and co edgesare based on the happens-before relation over facet-events.

D. ILA-MCM Verification ProcedureOur verification procedure is shown in Algorithm 1. Among

its inputs, the first is a program sketch P (T,R), where T isa set of instances2 of partially-specified (child-) instructions,and R is a partial order. Other inputs are a set of ILAs I ,the axioms A, and a property φ. For each possible instructioninstance, the algorithm creates a trace step (simply called step)using the instruction semantics3 (line 5). We also associatea symbolic timestamp with the step, encoded as an integer(ta for step a). Values of timestamps only reflect relativeorderings. Recall that the instructions/child-instructions maylead to facet read/write events, and steps are also created forthese events (lines 6-8). Next, any happens-before orderingsin the program sketch are interpreted as a less-than relationon the associated timestamps (line 10). Then, we instantiatethe quantifiers and interpret the predicates in the axioms over

2Multiple occurrences of the same (child-) instruction are regarded asseparate instances in a trace.

3Although not shown here, we use a concurrent static single assignment(CSSA) encoding [25, 26], where uses of shared state variables are encodedas π-variables and updates to them are encoded as new definitions.

Algorithm 1 ILA-MCM Verification Procedure1: procedure VERIFY(P (T,R), I, A, φ)2: . P (T,R): program sketch P , where T is a set of instances

of (child-) instructions and R is a partial order, I: set ofILAs, A: axioms, φ: property

3: C ← > . C is set of constraints4: for each ts ∈ T do5: C ← C ∧ CreateStep(ts, I)6: T ′ ← AssocFacetEvent(ts, A) . Get facet-events7: for each ts′ ∈ T ′ do8: C ← C ∧ CreateStep(ts′, I)9: for each a→ b ∈ R do

10: C ← C ∧ ta < tb . Orders are on timestamps11: C ← C ∧ InstantiateAxioms(A)12: C ← C ∧ ¬φ13: if SMTCheck(C) = SAT then14: return INVALID, GetModel(C)15: else return VALID

the set of steps (line 11), and add the negation of the property(line 12). Finally, the set of constraints is checked by an SMTsolver. (Our prototype uses Z3 [27].) If the constraints aresatisfiable, we get a counterexample in the form of an eventtrace; otherwise, the property is valid within the space allowedby the program sketch. To verify unbounded correctness,we can check whether given invariants are inductive anduse abstractions to model nondeterministic environments, asdiscussed later in Section IV-B.

IV. VERIFICATION APPLICATIONS

A. Security Bug in a Firmware Load Protocol

1) System Overview: The SoC [18] used in this applicationconsists of a processor, a device, and a cryptographicaccelerator engine (CE). The processor runs a driver thatloads a firmware image onto the device. The CE is responsiblefor authenticating the image before it can be used by thedevice. The SoC has a system memory (SM) that all threeagents can access, and an isolated memory (IM) that can onlybe written by the device but is readable by both the deviceand the CE. The threat model assumes that the driver on theprocessor can be compromised. The attacker’s goal is to foolthe device into running a malicious firmware image that doesnot carry a correct signature.

2) ILAs and Instructions: The first step is to construct anILA for each of the agents: the processor, the device, andthe CE. The set of instructions and child instructions areshown in Figure 6(a) (along with a legend). The processoruses store operations to send commands to the memory-mapped device or the accelerator interface, and can query thestatus via reading through this interface. The ILA instructionsin the processor (device driver) are Send Command Reset,Store Firmware, or Send Command Load. The processoralso has a Receive Report instruction that, when trig-gered by an interrupt, reads from the device’s status reg-ister to learn the result of firmware image authentication.The device ILA has three instructions: Reset, Load andHandle CE Response. The CE ILA has only one instruction(Authentication), which handles the authentication request.

Page 6: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

No. (Child-) instructions in three ILAs1. Send Command: Reset2. Reset3. Store Firmware to SM4. Send Command: Load Firmware5. Load Firmware (child: 5a, 5b)5a. Copy Firmware from SM to IM5b. Send Authentication Request6. Authentication (child: 6a, 6b)6a. Verify Signature6b. Send Response7. Handle Response (child: 7a-7c)7a. Read Status Bit7b. Send Interrupt to Processor7c. If ( status == PASS )

PC := IMAGE_ADDR8. Receive ReportLock Write Lock := “LOCK”

(a) (b) (c) (d)

Design A and Design Bhave the same execution flow, except that Design Bhas a child instruction Lock in Step 6.

1. @1

Device CE

2. @23. @3

4. @4

5a. @6

5b. @7

6a. @9

6b. @13

7a. @17

7b. @18

7c. @19

5. @5

6. @8

7. @16

3. @7

4. @8

5a. @10

5b. @11

5. @9

Processor

1. @1→2

Device CE

2. @3

1. @11→12

3. @7→8

5a. @15→16

5b. @16→17

5. @10

4. @8→9

3. @12→16

4. @16→17

2. @17

5. @18

5a. @23→26

5b. @33→35

Lock. @22→24

6b. @34→36

6. @21

6a. @25

7a. @38

7b. @39

7c. @40

7. @37

Processor

1.

Processor Device CE

2.3.

4.

5a.

5b.

6a.

6b.

7a.

7b.

7c.

5.

6.

7.

8.

(Lock)

E. @T E represents an event, T is the timestamp

E. E represents an event

Gray boxes participate in writing the malicious image

E. @T1→T2 T1 is the time of local facet update, T2 is the time of global facet update

The lock operation added in Design B

Fig. 6. (a) Instructions/child-instructions in the ILAs, plus legend. (b) Intended execution flow for Designs A and B, where a dashed arrow indicates anagent triggering an instruction in another agent via instruction-decode conditions. (c) Malicious exploit for Design A under SC, with event timestamps (@T)generated by the SMT solver. (d) Malicious exploit for Design B under TSO, with timestamps for local/global facet updates also generated by the SMT solver.

The intended execution flow of these instructions is shownin Figure 6(b). First, the driver sends a Reset command to thedevice by writing into the command register and the deviceperforms reset (Step 1 and 2). The driver stores the firmwareimage in a dedicated region in SM (3) and invokes the device(4). Upon receiving the Load Firmware command (5), thedevice copies the firmware image into its IM (child-instruction5a) and sends an authentication request to the CE (5b). TheCE checks the signature of the image in IM (6a), stores theresult into its register and signals the device of its completion(6b). The device will read the verification result from the CE’saddress space (7a) and report the result to the driver (7b). Ifthe result indicates that the image is authenticated, the devicesets its own program counter to point to the firmware locationin IM and starts its execution from there (7c). Finally, theprocessor handles the interrupt and knows that the firmwareimage has been loaded (8).

We refer to the above implementation as Design A, whichis known to have a time-of-check to time-of-use (TOCTOU)vulnerability. Prior work originally identified and presented asolution to this vulnerability, namely Design B [18], where thedevice protects IM contents with a lock that is accessible onlyby the device and the CE. Once locked, the image stored inIM cannot be changed. Our ILA model for Design B is similarto Design A, except that the CE has an extra child-instructionLock in ILA instruction 6 which enables the lock.

3) Program Sketch: We created a program sketch basedon the instructions shown in Figure 6(a), where the solverexplores which instructions to include in the malicious exploitby creating a hole for the decode condition of each instruction.Further, the values and addresses of the stores by the driver

are left as holes in the program sketch.4) Facets and Axioms: In this application, we consider two

possible MCMs: SC and TSO. We use facets and axioms(shown in the Appendix) to model the MCMs.

5) The Property: The SoC should ensure the fol-lowing safety property φ: (DevPC = FwAddr) →Check(IM [FwAddr]) 6= FAIL. It says that when the device’sprogram counter points to the region holding the firmwareimage, the image should not be malicious. Our verificationprocedure aims to synthesize an exploit that violates thisproperty.

6) Results: Under the SC model, our verification proceduresuccessfully reproduced the known malicious exploit [18] forDesign A in 3.5 seconds, with a bound of 30 ILA instructions.The malicious exploit is shown in Figure 6(c), where thetimestamps (@T) found by the SMT solver are shown for eachevent. Note that the correct image is authenticated, but thefirmware overwrites it with a malicious image, which is thenexecuted. This is a TOCTOU vulnerability.

Design B is intended to fix the above issue and workscorrectly under the SC model. However, under the TSOmodel, our verification procedure found a malicious exploitin 6.5 seconds, with a 32-instruction bound. To the best ofour knowledge, this TSO-based vulnerability was not knownbefore. The resulting trace is shown in Figure 6(d), where theessential problem is around timestamp 22 to 24. Although theCE updates the device’s lock register at time 22, the devicedoes not see this update until later. As shown, at time 23, thedevice overwrites the firmware with a malicious image. Thisbug can be fixed by adding a fence on the CE to ensure that thedevice sees the lock before the CE proceeds to authenticate.

Page 7: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

B. Verifying Correctness of a GPU ImplementationGraphics Processing Units (GPUs) often use very weak

consistency models that allow for a large amount of bufferingand reordering of memory requests, to provide mitigationof high memory latency. An operational model of a GPUimplementation is discussed by Wickerson et al. [19]. The im-plementation is intended to be compliant with OpenCL [28] (avariant of the heterogeneous-race-free (HRF) MCM [29]), withan extension called remote scope promotion (RSP) proposedby AMD. Under OpenCL, all programs must be free of dataraces (i.e., two unsynchronized accesses to the same addresswith at least one write); the behavior is undefined otherwise.Synchronization can be achieved by an acquire-load readingfrom a release-store with or promoted to a matched scope.

We aim to verify that the given hardware implementationis correct with respect to a high-level specification modelthat we build in ILA-MCM. We should mention that ourspecification is actually more conservative than the language-level OpenCL+RSP model described by Wickerson et al. –developing an equivalent ILA-MCM model for the latter isleft to future work.

1) ILA-MCM Specification Model: This model comprisesthe functions of store, load, and atomic increment operations,plus the ordering relations they enforce. Each operation mayhave additional attributes that affect the ordering relations: (a)whether it is a release (for a store), an acquire (for a load),neither, or both, (b) the scope of the synchronization, and (c)whether it promotes the scope of a remote synchronization. Wemodel these operations using ILA instructions, where differentattributes lead to different instructions, e.g., store-relaxed andstore-release are modeled as two distinct instructions. Theyhave the same state updates, but the difference in their order-ings is captured by the associated MCM axioms.

The system has a hierarchical structure comprising Mdevices, each device with N workgroups, with a workgrouphaving L threads. For a shared program variable, each threadpossesses a facet, and additionally each workgroup (and eachdevice) also has a facet. A store instruction will first update thefacet of its own thread (TH-facet update), then the facet of itsworkgroup (WG-facet update) and the device facet (DV-facetupdate). A load instruction will have a TH-facet-read event(and potentially WG-facet-read and DV-facet-read events).

For each instruction, we use facet-axioms to model theenforced ordering requirements. For example, for the store-release (device scope, no remote promotion) instructionstoreDV,N , one of its axioms is shown in Figure 7(a). Itsays that for a storeDV,N instruction s1, for all the otherstore instructions s2 different from s1, if they are on the sameworkgroup and there is a happens-before relation on their WG-facet updates, then their DV-facet update events also followa happens-before relation. For each instruction, there can bemultiple axioms specifying its ordering relations with differenttypes of instructions under different conditions.

2) SoC Implementation: The implementation model, fromWickerson et al. [19], is fully operational (does not requirefacets or axioms). It contains a number of GPUs, where each

(a)

st [r2], 1

ld r1, [r3]

r1 == 1 /\ r2 == 0 Observable under TSO

(b)

r2==r3 : forbiddenr2!=r3 : permitted

Reorderst [x], 1

st [y], 1

ld r1, [y]

setp.eq p1, r1, 0

@p1 ld r2, [x]

ppo data

rffr

ctrl

ppo

T2:T1:

In TSO, this reordering is allowed if r2!=r3

Instruction Toggle: lock ≔ ¬ 𝑙𝑜𝑐𝑘

lock. proc ≔ ¬ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

lock. dev ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐lock. CE ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

Write-facet-events(wfevt)

(b) Instruction and facet-events

Toggle.wfe.local

Toggle.wfe.global

Processor

Device

lock

CE

lock.dev

lock.CE

lock.proc

(a) Agents and facets of lock

Axiom store_DV_N_WG_RELforall s1:store_DV_N | forall s2:STORE (not s1) |( HB[s2.rfe.WG, s1.rfe.WG]

/\ SameWg[s1,s2] ) => HB[s2.rfe.DV, s1.rfe.DV]

STORE

storeDV,N

Sets of possible environmental

transitions

(a) (b)

Abstract Transition

Abstract Transition

Fig. 7. (a) An axiom for instruction storeDV,N (b) related program sketch

GPU performs a series of operations to achieve the effect of aninstruction in the high-level specification. These operations aremodeled as child instructions, which make use of the physicallocks, FIFOs, and caches to guarantee correct data transfersand orderings.

We model 13 child instructions. Some examples are LD

(load from L1 cache to register), ST (store from register toL1 cache), FLUL1WG (flush the L1 cache in its workgroup),INVL1WG (invalidate L1 cache of its workgroup). Inside aGPU, there are also other environmental transitions, e.g., astore may later trigger a cacheline flush. We model thesestate changes by child instructions as well.

3) Verification: We verify correctness of the implementa-tion by checking that: (1) the program variables are updated tothe same values as in the specification, and (2) the ordering ofthe updates is correct. The first check corresponds to functionalequivalence checking between child-instructions on the GPUand the instructions in an ILA-MCM model, which can behandled using prior techniques [12]. Therefore, we focus hereon the second check, where we use our facet-axioms asproperties, and check if it is possible to synthesize a sequenceof child instructions whose execution can violate the property.To ensure correctness using bounded traces, we need to furtheruse invariants and abstractions.

We perform verification as follows. First we choose aninstruction from the ILA-MCM specification model, collectaxioms that refer to this instruction, and verify these axiomsone by one. Since our facets and axioms are all instruction-centric, this instruction-based decomposition of the overallverification problem is directly enabled by our ILA-MCMframework, thereby providing a potential scalability benefitin comparison to handling all axioms monolithically.

An axiom may refer to other related instructions. Forexample, in the axiom in Figure 7(a) for the storeDV,N

instruction, there is a reference to another store instruction(of any type). We build a program sketch accordingly, asshown in Figure 7(b) for this example. Here, each of the twowhite boxes (storeDV,N and STORE) denotes the sequence ofchild instructions that implement the high-level specificationinstruction, respectively. Since GPU operations may triggerenvironment transitions, we also add them in our programsketch. Finally, we add abstract transitions before and betweenthe two sequences of child instructions. An abstract transitionis allowed to update the state to any value (i.e., it is ahavoc operation), which is constrained subsequently by giveninvariants. The given invariants are checked separately on allchild instructions (some require checking on all pairs). Anexample invariant is that the tail of a FIFO never passes the

Page 8: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

st [x], 1

st [y], 1

ld r1, [y]

setp.eq p1, r1, 0

@p1 ld r2, [x]

ppo data

rffr

ctrl

ppo

T2:T1:

st [y], 1

ld r2, [x]

st [x], 1

ld r1, [y]ppo ppo

fr fr

T2:T1:

st [r2], 1

ld r1, [r3]

(b)

r2==r3 : forbiddenr2!=r3 : permitted

Reorder

Instruction Toggle: lock ≔ ¬ 𝑙𝑜𝑐𝑘

lock. proc ≔ ¬ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

lock. dev ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐lock. CE ≔ 𝑙𝑜𝑐𝑘. 𝑝𝑟𝑜𝑐

Write-facet-events(wfevt)

(b) Instruction and facet-events

Toggle.wfe.local

Toggle.wfe.global

Processor

Device

lock

CE

lock.dev

lock.CE

lock.proc

(a) Agents and facets of lock

Axiom store_DV_N_WG_RELforall s1:store_DV_N | forall s2:STORE (not s1) |( HB[s2.rfe.WG, s1.rfe.WG]

/\ SameWg[s1,s2] ) => HB[s2.rfe.DV, s1.rfe.DV]

STORE

storeDV,N

Sets of possible environmental

transitions

(a) (b)

Abstract Transition

Abstract Transition

(a) Forbidden under SCr1 == 0 ∧ r2 == 0:

st [y], 1

ld r2, [x]

st [x], 1

ld r1, [y] fr fr

T2:T1:

r1 == 0 ∧ r2 == 0:(b) Permitted under TSO

(a) r1 == 0 ∧ r2 == 0:Observable under TSO

Reordering allowed in TSOif r2!=r3

INVL1 WG @1

FETCHL1 [18] @3LD r1 [18] @4

FETCHL1 [19] @2

Abstract Transition

Abstract Transition

LD r2 [19] @6

loadDV,N r1, [18]

loadna r2, [19]

Environmental Transitions

Problem:Later WG-read-facet can happen before

the earlier one

Fig. 8. The counterexample found for loadDV,N , where the addresses,timestamps and specific environmental transitions are generated by the solver

head, i.e., the FIFO does not underflow. In the future, we aimto maintain a library of invariants and abstract transitions forreuse. Further, the ILA-MCM verification procedure could beintegrated with a general-purpose theorem prover to formallyensure their soundness and aid bookkeeping.

Although our ILA-MCM specifications are parametric, wedo not perform parametric verification here, since the GPUimplementation is fixed by a concrete system configuration(M,N,L). We currently performed verification for M,N,L =2 and M,N,L = 3.

4) Results: For the original GPU implementation verifiedby Wickerson et al. [19], our verification failed with counterex-amples for the following 5 instructions: loadDV,N , loadDV,R,storeDV,R, fetch incDV,N , and fetch incDV,R. Amongthem, loadDV,N , fetch incDV,N , storeDV,R match withthe buggy scenarios discussed in the previous work [19].Specifically, Figure 8 shows a buggy trace that we found forinstruction loadDV,N , where the facet-read event of the laternon-atomic load instruction comes earlier than the facet-readevent of the load-acquire instruction. This violates the load-acquire semantics. On the other hand, the counterexamplesfor fetch incDV,R and loadDV,R are false positives, sincethese traces cannot be extended to litmus tests with a propertyviolation without having data races (prohibited by OpenCL).Interestingly, the proposed changes by Wickerson et al. to thecompiler mappings of OpenCL+RSP operations strengthenedthe ordering guarantees of the hardware operations to matchour ILA-MCM model. Under their new compiler mappings,we successfully validated that the hardware implementationis compliant with our ILA-MCM model. This validation wascompleted in 14 minutes 9 seconds (for M,N,L = 3) on alaptop with a 2.8GHz Core-i5 processor and 16GB memory.

V. RELATED WORK

A. Hardware Specification and VerificationA number of formal hardware abstractions have been de-

veloped that enable verification. Kami [30, 31] is a Coq-basedframework that supports hardware design and verification inBluespec. In comparison to Kami, ILA-MCM is an ISA-levelabstraction that provides the interface between hardware andsoftware. In addition to verifying hardware, it can also be usedfor verifying correctness/security of software interacting withaccelerators, as demonstrated in our paper. Furthermore, it canreason about a wide range of memory consistency behaviors,including SC, TSO, and HRF. In contrast, currently Kami has

only been applied for SC. Finally, the ILA-MCM frameworktargets automated reasoning using SMT solvers, in contrast tointeractive theorem-proving in Kami.

ISA-Formal [32, 33] has been developed to formallymodel and verify ARM processors. As its name suggests, itis an ISA-level model. However, it has not been applied toaccelerators or other heterogeneous SoC components. Further,as far as we know, it has not been integrated with MCMs toreason about multicore memory consistency.

B. MCM and Program VerificationWe have already discussed prior MCM verification tools and

techniques. For reasoning about general concurrent programs,there are many related efforts in weak consistency models [34–36], logics [37, 38], and verification tools [39–41]. Here wediscuss details of specific related ideas.

1) Facets vs. ViCLs: In the Check tools [7–11], the Valuein Cache Lifetime (ViCL) abstraction has been proposed tocapture cache occupancy. Although both facets and ViCLs canmodel multiple “live” data for the same memory location, theyare inherently different. First, facets are state variables that areupdated according to instructions in ILAs and MCM axioms;they are not created or destroyed. In contrast, ViCLs have cre-ation and expiration events in happens-before graphs. Second,facets are more general than ViCLs and are not necessarilytied to caches or other microarchitectural structures. Third,facets enable integration of axiomatic MCMs with operationalinstruction semantics, while the latter are ignored by ViCLs.

2) Facets vs. Views: In recent work [34, 40], a viewabstraction was proposed to model the C11 MCM. Our facetsare different from views as follows: (i) a view is a mapfrom locations to timestamps, whereas facets are auxiliarystate variables, (ii) the views assign explicit timestamps toevents, whereas facet-axioms associate events with symbolictimestamps, whose values are not fixed but explored implicitlyduring verification, (iii) unlike views, facets have been appliedin automated SMT-based reasoning.

VI. CONCLUSIONS

In this paper, we have presented the ILA-MCM framework,which combines the benefits of operational ILA models withaxiomatic MCMs for reasoning about concurrent interactionsbetween heterogeneous components in an SoC. We haveintroduced a novel facet abstraction that models consistencyeffects on program-visible states, and use facet-axioms tospecify consistency ordering requirements. This provides aconstraint-based integration between operational ILA modelsand axiomatic MCM specifications. Our SMT-based verifi-cation procedure supports symbolic reasoning for expressiveproperties involving both rich instruction semantics and or-derings. We have demonstrated two verification applicationsof our prototype ILA-MCM framework, where we reasonedabout an SoC firmware program and a GPU hardware imple-mentation, respectively. Our support for expressive propertiesallowed synthesizing a malicious exploit in the first case,and our instruction-centric approach enabled compositionalverification in the second.

Page 9: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

REFERENCES

[1] C. Pilato, Q. Xu, P. Mantovani, G. Di Guglielmo, and L. P.Carloni, “On the Design of Scalable and Reusable Acceleratorsfor Big Data Applications,” in Proceedings of the ACM Interna-tional Conference on Computing Frontiers, 2016, pp. 406–411.

[2] I. Ohmura, G. Morimoto, Y. Ohno, A. Hasegawa, and M. Taiji,“MDGRAPE-4: a Special-Purpose Computer System for Molec-ular Dynamics Simulations,” Philosophical Transactions, SeriesA, vol. 372, no. 2021, 2014.

[3] J. Rott, “Intel Advanced Encryption Standard Instructions(AES-NI),” Technical Report, Intel, 2012.

[4] J. Alglave and M. Tautschnig, “Herding Cats: Modelling, Sim-ulation, Testing, and Data-Mining for Weak Memory,” ACMTransactions on Programming Languages and Systems, vol. 36,no. 2, pp. 1–7, 2014.

[5] J. Wickerson, M. Batty, T. Sorensen, and G. A. Constantinides,“Automatically Comparing Memory Consistency Models,” inProceedings of the Symposium on Principles of ProgrammingLanguages (POPL), 2016.

[6] J. Bornholt and E. Torlak, “Synthesizing Memory Models fromFramework Sketches and Litmus Tests,” in Proceedings of theConference on Programming Language Design and Implemen-tation (PLDI), 2017, pp. 467–481.

[7] D. Lustig, M. Pellauer, and M. Martonosi, “PipeCheck: Specify-ing and Verifying Microarchitectural Enforcement of MemoryConsistency Models,” in Proceedings of the Annual Interna-tional Symposium on Microarchitecture (MICRO), 2015, pp.635–646.

[8] Y. A. Manerkar, D. Lustig, M. Pellauer, and M. Martonosi,“CCICheck: Using hb Graphs to Verify the Coherence-Consistency Interface,” in Proceedings of the Annual Interna-tional Symposium on Microarchitecture (MICRO), 2015, pp. 26–37.

[9] D. Lustig, G. Sethi, M. Martonosi, and A. Bhattacharjee,“COATCheck : Verifying Memory Ordering at the Hardware-OS Interface,” in Proceedings of the International Conferenceon Architectural Support for Programming Languages andOperating Systems (ASPLOS), no. 212, 2016, pp. 233–247.

[10] C. Trippel, Y. A. Manerkar, D. Lustig, M. Pellauer, andM. Martonosi, “TriCheck : Memory Model Verification at theTrisection of Software, Hardware, and ISA,” in Proceedings ofInternational Conference on Architectural Support for Program-ming Languages and Operating Systems (ASPLOS), 2017, pp.119–133.

[11] Y. A. Manerkar, D. Lustig, M. Martonosi, and M. Pellauer,“RTLCheck : Verifying the Memory Consistency of RTL De-signs,” in Proceedings of the Annual International Symposiumon Microarchitecture (MICRO), 2017, pp. 463–476.

[12] B.-Y. Huang, H. Zhang, P. Subramanyan, Y. Vizel, A. Gupta,and S. Malik, “Instruction-Level Abstraction (ILA): A UniformSpecification for System-on-Chip (SoC) Verification,” 2018.[Online]. Available: http://arxiv.org/abs/1801.01114

[13] P. Subramanyan, Y. Vizel, S. Ray, and S. Malik, “Template-based Synthesis of Instruction-Level Abstractions for SoC Ver-ification,” in Proceedings of the Conference on Formal Methodsin Computer-Aided Design (FMCAD), 2017, pp. 160–167.

[14] P. Subramanyan, B.-Y. Huang, Y. Vizel, A. Gupta, andS. Malik, “Template-based Parameterized Synthesis of Uni-form Instruction-Level Abstractions for SoC Verification,” IEEETransactions on Computer-Aided Design of Integrated Circuitsand Systems, no. 99, 2017.

[15] R. Alur, R. Bodik, G. Juniwal, M. M. Martin, M. Raghothaman,S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, andA. Udupa, “Syntax-guided synthesis,” in Formal Methods inComputer-Aided Design (FMCAD), 2013, pp. 1–8.

[16] L. Lamport, “Time, Clocks, and the Ordering of Events in

a Distributed System,” Communications of the ACM, vol. 21,no. 7, pp. 558–565, 1978.

[17] C. W. Barrett, R. Sebastiani, S. A. Seshia, and C. Tinelli,“Satisfiability Modulo Theories,” Handbook of Satisfiability,vol. 185, pp. 825–885, 2009.

[18] S. Krstic, J. Yang, D. W. Palmer, R. B. Osborne, and E. Talmor,“Security of SoC Firmware Load Protocols,” in Proceedingsof the International Symposium on Hardware-Oriented Securityand Trust, 2014, pp. 70–75.

[19] J. Wickerson, M. Batty, B. M. Beckmann, and A. F. Donaldson,“Remote-Scope Promotion : Clarified , Rectified , and Verified,”in Proceedings of the ACM SIGPLAN International Conferenceon Object-Oriented Programming, Systems, Languages, andApplications (OOPSLA), 2015.

[20] S. V. Adve and K. Gharachorloo, “Shared Memory ConsistencyModels: A Tutorial,” Computer, vol. 29, no. 12, pp. 66–76,1996.

[21] L. Lamport, “How to Make a Multiprocessor Computer thatCorrectly Executes Multiprocess Progranm,” IEEE Transactionson Computers, no. 9, pp. 690–691, 1979.

[22] Intel, “Intel R© 64 and IA-32 Architectures Software DevelopersManual,” Volume 3A: System Programming Guide, Chapter 8:Multiple-Processor Management, no. 253665-067US, 2018.

[23] A. Solar-Lezama, R. Rabbah, R. Bodık, and K. Ebcioglu,“Programming by Sketching for Bit-Streaming Programs,” inProceedings of the Conference on Programming LanguageDesign and Implementation (PLDI), 2005, pp. 281–294.

[24] A. Solar-Lezama, L. Tancau, R. Bodik, S. Seshia, andV. Saraswat, “Combinatorial Sketching for Finite Programs,” inProceedings of the International Conference on ArchitecturalSupport for Programming Languages and Operating Systems(ASPLOS), 2006, pp. 404–415.

[25] J. Lee, D. A. Padua, and S. P. Midkiff, “Basic CompilerAlgorithms for Parallel Programs,” in Symposium on Principlesand Practice of Parallel Programming, 1999, pp. 1–12.

[26] C. Wang, S. Kundu, R. Limaye, M. Ganai, and A. Gupta, “Sym-bolic Predictive Analysis for Concurrent Programs,” FormalAspects of Computing, vol. 23, no. 6, pp. 781–805, 2011.

[27] L. De Moura and N. Bjørner, “Z3: An Efficient SMT Solver,” inProceedings of the International Conference on Tools and Algo-rithms for the Construction and Analysis of Systems (TACAS),2008, pp. 337–340.

[28] L. Howes and A. Munsh, “The OpenCL Specification,” 2015.[Online]. Available: https://www.khronos.org/registry/OpenCL/specs/opencl-2.0.pdf

[29] D. R. Hower, B. A. Hechtman, B. M. Beckmann, B. R. Gaster,M. D. Hill, S. K. Reinhardt, and D. A. Wood, “Heterogeneous-Race-Free Memory Models,” in Proceedings of the Interna-tional Conference on Architectural Support for ProgrammingLanguages and Operating Systems (ASPLOS), 2014, pp. 427–440.

[30] J. Choi, M. Vijayaraghavan, B. Sherman, A. Chlipala, andArvind, “Kami: a platform for high-level parametric hardwarespecification and its modular verification,” Proceedings of theACM on Programming Languages (PACMPL), vol. 1, no. ICFP,pp. 24:1–24:30, 2017.

[31] M. Vijayaraghavan, A. Chlipala, Arvind, and N. Dave, “ModularDeductive Verification of Multiprocessor Hardware Designs,” inProceedings of the International Conference on Computer AidedVerification (CAV), 2015, pp. 109–127.

[32] A. Reid, “Trustworthy Specifications of ARM R© v8-A and v8-MSystem Level Architecture,” in Proceedings of the Conferenceon Formal Methods in Computer-Aided Design (FMCAD),2017, pp. 161–168.

[33] A. Reid, R. Chen, A. Deligiannis, D. Gilday, D. Hoyes,W. Keen, A. Pathirane, O. Shepherd, P. Vrabel, and A. Zaidi,“End-to-End Verification of ARM R© Processors with ISA-

Page 10: ILA-MCM: Integrating Memory Consistency Models with ...Evaluation on real-world SoCs designs: First, we show an application of the ILA-MCM framework for finding security bugs in SoC

Formal,” in Proceedings of the International Conference onComputer Aided Verification (CAV), 2016, pp. 42–58.

[34] J. Kang, C.-K. Hur, O. Lahav, V. Vafeiadis, and D. Dreyer,“A Promising Semantics for Relaxed-Memory Concurrency,” inProceedings of the Symposium on Principles of ProgrammingLanguages (POPL), 2017, pp. 175–189.

[35] O. Lahav and D. Dreyer, “Repairing Sequential Consistency inC/C++11,” in Proceedings of the Conference on ProgrammingLanguage Design and Implementation (PLDI), 2016.

[36] O. Lahav, N. Giannarakis, and V. Vafeiadis, “Taming Release-Acquire Consistency,” in Proceedings of the Symposium onPrinciples of Programming Languages (POPL), 2016, pp. 649–662.

[37] V. Vafeiadis and C. Narayan, “Relaxed Separation Logic: AProgram Logic for C11 Concurrency,” in Proceedings of theACM SIGPLAN International Conference on Object OrientedProgramming Systems Languages & Applications (OOPSLA),2013, pp. 867–884.

[38] A. Turon, V. Vafeiadis, and D. Dreyer, “GPS: NavigatingWeak Memory with Ghosts, Protocols, and Separation,” inProceedings of the International Conference on Object OrientedProgramming Systems Languages & Applications (OOPSLA),2014, pp. 691–707.

[39] R. Jung, D. Swasey, F. Sieczkowski, K. Svendsen, A. Turon,L. Birkedal, and D. Dreyer, “Iris: Monoids and Invariants as anOrthogonal Basis for Concurrent Reasoning,” in Proceedingsof the Symposium on Principles of Programming Languages(POPL), 2015, pp. 637–650.

[40] J.-O. Kaiser, H.-H. Dang, D. Dreyer, and O. Lahav, “StrongLogic for Weak Memory : Reasoning About Release-AcquireConsistency in Iris,” in Proceedings of the European Conferenceon Object-Oriented Programming (ECOOP), 2017, pp. 1–17.

[41] A. J. Summers and P. Muller, “Automating Deductive Veri-fication for Weak-Memory Programs,” in Proceedings of theInternational Conference on Tools and Algorithms for theConstruction and Analysis of Systems (TACAS), 2018, pp. 190–209.

APPENDIXFACET-AXIOMS FOR SC AND TSO

The facet-axioms for SC and TSO are shown in Figure 9 andFigure 10, respectively. In the SC model, all facet read/writeevents are synchronous with the instructions (lines 7-10 inFigure 9), while in TSO model, the two write-facet events of astore instruction follow the program order of the stores (lines7-13 in Figure 10). Read-facet events are still synchronous(lines 14-15). Fences ensure that previous writes are globallyvisible at that point, and read-modify-write (RMW) is atomicin the sense that its read and write facets are not breakable

(lines 17-21). We define additional functions to specify thecorresponding read-from, from-read, and coherence-order rela-tions based on happens-before (HB) relations over facet events,e.g., lines 13-15 in Figure 9 and lines 23-31 in Figure 10.These functions are defined for use in the first axiom in bothmodels.

Axiom RF_CO_FR 1 forall r:READ | exists w:WRITE | 2 SameAddress[w,r] /\ SameData[w,r]/\ w.decode /\ 3 RF[w,r] /\( forall w2:WRITE (not w) | 4 ( SameAddress[w,w2] /\ w2.decode )=> 5 CO[w2,w] \/ FR[r,w2] ) 6 Axiom SC_WriteFacetOrder 7 forall w:WRITE | Sync[ w , w.wfe.global ] 8 Axiom SC_ReadFacetOrder 9 forall r: READ | Sync[ r , r.rfe.global ] 10 11 Define RF[w,r] := HB[ w.wfe.global , r.rfe.global ] 12 Define FR[r,w] := HB[ r.rfe.global , w.wfe.global ] 13 Define CO[w1,w2] := HB[ w1.wfe.global , w2.wfe.global ] 14

Fig. 9. Facet-Axioms for SC Axiom RF_CO_FR 1 forall r:READ | exists w:WRITE | 2 SameAddress[w,r] /\ SameData[w,r]/\ w.decode /\ 3 RF[w,r] /\( forall w2:WRITE (not w) | 4 ( SameAddress[w,w2] /\ w2.decode )=> 5 CO[w2,w] \/ FR[r,w2] ) 6 Axiom TSO_WriteFacetOrder 7 forall w:WRITE | HB[ w , w.wfe.local ] /\ 8 HB[ w.wfe.local , w.wfe.global ] 9 Axiom TSO_Store 10 forall w1:WRITE | forall w2: WRITE (not w1) | 11 PO[ w1, w2 ] => HB[ w1.wfe.local, w2.wfe.local] 12 /\ HB[ w1.wfe.global, w2.wfe.global] 13 Axiom TSO_ReadFacetOrder 14 forall r:READ | Sync[ r , r.rfe.local ] 15 16 Axiom TSO_Fence 17 forall f:FENCE | forall w: WRITE | PO[w,f] => 18 HB[ w.wfe.global, f ] 19 Axiom TSO_RMW 20 forall i:RMW | 21 Sync[i.rfe.local, i.wfe.local, i.wfe.global] 22 Define RF[w,r] := 23 SameCore[w,r] => HB[w.wfe.local , r.rfe.local ] /\ 24 ~SameCore[w,r] => HB[w.wfe.global, r.rfe.local ] 25 Define FR[r,w] := 26 SameCore[w,r] => HB[r.rfe.local , w.wfe.local ] /\ 27 ~SameCore[w,r] => HB[r.rfe.local, w.wfe.global ] 28 Define CO[w1,w2] := 29 SameCore[w1,w2] => HB[w1.wfe.local, w2.wfe.local] /\ 30 ~SameCore[w1,w2] => HB[w1.wfe.global, w2.wfe.global] 31

Fig. 10. Facet-Axioms for TSO