Copyright © Bluespec Inc. 2006 Confidential and Proprietary From ESL to Implementation: Reinventing Hardware Design using Bluespec SystemVerilog™ © 2006, Bluespec, Inc.


Page 1

From ESL to Implementation: Reinventing Hardware Design using Bluespec SystemVerilog™

© 2006, Bluespec, Inc.

Page 2

Joe Stoy

Founder and Principal Engineer

Bluespec Inc., 14-16 Spring Street

Waltham MA 02451, USA

+1 781 250 2206

[email protected]

www.bluespec.com

Page 3

Bluespec SystemVerilog Workshop Agenda

• Intro: why an HDL can affect overall productivity, from concept to silicon
• Behavior:
  – Rules: a new way to express HW behavior
  – Correctness: why rules help
  – Comparison with behavioral synthesis
  – Rule-based Interface Methods: modularizing rules
• Structure: improving the expression of HW structure using ideas from advanced programming languages
• Clock domains and gated clocks: compiler-guaranteed safety
• Testbenches using BSV
• Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
• Synthesis quality: as good as hand-coded RTL
• Tool flows
• Futures:
  – Formal verification

Page 4

Intro: why an improved HDL is a central need to address today’s chip design complexities

Page 5

Moore’s Law: “Silicon capacity doubles every 18 to 24 months”

Source: http://www.intel.com/technology/silicon/mooreslaw/index.htm

Today (2005):
• ~10-20 M gates
• 90nm, 65nm

Page 6

Today’s chips: SoCs (Systems on a Chip)

“IP” blocks (“Intellectual Property”):
• Processors
• Caches, memories
• Interconnects
• DMAs
• Other peripheral blocks
• I/O blocks

E.g., cell phones, cell network base stations, TV set-top boxes, iPods, digital cameras, …

[Figure: example SoC block diagram: processor, L2 cache, memory controller (DRAM, SRAM), DMA controller, DSP, power management, arbitration, application-specific block, serial controller, audio/video, Flash/memory I/F and bus controller, connected by a system bus and a peripheral bus joined by a bus bridge]

Page 7

ASIC design flow, and costs

[Figure: the flow Architecture → Design → Verification and Test → Physical Design, along a time axis]

• Can take ~12-24 months
• Can cost $10 million+ (and rising)
• Bug respin cost + market window cost

Page 8

Verification costs and chip quality are getting worse

• 66% of new ICs/ASICs require at least one re-spin
• 75% of these re-spins are due to logical/functional errors (up from 71% two years earlier)

Source: 2004 Collett study

[Figure: IC design costs ($M, scale 0-30) by silicon feature dimension (0.18µm, 0.13µm, 90nm), broken down into Architecture, Verification, Physical, Validation and Prototype. Source: IBM/IBS, Inc.]

Page 9

Design affects everything!

Myth: improving the Design language will have little impact

In fact, the Design language impacts all activities

[Figure: the Architecture → Design → Verification and Test → Physical Design flow, shown three times]

Page 10

“It is a profoundly erroneous truism, repeated by all copybooks and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. …”

Alfred North Whitehead, mathematician and philosopher (1861-1947)

[ Example: long division used to be an advanced subject in the days of Roman numerals; Arabic numerals changed that ]

How to improve productivity?

Page 11

The language of design is crucial!

Software analogy: Assembler → Fortran → C → C++ → Java

• No theoretical difference (all are Turing-complete)
• “I can produce better code by writing it in Assembler”
  – Maybe, if you are given enough time!
  – “Better” = more efficient, but not more readable, maintainable, or reusable
• You can still write incorrect code; you still need to debug; you still need to verify. But the probabilities of certain bugs decrease and the kinds of bugs change, as you go to higher levels:
  – Register protocol, argument/result-passing protocol, stack protocol, byte/word-alignment issues
  – Reentrancy and recursion protocols
  – Memory layout of complex data
  – Memory allocation/deallocation
  – Type misinterpretation
  – Code reuse (parameterization and polymorphism)

Page 12

Some lessons from SW language history

• The size/complexity of the system that you can build, correctly, within a short time, improves with higher levels of abstraction
• But also, crucially, people will not (cannot) use your new higher-level language for serious work
  – if it sacrifices efficiency
  – if it is unpredictable/uncontrollable

Page 13

Evolution of HDLs (Hardware Description Languages)

Timeline:
• Hand-drawn circuit diagrams (schematics)
• Schematic capture (automated)
• ~1985: text-based RTL languages: Verilog & VHDL (RTL = Register-Transfer Level)
• IEEE Verilog standards: 1995, 2001, 2005 (also VHDL standards)
• 2004: SystemVerilog (Accellera); 2005: IEEE
• ?

Page 14

Fully synthesizable – without compromise!

Bluespec: Better Design Accelerates Everything!

[Figure: the Architecture → Design → Verification and Test → Physical Design flow, annotated with these benefits:]
• More architectural flexibility during design
• 50% reduction in errors, faster correction
• 50% reduction from design to verified netlist
• Architectural exploration
• Early executable models
• Better reuse
• Faster fixes, to achieve closure

Page 15

Bluespec, Inc. company and technology background

Page 16

Bluespec, Inc. background

Timeline (~1996, 2000, 2003): Research@MIT on high-level synthesis & verification → Sandburst Corp: 10Gb/s core router ASICs (Bluespec: further technology development) → Bluespec, Inc.: high-level design and synthesis tool (SystemVerilog-based). Technology and VC funding accompany each step.

Page 17

Bluespec, Inc.

• Headquartered in Waltham, MA
• ~45 people (MA, CA, Europe, Armenia, India)
• Technology, 1997-present:
  – MIT research: Professor Arvind, students & colleagues
  – Patented: HW synthesis from Rules
• Active IEEE P1800/Accellera member; SystemVerilog language contributor; SystemC language contributor

Page 18

What does Bluespec offer?

A new and powerful way to explore and express designs; tools to simulate and to synthesize into quality RTL, feeding into existing RTL-to-chip tools/flows.

[Figure: SystemVerilog (design subset) with Rules and Rule-based Interfaces → Bluespec Synthesis → Verilog 95 RTL → RTL synthesis, physical design → Tapeout. Cycle-accurate simulation is available with Bluesim and with a Verilog simulator on the generated RTL.]

Page 19

Bluespec Solutions

Page 20

Bluespec core technologies

• Design: executable specifications
  – Synthesizable, high-level concurrency semantics
  – Transactional interfaces for design, with self-documenting protocol
• Verification: static and formal
  – Strong type checking
  – Interface connectivity and protocol checking
  – Race condition identification and management
  – Multiple-domain clock and interface checking
  – Rapid simulation with C/C++ functions

Page 21

Bluespec tools

[Figure: Bluespec tool flow. BSV and SystemC [ESE] sources are parsed into a Common Synthesis Engine (static checking, scheduling, optimization, power optimization, RTL generation), which feeds three back ends: Bluespec Synthesis, producing RTL; Bluesim, for rapid, source-level simulation and interactive debug of BSV; and SystemC simulation (TRANSLATE, then gcc with libsystemc.h, producing an .exe). Outputs are marked cycle-accurate with Verilog simulation. Blueview Debug spans the flow.]

Page 22

Creating ESL methodologies

Abstraction Level | Purposes | Design Components | Prerequisites
Bandwidth Accurate | Architectural Exploration | Transactions, Functional Model | Simulation speed, instrumentation, protocol checking
Latency Accurate | Software Test Platform | Functional Model with accurate timing and full concurrency | Simulation speed and register interfaces
Cycle Accurate | Power Optimization & Firmware Development | Defined buses, registers, concurrency | Rapid changes in micro-architecture and automatic RTL generation
Bit Accurate | Implementation & Integration | Automatically generated with rules & formal interfaces | Easy ECOs and timing closure

Across all levels: consistent connectivity through formal I/F methods; consistent verification and debugging paradigms.

Page 23

ESL to Implementation

[Figure: tool flow: BSV and SystemC [ESE] sources; Bluespec Synthesis to RTL; Bluesim; Blueview; SystemC simulation via TRANSLATE, gcc and libsystemc.h to an .exe]

• Technologies: Concurrency Semantics; Formal Interfaces; Static, Formal Checking; Low Power Optimization
• Tools: Bluespec Synthesis, Bluesim, Blueview, SystemC Simulation
• Methodologies: Bandwidth Accurate, Latency Accurate, Cycle Accurate, Bit Accurate

Page 24

Bluespec SystemVerilog Agenda: Technical Deep Dive

• Intro: why an HDL can affect overall productivity, from concept to silicon
• Behavior:
  – Rules: a new way to express HW behavior
  – Correctness: why rules help
  – Comparison with behavioral synthesis
  – Rule-based Interface Methods: modularizing rules
• Structure: improving the expression of HW structure using ideas from advanced programming languages
• Clock domains and gated clocks: compiler-guaranteed safety
• Testbenches using BSV
• Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
• Comparison with SystemC
• Synthesis quality: as good as hand-coded RTL
• Tool flows
• Coexistence with Verilog/VHDL/SV/SystemC
• Futures:
  – Integration of Rules and Rule-based Interfaces into SystemC
  – Formal verification

Page 25

Bluespec SystemVerilog™: a one-slide overview

Two dimensions raising the level of abstraction (fully synthesizable), compared with VHDL/Verilog/SystemVerilog/SystemC:

Behavioral: Rules and Rule-based Interfaces
• For complex concurrency and control, across multiple shared resources, across module boundaries

Structural:
• High-level abstract types
• Powerful static checking
• Powerful parameterization
• Powerful static elaboration
• Advanced clock management


Page 27

Complex concurrency with shared resources

• HW by its very nature is highly concurrent
  – A HW design can be viewed as a set of cooperating concurrent FSMs
  – The cooperation occurs through shared resources
• Today’s SoCs have enormous amounts of complicated concurrency and shared resources

How do we express this today?
• Concurrency is expressed with processes (“always” blocks in RTL)
• Access to shared resources is tediously micro-managed (if-then-elses inside always blocks)

Unfortunately, this does not scale:
• It leads to race conditions (inconsistent state in the shared resources), which are very tricky to discover, diagnose and fix

Page 28

Simple example with concurrency and shared resources

Process 0: increments register x when cond0

Process 1: transfers a unit from register x to register y when cond1

Process 2: decrements register y when cond2

Each register can only be updated by one process on each clock. Priority: 2 > 1 > 0

This mirrors real applications, e.g. a packet arrives, is processed, and departs.

[Figure: processes 0, 1 and 2, enabled by cond0, cond1 and cond2, acting on registers x and y: process 0 adds 1 to x; process 1 subtracts 1 from x and adds 1 to y; process 2 subtracts 1 from y. Process priority: 2 > 1 > 0]
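One way to pin down what this slide specifies is a small executable reference model. The Python sketch below is illustrative only (the table layout and the name `step` are hypothetical, not Bluespec output). It assumes that a process fires only when its condition holds and none of the registers it writes has already been taken this cycle by a higher-priority process, which is one reading of "one update per register per clock"; in particular, process 1 gets both x and y or does nothing.

```python
# Illustrative reference model of the three-process example.
# Each entry: process name -> (index of its enabling condition, {register: delta}).
PROCESSES = {
    "proc0": (0, {"x": +1}),
    "proc1": (1, {"x": -1, "y": +1}),
    "proc2": (2, {"y": -1}),
}
PRIORITY = ["proc2", "proc1", "proc0"]  # highest priority first


def step(state, conds):
    """Return the register values after one clock edge.

    A process fires only if its condition holds and none of the registers
    it writes was already claimed this cycle by a higher-priority process,
    so every register is updated by at most one process per clock.
    """
    claimed = set()
    nxt = dict(state)
    for name in PRIORITY:
        idx, deltas = PROCESSES[name]
        if conds[idx] and claimed.isdisjoint(deltas):
            claimed.update(deltas)
            for reg, delta in deltas.items():
                nxt[reg] = state[reg] + delta
    return nxt


# All three conditions true: proc2 takes y, so proc1 is blocked; proc0 gets x.
print(step({"x": 5, "y": 3}, (True, True, True)))
```

With all three conditions true, process 2 takes y, so process 1 is blocked entirely and process 0 is still free to increment x.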

Page 29

Which one is correct?

• What’s required to verify that they’re correct?
• What if the priorities changed: cond1 > cond2 > cond0?
• What if the processes are in different modules?

```verilog
always @(posedge CLK) begin
  if (!cond2 || cond1) x <= x - 1;
  else if (cond0)      x <= x + 1;

  if (cond2)      y <= y - 1;
  else if (cond1) y <= y + 1;
end
```

```verilog
always @(posedge CLK) begin
  if (!cond2 && cond1) x <= x - 1;
  else if (cond0)      x <= x + 1;

  if (cond2)      y <= y - 1;
  else if (cond1) y <= y + 1;
end
```

[Figure: the three-process diagram (registers x and y; cond0, cond1, cond2), process priority 2 > 1 > 0]
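With three conditions there are only eight cases, so the first question can be answered by brute force. The sketch below transcribes both always blocks by hand into Python (`candidate_a`, `candidate_b` and `spec` are hypothetical names) and compares each against the priority specification, under the reading that process 1 updates both registers or neither.

```python
from itertools import product


def spec(x, y, c0, c1, c2):
    """Priority 2 > 1 > 0; process 1 must win both x and y to fire."""
    p2 = c2                      # highest priority, writes y
    p1 = c1 and not p2           # needs x and y; loses y to process 2
    p0 = c0 and not p1           # needs x; loses x to process 1
    return (x + (1 if p0 else 0) - (1 if p1 else 0),
            y + (1 if p1 else 0) - (1 if p2 else 0))


def candidate_a(x, y, c0, c1, c2):
    # if (!cond2 || cond1) x <= x - 1; else if (cond0) x <= x + 1;
    nx = x - 1 if (not c2 or c1) else (x + 1 if c0 else x)
    # if (cond2) y <= y - 1; else if (cond1) y <= y + 1;
    ny = y - 1 if c2 else (y + 1 if c1 else y)
    return nx, ny


def candidate_b(x, y, c0, c1, c2):
    # if (!cond2 && cond1) x <= x - 1; else if (cond0) x <= x + 1;
    nx = x - 1 if (not c2 and c1) else (x + 1 if c0 else x)
    ny = y - 1 if c2 else (y + 1 if c1 else y)
    return nx, ny


def mismatches(candidate):
    """Every (cond0, cond1, cond2) where the candidate disagrees with spec."""
    return [conds for conds in product([False, True], repeat=3)
            if candidate(10, 10, *conds) != spec(10, 10, *conds)]


print("A:", mismatches(candidate_a))
print("B:", mismatches(candidate_b))
```

Run as written, the first block disagrees with the specification in four of the eight cases (for example, it decrements x when no condition holds), while the second agrees in all eight. A check like this is cheap here, but the number of cases doubles with every extra condition, and it has to be redone whenever the priorities change.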

Page 30

With Bluespec, the design is direct

```bsv
(* descending_urgency = "proc2, proc1, proc0" *)
rule proc0 (cond0); x <= x + 1; endrule
rule proc1 (cond1); y <= y + 1; x <= x - 1; endrule
rule proc2 (cond2); y <= y - 1; endrule
```

Hand-written RTL: complexity due to
• State-centric (for synthesizability)
• Scheduling clutter

BSV:
• Functional correctness follows directly from rule semantics
• Executable spec (operation-centric)
• Automatic handling of shared resource mux logic
• Same hardware as the RTL

[Figure: the three-process diagram, process priority 2 > 1 > 0]

Page 31

Now, let’s make a small change: add a new process and insert its priority

[Figure: processes 0, 1 and 2 as before, plus a new process 3, enabled by cond3, which adds 2 to x and subtracts 2 from y. Process priority: 2 > 3 > 1 > 0]

Page 32

Changing the Bluespec design

Process priority: 2 > 3 > 1 > 0

[Figure: the four-process diagram; process 3 (cond3) adds 2 to x and subtracts 2 from y]

Pre-change:

```bsv
(* descending_urgency = "proc2, proc1, proc0" *)
rule proc0 (cond0); x <= x + 1; endrule
rule proc1 (cond1); y <= y + 1; x <= x - 1; endrule
rule proc2 (cond2); y <= y - 1; endrule
```

Post-change (one new rule, one edited urgency list):

```bsv
(* descending_urgency = "proc2, proc3, proc1, proc0" *)
rule proc0 (cond0); x <= x + 1; endrule
rule proc1 (cond1); y <= y + 1; x <= x - 1; endrule
rule proc2 (cond2); y <= y - 1; endrule
rule proc3 (cond3); y <= y - 2; x <= x + 2; endrule
```

Page 33

Changing the Verilog design

Process priority: 2 > 3 > 1 > 0

[Figure: the four-process diagram; process 3 (cond3) adds 2 to x and subtracts 2 from y]

Pre-change:

```verilog
always @(posedge CLK) begin
  if (!cond2 && cond1) x <= x - 1;
  else if (cond0)      x <= x + 1;

  if (cond2)      y <= y - 1;
  else if (cond1) y <= y + 1;
end
```

Post-change:

```verilog
always @(posedge CLK) begin
  if ((cond2 && cond0) || (cond0 && !cond1 && !cond3)) x <= x + 1;
  else if (cond3 && !cond2)                           x <= x + 2;
  else if (cond1 && !cond2)                           x <= x - 1;

  if (cond2)      y <= y - 1;
  else if (cond3) y <= y - 2;
  else if (cond1) y <= y + 1;
end
```
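The same kind of exhaustive check can be repeated after the change. In the hypothetical Python sketch below, the rule side is a table of four rules plus an urgency list, so adding process 3 is one new table entry and one edited list; `hand_rtl` is a hand transcription of the post-change always block. The sketch assumes a rule fires only if it is enabled and every register it writes is still free after more urgent rules have been considered, and it compares all sixteen combinations of cond0-cond3.

```python
from itertools import product

# Rule table: name -> (index of its condition, {register: delta}).
RULES = {
    "proc0": (0, {"x": +1}),
    "proc1": (1, {"x": -1, "y": +1}),
    "proc2": (2, {"y": -1}),
    "proc3": (3, {"x": +2, "y": -2}),   # the newly added process
}
URGENCY = ["proc2", "proc3", "proc1", "proc0"]  # most urgent first


def rule_model(x, y, conds):
    """A rule fires if enabled and none of its registers is already claimed."""
    state, nxt, claimed = {"x": x, "y": y}, {"x": x, "y": y}, set()
    for name in URGENCY:
        idx, deltas = RULES[name]
        if conds[idx] and claimed.isdisjoint(deltas):
            claimed.update(deltas)
            for reg, d in deltas.items():
                nxt[reg] = state[reg] + d
    return nxt["x"], nxt["y"]


def hand_rtl(x, y, conds):
    """Transcription of the post-change always block."""
    c0, c1, c2, c3 = conds
    if (c2 and c0) or (c0 and not c1 and not c3):
        nx = x + 1
    elif c3 and not c2:
        nx = x + 2
    elif c1 and not c2:
        nx = x - 1
    else:
        nx = x
    if c2:
        ny = y - 1
    elif c3:
        ny = y - 2
    elif c1:
        ny = y + 1
    else:
        ny = y
    return nx, ny


bad = [c for c in product([False, True], repeat=4)
       if rule_model(10, 10, c) != hand_rtl(10, 10, c)]
print("mismatches:", bad)
```

The list of mismatches comes out empty: the hand-written conditions are right, but every one of them had to be re-derived, whereas the rule table changed in two places.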

Page 34

Key Benefits

• Executable specifications
• Rapid changes
• But, with fine-grained control of RTL:
  – Define the optimal architecture/micro-architecture
  – Debug at the source OR RTL level: designer understands both
  – The Quality of Results (QoR) of RTL!

Page 35

The concurrency complexities illustrated in the simple example are greatly magnified in real designs

Page 36

A more complex example, from CPU design

Dave & Arvind, 2003

• Speculative, out-of-order
• Many, many concurrent activities

[Figure: processor block diagram: Instruction Memory, Fetch, Decode, Re-Order Buffer (ROB), Register File, ALU Unit, MEM Unit, Data Memory and Branch unit, connected by FIFOs]

Page 37

Many concurrent actions on common state: nightmare to manage explicitly

[Figure: the Re-Order Buffer (ROB): slots between Head and Tail pointers, each holding State (E = Empty, W = Waiting), Instruction, Operand 1, Operand 2 and Result. Operations acting on it concurrently: the Decode Unit puts an instr into the ROB; get operands for instr from the Register File; write back results; get a ready ALU instr; get a ready MEM instr; put ALU instr results in ROB; put MEM instr results in ROB; resolve branches. ALU Unit(s) and MEM Unit(s) are also shown.]

Page 38

• Insert Instr in ROB
  – Put instruction in first available slot
  – Increment tail pointer
  – Get source operands (RF <or> prev instr)
• Dispatch Instr
  – Mark instruction dispatched
  – Forward to appropriate unit
• Write Back Results to ROB
  – Write back results to instr result
  – Write back to all waiting tags
  – Set to done
• Commit Instr
  – Write results to register file (or allow memory write for store)
  – Set to Empty
  – Increment head pointer
• Branch Resolution
  – …

But in BSV…
• ..you can code each operation in isolation, as a rule
• ..the tool guarantees that operations are INTERLOCKED (i.e. each runs to completion without external interference)
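To make "each operation in isolation" concrete, here is a toy ROB written as independent guard/action pairs in Python, run by a loop that picks any enabled operation and runs it to completion. Everything here is a simplifying assumption: four slots, instructions that simply square a number, write-back that completes waiting entries in random order, and no dispatch or branch resolution; `run_rob` and its helpers are hypothetical names.

```python
import random

SIZE = 4  # number of ROB slots (arbitrary for the sketch)


def run_rob(program, seed=0):
    """Run a toy ROB under 'pick any enabled operation, run it to completion'."""
    rng = random.Random(seed)
    state = ["E"] * SIZE      # E = empty, W = waiting, D = done
    instr = [None] * SIZE
    result = [None] * SIZE
    head = tail = 0
    pending = list(program)
    committed = []

    # Each operation is written in isolation as a (guard, action) pair.
    def insert_ok():
        return bool(pending) and state[tail] == "E"

    def insert():
        nonlocal tail
        instr[tail] = pending.pop(0)
        state[tail] = "W"
        tail = (tail + 1) % SIZE

    def writeback_ok():
        return "W" in state

    def writeback():
        # Out of order: complete any waiting instruction.
        i = rng.choice([k for k in range(SIZE) if state[k] == "W"])
        result[i] = instr[i] * instr[i]   # stand-in for real ALU/MEM work
        state[i] = "D"

    def commit_ok():
        return state[head] == "D"

    def commit():
        nonlocal head
        committed.append(result[head])
        state[head] = "E"
        head = (head + 1) % SIZE

    ops = [(insert_ok, insert), (writeback_ok, writeback), (commit_ok, commit)]
    while True:
        enabled = [act for ok, act in ops if ok()]
        if not enabled:
            return committed
        rng.choice(enabled)()  # runs to completion before the next choice


print(run_rob([1, 2, 3, 4, 5, 6]))
```

Whatever order the scheduler picks (try different `seed` values), results commit in program order, because each operation only ever sees the ROB between other operations, never in the middle of one.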

Page 39

The key:

Rules execute atomically

Reference semantics:
   while some rules are enabled:
      choose one enabled rule
      execute it
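The reference semantics is small enough to write down directly. The Python interpreter below is a sketch, not Bluespec's implementation: rules are (guard, action) pairs over a dictionary of state, and each action returns a complete new state so that it takes effect as a unit. The example rules are Euclid's subtraction-based GCD, a common teaching example for rules that does not appear in these slides; it assumes both inputs are positive (with x = 0 and y > 0 the subtract rule would stay enabled forever).

```python
import random


def run_rules(state, rules, rng):
    """Reference semantics: while some rules are enabled,
    choose one enabled rule and execute it atomically."""
    while True:
        enabled = [action for guard, action in rules if guard(state)]
        if not enabled:
            return state
        state = rng.choice(enabled)(state)  # whole new state, as one unit


# GCD by repeated subtraction, as two rules (hypothetical example).
GCD_RULES = [
    # swap: when x > y and y != 0
    (lambda s: s["x"] > s["y"] and s["y"] != 0,
     lambda s: {"x": s["y"], "y": s["x"]}),
    # subtract: when x <= y and y != 0
    (lambda s: s["x"] <= s["y"] and s["y"] != 0,
     lambda s: {"x": s["x"], "y": s["y"] - s["x"]}),
]

print(run_rules({"x": 12, "y": 18}, GCD_RULES, random.Random(1))["x"])  # 6
```

When no rule is enabled (y has reached 0), x holds the GCD. Here the two guards never overlap, so the "choose one" step has only one candidate; with overlapping guards, the random choice stands in for the nondeterminism the reference semantics allows.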

Page 40

Atomicity

atomic

Page 41

Atomicity

ατομος

Page 42

Atomicity

a-tomic: “a-” = not (as in asymmetric, atypical, amoral)

Page 43

Atomicity

a-tomic: “a-” = not (as in asymmetric, atypical, amoral)

“-tom-” = cut (as in microtome, tomography, appendectomy, tome (of a multi-volume book))

Page 44

Atomicity

Rules are atomic (“not cut”):
• Whenever they run, they run to completion, never interrupted
• No other activities are interleaved with them

This greatly simplifies design:
• avoids many race conditions
• easier to prove invariants
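The kind of race that atomicity rules out can be seen in a classic lost update. The Python sketch below (hypothetical helper names) enumerates every interleaving of two increments of a shared counter. When each increment is split into separate read and write steps, some interleavings lose an update; when each increment is one indivisible step, as a rule is, every order gives the same answer.

```python
from itertools import combinations


def interleavings(a, b):
    """All merges of step lists a and b that keep each list's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in a_slots else next(ib) for k in range(n)]


def increment_steps(name):
    """A read-modify-write increment split into two separate steps."""
    def read(mem, regs):
        regs[name] = mem["count"]

    def write(mem, regs):
        mem["count"] = regs[name] + 1

    return [read, write]


def atomic_increment(mem, regs):
    """The same increment as one indivisible step."""
    mem["count"] += 1


def final_counts(a, b):
    """Set of final counter values over every interleaving of a and b."""
    results = set()
    for order in interleavings(a, b):
        mem, regs = {"count": 0}, {}
        for step in order:
            step(mem, regs)
        results.add(mem["count"])
    return results


print(sorted(final_counts(increment_steps("A"), increment_steps("B"))))  # [1, 2]
print(sorted(final_counts([atomic_increment], [atomic_increment])))      # [2]
```

The split version ends at 1 whenever both reads happen before either write; the atomic version always ends at 2.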

Page 45

Extensive supporting theory in computer science literature

• Terese, Term Rewriting Systems, Cambridge Univ. Press, 2003, 884 pp.
• K. Mani Chandy and Jayadev Misra, Parallel Program Design: A Foundation, Addison-Wesley, 1988 (the UNITY programming language for concurrent, reactive systems)
• Franz Baader and Tobias Nipkow, Term Rewriting and All That, Cambridge Univ. Press, 1998, 300 pp.
• Arvind and Xiaowei Shen, Using Term Rewriting Systems to Design and Verify Processors, IEEE Micro 19:3, 1998, pp. 36-46
• Stoy et al., Proofs of Correctness of Cache-Coherence Protocols, in Formal Methods for Increasing Software Productivity, Berlin, Germany, 2001, Springer-Verlag LNCS 2021
• Mieszko Lis, Superscalar Processors via Automatic Microarchitecture Transformation, Master’s thesis, Dept. of Electrical Eng. and Computer Science, MIT, 2000
• … and more …

The intuitions underlying this theory are easy to use in practice

Page 46

Synthesizing Rules into efficient clocked synchronous HW

- Automatically generates correct HW for the most error-prone parts of hand-written RTL

- While retaining transparency, predictability and designer control

Page 47

Clocked synchronous hardware

The compiler translates BSV source code into Verilog RTL

[Figure: transition logic takes inputs I and the current state S from a collection of state elements, and produces outputs O and the “next” S]

Page 48

Clocked semantics

Reference semantics:
   while some rules are enabled:
      choose one enabled rule
      execute it

Clocked semantics:
   every clock cycle:
      execute as many rules as you can,
      provided the overall effect is as if they executed serially in some order
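The clocked semantics can be prototyped as a check: run a candidate set of rules in parallel (every rule reads the state at the start of the cycle) and accept it only if the result equals running those rules one at a time in some order, with each guard still true when its turn comes. This Python sketch makes two simplifying assumptions that a real compiler would not: it decides per concrete state rather than statically for all states, and it tries every permutation, which is only feasible for a handful of rules. The rules and function names are hypothetical.

```python
from itertools import permutations


def run_serial(state, rules):
    """Run (guard, action) rules one at a time; None if a guard fails en route."""
    for guard, action in rules:
        if not guard(state):
            return None
        state = {**state, **action(state)}
    return state


def run_parallel(state, rules):
    """Run rules in one clock: every action reads the cycle's starting state.
    Returns None if two rules write the same register."""
    writes = {}
    for _, action in rules:
        for reg, val in action(state).items():
            if reg in writes:
                return None
            writes[reg] = val
    return {**state, **writes}


def pick_for_cycle(state, rules):
    """Greedily admit enabled rules, most urgent first, while the one-cycle
    (parallel) effect still equals running the admitted rules in some order."""
    chosen = []
    for rule in rules:
        trial = chosen + [rule]
        par = run_parallel(state, trial)
        if rule[0](state) and par is not None and any(
                run_serial(state, list(order)) == par
                for order in permutations(trial)):
            chosen = trial
    return chosen


# Hypothetical rules over registers a and b; every guard is always true.
inc_a = (lambda s: True, lambda s: {"a": s["a"] + 1})
copy_a_to_b = (lambda s: True, lambda s: {"b": s["a"]})
copy_b_to_a = (lambda s: True, lambda s: {"a": s["b"]})

start = {"a": 1, "b": 2}
ok = pick_for_cycle(start, [inc_a, copy_a_to_b])
print(len(ok), run_parallel(start, ok))        # both fire: same as copy, then inc
clash = pick_for_cycle(start, [copy_a_to_b, copy_b_to_a])
print(len(clash), run_parallel(start, clash))  # a swap matches no serial order
```

In the first case both rules fire in one cycle, since the result equals running the copy before the increment. In the second, firing both would swap a and b, which no serial order produces, so only the more urgent rule fires.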

Page 49

Rule semantics mapped to hardware semantics

[Figure: a sequence of rule steps (Ri, Rj, Rk) mapped onto HW clock cycles, with several rule steps sharing one cycle]

• The effect of each cycle is as if a sequence of rules was executed one at a time
• Consequence: the HW state can never be an interleaving of actions from different rules
• Rule atomicity (and therefore correctness) is preserved

Page 50

Synthesizing a single rule

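rule foo (… cond … (x < y) …); … action … x <= x + z … endrule

[Figure: hardware for rule foo: the current state (x, y, z) feeds cond logic, which produces the enable signals (EN), and action logic, which produces the next-state values (x’, y’, z’); these drive the D inputs of the state registers, whose Q outputs are the current state]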
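A cycle-level model of this picture is sketched below in Python. It assumes the elided guard is exactly `x < y` and the elided action is exactly `x <= x + z`; `Reg`, `cond_logic`, `action_logic` and `simulate` are illustrative names. The condition logic produces the enable, the action logic produces the next-state value, and the register loads it only when enabled.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    """A D flip-flop with an enable: Q takes D at the clock edge only if EN."""
    q: int

    def clock(self, d, en):
        if en:
            self.q = d


def cond_logic(x, y):
    return x < y           # assumed guard of rule foo


def action_logic(x, z):
    return x + z           # next-state value x' computed by rule foo


def simulate(cycles, x0=1, y0=10, z0=4):
    """Clock rule foo; return (x after the edge, EN) for each cycle."""
    x, y, z = Reg(x0), Reg(y0), Reg(z0)
    trace = []
    for _ in range(cycles):
        en = cond_logic(x.q, y.q)               # combinational, from current state
        x.clock(action_logic(x.q, z.q), en)     # y and z are not written by foo
        trace.append((x.q, en))
    return trace


print(simulate(4))
```

With x = 1, y = 10, z = 4, the rule fires three times (x becomes 5, 9, 13); then its guard is false, so EN stays low and the register holds its value.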

Page 51

Synthesizing multiple rules

Different rules can read/write common state. Therefore we need:
• multiplexing of next-state values into shared state element inputs
• control of which rules get to update next-state elements:
  – control of next-state “enables”
  – control of next-state data multiplexers

Page 52

Synthesizing multiple rules

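• The scheduler ensures consistency with rule semantics; this is usually the most error-prone part of hand-written RTL
• Here, it is correct by construction (Bluespec patented technology)

[Figure: Rule1 … RuleN, each with Cond and Action logic; the conditions feed a Scheduler, whose Rule Control drives the Enable of each state element and the Data Select of the multiplexer in front of its D input]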
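The scheduler and rule-control blocks can be sketched in Python as two functions: one decides which enabled rules actually fire, the other derives a shared register's enable and data-select from the firing rules. The names loosely echo the CAN_FIRE/WILL_FIRE signals found in Bluespec-generated Verilog, but this is a deliberately conservative illustration, not the compiler's algorithm: it never lets two rules write the same register in one cycle. The "clear"/"inc" counter is a hypothetical example.

```python
def schedule(can_fire, writes, urgency):
    """Compute WILL_FIRE for each rule, most urgent first: a rule fires if it
    CAN_FIRE and no already-firing rule writes any of the same registers."""
    will_fire, claimed = {}, set()
    for rule in urgency:
        fires = can_fire[rule] and claimed.isdisjoint(writes[rule])
        will_fire[rule] = fires
        if fires:
            claimed |= writes[rule]
    return will_fire


def register_inputs(reg, current, will_fire, next_values):
    """Rule control for one register: EN is the OR of its firing writers,
    and the data-select mux forwards that writer's next-state value."""
    writers = [r for r, fires in will_fire.items()
               if fires and reg in next_values[r]]
    assert len(writers) <= 1, "scheduler must never admit two writers"
    en = bool(writers)
    d = next_values[writers[0]][reg] if en else current
    return en, d


# Hypothetical counter with two rules: "clear" (more urgent) and "inc".
x = 7
wf = schedule({"clear": True, "inc": True},
              {"clear": {"x"}, "inc": {"x"}},
              ["clear", "inc"])
print(wf, register_inputs("x", x, wf, {"clear": {"x": 0}, "inc": {"x": x + 1}}))
```

Here both rules are enabled, the more urgent clear wins, and the register's enable is asserted with the mux selecting 0. Only this selection logic is generated; the adder behind "inc" is whatever the designer wrote.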

Page 53

Transparency and predictability

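[Figure: the scheduler and rule-control diagram (Rule1 … RuleN with Cond and Action logic, Scheduler, Enable and Data Select on each state element), with the scheduler and rule control highlighted]

Bluespec synthesis only adds this part (the scheduler and rule control).

User-specified structures dominate area and critical paths; the microarchitecture remains completely under user control.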

Page 54

Comparing BSV to traditional “Behavioral Synthesis”

Page 55

Function vs. Algorithm

People often say: “I’m describing the algorithm of my HW block using C/C++ or Behavioral RTL.” Actually, they’re describing the function, not the algorithm.

A function: spec of I/O behavior, without consideration for implementability, and in particular without consideration for cost in space (circuitry) or time (performance)

An algorithm: a specific implementation with a particular cost model

Different computation models, with different cost models, usually require radically different algorithms for implementing the same function
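A small example of one function with two algorithms: both Python functions below compute the sum of a list, but under a hardware cost model they differ sharply. The cost accounting (one reused adder versus an adder tree, with "steps" counting dependent addition stages) is a simplified assumption chosen for illustration.

```python
def sum_sequential(values):
    """One adder reused every step: n - 1 additions, each waiting on the last."""
    acc = values[0]
    for v in values[1:]:
        acc += v
    return acc, {"adders": 1, "steps": len(values) - 1}


def sum_tree(values):
    """Adder tree: add pairs level by level; n - 1 adders in ceil(log2 n) levels."""
    level, adders, steps = list(values), 0, 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # an odd element passes through unchanged
        adders += len(level) // 2
        steps += 1
        level = nxt
    return level[0], {"adders": adders, "steps": steps}


data = list(range(8))
print(sum_sequential(data))
print(sum_tree(data))
```

For eight inputs the sequential version uses 1 adder and 7 dependent steps; the tree uses 7 adders and 3 steps. Both compute the same function, and which algorithm is better depends entirely on the cost model.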


“Behavioral Synthesis”

Past products: Synopsys Behavioral Compiler (withdrawn); Get2Chip (absorbed into Cadence)

Current products: Mentor’s CatapultC; Synfora; Forte (in SystemC); …

[Figure: “Behavior” of design expressed as a sequential program (e.g., in C or procedural Verilog) → Behavioral Synthesis tool → RTL]


Behavioral Synthesis: the technology has a long history

[Figure: Sequential source program → Parsing … → Control-flow graph (sequential CDFG) → Dependency analysis and associated transforms (“automatic parallelization”) → Parallel CDFG (Control/Data Flow Graph) → Synthesis (target-specific), applied over time to:
Vector computers (~1975 …)
VLIW/IA64, Cellular, SIMD, dataflow, SMP, cluster, cache-friendly, … (~1980s …)
Hardware (RTL) (~1990s …)]

Tractable only for certain loop-and-array codes, without any complex control (where it can work spectacularly well)


The “Automatic Parallelization” problem

The input (C program) is totally sequential, because of C semantics

We want the synthesized hardware to exploit parallelism, for high performance

The Automatic Parallelization problem: undo/remove the input’s sequentiality, converting it into a parallel form


“Automatic Parallelization”: a matrix multiplication example

void matmult (int A[N][N], int B[N][N], int C[N][N])
{
  int i, j, k, innerProductSum;
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++) {
      innerProductSum = 0;
      for (k = 0; k < N; k++)
        innerProductSum += A[i][k] * B[k][j];
      C[i][j] = innerProductSum;
    }
}

[Figure: C = A × B; element C[i,j] is the inner product of row i of A and column j of B]


“Automatic Parallelization”: a matrix multiplication example

Can the k loop (inner product) be executed in parallel?

The “*”s can be done in parallel, but the “+”s are still sequenced

[Figure: starting from 0, a linear chain of multiply (x) and add (+) stages, k = 0 … N-1, combines A[i,*] and B[*,j] into C[i,j]]


“Automatic Parallelization”: a matrix multiplication example

A clever compiler could transform it into tree accumulation, which has more parallelism

This depends on the commutativity and associativity of “+”:
May not hold if intermediate sums can overflow (e.g., with saturating or trapping arithmetic)!
May not hold for floating-point numbers!

[Figure: the products A[i,k] * B[k,j], k = 0 … N-1, summed pairwise in a tree of “+” nodes to give C[i,j]]
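A small C sketch of the two accumulation orders (the function names are hypothetical), and of why the tree transformation is not always legal for floating point: with a suitable input the two orders give different answers. It assumes IEEE-754 single precision with round-to-nearest and no excess intermediate precision, as on typical x86-64 builds.

```c
/* Sequential accumulation, as the C loop is written: each "+" waits for
   the previous one, so the chain has depth n. */
float sum_sequential(const float *x, int n)
{
    float acc = 0.0f;
    for (int k = 0; k < n; k++)
        acc += x[k];
    return acc;
}

/* Tree accumulation: pairwise sums level by level, depth log2(n).
   Sketch assumptions: n is a power of two and 1 <= n <= 64. */
float sum_tree(const float *x, int n)
{
    float tmp[64];
    if (n < 1 || n > 64)
        return 0.0f;                        /* outside the sketch's assumptions */
    for (int k = 0; k < n; k++)
        tmp[k] = x[k];
    for (int width = n; width > 1; width /= 2)
        for (int k = 0; k < width / 2; k++) /* adds within one level are independent */
            tmp[k] = tmp[2 * k] + tmp[2 * k + 1];
    return tmp[0];
}
```

For x = {1e8f, 1.0f, -1e8f, 1.0f}, sum_sequential returns 1.0f while sum_tree returns 0.0f: near 1e8 the spacing between adjacent floats is 8, so a 1.0f added there is rounded away, and the two orders lose different 1.0f terms.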


“Automatic Parallelization”: a matrix multiplication example

Can the i and j loops be executed in parallel?
Not as written, because all the k loops read and write a single common variable, “innerProductSum”!
A clever compiler can eliminate this using “scalar expansion”: converting it into an array
Note: most clever programmers would do the opposite!

void matmult (int A[N][N], int B[N][N], int C[N][N])
{
  int i, j, k, innerProductSum[N][N];
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++) {
      innerProductSum[i][j] = 0;
      for (k = 0; k < N; k++)
        innerProductSum[i][j] += A[i][k] * B[k][j];
      C[i][j] = innerProductSum[i][j];
    }
}
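Why the shared scalar blocks parallelism can be seen by running two (i, j) iterations in lockstep, as two parallel hardware units would. This C sketch (the function name and the 3x3 size are assumptions for illustration) steps both inner products together, once with a single shared accumulator as in the original code and once with one accumulator per (i, j) as after scalar expansion.

```c
#define N 3   /* illustrative matrix size */

/* Runs the (i0, j0) and (i1, j1) inner products in lockstep, one k step at a
   time. If expanded is nonzero, each (i, j) gets its own accumulator (scalar
   expansion); otherwise both share one, like the single innerProductSum. */
void two_in_lockstep(int A[N][N], int B[N][N], int C[N][N],
                     int i0, int j0, int i1, int j1, int expanded)
{
    int shared = 0;
    int sum[N][N] = {{0}};
    int *acc0 = expanded ? &sum[i0][j0] : &shared;
    int *acc1 = expanded ? &sum[i1][j1] : &shared;
    *acc0 = 0;
    *acc1 = 0;
    for (int k = 0; k < N; k++) {
        *acc0 += A[i0][k] * B[k][j0];   /* unit 0, step k */
        *acc1 += A[i1][k] * B[k][j1];   /* unit 1, step k */
    }
    C[i0][j0] = *acc0;
    C[i1][j1] = *acc1;
}
```

With A = {{1,2,3},{4,5,6},{7,8,9}} and B = {{9,8,7},{6,5,4},{3,2,1}}, the expanded version gives C[0][0] = 30 and C[1][1] = 69, while the shared-accumulator version writes their sum, 99, to both.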


Automatic Parallelization: history

Studied extensively since the 1960s (vectorizing/ parallelizing/ VLIW/ EPIC software compilers)

Fundamental problems: complex control structures, pointers and aliasing (memory indirection), dynamic data allocation, … are all difficult or impossible to parallelize automatically

C is often a bad starting point: the best parallel algorithm for a given function can be quite different from the best sequential algorithm

Parallel algorithm designers prefer to start with a clean slate from a functional specification, not a C algorithm with unnecessary sequential baggage

Has succeeded only in a limited domain: simple array-based loop nests

The SW community has abandoned automatic parallelization of general-purpose programs; it is now used mostly for scientific/technical computing, linear algebra, …


Automatic Parallelization: transparency, predictability, controllability

Another common issue with automatic parallelization and behavioral synthesis:

Designer loses intuition and precise control over generated output

Behavioral synthesis: tool decides microarchitecture based on complex optimization criteria

“What HW will result, with this input C program?”
“What will be the effect on the resulting HW, if I make this change to the input C program?”
“What change should I make to the input C program, to improve the HW in this way?”


Behavioral Synthesis: Applicability

[Figure: a typical SoC (processor, memory controller, DMA controller, DSP, power management, arbitration, application-specific logic, DRAM/SRAM, L2 cache, serial controller, audio, video, flash/memory I/F, bus controller; a system bus and a peripheral bus joined by a bus bridge), with blocks classed as complex datapaths (e.g., processor/controller), control, and technical algorithms (e.g., DSP/math: IDCT, motion compensator, DES, FIR filter)]

Only a few IP blocks may benefit from Behavioral Synthesis


Comparing “Model of Time” in BSV vs. Automatic Synthesis from C/C++

C/C++: completely untimed
No relationship between source model of time (sequential C code execution) and target model of time (HW clocks)

BSV: untimed to timed
Initially, designer writes arbitrarily complex rules, i.e., any amount of functional computation per rule
Designer refines this (splitting rules, if necessary) so that the functional computation per rule is feasible in HW at a target clock speed/technology

BSV tool schedules multiple rules per clock


Comparing Concurrency Model in BSV vs. SystemC

BSV: Rules
Atomic transactions
Tool generates control logic to manage concurrency

SystemC: Threads and events

Higher-level synchronization abstractions built on top of events: semaphores, locks, blocking methods, …

Designer manages atomicity explicitly (consistent access to multiple shared resources)


Historical improvements in concurrency control

[Figure: timeline from 1950 to today, moving towards higher-level (less error-prone) concurrency control:
Cycle accounting
Semaphores (locks, events, …): SW: pthreads; HW: RTL, SystemC
Atomic objects (structured locking): SW: Java
Atomic transactions (multiple resources): SW: database systems, distributed systems; HW: Bluespec]


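Elevating design above RTL

[Table: approaches compared on correspondence/transparency to hardware structure; concurrency/coordination; communication]
RTL: structure explicit; concurrency explicitly managed; communication via wires
C/C++: functionality with tool-determined structure and resources, but only for simple array-based FOR loops (<LOC); concurrency and communication N/A
SystemC: the same, for simple array-based FOR loops (for SystemC, anything else is explicit) (<LOC); concurrency explicitly managed; communication via wires (<LOC)
Bluespec: structure explicit (<LOC); concurrency and communication via Rules with Methods (<LOC)

<LOC = Fewer Lines of Code than RTL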


Bluespec SystemVerilog Agenda: Technical Deep Dive

Intro: why an HDL can affect overall productivity, from concept to silicon
Behavior:
  Rules: a new way to express HW behavior
  Correctness: why rules help
  Comparison with behavioral synthesis
  Rule-based Interface Methods: modularizing rules
Structure: improving the expression of HW structure using ideas from advanced programming languages
Clock domains and gated clocks: compiler-guaranteed safety
Testbenches using BSV
Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
Comparison with SystemC
Synthesis quality: as good as hand-coded RTL
Tool flows
Coexistence with Verilog/VHDL/SV/SystemC
Futures:
  Integration of Rules and Rule-based Interfaces into SystemC
  Formal verification


Bluespec SystemVerilog™: A one-slide overview

[Figure: two dimensions (behavioral and structural) raising the level of abstraction, fully synthesizable, from VHDL/Verilog/SystemVerilog/SystemC to Bluespec SystemVerilog]

Behavioral: Rules and Rule-based Interfaces
For complex concurrency and control, across multiple shared resources, across module boundaries

Structural:
High-level abstract types
Powerful static checking
Powerful parameterization
Powerful static elaboration
Advanced clock management


Consider a FIFO, in RTL

[Figure: mkFIFO block with methods enq() and first()/deq(); ports notFull, enq_enab, dataIn (32 bits), notEmpty, deq_enab, first (32 bits)]

In Verilog:

module mkFIFO_model (output notFull, input [31:0] dataIn, input enq_enab,
                     output notEmpty, output [31:0] first, input deq_enab);
  …
endmodule

module mkFIFO_implem (output notFull, input [31:0] dataIn, input enq_enab,
                      output notEmpty, output [31:0] first, input deq_enab);
  …
endmodule


Modules written to be used by others require detailed specifications

A small sample of the informal, written interface specification (8 pages):

[Figure: “DesignWare” FIFO (ports data_in, data_out, push_req_n, pop_req_n, clk, rstn, full, empty) and associated documentation]


Module interfaces: summary critique of today’s RTL methodology

Two modules that implement the same interface have to repeat the same port list (tedious, error-prone)
Interfaces are flat, unstructured port lists

No concept of grouping ports according to “transactions”

No specification of behavior on the interface:
“enq_enab allowed only if notFull”
“dataIn should be valid with enq_enab”
“first only valid if notEmpty”
“deq_enab allowed only if notEmpty”

Behavior is typically specified in ad hoc text and timing diagrams

Verification obligation, often against incomplete specs
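In RTL these rules live only in prose and timing diagrams, so verification has to re-encode them. A minimal C sketch of such a per-cycle check, using the port names from the mkFIFO example; the struct, the function name, and the two extra flags (dataIn_valid, first_used) are hypothetical, standing in for whatever a testbench can observe.

```c
#include <stdbool.h>

/* Port values of the RTL FIFO in one clock cycle. notFull, enq_enab,
   notEmpty and deq_enab follow the mkFIFO port list; dataIn_valid and
   first_used are hypothetical testbench observations. */
typedef struct {
    bool notFull, enq_enab;
    bool notEmpty, deq_enab;
    bool dataIn_valid;   /* did the client drive a valid value on dataIn? */
    bool first_used;     /* did the client consume the value on first? */
} FifoPorts;

/* The informal interface rules, as one executable check.
   Returns true if this cycle obeys all of them. */
bool fifo_protocol_ok(const FifoPorts *p)
{
    if (p->enq_enab && !p->notFull)      return false;  /* enq_enab only if notFull */
    if (p->enq_enab && !p->dataIn_valid) return false;  /* dataIn valid with enq_enab */
    if (p->first_used && !p->notEmpty)   return false;  /* first only valid if notEmpty */
    if (p->deq_enab && !p->notEmpty)     return false;  /* deq_enab only if notEmpty */
    return true;
}
```

Every team that uses the FIFO has to write, and get right, some equivalent of this check, usually from the informal document rather than from a precise definition.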


A FIFO in SystemVerilog

[Figure: mkFIFO block with methods enq() and first()/deq(); ports notFull, enq_enab, dataIn (32 bits), notEmpty, deq_enab, first (32 bits), grouped as enq, deq and first]

In SystemVerilog:

interface FIFO;
  bit notFull, enq_enab;
  bit [31:0] dataIn;
  bit notEmpty, deq_enab;
  bit [31:0] first;
  modport ifc (output notFull, notEmpty, first,
               input  dataIn, enq_enab, deq_enab);
endinterface

module mkFIFO_model (FIFO.ifc f); … endmodule

module mkFIFO_implem (FIFO.ifc f); … endmodule


Module interfaces: summary critique of SystemVerilog methodology

Interface port lists are separately specified (independent of any module implementing the interface)

Two modules that implement the same interface can share the same interface definition (improves “plug and play”)

But, still:
Interfaces are flat, unstructured port lists
No concept of grouping ports according to “transactions”
No specification of behavior on the interface:
“enq_enab allowed only if notFull”
“dataIn should be valid with enq_enab”
“first only valid if notEmpty”
“deq_enab allowed only if notEmpty”
Behavior is typically specified in ad hoc text and timing diagrams
Verification obligation, often against incomplete specs

Note: SV does allow definition of tasks and functions inside an interface definition, and this provides some limited ability to group according to transactions and to encapsulate interface behavior


Rule-based Interfaces

Robust, parameterizable, correct-by-construction way to express interactions with a module

Extend Rule Semantics across module boundaries

Capture the protocol of a complete “transaction” with a module

Capture inter-transaction scheduling constraints


A FIFO in BSV

interface FIFO#(type itemType);
  method Action enq (itemType x);
  method itemType first ();
  method Action deq ();
  method Action clear ();
endinterface

Each method captures a complete transaction protocol:
RDY: e.g., enq() is allowed (the FIFO is not full); deq() is allowed (the FIFO is not empty)
ENABLE: e.g., when enq() or deq() is invoked
Input data buses (method arguments)
Output data buses (method results)

More abstract than port lists and ad hoc timing diagrams
Never have any timing errors at interfaces (a method is only invoked when its RDY condition holds)

[Figure: FIFO with methods enq() and first()/deq()]


Methods map directly into HW ports: FIFO

[Figure: any module that provides a FIFO interface has these ports: enq: n-bit data in, enab, rdy (= not full); deq: enab, rdy (= not empty); first: n-bit data out, rdy (= not empty); clear: enab, rdy (always true)]

enq(): n-bit argument; has side effect (Action)
first(): no argument; n-bit result
deq(): no argument; has side effect (Action)
clear(): no argument; has side effect (Action)


Interface methods are HW!

Interface method declarations look like functions/procedures in SW
Uses of interface methods look like function/procedure calls in SW

But: think HW, not SW or process simulation!

A definition of an interface method in a module is a manifest bit of circuitry behind its ports
A use of an interface method is just a set of connections (wires) to the module interface ports
There is no “call/execute/return”, stack frame, …!


Interface methods fit smoothly into rules

module …
  FIFO#(int) iFifo  <- mkFIFO;
  FIFO#(int) oFifo1 <- mkFIFO;
  FIFO#(int) oFifo2 <- mkFIFO;

  rule route0 (iFifo.first[0] == 0);
    iFifo.deq;
    oFifo1.enq (iFifo.first);
  endrule

  rule route1 (iFifo.first[0] == 1);
    iFifo.deq;
    oFifo2.enq (iFifo.first);
  endrule
endmodule

[Figure: “route” logic steering items from iFifo to oFifo1 or oFifo2]

All the implicit conditions (notFull, notEmpty) are automatically handled by incorporating them into Rule conditions.
This eliminates much clutter, and improves correctness.
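A C sketch of what the compiler does for these two rules; all names, and the FIFO depth of 2, are hypothetical. Each rule's explicit condition on bit 0 of iFifo.first is ANDed with the implicit conditions of the methods it calls: notEmpty for iFifo.first and iFifo.deq, notFull for the chosen output's enq. Because the two explicit conditions are mutually exclusive, one function can model both rules.

```c
#include <stdbool.h>

#define CAP 2   /* hypothetical FIFO depth */

typedef struct { int buf[CAP]; int head, count; } Fifo;

/* Implicit conditions (RDY) and the methods they guard. */
static bool notEmpty(const Fifo *f) { return f->count > 0; }
static bool notFull(const Fifo *f)  { return f->count < CAP; }
static int  first(const Fifo *f)    { return f->buf[f->head]; }
static void deq(Fifo *f)            { f->head = (f->head + 1) % CAP; f->count--; }
static void enq(Fifo *f, int x)     { f->buf[(f->head + f->count) % CAP] = x; f->count++; }

/* Both route rules for one clock. The explicit condition (bit 0 of the
   head item) selects the rule; its implicit conditions are iFifo's
   notEmpty and the chosen output's notFull. If any is false, the rule
   does not fire and no state changes.
   Returns 1 if the oFifo1 rule fired, 2 for oFifo2, 0 if neither. */
int route_step(Fifo *iFifo, Fifo *oFifo1, Fifo *oFifo2)
{
    if (!notEmpty(iFifo))
        return 0;
    int x = first(iFifo);
    bool toFirst = (x & 1) == 0;
    Fifo *out = toFirst ? oFifo1 : oFifo2;
    if (!notFull(out))
        return 0;                   /* blocked: the item stays in iFifo */
    deq(iFifo);
    enq(out, x);
    return toFirst ? 1 : 2;
}
```

In RTL, every one of these checks would be written and maintained by hand; in the BSV rules above none of them appears explicitly.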


Module interfaces: Inter-transaction scheduling constraints

Architect to Engineers: Please design for me a FIFO in which I can enq and deq simultaneously (i.e., in the same clock)

“With my FIFO, you can enq and deq simultaneously …

Engineer 1 (“NaïveFIFO”): … in most cases, but not if it’s either empty or full.”

Engineer 2 (“PipelineFIFO”): … even if it’s full. (Think of it as a deq first, making room for a following enq, but squeezed into a single clock. This naturally fits into register semantics: read old value, write new value.)”

Engineer 3 (“BypassFIFO”): … even if it’s empty. (Think of it as an enq first, making an item available for a following deq, but squeezed into a single clock. This is just a bypass of a value from input to output.)”


Inter-transaction scheduling constraints

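For 3 FIFO designs (capacity 2) and various conditions, the allowable operations and their “in the same clock” semantics, by number of elements in the FIFO:

NaïveFIFO: 0 elements: enq; 1 element: enq || deq; 2 elements: deq
PipelineFIFO: 0 elements: enq; 1 element: enq || deq; 2 elements: deq < enq
BypassFIFO: 0 elements: enq < deq; 1 element: enq || deq; 2 elements: deq

(“enq || deq”: both in the same clock, with no ordering between them; “deq < enq”: both in the same clock, deq logically first; “enq < deq”: both in the same clock, enq logically first)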
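The table can be read as a legality check on one clock's requests. This C sketch encodes it (the names are hypothetical; capacity is fixed at 2 as in the table). Note how PipelineFIFO's effective notFull depends on the deq request, and BypassFIFO's effective notEmpty on the enq request.

```c
#include <stdbool.h>

#define CAPACITY 2

typedef enum { NAIVE, PIPELINE, BYPASS } FifoKind;

/* Can a FIFO of this kind, currently holding 'count' elements, accept
   this clock's enq/deq requests? */
bool ops_allowed(FifoKind kind, int count, bool enq, bool deq)
{
    /* PipelineFIFO (deq < enq): a deq in the same clock frees a slot,
       so its effective notFull depends on the deq request. */
    bool notFull = count < CAPACITY || (kind == PIPELINE && deq);
    /* BypassFIFO (enq < deq): an enq in the same clock supplies the item,
       so its effective notEmpty depends on the enq request. */
    bool notEmpty = count > 0 || (kind == BYPASS && enq);
    return (!enq || notFull) && (!deq || notEmpty);
}
```

For example, a full PipelineFIFO accepts enq and deq together, while a full NaïveFIFO accepts only the deq; an empty BypassFIFO accepts both, an empty NaïveFIFO only the enq.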


Module interfaces: Inter-transaction scheduling constraints

The FIFO variants have the same interface methods/wires, and differ only in the scheduling of the interface transactions:
“enq || deq”, “deq < enq”, “enq < deq”

They have different latency properties:
NaïveFIFO, PipelineFIFO: minimum 1-tick latency
BypassFIFO: minimum 0-tick latency
This can affect “alignment” with associated data on other datapaths

Their control circuits have different properties:
PipelineFIFO: “notFull” depends on “deq_enab”
BypassFIFO: “notEmpty” depends on “enq_enab”

Their data paths have different properties:
BypassFIFO: combinational path from data in to data out (can affect timing closure)


Module interfaces: Inter-transaction scheduling constraints

“Client” HW that uses one of these FIFOs will be different, depending on which variant is used:
different control logic to obey different scheduling requirements

In RTL, these differences are often undocumented, poorly documented, or poorly communicated from FIFO designer to FIFO user:
more verification surprises, bugs

With Rule-based Interface Methods:
Precise vocabulary to specify and communicate scheduling
Control HW in client is automatically synthesized to take into account scheduling differences


Broad-brush differences between BSV and RTL:

Module hierarchy

BSV has exactly the same notion of module hierarchy as RTL

In fact, more stringently so: even registers are modules (at the leaves of the hierarchy). In BSV, ordinary variables never represent registers.

Thus, designers exercise precise control over microarchitecture

“If so, how can BSV be a high-level HDL?”
Microarchitecture is the creative (and fun) part of HW design; it distinguishes good designs from bad. The designer should remain involved in this.

Complex concurrency and control is the hard and tedious part of HW design; it’s where most errors arise. BSV’s Rules dramatically simplify and automate this.


Modules, rules, interfaces, methods

The big picture: modules contain rules which use methods that are provided by sub-modules in their interfaces. Methods, too, can use other methods.

[Figure: a module containing state and rules, presenting an interface]


Example: a 2x2 switch, with stats

Packets arrive on two input FIFOs, and must be switched to two output FIFOs
Certain “interesting packets” must be counted

[Figure: each input FIFO feeds a “Determine Queue” stage that routes packets to one of the two output FIFOs; a shared +1 counter counts certain packets]


2x2 switch specs

Input FIFOs can be empty
Output FIFOs can be full

Shared resource collision on an output FIFO: if packets are available on both input FIFOs, both have the same destination, and the destination FIFO is not full

Shared resource collision on the counter: if packets are available on both input FIFOs, each has a different destination, both output FIFOs are not full, and both packets are “interesting”

Resolve collisions in favor of packets from the first input FIFO

Must have maximum throughput: a packet must move if it can, modulo the above rules


The meat of the BSV code

[Figure: the 2x2 switch, with “Determine Queue” stages and a +1 counter for certain packets]

module mkSmallSwitch (…);
  …
  (* descending_urgency = "r1, r2" *)
  rule r1; // for packets from FIFO i1
    let x = i1.first;
    let out = ((x[0] == 0) ? o1 : o2);
    i1.deq;
    out.enq (x);
    if (count(x)) c <= c + 1;
  endrule

  rule r2; // for packets from FIFO i2
    let x = i2.first;
    let out = ((x[0] == 0) ? o1 : o2);
    i2.deq;
    out.enq (x);
    if (count(x)) c <= c + 1;
  endrule
endmodule: mkSmallSwitch


Commentary

Muxing into output FIFOs, and control of those muxes, automatically generated

Automatic handling of FIFO emptiness, FIFO fullness:
this is part of BSV’s rule and interface method semantics
Impossible to read a junk value from an empty FIFO
Impossible to enqueue into a full FIFO
Impossible to race for multiple enqueues onto a FIFO

All control for resource sharing handled automatically:
Rule atomicity ensures consistency
The “descending_urgency” attribute resolves collisions in favor of rule r1

The BSV code directly expresses design intent without all the clutter of control and shared-resource management, generating efficient, correct-by-construction RTL
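For comparison, here is a cycle-level C sketch of the arbitration the 2x2 switch specification asks for; it models the specified behavior, not the RTL the compiler generates. All names are hypothetical: bit 0 of a packet selects the output FIFO, as x[0] does in the BSV code, bit 1 stands in for count(x), and the FIFO depth of 2 is an assumption. When both rules could fire but would collide on an output FIFO or on the counter, r1 wins; when r1 cannot fire, r2 is free to move its packet.

```c
#include <stdbool.h>

#define CAP 2   /* hypothetical FIFO depth */

typedef struct { int buf[CAP]; int head, count; } Fifo;

static bool notEmpty(const Fifo *f) { return f->count > 0; }
static bool notFull(const Fifo *f)  { return f->count < CAP; }
static int  first(const Fifo *f)    { return f->buf[f->head]; }
static void deq(Fifo *f)            { f->head = (f->head + 1) % CAP; f->count--; }
static void enq(Fifo *f, int x)     { f->buf[(f->head + f->count) % CAP] = x; f->count++; }

/* Hypothetical packet fields: bit 0 picks the output FIFO; bit 1 marks
   an "interesting" packet (standing in for count(x)). */
static int  dest(int x)        { return x & 1; }
static bool interesting(int x) { return (x >> 1) & 1; }

/* One clock of the switch. o[0] and o[1] are the output FIFOs, *c the
   counter. Returns a bitmask: bit 0 set if r1 fired, bit 1 if r2 fired. */
unsigned switch_step(Fifo *i1, Fifo *i2, Fifo *o[2], int *c)
{
    /* Each rule's condition: a packet is waiting and its output has room. */
    bool r1 = notEmpty(i1) && notFull(o[dest(first(i1))]);
    bool r2 = notEmpty(i2) && notFull(o[dest(first(i2))]);

    /* Collisions on an output FIFO or on the counter go to r1. */
    if (r1 && r2) {
        int x1 = first(i1), x2 = first(i2);
        if (dest(x1) == dest(x2) || (interesting(x1) && interesting(x2)))
            r2 = false;
    }

    /* If both fire they touch disjoint resources, so running them one
       after the other here matches running them in the same clock. */
    if (r1) {
        int x = first(i1);
        deq(i1);
        enq(o[dest(x)], x);
        if (interesting(x)) (*c)++;
    }
    if (r2) {
        int x = first(i2);
        deq(i2);
        enq(o[dest(x)], x);
        if (interesting(x)) (*c)++;
    }
    return (r1 ? 1u : 0u) | (r2 ? 2u : 0u);
}
```

Even this small model spends most of its lines on conditions and collision handling, which is the clutter the rule-based version leaves to the compiler.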


Managing change

Now imagine the following changes to the existing code:
Some packets are multicast (go to both FIFOs)
Some packets are dropped (go to no FIFO)
More complex arbitration:
FIFO collision: in favor of r1
Counter collision: in favor of r2
Fair scheduling
Several counters for several kinds of interesting packets
Non-exclusive counters (e.g., IP packets include TCP packets)
M input FIFOs, N output FIFOs (parameterized)

Suppose these changes are required 6 months after original coding

In BSV these are easy, because:
the source code remains uncluttered by all the complex control and mux logic
atomicity ensures correctness


Broad-brush differences between BSV and RTL:

BSV is not simulation-centric

RTL and SystemC are simulation-centric
“Synthesizable subsets” were defined later
Many concepts/constructs are a consequence of this SW-process-like simulation view. E.g.,
Execution of a process has a program-counter-like “locus of control”
Variables have the semantics of updatable memory locations, updated when “execution reaches this statement”
Sensitivity lists
“If execution reaches this statement, the wire is driven with the value of the right-hand side”
Functions/procedures get called, execute, and return (stack-like semantics)

None of these are particularly meaningful from a HW point of view: the tail (simulation) is wagging the dog (HW description)

BSV is not simulation-centric, and in these respects, BSV is closer to the traditional HW view


Broad-brush differences between BSV and RTL:

Datapaths and control paths

With BSV you don’t think separately about datapaths and control

Each Rule specifies the part of the datapath relevant for its behavior, and the control conditions under which the path is traversed
The Bluespec compiler combines these specifications to generate the final datapaths and control circuitry

No central datapath description
No central “control FSM”


Interface abstraction


Interface abstraction
Examples of BSV hierarchical and polymorphic interfaces (all synthesizable):

interface Put#(type t);
  method Action put(t x);
endinterface

interface Get#(type t);
  …
endinterface

interface Client#(type reqType, type respType);
  interface Get#(reqType) request;
  interface Put#(respType) response;
endinterface

interface Server#(type reqType, type respType);
  interface Put#(reqType) request;
  interface Get#(respType) response;
endinterface

interface DMA#(type busReq, type busResp);
  interface Client#(busReq, busResp) dataMover;
  interface Server#(busReq, busResp) config;
endinterface


Client/Server interfaces

Get/Put pairs are very common, and duals of each other, so the library defines Client/Server interface types for this purpose

interface Client #(type req_t, type resp_t);
  interface Get#(req_t) request;
  interface Put#(resp_t) response;
endinterface

interface Server #(type req_t, type resp_t);
  interface Put#(req_t) request;
  interface Get#(resp_t) response;
endinterface

[Figure: a client (get, put) connected to a server (put, get); each get and put is a data bus with ready and enable signals; requests carry req_t, responses carry resp_t]


Client/Server interfaces

interface CacheIfc;
  interface Server#(Req_t, Resp_t) ipc;
  interface Client#(Req_t, Resp_t) icm;
endinterface

module mkCache (CacheIfc);
  // from / to processor
  FIFO#(Req_t)  p2c <- mkFIFO;
  FIFO#(Resp_t) c2p <- mkFIFO;

  // to / from memory
  FIFO#(Req_t)  c2m <- mkFIFO;
  FIFO#(Resp_t) m2c <- mkFIFO;

  … rules expressing cache logic …

  interface ipc = fifosToServer (p2c, c2p);
  interface icm = fifosToClient (c2m, m2c);
endmodule

[Figure: mkProcessor (client: get, put) connected to mkCache (server ipc; client icm), which is connected to mkMem (server: put, get)]


mkConnection

Using these interface facilities, assembling systems becomes very easy

interface CacheIfc;
  interface Server#(Req_t, Resp_t) ipc;
  interface Client#(Req_t, Resp_t) icm;
endinterface

module mkTopLevel (…);
  // instantiate subsystems
  Client#(Req_t, Resp_t) p <- mkProcessor;
  CacheIfc               c <- mkCache;
  Server#(Req_t, Resp_t) m <- mkMem;

  // instantiate connections
  mkConnection (p, c.ipc);
  mkConnection (c.icm, m);
endmodule

[Figure: mkProcessor (client) connected to the mkCache server (ipc); the mkCache client (icm) connected to mkMem (server)]
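In hardware terms, a connection like this amounts to a rule that moves an item from a Get to a Put whenever both are ready; connecting a Client to a Server uses two such rules, one for requests and one for responses. The C sketch below models that idea with function-pointer structs. Every name in it (Get, Put, Client, Server, Slot, connection_rule, connect_step) is hypothetical and is not the Bluespec library API; items are plain ints for brevity.

```c
#include <stdbool.h>

/* Hypothetical C rendering of Get and Put: each method is a "ready" test
   (its implicit condition) plus the action itself. */
typedef struct {
    void *self;
    bool (*ready)(void *self);
    int  (*get)(void *self);            /* returns an item and removes it */
} Get;

typedef struct {
    void *self;
    bool (*ready)(void *self);
    void (*put)(void *self, int x);
} Put;

typedef struct { Get request; Put response; } Client;
typedef struct { Put request; Get response; } Server;

/* The rule a Get-to-Put connection adds: fire only when both sides are ready. */
bool connection_rule(Get g, Put p)
{
    if (!(g.ready(g.self) && p.ready(p.self)))
        return false;
    p.put(p.self, g.get(g.self));
    return true;
}

/* Connecting a Client to a Server: one rule for requests, one for
   responses. Returns how many of the two fired. */
int connect_step(Client cl, Server sv)
{
    int fired = 0;
    fired += connection_rule(cl.request, sv.request);
    fired += connection_rule(sv.response, cl.response);
    return fired;
}

/* A one-place buffer that can be viewed as a Get or as a Put. */
typedef struct { bool full; int item; } Slot;

static bool slot_can_get(void *s)    { return ((Slot *)s)->full; }
static bool slot_can_put(void *s)    { return !((Slot *)s)->full; }
static int  slot_get(void *s)        { Slot *q = s; q->full = false; return q->item; }
static void slot_put(void *s, int x) { Slot *q = s; q->item = x; q->full = true; }

Get slot_as_get(Slot *s) { return (Get){ s, slot_can_get, slot_get }; }
Put slot_as_put(Slot *s) { return (Put){ s, slot_can_put, slot_put }; }
```

The connection itself knows nothing about what sits behind each side; it relies only on the ready/action pair, which is what makes plug-together assembly safe.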

Page 100: Copyright © Bluespec Inc. 2006 Confidential and Proprietary From ESL to Implementation: Reinventing Hardware Design using Bluespec SystemVerilog™ © 2006,

100Copyright © Bluespec Inc. 2006 Confidential and Proprietary

Bluespec SystemVerilog
Agenda: Technical Deep Dive

Intro: why an HDL can affect overall productivity, from concept to silicon
Behavior:
   Rules: a new way to express HW behavior
   Correctness: why rules help
   Comparison with behavioral synthesis
   Rule-based Interface Methods: modularizing rules
Structure: improving the expression of HW structure using ideas from advanced programming languages
Clock domains and gated clocks: compiler-guaranteed safety
Testbenches using BSV
Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
Comparison with SystemC
Synthesis quality: as good as hand-coded RTL
Tool flows
Coexistence with Verilog/VHDL/SV/SystemC
Futures:
   Integration of Rules and Rule-based Interfaces into SystemC
   Formal verification

Bluespec SystemVerilog™: a one-slide overview

Two dimensions raising the level of abstraction (fully synthesizable), compared with VHDL/Verilog/SystemVerilog/SystemC:

Behavioral: Rules and Interface Methods
   For complex concurrency and control, across multiple shared resources, across module boundaries

Structural:
   High-level abstract types
   Powerful static checking
   Powerful parameterization
   Powerful static elaboration
   Advanced clock management

Structural abstractions

The behavioral abstractions (Rules and Interface Methods), by themselves, tremendously improve productivity and correctness

A designer can be productive with Rules and Interface Methods after about a day of training

The structural abstractions (types, parameterization, static checking, elaboration) are an additional substantial multiplier

Example: a butterfly switch (crossbar)

Basic building blocks, and recursive construction: 1x1, 2x2, 4x4, …, NxN

[Figure: building blocks and a recursively constructed switch with outputs labelled 00, 01, 10, 11]

Butterfly switch: code excerpts

Polymorphic (type parameter t)
Sub-interfaces (hierarchical)
Aggregation (lists, vectors of interfaces)

interface XBar #(type t);
   interface List#(Put#(t)) input_ports;
   interface List#(Get#(t)) output_ports;
endinterface

Butterfly switch: code excerpts

Size parameter: logn
Combinational circuit parameter: destinationOf
Module parameter: mkMerge2x1, which encapsulates the flow-control, arbitration and queueing behavior of the 2x1 merge
Interfaces instead of port lists: XBar#(t)
Polymorphic: type parameter t

module mkXBar #(Integer logn,                            // param
                function Bit #(32) destinationOf (t x),  // param
                module #(Merge2x1 #(t)) mkMerge2x1)      // param
               (XBar #(t));                              // interface
   …
endmodule: mkXBar

Butterfly switch: code excerpts

Arbitrary elaboration (here: conditional, recursion, loop)

All constructs can be elaborated (first-class modules, interfaces, rules, …)

module mkXBar #(Integer logn, …) …
   if (logn == 0) …                  // BASE CASE
      FIFO#(t) f <- mkFIFO;
      …
   else …                            // RECURSIVE CASE
      XBar#(t) upper <- mkXBar (logn-1, …);
      XBar#(t) lower <- mkXBar (logn-1, …);
      …
      for (Integer j = 0; j < n; j = j + 1)
         …
         rule route;
            …
            if (! flip) merges [j].iport0.put (x);
            else merges [jFlipped].iport1.put (x);
         endrule
endmodule: mkXBar
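The recursive elaboration can be mirrored by an ordinary recursive function. This C sketch counts what mkXBar would instantiate, assuming (because the loop bound and body are elided in the excerpt) that n = 2^logn and that the recursive case adds one 2x1 merge per loop index j. Under those assumptions an NxN switch (N = 2^logn) contains N FIFOs and logn * N merges, the familiar N log N size of a butterfly network. The name xbar_cost is hypothetical.

```c
#include <stdint.h>

/* Instance counts for a hypothetical mirror of mkXBar's elaboration. */
typedef struct {
    uint64_t fifos;   /* 1x1 base-case FIFOs */
    uint64_t merges;  /* 2x1 merge elements */
} xbar_cost_t;

xbar_cost_t xbar_cost(unsigned logn) {
    if (logn == 0) {                          /* BASE CASE: a single FIFO */
        xbar_cost_t base = { 1, 0 };
        return base;
    }
    xbar_cost_t half = xbar_cost(logn - 1);   /* cost of upper (or lower) */
    uint64_t n = (uint64_t)1 << logn;         /* assumed: n = 2^logn ports */
    xbar_cost_t total;
    total.fifos  = 2 * half.fifos;            /* upper + lower sub-switches */
    total.merges = 2 * half.merges + n;       /* plus one merge per index j */
    return total;
}
```

Solving the recurrence M(k) = 2 M(k-1) + 2^k with M(0) = 0 gives M(k) = k * 2^k, which is what the sketch computes.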

Butterfly switch

(see also whitepaper and/or demo for full code)

Summary:
- Advanced parameterization
- Recursive elaboration
- The switch itself: < 60 lines of BSV code
- First working (tested) prototype: < 1 day (including simple testbench)
- Fully synthesizable: synthesized to netlist (Magma, TSMC 0.18 µm, 500 MHz)

Example: parameterized, pipelined, priority queue (P3Q)

[Figure: a queue with enq (insertion point depends on “priority”) and deq]

Specs:
- Must be synthesizable to quality HW
- Must allow simultaneous (same clock) enq/deq
- Must be parameterized with:
   - Capacity of queue
   - Item type (data type of items being queued)
   - Precise bit representation of item type
   - Priority function (“item1 <= item2”)
   - Pipelined (2-clock) or non-pipelined (1-clock) enq operation (to allow synthesis at a range of clock speeds)
- Pipelining should not affect external enq/deq latency
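To make the enq/deq behavior in the spec concrete, here is a small, non-pipelined software sketch in C. It fixes the capacity at compile time and uses int items with a caller-supplied priority function, both stand-ins for the real parameters. enq shifts lower-priority items toward the tail to open an insertion point, and items of equal priority keep FIFO order. It does not model same-clock enq/deq, bit representations or the 2-clock pipelined variant, which are the hardware-specific parts the BSV version has to handle. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define P3Q_CAPACITY 4                        /* hypothetical capacity parameter */

typedef int item_t;                           /* hypothetical item type */
typedef bool (*leq_fn)(item_t a, item_t b);   /* priority function: "a <= b" */

typedef struct {
    item_t items[P3Q_CAPACITY];               /* items[0] is the next to deq */
    size_t count;
    leq_fn leq;
} p3q;

/* Example priority function: smaller integers come out first. */
bool int_leq(item_t a, item_t b) { return a <= b; }

void p3q_init(p3q *q, leq_fn leq) { q->count = 0; q->leq = leq; }
bool p3q_full(const p3q *q)  { return q->count == P3Q_CAPACITY; }
bool p3q_empty(const p3q *q) { return q->count == 0; }

/* enq: the insertion point depends on priority; returns false when full. */
bool p3q_enq(p3q *q, item_t x) {
    if (p3q_full(q)) return false;
    size_t k = q->count;
    while (k > 0 && !q->leq(q->items[k - 1], x)) {  /* shift items ranked after x */
        q->items[k] = q->items[k - 1];
        k--;
    }
    q->items[k] = x;
    q->count++;
    return true;
}

/* deq: remove the highest-priority item; returns false when empty. */
bool p3q_deq(p3q *q, item_t *out) {
    if (p3q_empty(q)) return false;
    *out = q->items[0];
    for (size_t k = 1; k < q->count; k++) q->items[k - 1] = q->items[k];
    q->count--;
    return true;
}
```

In hardware the shift in p3q_enq would be a parallel compare-and-shift across all slots in one (or, pipelined, two) clocks rather than a sequential loop.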

P3Q in Bluespec SystemVerilog

- Written, tested, synthesized in ~3 days
- About 610 lines of understandable, well-commented code (~400 lines ignoring comments)
- Synthesized at 400 MHz (Magma, TSMC 0.18 µm) (see white paper)
- Compares very well with solutions in any other SW programming language or HDL!

Quote from the expert commercial architect/designer who specified this problem: “I expect this to be a 10X improvement over what we do today”

Advanced clock management

Clock domains:
- Clock abstract type, with static checking of clock compatibility
- So, impossible to connect across clock domains without a synchronizer
- Rich, user-extensible library of synchronizers

Gated clocks, for power management:
- Clock gating conditions contribute to Rule conditions
- So, impossible to communicate with a clock domain that is gated “off”

Power management: multiple clock domains

One of the most effective ways to control power consumption

Divide the design into “islands” or “domains” that use a common clocking discipline

Run each domain at the slowest clock speed that is adequate to meet performance specs

“Gate”-off clocks to domains that are currently not being used

E.g., digital camera circuits in a cell phone when the camera is not in use

Multiple clock domains: typical design rules

- Always use a “synchronizer” at domain boundaries
   - Unless the two clocks only differ in gating (same underlying “oscillator”)
- Do not communicate with a gated-off domain
   - But you may still need to read “most recent values” from before the clock was gated off
- “Ignore” timing violations in synchronizers
   - By definition they violate clock timing discipline
   - “False paths” in synthesis constraints

Multiple clock domains: enforcing design rules in BSV

- BSV treats Clock as a special abstract data type
   - Distinguished from all other types
   - Type-checking ensures that clocks never get mixed up with ordinary signals
   - For clock dividers, BSV provides only “trusted” primitives for deriving the divided clock from an existing clock
   - For clock generation, BSV provides only “trusted” primitives for elevating an ordinary signal into a Clock
- Clocks can be used in expressions, parameters, arguments, arrays, …; type-checking ensures safety

Clock c1;
Clock c2;
Clock c = (b ? c1 : c2);   // b must be known at compile time

Multiple clock domains: enforcing design rules in the design language

- BSV provides primitives to associate a boolean signal with a Clock, as a gating signal
- New gating signals are “ANDed” with existing gating signals
- The compiler keeps track of the fact that c0, c1 and c2 differ only in their gating signals (they have a common oscillator)
   - c0, c1 and c2 are said to be “in the same clock family”

Bool b1 = …;
Bool b2 = …;
Clock c1 <- mkGatedClock (b1, clocked_by c0);
Clock c2 <- mkGatedClock (b2, clocked_by c1);
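The bookkeeping described here can be pictured with a small C model, offered as an illustration rather than a description of the compiler's internals: each clock records its family (the oscillator it derives from) and its current gate value, and a hypothetical gated_clock mirrors mkGatedClock by keeping the parent's family and ANDing the new condition into the parent's gate. Gate values are a snapshot at one instant; in hardware b1 and b2 change over time.

```c
#include <stdbool.h>

/* Hypothetical model of a BSV Clock as tracked for checking purposes. */
typedef struct {
    int  family;  /* id of the underlying oscillator */
    bool gate;    /* true while the clock is running */
} bsv_clock;

/* An ungated clock straight from an oscillator. */
bsv_clock oscillator(int family) {
    bsv_clock c = { family, true };
    return c;
}

/* Mirrors mkGatedClock (cond, clocked_by parent): same family as the parent,
   and the new gating condition is ANDed with the parent's existing gate. */
bsv_clock gated_clock(bool cond, bsv_clock parent) {
    bsv_clock c = { parent.family, parent.gate && cond };
    return c;
}

/* Clocks in the same family differ only in gating: no synchronizer needed. */
bool same_family(bsv_clock a, bsv_clock b) {
    return a.family == b.family;
}
```

With c1 derived from c0 and c2 derived from c1, c2 runs only when both b1 and b2 hold, yet all three remain in one family.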

Multiple clock domains: enforcing design rules in the design language

- When instantiating a module, Clocks can be connected as usual
   - Type-checking ensures that only a Clock signal can be connected to a Clock port
- Statically checked rules also ensure that each Rule-based Interface Method of the instantiated module is clocked by a unique Clock
   - The compiler keeps track of which method is clocked by which Clock

IfcType ifc <- mkModule (…, c1, clocked_by c0);

… ifc.method_A () …   // clocked by c1
… ifc.method_B () …   // clocked by c0

Multiple clock domains: enforcing design rules in the design language

- In every Rule, type-checking ensures that all the methods used in the rule have a “compatible” clock (same clock family)
   - In the rule below, mod1.method1, mod2.method2 and mod3.method3 must have the same clock (or be in the same family)
- If not, the compiler raises a static error
   - If, e.g., mod1.method1 has a different clock, the designer must insert a synchronizer module between mod1.method1 and its use in this rule, to resolve the incompatibility

rule foo (5 < mod1.method1());
   let x = mod2.method2 (True);
   mod3.method3 (x, x+1);
endrule

Multiple clock domains: enforcing design rules in the design language

- In every Rule, clock gating conditions are “ANDed” with the rule condition
   - The rule will not execute if any of the clocks of any of the methods is gated off
   - Therefore, it will not attempt to communicate with a method that is gated off

rule foo (5 < mod1.method1());
   let x = mod2.method2 (True);
   mod3.method3 (x, x+1);
endrule
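Both checks on rule foo can be sketched in C, again as a hypothetical model: rule_clocks_compatible is the static, compile-time check that every method used by a rule belongs to the same clock family, and rule_can_fire is the dynamic condition in which the rule's explicit guard (here, 5 < mod1.method1()) is ANDed with the gate of every method's clock. The methods' own implicit ready conditions are left out to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the clock of one method used in a rule. */
typedef struct {
    int  family;  /* clock family (common oscillator) */
    bool gate;    /* current gating condition */
} method_clock;

/* Static check: all methods used by one rule must be in the same clock family.
   When it fails, BSV reports a compile-time error and a synchronizer is needed. */
bool rule_clocks_compatible(const method_clock *clks, size_t n) {
    for (size_t k = 1; k < n; k++)
        if (clks[k].family != clks[0].family)
            return false;
    return true;
}

/* Dynamic check: the explicit rule condition ANDed with every method's clock gate. */
bool rule_can_fire(bool guard, const method_clock *clks, size_t n) {
    bool fire = guard;
    for (size_t k = 0; k < n; k++)
        fire = fire && clks[k].gate;
    return fire;
}
```

For rule foo, the array would hold the clocks of mod1.method1, mod2.method2 and mod3.method3; if any of them is gated off, the rule simply does not fire.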

Power management: multiple clock domains (summary)

- Today’s SoCs have numerous clock domains:
   - Different IP blocks run at different clock speeds
   - For power management
- Abstract types, type checking, and clock tracking can eliminate many of the common errors made by designers in managing multiple clock domains:
   - Clean clocks: cannot accidentally use a (possibly skewed) signal for a clock
   - Cannot accidentally connect across clock-domain boundaries of unrelated clocks without using a synchronizer
   - Cannot accidentally communicate with a module whose clock is currently gated off

Example: USB 2.0 UTMI

[Figure: USB Host connected over USB 2.0 to a USB Device; the device contains the USB 2.0 Transceiver Macrocell (UTMI), the Serial Interface Engine, and Device Specific Logic]

Source: UTMI specification, version 1.05

USB PHY, includes:
- Data serialization/deserialization
- Bit stuffing
- Clock recovery and synchronization, including 480 Mbps serial mode

UTMI Implementation

[Figure: BSV implementation of the USB 2.0 Transceiver Macrocell. The Serial Interface Engine (SIE, 30 MHz) exchanges Receive/ReceiveWord and Transmit/TransmitWord data with a Receiver (120 MHz) and a Transmitter (120 MHz); an Analog Front End with an Oversampler driven by 480 MHz input clocks (8); a Physical Interface (480/12 MHz) with PhyOut; a generated 12 MHz clock for USB 2.0.]

13 clock domains!

UTMI implementation notes

- Developed by one engineer in 3 months
- Verified with Cadence eVC testbench
- Transmitter & receiver are separable components
- Synthesizes at 480 MHz in TSMC 0.18 µm using Magma, with positive slack

Absolutely no runtime clock debugging!

Reuse

About Reuse

IP reuse has traditionally been difficult because of:
- Inflexibility: an IP block can’t be “adjusted” for a different application
- Imprecision: undocumented scheduling/protocol assumptions

All the language-based ideas we have discussed improve the situation:
- Rules and Rule-based Interface Methods
   - Express complex concurrency across shared resources succinctly and naturally
   - Eliminate typical control-logic design errors, including race conditions, by automatically synthesizing the correct control logic
- Types, type-checking and clock-checking eliminate careless mistakes by designers
- Polymorphism and parameterization allow defining generic IP blocks that can be instantiated in widely differing contexts
- Full-power static elaboration allows very succinct expression of regular structures, dramatically reducing code size and eliminating tedium and careless mistakes in “cut-and-paste” manual replication

Bluespec SystemVerilog for Testbenches

Verification is still a bottleneck

- TB complexity grows along with exploding complexity in DUTs
   - Complex TB behaviors (simultaneous stimulus on multiple ports, pipelining, out-of-order processing)
   - Mixing new and old IPs in SoCs
   - Inadequate facilities to construct libraries of common TB design patterns
- Inadequate interface semantics
   - Complex data types
   - Complex interface protocols
   - Difficult to refine from TLM to implementation level
- Limited parameterization, and therefore limited reuse of Verification IPs, transactors, etc.

Bluespec’s strengths can remove these bottlenecks

BSV improves verification: for the Testbench

Testbenches enjoy the same benefits:
- Express complex concurrency correctly with Rules
- State-machine generation
   - Succinct expression of stimulus patterns
- Correct connection to DUT
   - Interface Methods are naturally transactional
   - Interface abstraction allows high-level interfaces
   - No interface timing errors
   - Clock discipline
- Reuse due to parameterization

State machine generation

- Easy to specify precise orchestration of stimulus: sequencing, parallelism, iteration
- Same Rule semantics: automatically flow-controlled, robust to latency variations, etc.

// Specify an FSM generating a test sequence.
// i and j are loop registers (e.g. Reg#(int)) declared elsewhere;
// NI, NJ, gen_packet, send_packet, pkt0 and pkt1 are also defined elsewhere.
Stmt test_seq =
   seq
      for (i <= 0; i < NI; i <= i + 1)          // each input
         for (j <= 0; j < NJ; j <= j + 1)       // each output
            action
               let pkt <- gen_packet ();
               send_packet (i, j, pkt);         // test i-j path in isolation
            endaction
      par // test packet arbitration by sending packets in parallel
         send_packet (0, 1, pkt0);              // to output 1
         send_packet (1, 1, pkt1);              // to output 1 (collision)
      endpar
   endseq;

// Generate the FSM
mkAutoFSM (test_seq);
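For comparison, this is roughly the state machine a designer would otherwise write by hand for test_seq, sketched in C with hypothetical names and NI = NJ = 2 chosen arbitrarily. One fsm_step call stands for one FSM step, and the ready flag models the flow control that mkAutoFSM inherits from Rule semantics: the FSM simply stalls when the DUT cannot accept a packet. The sketch issues both par sends in a single step, whereas in hardware the scheduler may serialize branches that conflict, and it records (input, output) pairs instead of real packets.

```c
#include <stdbool.h>

#define NI 2                      /* assumed number of inputs */
#define NJ 2                      /* assumed number of outputs */
#define MAX_SENDS (NI * NJ + 2)

typedef enum { S_LOOP, S_PAR, S_DONE } fsm_state;

typedef struct {
    fsm_state state;
    int i, j;                                   /* the loop registers of test_seq */
    int sent_in[MAX_SENDS], sent_out[MAX_SENDS];
    int n_sent;
} test_fsm;

void fsm_init(test_fsm *f) {
    f->state = S_LOOP;
    f->i = 0;
    f->j = 0;
    f->n_sent = 0;
}

static void record_send(test_fsm *f, int in, int out) {
    f->sent_in[f->n_sent] = in;
    f->sent_out[f->n_sent] = out;
    f->n_sent++;
}

/* One FSM step; if the DUT is not ready, the FSM stalls in its current state. */
void fsm_step(test_fsm *f, bool ready) {
    if (!ready) return;
    switch (f->state) {
    case S_LOOP:                                /* nested for: test the i-j path */
        record_send(f, f->i, f->j);
        if (++f->j == NJ) {
            f->j = 0;
            if (++f->i == NI) f->state = S_PAR;
        }
        break;
    case S_PAR:                                 /* par ... endpar: both sends */
        record_send(f, 0, 1);
        record_send(f, 1, 1);
        f->state = S_DONE;
        break;
    case S_DONE:
        break;
    }
}
```

Even this small version needs explicit state encoding and stall handling; the seq/par/for description leaves both to the compiler.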

Example:

An Ethernet MAC testbench is created that corresponds to an existing SV TB. The testbench is then quickly extended to create a switch for more realistic testing, at a fraction of the effort it would take to write and debug the original SV TB.

MAC Testbench Structure

[Figure: the DUT (MAC) with Slave WB IFC, Master WB IFC, MII interface and interrupts; a PHY with Frame Source/Sink on the MII interface (test DUT receiving packets); a RAM with Slave WB IFC; SWEM (Software Emulator) with Master WB IFC and Frame Source/Sink (test DUT transmitting packets). Blocks are marked as either Bluespec or Verilog95.]

Adding concurrency

Original SV Tb: untimed; ~7000 lines of code; no concurrency management; stand-alone checking
New Tb: timed; ~2600 lines of code; generalized Wishbone model; includes infrastructure to handle concurrency
Generalized Switch: router environment; parameterized verification environment with concurrency

Extended Example

- Combine DUTs into router/switch
   - Multiple DUTs
   - Packet routing across Wishbone bus
   - Wishbone now includes round-robin arbiter
- Little additional code required
   - Wishbone bus etc. already generalized
   - Instantiate multiple DUTs
   - Add Arbiter/Bank
   - Add serialization code (Frame -> WB)
- Original: ~2583 relevant lines
- Modified: ~2957 relevant lines

Original Testbench Structure

[Figure: the DUT (MAC) with MII interface, Master WB IFC and Slave WB IFC; a PHY with Frame Source/Sink and interrupts; a RAM with Slave WB IFC; SWEM (Software Emulator) with Master WB IFC and Frame Source/Sink. Blocks are marked as either Bluespec or Verilog95.]

MAC Extended Example (as Router)

[Figure: a Wishbone bus with an Arbiter and an Address Bank connecting three copies of the current TB; each copy adds a Frame Source/Sink, a WB Serializer and an M/S WB IFC. Blocks are marked as either Bluespec or Verilog95.]

BSV for SoC (System on a Chip) design, a.k.a. ESL (Electronic System Level) design

Today’s chips: “SoC”s (System on a Chip)

“IP” blocks (“Intellectual Property”):
- Processors
- Caches, memories
- Interconnects
- DMAs
- Other peripheral blocks
- I/O blocks

E.g., cell phones, cell network base stations, TV set-top boxes, iPods, digital cameras, …

[Figure: example SoC with a system bus and a peripheral bus joined by a bus bridge; blocks include processor, L2 cache controller, memory controller (DRAM, SRAM), DMA controller, DSP, power management, arbitration, application-specific logic, serial controller, audio, video, and a flash/memory I/F bus controller]

Design Issues

- Complex tradeoffs in deciding architectures; need early HW architecture metrics:
   - Processor power, cache organization, bus and interconnect sizing, latencies, throughputs
   - Pipelined transactions, bursts, out-of-order processing
- SW development needs to begin before HW is ready
   - Simulation speed (“boot the OS on the processor and run the video app through the MPEG decoder HW IP block”)
   - Simulation speed is inversely related to the level of detail being simulated

TLM: Transaction Level Models

- TLM is a level of abstraction well above the hardware implementation level, based on “transactions” at module interfaces
   - E.g., “send an Ethernet packet”, “read a disk sector”
   - Instead of: “send a byte/word”; wait for RDY, assert DATA_IN, assert ENABLE
- Advantages:
   - Models can be built quickly
   - Capture essential functionality and essential structure
   - Provide an environment for early development of embedded software
   - Much faster simulation
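The contrast between a transaction and its pin-level realization fits in a few lines of C. Both functions below deliver the same packet to a hypothetical receiver: tlm_send_packet does it in one call, while pin_send_packet walks through the RDY / DATA_IN / ENABLE handshake one clock at a time and reports the cycles used. The receiver's RDY pattern (high on even cycles) is an arbitrary assumption chosen to show back-pressure; none of these names come from a real library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PKT 64

/* Hypothetical receiver shared by both models. */
typedef struct { uint8_t bytes[MAX_PKT]; size_t len; } receiver;

/* Transaction level: "send an Ethernet packet" is a single call. */
void tlm_send_packet(receiver *rx, const uint8_t *pkt, size_t len) {
    for (size_t k = 0; k < len && rx->len < MAX_PKT; k++)
        rx->bytes[rx->len++] = pkt[k];
}

/* Pin level: one byte per RDY/ENABLE handshake; returns clock cycles used. */
unsigned pin_send_packet(receiver *rx, const uint8_t *pkt, size_t len) {
    bool rdy, enable;
    uint8_t data_in = 0;
    unsigned cycle = 0;
    size_t k = 0;
    while (k < len && rx->len < MAX_PKT) {
        rdy = (cycle % 2 == 0);       /* assumed receiver: ready on even cycles */
        enable = false;
        if (rdy) {                    /* sender waits for RDY ... */
            data_in = pkt[k];         /* ... drives DATA_IN ... */
            enable = true;            /* ... and asserts ENABLE */
        }
        if (rdy && enable) {          /* receiver samples at the clock edge */
            rx->bytes[rx->len++] = data_in;
            k++;
        }
        cycle++;
    }
    return cycle;
}
```

A simulator running the first version executes one call per packet; the second costs a loop iteration per clock, which is one reason transaction-level models simulate so much faster.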

Ideal: one consistent platform for system exploration & design

[Figure: several architectural alternatives along the architecture dimension; along the abstraction/refinement dimension, each is refined from Transaction Models down to an Implementation]

BSV: single-language methodology

- BSV tools allow embedding C code (for embedded SW, early modelling)
- Interface methods are naturally “transactional”
- Interfaces can express complex interactions
- Rules are naturally “reactive”
- Types, parameterization, abstraction comparable to C++
- Rules make it easier to express complex concurrency (due to atomicity)
- HW metrics available from the beginning, for architecture decisions
- Good HW synthesis exists
- Single-language environment, with strong semantics to enable disciplined refinement, testbench reuse, etc.

[Figure: Transaction Model in BSV (with embedded C), refined into an HW Implementation in BSV]

The importance of rapid architecture exploration

Can you estimate the hardware size of an IP block just by looking at the spec?

Let’s look at what happened in three actual design activities:
- LPM (Longest Prefix Match) in an Internet packet router
- MIPS processor 2-stage pipeline
- 802.11a transmitter

A lookup table (sparse tree) for LPM

Prefixes stored in the table:
   A   7.14.*.*
   B   7.14.7.3
   C   10.18.200.*
   D   10.18.200.5
   E   5.*.*.*
   F   * (default)

Example lookups:
   IP address     Result
   7.13.7.3       F
   10.18.201.5    F
   7.14.7.2       A
   5.13.7.2       E
   10.18.200.7    C

[Figure: the sparse lookup tree encoding these prefixes]

Real-world lookup algorithms are more complex, but all make a sequence of dependent memory references.

Software version of LPM (in C)

int lpm (IPADDRESS ipa)
{
    int p;

    p = RAM [ipa >> 16];                  // level 1 lookup (16b)
    if (isLeaf(p)) return p;

    p = RAM [p + ((ipa >> 8) & 0xFF)];    // level 2 lookup (8b)
    if (isLeaf(p)) return p;

    p = RAM [p + (ipa & 0xFF)];           // level 3 lookup (8b)
    return p;
}

Note: the C code says nothing about good microarchitectures for HW implementation
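A self-contained, runnable variant of the lookup, with its assumptions made explicit: RAM holds 32-bit entries, a set top bit marks a leaf whose low bits are the result, and any other value is the base index of a 256-entry sub-table. build_table loads the example prefixes A to F from the lookup-table slide, and this lpm also reports how many dependent memory references each lookup made. LEAF_FLAG, build_table, ip and lookup are hypothetical names.

```c
#include <stdint.h>

#define LEAF_FLAG 0x80000000u            /* assumption: top bit marks a leaf */
#define L1_SIZE   (1u << 16)             /* level 1: indexed by ipa >> 16 */
#define RAM_SIZE  (L1_SIZE + 4u * 256u)  /* plus four 256-entry sub-tables */

static uint32_t RAM[RAM_SIZE];

static int is_leaf(uint32_t p) { return (p & LEAF_FLAG) != 0; }
static uint32_t leaf(char result) { return LEAF_FLAG | (uint32_t)(unsigned char)result; }

uint32_t ip(unsigned a, unsigned b, unsigned c, unsigned d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Three dependent lookups (16b, 8b, 8b); *refs counts memory references. */
uint32_t lpm(uint32_t ipa, int *refs) {
    uint32_t p = RAM[ipa >> 16];              /* level 1 lookup (16b) */
    *refs = 1;
    if (is_leaf(p)) return p;
    p = RAM[p + ((ipa >> 8) & 0xFF)];         /* level 2 lookup (8b) */
    *refs = 2;
    if (is_leaf(p)) return p;
    p = RAM[p + (ipa & 0xFF)];                /* level 3 lookup (8b) */
    *refs = 3;
    return p;
}

char lookup(uint32_t ipa, int *refs) { return (char)(lpm(ipa, refs) & ~LEAF_FLAG); }

/* Loads F = * (default), E = 5.*.*.*, A = 7.14.*.*, B = 7.14.7.3,
   C = 10.18.200.*, D = 10.18.200.5. */
void build_table(void) {
    const uint32_t t714 = L1_SIZE, t7147 = L1_SIZE + 256;
    const uint32_t t1018 = L1_SIZE + 512, t1018200 = L1_SIZE + 768;
    for (uint32_t k = 0; k < L1_SIZE; k++) RAM[k] = leaf('F');
    for (uint32_t k = 0; k < 256; k++) {
        RAM[(5u << 8) | k] = leaf('E');
        RAM[t714 + k] = leaf('A');
        RAM[t7147 + k] = leaf('A');
        RAM[t1018 + k] = leaf('F');
        RAM[t1018200 + k] = leaf('C');
    }
    RAM[(7u << 8) | 14u] = t714;
    RAM[t714 + 7] = t7147;
    RAM[t7147 + 3] = leaf('B');
    RAM[(10u << 8) | 18u] = t1018;
    RAM[t1018 + 200] = t1018200;
    RAM[t1018200 + 5] = leaf('D');
}
```

The reference counts (1 for 5.13.7.2, 2 for 10.18.201.5, 3 for 7.14.7.2) show the data-dependent number of memory reads that makes this function awkward for a fixed hardware schedule.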

Longest Prefix Match for IP lookup

- Static pipeline: inefficient memory usage but simple design
- Linear pipeline: efficient memory usage through memory port replicator
- Circular pipeline: efficient memory with most complex control

Designer’s ranking: 1, 2, 3. Which is “best”?

Arvind, Nikhil, Rosenband & Dave, ICCAD 2004

Even for such a small function, there are 3 dramatically different architectures (and no doubt many more possibilities)

Synthesis results

- Microarchitecture is by far the most significant determinant of HW quality
- Even for an apparently “fixed” microarchitecture, clever microarchitecture optimization can have a dramatic effect (Static V, I vs Static V, II)

LPM version     Best area (gates)    Best speed (ns)
Static V, I     8898                 3.60
Static V, II    2271                 3.56
Static BSV      2391 (5% larger)     3.32 (7% faster)
Linear V        14759                4.7
Linear BSV      15910 (8% larger)    4.7 (same)
Circular V      8103                 3.62
Circular BSV    8170 (1% larger)     3.67 (2% slower)

V = Verilog; BSV = Bluespec SystemVerilog; TSMC 0.18 µm

(In)applicability of “behavioral synthesis”

Traditional “behavioral synthesis” has a hard time with this example (just 10 lines of C!)

It is hard to analyze a variable number of memory reads that are data-dependent on one another

It is hard to interleave them to access a single shared resource (memory)

Designer creativity was needed to improve “Static V, I” into “Static V, II” (clever sharing of the state machine)

Designer creativity was needed to come up with the circular pipeline

Page 150

Design Activity 2

MIT postgraduate course: 6.884 Complex Digital Systems, Spring 2005 (see http://csg.csail.mit.edu/6.884/index.html)

Lab task: design and synthesize a simple MIPS 2-stage processor pipeline

Can there really be much variation in this? The next slide shows the variation in HW quality across the different lab project teams.

Page 151

Lab 2 Results

Pareto-Optimal Points

Source: http://csg.csail.mit.edu/6.884/lab2-results.html

Page 152

802.11a transmitter

Page 153

802.11a: What’s the optimal implementation for power, area, performance?

802.11a Wi-Fi transmitter targeted at a wireless platform

Final design: 4 milliwatts

Source: Dave, Pellauer, Gerding & Arvind

[Figure: flow from power characterization to RTL for a new macro-/micro-architecture. Transmitter blocks: Controller, Scrambler, Encoder, Interleaver, Mapper, IFFT, Cyclic Extend. The IFFT accounts for > 95% of the area.]

Page 154

IFFT: micro-architectural exploration

[Figure: 64-point IFFT datapath. Inputs in0 to in63 pass through three stages of 16 radix4 blocks each (48 in all), together with Permutation 0, Permutation 1 and Permutation 2, producing outputs out0 to out63.]

Sharing radix4’s?

Folding stages?

Each stage’s 16 radix4 blocks could also be implemented with 8, 4, 2 or 1 radix4 block(s) used over multiple cycles

The stages are almost identical, so why not fold them and reuse what you can?

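[Figure: one radix4 block. Inputs x[0] to x[3] are multiplied (X) by twiddle factors t[0] to t[3], then combined by two layers of adders/subtractors (+, -, +, - and +, +, -, -), with one rotation (*I).]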

Each of the 48 radix4 blocks looks like this
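A minimal C++ sketch of one radix4 block as drawn: four twiddle multiplications, two layers of adders/subtractors, and one multiplication by +i. The exact equations are not on the slide; this sketch assumes the block computes an unscaled 4-point inverse DFT of a[n] = x[n]·t[n], that is y[k] = Σ a[n]·i^(nk), and the function name `radix4` is hypothetical.

```cpp
#include <cassert>
#include <complex>

using cplx = std::complex<double>;

// One radix-4 node (assumed form): y[k] = sum over n of x[n]*t[n]*i^(n*k).
void radix4(const cplx x[4], const cplx t[4], cplx y[4]) {
  // Twiddle multiplications (the four X boxes).
  const cplx a0 = x[0] * t[0], a1 = x[1] * t[1];
  const cplx a2 = x[2] * t[2], a3 = x[3] * t[3];
  // First add/subtract layer.
  const cplx s02 = a0 + a2, d02 = a0 - a2;
  const cplx s13 = a1 + a3;
  const cplx d13 = (a1 - a3) * cplx(0.0, 1.0);  // rotation: multiply by +i
  // Second add/subtract layer.
  y[0] = s02 + s13;
  y[1] = d02 + d13;
  y[2] = s02 - s13;
  y[3] = d02 - d13;
}
```

Sharing this one function across all 48 positions, with different twiddles and a permutation between stages, is exactly the reuse that the folding question above is asking about.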

Page 155

Superfolded circular pipeline: just one Radix-4 node!

[Figure: inputs in0 to in63 and outputs out0 to out63; a single Radix 4 node; Permute_1, Permute_2 and Permute_3; a Stage Counter (0 to 2) and an Index Counter (0 to 15); 64 4-way muxes, 4 16-way muxes and 4 16-way demuxes.]

Designer intuition:

The most efficient design (just one radix-4 node) should have the lowest power
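The two counters in the figure sequence the single node. A minimal C++ sketch, assuming the node evaluates exactly one (stage, index) pair per clock (the names `SuperfoldCounters` and `clocksPerSymbol` are hypothetical):

```cpp
#include <cassert>
#include <set>
#include <utility>

// Stage counter (0 to 2) and index counter (0 to 15) driving one radix-4 node.
struct SuperfoldCounters {
  int stage = 0;
  int index = 0;
  // One clock: the node works on (stage, index), then the counters advance.
  // Returns true when the last node of the last stage has just been processed.
  bool tick() {
    if (index < 15) { ++index; return false; }
    index = 0;
    if (stage < 2) { ++stage; return false; }
    stage = 0;
    return true;
  }
};

// Counts clocks per 64-point symbol, checking each (stage, index) is visited once.
int clocksPerSymbol() {
  SuperfoldCounters c;
  std::set<std::pair<int, int>> visited;
  int clocks = 0;
  bool done = false;
  while (!done) {
    const bool fresh = visited.insert({c.stage, c.index}).second;
    assert(fresh);
    ++clocks;
    done = c.tick();
  }
  assert(visited.size() == 48);
  return clocks;
}
```

Under this assumption one symbol takes 3 × 16 = 48 clocks, which is the throughput the results table reports for the 1-radix4 folded design.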

Page 156

Synchronous pipeline

rule sync_pipeline (True);   // implicit conditions: inQ not empty, outQ not full
   inQ.deq();
   sReg1 <= f1(inQ.first());
   sReg2 <= f2(sReg1);
   outQ.enq(f3(sReg2));
endrule

[Figure: inQ → f1 → sReg1 → f2 → sReg2 → f3 → outQ]

This is real IFFT code; just replace f1, f2 and f3 with stage_f code
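To see what the rule does on each clock, here is a hypothetical C++ model of one firing. BSV register writes (`<=`) take effect only after the rule completes, so every read in the rule sees the old value; the model mimics this by computing all new values before committing them. `f1`, `f2` and `f3` are placeholder functions, and the queues are reduced to one input and one output value per firing.

```cpp
#include <cassert>

// Placeholder stage functions (hypothetical).
int f1(int x) { return x + 1; }
int f2(int x) { return x * 2; }
int f3(int x) { return x - 3; }

struct SyncPipeline {
  int sReg1 = 0, sReg2 = 0;
  // One firing of sync_pipeline: consumes one input, produces one output.
  int fire(int in) {
    const int out = f3(sReg2);    // outQ.enq(f3(sReg2)) reads the old sReg2
    const int next2 = f2(sReg1);  // sReg2 <= f2(sReg1) reads the old sReg1
    const int next1 = f1(in);     // sReg1 <= f1(inQ.first())
    sReg1 = next1;                // all register updates commit together
    sReg2 = next2;
    return out;
  }
};
```

In this model the first two outputs are computed from the registers' initial contents, not from real inputs; from the third firing on, each output is f3(f2(f1(x))) of the input two firings earlier.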

Page 157

Folded pipeline

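[Figure: inQ and sReg feed one function block f (f1, f2 or f3, selected by the stage register); its output goes either to outQ or back into sReg.]

rule folded_pipeline (True);
   SData sxIn;                         // SData: hypothetical type of the stage data
   if (stage == 1) begin
      inQ.deq();
      sxIn = inQ.first();
   end
   else
      sxIn = sReg;
   SData sxOut = f(stage, sxIn);
   if (stage == 3) outQ.enq(sxOut);
   else sReg <= sxOut;
   stage <= (stage == 3) ? 1 : stage + 1;
endrule

function SData f (Bit#(2) stage, SData sx);
   case (stage)
      1: return f1(sx);
      2: return f2(sx);
      default: return f3(sx);          // stage 3
   endcase
endfunction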

This is real IFFT code too ...
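The same three functions, time-multiplexed through one f block, in a hypothetical C++ model of `folded_pipeline` (placeholder f1, f2, f3; the queues reduced to an input argument and an output flag). The input is consumed only in stage 1 and a result is produced only in stage 3.

```cpp
#include <cassert>

// Placeholder stage functions (hypothetical).
int f1(int x) { return x + 1; }
int f2(int x) { return x * 2; }
int f3(int x) { return x - 3; }

// f(stage, sx): selects the stage function, as the BSV function f does.
int f(int stage, int sx) {
  switch (stage) {
    case 1: return f1(sx);
    case 2: return f2(sx);
    default: return f3(sx);  // stage 3
  }
}

struct FoldedPipeline {
  int stage = 1;
  int sReg = 0;
  // One firing. `in` is used only in stage 1 (inQ.deq/first); returns true
  // and sets `out` only in stage 3 (outQ.enq).
  bool fire(int in, int& out) {
    const int sxIn = (stage == 1) ? in : sReg;
    const int sxOut = f(stage, sxIn);
    bool produced = false;
    if (stage == 3) { out = sxOut; produced = true; }
    else sReg = sxOut;
    stage = (stage == 3) ? 1 : stage + 1;
    return produced;
  }
};
```

One result appears every three firings, from one f circuit instead of three: the area-versus-throughput trade-off that the folded IFFT designs explore.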

Page 158

Performance results

7 combinations created and explored within 5 days

Designers were astounded to find that their intuitions were wrong and that the critical areas for reducing power were not where they suspected

802.11a design (by IFFT block type) | Area (mm²) | Symbol latency (cycles) | Throughput (clks/symbol) | Min. frequency required (MHz) | Average power (mW)
Combinational | 4.91 | 10 | 4 | 1.0 | 3.99
Pipelined | 5.25 | 12 | 4 | 1.0 | 4.92
Folded, 16 radix4 | 3.97 | 12 | 4 | 1.0 | 7.27
Folded, 8 radix4 | 3.69 | 15 | 6 | 1.5 | 10.9
Folded, 4 radix4 | 2.45 | 21 | 12 | 3.0 | 14.4
Folded, 2 radix4 | 1.84 | 33 | 24 | 6.0 | 21.1
Folded, 1 radix4 | 1.52 | 57 | 48 | 12.0 | 34.6

Original designer intuition: Folded, 1 radix4. Optimal power: Combinational.
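The throughput and minimum-frequency columns follow from simple arithmetic, sketched below in C++. The assumptions: a 64-point IFFT needs 3 × 16 = 48 radix-4 node evaluations per symbol, an 802.11a OFDM symbol lasts 4 µs, and the rest of the transmitter caps throughput at 4 clks/symbol. That cap is inferred from the table (the unfolded designs all show 4), not stated on the slide.

```cpp
#include <algorithm>
#include <cassert>

// A 64-point IFFT built from radix-4 nodes needs 3 stages x 16 nodes
// = 48 node evaluations per OFDM symbol.
constexpr int kNodeEvalsPerSymbol = 3 * 16;

// Assumption (inferred from the table): the rest of the transmitter
// limits throughput to 4 clks/symbol.
constexpr int kOtherBlocksClksPerSymbol = 4;

// 802.11a OFDM symbol period: 4 us (3.2 us of data plus a 0.8 us guard interval).
constexpr double kSymbolPeriodUs = 4.0;

// Clocks per symbol when the IFFT uses `radix4Nodes` physical radix-4 blocks.
int clksPerSymbol(int radix4Nodes) {
  return std::max(kNodeEvalsPerSymbol / radix4Nodes, kOtherBlocksClksPerSymbol);
}

// Minimum clock frequency to finish one symbol every 4 us.
double minFrequencyMHz(int radix4Nodes) {
  return clksPerSymbol(radix4Nodes) / kSymbolPeriodUs;  // clocks per us = MHz
}
```

For the 1-radix4 design this gives 48 clks/symbol and 12 MHz, matching the table's last row. The power column, by contrast, is measured and cannot be derived this way: the higher clock frequency needed by the folded designs is one plausible reason their power rises as area falls.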

Page 159

BSV Advantages

It is essential to do architectural exploration for better (area, power, performance, ...) designs

Bluespec enables rapid architectural exploration

Fast, low-effort, low-risk changes enable:

Rapid architectural/micro-architectural exploration and optimization

Nimble responses to: feature/spec changes, timing closure challenges, bug fixes, area optimizations

Page 160

Architecture exploration: summary

Despite the self-image of many experienced engineers, there is a wide margin of error in estimating the size of IP blocks without actually prototyping them (working out their microarchitectures)

A bad estimate will leave you stuck with a sub-optimal design

So transaction-level models, and their quick refinement to realistic hardware, are essential for accurate evaluation of candidate architectures

It is essential to have a design language that supports:
High levels of abstraction
High levels of static checking and elaboration
Synthesis from a high level into quality hardware

Page 161

BSV for architectural exploration

Rules and Interface Methods are “transactional” in nature

Can be written at a very high level (in addition to the microarchitectural level)

E.g., module interconnection using highly parameterized Get/Put interfaces:
From complete packets to bits
Similar to SystemC TLM

Clear semantics for splitting, joining, adding and removing:
Rich theory developed over many decades in Computer Science
Enables disciplined refinement

Page 162

Methods and Transaction Level Modeling

Each method can be read as a transaction that can be applied against a module

By just changing the level of abstraction of the arguments and results, we can move from realistic hardware to high-level models, using the single paradigm of methods

Get#(Bit#(16)) m <- mkM;
Put#(Bit#(16)) n <- mkN;

rule r1 (…);
   Bit#(16) x <- m.get();
   n.put (x);
endrule

Get#(EtherPacket) m <- mkM;
Put#(EtherPacket) n <- mkN;

rule r1 (…);
   EtherPacket x <- m.get();
   n.put (x);
endrule
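A hypothetical C++ analogue shows the idea of one rule body parameterized over the transaction type. `Get`, `Put`, `Fifo`, `r1` and `EtherPacket` below are illustrative stand-ins, not the BSV library or any SystemC TLM API, and the explicit rule guard elided as `(…)` on the slide is omitted. Each method call is a transaction, and the rule fires only when both methods are ready, mirroring the implicit conditions of BSV methods.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical analogues of BSV's Get#(t) and Put#(t) interfaces.
template <typename T> struct Get {
  virtual ~Get() = default;
  virtual bool canGet() const = 0;   // implicit condition of get()
  virtual T get() = 0;
};
template <typename T> struct Put {
  virtual ~Put() = default;
  virtual bool canPut() const = 0;   // implicit condition of put()
  virtual void put(const T& x) = 0;
};

// A bounded FIFO offering both interfaces (stand-in for mkM / mkN).
template <typename T> struct Fifo : Get<T>, Put<T> {
  explicit Fifo(std::size_t cap) : capacity(cap) {}
  bool canGet() const override { return !q.empty(); }
  T get() override { T x = q.front(); q.pop_front(); return x; }
  bool canPut() const override { return q.size() < capacity; }
  void put(const T& x) override { q.push_back(x); }
  std::deque<T> q;
  std::size_t capacity;
};

// Rule r1, written once for any transaction type T.
template <typename T> bool r1(Get<T>& m, Put<T>& n) {
  if (!m.canGet() || !n.canPut()) return false;  // rule not enabled
  n.put(m.get());                                // get and put happen together
  return true;
}

// Hypothetical packet type for the high-level model.
struct EtherPacket { std::string payload; };
```

Moving from `uint16_t` to `EtherPacket` changes only the type argument; the rule body is untouched, which is the single-paradigm point the slide makes.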

Page 163

Example:

A reference platform is created to test a device driver for a hard disk microdrive. The reference platform allows either the device driver or the hardware model to be swapped out for the actual implementation. The model is instrumented with assertions for monitoring transactions.

Page 164

Creating a Reference Platform

RoS (Rest of System): periodically initiates a disk sector R/W transfer and continues concurrent activity (non-blocking); handles callbacks asynchronously

DD (Device Driver): converts sector transfer requests into the IDE protocol, consisting of IDE register R/Ws and responding to IDE HW interrupts

HW (IDE disk): models the IDE register command block and sector data buffer, and their behavior in response to IDE commands written into the IDE command register

Monitor: monitors all inter-block traffic and checks for immediate and temporal correctness conditions

[Figure: the System comprises RoS, DD and HW, linked by disk sector R/W requests and callbacks (between RoS and DD) and by IDE register R/Ws, data for PIO register reads, and interrupts (between DD and HW).]

Page 165

Using the reference platform, replacing DD with real C code

RoS (Rest of System), DD (Device Driver), HW (IDE disk), Monitor

SystemC cosim: in the SystemC simulator, all written in C/C++/SystemC; DD now written in C, alongside other C/C++/SystemC code

In Bluesim: all written in BSV (same reference model code)

Interface communication shim code is generated automatically by the Bluespec compiler

Communications are simple function calls

Understanding IDE: 2 weeks
Coding and verification: 5 days
Integrating “C” driver: 3 days

Page 166

Example: AMBA AHB bus system, from transactional level to implementation level

Page 167

E.g., AMBA AHB bus: transactional level (get/put)

[Figure: Master Block and Slave Block with master and slave transactional interfaces, connected either through the bus master-side and slave-side transactional interfaces or by a direct transactional interconnect (for faster simulation).]

Page 168

E.g., AMBA AHB bus: mixed transactional/implementation levels

[Figure: a Master Block and Slave Block with implementation-level master/slave interfaces connect to the AHB through the bus master-side and slave-side interfaces; a Master Block and Slave Block that keep transactional interfaces connect to the AHB through adapters.]

Page 169

AMBA AHB bus: implementation level

[Figure: Master Blocks and Slave Blocks connect through their master/slave interfaces to the bus master-side and slave-side interfaces of the AHB.]

Page 170

Bluespec SystemVerilog Agenda: Technical Deep Dive

Intro: why an HDL can affect overall productivity, from concept to silicon
Behavior:
   Rules: a new way to express HW behavior
   Correctness: why rules help
   Comparison with behavioral synthesis
   Rule-based Interface Methods: modularizing rules
Structure: improving the expression of HW structure using ideas from advanced programming languages
Clock domains and gated clocks: compiler-guaranteed safety
Testbenches using BSV
Transaction Level Modeling/architecture exploration and refinement, within a single paradigm
Comparison with SystemC
Synthesis quality: as good as hand-coded RTL
Tool flows
Coexistence with Verilog/VHDL/SV/SystemC
Futures:
   Integration of Rules and Rule-based Interfaces into SystemC
   Formal verification

Page 171

Many proof points demonstrating:
General applicability
Productivity
HW quality

Page 172

Designs with Bluespec

[Figure: two SoC block diagrams (system bus, peripheral bus, bus bridge, processor, memory controller, DMA controller, DSP, power management, arbitration, application-specific blocks, DRAM/SRAM, L2 cache controller, serial controller, audio/video, flash/memory I/F, bus controller), with blocks classified as control, complex datapaths (e.g., processor/controller) and algorithms (e.g., DSP/math).]

Bluespec is the only next-generation solution that addresses control and complex datapaths; everyone else addresses only the algorithms (e.g., DSP/math) application space.

Bluespec has been used for every design listed:
“RISC” processors: MIPS, Itanium, PowerPC, ARM
L2 cache ctlr
Bus converters
AMBA DMA ctlr
802.11a, network proc, queuing engines, sorting queue, arbiter, IP lookup, debug controller
PCI Express, USB
Pixel processor, waveform generator, Pong
IDCT, motion compensator, DES, MPEG-4, IFFT
DDR2 ctlr, SRAM ctlr
FIR filter
I2C, PCI-X
OCP interconnect

Page 173

Bluespec vs. Hand-coded RTL

[Figure: two histograms of the number of test cases, binned by Bluespec area (area optimized) and Bluespec speed (speed optimized) relative to hand-designed RTL, from 10%+ smaller/faster through -0.5 to 0.5% to 10%+ larger/slower. Area optimized: 18 designs the same or better, 7 designs larger. Speed optimized: 20 designs the same or better, 5 designs slower.]

Test case | Hand-coded RTL (area) | Bluespec RTL (area) | Hand-coded RTL (time) | Bluespec RTL (time)
1 Gray code converter | 9 | 9 | 1.56 | 1.56
2 Priority encoder | 21 | 21 | 2 | 2
3 Parity checker | 23 | 23 | 3.94 | 3.94
4 Read/write FSM | 20 | 20 | 3.93 | 3.93
5 Barrel shifter | 34 | 34 | 1.65 | 1.65
6 Speed FSM | 33 | 33 | 4.86 | 4.75
7 Ripple adder | 86 | 86 | 10.98 | 10.98
8 Angular FSM | 66 | 63 | 4.97 | 4.99
9 One-hot encoded FSM | 99 | 116 | 5.76 | 5.97
10 Pattern detector | 67 | 67 | 5.98 | 5.74
11 Wallace multiplier | 142 | 141 | 9.6 | 9.91
12 Handshake protocol | 109 | 112 | 5.99 | 5.98
13 Traffic light controller | 215 | 211 | 9.93 | 10
14 Rotors controller | 259 | 249 | 8.93 | 9
15 Sequential multiplier | 340 | 361 | 8.93 | 8.93
16 Shift adder | 391 | 399 | 10 | 9.85
17 Three-way round-robin | 399 | 385 | 9.73 | 9.06
18 Divider | 350 | 347 | 27.65 | 27.65
19 Cache coherence | 352 | 382 | 10.36 | 10.5
20 Booth multiplier | 974 | 822 | 14.91 | 14.92
21 Fibonacci | 914 | 877 | 14.45 | 13.9
22 LIFO | 1764 | 1850 | 7.99 | 7.99
23 FIFO1 | 1926 | 2018 | 14.98 | 14.2
24 Factorial | 1611 | 1605 | 34.97 | 34.95
25 Random number generator | 9278 | 8947 | 79.51 | 79.88
Totals | 19482 | 19178 | 313.56 | 312.23

Area columns: area-optimized synthesis; time columns: speed-optimized synthesis.
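The headline numbers can be re-derived from the table. The C++ sketch below uses the table's data verbatim; the 0.5% tolerance matches the histograms' middle bin, and the helper names are hypothetical. It counts the designs where Bluespec came out larger or slower and recomputes the totals.

```cpp
#include <cassert>
#include <cstddef>

// {hand-coded area, Bluespec area, hand-coded time, Bluespec time}, rows 1 to 25.
struct Row { double handArea, bsvArea, handTime, bsvTime; };
const Row kRows[] = {
  {9, 9, 1.56, 1.56},       {21, 21, 2, 2},            {23, 23, 3.94, 3.94},
  {20, 20, 3.93, 3.93},     {34, 34, 1.65, 1.65},      {33, 33, 4.86, 4.75},
  {86, 86, 10.98, 10.98},   {66, 63, 4.97, 4.99},      {99, 116, 5.76, 5.97},
  {67, 67, 5.98, 5.74},     {142, 141, 9.6, 9.91},     {109, 112, 5.99, 5.98},
  {215, 211, 9.93, 10},     {259, 249, 8.93, 9},       {340, 361, 8.93, 8.93},
  {391, 399, 10, 9.85},     {399, 385, 9.73, 9.06},    {350, 347, 27.65, 27.65},
  {352, 382, 10.36, 10.5},  {974, 822, 14.91, 14.92},  {914, 877, 14.45, 13.9},
  {1764, 1850, 7.99, 7.99}, {1926, 2018, 14.98, 14.2}, {1611, 1605, 34.97, 34.95},
  {9278, 8947, 79.51, 79.88},
};
constexpr std::size_t kNumRows = sizeof(kRows) / sizeof(kRows[0]);

// Designs where Bluespec is worse (larger area or longer time) by more than `tol`.
int countWorse(bool area, double tol = 0.005) {
  int n = 0;
  for (const Row& r : kRows) {
    const double hand = area ? r.handArea : r.handTime;
    const double bsv = area ? r.bsvArea : r.bsvTime;
    if ((bsv - hand) / hand > tol) ++n;
  }
  return n;
}

// Column total, selected by pointer to member.
double total(double Row::*field) {
  double s = 0;
  for (const Row& r : kRows) s += r.*field;
  return s;
}
```

With these data, 7 designs are more than 0.5% larger in area and 5 are more than 0.5% slower, leaving 18 and 20 designs the same or better, and the recomputed totals match the table's Totals row.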

Page 174

IDCT design results

Metric | Verilog | Bluespec SystemVerilog
RTL coding & unit verification | 2.5 man-weeks | 1.3 man-weeks
Top-level verification | 1.5 man-weeks | 1.2 man-weeks
Total effort | 4 man-weeks | 2.5 man-weeks
Lines of code | 2716 | 723
Latency (I/O) in clock cycles | 172 | 171
Gate count (2-input NAND; excluding memory) | 52K | 48K

Page 175

Itanium: IA-64 in Bluespec (Wunderlich & Hoe)

Platform capabilities

High-speed execution of the Bluespec model: runs at 100 MHz, 4 orders of magnitude faster than ModelSim

Full access to the FSB, allowing 800 MB/s cache line reads and writes, plus a control channel to the Pentium III processor via mapped I/O

Large FPGA resources: the current design occupies less than 30% of the FPGA resources

[Figure: IPF microarchitecture model. Fetch, Decode and Disperse stages feed Branch, Integer ×3 and Memory pipes (Stack, Read, Execute, Memory, Write), with instruction cache, data cache, unified L2, FSB control, branch predictor, register set, stack, bypass and pipeline control.]

The first model was developed in a few months by one student!

Page 176

… and numerous other examples

Validated by customer experience

50% less time (or better) to verified, synthesized design

Even with no prior knowledge of BSV

Area and time of synthesized design matched previous implementations done in Verilog/VHDL

Up to multi-million gate designs

Page 177

Bluespec SystemVerilog Agenda: Technical Deep Dive

Page 178

Tools and tool flow

Page 179

Tools and flow

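[Figure: tool flow. Bluespec SystemVerilog source goes through Bluespec Synthesis to Verilog 95 RTL (and to files for the Bluespec tools Bluesim, cycle accurate, and Blueview). The RTL feeds Verilog sim (plus other Verilog/VHDL), whose VCD output goes to visualization (e.g., Debussy), and RTL synthesis, which produces gates. Legend: Bluespec tools; 3rd party tools.]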

Page 180

[Screenshot panels: Source, RTL, Waves]

Interactive cross-probing between views (source, RTL, Novas Debussy/Verdi)

Page 181

Bluespec SystemVerilog Agenda: Technical Deep Dive

Page 182

Concurrency Semantics of Rules and Rule-based Interface Methods are also available in SystemC

Page 183

Why integrate SystemC with Rules and Rule-based Interface Methods?

Improve SystemC’s concurrency model:
Atomic transactions vs. threads and events
Rule semantics across module boundaries

Provide a path to high-level synthesis for control logic and complex datapaths

Enable use of the same model for embedded software development, hardware exploration and hardware implementation

Page 184

[Figure: SystemC with Rules. Core SystemC class defs/libs, plus TLM class defs/libs and Rule class defs/libs, run on standard SystemC tools (gcc, OSCI sim, gdb, …). The Bluespec synthesizable subset (refinement + Rules) goes through the Bluespec synthesis tool to RTL and standard synthesis back-end tools to HW, or to Bluesim and other Bluespec tools.]

Page 185

Components

Additional classes and macros (esl.h): define Bluespec Modules, Rules, Methods, Interfaces, etc.

ESL Analyzer (“esepp”):
Parses Modules, Rules and Methods
Generates code to call the elaborator with callback registrations, etc.
Generated code is compiled and linked with the rest of the system
Cannot be done with cpp
Original modules are not changed by the analyzer and can be compiled directly by gcc, but must be linked with ESEPP-generated code

Run-time system (libesepro.a):
Elaborator: determines the priorities and scheduling order in which rules and methods are executed
Run-time scheduler: “fires” rules on every clock cycle
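The reference semantics that such a run-time system implements can be sketched in a few lines of C++: each rule is a guard plus an action, and a firing is atomic (the guard is checked and the whole action runs with no interleaving). This hypothetical model fires enabled rules one at a time in a fixed priority order each "cycle". It does not model the real Bluespec scheduler, which fires many non-conflicting rules per clock with an equivalent effect, and none of the names below come from esl.h.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A rule: a guard (may it fire now?) and an action (applied atomically).
struct Rule {
  std::string name;
  std::function<bool()> guard;
  std::function<void()> action;
};

// One "clock cycle": visit rules in priority order; each rule whose guard is
// true in the state left by earlier firings fires atomically.
// Returns the number of rules fired.
int runCycle(const std::vector<Rule>& rules) {
  int fired = 0;
  for (const Rule& r : rules) {
    if (r.guard()) {
      r.action();
      ++fired;
    }
  }
  return fired;
}
```

For example, a producer and a consumer sharing a one-place buffer each become a rule whose guard tests whether the buffer is full. Because each guard is evaluated in the state left by earlier firings in the same cycle, the list order acts as the schedule in this model.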

Page 186

Components/flow

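[Figure: dut.cpp (which #includes systemc.h and esl.h) is processed by esepp to produce dut.epp; gcc compiles them and links with libsystemc.a and esepro.a into the simulation executable. Legend: standard SystemC flow; Rule classes.]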

Page 187

Bluespec SystemVerilog Agenda: Technical Deep Dive

Page 188

Future: formal verification

Page 189

Verification: Formal Methods — why?

So far, verification = testing (by simulation); even the current use of assertions (PSL, SVA) is only a testing strategy

Unfortunately, the size (number of state elements) of today’s chips makes it increasingly difficult, or impossible, to cover the state space by testing
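A rough count shows why: n binary state elements give up to 2^n states, so even an illustrative n = 1000 (a hypothetical and, by today's standards, tiny design) puts exhaustive coverage far out of reach:

```latex
N_{\text{states}} \le 2^{n}, \qquad
n = 1000:\quad 2^{1000} = 10^{1000\,\log_{10} 2} \approx 10^{301.03} \approx 1.07 \times 10^{301}
```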

Page 190

Verification: Formal Methods, Approach 1

Use theorem-proving and other methods to prove assertions (rather than just testing assertions during simulation)

Assertions can be written using PSL, SVA, …

Advantage: coverage (assertion is always true, not just for a particular set of test cases)

Caveat: verification can only be as good as the set of assertions being verified!

Does the set of assertions completely specify the design?

Page 191

Verification: Formal Methods, Approach 2

Prove the equivalence of a simple reference model with the implementation

E.g., for a processor design:
Reference model: one instruction at a time, no pipelining, no speculation, no caching
Implementation: full implementation details

Proof method:
Define a correspondence between each state in the reference model and a state in the implementation
For each state change in the reference model, show that the implementation moves between corresponding states
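The proof obligation of Approach 2 can be illustrated on a toy example by exhaustively checking a commuting diagram, as in the hypothetical C++ sketch below. The reference model adds one input per step to a 4-bit accumulator; the "implementation" latches each input for a cycle before adding it; the correspondence is an abstraction function that flushes the pending input. Exhaustive checking works here only because the state space is tiny; real proofs use theorem provers or model checkers, and real pipelines may need several implementation steps per reference step.

```cpp
#include <cassert>

// Reference model: one operation at a time (acc := acc + x, 4-bit wraparound).
struct RefState { unsigned acc; };
RefState refStep(RefState s, unsigned x) { return RefState{(s.acc + x) % 16}; }

// Implementation: a 2-stage pipeline; each input waits one cycle in `pending`.
struct ImplState { unsigned acc; unsigned pending; bool valid; };
using ImplStep = ImplState (*)(ImplState, unsigned);

ImplState implStep(ImplState s, unsigned x) {
  const unsigned acc = s.valid ? (s.acc + s.pending) % 16 : s.acc;
  return ImplState{acc, x, true};
}

// Deliberately buggy variant: drops the pending input.
ImplState buggyImplStep(ImplState s, unsigned x) { return ImplState{s.acc, x, true}; }

// Correspondence between states: flush the pipeline.
RefState abstraction(ImplState s) {
  return RefState{s.valid ? (s.acc + s.pending) % 16 : s.acc};
}

// For every implementation state and input: step-then-abstract must equal
// abstract-then-reference-step.
bool correspondenceHolds(ImplStep step) {
  for (unsigned acc = 0; acc < 16; ++acc)
    for (unsigned pending = 0; pending < 16; ++pending)
      for (int valid = 0; valid < 2; ++valid)
        for (unsigned x = 0; x < 16; ++x) {
          const ImplState s{acc, pending, valid != 0};
          if (abstraction(step(s, x)).acc != refStep(abstraction(s), x).acc) return false;
        }
  return true;
}
```

The correct pipeline passes the check and the buggy one fails it, which is the kind of discrepancy that simulation with a finite set of test cases can miss.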

Page 192

Verification: Formal Methods

Some references on formal verification using Rule semantics

Parallel Program Design: A Foundation, K. Mani Chandy and Jayadev Misra, Addison-Wesley, 1988 (the UNITY programming language for concurrent, reactive systems)

Using Term Rewriting Systems to Design and Verify Processors, Arvind and Xiaowei Shen, IEEE Micro 19(3), 1998, pp. 36-46

Cache Coherence Verification with TLA+, H. Akhiani, D. Doligez, P. Harter, L. Lamport, J. Scheid, M. Tuttle and Y. Yu, Proc. World Congress on Formal Methods in the Development of Computing Systems, Volume II, pp. 1871-1872, September 20-24, 1999

Proofs of Correctness of Cache-Coherence Protocols, Stoy et al., in Formal Methods for Increasing Software Productivity, Berlin, Germany, 2001, Springer-Verlag LNCS 2021

Superscalar Processors via Automatic Microarchitecture Transformation, Mieszko Lis, Master’s thesis, Dept. of Electrical Engineering and Computer Science, MIT, 2000

Page 193

Verification: Formal Methods, Summary

Formal methods in verification are not yet in widespread use

Many companies have started using formal methods on an experimental basis

These methods will become increasingly important as chip complexity increases

Design languages with strong formal semantics will improve the likelihood of success

Page 194

Summary and wrapup

Page 195

Bluespec SystemVerilog™: a one-slide overview

[Figure: two axes, Behavioral and Structural, along which Bluespec SystemVerilog raises the level of abstraction above VHDL/Verilog/SystemVerilog/SystemC.]

Behavioral: Rules and Interface Methods, for complex concurrency and control, across multiple shared resources and across module boundaries

Structural: high-level abstract types, powerful static checking, powerful parameterization, powerful static elaboration, advanced clock management

Two dimensions raising the level of abstraction (fully synthesizable)

Page 196

Summary

Bluespec is using ideas from advanced programming languages:

Behavior: rule-based systems, atomic transactions, correctness using invariants, modularity, achieving performance systematically via rule-composition semantics, …

Structural correctness, abstraction and elaboration: complex types, abstract types, polymorphism (type parameterization), systematic overloading; orthogonality (parameterization over all semantically meaningful concepts, including pieces of behavior); full programming power for structural descriptions

... to tackle the complexities of modern chip design, both individual HW blocks and SoCs

Page 197

Fully synthesizable – without compromise!

Bluespec: Better Design Accelerates Everything!

Architecture

Design

Verification and Test

Physical Design

More architectural flexibility during design

50% reduction in errors, faster correction

50% reduction from design to verified netlist

Architectural exploration

Early executable models

Early executable models

Better reuse

Faster fixes, to achieve closure

Page 198

End

Thank you for your attention!