Synthesis of Transaction-Level Models to FPGAs

Prof. Jason Cong

Yiping Fan, Guoling Han, Wei Jiang, Zhiru Zhang

VLSI CAD Lab
Computer Science Department
University of California, Los Angeles

Outline

• Transaction-level model (TLM)
  • SystemC TLM
  • Metropolis Meta Model
• Synthesis from TLM
  • RDR/MCAS: our existing architectural synthesis approach
  • xPilot: ongoing synthesis infrastructure for TLM

SystemC Framework

• SystemC history
  • OO system/HW modeling and simulation
  • SystemC under development by CAD vendors/researchers
    • Synopsys
    • Frontier Design
    • CoWare (Belgium)
  • Released to public Sept. ’99
    • Open source distribution @ www.systemc.org
    • Version 2 out July ’01

Channels and Modules

• Basic building blocks: module (class) instances, communicating via channel (class) instances
• Modules’ functionality coded as concurrent processes
  • Processes communicate via channels or events

Communication Modeling in SystemC

• Primitive channels in SystemC library
  • Ordinary signal (wire) of type <T>
    • Fill in data type T when instantiated
    • Point-to-point or multi-point (1 writer, n readers)
  • Signal bus (arbitrary width)
  • FIFO, for producer/consumer connection
  • Pseudo-channels
    • Mutex & semaphore, for interprocess sync
    • Accessed using channel syntax
• Complex “hierarchical” channels composed of primitive channels, processes, modules
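The producer/consumer FIFO connection listed above can be sketched in plain C++ without the SystemC library. This is only an illustration: `FifoChannel`, `try_write`, `try_read`, `Producer`, and `Consumer` are hypothetical names chosen for this sketch and are not the SystemC API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical bounded FIFO channel (an illustration, not the SystemC FIFO).
template <typename T>
class FifoChannel {
public:
    explicit FifoChannel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0 && "a FIFO needs at least one slot");
    }
    // Non-blocking write: returns false when the FIFO is full.
    bool try_write(const T& value) {
        if (queue_.size() == capacity_) return false;
        queue_.push_back(value);
        return true;
    }
    // Non-blocking read: returns false when the FIFO is empty.
    bool try_read(T& out) {
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop_front();
        return true;
    }
private:
    std::size_t capacity_;
    std::deque<T> queue_;
};

// Two "modules" that only know the channel, not each other.
struct Producer {
    explicit Producer(FifoChannel<int>& ch) : out(ch), next(0) {}
    // One activation: writes the next integer, or stalls if the FIFO is full.
    bool step() {
        if (!out.try_write(next)) return false;
        ++next;
        return true;
    }
    FifoChannel<int>& out;
    int next;
};

struct Consumer {
    explicit Consumer(FifoChannel<int>& ch) : in(ch), sum(0) {}
    // One activation: accumulates one value, or stalls if the FIFO is empty.
    bool step() {
        int value = 0;
        if (!in.try_read(value)) return false;
        sum += value;
        return true;
    }
    FifoChannel<int>& in;
    int sum;
};
```

Because both modules hold only a reference to the channel, the FIFO could be swapped for a different channel implementation without touching either module, which is the separation of computation and communication the slides emphasize.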

Events and Processes

• Events: abstract occurrences used for
  • Process triggering (like VHDL sensitivity list)
  • Channel communication
  • Interprocess synchronization
• Process can call wait() to block on event
• Event occurrence tells simulator to schedule simulation of relevant process
• Process execution
  • Not called directly from your code
  • Triggered for simulation by events on ports, channels, or explicit named events
  • Registered in constructor of enclosing module (associate method with events)
• Thread process → infinite loop
  • Must call wait() to give up control
• Method process → runs to completion
  • Less scheduling overhead
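The event-triggered execution of method processes can be illustrated with a very small scheduler in plain C++. This is a sketch under simplifying assumptions (string-named events, no simulated time, no thread processes); `MiniKernel`, `sensitive`, `notify`, and `run` are hypothetical names and do not reproduce the SystemC kernel.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical miniature event kernel (not the SystemC scheduler).
class MiniKernel {
public:
    using Process = std::function<void()>;

    // Associate a method process with an event, much as a module
    // constructor would register its processes.
    void sensitive(const std::string& event, Process process) {
        assert(process && "process must be callable");
        sensitivity_[event].push_back(std::move(process));
    }

    // Record that an event occurred; processes run later, inside run().
    void notify(const std::string& event) { pending_.push(event); }

    // Run every process triggered by pending events until none remain.
    // Returns the number of process activations.
    int run() {
        int activations = 0;
        while (!pending_.empty()) {
            const std::string event = pending_.front();
            pending_.pop();
            auto it = sensitivity_.find(event);
            if (it == sensitivity_.end()) continue;
            // Copy the list so a process may register new processes safely.
            const std::vector<Process> processes = it->second;
            for (const Process& p : processes) {
                p();  // a method process runs to completion
                ++activations;
            }
        }
        return activations;
    }

private:
    std::unordered_map<std::string, std::vector<Process>> sensitivity_;
    std::queue<std::string> pending_;
};
```

User code never calls a process directly. It only calls `notify`, and the kernel decides when the sensitive processes run.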

Data Types in SystemC

• SystemC supports
  • Native C/C++ types
  • SystemC types
• SystemC types: data types for system modeling
  • 2-value (‘0’, ‘1’) logic/logic vector
  • 4-value (‘0’, ‘1’, ‘Z’, ‘X’) logic/logic vector
  • Arbitrary-sized integer (signed/unsigned)
  • Fixed-point types (templated/untemplated)
• Objective: to reflect HW registers & ALU operations
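To show why a 4-value type differs from a C++ bool, here is a plain C++ sketch of 4-value AND and NOT. It assumes the conventional hardware-modeling semantics in which a 0 input forces AND to 0 and any other combination involving Z or X yields X. `Logic`, `from_char`, `logic_and`, and `logic_not` are hypothetical names, not SystemC types.

```cpp
#include <cassert>

// Hypothetical 4-value logic type: 0, 1, high impedance (Z), unknown (X).
enum class Logic { L0, L1, Z, X };

// Convert one of the characters '0', '1', 'Z', 'X' to a Logic value.
inline Logic from_char(char c) {
    assert(c == '0' || c == '1' || c == 'Z' || c == 'X');
    switch (c) {
        case '0': return Logic::L0;
        case '1': return Logic::L1;
        case 'Z': return Logic::Z;
        default:  return Logic::X;
    }
}

// AND: a 0 on either input dominates; 1 AND 1 is 1; anything else is unknown.
inline Logic logic_and(Logic a, Logic b) {
    if (a == Logic::L0 || b == Logic::L0) return Logic::L0;
    if (a == Logic::L1 && b == Logic::L1) return Logic::L1;
    return Logic::X;
}

// NOT: defined for 0 and 1; Z and X both become unknown.
inline Logic logic_not(Logic a) {
    if (a == Logic::L0) return Logic::L1;
    if (a == Logic::L1) return Logic::L0;
    return Logic::X;
}
```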

Functional Level and RTL Modeling in SystemC

• Functional level
  • Sequential, algorithmic, software-like
  • Explore HW/SW architectures, proof of algorithms, performance modeling & analysis
• Register transfer level
  • Complete detailed functional description of hardware
    • Every register, bus, bit for every clock cycle
    • Use C++ switch/case for FSM implementation
  • At this point, can switch to HDL, but staying in SystemC leverages test benches
  • Prepare for HW synthesis step by using only synthesizable constructs
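The "switch/case for FSM" style can be sketched in plain C++ as follows. The states, the `start` input, and the 3-cycle countdown are all invented for illustration. In SystemC the `clock` function would be a clocked process rather than an ordinary member function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical controller states, for illustration only.
enum class State : std::uint8_t { Idle, Load, Run, Done };

struct Fsm {
    State state = State::Idle;
    std::uint8_t counter = 0;

    // One call models one rising clock edge.
    void clock(bool start) {
        switch (state) {
            case State::Idle:
                if (start) state = State::Load;
                break;
            case State::Load:
                counter = 3;  // assumed number of Run cycles
                state = State::Run;
                break;
            case State::Run:
                assert(counter > 0 && "Run is only entered with a loaded counter");
                --counter;
                if (counter == 0) state = State::Done;
                break;
            case State::Done:
                state = State::Idle;
                break;
        }
    }
};
```

Each `case` corresponds to one state of the hardware FSM, and every register (`state`, `counter`) is updated at most once per call, which mirrors the cycle-by-cycle detail expected at RTL.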

Transaction Level Modeling in SystemC

• Transaction level
  • Model includes architectural components
  • Maintain component interface accuracy
    • E.g., buses modeled as channels (read/write operations)
  • Behavioral style inside a component
  • Simulates 100-10,000x faster than RTL
  • Provide execution platform for SW development

TLM – Raise the Level of Architectural Modeling

• What is TLM?
  • Communication uses function calls
    • burst_read(char* buf, int addr, int len);
• Why is TLM interesting?
  • Simulation: fast and compact
  • Integrate HW and SW models
  • Early platform for SW development
  • Early system exploration and verification
  • Verification reuse
  • Synthesis …
• Reference: www.systemc.org
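A minimal plain C++ sketch of "communication uses function calls" follows. Only the `burst_read` signature comes from the slide. The `BusIf` interface, the `burst_write` counterpart, and the `SimpleMemory` target are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical bus interface; burst_write is an assumed counterpart
// to the burst_read call shown on the slide.
struct BusIf {
    virtual ~BusIf() = default;
    virtual void burst_read(char* buf, int addr, int len) = 0;
    virtual void burst_write(const char* buf, int addr, int len) = 0;
};

// A memory target that services a whole transaction in one function call,
// with no per-cycle signal activity.
class SimpleMemory : public BusIf {
public:
    explicit SimpleMemory(int size)
        : data_(static_cast<std::size_t>(size), '\0') {}

    void burst_read(char* buf, int addr, int len) override {
        assert(in_range(addr, len));
        std::memcpy(buf, data_.data() + addr, static_cast<std::size_t>(len));
    }
    void burst_write(const char* buf, int addr, int len) override {
        assert(in_range(addr, len));
        std::memcpy(data_.data() + addr, buf, static_cast<std::size_t>(len));
    }

private:
    bool in_range(int addr, int len) const {
        return addr >= 0 && len >= 0 &&
               addr + len <= static_cast<int>(data_.size());
    }
    std::vector<char> data_;
};
```

An initiator written against `BusIf` does not depend on how the target implements the transfer, so the same software model can later be bound to a more detailed bus model.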

Typical Design Flow Using TLM

• Functional model
  • Captures system behaviour
• TLM, Transaction Level Model
  • Bus transactions
  • Accurate interaction with SW portion
  • Simulates rapidly
• Can create TLM model initially

Introduction of Metropolis

• A UCB and GSRC project, http://www.gigascale.org/metropolis/
• Platform-based design [ASV]
  • Platforms have sufficient flexibility to support a series of applications/products
  • Choose a platform by design space exploration
  • Above two require models to be reusable
• Orthogonalization of concerns
  • Computation vs. Communication
  • Behavior vs. Coordination
  • Behavior vs. Architecture
  • Capability vs. Cost

Metropolis Meta Model

• A combination of imperative program and declarative constraints
• Imperative program:
  • objects (process, media, quantity, statemedia)
  • netlist
  • await
  • block and label
  • interface function call
  • quantity annotation
• Declarative constraints
  • Linear Temporal Logic (LTL) (synch)
  • Logic of Constraints (LOC)

A Metropolis Design Tutorial

[Figure: functional netlist MyFncNetlist. Labels: M, P1, P2, Env1, Env2.]

A Metropolis Design Tutorial

[Figure: mapping netlist MyMapNetlist, containing MyFncNetlist and MyArchNetlist. MyFncNetlist labels: M, P1, P2, Env1, Env2, Y2T, T2Y, write(), read(), Th,Wk. MyArchNetlist labels: mP1, mP2, Bus, ArbiterBus, Mem, Cpu, OsSched.]

B(P1, M.write) <=> B(mP1, mP1.writeCpu); E(P1, M.write) <=> E(mP1, mP1.writeCpu);

B(P1, P1.f) <=> B(mP1, mP1.mapf); E(P1, P1.f) <=> E(mP1, mP1.mapf);

B(P2, M.read) <=> B(mP2, mP2.readCpu); E(P2, M.read) <=> E(mP2, mP2.readCpu);

B(P2, P2.f) <=> B(mP2, mP2.mapf); E(P2, P2.f) <=> E(mP2, mP2.mapf);

Outlook of the First Metropolis Release

[Figure: meta model infrastructure. Labels: Meta model language, Front end, Abstract syntax trees, Back end 1, Back end 2, Back end 3, Back end N, SystemC simulation, SPIN interface, LOC checking, Meta model debugger.]

• Sample architectural libraries:
  • coarse-simple cpu, bus, memory, arbiters
  • time quantity
• Sample MoC:
  • multi-media (Yapi, TTL)
  • Synchronous
• A design tutorial

http://www.gigascale.org/metropolis/

TLM Conclusions

• SystemC is the de facto system-level-design standard
  • Pushed by many CAD tool vendors
  • Used widely in industry and academia
    • E.g., Intel handheld system project [ICCAD’04]
  • Unified language to model a system at different levels
  • Improving path to HW synthesis from SystemC source code
  • Fits with trend to take system design to higher level
• Metropolis is a novel academic framework of model of computation
  • Capable of representing TLM as well
  • Provides a comprehensive starting point of synthesis

Outline

• Transaction-level model (TLM)
  • SystemC TLM
  • Metropolis Meta Model
• Synthesis from TLM
  • xPilot: our ongoing synthesis infrastructure for TLM
  • RDR/MCAS: our existing architectural synthesis approach

xPilot: TLM to RTL Synthesis Flow

[Figure: flow from TLM in SystemC/Metropolis through the Frontend to SSDM, then through the passes below to RTL for FPGAs.]

• Arch-independent passes
  • SSDM checking
  • Loop unrolling/pipelining
  • Strength reduction/Bitwidth analysis
  • Speculative-execution transformation …
• Arch-dependent passes
  • Memory analysis/allocation
  • Scheduling/Binding/Memory analysis/allocation
  • Register/port binding
  • Traditional/Low power/RDR-pipe or Placement driven …
• Arch-generation passes: RTL/constraints generation
  • Verilog/VHDL/SystemC
  • Altera/Xilinx
  • General/Synopsys/Magma …

Integration of xPilot with Metropolis

[Figure: Metropolis meta model infrastructure extended with a synthesis back end. Labels: Meta model language, Front end, Abstract syntax trees, SystemC Simulation, LOC Checking, SPIN Interface, Synthesis, xPilot/SSDM, HW Implementation, FPGA, ASICS, …, IP Assembly, Predictable RTL Synthesis, RTL, Timing Constraints, Physical Constraints, RTL Handoff, Latency Insensitive Design, GALS, RDR/MCAS, IP Library, HW implementation, Compilation for RP, Simulation, Extended Instruction, Reconfigurable Interconnect, Reconfigurable Coprocessor.]

SSDM Zoomed In – CDFG

    if (cond1) bb1();
    else bb2();
    bb3();
    switch (test1) {
    case c1: bb4(); break;
    case c2: bb5(); break;
    case c3: bb6(); break;
    }
    bb7();

[Figure: control flow graph of the code above. cond1 branches to bb1() (T) or bb2() (F), both lead to bb3(); test1 selects bb4() (c1), bb5() (c2), or bb6() (c3), all lead to bb7().]

• 2-level CDFG representation
  • 1st level: control flow graph
  • 2nd level: data flow graph
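One way to picture the 2-level representation is as a graph of basic blocks (first level) in which each block owns a list of data-flow operations (second level). The plain C++ layout below is a hypothetical sketch, not the actual SSDM data structure. `build_slide_example` encodes the control flow of the if/switch example, treating `cond1` and `test1` as branch nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Second level: one data-flow operation inside a basic block.
struct DfgOp {
    std::string opcode;                 // e.g. "add", "mul"
    std::vector<std::size_t> operands;  // indices of producer ops in the block
};

// First level: a control-flow node that owns its data flow graph.
struct BasicBlock {
    std::string name;
    std::vector<DfgOp> dfg;
    std::vector<std::size_t> successors;  // control-flow edges
};

struct Cdfg {
    std::vector<BasicBlock> blocks;

    std::size_t add_block(const std::string& name) {
        BasicBlock b;
        b.name = name;
        blocks.push_back(b);
        return blocks.size() - 1;
    }
    void add_edge(std::size_t from, std::size_t to) {
        assert(from < blocks.size() && to < blocks.size());
        blocks[from].successors.push_back(to);
    }
};

// Counts control-flow paths between two blocks; assumes an acyclic graph.
std::size_t count_paths(const Cdfg& g, std::size_t from, std::size_t to) {
    if (from == to) return 1;
    std::size_t total = 0;
    for (std::size_t s : g.blocks[from].successors) total += count_paths(g, s, to);
    return total;
}

// Control flow of the if/switch example; block 0 is cond1, block 8 is bb7.
Cdfg build_slide_example() {
    Cdfg g;
    std::size_t cond1 = g.add_block("cond1");
    std::size_t bb1 = g.add_block("bb1");
    std::size_t bb2 = g.add_block("bb2");
    std::size_t bb3 = g.add_block("bb3");
    std::size_t test1 = g.add_block("test1");
    std::size_t bb4 = g.add_block("bb4");
    std::size_t bb5 = g.add_block("bb5");
    std::size_t bb6 = g.add_block("bb6");
    std::size_t bb7 = g.add_block("bb7");
    g.add_edge(cond1, bb1);  // T
    g.add_edge(cond1, bb2);  // F
    g.add_edge(bb1, bb3);
    g.add_edge(bb2, bb3);
    g.add_edge(bb3, test1);
    g.add_edge(test1, bb4);  // c1
    g.add_edge(test1, bb5);  // c2
    g.add_edge(test1, bb6);  // c3
    g.add_edge(bb4, bb7);
    g.add_edge(bb5, bb7);
    g.add_edge(bb6, bb7);
    return g;
}
```

The two-way branch followed by the three-way switch gives 2 × 3 = 6 distinct control-flow paths from `cond1` to `bb7`, which `count_paths` can confirm.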

SSDM Features Different from Software IR

• Top-level: netlist of concurrent processes
• Process port/interface semantics
  • FIFO: FifoRead() / FifoWrite()
  • BUFF: BuffRead() / BuffWrite()
  • Memory: MemRead() / MemWrite()
• Bit vector manipulation
  • Bit extraction / concatenation / insertion
  • Bit-width property for every value
• Cycle-level notation
  • Scheduling / binding information / delay
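The three bit-vector operations, together with an explicit bit-width on every value, can be sketched in plain C++ as below. `BitVec`, `extract`, `concat`, and `insert` are hypothetical helpers limited to 64 bits. They illustrate the operations named on the slide and are not SSDM's own primitives.

```cpp
#include <cassert>
#include <cstdint>

// A value that carries its bit-width (1 to 64 bits in this sketch).
struct BitVec {
    std::uint64_t value;
    unsigned width;
};

// All-ones mask of the given width.
inline std::uint64_t mask(unsigned width) {
    assert(width >= 1 && width <= 64);
    return width == 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << width) - 1);
}

// Extraction: bits [hi:lo] of v, result width is hi - lo + 1.
inline BitVec extract(BitVec v, unsigned hi, unsigned lo) {
    assert(lo <= hi && hi < v.width);
    unsigned w = hi - lo + 1;
    return BitVec{(v.value >> lo) & mask(w), w};
}

// Concatenation: {high, low}, result width is the sum of both widths.
inline BitVec concat(BitVec high, BitVec low) {
    assert(high.width >= 1 && low.width >= 1 && high.width + low.width <= 64);
    std::uint64_t h = high.value & mask(high.width);
    std::uint64_t l = low.value & mask(low.width);
    return BitVec{(h << low.width) | l, high.width + low.width};
}

// Insertion: overwrite bits [lo + src.width - 1 : lo] of dst with src.
inline BitVec insert(BitVec dst, BitVec src, unsigned lo) {
    assert(lo + src.width <= dst.width);
    std::uint64_t field = mask(src.width) << lo;
    return BitVec{(dst.value & ~field) | ((src.value << lo) & field), dst.width};
}
```

Tracking the width alongside the value is what lets a synthesis tool size registers and functional units exactly, instead of rounding every value up to a native C integer type.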

Our Architectural Synthesis Approaches – RDR / MCAS

• Consideration of multi-cycle communication during architectural (or behavioral) synthesis
• Regular Distributed Register (RDR) micro-architecture [Cong et al, ISPD’03]
  • Highly regular
  • Direct support of multi-cycle on-chip communication
• MCAS: Architectural Synthesis for Multi-cycle Communication
  • Efficiently maps the behavioral descriptions to RDR uArch
  • Integrates architectural synthesis (e.g. resource binding, scheduling) with physical planning

RDR/MCAS: Support for Heterogeneous Integration with Multi-cycle Communication & Automatic Interconnect Pipelining

• Distribute registers to each “island”
• Choose the island size such that
  • Single cycle for intra-island computation and communication
  • Multi-cycle communication between islands
• Support interconnect pipelining
  • Inter-island pipeline register station (PRS) for global communications
  • PRS performs autonomous store-and-forward
• MCAS: Multi-cycle architectural synthesis integrated with global placement
• Experimental results
  • MCAS vs. conventional flow:
    • 36% reduction in clock period and
    • 30% reduction in total latency
  • MCAS-Pipe vs. MCAS:
    • 28.8% long global wirelength reduction
    • 19.3% total wirelength reduction
• Can also support IP integration using latency insensitive technique [Carloni, ICCAD’99]
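The split between single-cycle transfers inside an island and multi-cycle transfers between islands can be illustrated with a toy cost model in plain C++. The model is an assumption made for this sketch and is not taken from the slides or the cited papers: islands sit on a grid, a transfer inside an island takes one cycle, and each island boundary crossed adds one store-and-forward cycle.

```cpp
#include <cassert>
#include <cstdlib>

// Grid position of an island (hypothetical coordinates).
struct Island {
    int row;
    int col;
};

// Assumed cost model: 1 cycle within an island, plus 1 cycle per
// island boundary crossed along a Manhattan route.
inline int comm_cycles(Island src, Island dst) {
    assert(src.row >= 0 && src.col >= 0 && dst.row >= 0 && dst.col >= 0);
    int hops = std::abs(src.row - dst.row) + std::abs(src.col - dst.col);
    return 1 + hops;
}
```

Under this assumed model, a scheduler that knows where two operations are placed can budget the right number of cycles for the value to travel between them, which is the reason the slides tie scheduling and binding to placement.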

[Figure: RDR micro-architecture. Labels: islands 1 to 4, each with LCC, FSM, and Reg. File; H channel and V channel; Pipeline Register Station (PRS) between islands; IP Library attached through an Adaptor.]

Synthesis Flow: MCAS-Pipe System

[Figure: MCAS-Pipe synthesis flow from C / VHDL input to RTL VHDL & floorplan constraints. Stages shown: CDFG generation; resource allocation & functional unit binding; scheduling-driven placement; placement-driven rescheduling & rebinding; global interconnect sharing; register and port binding; datapath & FSM generation. Intermediate data shown: CDFG, ICG, locations.]

• Global interconnect sharing
  • Enable multiple data communications to share one physical link (a wire with pipeline registers)

Related Publications

• Regular distributed register (RDR) architecture and MCAS synthesis algorithms
  • ISPD’03, ICCAD’03
• RDR-Pipe and MCAS-Pipe synthesis algorithms
  • DAC’04
• Lopass: high-level synthesis for low-power FPGAs
  • ISLPED’03
• Multiplexor optimization through register/port binding
  • ASPDAC’04
• Bitwidth-aware scheduling and binding algorithms
  • ASPDAC’05

Conclusions

• Higher level abstraction is needed in current SO(P)C design flow
  • SystemC is becoming the SLD standard; TLM in particular is widely used
  • Metropolis is a platform-based design framework
• It is time to build a new generation of behavioral synthesis systems from TLM
• xPilot:
  • Ongoing project
  • An architectural synthesis infrastructure from TLM to RTL (FPGAs)