Industrial Experiences Pioneering Asynchronous Commercial Design


Post on 05-Jan-2016


Page 1: Industrial Experiences Pioneering Asynchronous Commercial Design

Industrial Experiences Pioneering Asynchronous Commercial Design

Peter A. Beerel
Fulcrum Microsystems
Calabasas Hills, CA, USA

Page 2: Industrial Experiences Pioneering Asynchronous Commercial Design

Agenda

• Introduction to Fulcrum
• Description of Integrated Pipelining
  - Fulcrum’s clockless circuit architecture
• Description of Fulcrum’s Design Flow
• Overview of Nexus
  - Fulcrum’s Terabit crossbar
• Overview of PivotPoint
  - Fulcrum’s first commercial product

[Figure: Circuit A and Circuit B; design flow with Specification, Design & Verification (two stages), Synthesis & Floor Planning, Physical Design, Database Release to Manufacturing, and Simulation & Verification alongside]

Page 3: Industrial Experiences Pioneering Asynchronous Commercial Design

Company Snapshot

• “Clockless” semiconductor company
• Located in Calabasas, CA (30 people)
• Technology proven in large-scale designs
• Backed by top-tier investors (raised $14M in June)
• Formed out of Caltech (1/00)

Page 4: Industrial Experiences Pioneering Asynchronous Commercial Design

Agenda

• Introduction to Fulcrum
• Description of Integrated Pipelining
  - Fulcrum’s clockless circuit architecture
• Description of Fulcrum’s Design Flow
• Overview of Nexus
  - Fulcrum’s Terabit crossbar
• Overview of PivotPoint
  - Fulcrum’s first commercial product

[Figure: design flow diagram]

Page 5: Industrial Experiences Pioneering Asynchronous Commercial Design

Fulcrum’s Integrated Pipelining

• Robust, power efficient, and high performance
• Fast delay-insensitive style using domino logic without latches (developed at Caltech by Fulcrum’s founders)

[Figure: three Dual-Rail Domino Logic blocks with two Acknowledge signals]

Page 6: Industrial Experiences Pioneering Asynchronous Commercial Design

Integrated Pipelining

Harnessing the power of Domino Logic
• Addresses delay variability with Completion Sensing
• Addresses power inefficiency with Async Handshakes
• Leverages more efficient “N” transistors

[Figure: Leaf Cells A, B, and C, each with Dual-Rail Domino Logic and Control, plus Input Completion Detection and Output Completion Detection]

Page 7: Industrial Experiences Pioneering Asynchronous Commercial Design

Hierarchical Design

Multi-level hierarchy of communicating blocks
• At each level blocks communicate along channels

[Figure: top of the hierarchy: ASIC]

Page 8: Industrial Experiences Pioneering Asynchronous Commercial Design

Hierarchical Design

Multi-level hierarchy of communicating blocks
• At each level blocks communicate along channels

[Figure: blocks inside the ASIC: Main FSM, Register Bank, Memory, Adder/Mult., Subtract/Divider]

Page 9: Industrial Experiences Pioneering Asynchronous Commercial Design

Hierarchical Design

Multi-level hierarchy of communicating blocks
• At each level blocks communicate along channels

[Figure: lower levels of the hierarchy: Reg A, Reg B, Reg C, Adder, Multiplier; cells BN-1, BN-2, BN-3 and FAN-1, FAN-2, FAN-3, FA0; labels: channels, leaf cells]

Page 10: Industrial Experiences Pioneering Asynchronous Commercial Design

Leaf Cells

Definition
• Smallest block that performs logic and communicates via channels
• Based on small number of pipeline templates guiding design
• Forms basic building block for physical design

Features
• Facilitates high throughput and low latency
• Provides easy timing validation and analog verification
• ~1,000 digital leaf cell types compose our leaf cell library
• ~200 additional subtypes for different environments (e.g., loads)

[Figure: leaf cell template with blocks F, RCD, LCD, and C]

Page 11: Industrial Experiences Pioneering Asynchronous Commercial Design

Template-Based Cell Design

• Each pipeline style (QDI, timed…) has a different blueprint
• Library uses a blueprint to implement the lowest level blocks

[Figure: blueprint for a QDI N-input M-output pipeline stage (blocks F, LCD, RCD, C); a 2-input 1-output pipeline stage (two LCD blocks); a 1-input 2-output pipeline stage (two RCD blocks)]

Page 12: Industrial Experiences Pioneering Asynchronous Commercial Design

Summary of Characteristics

• Delay-insensitive timing model
  - Gates and wires can have arbitrary delays
• 4-phase 1of4 handshake
  - Uses 4 wires to send 2 bits
  - Plus an acknowledge wire for flow control
  - Returned to neutral between each data transfer
  - Self shielding
• Precharge domino logic plus async handshake
• Low latency; high frequency; robust
• Auto power conservation; zero standby power
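The 4-phase 1of4 handshake summarized above can be sketched in a few lines of Python. This is a behavioural illustration only, not Fulcrum's circuit: the function names, the rail ordering, and the active-high acknowledge polarity are all assumptions made for the example.

```python
from typing import List, Tuple

NEUTRAL = [0, 0, 0, 0]  # all four data rails low: no value on the channel


def encode_1of4(value: int) -> List[int]:
    """Encode a 2-bit value (0..3) by raising exactly one of four rails."""
    if not 0 <= value <= 3:
        raise ValueError("a 1of4 code carries exactly 2 bits (0..3)")
    return [1 if rail == value else 0 for rail in range(4)]


def is_valid(rails: List[int]) -> bool:
    """Completion detection: data is present once exactly one rail is high."""
    return sum(rails) == 1


def decode_1of4(rails: List[int]) -> int:
    """Recover the 2-bit value from a valid 1of4 code."""
    if not is_valid(rails):
        raise ValueError("rails are neutral or illegal")
    return rails.index(1)


def four_phase_transfer(value: int) -> List[Tuple[List[int], int]]:
    """Return the (rails, acknowledge) states of one transfer, in order.

    Acknowledge polarity (high = data received) is an assumption.
    """
    data = encode_1of4(value)
    return [
        (data, 0),     # 1. sender raises one rail
        (data, 1),     # 2. receiver detects valid data, raises acknowledge
        (NEUTRAL, 1),  # 3. sender returns the rails to neutral
        (NEUTRAL, 0),  # 4. receiver lowers acknowledge; channel is idle
    ]
```

Because exactly one rail rises per transfer and all rails return to neutral afterwards, the receiver can tell when data has arrived from the rails themselves, with no clock and no assumption about wire delay.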

Page 13: Industrial Experiences Pioneering Asynchronous Commercial Design

Agenda

• Introduction to Fulcrum
• Description of Integrated Pipelining
  - Fulcrum’s clockless circuit architecture
• Description of Fulcrum’s Design Flow
• Overview of Nexus
  - Fulcrum’s Terabit crossbar
• Overview of PivotPoint
  - Fulcrum’s first commercial product

[Figure: design flow diagram]

Page 14: Industrial Experiences Pioneering Asynchronous Commercial Design

Fulcrum Design Flow

• Hierarchical design flow
  - Executable specifications
  - Formal decomposition
  - Creates design hierarchy
• Semi-custom synthesis & layout
  - Hierarchical floor planning
  - Automated transistor sizing
  - Semi-automated physical design
• Supports synchronous & asynchronous designs
  - Hard macro from place & route

[Figure: design flow with Design Specification, Architecture Design & Verification, Micro-architecture Design & Verification, Synthesis & Floor Planning, Physical Design, Database Release to Manufacturing, and Mitered Simulation & Verification alongside]

Page 15: Industrial Experiences Pioneering Asynchronous Commercial Design

Managing Design Hierarchy

• Proprietary object-oriented hardware language
  - Integrated hierarchical design/verification language
• Defines cell specification & implementation
  - Specification: Java or communicating sequential processes (CSP)
  - Implementation: multiple forms
    - Sub-cells
    - Sub-cells defined in terms of specification or implementation
• Defines integrated test environment for each cell
  - Enables verification at all pairs of levels
• Efficiency features
  - Supports refinement of cells and channels

Page 16: Industrial Experiences Pioneering Asynchronous Commercial Design

Physical Design

• Layout hierarchy based on design hierarchy
  - Hierarchical floor-planning semi-automated
  - Large scale hand placement before sizing
  - Long distance channels planned carefully
• Timing closure by construction
  - Placement drives sizing
  - Can insert extra pipelining on long wires late in design
• Tradeoffs between performance and design time
  - Hand layout where necessary
  - Automated layout where possible
• Goals
  - Full-custom density and speed within ASIC design time

Page 17: Industrial Experiences Pioneering Asynchronous Commercial Design

Design Verification: System-Level

• Mission
  - Verify that executable spec = written spec + gate-level model
• Use industry-standard tools & methods
  - Cadence NCSIM and efficient Java-Verilog interface
  - Directed random testing
  - Line & functional coverage

[Figure: Test Bench (Test Cases, Traffic Generator & Checker, Configuration Manager, Monitor, Bus Functional Model) and Device Under Test (Executable Spec, Gate-level Verilog Model)]

Page 18: Industrial Experiences Pioneering Asynchronous Commercial Design

Design Verification: Unit-Level

• Mitered co-simulation for unit-level verification
  - Check correctness of digital model by comparing it to golden CSP/Java model
• Features
  - Framework automated and regressed
  - Checks correctness
  - Checks delay insensitivity and/or throughput and latency

[Figure: Test Engine, High level (Java/CSP) model, Low level (CSP/PRS/CDL) model, with Copy, ==, and Log blocks]
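The mitering idea on this slide (feed the same stimulus to a golden model and a lower-level model, then compare outputs) can be illustrated with a small Python sketch. Everything here is hypothetical: the two adder models merely stand in for a Java/CSP specification and a lower-level implementation, and the function names are invented for the example. Fulcrum's actual framework is not shown in the slides.

```python
import random
from typing import Callable, List, Tuple


def golden_adder(a: int, b: int) -> int:
    """Hypothetical high-level (specification) model: an 8-bit adder."""
    return (a + b) & 0xFF


def ripple_adder(a: int, b: int) -> int:
    """Hypothetical low-level model: the same adder built bit by bit."""
    result, carry = 0, 0
    for i in range(8):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def miter(high: Callable[[int, int], int],
          low: Callable[[int, int], int],
          trials: int = 1000,
          seed: int = 0) -> List[Tuple[int, int, int, int]]:
    """Drive both models with identical random stimulus.

    Returns a log of mismatches as (a, b, expected, actual) tuples;
    an empty log means the models agreed on every trial.
    """
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        a, b = rng.randrange(256), rng.randrange(256)
        expected, actual = high(a, b), low(a, b)
        if expected != actual:
            mismatches.append((a, b, expected, actual))
    return mismatches
```

Calling `miter(golden_adder, ripple_adder)` returns an empty list when the two models agree. A fixed seed keeps the run repeatable, which is what makes such a check usable in a regression suite.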

Page 19: Industrial Experiences Pioneering Asynchronous Commercial Design

Analog Verification: Charge Sharing

• SPICE-based charge sharing analysis
• Test case generation and analysis automated
• Charge-sharing problems solved in numerous ways
  - Symmetrization
  - Less transistor sharing
  - Delay perturbations

[Figure: Synthesis, Charge Sharing Test Generator, SPICE]

Page 20: Industrial Experiences Pioneering Asynchronous Commercial Design

Synthesis: Gate Generation / Sizing

• Automated generation of transistor netlists
  - Dynamic logic generation
  - Transistor sharing
  - Symmetrization
  - Gate-library matching
• Transistor sizing
  - Path-based sizing to meet amortized unit-delay model
• Micro-architecture feedback
  - Identifies where fanout limits performance

[Figure: CSP, Gate Library, Floor planning Information, Logic Synthesis, Transistor Sizing, CDL Netlist]

Page 21: Industrial Experiences Pioneering Asynchronous Commercial Design

Fulcrum QDI v. Synchronous Flows

• Save clock tree design, analysis, optimization, and verification
• No timing closure problems
  - Unexpected long-wire bottlenecks easily solved with additional pipeline buffers late in design cycle
• QDI/DI timing model reduces timing analysis challenges
• Fulcrum QDI hierarchical design facilitates:
  - Composability, re-use, and early bug detection
• Hierarchical floorplanning improves predictability of wires
• Template-based leaf cell design simplifies logic design
• Design reuse reduces criticality of high-level synthesis
• Decomposition methodology amenable to formal verification

Page 22: Industrial Experiences Pioneering Asynchronous Commercial Design

Agenda

• Introduction to Fulcrum
• Description of Integrated Pipelining
  - Fulcrum’s clockless circuit architecture
• Description of Fulcrum’s Design Flow
• Overview of Nexus
  - Fulcrum’s Terabit crossbar
• Overview of PivotPoint
  - Fulcrum’s first commercial product

[Figure: design flow diagram]

Page 23: Industrial Experiences Pioneering Asynchronous Commercial Design

Globally Asynchronous, Locally Synchronous

• SoC designs: many cores with different clock domains
• Async circuits can interconnect multiple sync cores in an SoC design, eliminating global clock distribution and simplifying clock domain crossing
• Fulcrum’s “Nexus” is a high speed on-chip interconnect:
  - 16 port, 36 bit asynchronous crossbar
  - Asynchronous cross-chip channels
  - Async-sync clock domain converters
  - Runs at 1.35 GHz in 130 nm process

Page 24: Industrial Experiences Pioneering Asynchronous Commercial Design

Nexus System-on-Chip Interconnect

• Non-blocking crossbar
• 16 full-duplex ports
• Flow control extends through the crossbar
• Full speed arbitration
• Arbitrary-length “bursts”
• Bridges clock domains
• Scales in bit width and ports
• Process portable

[Figure: Generic Nexus Example. Legend: synchronous IP block, asynchronous IP block, pipelined repeater, clock domain converter]

Page 25: Industrial Experiences Pioneering Asynchronous Commercial Design

Nexus Burst Format

Incoming from source (Source Module): To, then D1 (tail 0), D2 (tail 0), D3 (tail 0), …, DN (tail 1)
Outgoing to target (Target Module): From, then D1 (tail 0), D2 (tail 0), D3 (tail 0), …, DN (tail 1)

Field widths: Data 36 bit, Tail 1 bit, Control 4 bit

Arbitrary-length source-routed bursts provide flexibility
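The burst layout on this slide can be modelled with a short Python sketch. It assumes, based only on the diagram, that the 4-bit control word names a port ("To" on entry, "From" on exit) and that the tail bit is 1 on the last data word only. The class and function names are hypothetical, and the rule that a burst carries at least one data word is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

DATA_BITS = 36  # data word width, from the slide
PORT_BITS = 4   # control field width, enough to name 1 of 16 ports


@dataclass
class Burst:
    port: int         # "To" port on the way in, "From" port on the way out
    words: List[int]  # D1..DN, each a 36-bit value

    def tail_bits(self) -> List[int]:
        """Tail is 0 on every word except the last, which carries 1."""
        return [0] * (len(self.words) - 1) + [1]


def make_burst(to_port: int, words: List[int]) -> Burst:
    """Build an incoming burst addressed to `to_port`."""
    if not 0 <= to_port < (1 << PORT_BITS):
        raise ValueError("port does not fit in the control field")
    if not words:
        # Assumption: a burst has at least one data word to carry the tail.
        raise ValueError("a burst needs at least one data word")
    if any(not 0 <= w < (1 << DATA_BITS) for w in words):
        raise ValueError("data word wider than 36 bits")
    return Burst(port=to_port, words=list(words))


def route(source_port: int, incoming: Burst) -> Tuple[int, Burst]:
    """Model the crossbar: deliver to the 'To' port, relabelled with 'From'."""
    target_port = incoming.port
    outgoing = Burst(port=source_port, words=list(incoming.words))
    return target_port, outgoing
```

For example, `route(2, make_burst(5, [0xA, 0xB, 0xC]))` delivers the three words to port 5 with the control word rewritten to 2, so the target knows where the burst came from. The tail bit is what lets bursts be of arbitrary length without a length field.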

Page 26: Industrial Experiences Pioneering Asynchronous Commercial Design

Sync-to-Async Conversion

• Synchronous Request / Grant FIFO protocol
  - Data transferred if request and grant both high on rising edge of clock
  - Compensates for any skew on asynchronous side
  - Low latency: 1/2 to 3/2 clock cycles at A2S

[Figure: S2A converter between a Synchronous Datapath (Request, Grant, clock) and an Asynchronous Datapath; A2S converter between an Asynchronous Datapath and a Synchronous Datapath (Request, Grant, clock)]

Seamlessly Bridges Different Clock Domains
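The transfer rule of the synchronous request/grant protocol (data moves only when request and grant are both high on a rising clock edge) is simple enough to state as code. The sketch below is a cycle-level illustration with hypothetical names; it models only the synchronous side of the interface and says nothing about how the converter is built.

```python
from typing import Iterable, List, Optional, Tuple

# One entry per rising clock edge: (request, grant, data on the bus)
Cycle = Tuple[int, int, Optional[int]]


def transferred_words(cycles: Iterable[Cycle]) -> List[Optional[int]]:
    """Collect the data words that cross the interface.

    A word is transferred only on a rising edge where request and
    grant are both high; on every other edge nothing moves.
    """
    moved = []
    for request, grant, data in cycles:
        if request and grant:
            moved.append(data)
    return moved
```

With the trace `[(1, 0, 7), (1, 1, 7), (0, 1, None), (1, 1, 9)]`, the first edge transfers nothing because grant is low, the second transfers 7, the third transfers nothing because request is low, and the fourth transfers 9.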

Page 27: Industrial Experiences Pioneering Asynchronous Commercial Design

Arbitration and Ordering

• Unrelated sender/receiver links are independent
• Bursts sent from multiple input ports to the same output port are serviced fairly by built-in arbitration circuitry
• Bursts from A to B remain ordered
• Producer-consumer and global-store-ordering satisfied
  - A sends X to B, A notifies C, C can read X from B
  - A writes X to B, A writes Y to C, if D reads Y from C, it can read X from B
• Split transactions implement loads
  - Load request and load completion bursts
  - Load completions returned out-of-order

Can tunnel common bus and cache coherence protocols

Page 28: Industrial Experiences Pioneering Asynchronous Commercial Design

Example: Load/Store Systems

• Option 1: Pure Master/Target Ports
  - Masters send Requests to Targets, which may return Completions
  - Each port must either be a Master or a Target so that Completions are never blocked by Requests
  - Devices which need to be both Masters and Targets are given two separate full-duplex ports
  - Could use two separate Nexus crossbars
• Option 2: Peers
  - Modules which are both Masters and Targets implement an internal buffer to hold Requests so that Completions can bypass them
  - All Masters or Peers restrict number of outstanding Requests to avoid overflowing Request buffers

Page 29: Industrial Experiences Pioneering Asynchronous Commercial Design

Example: Switch Fabric

• Each module maintains input/output queues for traffic to/from each other module
• Data is sent from an input queue to an output queue over Nexus as a series of short bursts
• Flow control credits for each output queue are sent backward
• Eliminates head-of-line blocking
• Segmentation, buffering, and overspeed optimize performance during congestion
• Used in PivotPoint, Fulcrum’s first chip product.
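The credit scheme described on this slide can be sketched as follows. This is a generic credit-based flow-control model written for illustration, with a hypothetical class name and API; the slides do not give the queue depths or the credit format PivotPoint actually uses.

```python
from collections import deque


class CreditedLink:
    """Sender-side view of one output queue on a remote module."""

    def __init__(self, queue_depth: int):
        self.credits = queue_depth   # free slots in the remote output queue
        self.remote_queue = deque()  # stands in for the queue at the far end

    def send(self, burst) -> bool:
        """Send only if a credit is available, so the queue never overflows."""
        if self.credits == 0:
            return False             # burst stays in the sender's input queue
        self.credits -= 1
        self.remote_queue.append(burst)
        return True

    def drain(self):
        """Receiver removes one burst and returns a credit to the sender."""
        burst = self.remote_queue.popleft()
        self.credits += 1
        return burst
```

Keeping one such link per destination means a full output queue at one module only stalls traffic bound for that module; bursts for other destinations sit in their own input queues and keep moving, which is how per-destination queueing avoids head-of-line blocking.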

Page 30: Industrial Experiences Pioneering Asynchronous Commercial Design

Nexus Silicon Validation

[Figure: plot of Nexus crossbar]
[Figure: block diagram of Nexus Validation Chip: ALU, S1 to S7, Serial IO]

TSMC 130nm LV Results

Proc    V     GHz    ns    pJ/bit
Low-K   1.2   1.35   2.0   10.4
Low-K   1.0   1.11   2.4   7.0
FSG     1.2   1.10   2.5   11.2
FSG     1.0   0.87   3.1   7.6

Crossbar area: 1.75 mm^2
Total interconnect area: 4.15 mm^2
Peak cross-section bandwidth: 778 Gb/s
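The slides do not say how the 778 Gb/s figure was derived. One reading that reproduces it, offered here as an assumption rather than the author's stated method, is ports × data bits per port × best measured frequency, using the 16-port, 36-bit crossbar and the 1.35 GHz Low-K result at 1.2 V:

```python
PORTS = 16       # crossbar ports
DATA_BITS = 36   # data bits per port
FREQ_GHZ = 1.35  # best measured frequency (Low-K, 1.2 V)

# Assumed formula: every port moves one 36-bit word per cycle.
peak_gbps = PORTS * DATA_BITS * FREQ_GHZ
print(f"{peak_gbps:.1f} Gb/s")  # 777.6 Gb/s, which rounds to 778
```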

Page 31: Industrial Experiences Pioneering Asynchronous Commercial Design

Nexus Summary

• Nexus is an asynchronous crossbar interconnect designed to connect up to 16 synchronous modules in a SoC
• Nexus can be used to implement load/store systems as well as switch fabrics
• Systems using Nexus can be tested with standard equipment
• Nexus runs up to 1.35 GHz in TSMC 130 nm
• Asynchronous interconnect is now viable for very high performance SoC designs

Page 32: Industrial Experiences Pioneering Asynchronous Commercial Design

Agenda

• Introduction to Fulcrum
• Description of Integrated Pipelining
  - Fulcrum’s clockless circuit architecture
• Description of Fulcrum’s Design Flow
• Overview of Nexus
  - Fulcrum’s Terabit crossbar
• Overview of PivotPoint
  - Fulcrum’s first commercial product

[Figure: design flow diagram]

Page 33: Industrial Experiences Pioneering Asynchronous Commercial Design

PivotPoint Blade Interconnect

• Large-scale SoC design
  - >32.5M transistors (83% async)
  - 14 separate clock domains
• Includes key Fulcrum IP
  - Nexus Terabit Crossbar
  - Quad-port 600 MHz async SRAM
• Operates at over 1 GHz
• Delivers 192 Gbps of non-blocking switching capacity
• Testable via standard tools
  - JTAG; scan chain
• Activity-based power scaling
• 9-month project

World’s first high-performance clockless chip

[Figure: Generic System “Blade” with I/O (Phy/MAC), Backplane Interface, four CPU/NPU/ASIC/FPGA blocks, SPI-4 links, and the label X8]

Page 34: Industrial Experiences Pioneering Asynchronous Commercial Design

PivotPoint Leverages Nexus

• Flexible architecture
  - 6 duplex SPI-4.2 interfaces
  - All paths are independent
• Optimized for performance
  - Up to 14.4 Gbps per interface
  - Up to 32 Gbps per Nexus port
  - Full-rate buffer memories
  - Lossless flow control
• Easily configurable
  - 16-bit CPU interface
  - JTAG support
• Modest size and power
  - ~2 Watt per active interface
  - 1036 ball package

3 ns latency

A true SoC GALS design

[Figure: PivotPoint block diagram: six SPI-4 interfaces, each with two 16KB Buffers and a Route Table; Control Bus (Serial Tree); CPU Interface; JTAG Interface; Boundary Scan]

Page 35: Industrial Experiences Pioneering Asynchronous Commercial Design

Testing – A Multi-Dimensional Approach

• DFT
  - Synchronous scan chains for synchronous logic
  - Asynchronous scan-chain-like structures for asynchronous logic and sync-async interfaces
  - Standardized JTAG interface for testing
• Fault-Grading
  - Verilog fault-model for domino logic
  - Industry-standard fault grading tools
• BIST
  - Use Nexus for observability in Nexus-based SoCs
  - RAM self test and repair

Page 36: Industrial Experiences Pioneering Asynchronous Commercial Design

Differentiating Through Technology

Leveraging our clockless technology foundation

Differentiated Product Offering
• High performance (latency, capacity)
• Power efficient (linear scaling)
• Robust in operation

Clockless Technology Foundation
• Silicon proven and customer validated
• Mature CAD flow (integrated with commercial tools)
• Robust cell library (thousands of unique cells)

Unique IP Blocks
• Unmatched performance
• Extremely robust (power and temperature)
• Easy to integrate (benign behavior)

Page 37: Industrial Experiences Pioneering Asynchronous Commercial Design

Thank You!

“A group of engineers wants to turn the microprocessor world on its head by doing the unthinkable: tossing out the clock and letting the signals move about unencumbered. For those designers, inspired by research conducted at Caltech, clocks are for wimps.”

Anthony Cataldo, EE Times

Peter A. Beerel, PhD
VP Strategic [email protected]

818.871.8100
www.fulcrummicro.com

26775 Malibu Hills Road
Suite 200
Calabasas Hills, CA 91301