chipyard: integrated design, simulation, and

12
Department: Head Editor: Name, xxxx@email Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs Alon Amid, David Biancolin, Abraham Gonzalez, Daniel Grubb, Sagar Karandikar, Harrison Liew, Albert Magyar, Howard Mao, Albert Ou, Nathan Pemberton, Paul Rigge, Colin Schmidt, John Wright, Jerry Zhao, Yakun Sophia Shao, Krste Asanovi´ c, Borivoje Nikoli´ c University of California, Berkeley Abstract—Continued improvement in computing efficiency requires functional specialization of hardware designs. Agile hardware design methodologies have been proposed to alleviate the increased design costs of custom silicon architectures, but their practice thus far has been accompanied with challenges in integration and validation of complex systems-on-a-chip. We present the Chipyard framework, an integrated SoC design, simulation, and implementation environment for specialized compute systems. Chipyard includes configurable, composable, open-source, generator-based IP blocks that can be used across multiple stages of the hardware development flow while maintaining design intent and integration consistency. Through cloud-hosted FPGA-accelerated simulation and rapid ASIC implementation, Chipyard enables continuous validation of physically realizable customized systems. In the face of the slowdown in technology scaling, sustaining improvements in system ca- pability requires a greater use of domain-specific architectures. This era of specialization is being seen as a new golden age of computer architec- ture [1], but creates the challenge of escalating development costs. Differentiated architectures require productive digital system design methods for architectural exploration, system integration, verification, validation, and physical design. To reduce development costs, a generator- based agile design process for hardware has been proposed and demonstrated through a series of RISC-V microprocessor chips developed with small teams [2] and made available as a now widely used open-source codebase [3]. Designs captured as generators enable reuse through rich parameterization and incremental extension. Past efforts have focused on the module generator development and an implementation of relatively IEEE Micro Published by the IEEE Computer Society c 2020 IEEE 1 This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616 Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Upload: others

Post on 14-Apr-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chipyard: Integrated Design, Simulation, and

Department: HeadEditor: Name, xxxx@email

Chipyard: Integrated Design,Simulation, andImplementation Framework forCustom SoCsAlon Amid, David Biancolin, Abraham Gonzalez, Daniel Grubb, Sagar Karandikar, HarrisonLiew, Albert Magyar, Howard Mao, Albert Ou, Nathan Pemberton, Paul Rigge, Colin Schmidt,John Wright, Jerry Zhao, Yakun Sophia Shao, Krste Asanovic, Borivoje NikolicUniversity of California, Berkeley

Abstract—Continued improvement in computing efficiency requires functional specialization ofhardware designs. Agile hardware design methodologies have been proposed to alleviate theincreased design costs of custom silicon architectures, but their practice thus far has beenaccompanied with challenges in integration and validation of complex systems-on-a-chip.We present the Chipyard framework, an integrated SoC design, simulation, and implementationenvironment for specialized compute systems. Chipyard includes configurable, composable,open-source, generator-based IP blocks that can be used across multiple stages of the hardwaredevelopment flow while maintaining design intent and integration consistency. Throughcloud-hosted FPGA-accelerated simulation and rapid ASIC implementation, Chipyard enablescontinuous validation of physically realizable customized systems.

In the face of the slowdown in technologyscaling, sustaining improvements in system ca-pability requires a greater use of domain-specificarchitectures. This era of specialization is beingseen as a new golden age of computer architec-ture [1], but creates the challenge of escalatingdevelopment costs. Differentiated architecturesrequire productive digital system design methodsfor architectural exploration, system integration,verification, validation, and physical design.

To reduce development costs, a generator-based agile design process for hardware has beenproposed and demonstrated through a series ofRISC-V microprocessor chips developed withsmall teams [2] and made available as a nowwidely used open-source codebase [3]. Designscaptured as generators enable reuse through richparameterization and incremental extension. Pastefforts have focused on the module generatordevelopment and an implementation of relatively

IEEE Micro Published by the IEEE Computer Society c© 2020 IEEE 1

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 2: Chipyard: Integrated Design, Simulation, and

Department Head

EOS14(2012)

Raven-2(2012)

EOS16(2012)

Raven-3(2012)

EOS18(2013)

EOS20(2013)

EOS22(2014)

Raven-4(2014)

Swerve(2015)

EOS24(2015)

Hurricane-1(2016)

CRAFT-0(2016)

Hurricane-2(2017)

CRAFT-P1(2017)

CRAFT-FFT2(2017)

BROOM(2018)

EAGLE(2018)

GPSSOC(2019)

EAGLE-X(2019)

Tran

sist

ors

Test Chip (Chronological)

106

109

108

107

Figure 1. Increasing complexity of custom RISC-V SoC test chips built at Berkeley using the Rocket Chip SoCgenerator since 2012.

small, homogeneous processor architectures withuniform interconnect and core-level configura-bility. Enabling the development of complexheterogeneous systems-on-a-chip (SoCs) requiresgreater levels of coordination and synchronizationbetween many disparate open-source designs anddevelopment tools, motivating the creation of anew unified open-source SoC design framework,which we call Chipyard1.

Chipyard provides a framework to bring to-gether a collection of independently developedopen-source tools and RTL generators, allowingdevelopment of heterogeneous SoCs through in-tegrated design, simulation, and implementationenvironments. Chipyard helps alleviate many ofthe challenges that exist when using indepen-dent and uncoordinated open-source tools anddesigns, as often experienced in concurrent andnon-uniform feature design iterations, typical inthe agile design process.

Agile SoC DevelopmentA number of RISC-V test chips have been

designed by small groups of students at UCBerkeley over the past eight years. These designshave been based on the open-source Rocket ChipSoC generator [3] that includes Rocket, an in-order RISC-V core, and supports coherent caches

1https://github.com/ucb-bar/chipyard/

and standard interconnects via the TileLink pro-tocol. Customizations to test chips based on theRocket Chip generator have been performed byadding multiple generations of vector processors,peripheral devices, and by replacing the stan-dard in-order core with an out-of-order core.The design process outlined in [2] was graduallyenhanced on the physical design front to supportmuch larger silicon dies and many more placeableinstances in the SoC, resulting in increasinglycomplex test chips (Figure 1). The most recenttest chips in this series are representative ofmodern SoCs composed of a diverse set of IPblocks and include multiple cores, accelerators,and complete analog subsystems, including high-speed serial links (SerDes), analog-to-digital con-verters (ADCs) and phase-locked loops (PLLs).

Verification and validation are major com-ponents of the SoC design cycle. By relyingon many previously verified open-source compo-nents such as the Rocket core, and incrementallyextending SoCs based on the Rocket Chip gen-erator, it was often possible to sidestep rigorousverification and validation steps in this series oftest chips. Each test chip has a focus on testingeither a particular design feature of a moduleor a component of the design methodology, andtherefore, the verification of a test chip wasrespectively limited to a small set of functionality.

2 IEEE Micro

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 3: Chipyard: Integrated Design, Simulation, and

This reliance on open-source generators shiftsthe verification challenge from block-level test-ing to system-integration verification and vali-dation. Rich module-level configurability in thegenerator-based approach allows for quick, itera-tive design customization, enabled by expansiveconfiguration and parameter systems. While theseparameter systems enable quick iterations acrossmany design points, their flexible nature makesthem prone to misconfiguration, underscoring theneed for continuous full-system integrated val-idation and verification. Rich module-level pa-rameterization makes it possible to continuouslyupdate the SoC design in an agile way, but doesnot inherently aid the design verification or pre-silicon validation of the chip performance. Inaddition, issues relating to on- and off-chip inter-faces, clock domain crossings, third-party IP inte-gration, and power management are all vulnerableto insufficient full-system verification coverage.Our previous chips have encountered both ver-ification and validation gaps of this type: onechip’s cache capacity was inadvertently reducedto half of its desired size, when the configurationwas changed late in the design process to meetphysical design constraints.

Although design and verification executed bya small team in an agile manner has a high appealfor small companies and industrial and academicresearch labs, our experiences with test chipsdeveloped through an agile process also identifysome challenges in execution. The increasingcomplexity of test chips makes it difficult toparallelize and distribute effort among the smallnumber of designers, as architecture definition,RTL implementation, physical design, verifica-tion, and validation all take varying amountsof time, which is difficult to account for. Fur-thermore, it is difficult to maintain institutionalmemory of good design practices, especially inacademic environments and when working withcomplex physical design tool flows in deeplyscaled technologies.

Finally, transitioning between process tech-nologies is a significant undertaking in both sys-tem and test chip design. The test chips in this ret-rospective have been designed in several differentprocess technologies, including IBM 45nm SOI,ST 28nm FD-SOI, TSMC 28nm, TSMC 16nmFFC, and Intel 22nm FFL. The generator-based

approach simplifies process technology transition,since it allows for adjustment of design param-eters through high-level descriptions. However,this flexibility does not propagate across theabstraction layers to the physical design pro-cess. Each custom test chip requires significantmanual effort in mapping RTL abstractions toprocess-specific components (such as memoryand register-file macros), as well as meeting thedesign rules. When generator parameters changebetween design iterations, a new rigid physicaldesign script is often created.

While some of the issues described in thissection are not specific to agile hardware de-sign methodologies, their visibility increases asthe cost and complexity of other parts of theflow diminish. Nevertheless, a common themethroughout our observations relates to integrationand partitioning—on the chip level, and on themethodology and flow level.

The Chipyard FrameworkChipyard provides a unified framework and

work flow for agile SoC development. Multipleseparately developed and highly parameterized IPblocks can be configured and interconnected toform a complete SoC design. The SoC designcan be verified and validated through both FPGA-accelerated and standard software simulations,then pushed through portable VLSI design flowsto obtain tapeout-ready GDSII data for varioustarget technologies. Chipyard also provides aworkload management system to generate soft-ware workloads to exercise the design.

Chipyard Front-End RTL GeneratorsThe front end of the Chipyard framework is

based on the Rocket Chip SoC generator [2],[3]. Chipyard inherits Rocket Chip’s Chisel-based parameterized hardware generator method-ology [3], including a Scala-based parameter-negotiation framework, Diplomacy [4], that nego-tiates mutually compatible parameterizations andinterconnections across all IP blocks in a design.A unified top-level SoC generator enables thegeneration of heterogeneous systems based on pa-rameterized configurations. Chipyard also allowsIP blocks written in other hardware languages,e.g., Verilog, to be included via a Chisel wrapper.

Chipyard adds a large corpus of open-source

July/August 2020 3

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 4: Chipyard: Integrated Design, Simulation, and

Department Head

Figure 2. Multiple disparate design flows supported by the Chipyard framework through generators andtransformations. A series of FIRRTL transformations consumes generators with a custom configuration, andoutputs appropriate Verilog and associated collateral for different design stage platforms.

IP generators to the existing Rocket Chip baselibrary, allowing for the construction of moderndigital SoCs. These include the Berkeley Out-of-Order Machine (BOOM) generator [5], theAriane core [6], the Hwacha vector-unit genera-tor [7], digital signal processing (DSP) modules,domain-specific accelerators (including machinelearning and cryptography), memory systems, andperipherals. The majority of these generators havesilicon-proven instances through the aforemen-tioned series of test chips.

While some commercial IP vendors have largecollections of proprietary configurable IP for cer-tain portions of an SoC, Chipyard provides apublicly extensible open-source alternative for

complete SoCs to support continuing research anddevelopment of specialized state-of-the-art SoCs.

Other open-source SoC design frameworksfocus on tile-granularity customization in many-core architectures [8], [9], or rely on rigid sub-systems and proprietary IP [10]. The gener-ator approach used in Chipyard does not relysolely on static interfaces for integration of IPblocks, but allows for dynamic customizationof encodings, memory maps, and buses duringthe hardware generation stage, enabling customcomponents to be created and integrated at var-ious levels, including in the MMIO periphery,as tightly integrated accelerators, and as het-erogeneous cores and controllers. For example,

4 IEEE Micro

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 5: Chipyard: Integrated Design, Simulation, and

through its fine-grained intra-core parameter sys-tem, the Rocket core can be used for differentpurposes in an SoC. Specifically, in several ofour recent test chips, the application cores consistof fully Linux-capable Rocket cores supportingRV64GC with floating-point units and virtualmemory support, while the controller cores (e.g.a power-management unit) are a Rocket coresupporting only RV64IMAC, with significantlysmaller branch prediction and cache resourcesand no virtual memory. In essence, a significantportion of the microcontrollers running firmwareon the SoC can be implemented using variantsof Rocket or BOOM cores. This common coreconfiguration interface for all software-managedcontrollers within the system improves designerproductivity, compared to alternative SoC de-velopment frameworks that enable drop-in re-placement core options or only coarse-grainedsizing. Similarly, machine-learning acceleratorshave been integrated with Rocket and other coregenerators as tightly integrated accelerators [11],as well as in the form of MMIO periphery ac-celerators [12], demonstrating the various levelsof possible customization in the Chipyard frame-work. Figure 3 demonstrates the various degreesof customization on the core, tile, and SoC levels,enabled by the Chipyard Framework.

FIRRTL Intermediate RepresentationThe Chipyard framework currently integrates

tools to address the three main activities withinthe custom SoC design cycle: front-end RTL de-sign, system validation/verification, and back-endchip physical design. These different stages of theSoC design flow require different levels of de-scription of the design. For example, while front-end RTL descriptions usually use abstract notionsof memory and I/O, back-end RTL requires moreprecise descriptions mapped to the underlyingprocess technology. Similarly, FPGA emulationwill require the digital design to interact withFPGA-specific interfaces, periphery, and internalcomponents, which requires particular collateral.Co-simulation also requires additional hardwareclock gating to control simulation progress.

Chipyard elaborates the front-end RTL de-sign into a FIRRTL [13] intermediate represen-tation. Custom FIRRTL transformations convertthe generated FIRRTL design to drive the differ-

ent flows used at different stages of the designcycle. Using FIRRTL transformations to enablemultiple concurrent design flows from the sameshared code repository and source RTL helpsto reduce and amortize the environment setupcosts incurred with frequent iterations betweendevelopment stages, as is needed for an agilemethodology. This approach is demonstrated inFigure 2.

While Chisel is the primary language fordesign entry in Chipyard using the FIRRTL com-piler, a FIRRTL-based flow can integrate VerilogIP through either “Blackbox” IP integration orVerilog-to-FIRRTL support by certain Verilogelaboration tools [14]. Furthermore, while theVerilog outputs of various stages of a FIRRTL-based flow can be integrated into standard dy-namic verification environments or compared us-ing logical equivalence checking, tools for bothsimulation and temporal property checking of“FIRRTL-native” circuits are openly available[15].

Verilog or SystemVerilog-based design frame-works [10], [8] must rely on design-specific cus-tom scripts or interface adjustments when transi-tioning between emulation, simulation and phys-ical design. In contrast to alternative hardwarepackage management systems [16] or integrationstandards like IP-XACT, which focus on metadataassociated with particular IP components to targetdifferent EDA flows, the FIRRTL transformationsused by Chipyard can be applied to any Chiseldesign integrated with the Chipyard framework.

Software RTL SimulationSoftware-based RTL simulators are a critical

tool in most phases of the design process. Com-piling a software simulator of a top-level design,including various IP components, peripheral andmemory models, and an external test harness canbe a time-consuming engineering task. Chipyardprovides build flows for both the open-sourceVerilator simulator and proprietary commercialsimulators. Open-source RTL simulators such asVerilator are also used in industry [17] to provideefficient and cost-effective digital design veri-fication. Chipyard provides Makefile wrappersfor direct generation of a simulation executablewhich simulates tethered designs with emulatedperipherals. The Makefile wrappers generate the

July/August 2020 5

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 6: Chipyard: Integrated Design, Simulation, and

Department Head

Figure 3. SoC customization in Chipyard using the Rocket Chip generator configuration system. (a) A typicalChipyard SoC can include multiple heterogeneous application cores, as well as multiple configurations ofBOOM or Rocket cores that can serve various purposes within the SoC, such as a controller unit. Customizationcan be performed across various levels of the SoC, including RoCC accelerators, MMIO accelerators andperipherals. (b) An example configuration of a simple design, consisting of a single Rocket in-order processorcore and a custom educational SHA3 tightly-integrated accelerator. (c) An example configuration of a morecomplex SoC, consisting of a multiple cores of various capabilities (RV32 Rocket in-order control core, RV64Rocket in-order application core, RV64GC BOOM 3-wide out-of-order application core), custom acceleratorsattached to each core, a standard set of peripherals, and a deeper memory hierarchy with an L2 cache. Thisexample also includes a simulated block device and simulated backing DRAM memory.

top-level design and matching test harnessesbased on the SoC configuration. Tethered designsuse a host to send transactions that bring upthe simulated SoC and load programs. Thesesoftware RTL simulation wrappers enable quickdesign cycles and execution of RISC-V binariesin simulation. While tethered designs are the de-fault form to generate software RTL simulationsin Chipyard, Chipyard also supports un-tetheredSoC configurations in which the SoC can bootstandalone using a boot ROM.

FPGA-Accelerated Simulation with FireSim

For full-system validation and evaluation, theChipyard framework harnesses the FireSim [18]open-source FPGA-accelerated simulation plat-form using the AWS EC2 public cloud. In con-trast with FPGA prototyping, FPGA-acceleratedsimulation correctly models timing behavior ofnot only the design under test, but also the I/Osand peripherals of the SoC. FPGA-acceleratedsimulation enables deterministic and reproducibleevaluation with a realistic system environment,as opposed to FPGA prototyping where eachexecution is sensitive to the FPGA environment

6 IEEE Micro

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 7: Chipyard: Integrated Design, Simulation, and

and timing depends on FPGA peripheral deviceperformance (e.g. DRAM performance).

Originally developed as a platform to enablescale-out simulation for datacenter architectureresearch on hundreds of cloud FPGAs, FireSimnecessarily automates the infrastructure manage-ment and simulation mapping necessary to auto-matically run high-performance simulations. Aspart of the agile chip-design stack, this automa-tion and integration reduces the level of expertiserequired to harness cloud FPGAs for emulationpurposes and thus increases the accessibility ofhigh performance full-system simulation to abroad spectrum of designers. FireSim has beenuseful in pre-silicon verification, validation, andsoftware development. From the perspective ofsmall agile teams with limited resources, FireSimprovides many of the features available in costlycommercial emulation platforms. In contrast withprior FPGA-accelerated simulation tools, the ac-cessibility of FireSim through FPGA instances onthe AWS public cloud, together with the automa-tion of host-target interfaces with the FPGA, havemade FireSim a popular tool within Berkeley andother academic hardware development users, aswell as emerging startup companies.

FireSim enables co-development of softwareand hardware simultaneously, allowing for quicksoftware adjustment turnarounds based on hard-ware modifications. Furthermore, FireSim playsa major role in the performance and functionalvalidation of processors, since it enables theidentification of bugs deep into simulation ex-ecution time thanks to FPGA-acceleration withappropriate peripheral modeling. Unlike manyother open-source hardware development plat-forms with FPGA support, FireSim’s focus onsimulation and emulation as opposed to prototyp-ing enables true pre-silicon performance evalua-tion and validation in a full-system context withinthe Chipyard framework. While maintaining itsstand-alone operation as an architectural researchplatform, FireSim was transformed into a librarywhich is integrated into the broader Chipyardframework. As such, FireSim can now consumedesign configurations composed within the Chip-yard framework, and transform them into FPGA-accelerated simulations. Furthermore, the FireSimGolden Gate compiler has been integrated intothe Chipyard framework, so it can now consume

arbitrary FIRRTL as its input, as well as externalVerilog components necessary for broader systemintegration.

Back-End Physical Design with HammerFor back-end physical design, Chipyard in-

cludes a modular VLSI flow named Hammer [19].The Hammer VLSI flow provides an abstractionlayer above process-technology- and EDA-tool-specific concerns, with the goal of increasingre-use and modularity of vendor-specific com-ponents of the physical design flow. To thisend, the Hammer VLSI flow utilizes separatevendor-specific process technology plug-ins andEDA-tool-specific plug-ins, which implement ab-stracted software APIs to generate design flowcollateral like Tcl scripts, clock constraints, andpower specifications based on higher-level designinputs. For example, Hammer will emit process-and vendor-specific macro placement, obstruc-tion, and power-strap placement commands froma high-level process- and vendor-agnostic descrip-tion of the design. This separation of abstractionlayers between design, process technology, andEDA tool vendor enables faster adoption of open-source components.

The Hammer flow aspires to support open-source tools in conjunction with commercial andproprietary tools using common levels of ab-straction. As such, while the first Hammer-baseddesigns were implemented using proprietary pro-cess technologies, a plug-in for the ASAP7 [20]open-source predictive PDK was created in onlya few weeks and is now included in the coreHammer repository. With this, small teams andacademic users can prototype design flows andexperiment with RTL designs using predictiveor simple physical design kits, while being ableto reuse similar Hammer descriptions for chipfabrication using advanced process nodes.

Hammer was designed to support hierarchi-cal physical design flows. Hierarchical physi-cal design flows are of particular importance inhighly complex custom SoCs, composed of mul-tiple specialized blocks with a variety of phys-ical design constraints. Decomposing a designinto these smaller hierarchical components notonly improves the quality of results emitted byEDA tools, but it also allows the distribution ofphysical design tasks among multiple hardware

July/August 2020 7

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 8: Chipyard: Integrated Design, Simulation, and

Department Head

developers, which is important for agile design.FIRRTL-based grouping and flattening transfor-mations in Chipyard further assist the hierarchicalphysical design flow in Hammer by enablingusers to specify one logical hierarchy in thesource RTL, while choosing a different hierarchyfor physical boundaries through automated trans-formations.

Workload Management with FireMarshal

Finally, no hardware development environ-ment is complete without relevant software in-terfaces. In order to support continuous and con-current full-system validation, Chipyard enablesshared software development across the featuredesign stages through the FireMarshal softwareworkload generation tool. Shared software de-velopment is especially important in integratedvalidation of specialized and custom designs.FireMarshal provides a standard and version-controlled format for software workload descrip-tions and automates the generation of these work-loads for various simulation targets. Softwaredevelopers can begin work as soon as a functionalmodel is available (e.g. in the Spike RISC-VISA simulator or the QEMU emulator). Thoseworkloads can then be used without any modi-fication on RTL simulations and FireSim simula-tions. In this way, the complex task of softwaredevelopment and porting (particularly for Linux-based workloads) can be reused by anyone onthe design team without requiring special exper-tise. FireMarshal includes several examples andtemplates for Linux-based workloads, enablingfast ramp-up of software development across thevarious simulation and emulation targets usingpreset Linux kernel configurations and base dis-tribution images with matching drivers. Chipyardprovides a versioned set of standard RISC-Vsoftware development tools (e.g. GNU toolchain,QEMU, Spike), as well as a set of equivalentnon-standard RISC-V development tools for non-standard extensions of custom IP blocks. Thetwo software development tool sets can be usedinterchangeably in the framework. The softwaredevelopment structure in Chipyard is illustratedin Figure 4.

Figure 4. Shared software development in Chip-yard using FireMarshal. Designer can use the stan-dard RISC-V software toolchains, or custom softwaretoolchains. While the core application logic and li-braries are consistent with the SoC design, kerneland driver configuration may change based on thetarget platforms or tethered systems. FireMarshalautomates workload generation to enable targetingmultiple platform using a single description.

Concurrent Agile Hardware Development

Together, the components of the Chipyardframework enable continuously integrated devel-opment of custom and specialized SoCs, takingadvantages of the accessibility of open-sourcetooling. The Chipyard framework enables furtheradoption of agile hardware development method-ologies, by lowering the barrier-to-entry and pro-viding support and integration of developmentenvironments for different stages of the designprocess.

Chipyard is designed for concurrent develop-ment of multiple custom SoC features by multiplemembers of a small team, with each feature goingthrough iterative design cycles which includemodeling, simulation, system-integration, FPGAemulation and validation, and physical design.The structure of Chipyard enables independentdevelopment of the various IP modules and/orconfigured SoC instances, with continuous inte-gration using configured workloads for systemvalidation and verification.

8 IEEE Micro

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 9: Chipyard: Integrated Design, Simulation, and

Open-Source SoC Development

Chipyard aims to be a fully open-source SoCdevelopment framework. However, while open-source tools and projects within the digital-logicabstraction layer of the hardware design stackhave been gaining traction and trust in researchand industrial communities, design aspects thatare closer to underlying technologies such asanalog and mixed-signal (AMS) modules andthe manufacturing interfaces still need to identifyappropriate open-source development and inte-gration models.

In the analog domain, new generator-basedapproaches such as the Berkeley Analog Gen-erator (BAG) [21] are envisioning a portableand process-technology agnostic generator-basedapproach to AMS design. However, this approachrequires tight integration with digital SoC com-ponents. Future integration of analog generatorsand their simulation environments, such as BAG,into the Chipyard framework will help supportcommon interfaces and integration between ana-log and digital components through integratedtools and automation of design collateral such asappropriate generated behavioral models of ana-log blocks, as well as matching physical designconstraints.

In the digital domain, there is a large inter-dependence of standard commercial EDA toolstacks with tightly controlled NDA-only physicaldesign kits due to the complexities of sub-micronprocess technologies. This issue has been identi-fied in the past [22], and open-source hardwareprojects choose to address this challenge in var-ious ways: some publish “patches” to the com-mon flow scripts provided by EDA vendors [8],or partially associate hardware design templateimplementations with particular process technolo-gies [23], but are largely obscured elsewhere. Theapproach used in the Hammer VLSI flow withinChipyard is a “plug-in” model. These plug-insprovide a mapping between the Hammer vendor-agnostic level of abstraction, to the proprietaryvendor-specific APIs. Open-source physical de-sign initiatives such as the OpenROAD project[24] are making encouraging progress towardsaddressing this challenge.

ConclusionIn this article, we presented the Chipyard

framework which was developed to provide anintegrated design, simulation, and implementationenvironment to support the growing complexityand differentiation of custom SoCs. Through in-tegration with the Rocket Chip generator ecosys-tem, Chipyard provides a large number of easilycomposable and extensible open-source digitalIP blocks. The additional integration of multiplesimulation and implementation tools enables con-tinuous and simultaneous development for higher-quality verification, validation, and system inte-gration. Future integration with analog genera-tor frameworks will open the door to breakingthe digital-analog divide, and the enabling ofcomplete open-source custom SoC developmentsolutions.

AcknowledgmentsThe information, data, or work presented

herein was funded in part by the DefenseAdvanced Research Projects Agency (DARPA)through the Circuit Realization at FasterTimescales (CRAFT) Program under GrantHR0011-16-C0052, and by the AdvancedResearch Projects Agency-Energy (ARPA-E),U.S. Department of Energy, under AwardNumber DE-AR0000849. Research was partiallyfunded by ADEPT Lab industrial sponsors andaffiliates. We thank Intel, STMicroelectronics,and TSMC for donating prototypes fabrication.The views and opinions of authors expressedherein do not necessarily state or reflect thoseof the United States Government or any agencythereof.

REFERENCES1. J. L. Hennessy and D. A. Patterson, “A new golden

age for computer architecture,” Commun. ACM, vol. 62,

no. 2, pp. 48–60, Jan. 2019.

2. Y. Lee, A. Waterman, H. Cook et al., “An agile ap-

proach to building RISC-V microprocessors,” IEEE Mi-

cro, vol. 36, no. 2, pp. 8–20, Mar 2016.

3. K. Asanovic, R. Avizienis, J. Bachrach et al., “The

Rocket Chip generator,” EECS Department, University

of California, Berkeley, Tech. Rep. UCB/EECS-2016-17,

Apr 2016.

4. H. Cook, W. Terpstra, and Y. Lee, “Diplomatic design

July/August 2020 9

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 10: Chipyard: Integrated Design, Simulation, and

Department Head

patterns: A TileLink case study,” in 1st Workshop on

Computer Architecture Research with RISC-V, 2017.

5. C. Celio, D. A. Patterson, and K. Asanovic, “The

berkeley out-of-order machine (BOOM): An industry-

competitive, synthesizable, parameterized RISC-V pro-

cessor,” EECS Department, University of California,

Berkeley, Tech. Rep. UCB/EECS-2015-167, 2015.

6. F. Zaruba and L. Benini, “The cost of application-

class processing: Energy and performance analysis of a

linux-ready 1.7-ghz 64-bit RISC-V core in 22-nm FDSOI

technology,” IEEE Transactions on Very Large Scale

Integration (VLSI) Systems, vol. 27, no. 11, pp. 2629–

2640, Nov 2019.

7. Y. Lee, C. Schmidt, A. Ou et al., “The hwacha vector-

fetch architecture manual, version 3.8. 1,” EECS De-

partment, University of California, Berkeley, Tech. Rep.

UCB/EECS-2015-262, 2015.

8. J. Balkind, M. McKeown, Y. Fu et al., “OpenPiton: An

open source manycore research framework,” in Pro-

ceedings of the Twenty-First International Conference

on Architectural Support for Programming Languages

and Operating Systems, ser. ASPLOS ’16. New York,

NY, USA: ACM, 2016, pp. 217–232.

9. L. P. Carloni, “Invited - the case for embedded scalable

platforms,” in Proceedings of the 53rd Annual Design

Automation Conference, ser. DAC ’16. New York, NY,

USA: ACM, 2016, pp. 17:1–17:6.

10. P. Whatmough, M. Donato, G. Ko et al., “CHIPKIT: An

agile, reusable open-source framework for rapid test

chip development,” 2020.

11. H. Genc, A. Haj-Ali, V. Iyer et al., “Gemmini: An agile

systolic array generator enabling systematic evalua-

tions of deep-learning architectures,” arXiv:1911.09925

[cs.DC], 2019.

12. F. Farshchi, Q. Huang, and H. Yun, “Integrating NVIDIA

deep learning accelerator (NVDLA) with RISC-V SoC

on FireSim,” in Proccedings of The 2nd Workshop

on Energy Efficient Machine Learning and Cognitive

Computing for Embedded Applications, at HPCA 2019,

2019.

13. A. Izraelevitz, J. Koenig, P. Li et al., “Reusability is FIR-

RTL ground: Hardware construction languages, com-

piler frameworks, and transformations,” in Proceedings

of the 36th International Conference on Computer-

Aided Design, ser. ICCAD ’17. Piscataway, NJ, USA:

IEEE Press, 2017, pp. 209–216.

14. C. Wolf. (2018) Yosys open synthesis suite - write firrtl

- write design to a FIRRTL file. [Online]. Available:

http://www.clifford.at/yosys/cmd write firrtl.html

15. A. Magyar, D. Biancolin, J. Koenig et al., “Golden gate:

Bridging the resource-efficiency gap between ASICs

and FPGA prototypes,” in 2019 IEEE/ACM International

Conference on Computer-Aided Design (ICCAD), 2019,

pp. 1–8.

16. O. Kindgren, “Invited paper: A scalable approach to

ip management with FuseSoC,” in Workshop on Open

Source Design Automation, 2019.

17. D. Das Sarma and G. Venkataramanan, “Compute

and redundancy solution for teslas full self driving

computer,” in Hot Chips 31: Stanford Memorial

Auditorium, Stanford, California, August 18–20, 2019,

2019. [Online]. Available: https://www.hotchips.org/

hc31/HC31 2.3 Tesla Hotchips ppt Final 0817.pdf

18. S. Karandikar, H. Mao, D. Kim et al., “FireSim: FPGA-

accelerated cycle-exact scale-out system simulation in

the public cloud,” in 2018 ACM/IEEE 45th Annual Inter-

national Symposium on Computer Architecture (ISCA),

June 2018, pp. 29–42.

19. E. Wang, C. Schmidt, A. Izraelevitz et al., “A method-

ology for reusable physical design,” in Twenty First

International Symposium on Quality Electronic Design,

2020. Proceedings., March 2020.

20. L. T. Clark, V. Vashishtha, L. Shifren et al., “ASAP7: A

7-nm FinFET predictive process design kit,” Microelec-

tronics Journal, vol. 53, pp. 105–115, 2016.

21. E. Chang, J. Han, W. Bae et al., “BAG2: A process-

portable framework for generator-based AMS circuit

design,” in 2018 IEEE Custom Integrated Circuits Con-

ference (CICC), April 2018, pp. 1–8.

22. G. Gupta, T. Nowatzki, V. Gangadhar et al., “Kickstarting

semiconductor innovation with open source hardware,”

Computer, vol. 50, no. 6, pp. 50–59, 2017.

23. M. B. Taylor, “Basejump STL: Systemverilog needs

a standard template library for hardware design,” in

Proceedings of the 55th Annual Design Automation

Conference, ser. DAC ’18. New York, NY, USA: ACM,

2018, pp. 73:1–73:6.

24. T. Ajayi, V. A. Chhabria, M. Fogaca et al., “Toward

an open-source digital flow: First learnings from the

OpenROAD project,” in Proceedings of the 56th Annual

Design Automation Conference 2019, ser. DAC ’19.

New York, NY, USA: ACM, 2019, pp. 76:1–76:4.

Alon Amid is currently a PhD candidate in theElectrical Engineering and Computer Sciences De-partment, University of California, Berkeley. His cur-rent research focus includes parallel and distributedcomputing, energy-efficient processors and architec-tures, and hardware-software co-design. He has aB.Sc in electrical engineering from Technion – Is-

10 IEEE Micro

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 11: Chipyard: Integrated Design, Simulation, and

rael Institute of Technology and an M.S. from theUniversity of California, Berkeley. Contact him [email protected].

David Biancolin is currently a PhD candidate in theDepartment of Electrical Engineering and ComputerSciences, University of California, Berkeley. He hasa BASc in engineering science from the University ofToronto. Contact him at [email protected].

Abraham Gonzalez is currently a PhD candidatein the Electrical Engineering and Computer SciencesDepartment, University of California, Berkeley. Hisresearch interests are in warehouse-scale comput-ing, high-performance microarchitectures, and com-puter architecture tooling. He received a B.S. inElectrical and Computer Engineering from the Uni-versity of Texas at Austin in 2018. Contact him [email protected].

Daniel Grubb is currently a PhD student in theDepartment of Electrical Engineering and Com-puter Sciences, University of California, Berkeley.His current research focus includes navigation sys-tems for autonomous robots and agile physicaldesign methodologies. He has a BS in electri-cal engineering and computer sciences from theUniversity of California, Berkeley. Contact him [email protected].

Sagar Karandikar is currently a PhD student in theDepartment of Electrical Engineering and ComputerSciences, University of California, Berkeley. His re-search focuses on exploring hardware-software co-design in warehouse-scale machines. He has a BSand an MS in electrical engineering and computersciences from the University of California, Berkeley.He is a member of the the Association for Comput-ing Machinery (ACM) and the IEEE. Contact him [email protected].

Harrison Liew is currently a PhD student in theElectrical Engineering and Computer Sciences De-partment, University of California, Berkeley. His cur-rent research is advised by Borivoje Nikolic and isat the intersection of the BWRC and ADEPT labs,focusing on signal processing generators for multi-user massive MIMO base stations and its imple-mentation in integrated circuits using agile physi-cal design frameworks and methodologies. He re-ceived a BS and MS from Columbia University in2016, also in Electrical Engineering. Contact him [email protected].

Albert Magyar is currently a PhD candidatein the ADEPT Lab at the University of Cali-fornia Berkeley, advised by Krste Asanovic andJonathan Bachrach. His research interests includeincreasing productivity of RTL design and improvingthe usability of FPGA simulation. Contact him [email protected].

Howard Mao is currently a PhD candidate at theUniversity of California, Berkeley. He is a memberof the ADEPT lab and is interested in designingmicroarchitectures for datacenter systems. Contacthim at [email protected].

Albert Ou is currently a PhD student at the Uni-versity of California, Berkeley. He received his B.S.(2014) and M.S (2015) in Electrical Engineering andComputer Sciences from UC Berkeley. His researchfocuses on energy-efficient vector processor archi-tectures, VLSI implementation, and data-parallel pro-gramming models. He is a student member of theIEEE. Contact him at [email protected].

Nathan Pemberton is currently a PhD student in theDepartment of Electrical Engineering and ComputerSciences, University of California, Berkeley, study-ing computer architecture and operating systems forwarehouse-scale computers. He has a BS in com-puter engineering from the University of CaliforniaSanta Cruz, and an MS in computer science fromthe University of California, Berkeley. He is a memberof the Association for Computing Machinery (ACM).Contact him at [email protected].

Paul Rigge is currently pursuing a Ph.D. degree atthe University of California, Berkeley, CA, USA. Hereceived the B.S. degree in electrical engineering andin computer science from the University of Michigan,Ann Arbor, MI, USA in 2012. His current research in-terests are agile hardware methodologies for wirelesssystems. Contact him at [email protected].

Colin Schmidt is currently a PhD candidate at theUniversity of California, Berkeley, where he works onarchitecting, implementing, and building software forvector accelerators. He has a BS in electrical andcomputer engineering and computer science fromCornell University. He is a student member of theAssociation for Computing Machinery (ACM). Contacthim at [email protected].

John Wright is currently a Ph.D. candidate in theDepartment of Electrical Engineering and ComputerSciences at the University of California, Berkeley. He

July/August 2020 11

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 12: Chipyard: Integrated Design, Simulation, and

Department Head

received the B.S. degree (2011) from the Univer-sity of Kentucky and the M.S. degree (2017) fromthe University of California, Berkeley. He was previ-ously with Cypress Semiconductor. Contact him [email protected].

Jerry Zhao is currently a PhD student in the De-partment of Electrical Engineering and ComputerSciences at the University of California, Berkeley,studying the microarchitecture of high-performanceprocessors. He received a BS in Electrical Engineer-ing and Computer Science from the University ofCalifornia, Berkeley. He is a student member of theAssociation for Computing Machinery and the IEEE.Contact him at [email protected].

Yakun Sophia Shao is currently an assistant pro-fessor in the Department of Electrical Engineer-ing and Computer Sciences at the University ofCalifornia, Berkeley. She has a PhD in computerscience from Harvard University. Contact her [email protected].

Krste Asanovic is currently a professor in theDepartment of Electrical Engineering and ComputerSciences at the University of California, Berkeley. Hehas a PhD in computer science from the Universityof California, Berkeley. He is a Fellow of the IEEEand the Association for Computing Machinery (ACM).Contact him at [email protected].

Borivoje Nikolic is the National Semiconductor Dis-tinguished Professor of Engineering at the Universityof California, Berkeley. He has a PhD in electrical andcomputer engineering from the University of Califor-nia, Davis. He is a Fellow of the IEEE. Contact him [email protected].

12 IEEE Micro

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/MM.2020.2996616

Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].