
White Paper

FPGA Design Methods for Fast Turn Around

“If only I could get my FPGA design done sooner”

Angela Sutton, Staff Product Marketing Manager, FPGA Implementation, Synopsys

Today’s FPGAs are doubling in capacity every 2 years and have already surpassed the 5 million equivalent ASIC gate mark. With designs of this magnitude, the need for fast flows has never been greater. At the same time, designers are seeking rapid feedback on their ASIC or FPGA designs by implementing quick prototypes or initial designs on FPGA-based boards. These prototypes or designs allow designers to start development, verification and debug of the design, in the context of system software and hardware, and also to fine-tune algorithms in the design architecture. Quick and intuitive debug iterations to incorporate fixes are of great value. The ability to perform design updates that don’t completely uproot all parts of the design that have already been verified is also a bonus! Whether the goal is aggressive performance or to get a working initial design or prototype on the board as quickly as possible, this paper provides information on traditional and new techniques that accelerate design and debug iterations.

Introduction

FPGAs have been used for years to create functioning prototypes of ASIC and System on a Chip (SoC) designs, and the popularity of this verification technique continues to increase. Indeed, well over 90% of designs are prototyped using FPGAs. Such verification typically involves squeezing the evolving design, eventually destined for an ASIC, into the largest, most capable FPGAs available on the market and then debugging the design along with system software and drivers on the board.

Shorter iteration times for system debug … results stability from one run to the next … team and/or parallel development flows … ways to quickly make small changes … the holy grail

In addition to prototyping, there is increasing demand for FPGAs in production systems. Growth in capacity, functionality, and performance, accompanied by a decrease in price per gate, fuels this trend. Low-cost FPGA families such as Cyclone-IV from Altera and Spartan-6 from Xilinx offer million+ ASIC gate equivalent capacity and come equipped with embedded RAM, microprocessors, dedicated DSP blocks and Gigabit serial transceivers. Part prices in volume have become very attractive, ranging from sub-$10 to the low hundreds of dollars. These production designs are typically both large and challenging, driving a demand for ASIC-style iterative flows and fast design debug cycles. The cost of a hardware design mistake or update may be lower in an FPGA than in an ASIC (simply repartition and reprogram the chip with the corrected design), but time is still money when it comes to completing a design project.

March, 2010


Different solutions for different design objectives?

Designers using FPGAs for production may have aggressive Quality of Results (QoR) goals, whereas designers using FPGAs for prototypes may not. This paper details approaches available to FPGA production users with tight timing constraints (QoR focused designs) as well as an expanded set available to ASIC prototypers who have lower QoR expectations and slack timing constraints to meet.

The bigger the better!

An iteration from RTL to FPGA implementation (bitfile) can take several days, depending on how aggressive your design performance needs are and the computer platform chosen to run the tools.

Figure 1: FPGA sizes are starting to double with each new generation of silicon meaning a design iteration can now take days. The example shown is for Xilinx FPGA families

The shorter the design iteration time the better; the more stable the results from one run to the next the better, since this simplifies the re-verification and debug process.

Good news! Design schedules can be shortened.

In this paper we look at a variety of techniques to bring design schedules under control for large FPGAs.

Speeding Up Your Design Project

The menu of items available to accelerate your design project will depend upon whether (1) your priority is Quality of Results (performance QoR), (2) your goal is to create an ASIC prototype, or (3) you have the means to perform parallel development. Table 1 shows a variety of traditional and new techniques to consider for these 3 scenarios, including “fast synthesis” and “continue synthesis upon error” (a.k.a. complete what you can).

[Figure 1 graphic, labels only: horizontal axis in equivalent ASIC gates (1M, 2M, 5M); RTL-to-bitfile time regions labeled “in minutes”, “in hours” and “in days”; devices shown are Virtex-IIPro (100K LUT), Virtex-4, Virtex-5, Virtex-6 (XC6VLX760) and Stratix 4 (EP4SE820); further LUT counts shown are 200K, 330K, 760K and 820K; annotation: “2x the capacity of the prior generation family”.]


Priority (three scenarios):
1. Best QoR (tight timing constraints)
2. Quick ASIC prototype with loose constraints; pipe-clean RTL/constraints
3. Fast incremental design updates and design tuning; team design

Primary goal:
1. Faster implementation without sacrificing QoR
2. Fastest time to the board; ease of use
3. Results stability and parallel development

Techniques, with the notes that accompany them in the table:
- Multiprocessing
- Server farm: “hunt for the best performance in parallel”
- Floorplanning: “partition the design”
- Incremental static timing analysis
- Fast synthesis-only iterations with tight constraints (noted in two columns as “during initial design tuning”)
- Continue synthesis upon error
- Use tools to find, locate and fix errors
- Fast synthesis, then P&R with loose constraints
- Incremental P&R (e.g. Guided Flow)
- Fast P&R modes (if P&R output is legal)
- Block based (partition) flows (for parallel development; may not save total runtime)
- ASIC design import

Table 1: Traditional and new techniques to help speed things up!

Some Traditional Approaches

First, let’s take a look at some of the tried and trusted approaches used today, including server farms, block based flows, and floorplanning.

When using your server farm, you might apply slight variations in constraints, RTL or tool settings (for example, vary the Xilinx place and route effort level or seeds) and run several variations of the design, in parallel, on multiple machines. Then you would compare, contrast and choose the best result, or just learn from the results that you see. This can be somewhat of a trial-and-error process that helps you hunt for the best performance and area results but may not always yield the results that you need. See Table 2.

Scenarios: QoR; Quick Prototype; Incremental Updates/Team Design

Server Farm
Hunt for the best way to attain design goals by applying different constraints and RTL scenarios in parallel on adjacent machines and then comparing the outcome.

Useful when: the design runs cleanly through the flow and you want to experiment or hunt for ways to meet performance/area criteria.

Advantage: Can deliver good QoR if used in a top-down flow. Easy to automate via scripts.

Disadvantage: A trial-and-error approach for discovering the best performance/area that does not necessarily help to improve results. Requires access to a lot of machine hardware. More versions of the design to manage.

Table 2: Server Farms deliver more horsepower to let you try different scenarios in parallel
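
The server farm hunt described above can be sketched in a few lines of Python. This is only an illustration of the "run variations in parallel, then compare" pattern: `run_variant` is a hypothetical stand-in for a real synthesis and P&R run, and the slack it returns comes from a made-up formula, not from any tool.

```python
from concurrent.futures import ThreadPoolExecutor


def run_variant(seed):
    """Hypothetical stand-in for one full synthesis + P&R run with a given seed.

    A real flow would launch the vendor tools here and parse their timing
    report. The formula below is a made-up placeholder so the sketch runs
    without any tools installed; it is not real timing data.
    """
    worst_slack_ns = ((seed * 37) % 11 - 5) / 10.0
    return {"seed": seed, "worst_slack_ns": worst_slack_ns}


def hunt_for_best(seeds, max_workers=4):
    """Run all variants in parallel and return (best_result, all_results)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_variant, seeds))
    # The "best" run is the one with the largest (least negative) worst slack.
    best = max(results, key=lambda r: r["worst_slack_ns"])
    return best, results


if __name__ == "__main__":
    best, results = hunt_for_best(range(1, 9))
    for result in results:
        print(result)
    print("best:", best)
```

On a real farm each variant would run on its own machine rather than in a local thread, and every variant adds another version of the design to manage, which is the disadvantage noted in Table 2.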


Many users are familiar with block based flows that allow you to partition the design into blocks upfront and then refine the blocks in parallel. These blocks are sometimes used in conjunction with floorplanning, where space on the chip is allocated upfront to each partition. Examples of block based flows in use today include the Altera LogicLock flow and the new Xilinx design preservation flow.

The Synopsys-Xilinx design preservation flow involves defining blocks known as compile points upfront at the RTL level in the Synopsys Synplify Pro or Synplify Premier synthesis tool. Synthesis and place and route are then run. In subsequent runs, unchanged partitions of the design are preserved all the way from RTL through netlist, placement and routing. During synthesis runs, only the compile points (partitions) that change are re-synthesized, and then only these are re-placed and re-routed. A complete script and GUI based integration exists. Partition information is shared between Synplify Pro/Premier and the Xilinx ISE Design Suite, which includes the Xilinx place and route tools. Synplify Pro and Synplify Premier optionally integrate with the ISE PlanAhead floorplanning flow, generating EDIF to populate each pre-floorplanned block. See Table 3 for a list of advantages and disadvantages of floorplanning and block based flows.
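
The idea that only changed partitions are rebuilt can be illustrated with a small sketch. It assumes, purely for illustration, that each partition is identified by a name and that a content hash of its source is kept from the previous run; the function names are hypothetical and say nothing about how Synplify or ISE detect changes internally.

```python
import hashlib


def fingerprint(source_text):
    """Return a stable content hash for one partition's source text."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def changed_partitions(previous_fingerprints, current_sources):
    """List the partitions that must be re-synthesized, re-placed and re-routed.

    previous_fingerprints: partition name -> hash recorded after the last run
    current_sources:       partition name -> current source text
    A partition is rebuilt if it is new or if its source hash changed.
    """
    changed = []
    for name, text in current_sources.items():
        if previous_fingerprints.get(name) != fingerprint(text):
            changed.append(name)
    return sorted(changed)


if __name__ == "__main__":
    last_run = {"cpu": fingerprint("cpu rtl v1"), "dma": fingerprint("dma rtl v1")}
    now = {"cpu": "cpu rtl v1", "dma": "dma rtl v2", "uart": "uart rtl v1"}
    print(changed_partitions(last_run, now))
```

In this toy run only `dma` (edited) and `uart` (new) would be rebuilt, while `cpu` keeps its earlier results.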

Synplify Premier itself includes a Design Planner feature which can optionally be used to partition RTL, working in conjunction with physical synthesis or logic synthesis runs.

Scenarios: Incremental Updates/Team Design

Floorplanning and Block based flows
Divide and conquer! Partition your design upfront into blocks that can be worked on individually. You may update each block separately. If floorplanning, allocate partitions to defined physical regions on the chip. It is wise to place logic likely to change in partitions separate from logic that is unlikely to change.

Useful when:
- you want to preserve working blocks that have already been verified
- you have a design team (each member can work on a given set of blocks in parallel)
- you use floorplanning to meet performance targets or lock down performance

Advantage:
- Improves results stability. Saves verification time since you don’t have to re-verify parts of the design that already work.
- Allows team members to work in parallel.

Disadvantage:
- May cost QoR by imposing block boundaries and physical placement limitations that prevent the tools from performing the optimizations required to meet timing.
- With some synthesis tools, may not be useful if your ASIC design contains gated clocks.
- May cost you runtime because the aforementioned boundary and physical constraints limit your optimization possibilities, making the tool work harder to meet performance/area goals.
- Constraints setup can be time consuming; you may need to do time budgeting. If floorplanning, you will have to allocate resources between partitions to ensure resources are available for clock, DSP element and memory resources.
- Floorplanning and partitioning are complex: you may need to keep clock domains within a single partition, keep constants in the same partition, reduce cross-partition latency and minimize boundary I/Os, and you will need to set physical constraints correctly and optimally. If floorplanning, you will have the additional baggage of many physical constraints to manage.

Table 3: Floorplanning/block based flows preserve design results but can cost QoR

Speeding Up Place and Route (P&R)

P&R typically consumes well over half of the overall design iteration time, so it’s vital to speed this design step up too. FPGA vendors provide “backend” place and route tools customized to their FPGA families and have sought to improve turnaround times using techniques such as “multiprocessing”, “incremental P&R” and “fast P&R”.

For multi-million gate designs, however, place and route can take a whole work day to complete. This is problematic if all you want is a quick iteration to test a small design change or when you just want a quick initial implementation of your prototype on the board (see Figure 2). To chip away at the P&R runtime, some place and route tools may be run in fast or lower effort modes (Table 4), sacrificing some QoR. For example, Altera users may use fast P&R modes in the Quartus backend tools that run the fitter extremely fast, and Xilinx users may choose to apply lower effort levels to shorten P&R runtime.


Figure 2: An example of a quick initial implementation or ASIC prototype flow

Scenarios: Quick Prototype; Incremental Updates/Team Design

FPGA vendor Fast P&R Modes
Synplify Premier (or the P&R tool) minimizes the effort spent on producing good QoR in the interest of saving time during routing.

Useful when: tuning the design or when you have non-aggressive timing/area goals.

Advantage: Saves overall runtime, RTL to the board.

Disadvantage: May sacrifice QoR. In some FPGA vendor fast P&R modes, you may get an illegal netlist (you will be fully aware that this is an issue).

Table 4: Fast (or low effort) Placement and Routing modes speed up the P&R design step but may cost some QoR

Additionally, users of Synplify Premier and Xilinx ISE P&R tools can perform fast incremental P&R (see Table 5) using the Xilinx “Guided Flow”. This flow, which emphasizes results stability, is useful when you make minor changes to the design that are not on the critical path. How does it work? The Xilinx ISE P&R tools determine “what’s changed” by doing a netlist comparison between the current run and the prior run. The key to the success of this flow is the ability of the Synplify Premier synthesis tool to synthesize reproducible and deterministic netlists and instance names from one run to the next, for every iteration. In 2007, Synplify Premier introduced “path group” technology that localizes changes in a synthesized netlist to only those parts of the design where the RTL or constraints actually changed. Similar RTL and constraints produce similar results: a reproducible netlist, in other words.
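
A toy netlist comparison shows why reproducible instance names matter to a guided flow. The sketch below assumes a deliberately simplified netlist model (instance name mapped to a cell type and its connected nets); it is not the comparison that the ISE tools actually perform, and all names in it are illustrative.

```python
def diff_netlists(previous, current):
    """Compare two netlists keyed by instance name.

    Each netlist maps instance name -> (cell_type, tuple_of_connected_nets).
    Returns the instances that were added, removed or modified.
    """
    prev_names, curr_names = set(previous), set(current)
    return {
        "added": sorted(curr_names - prev_names),
        "removed": sorted(prev_names - curr_names),
        "modified": sorted(
            name for name in prev_names & curr_names
            if previous[name] != current[name]
        ),
    }


if __name__ == "__main__":
    run1 = {
        "u_and": ("AND2", ("a", "b", "n1")),
        "u_reg": ("DFF", ("clk", "n1", "q")),
    }
    # Same design with one gate changed; instance names stay stable.
    run2_stable = dict(run1, u_and=("OR2", ("a", "b", "n1")))
    # Same change, but the tool also renamed every instance.
    run2_renamed = {
        "i_17": ("OR2", ("a", "b", "n1")),
        "i_18": ("DFF", ("clk", "n1", "q")),
    }
    print(diff_netlists(run1, run2_stable))
    print(diff_netlists(run1, run2_renamed))
```

With stable names the comparison reports a single modified instance. With renamed instances the whole design looks new, and an incremental P&R run would have nothing to reuse.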

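[Figure 2 graphic, labels only: inputs “RTL, constraints” and “DesignWare library”; tools “Synplify Premier” and “ISE”; steps “Define CP’s (optional)”, “Fast synthesis”, “Continue synthesis (on error with CP flow)”, “Global and detailed placement” and “Route, Generate bitfile”; intermediate outputs “Netlist” and “Netlist with blackbox”; feedback loop “Analyze” and “Tune/fix RTL, constraints”; a “New!” marker.]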


Scenarios: Incremental Updates/Team Design

Incremental P&R
P&R is incremental in the 2nd and subsequent iterations of your design flow, and endeavors to re-place only those portions of the design where the netlist changed, so long as timing can still be met.

Useful when: Design changes are small and do not reside on a timing critical path. In other words, it must still be possible to re-place and re-route the changed part of the design without ripping up other unchanged parts of the design.

Advantage:
- Can save up to 50% in P&R runtime.
- Easy to use since it requires no change to your frontend (synthesis) methodology, nor to your backend (P&R) methodology other than to designate that you want to run the guided flow in the backend.
- Up to 10 iterations can typically be done before there’s a need to re-run the entire P&R from scratch.

Disadvantage: Use is limited. If your RTL/constraints design change is on the critical path, chances are you won’t save much P&R runtime because P&R will have to rip up large portions of the design. If your chip is very full (“highly utilized”), it will be hard to integrate the change without ripping up other portions of the design.

Like Synplify Premier Synthesis, Xilinx and Altera P&R tools have multiprocessing capabilities to reduce runtime at the cost of some QoR (see Table 6).

Scenarios: Quick Prototype; Incremental Updates/Team Design

Multiprocessing during P&R
The P&R tool runs the design on 2 or more processors in parallel.

Useful when: you are tuning your design or have non-aggressive timing/area goals.

Advantage: Reduces P&R and timing analysis runtime by 10% or more.

Disadvantage: May reduce QoR.

Table 6: Multiprocessing during P&R helps reduce runtime, but at the cost of QoR

Faster Synthesis

We’ve discussed fast place and route (netlist to bitfile). Now let’s look at ways to speed up design synthesis (RTL to netlist). A faster synthesis iteration that incorporates and gives you feedback on an RTL or constraint change in 1 hour instead of 3 hours is very valuable. Synthesis time can indeed be cut using Synplify Premier’s new FAST synthesis mode (see Figure 3), which improves runtimes by 2x to 3x for a small reduction in overall Quality of Results (area and fmax).

Table 5: Incremental P&R is useful for minor changes not on the critical path


[Figure 3 graphic, labels only: flow “RTL, constraints” to “Synthesize” in Synplify Premier with “Tight timing constraints” to “Netlist”, then “Analyze” and “Tune RTL constraints”; “FPGA P&R” is shown with the note “(P&R not run in this flow)”; banner “FAST mode: fastest synthesis results”, for “FPGA implementers seeking fast initial results”; result “62% typical synthesis runtime savings”, FAST mode ON vs. OFF with the same tight timing constraints, Virtex-5, C2009.03, out-of-box geomean results.]

Figure 3: Fastest synthesis results flow (iterate through synthesis only to tune your RTL/constraints)
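
The runtime figures in this section are quoted in two ways: as a percentage saving (62% in Figure 3) and as a speedup factor (2x to 3x). A minimal sketch of the conversion between the two follows, assuming that "saving" means the fraction of the original runtime that is eliminated. The function names are illustrative only.

```python
def speedup_from_saving(saving):
    """Convert a runtime saving (0.62 means 62% less runtime) to a speedup factor."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be in the range [0, 1)")
    return 1.0 / (1.0 - saving)


def saving_from_speedup(speedup):
    """Convert a speedup factor (2.0 means twice as fast) to a runtime saving."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    print(round(speedup_from_saving(0.62), 2))  # 2.63
    print(round(saving_from_speedup(2.0), 3))   # 0.5
    print(round(saving_from_speedup(3.0), 3))   # 0.667
```

Under this reading, a 62% saving corresponds to roughly a 2.6x speedup, which falls inside the 2x to 3x range; a 2x speedup is a 50% saving and a 3x speedup is about a 67% saving.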

When using fast synthesis mode, consider your intent. If you want to tune your RTL and constraints, use this capability for synthesis-only iterations with your normal tight constraints. If you want a fast iteration from RTL to bitfile, it is recommended that you use loose timing constraints (lower QoR).

If your intent is fast synthesis-only iterations, use normal constraints with synthesis FAST mode (see Table 7).

Scenarios: QoR; Incremental Updates/Team Design

FAST SYNTHESIS for Fastest Synthesis Results and synthesis-only iterations
Run synthesis only (not P&R afterwards) with your normal planned timing constraints. Synthesis minimizes efforts to produce good results in the interest of saving time.

Useful when:
- you are pipe-cleaning your flow or RTL and just intend to run synthesis
- you want to know whether your RTL will synthesize
- you want to know the approximate results you can get out of the box

Advantage: Saves up to 50% synthesis runtime, allowing you to get rapid feedback so you can fix your RTL and constraints.

Disadvantage: Sacrifices QoR. Results are useful for netlist results analysis, not for use in subsequent place and route.

Table 7: Fast Synthesis improves synthesis runtimes by 2x to 3x

Since the Fast Synthesis flow does sacrifice some QoR, it is specifically recommended that you NOT run P&R on the synthesized netlist; that netlist does reflect sub-optimal area and timing results after all. If you ran P&R on the synthesis netlist, the runtime benefit may be lost to an increased P&R runtime because P&R would have to work harder to make up for the QoR lost during synthesis.

If your intent is faster iterations from RTL to bitfile, use loose constraints with synthesis FAST mode (see Figure 4).


The very same Synplify Premier fast synthesis mode can be used with P&R and loose timing constraints for lower performance designs such as FPGA prototypes (see Table 8).

[Figure 4 graphic, labels only: flow “RTL, constraints” to “Synthesize” in Synplify Premier with “Easy to meet timing constraints” to “Netlist” to “FPGA P&R” to “Bitfile”, then “Debug design on the board”; banner “FAST mode: fastest board implementation”, for ASIC prototypes with slack timing constraints; results “44% typical runtime saving (RTL to netlist)” and “24% typical runtime saving (RTL to bitfile)”, FAST mode ON vs. OFF with a 1 MHz global clk timing constraint for synthesis and P&R, Virtex-5, C2009.03, ISE 10.1sp3, out-of-box geomean results.]

Figure 4: Fast Synthesis to produce fastest implementation on the board (iterate through synthesis and P&R) with loose timing constraints

Scenarios: Quick Prototype; Incremental Updates/Team Design

FAST Synthesis for Fastest Implementation on the Board
Run synthesis and then P&R with loose timing constraints. The synthesis tool tries less hard to produce good results in the interest of saving synthesis runtime. P&R is run to generate the bitfile to program the FPGA on the board.

Useful when:
- you want to debug a system on the board and are not going to run at high speed
- you want to implement prototype designs more quickly, ready for debug on the board, and incorporate RTL and constraint design changes

Advantage: Reduces RTL-to-bitfile runtime to about ¾ of what it would have been.

Disadvantage: Useful only if you have low QoR expectations.

Table 8: Fast synthesis reduces runtime but sacrifices QoR

In the Synplify Premier tool, you can use your machine’s multiprocessing capability to synthesize designated design blocks in parallel on separate processors, reducing your runtimes by up to 30% (see Table 9). You can specify the maximum number of processors to be used.


Scenarios: Quick Prototype; Incremental Updates/Team Design

Multiprocessing during Synthesis
The synthesis tool runs the design using 2 or more processors in parallel.

Useful when: you are first tuning your design or have non-aggressive timing/area goals.

Advantage: Reduces synthesis runtime by up to 30%, allowing you to get rapid feedback and tune your RTL and constraints.

Disadvantage: Generally reduces QoR. May frequently be used with block based flows, which can further limit QoR.

Traditionally, if a small number of errors are encountered during synthesis, the synthesis tool will promptly abort the run. This can result in huge design delays if there are a lot of errors because each error will have to be detected and fixed piecemeal. Suppose that your design synthesis run encounters 100 errors, of just 5 different types, and that your flow aborts and errors out whenever a cumulative total of 3 errors have been encountered. You fix the first 3 errors and then restart your synthesis run. Then the next 3 errors surface; they are similar to the previous 3. You fix them and you have to start synthesis again. Wouldn’t it be better to know about all 100 errors after 1 synthesis run rather than having to flush the errors 3 at a time? This is possible with the Synplify Premier synthesis product thanks to a new “continue synthesis on error” feature (see Table 10). When possible, the synthesis tool will black box the erroneous portion of the design and continue to synthesize the remainder of the design. Under the hood, Synplify Premier is automatically partitioning the design for parallel synthesis. Good, error-free partitions complete while those with errors are black boxed.

Scenarios: QoR; Quick Prototype; Incremental Updates/Team Design

Continue Synthesis upon Error
During synthesis, complete what you can in the presence of coding errors, and then fix your project files in aggregate.

Useful when: You are pipe-cleaning your design project files and your files still have hundreds of errors in them. You would rather save time by finding all the errors in one synthesis run and fixing them in aggregate than find and fix each error, one at a time.

Advantage: Saves time. Fix all the errors in one go rather than cycling through: run synthesis, find an error, fix the error, rerun synthesis from the beginning, find the next error, rerun synthesis from the beginning.

Table 10: Synplify Premier Synthesis finds all errors in a design during a single synthesis run
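
The "complete what you can" behavior can be sketched as a loop that black boxes failing modules instead of aborting. This is a conceptual illustration only: `toy_synthesize` is a hypothetical stand-in for a real synthesis step, and nothing here reflects how Synplify Premier partitions a design internally. The last helper reproduces the arithmetic of the 100-error example above.

```python
import math


def synthesize_continue_on_error(modules, synthesize):
    """Synthesize every module; black box the ones that fail and keep going.

    modules:    module name -> source text
    synthesize: callable that returns a netlist or raises ValueError
    Returns (netlists, black_boxes, errors) so all errors are seen in one run.
    """
    netlists, black_boxes, errors = {}, [], []
    for name, source in modules.items():
        try:
            netlists[name] = synthesize(source)
        except ValueError as exc:
            black_boxes.append(name)           # leave a hole, do not abort
            errors.append(f"{name}: {exc}")    # collect for aggregate fixing
    return netlists, black_boxes, errors


def toy_synthesize(source):
    """Hypothetical synthesis step: fails on sources containing 'syntax_error'."""
    if "syntax_error" in source:
        raise ValueError("cannot parse source")
    return f"netlist({len(source)} chars)"


def runs_to_surface_all(total_errors, abort_after):
    """Runs needed just to see every error when each run aborts after N errors."""
    return math.ceil(total_errors / abort_after)


if __name__ == "__main__":
    design = {
        "cpu": "module cpu; endmodule",
        "dma": "module dma; syntax_error",
        "uart": "module uart; endmodule",
    }
    print(synthesize_continue_on_error(design, toy_synthesize))
    print(runs_to_surface_all(100, 3))  # 34
```

In the example from the text, surfacing all 100 errors three at a time takes 34 aborted runs, against a single run when synthesis continues past errors.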

When errors do occur in your project files, figuring out how to fix them can also be time-consuming. Synplify Premier hyperlinks your error/warning report to useful documentation that helps you to identify a fix. You can filter these errors/warnings by type so that you can work only on those errors or warnings that are of interest (see Table 11).

Scenarios: QoR; Quick Prototype; Incremental Updates/Team Design

Locate and fix errors quickly during synthesis
During synthesis, errors and warnings are placed in a report that is automatically hyperlinked to documentation to help you understand what to do to fix the problem. You may also use the FIND or FIND-IN-FILE features to locate and fix the problems in your source code.

Useful when: You are initially running a design or making changes to the design and incur design errors. FIND is useful when you have a very large design and need to quickly locate those parts that you wish to improve or change.

Advantage: Easier and faster to identify and fix issues within the design. Quick references from error/warning to the documentation help you determine the cause of the problem.

Table 11: An error and warning report is generated allowing you to quickly identify and fix errors in the design in aggregate
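
Filtering a report by message type, as described above, amounts to grouping messages on a type key. The sketch below assumes a made-up, pipe-separated report format (`SEVERITY|CODE|location|message`) chosen only for illustration; it is not the Synplify report format, and the message codes are invented.

```python
from collections import defaultdict


def group_by_type(report_lines):
    """Group report lines by (severity, code).

    Assumes the hypothetical line format 'SEVERITY|CODE|location|message'.
    Returns a dict: (severity, code) -> list of (location, message).
    """
    groups = defaultdict(list)
    for line in report_lines:
        severity, code, location, message = line.split("|", 3)
        groups[(severity, code)].append((location, message))
    return dict(groups)


def summarize(groups):
    """Return (severity, code, count) tuples, most frequent type first."""
    rows = [(sev, code, len(items)) for (sev, code), items in groups.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


if __name__ == "__main__":
    report = [
        "ERROR|E100|alu.v:12|undeclared identifier",
        "WARNING|W200|alu.v:30|unused signal",
        "ERROR|E100|fifo.v:7|undeclared identifier",
        "ERROR|E101|top.v:3|port width mismatch",
    ]
    groups = group_by_type(report)
    print(summarize(groups))
    print(groups[("ERROR", "E100")])
```

Working through one message type at a time fits the earlier example of 100 errors that fall into just 5 types: fixing a type once usually clears many individual messages.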

Table 9: Multiprocessing during synthesis allows rapid feedback


Synplify Premier also includes TCL/FIND features that allow you to locate instances in the code, for example those with negative slack, and improve and debug that part of the design. There is also a “find in file” feature that allows you to search for strings quickly across specified projects or file types, allowing you to locate the RTL source that you need to fix.
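
A generic "find in file" search can be sketched with the Python standard library. This is not the Synplify feature itself, just an illustration of searching a project tree for a string while restricting the search to chosen file types; the function name and the default suffix list are assumptions.

```python
import tempfile
from pathlib import Path


def find_in_files(root, needle, suffixes=(".v", ".vhd", ".sdc")):
    """Search files under root for needle, limited to the given file suffixes.

    Returns a list of (relative_path, line_number, stripped_line) tuples.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


def demo():
    """Build a tiny throwaway project and search it for the string 'fifo'."""
    with tempfile.TemporaryDirectory() as root:
        rtl_dir = Path(root) / "rtl"
        rtl_dir.mkdir()
        (rtl_dir / "top.v").write_text(
            "module top;\n  fifo u_fifo();\nendmodule\n", encoding="utf-8"
        )
        # This file mentions the string too but is skipped by the suffix filter.
        (Path(root) / "notes.txt").write_text("fifo notes\n", encoding="utf-8")
        return find_in_files(root, "fifo")


if __name__ == "__main__":
    print(demo())
```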

Scenarios: QoR; Quick Prototype; Incremental Updates/Team Design

Incremental Static Timing Analysis during synthesis
Update exception constraints such as multi-cycle paths or false paths and see results reflected in revised timing reports without re-running synthesis. You can also generate a new incremental netlist/constraints file to forward annotate to P&R without re-running synthesis.

Useful when: it becomes apparent after synthesis that you did not completely specify multi-cycle and false paths.

Advantage: Saves a synthesis iteration. You can continue to run P&R using the updated constraints.

Disadvantage: You may get better quality of results (better area and possibly timing) by rerunning synthesis if the exception occurred along what synthesis believed to be a critical path. For example, synthesis may have compromised area by previously optimizing a path that it thought was critical and that was in fact a false or multi-cycle path.
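
The effect of updating exception constraints on a timing report can be shown with a toy slack calculation. The sketch assumes a deliberately simplified model in which slack is the number of allowed clock cycles times the clock period, minus the path delay, with setup time and clock skew ignored; path names and delays are invented for illustration.

```python
def worst_slack(path_delays_ns, clock_period_ns, false_paths=(), multicycle=None):
    """Recompute the worst slack from stored path delays and exception constraints.

    path_delays_ns: path name -> delay in ns (from an earlier synthesis run)
    false_paths:    path names excluded from timing analysis
    multicycle:     path name -> number of clock cycles allowed (default 1)
    Returns (path_name, slack_ns) for the worst path, or None if none remain.
    """
    multicycle = multicycle or {}
    slacks = {}
    for name, delay in path_delays_ns.items():
        if name in false_paths:
            continue  # false paths are not timed at all
        cycles = multicycle.get(name, 1)
        slacks[name] = cycles * clock_period_ns - delay
    if not slacks:
        return None
    worst = min(slacks, key=slacks.get)
    return worst, slacks[worst]


if __name__ == "__main__":
    delays = {"a_to_b": 3.5, "cfg_to_core": 6.0, "async_flag": 9.0}
    # Before the exceptions are specified, every path is timed in one cycle.
    print(worst_slack(delays, 4.0))
    # After: one false path and one 2-cycle path, with no re-synthesis needed.
    print(worst_slack(delays, 4.0, false_paths={"async_flag"},
                      multicycle={"cfg_to_core": 2}))
```

Only the constraints change between the two calls; the stored path delays are reused, which is the point of the incremental analysis. As the disadvantage row notes, the netlist itself was still optimized under the old assumptions.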

Dealing with the “Moving Target” or “Pieces Missing” Design

“My design source files are a moving target … I need fast respins!!”

“Part of the source code is not available!!”

Getting the design through your FPGA flow can be a challenge, especially when various pieces of the design are still evolving. If you are creating ASIC prototypes, the team generating the ASIC source RTL could well be changing that source underneath you every week. Your challenge then would be to respin the ASIC prototype and provide feedback to the ASIC team on the source files faster than they are changing the source!! And, pieces of the design may be unavailable or incomplete. Chances are that you will need:

- Fast turnaround time using some of the flows previously described in this paper
- Reproducible and stable results from one run to the next, e.g. if the ASIC team makes a small RTL change to a file, it will only trigger a small change in the resulting FPGA netlist. As previously described, Synplify Premier applies “path group” technology to localize small changes in the RTL to small changes in the resulting netlist
- The ability to synthesize in the absence of some source files. Modules of your design may be incomplete or unavailable since the source files are still being worked on, but you may want to get a head start and synthesize the modules that are already complete. You can do this using a compile-point block based flow by designating the part of the design that is incomplete as a black box
- The flexibility to swap out changed files and manage hundreds of design files. Some ways to do this are described below
- If prototyping, fast ASIC design import: an FPGA flow that accepts your (ASIC) files easily without manual modification. Considerations and solutions are outlined in the section on faster ASIC design import below

Table 12: Incremental Static Timing Analysis allows you to change exception constraints and see the results reflected immediately in the timing report, without the need to run synthesis


Managing, visualizing and swapping in and out hundreds of design files

When there are hundreds of files to manage, Synplify Premier allows you to perform hierarchical design project management. You can organize design files into subdirectories. This makes it easier to swap out changed files and manage hundreds of design files (see Figure 5).

Figure 5: The ability to organize design source files hierarchically is important

Updating select portions of the design (e.g. internal IP or DesignWare IP)

Synplify Premier also allows you to specify, package and integrate IP in the industry standard IP-XACT format. You can assemble/connect the IP at the system level using its SystemDesigner system-level assembly feature. DesignWare Cores can be directly imported. Simply configure your core in the DesignWare coreTools and generate project source files ready for import into Synplify Premier and the VCS simulator (see Figure 6).

Figure 6: Synopsys coreConsultant configures DesignWare cores and creates a Synplify Premier or Synplify Pro-ready project file (scripts and source file)


Faster ASIC Design Import: An FPGA Flow that Accepts Your ASIC Design Files

A key question to ask yourself when using an FPGA-based prototype is: will your ASIC source files work in an FPGA flow? RTL architected and tuned for your ASIC may not be automatically accepted or understood by an FPGA synthesis tool. The table below lists the most frequent issues encountered and how an FPGA synthesis tool that reads these ASIC designs must address each one.

In addition, your FPGA synthesis tool will generate a design using FPGA building-block primitives such as LUTs, registers, DSP elements and dedicated memories. These building blocks, as well as the FPGA’s clocking schemes and resources, are fundamentally different from what you see in an ASIC fabric. You therefore need an FPGA synthesis tool that understands how to use the FPGA resources, respects their resource restrictions and design rules, and applies them to serve the same function as in the ASIC.

| ASIC design contains | The FPGA compatibility issue is | The FPGA synthesis tool must … |
| --- | --- | --- |
| Gated clocks (used in ASICs to reduce power consumption) | FPGAs have no true equivalent of a gated clock. Also, when gated clocks are used in a partitioned block-based flow, clock management, allocation and correct implementation across block boundaries are a challenge, so many FPGA tools don’t support gated clocks in block-based flows. | Convert gated clocks to the logical equivalent without changing the intended functionality. Automatically convert each gated clock to the FPGA equivalent (a register with a clock enable). Support gated clocks even when a clock exists in multiple blocks of a partitioned chip. |
| DesignWare IP | The tool must understand the meaning of, and implement, any DesignWare Building Block instantiation or function it encounters in the RTL. It must read any configuration of a digital IP core, even if it is encrypted. | Accept RTL that includes instantiations of DesignWare IP building blocks (and synthesize them with reasonable performance results). Accept designs generated and configured by Synopsys coreTools. |
| Your own or 3rd party IP | May be encrypted in a way that the synthesis tool cannot read. May have been highly optimized for ASIC and thus lack performance in an FPGA. | Preserve boundaries for the IP if requested. Time through the IP. In some cases, internally decrypt but protect the IP. |
| Embedded memory functions | The FPGA tool may not recognize something in your RTL as a memory, so the user has to write specific memory models that work for FPGAs. Some FPGA-vendor-specific memory cores are encrypted, making it impossible to simulate the synthesized design (because the memories are black-boxed). | Implement behavioral synthesis capabilities (e.g. Synplify Premier’s SynCore) that generate RTL for memory in a way that the FPGA tool recognizes and can implement optimally. |
| Extensive language support (VHDL, Verilog, SystemVerilog, VHDL 2008) | Language support may lag the ASIC tool’s support, in particular for SystemVerilog. | Ensure compatibility with the most commonly used synthesizable ASIC RTL. |
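The gated-clock row of the table can be illustrated with a small behavioral model. The Python sketch below is not how any synthesis tool performs the conversion. It only checks, under simplifying assumptions (one register, enable and data held stable for the whole clock cycle, no glitches), that a register clocked by `clk AND enable` behaves the same as a register on a free-running clock with a clock enable. All function names are hypothetical and chosen for illustration.

```python
from itertools import product

def simulate_gated_clock(d, en, q0=0):
    """ASIC style: the register is clocked by (clk AND enable)."""
    q, prev_gclk, trace = q0, 0, []
    for d_i, en_i in zip(d, en):
        for clk in (0, 1):               # low phase, then high phase of one cycle
            gclk = clk & en_i            # the gated clock seen by the register
            if prev_gclk == 0 and gclk == 1:
                q = d_i                  # capture on the rising edge of the gated clock
            prev_gclk = gclk
        trace.append(q)                  # sample q at the end of each cycle
    return trace

def simulate_clock_enable(d, en, q0=0):
    """FPGA style: free-running clock, the enable decides capture or hold."""
    q, prev_clk, trace = q0, 0, []
    for d_i, en_i in zip(d, en):
        for clk in (0, 1):
            if prev_clk == 0 and clk == 1 and en_i:
                q = d_i                  # capture only when the clock enable is high
            prev_clk = clk
        trace.append(q)
    return trace

def equivalent_for_all_inputs(n_cycles=4):
    """Exhaustively compare both models over every data/enable pattern."""
    for bits in product((0, 1), repeat=2 * n_cycles):
        d, en = bits[:n_cycles], bits[n_cycles:]
        if simulate_gated_clock(d, en) != simulate_clock_enable(d, en):
            return False
    return True
```

Calling `equivalent_for_all_inputs()` compares the two models over every 4-cycle stimulus. A real conversion must also handle cases this sketch ignores, such as enables that change mid-cycle and clocks shared across partitioned blocks.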


Putting it all together

When quality of results (QoR) is your priority, use “fast synthesis” and “continue synthesis upon error” only during initial design tuning, to reduce synthesis iteration times. High-horsepower techniques such as multiprocessing can also be applied during this phase. Your final run will use the slower normal synthesis to achieve the best QoR (see Figure 7).

[Figure 7 flowchart. Box labels: Import (read design); Configure (server farm, multiprocessing); Run synthesis (constraints check, fast synthesis with normal constraints, continue synthesis on error); Error report (links to docs; fix black-boxed erroneous modules; fix RTL/constraints with TCL/find etc.; debug RTL/netlist/constraints); Re-run synthesis (fast synthesis with normal constraints); Re-run synthesis (normal synthesis); Run P&R (normal P&R); Re-run P&R (normal P&R); Debug (debug RTL/constraints on the board).]

Figure 7: Fast QoR flow

When turnaround time from RTL to a bitfile running on the board is the priority, you can continue to use fast synthesis for subsequent iterations, combined with fast, incremental or normal place and route modes. Block-based flows and multiprocessing are also useful tools in your tool chest. See Figure 8.


Synopsys, Inc. 700 East Middlefield Road Mountain View, CA 94043 www.synopsys.com

©2010 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks is available at http://www.synopsys.com/copyright.html. All other names mentioned herein are trademarks or registered trademarks of their respective owners.

03/10.MH.10-18253.

[Figure 8 flowchart. Box labels: Import (ASIC design project import; DesignWare core configurations and DesignWare building blocks); Configure (parallel development/partition design; floorplan; server farm; multiprocessing; instrument RTL); Run synthesis (constraints check; fast synthesis with loose constraints; continue synthesis on error); Error report (links to docs; fix black-boxed erroneous modules; fix RTL/constraints with TCL/find etc.; debug RTL/netlist/constraints); Re-run synthesis (fast synthesis or incremental static timing analysis); Re-run synthesis (fast synthesis); Run P&R (fast P&R or normal P&R); Re-run P&R (incremental or normal P&R); Debug (debug RTL/constraints on the board).]

Figure 8: Example flow for a quick ASIC prototype, where the priority is fast board implementation and fast respins

These techniques for QoR and Quick Prototype design were summarized in Table 1.
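The choice of modes in the two flows can be restated as a small lookup. The Python sketch below is only a summary of Figures 7 and 8 and the surrounding text. It is not a Synplify Premier interface, and the names `RunPlan` and `plan_run`, the stage names and the mode strings are all hypothetical. Two details are assumptions rather than statements from the figures: that respins in the quick-prototype flow keep the loose constraints of the first run, and that “continue synthesis on error” stays on for every non-final run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunPlan:
    synthesis: str           # "fast" or "normal"
    constraints: str         # "normal" or "loose"
    continue_on_error: bool  # keep synthesizing past erroneous modules
    pnr_modes: tuple         # acceptable place-and-route modes

def plan_run(priority, stage):
    """Return the modes for one run.

    priority: "qor" (Figure 7) or "turnaround" (Figure 8)
    stage:    "first", "respin" or, for the QoR flow only, "final"
    """
    if priority == "qor":
        if stage in ("first", "respin"):
            # Initial tuning: fast synthesis with normal constraints.
            return RunPlan("fast", "normal", True, ("normal",))
        if stage == "final":
            # Final run: slower normal synthesis for best QoR.
            return RunPlan("normal", "normal", False, ("normal",))
    elif priority == "turnaround":
        if stage == "first":
            return RunPlan("fast", "loose", True, ("fast", "normal"))
        if stage == "respin":
            # Assumption: respins keep loose constraints and continue-on-error.
            return RunPlan("fast", "loose", True, ("incremental", "normal"))
    raise ValueError(f"unsupported combination: {priority!r}, {stage!r}")
```

For example, `plan_run("qor", "final")` selects normal synthesis, while `plan_run("turnaround", "respin")` stays on fast synthesis and allows incremental place and route.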

Conclusion

Users of large FPGAs can get their products out the door much faster when design turnaround time is reduced by using some or all of the methods described in this paper. It is also very valuable to have stable results from one design run to the next when incorporating changes, and to be able to integrate these changes quickly and see the results. As FPGAs get larger, the engineering teams developing them are also growing, which requires the adoption of new parallel design methodologies.

At the same time, users generally don’t want disruptive changes to the design methodology and, when prototyping, hope that the ASIC project files will be accepted by the FPGA flow without significant changes. Synplify Premier delivers a menu of technologies including “fast synthesis” and “continue upon synthesis error” technology, block-based flows, incremental flows, and ASIC compatibility for prototypers. These capabilities help large designs to be delivered on schedule.