[lectura fpga architecture] architecture of field-programmable gate arrays - rose

7/28/2019 [Lectura FPGA Architecture] Architecture of Field-programmable Gate Arrays - Rose

1/17

Architecture of Field-ProgrammableGate ArraysJONATHAN ROSE, MEMBER, IEEE, ABBAS EL GAMAL, SENIOR MEMBER, IEEE, ANDALBERT0 SANGIOVANNI-VINCENTELLI,FELLOW, IEEEInvited P aper

A survey of F ield-Programmable Gate Arr ay (F PGA) architec-tures and theprogramming technologies used to customize them ispresented. P rogramming technologies are compared on the basis oftheir vola i li ty, size, parasitic capacitance, resistance, and processtechnology complexity. FPGA architectures are divided into twoconstituents: logic block architectures and routing architectures.A classijcation of logic blocks based on their granularity isproposed and several logic blocks used i n commercially availableFPGA s are described. A brief review of recent results on the effecto logic block granularity on logic density and pe$ormance o anFPGA is then presented. Several commercial routing architecturesare described in the contest of a general routing architecturemodel. Final ly, recent results on the tradeoff between the fl eibi li tyof an FP GA routing architecture its routabil ity and density arereviewed.I. INTRODUCTION

The architecture of a field-programmable gate array(FPGA ), as il lustrated in Fig. 1, is similar to that ofa mask-programmable gate array (MPGA), consisting ofan array of logic blocks that can be programmablyinterconnected to realize different designs. The majordifference between FPGA s and MPGA s is that an MPGAis programmed using integrated circuit fabrication to formmetal interconnections, while an FPGA is programmedvia electrically programmable switches much the same astraditional programmable logic devices (PL Ds). FPGA scan achieve much higher levels of integration than PL Ds,however, due to their more complex routing architecturesand logic implementations. PLD routing architectures arevery simple but highly ineffiecient crossbar-like structuresin which every output is directly connectable to every

Manuscript received October 1, 1992. The work by The second authorwas partially supported under contract J-FB I-89-101.J. Rose i s with the Department of Electrical Engineering, University ofToronto, 10 K ings CollegeRoad, Toronto, Ontario M 5S IA4, Canada.A . El Gama1 is with the Depratment of Electrical Engineering, StanfordUniversity, Stanford, CA 94305.A . Sangiovanni-V incentelli i s with the Department of Electrical Engi-neering and Computer Science, University of California, Berkeley, CA94720.IEEE Log Number 9210745.

Fig. 1. FPGA architecture

input through one switch. FPGA routing architecturesprovide a more efficient MPGA-like routing where eachconnection typically passes through several switches. In aPDL, logic is implemented using predominantly two-levelAND-OR logic with wide input AND gates. In an FPGAlogic is implemented using multiple levels of lower faningates, which is often much more compact than two-levelimplementations.An FPGA logic block can be as simple as a transistor oras complex as a microprocessor. It is typically capapble ofimplementing many different combinational and sequentiallogic functions. Current commercial FPGA s employ logicblocks that are based on one or more of the following:Transistor pairs.Basic small gates such as two-input N ANDs orexclusive-OR s.Multiplexers.

0018-9219/93$03.00 0 1993 IEEEPROCEEDINGS OF THE IEEE. VOL 81. NO 7. UL Y 1993

I 1- -1013


2/17

Look-up tables (LUT s).Wide-fanin AND-OR structures.The routing architecture of an FPGA could be as simpleas a nearest neighbor mesh [9]or as complex as the perfectshuffle used in multiprocessors [42]. More typically, anFPGA routing architecture incorporates wire segments ofvarying lengths which can be interconnected via electricallyprogrammable swi tches. The choice of the number of wiresegments incorporated affects the density achieved by an

FPGA. If an inadequate number of segments is used, only asmall fraction of the logic blocks can be utilized, resultingin poor FPGA density; conversely the use of an excessnumber of segments that go unused also wastes area.The distribution of the lengths of the wire segmentsalso greatly affects the density and performance achievedby an FPGA. For example, if all segments are chosen tobe long, implementing local interconnections becomes toocostly in area and delay. On the other hand if all segmentsare short, long interconnections are implemented using toomany switches in series, resulting in unacceptably largedelays.Several different programming technologies are used toimplement the programmable switches. There are threetypes of such programmable switch technologies currentlyin use. These are:SRAM , where the switch is a pass transistor controlledby the state of a SRA M bit,Antifuse, whci, when electrically programmed, formsa low resistance path, andEPR OM , where the switch is a floating-gate transistorthat can be turned off by injecting charge onto theirfloating gate.

In all cases, a programmable switch occupies largerarea and exhibits much higher parasitic resistance andcapacitance than a typical contact or via used in thecustomization of an M PGA . Additional area is also requiredfor programming ci rcuitry. A s a result the density andperformance achievable by todays FPGAs are an orderof magnitude lower than that for M PGAs manufactured inthe same technology.The adverse effects of the large size and relatively highparasitics of programmable switches can be reduced bycareful architectural choices. By choosing the appropriategranularity and functionality of the logic block, and bydesigning the routing architecture to achieve a high degreeof routabil ity while minimizing the number of switches,both density and performance can be optimized. The bestarchitectural choices, however, are highly dependent on theprogramming technology used as well as on the type ofdesigns implemented, so that no one architecture is likelyto be best suited for all programming technologies and forall designs.

The complexity of FPGA s has surpassed the point wheremanual design is either desirable or feasible. Consequently,the utility of an FPGA architecture is highly dependenton effective automated logic and layout synthesis tools tosupport it. A complex logic block may be underutilized

without an effective logic synthesis tool, and the overallutilization of an FPGA may be low without an effectiveplacement and routing tool.Commercial P G A s differ in the type of programmingtechnology used, in the architecture of the logic blockand in the structure of their routing architecture. In thispaper we survey the architectures of commercially availableFPGA s and discuss the dependence of FPGA densityand performance on these factors. T he paper is organizedas fol lows: Section I1 describes the most widely usedprogramming technologies. Section 111presents a surveyof commercial FPGA logic block architectures, classif iedby their granularity. This includes a summary of recentresearch results concerning the effect of granularity on over-all FPGA density and performance. Section IV describesseveral commercial routing architectures in the context of ageneral routing architecture model, and summarizes recentresearch results in this area. Section V concludes with adiscussion of potential future architectural directions forFPGAs.11. PROGRAMMINGECHNOLOGIES

An FPGA is programmed using electrically pro-grammable switches. The properties of these programmableswitches, such as size, on-resistance, and capacitance,dictate many of the tradeoffs in FPGA architecture. In thissection we describe the most commonly used programmableswitch technologies and at the end will contrast eachtechnology with respect to volatility, re-programmability,size, series on-resistance, parasitic capacitance, and processtechnology complexity.A. SRAMProgramming Technology

The SRA M programming technology uses Static RAMcells to control pass gates or multiplexers as illustrated inFig. 2. It is used in the devices from Xi linx [23], Plessey[33] Algotronix, [2], Concurrent Logic [13] and ToshibaWl.When a one is stored in the SRA M cell in F ig. 2(a),the pass gate acts as a closed switch, and can be used tomake a connection between two wire segments. When azero is stored, the switch is open and the transistor presentsa high resistance between the two wire segments. For themultiplexer, the state of the SRA M cells connected to theselect lines controls which one of the multiplexer inputs areconnected to the output, as shown in Fig. 2(b).Since SRAM is volatile, the FPGA must be loaded andconfigured at the time of chip power-up. T his requiresexternal permanent memory to provide the programmingbits such as PROM , EPROM , EEPROM or magnetic disk.A major disadvantage of SRAM programming technol-ogy is its large area. It takes at least five transistors toimplement an SRA M cell, plus at least one transistorto serve as a programmable switch. However, SRA Mprogramming technology has two major advantages; fastre-programmabil ity and that it requires only standard inte-grated circuitprocess technology.

1014 PROCEEDINGS OF THE IEEE, V O L . 81, NO. 7, JULY 1993


3/17

I h PassGate

t(b)

Fig. 2. Static RA M programming technology.word

B. Antifuse Programming TechnologyAn antifuse is a two terminal device with an un-programmed state presenting a very high resistance betweenits terminals. When a high voltage (from 11 to 20 volts,depending on the type of antifuse) is applied across itsterminals the antifuse will blow and create a low-resistance link. This l ink is permanent. A ntifuses in usetoday are built either using an Oxygen-Nitrogen-Oxygen(ONO) dielectric between N+ diffusion and poly-silicon

[191, [151, [11 or amorphous sili con between metal layers[6] or between polysil icon and the first layer of metal [31].Programming an antifuse requires extra circuitry to de-liver the high programming voltage and a relatively highcurrent of 5 mA or more. This is done in [15] throughfairly sizable pass transistors to provide addressing to eachantifuse. A n associated paper in this issue discusses theprogramming of antifuse structures in more detail [18].Antifuse technology is used in the FPGA s f rom Actel [151[I], Quicklogic [6], and Crosspoint [31].A major advantage of the antifuse is its small size,little more than the cross-section of two metal wires. Thisadvantage is somewhat reduced by the large size of thenecessary programming transistors, which must be ableto handle large currents, and the inclusion of isolationtransistors that are sometimes needed to protect low voltagetransistors from high programming voltages. A secondmajor advantage of an antifuse is its relatively low seriesresistance. The on-resistance of the ONO antifuse is 300to500ohms [191, while the amorphous silicon antifuse is50to100ohms [6] [311. Additionally, the parasitic capacitanceof an unprogrammed amorphous antifuse is signif icantlylower than for other programming technologies.C. Floating Gate Programming Technology

The floating gate programming technology uses technol-ogy found in ultraviolet erasable EPROM and electricallyerasable EEPROM devices. T he EPROM -based approachis used in devices from Al tera [43] and Plus Logic [34].The programmable switch, i llustrated in Fig. 3, is atransistor that can be permanently disabled. This is ac-complished by injecting a charge on the floating gate (gate2 in the figure) using a high voltage between the controlgate 1 and the drain of the transistor. T his charge increases

t

Fig. 3. Floating gate programming technology

the threshold voltage of the transistor so that it turns off.The charge is removed by exposing the floating gate to UVlight. This lowers the threshold voltage of the transistor andmakes the transistor function normally.Rather than using an EPROM transistor directly as aprogrammable switch, the unprogrammed transistor is usedto pull down a bit line when the word line is set high,as illustrated in Fig. 3. While this approach can be simplyused to make a connection between the word and bit l ines,it can also be used to implement a wired-A ND style oflogic, thereby providing both logic and routing.A s with the SRA M programming technology, amajor advantage of the EPR OM technology is its re-programmabil ity. An advantage over SRAM, though, isthat no external permanent memory is needed to programthe chip on power-up. EPR OM technology, however,requires three additional processing steps over an ordinaryCMOS process. Two other disadvantages are the high ON-resistance of an EPR OM transistor (about twice that of asimilarly sized NMOS pass transistor) and the high staticpower consumption due to the pull -up resistor used (seeFig. 3).The EEPROM -based programming technology is used inthe devices from AM D [3] and Lattice [4]. It is similarto the EPROM approach, except that removal of the gatecharge can be done electrically, in-circuit, without UV light.This gives the added advantage of easy reprogrammabil ity,which can be very helpful in some applications such ashardware updates to equipment in remote locations. A nEE PROM cell , however, is roughly twice the size of anEPROM cell.E. Summary of Programming Technologies

Table 1lists the properties of each programming technol-ogy. A ll data assumes a 1.2pm C M OS process technology.The first column gives the name of the technology. Notethat there is separate information for the two differenttypes of antifuse. The second column indicates if theconfiguration is lost when power is removed from the

ROSE er al.: ARCHITECTURE OF GATE ARRAYS 1015

I I - . -


4/17

~

Table 1 Comparison of Programming Technologies

300- 500

50-100

2 - 4 k

Technolugy Voletl le?an d Process

n u rsnslatoI2 pm CMOS

Anti-fuaeI2pmCMOS5fF 3

1.1-1.3fF 3

10-201F 3

AmorphousAntl-fuseI2pm CMOS

EPROM12pm CMOS

EEPROM12 pm CMOS

leProg7

Yesin

Circuit

No

No

No

No

No

Yesolndarcuk

Yesin

circuit

Area

Large

Fuse small (via)Pmg. Tran. LargeFuse small (via)

Prog.Tran. Large

Smallin array

2x EPROM 10-2OfF 1 >5device. The third column indicates if the technology permitsreprogramming. T he fourth column provides the relativesize of the programmable switch. The fifth column givesthe series resistance of an on switch, and the sixthcolumn gives the parasitic capacitance of an off switch,not including any capacitance due to associated wiringor programming transistors. For reference, the capacitanceof a 10 pm length of minimum-width wire in a 1.2 pmCMOS process is about 0.6fF. The seventh column givesthe number of additional processing steps required beyondstandard CMOS.111. LOGICBLOCKARCHITECTURE

In this section we survey the commercial FPGA logicblock architectures in use today. In the first section wediscuss the combinational logic portion of the logic block.A discussion of the sequential logic portion is deferred toSection 111-D. In Section 111-E, we present several recentresearch results on the effect of the choice of the logicblock on the density and performance of an FPGA.A . Survey of Commercial Logic Block Architectures

FPGA logic blocks dif fer greatly in their size and imple-mentation capabili ty. The two transistor logic block usedin the Crosspoint FPGA can only implement an inverterbut is very small in size, while the look-up table logicblock used in the Xi linx 3000 series FPGA can implementany five-input logic function but is significantly larger. Tocapture these differences we classify logic blocks by theirgranularity. Granularity can be defined in various ways,for example, as the number of boolean, unctions that thelogic block can implement, the number of equivalent two-

(b)Fig. 4.plementation.Example logic function and two-input NAND gate im-

input NAND gates, the total number of transistors, totalnormalized area, or the number of inputs and outputs. Thematter is further confused because in some architectures,such as the Altera FPGA [43] or the AM D FPGA [3], thelogic and routing are tightly intertwined and it is difficultto separate their contributions to the architecture. For thesereasons, we choose to classify the commercial logic blocksinto just two categories: $ne-grain and coarse-grain.For all the logic blocks described below, we show howto implement the logic function f = ab +T , as illustratedin Fig. 4(a). Note that this is equivalent to the two-inputNAND gate implementation given in Fig. 4(b).B. F ine-Grain Logic BlocksFine-grain logic blocks closely resemble MPGA basiccells. The most fine grain logic block would be identical to abasic cell of an MPGA and would consist of few transistorsthat can be programmably interconnected.

1016 PROCEEDINGS OF TH E IEEE, VOL. 81, NO . 7, JULY 1993


5/17

Transistor PairFig. 5. Transistor pair tiles in cross-point FPGA.

IIIIIII

a CTr ans i s t or s T ~ ~ - I ~ ~ ~ ~Turned Of ff or I s ol at i o n NAND

TWO-I nputNANDFig. 6. Programmed cross-point FPGA for logic functionf = ab+ ?.

1)The Crosspoint FPGA: The FPGA from CrosspointSolutions [311 uses a single transistor pair in the logic block,as illustrated in Fig. 5.Figure 6 illustrates how the function of Fig. 4(b) isimplemented with the transistor pair tiles of the cross-pointFPGA. Since the transistors are connected together in rows,the two two-input NA ND gates are isolated by turning offthe pair of transistors between the gates.In addition to the transistor pair tiles, the cross-pointFPGA has a second type of logic block, called a RAMlogic tile, that is tuned for the implementation of randomaccess memory, but can also be used to build randomlogic functions in a manner similar to the Actel and TheQuicklogic logic blocks described below.2) The Plessey FPGA: A second example of a f ine-grainFPGA architecture is the FPGA from Plessey [33]. Herethe basic block i s a two-input NAND gate as illustrated inFig. 7. Logic is formed in the usual way by connecting theNAND gates to implement the desired function. The logicfunction f =ab+C illustrated in Fig. 4(a) is implementedexactly as shown in Fig. 4(b). If the latch in Fig. 7 is notneeded, then the configuration store is set to make the latchpermanently transparent.Several other commercial FPGA 's employ fine grainblocks. A lgotronix [2] uses a two-input function blockwhich can perform any function of two inputs. This isimplemented using a configurable set of multiplexers. Thelogic block of Concurrent Logic's FPGA [13] contains atwo-input A ND gate and a two-input EXCL USIVE-ORgate. The FPGA recently discussed by Toshiba in [32] alsouses a two-input NA ND gate.The main advantage of using fine grain logic blocks isthat the useable blocks' are ful ly utilized. This is because' n all FPGA's, as well as in all MPGA's, only a fraction of the logicblocks available can be utilized in any design.

Fig. 7. The Plessey logic block.f it is easier to use small logic gates efficiently and the logicsynthesis techniques for such blocks are very similar tothose for conventional mask-programmed gate arrays andstandard cells.The main disadvantage of fine-grain blocks is that theyrequire a relatively large number of wire segments andprogrammable switches. Such routing resources are costlyin delay and area. As a result, FPGA's employing fine-grainblocks are in general slower and achieve lower densitiesthan those employing coarse grain blocks. See Section 111-A

for results supporting this claim.C. Coarse-Grain Logic Blocks

1) The Actel Logic Block: The Actel logic block [151, [11is based on the ability of a multiplexer to implementdifferent logic functions by connecting each of its inputsto a constant or to a signal [46]. For example, considera two-to-one multiplexer with selector input s, inputs aand b and output f = sa+sb. By setting signal b tologic 0, the multiplexer can implement the AND functionf =sa. Setting signal a to logic 1 provides theOR functionf =s+b. By connecting together a number of multiplexersand basic logic gates, a logic block can be constructedwhich can implement a large number of functions in thismanner.The Actel Act-1 logic block [I51 is illustrated in Fig.8(a). It consists of three multiplexers and one logic gate,has a total of 8 inputs and one output, and implements thefunctionf = (s3+s4) (STW + S I X ) + (s3+ sq)(S;jy + s2z).

By setting each variable to an input signal, or to aconstant, 702 logic functions can be realized. For example,the logic function f = ab + E is realized by settingthe variables as shown in Figure 8b: w =1, x =1, SI =0, y =0. z =a. s2 =b, s3 =c, and sq =0.The Act-2 logic block [ I ] is similar to Act-1, except thatthe separate multiplexers on the first row are joined andconnected to a two-input AND gate, as shown in Fig. 9.The Act-2 combinational logic module can implement 766functions.2) Quicklogic Logic B lock: The logic block in the FPGAfrom QuickLogic [6] is similar to the Actel logic blocks inthat it employs a four to one multiplexer. Each input of the

ROSE ef d ' RCHITECTURE OF GATE ARRAYS IO17


6/17

I 03 0402(a)

b(b)

The Actel Act-1 logic block.ig. 8.

X

81P2

Y2

- f

a3 a4Fig. 9. The Actel Act-2 logic block.

multiplexer (not just the select inputs) is fed by an ANDgate, as illustrated in Fig. 10.Note that alternating inputsto the AND gates are inverted. This allows input signals tobe passed in true or complement form, thus eliminating theneed to use extra logic blocks to perform simple inversions.Multiplexer-based logic blocks have the advantage ofproviding a large degree of functionality for a relativelysmall number of transistors. Thi s is, however, achievedat the expense of a large number of inputs (eight in thecase of A ctel and 14 in the case of QuickL ogic), whichwhen utilized place high demands on the routing resources.Such blocks are, therefore, more suited to FPGAs that use

m%CalEzF14 -1 I

Fig. 10. The Quicklogic logic block.

0 1 1

(a)

Fig. 11. Lookup table-based logic.

programmable switches of small size such as antifuses.3)TheXilinr LogicBlock: The basis for the Xilinx logicblock is an SRA M functioning as a look-up table (LU T).The truth table for a K -input logic function is stored in a

2K x 1SRAM . The address lines of the SRA M functionas inputs and the output of the SRAM provides the valueof the logic function. For example, consider the truth tableof the logic function f = ab + given in Fig. ll(a). Ifthis logic function is implemented using a three-input LUT,then the SRAM would have a 1 stored at address 000, a0at001 and so on, as specified by the truth table.The advantage of look-up tables is that they exhibithigh functionality-a K -input L UT can implement anyfunction of K inputs and there are 22K such functions. Thedisadvantage is that they are unacceptably large for morethan about five inputs, since the number of memory cellsneeded for aK -input lookup table is 2. While the numberof functions that can be implemented increases very fast,these additional functions arenot commonly used in logicdesigns and are also difficult to exploit for a logic synthesistool. Hence it is often the case that a large L UT will belargely underutilized.The Xilinx 3000 series logic block [21] [2a] contains afive-input one-output L UT, as illustrated in Fig. 12. Thisblock can be reconfigured into two four-input LUTs, with

1018 PROCEEDINGS OF THE IEEE, VOL. 81. NO. 7, UL Y 1993


7/17

Dap, In

XA

Fig. 12. The Xilinx 3000 logic blockC cz w UI l l 1

wG302G1

a2

GF4F3 aiFl

uocll

Fig. 13 The Xilinx 4000 logic block.

the constraint that together they use a total of no morethan five distinct inputs. This reconfigurability providesflexibility that translates into better logic block utilizationbecause many common logic functions do not require asmany as five inputs. The block also contains sequentiallogic and several multiplexers that connect the combina-tional inputs and outputs to the fl ip-flops or outputs. Thesemultiplexers are controlled by the SRAM cells that areloaded at programming time.The Xilinx 4000 series logic block [23 ] contains twofour-input LUTs feeding into a three-input LUT as il-lustrated in Fig. 13. In this block, all of the inputs aredistinct and available external to the logic block. This blockintroduces two significant architectural changes from the3000 series block. F irst, two differently sized LUTs areused: a four input LUT and a three input LU T, giving thecomplete block a heterogenous flavor. In general, hetero-geneity allows for a better tradeoff between performanceand logic density.The second architectural change in the Xil inx 4000 logicblock is the use of two nonprogrammable connections fromthe two four-input L UTs to the three-input LUT. Theseconnections are significantly faster than any programmableinterconnection since no programmable switches are usedin series, and little is present in parallel. If proper use canbe made of these fast connections FPGA performance canbe greatly improved. T here is a penalty for this type ofconnection, however; since the connection is permanent,the inflexibility means that the three-input LUT may oftengo unused, reducing the overall logic density.

I I +bc(b)

Fig. 14. The Altera 5000 Series logic block.

The Xilinx 4000 block incorporates several additionalfeatures. Each LUT can be used directly as an SRA M block.This allows small amounts of memory to be more eff icientlyimplemented. A nother feature is the inclusion of circuitrythat can be used to implement fast carry addition circuits.4)The Altera L ogic Block: The architecture of the Al teraFPGA [43] has evolved from the PL A-based architectureof traditional PLDs [28] with its logic block consisting ofwide fanin (20to over 100 inputs) A ND gates feeding intoan OR gate with three to eight inputs. Figure 14a il lustratesthe A ltera MAX 5000series logic block. Using the floatinggate transistor-based programmable switch presented inSection 11-C, any vertical wire passing by an AND gatecan be connected as an input to the gate. The three productterms are then ORs together and can be programmablyinverted by an exclusive OR gate, which can also be usedto implement other arithmetic functions. Notice that eachinput signal is provided in both true and complement form,with two separate wires. This programmable inversionsignif icantly increases the functional capabil ity of the block.Figure 14(b) il lustrates the implementation of the logicfunction f = ab + C. The xs in the figure indicate thewired-AN D connections described in Section II -C .The advantage of this type of block is that the wideAND gate can be used to form logic functions with fewlevels of logic blocks, reducing the need for programmableinterconnect. It is difficult, however, to make efficient useof all of the inputs to all of the gates, resulting in loss ofdensity. This loss is not as severe as it first appears becauseof the high packing density of the wired-AND gates, as wellas the fact that logic connections also serve as the routingfunction. In other architectures where logic and routing are

separate such unused inputs would incur a high penalty.A disadvantage of the wired-A ND configuration is theuse of pull-up devices that consume static power. An arrayfull of these pull-ups will consume significant amount ofpower. To mitigate this, each gate in the MAX 7000 seriesROSE et al.: ARCHITECTURE OF GATE ARRAYS 1019


8/17

block can be programmed to consume about 60% lesspower but at the expense of about 40% increase in delay[44]. This feature can be used in noncritical paths to reducepower consumption.In addition to the wide AND-OR logic block, the MAX5000employs one other type of logic block, called a logicexpander. This is a wide-input NAND gate whose outputcan be connected to the AND-OR logic block. While alogic expander incurs the same delay as the logic block, ittakes up less area and can be used to increase its effectivenumber of product terms.The A ltera M A X 7000 logic block 1441 is similar tothe M AX 5000 except that it provides two more productterms and has more flexibility because neighboring blockscan borrow product terms from each other. This isaccomplished using a small routing structure between theAND and OR gates called the product term select matrix.Several other FPGA s use the wide AND-OR style oflogic block, including those produced by Plus Logic 1341,AMD [3], and Lattice [4]. The device in 1341 employs otherlogic types in combination with the wide AND-OR gate.D. Sequential Logic

Most of the logic blocks described above include someform of sequential logic. The Xil inx devices 122, 231 havetwo D fl ip-flops that can be programmably connected tothe outputs of the two lookup tables. The A ltera device1431 has one flip-flop per logic block. In the Act-1 devicefrom Actel [151, the sequential logic is not explicitly presentand so must be formed using programmable routing andthe purely combinational logic blocks. In the Act-2 device[I ], there are two alternating types of logic block: the C-module which is the purely combinational block describedin Section 111-Cl), and the S-module which has similarcombinational functionality to the C-module but includesa D flip-flop.The Plessey logic block 1331 also incorporates one Dlatch. It thus requires two blocks to make a master-slaveflip-flop. The Algotronix logic block [2] forms sequentiallogic using feedback around the basic combinational logicmodule.E. EfSect of Logic Block Granularity on FPGA Densityand Performance

In recent years research efforts have been directed atdetermining choices for FPGA logic blocks that optimizedensity and performance 1351, 1361, [37l, [39l, 1241, [401,1201, 1261, [41]. I n this section we briefly survey thisresearch. For a more complete survey see [8]. The sectionis divided into two parts: the first deals with the effect oflogic block granularity on FPGA density, while the secondcovers the effect of granularity on performance.I ) Effect o Granularify onDensity As the granularity ofa logic block increases, the number of blocks needed toimplement a design should decrease. On the other handa more functional (larger granularity) logic block requiresmore circuitry to implement it, and therefore occupies more

-41

Y (a)unurdi+b)! a( C)Three implementationsof f = abd +bcd +ab?.ig. 15.

area. This tradeoff suggests the existence of an optimallogic block granularity for which the FPGA area devotedto logic implementation is minimized. While this argumentfor logic area is straightforward, the effect of granularity onrouting area is not as simple, and has a significant impacton the overall FPGA routing area, as discussed below.As an example of the effect of block functionali ty onlogic area, consider the implementation of the logic functionf = abd +bcd+abcusing logic blocks of three differentgranularities as illustrated in Fig. 15.The three logic blocksare a 2-input lookup table (denoted 2-LUT), a 3-L UT, anda 4-LUT. As shown, the 2-L UT implementation requiresseven logic blocks, the 3-L UT needs three blocks, and the4-L UT only one. As an area measure, consider the numberof memory bits required to implement this logic functionusing a K -LUT . Since each K -LUT requires 2 bits, the2-L UT implementation requires a total of 28 bit, the 3-L UTneeds 24 bits and the 4-L UT needs just 16.Using this areameasure the 4-LUT achieves the minimum logic area.To date, all research aimed at determining the logic blockgranularity that minimizes the overall area of an FPGAhas been experimental. A variety of benchmark designs aremapped into FPGAs with different logic block granularitiesand the total logic block area as well as routing area aredetermined for each mapping. Results are then averagedand compared.Figure 16gives an example of such experimental resultsfrom 1361. It plots, for one benchmark design, both the

1020 PROCEEDINGS OF TH E IEEE, VOL. 81. NO. 7, JULY 1993


9/17

2.5 1

Numberlocksof

Number ofq30 Blockm?Area500 10

2 3 4 5 6 7Number of Inputs,K

Blocks 6oo x10 3

-Blocks - 30 0A Route AreaE3lwYRoute Area

200 PerBlock::[ A ' A ' - x m"20-3Fig. 16. Number of blocks and block area, for one circuit.

500i L looI I2 3 4 5 6 7

Number of Inputs,KFig. 17. Number of blocks and routing areahlock, for one design.

number of K -L UT blocks needed to implement that designas well as the size of one K -L UT, versus K. These resultsassume the use of an SRA M programming technology anda 1.2 pm CM OS process. The number of logic blocksdecreases rapidly as K increases, while the block sizeincreases exponentially in K . The total logicblock area (theproduct of the two curves) achieves a minimum at K=4. Thetotal logic block area minimum exhibits a weak dependenceon the size of the programmable switch as discussed in1361, WI.Active logic area is only part of the total area. Thearea for routing is usually larger than the active area,particularly in FPGA 's, representing from 70 to90%of thetotal area. Figure 17 plots the routing area per logic blockand the number of logic blocks used versus K for the samebenchmark design. While the number of wires decreases asK increases, experimental results show acontrary effect: asK increases the routing area per block increases, which isnot only consistent with the results of other experimentalstudies [27], but also predicted by theoretical studies [14].The total routing area, obtained by the product of the twocurves in Fig. 17, again exhibits a minimum at K=4 asconcluded in [36].The total chip area needed for an FPGA is the sum of thelogic block area plus the routing area. This can be simplycalculated by multiplying the curves in Figs. 16 and 17 andsumming them. The normalized and averaged results for 12benchmark designs are plotted in Fig. 18.As the figure shows, total normalized area is minimizedfor K =4. Figure 18 uses a programming technology sizeequivalent to a static RAM cell, but these results appear tobe only weakly dependent on the size of the programmableswitch [36].These results also appear to be insensitive tothe tools used for logic synthesis and layout.

0A - 1S ! m * * ZNormalized ,4,

2 3 4 5 6 71

Number 01 Inputs.K

Fig. 18. Average normalized total area versus K for a K-LUT.

In [25], [26] Kouloheris and El Gama1 investigatedgranularity versus density for K -input M -output LUT's [25]and K -input M -output N-product term programmable logicarrays. For K-input M -output L UT's it was shown that:A 4-input 1-output lookup table yields the minimumtotal area of any K -input M-output lookup table logicblock for a wide range of programming technologiesand routing pitches.The best K for area is determined largely by the ratioof memory bit area to the fixed overhead area andexhibits little dependence on the switch area.

For K -i nput M-output N-product term programmablelogic array blocks it was shown that:A PLA with eight-10 inputs, three-four outputs, and12-13 product terms logic block gave the smallestoverall FPGA area.Assuming the same programming technology, the over-all FPGA area using the best PL A logic block i s com-parable to that using a four-input one-output lookuptable.

One other study has investigated another dimension oflogic block architecture. Hill and Woo [20] observed that,as in the X il inx 2000 and 3000 blocks, any K-LUT canbe implemented by two (K -1)-LUT's connected by a two-to-one multiplexer, where the Kth input is the input to themultiplexer. I f the outputs of the smaller LUT's are madeaccessible, then the entire block can be used as either two(K -I )-L UT's or one K -L UT . Hill and Woo investigate thebenefits and drawbacks of this kind of flexibility.2) Effect of Granularity onPe$ormance: The granularityof the logic block has a significant effect on the performanceof an FPGA . For example, Fig. 19(a) gives the implemen-tation of the logic function f = abd + abc + acdusing two-input NA ND gate logic blocks. The longestpath requires four levels of blocks. Figure 19(b) showsan implementation of the same function using three-inputlookup tables requiring only two levels of blocks. Assuminga 1.2pm CM OS technology, a two-input NA ND gate delayis estimated at 0.7ns while a three-input L UT delay isestimated at 1.4 ns. Clearly, for a nonzero routing delaybetween the blocks, the higher granularity of the 3-L UTwill result in a faster implementation.The effect of logic block granularity on FPGA per-formance has also been studied using an experimental

ROSE c d. : RCHITECTURE OF GATE ARRAYS 1021


10/17

(b)Fig. 19. Two implementations of the same logic function.approach [24], [39], [40], and [41]. Figure 20 plots boththe average number of logic block levels in the criticalpath and the logic block delay for an FPGA employing aK -LUT versus K. The curve was obtained by normalizing theresults for each of 20 benchmarks of different complexitiesto the K=2 case and then averaging the normalized dataover the benchmarks. This figure corroborates the tradeoffillustrated by the example in Fig. 19; as the logic blockgranularity increases, the number of levels of logic in thecritical path decreases, while the delay of the logic blockincreases.The routing delay per stage is also an increasing functionof K as illustrated in Fig. 21 for the following reasons:Firstly, the average fanout is increasing, and secondly, thenumber of switches loading each wire is increasing becausethere are more pins per logic block. Finally, the wiresincrease in length as the logic block grows in size.The tradeoff between the decrease in the number oflevels of logic with increasing K and the increase in blockand routing delay determines the granularity for optimalFPGA performance. Figure 22 is a plot of normalizedcritical path delay versus number of inputs to the lookuptable. The figure gives several curves for different values ofRC delay in the programmable switch, T,. If the RC delayis relatively small the best granularity is relatively small,around K=3 or 4. On the other hand, if the RC delay islarger the best granularity is larger, around 6 or 7. Figure 23plots the granularity which achieves the best performanceas a function of the programmable switch time constantT, =RC. (See Table 1 for typical values of R and C forvarious programming technologies.)For more details on these results see [24], [41]. Inaddition, [41] investigates the delay properties of severalother types of block, including NAND gates, multiplexersand wide AND-OR gates.

o.2 t 11 l o2 3 4 5 6 7 8 9 10Number d B 1 d npuu.K

Fig. 20.for I


11/17

Fig. 23. Best performance I< versus T,

A. Survey of Commercial Routing ArchitecturesFigure 24 illustrates a routing architecture model whichwe use to describe several commercial FPGA routing archi-tectures. Before proceeding, a few definitions are necessary.A wire segment is a wire unbroken by programmableswitches. One or more switches may attach to the wiresegment. Each end of a wire segment typically has a switchattached.A track is a sequence of one or more wire segments ina line.A routing channel is a group of parallel tracks, asillustrated in Figure 24.As shown in Fig. 24, the model contains two basicstructures. The first is the connection block which appears inall architectures. A connection block provides connectivityfrom the inputs and outputs of a logic block to the wiresegments in the channels. A lthough not shown in the figure,there can be connection blocks in the vertical direction aswell as in the horizontal direction. The second structureis the switch block, which provides connectivity betweenthe horizontal as well as vertical wire segments. In Fig.24,the switch block provides connectivity among the wiresegments incident to its four sides. In some architectures,the switch block and connection block are intermingled,

and in others they are combined into a single structure.The following four sections describe four commercialFPGA routing architectures beginning with the context ofthe general model of Fig. 24, then proceeding to moreunique features.I ) The Xilinx Routing Architecture: Figure 25 illustratesthe routing architecture used in the X ilinx 3000 seriesFPGA [21] [22]. Connections are made from the logic blockinto the channel through a connection block. Since eachconnection site is large because of the SRAM programmingtechnology, the Xilinx 3000 connection block typicallyconnects each pin to only two or three out of the five trackspassing by a block as the expanded figure in the upper leftcomer of Fig. 25 illustrates. On all four sides of the logicblock there are connection blocks that connect a total of

11 dif ferent logic block pins to the wire segments. Theconnections are implemented with pass transistors for theoutput pins and multiplexers for the input pins. T he useof multiplexers reduces the number of SRA M cells neededper pin.

Fig. 24. General FPGA routing architecture.

Once the logic block pin is connected via the connec-tion block, the switch block makes connections betweensegments in intersecting horizontal and vertical channels.As the expanded picture in the lower right-hand comershows, each wire segment can connect to a subset of thewire segments on opposing sides of the block. Each wiresegment can typically connect to five or six out of a possible15 wire segments on the opposing sides. Again, this numberis limited by the large size and capacitance of the SRAMprogrammable switches.There are four types of wire segments provided in theXilinx 3000 architecture:General-purpose interconnect consisting of wire seg-ments that pass through switches in the switch block.Direct interconnect consisting of wire segments thatconnect each logic block output directly to four near-est neighbors as illustrated by the thick black linesemanating from the center block in Fig. 25.L ong lines, which span the length or width of thechip, providing high-fanout uniform delay connections,indicated by the dashed lines in Fig. 25.A clock line, which is a single net that spans the entirechip and is driven by a high-drive buffer. This line isconnected only to the clock inputs of the fl ip-flops, andprovides for a low-skew clocking scheme.

The Xilinx 4000 series architecture [23] is similar tothe 3000 series, with the key architectural differencesbeing that there are many more general purpose tracks perchannel-1 8 versus five. A lso, connectivity between thelogic block pins and the tracks is much higher as each logicblock pin connects to almost all of the tracks. Finally, fourof the tracks pass through a switch only every second switchblock. These double-length lines aredepicted in Fig. 26.

ROSE er al.: ARCHITECTURE OF GATE ARRAYS 1023


12/17

hpnsegment Ouput segmmtVer6cal Tradc

LE

Fig. 25. Xilinx 3000 routing architecture.

LE LB E LB

(Sib-bnglhLinetammtrham)

Fig. 26. Xilinx 4000 segments of length 2.The advantage of longer wire segments is that the signal

passes through less series resistance in travelling the samedistance compared to wire segments of length one. Inaddition, since there are fewer switches and associatedprogramming cells the FPGA logic density i s superior.2) The Actel Routing Architecture: Figure 27 illustratesthe general Actel routing architecture [151. The routingarchitecture is asymmetric because there are moreuncommitted general purpose tracks in the horizontaldirection than the vertical. While mask-programmed gatearrays and standard cell layout styles have always hadthis kind of directional bias, to our knowledge there is nodefinitive result that determines whether a biased approachis better or worse than an unbiased one.The connection block (or routing channel) of the Actelrouting architecture is shown in the middle of Fig. 27. Theconnectivity is different for input pins and output pins. Forinput pins, each pin can connect to all of the tracks in thechannel that are on the same side as the, pin. The outputpins extend across two channels above the logic block and

wiring segment

ttttt--w---/LE LB LB

- n6-luseClockback

Fig. 27. Actel FPGA routing architecture.two channels below it. A n output pin can connect to everytrack in all four channels that it crosses.There is no clearly separable switch block in the Actelarchitecture. Instead, the switching is distributed throughoutthe horizontal channels. Depending on the direction of theconnection, different degrees of connectivity are available.All vertical tracks can make a connection with everyincident horizontal track. Note that this provides a greatdeal more connectivity than the switch block in the Xilinxarchitectures. This flexibility allows the routing of thehorizontal channels to be performed independently, becausefor a given route that travels through two channels, thechoice of a track in one channel has no effect on the numberof choices available in another channel. By contrast, in theXil inx approach, the choice of a particular track in onechannel severely limits the choice of tracks in subsequentchannels. This flexibility simplifies the routing problem.The drawback is that more switches are needed whichcontribute extra capacitive loading.Each horizontal channel consists of 22 routing tracks, andeach track is broken up into segments of different lengths.Segments vary in length from two logic blocks long (allow-ing connections between just two blocks) to segments thatare equal in length to the entire track. This wide distributionof segment lengths makes it likely that a segment of theexact or close length of any given connection can be found,so that very few series programmable switches are neededin any intra-channel connection.As mentioned above, there are fewer uncommitted ver-tical segments than horizontal segments. There are threetypes of vertical segments. In addition to the input segmentsand output segments already described, there are uncom-mitted vertical freeways that either travel the entire heightof the chip, or some significant portion of it. There is onefreeway per logic block. This allows signals to travel longervertical distances than permitted by the output segments.The routing architecture of the Crosspoint FPGA [31] issimilar to that of Actel. The Quicklogic architecture [6],which also uses antifuses, is again similar except that thesegments are of two classes: short tracks of length one, andlong tracks that traverse the entire chip.3) The Altera Routing Architecture: The routing architec-ture of the A ltera FPGA is illustrated in Figs. 28and 29 This

1024

T I -PROCEEDINGS OF THE IEEE, VOL. 81. NO. 7, JULY 1993

.


13/17

Lou1 Bru: Global Bw :LAB PIA LAB

LA BLA B

Fig. 28. Altera MAX SO00 global routing architecture.

Local nWmnneCt B L ~ Logc B b&Fig. 29.architecture is novel in that it has a two-level hierarchy.This i s an important feature because most designs exhibitsome form of locality which a hierarchical organization maybe able to util ize to obtain better density and performance.At the first level of the hierarchy, 16 or 32 of the logicblocks are grouped into aLogic Array Block, or LAB. Theinner details of the LAB are il lustrated in Figure 29,andit can be seen that the structure of the LAB is very similarto a traditional PLD [28]. Each x in the figure indicatesa point where a connection can be made. As describedin Section 2.3, the connection is formed using EPROM -like floating-gate transistors. Here the channel is a set ofdedicated vertical tracks that run the entire length of theLAB. The tracks are dedicated to one of four types ofconnections:

1) Connections from the outputs of all logic blocks inthis LAB.2) Connections from the logic expanders, described inSection IIILC4).3) Connections from outputs of logic blocks in otherLABs. These connections come from the globalinterconnect structure at the next level of hierarchy,called the PI A , and are described below.4) Connections from the I/O pads of the chip.All four types of tracks pass by every logic block in theLAB. In the connection block every such track can connectinto every logic block pin. This makes the routing verysimple, since any input can directly connect to any track.

Altera M AX SO00 local routing architecture.

!!!-z

Fig. 30. Plessey routing architecture.S Complete

30.00120.001 , ,

5.00 10.00

R=10F~- ~ i - _ _ _ _ _ _ - -Fs=EF s=7F s=6F s=5R =4F s=3F s=2

- - - -- - -

FcFig. 31. Percent routing completion versus F,, one circuit.Using fewer connection points results in better density andperformance, but yields a more complex routing problem.The intra-L AB routing structure could be considered asegmented channel, where the segments are all as longas possible. Recall that connections also perform wire-ANDing, and so the transistors in effect have two purposes.Referring again to Fig. 28, connections are made amongdifferent LABS using a global interconnect structure calledthe Programmable I nterconnect Array (PIA). I t connectsoutputs from each LAB to inputs of other LABs, and actsas one large switch block. I ts internal structure is similarto that of the internal routing scheme within the LAB-alarge number of vertical tracks (180 in the EPM 5128device) are connected to the logic block outputs. There isfull connectivity among the logic block outputs and LABinputs within the PI A. Again, the advantage of this schemeis that it makes the routing problem very easy, and theregularity of the physical design of the silicon allows itto be packed tightly and efficiently. The disadvantage isthat many switches are needed, and these may add morecapacitive load than necessary.A second advantage of this approach is that the delaythrough the PI A is the same regardless of which track isused since all tracks have identical loading. This is veryhelpful when attempting to predict system performance.

ROSE er nl.: ARCHITECTURE OF GATE ARRAYS 1025


14/17

Theprice, however, may be circuits that are much slowerthan is possible with segmented tracks.The routing architecture of A M D [3] and Plus Logic [34]are similar to that of A ltera.4) ThePlessey Routing Architecture: The routing archi-tecture of the Plessey FPGA is il lustrated in Fig. 30. Thereare two alternating types of logic block. The Master blockis indicated by an M, and the Slave block is indicatedby an S. Also, the rows of logic blocks alternate in theirdirection of flow. Programmable routing is achieved usingonly a multiplexer as a connection block on the inputs ofthe two-input NA ND gate as described in Section 111-B2).The multiplexers are controlled by SRA M cells. Each inputof the NA ND gate comes from a four-to-one multiplexer.In Fig. 30, the connections for one logic block are shown,and indicate the basic pattern of connectivi ty. T he inputsto each multiplexer are connected to:

1) The output of the previous NA ND gate in the row.2) The output of the NAND gate above or below this3) A vertical long track.4) One of the following three connections depending on

logic block, whichever is closer.

which NA ND input the multiplexer drives:

the connection block, Fc, to be the number of tracks inthe adjacent channel to which each logic block pin canconnect. T he flexibility of the switch block, F,, as depictedin Fig. 24, is defined as the number of tracks to whicheach track entering the switch block can connect. Both ofthese definitions assume that all wires or pins has the samedegree of connectivity.The research in [38], [7] uses an experimental approachsimilar to the logic block experiments described in Section3.5. Several benchmark designs are implemented usingrouting architectures with different flexibilities. For eachbenchmark the following was determined: 1) the percentageof connections successfully routed with a fixed number oftracks per channel and 2) the number of tracks per channelrequired to achieve 100% routing completion, assuming thatevery channel has the same number of tracks.Figure 31 plots results of the first type for one benchmarkdesign. T he Y-axis is the percentage of connections thatwere successfully completed with a connection block offlexibility F , as indicated on the X -axis, and a switch blockof flexibility F , varying from 2 to 10across the nine curves.These data lead to two conclusions that are consistent acrossall benchmark designs considered. T hese are:

1) The connection block requires flexibility F , greaterthan half the tracks for 100% routing completion tobe possible.2) The switch block requires little flexibility to achieve100% routing completion. This is because aflexibilityof F , = 3 easily achieves completion and the

A horizontal long track (the upper input).The NA ND gate output two blocks previous tothe current one (lower input of M aster block).The output of the block diagonally away fromthe current one (lower input of Slave block).A lthough not shown, the other logic blocks are con-nected similarly.

There are three vertical long tracks per column of logicblocks, and two horizontal l ong tracks per row. T here aretwo types of long tracks in each direction: short range trackswhich travel 10blocks, and long range tracks which travelthe width or length of the entire chip.While this type of nearest-neighbour routing architectureworks well for circuits with primarily local connections,there. s an insufficient number of longer horizontal and ver-tical routing tracks for typical circuits with more non-localconnections. For this reason the logic blocks themselvesmust be used for routing which can be costly in area andperformance.The routing architectures of A lgotronix [2],ConcurrentLogic [131, and Toshiba [32] exhibit asimilar mix of manylocal and few long range connections.B. Routing Architecture Tradeoffs

In this section we summarize several recent results con-cerning FPGA routing architecture. The work in [37], [38],[7] [8] considers the relationship between the flexibilityof FPGA routing architecture and both the routability andarea-efficiency. A basic tradeoff is as follows: using alarge number of programmable switches makes it easierto achieve routing completion, but these switches con-sume area, and therefore it is desirable to minimize thenumber used. The work in [38] defines the flexibility of

maximum value of F, is 36 if there are 12 tracksin a channel. These results make intuitive sense, asexplained in [38].A s flexibili ty increases, the number of tracks needed toobtain 100% routing completion decreases. Table 2 givesthe average, over five circuits, of the number of tracksin excess of channel density needed to successfully routearchitectures with different values of E, and 5. he tableillustrates the diminishing returns, in terms of track count,

of increasing the flexibility of the routing architecture.Table 3 gives the average number of switches per logicblock for the data in Table 2. This data shows that thereis a point where the higher flexibility of the switch andconnection blocks would cost more than the reduction intracks, in terms of total number of programmable switches.By inspection of the table, the best architecture, in termsof total number of programmable switches, appears to bewith a value of F, between three and four, and the fraction% (where W is the number of tracks per channel) between0.7 and 0.8 [38], [SI.Asymptotic Results on Channel Segmentation In [16], ElGama1 et al. give asymptotic results on the number of ex-cess tracks (above channel density) needed to successfullyroute a segmented routing channel of the type used in theActel FPGA . A nother paper in this special section containsmore details on this work.In this paper various architectural and programmingtechnology features of commercial FPGA s have been

1026 PROCEEDINGS OF TH E IEEE, VOL. 81, NO. 7, JULY 1993


15/17

Table2 Average Number of Switches per L ogic Block Tile for Each Routing Architecture

0.6nr4.63.02.41.8

3 N N N4 N nr 6.25 N N 4.26 N 7.6 3.87 N 5.2 3.48 N 4.4 1.49 10.0 4.2 1.010 8.0 3.2 1.4

07 0.8 0.9nr 11.2 10.82.4 1.2 0.81.6 0.6 0.61.0 0.4 0.20.8 0.4 0.41.4 I 0.2 I 0.2 I 0.2qyy-g.6 0.2

1.o9.00.80.6

---0.20.40.20.20.00.0---

Table3 Average Excess Tracks for Each Routing Architecture

surveyed. The logic block architectures have been classifiedby granularity and the routing architectures have been de-scribed in terms of connectivity, symmetry, and hierarchy.The effects of logic block choice and routing architectureon FPGA density and performance.v. SUMMARY AND FUTURE IRECTIONS

In the future, we expect to see an even greater pro-liferation of architectural innovations as well as researchthat answers basic questions about FPGA architecture. Weexpect these to include:Development of special-purpose FPGAs tuned to-ward specific applications such as datapath circuits,DSP applications, finite state machines, and FPGA-based compute engines [ 5 ] .The development of multi-chip field-programmablesystems for emulation, acceleration and rapid proto-Development of nonhomogeneous logic block archi-tectures. These hold the promise of providing betteraredperformance tradeoffs, because larger granular-ity blocks achieve better performance while smallergranularity blocks achieve higher density.Segmented routing architectures provide important per-formance and density advantages over nonsegmentedarchitectures (those with only one or two lengths

typing (e.g. [51).

of segment). New insight is needed in determiningsegment length distribution.It is possible that the success of the FPGA in the digitalworld may be reproduced in the analog domain, with afield-programmable analog array ( FPAA) . Such workhas already begun with the chip presented in [29] [30].In addition, computer-aided design tools for FPGAs willbecome more sophisticated, enabling the development ofarchitectural features aimed at improving performance anddensity.

ACKNOWLEDGMENTThe second author would l ike to thank Dana How andJack Kouloheris for their useful comments on earlier draftsof this paper.

REFERENCES[ I ] M . Ahrens, et. al., An FPGA family optimized for highdensities and reduced routing delay, in Proc. 1990CICC, Mpp. 31.5.1-31.5.4, May 1990.[2] CA L 1024 Datasheet, Algotronix L td. Edinburgh, Scotland,

1989.[3] Mach Devices High Density EE Programmable Logic DataBook. AM D, 1990.[4] S. Baker, L attice fields FPGA, E lect. Eng. Times, no. 645,page 1, J une 10, 1991.[5] P. Bertin,D Roncin, and J . Vuillemin, Programmable activememories: Performance measurements, in ACM F irst Inter-ROSE er a/ ARCHITECTURE OF GATE ARRAYS

I I -1027


16/17

national Workshop on Field-P rogrammable Gate Arrays, pp.57-59, Feb. 1992.[6] J . Birkner, A. Chan, H. T . Chua, A. Chao, K . Gordon, B.K leinman, P. K olze, and R. W ong, A very high-speed fieldprogrammable gate array using metal-to-metal anti-fuse pro-grammable elements, in New H ardware Product Introductionat CICC 91.[7] S. Brown, Routing architectures and algorithms for field-programmable gate arrays, Ph.D. Thesis, Dept. of ElectricalEngineering University of Toronto, February 1992.[8] S. Brown, R . Francis, J . Rose, and Z. Vranesic, Field-Programmable Gate Arrays, K luwer Academic Publishers,1992.[9] W. Carter et. al., A user programmable reconfigurable gatearray, Proc. 1986 CICC, pp. 233-235, M ay 1986.[lo] D. Chen, J . Rabaey, A R econfigurable Multiprocessor IC forRapid Prototyping of Real-Time Data Paths, in Proc. ISSCC92,, pp. 74-57, Feb. 1992.[ll] K. Chung, S. Singh, J . Rose, P. Chow, Using hierarchicallogic blocks to improve the speed of field-programmable gatearrays, in FPGAs, W. M oore and W. L uk Eds., Abingdon1991. edited from the Oxford 1991 International Workshop onField Programmable L ogic and Applications.[12] K. Chung and J . Rose, T EM PT : Technology mapping for ex-ploration of F PGA architectures with hard-wi red connections,in Proc. 29th Design Automation C onf , J une 1992, Anaheim,[131 Concurrent Logic CFA6006 Field-Programmable Gate ArrayDura Sheet, Sunnyvale, CA , 1991.[14] A. El G amal, T wo-dimensional stochastic model for inter-connections in master slice integrated circuits, IEEE Trans.Circuits Systems, vol. CA S-28, no. 2, pp. 127-138, February

1981.[15] A . El Gamal, et al., A n architecture for electrically config-urable gate arrays, IEEE J SSC, vol. 124, No. 2, pp. 394-398,Apr. 1989.[161 A . El Gamal, J . Greene, and V. Roychowdhury, Segmentedchannel routing is nearly as eff ici ent as channel routing (and ustas hard), i n Advanced Research in VLSI ,University Califomia,Santa Cruz, March 25-27, 1991.[17] J . Greene, et al, Segmented channel routing, in Proc. 27thDesign Automation Conf., pp. 567-572, J une 1990.[18] J . Greene, A ntifuse field programmable gate array, Proc.IEEE, vol. 81, no. 7, J uly 1993..[19] E. Hamdy, J. M cCollum, S. Chen, S. Chiang, S. Eltoukhy, J .Chang, T. Speers, and A. M ohsen, Dielectric based antifusefor logic and memory ics, Int. Electron Devices Meeting Tech.Digest, pp. 786-789, 1988.[20] D. Hill and N .3 . Woo, The benefits of flexibility in look-up table FPGA s, in FPGAs, W. M oore and W. L uk, Eds.,A bingdon 1991, edited from the Oxford 1991 In?. Workshop onField Programmable Logic and Applications, pp. 127-136.1211 H. Hsieh, et al., A second generation user programmable gatearray, in Proc. 1987 CICC, pp. 515-521, M ay 1987.[22] H. Hsieh, et al, A 9000-gate user-programmable gate array,in Proc. 1988 CICC, pp. 15.3.1-15.3.7, M ay 1988.[23] H. Hsieh, et al., Third-generation architecture boosts speedand density of field-programmable gate arrays, in Proc. 1990CICC, pp. 31.2.1-31.2.7, M ay 1990.1241 J. K ouloheris and A . E l Gamal, F PGA performance vs. cellgranularity, in Custom ntegrated Ci rcuits Conf. 91,CICC, pp.61.2.1-6.2.4, M ay 191.[25] J. K ouloheris and A. El Gamal, FPGA area versus cell granu-larity - lookup tables and PL A cells, in FPGA 92, ACM FirstInt. Workshop on Field-Programmable Gate Arrays, pp. 9-14,Feb. 1992.[26] J . K ouloheris and A. El Gamal, FPGa area versus cell gran-ularity - PL A cells, in Custom Integrated Circuits Conference92, CI CC , May 1992.1271 J . K ouloheris, private communication.[28] P. K. L aIa, Digital System Design U sing Programmable LogicDevices. Prentice Hall, 1990.[291 E. Lee, Field-programmable analog arrays - a cmos real-ization, M.A .Sc. thesis, Univ. of Toronto, Toronto, Ontario,Canada.[30] E. L ee and P. G. Gulak, A CM OS f ield-programmable analogarray,lE EE J SSC, vol. 26, no. 12, pp. 1860-1867, D ec. 1991.1311 D. M arple and L . Cooke, An M PGA compatible FPGA ar-

CA , pp. 361-367.

chitecture, in FPGA 92, ACM First In?. Workshop on Field-Programmable Gate Ar rays, pp. 3944, Feb. 1992.[32] H. M uroga, H. M urata, Y . Saeki, T. Hibi, Y . Ohashi, T.Noguchi, and T. Nishimura, A large scale FPG A with 10Kcore cells wi th CM OS 0.8pm 3-layered metal process, CustomIntegrated Ci rcuits Con6 91,CI CC , pp. 6.4.1-6.4.4, M ay 1991.[33] Plessey Semiconductor ERA 60100 preliminary data sheet,Swindon, England, 1989.1341 Plus L ogic FPGA 2040 Fi eld-Programmable Gate A rray DataSheet, Plus Logic, San J ose, CA , 1989.[35] J . S. Rose, R. J . Francis, P. Chow and D. L ewis, The effect oflogic block complexity on area of programmable gate arrays,in Proc. 1989 Custom Integrated Circuits Conf.,pp. 5.3.1-5.3.5,M ay 1989.[36] J . S.Rose, R. J . Francis, D. Lewis, and P. Chow, A rchitectureof field-programmable gate arrays: The effect of logic blockfunctionality on area efficiency, IEEE J SSC, Vol. 25, No. 5,

1371 J . S.Rose and S. Brown, The effect of switch box flexibilityon routability of field programmable gate arrays, in Proc. 1990Custom Integrated Circuits Conf , pp. 27.5.1-27.5.4, May 1990.[38] J . S. Rose and S. Brown, Flexibility of Interconnection Struc-tures for Fi eld-Programmable Gate Arrays, IEEE J SSC, Vol.1391 S. Singh, J . S. Rose, D. L ewis, K . Chung, and P. Chow,Optimization of Field-Programmable Gate A rray Logic BlockArchitecture for Speed, in Custom Integrated Circuits Confer-ence 91, CI CC , pp. 6.1.1-6.1.6, M ay 1991.[40] S. Singh, The effect of logic block architectureon the speed offield-programmable gate arrays, M .A .Sc. T hesis, Departmentof El ectrical Engineering, University of T oronto, August 1991.[41] S. Singh, J . Rose, D. Lewis, and P. C how, The effectof logicblock architecture onFP GA performance, IEEE J SSC, vol. 27,

no. 3, pp. 281-287, M ar. 1992.[42] H. S. Stone, Parallel processing with the perfect shuffle, IEEETrans. Comput., vol. C-20, pp. 153-161, 1971.[43] S. C. W ong, H. C . So, J . H. Ou, and J . Costello, A 5000-gateCM OS EPL D wi th multiple logic and interconnect arrays, inProc. I989 CICC, pp. 5.8.1-5.8.4, M ay 1989.[44]S. Vij, B . A hanim, A high density, high speed, array-based erasable programmable logic device with programmablespeedlpower optimization, in ACM First Int. Workshop onField-Programmable Gate Arrays,pp. 29-32, Feb. 1992.1451 J ean-M ichel V uil lamy, Performance enhancement in fi eld-programmable gate arrays, M.A .Sc. Thesis, Department ofElectrical Engineering, University of Toronto, April 1991.[461 S. Yau and C. T ang, Universal l ogic modules and their appli-cation, IEEE Trans. Comput., Vol. C-19, pp. 141-149.

pp. 1217-1225, Oct. 1990.

26, NO. 3, pp. 277-282, M arch 1991.

J onathan Rose (Member. I EEE) received theB.A .Sc. degree in engineering science in 1980,and the M.A .Sc. and the Ph.D. degree in electri-cal engineering, n 1982 and 1986, respectively,from the University of Toronto.During the summer of 1983, he was withBell-Northem Research L td., Ottawa. in the In-tegrated Ci cuits CAD/CAM group. From 1986to 1989, he was a Research Associate in theComputer Systems Laboratory at Stanford Uni-versitv. In 1989. he ioined the facultv of theUniversity of Toronto, where he is currently an Assistant Professor ofelectrical engineering.He has served as General and Program Cochair forthe 1992 ACM First International Workshop on Field Programmable GateArrays, and on the program committees of the Oxford Field-ProgrammableLogic workshop in 1991 and the Vienna Field-Programmable LogicWorkshop in 1992, as well as the program committees for ICCD 92and ICCA D 92. In 1990, a paper coauthored with Stephen Brown ondetailed routing for FFGA s received a distinguished paper citation at theICCA D conference. His research interests include CAD and architecturefor Field-ProgrammableGate Arrays, automatic layout, and parallel CADalgorithms.

Abbas El Gamal (Senior Member, IEEE), for photograph and biogra-phy please see the Prolog to the Special Section in this issue of thePROCEEDINGS.I 028 PROCEEDINGS OF THE IEEE, VOL. 81, NO. 7, JULY 1993


17/17

Albert0 Sangiovanni-V incentilli (Fellow,I EEE)received the Dr. of Eng:degree from Polytech-nico di Milano, Italy, in 1971.He is currently a Professor of electrical engi-neering and computer science at the Universityof California at Berkeley, where he has beenon the Faculty since 1976. He was a VisitingScientist at the IBM T.J . Watson ResearchCenter in 1980, Consultant n residence at Harrisin 1981, and Visiting Professor at MIT in 1987.He has held a number of visitine urofessorIpositions at the University of Torino, Universityof Bologna, University ofPavia, Universityof Pisa, and University of Rome. He is a member of theBerkeley Roundtable for Intemational Economy (BRIE). He is a CorporateFellow at Harris and Thinking M achines, has been a Directorof ViewLogicand Pie Design System, on the Strategic Advisory Council of Cadence,and on the Technical Advisory Board of Cadence Analog Division andSynopses, companies that he helped funding. he consulted for severalmajor U.S. (including A n , IBM, DEC. GTE, Intel, NYNEX, GeneralElectric, Texas Instruments), European (A lcatel, Bull, Olivetti, SGS-Thompson, Fiat), and Japanese companies (Kawasaki Steel), in additionto start-ups (Actel, Crosscheck, Biocad, Redwood Design Automation,SiA rc, Wireless Access). He is the Technology Adviser to GreylockManagement. He is on the Intemational Advisory Board of the Institutefor Micro-Electronics of Singapore.In 1981 he received the Distinguished Teaching Award of theUniversity of Cali fomia. He has received three Best Paper Awards (1982,1983, and 1990) and a Best Presentation Award (1982) at the DesignAutomation Conference and is the corecipient of the Guillemin-CauerAward (1982-1983), the Darl ington Award (1987-1988), and the BestPaper Award of the Cicuits and Systems Society of the IEEE (1989-1990).He has published over 250 papers and three books in the are of CADfor V LSI. He has been a Technical Program Chair of the IntemationalConference on CA D for 1989 and the General Chair for 1990.

ROSE et 01 ARCHITECTURE OF GATE ARRAYS

- ~I 11029

[lectura fpga architecture] architecture of field-programmable gate arrays - rose

Documents