high- performance reconfigurable computing

5
0018-9162/07/$25.00 © 2007 IEEE March 2007 23 Published by the IEEE Computer Society H igh-performance reconfigurable computers (HPRCs) 1,2 based on conventional processors and field-programmable gate arrays (FPGAs) 3 have been gaining the attention of the high-per- formance computing community in the past few years. 4 These synergistic systems have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs. HPRCs, also known as reconfigurable supercom- puters, have shown orders-of-magnitude improvement in performance, power, size, and cost over conventional high-performance computers (HPCs) in some compute- intensive integer applications. However, they still have not achieved high performance gains in most general scientific applications. Programming HPRCs is still not straightforward and, depending on the programming tool, can range from designing hardware to software programming that requires substantial hardware knowledge. The development of HPRCs has made substantial progress in the past several years, and nearly all major high-performance computing vendors now have HPRC product lines. This reflects a clear belief that HPRCs High-performance reconfigurable computers have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs. Duncan Buell, University of South Carolina Tarek El-Ghazawi, George Washington University Kris Gaj, George Mason University Volodymyr Kindratenko, University of Illinois at Urbana-Champaign High- Performance Reconfigurable Computing

Upload: others

Post on 01-Feb-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High- Performance Reconfigurable Computing

0018-9162/07/$25.00 © 2007 IEEE March 2007 23P u b l i s h e d b y t h e I E E E C o m p u t e r S o c i e t y

G U E S T E D I T O R S ’ I N T R O D U C T I O N

H igh-performance reconfigurable computers(HPRCs)1,2 based on conventional processorsand field-programmable gate arrays (FPGAs)3

have been gaining the attention of the high-per-formance computing community in the past

few years.4 These synergistic systems have the potentialto exploit coarse-grained functional parallelism as wellas fine-grained instruction-level parallelism throughdirect hardware execution on FPGAs.

HPRCs, also known as reconfigurable supercom-puters, have shown orders-of-magnitude improvementin performance, power, size, and cost over conventionalhigh-performance computers (HPCs) in some compute-intensive integer applications. However, they still havenot achieved high performance gains in most generalscientific applications. Programming HPRCs is still notstraightforward and, depending on the programmingtool, can range from designing hardware to softwareprogramming that requires substantial hardwareknowledge.

The development of HPRCs has made substantialprogress in the past several years, and nearly all majorhigh-performance computing vendors now have HPRCproduct lines. This reflects a clear belief that HPRCs

High-performance reconfigurable

computers have the potential to exploit

coarse-grained functional parallelism as

well as fine-grained instruction-level

parallelism through direct hardware

execution on FPGAs.

Duncan Buell, University of South Carolina

Tarek El-Ghazawi, George Washington University

Kris Gaj, George Mason University

Volodymyr Kindratenko, University of Illinois at

Urbana-Champaign

High-PerformanceReconfigurableComputing

Page 2: High- Performance Reconfigurable Computing

24 Computer

have tremendous potential and that resolving all remain-ing issues is just a matter of time.

This special issue will shed some light on the state of the field of high-performance reconfigurable computing.

WHAT ARE HIGH-PERFORMANCERECONFIGURABLE COMPUTERS?

HPRCs are parallel computing systems that containmultiple microprocessors and multiple FPGAs. In cur-rent settings, the design uses FPGAs as coprocessors thatare deployed to execute the smallportion of the application that takesmost of the time—under the 10-90rule, the 10 percent of code thattakes 90 percent of the executiontime. FPGAs can certainly accom-plish this when computations lendthemselves to implementation inhardware, subject to the limitationsof the current FPGA chip architec-tures and the overall system datatransfer constraints.

In theory, any hardware reconfigurable devices thatchange their configurations under the control of a pro-gram can replace the FPGAs to satisfy the same key con-cepts behind this class of architectures. FPGAs, however,are the currently available technology that provides themost desirable level of hardware reconfigurability. Xilinx,followed by Altera, dominates the FPGA market, butnew startups are also beginning to enter this market.

FPGAs are based on SRAM, but they vary in struc-ture. Figure A in the “FPGA Architecture” sidebarshows an FPGA’s internal structure based on the Xilinxarchitecture style. The configurable logic block (CLB) isthe basic building block for creating logic. It includesRAM used as a lookup table and flip-flops for buffer-ing, as well as multiplexers and carry logic. A side-by-side 2D array of switching matrices for programmablerouting connects the 2D array of CLBs.

PROGRESS IN SYSTEM HARDWARE ANDPROGRAMMING SOFTWARE

During the past few years, many hardware systemshave begun to resemble parallel computers. When suchsystems originally appeared, they were not designed tobe scalable—they were merely a single board of one ormore FPGA devices connected to a single board of oneor more microprocessors via the microprocessor bus orthe memory interface.

The recent SRC-6 and SRC-7 parallel architecturesfrom SRC Computers use a crossbar switch that can bestacked for further scalability. In addition, traditionalhigh-performance computing vendors—specifically,Silicon Graphics Inc. (SGI), Cray, and Linux Networx—have incorporated FPGAs into their parallel architec-

tures. In addition to the SRC-7, models of such HPCsystems include the SGI RASC RC100 and the CrayXD1 and XT4. The Linux Networx work focuses onthe design of the acceleration boards and on couplingthem with PC nodes for constructing clusters.

On the software side, SRC Computers provides asemi-integrated solution that addresses the hardware(FPGA) and software (microprocessor) sides of theapplication separately. The hardware side is expressedusing Carte C or Carte Fortran as a separate function,compiled separately and linked to the compiled C (or

Fortran) software side to form oneapplication.

Other hardware vendors use athird-party software tool, such asImpulse C, Handel-C, Mitrion C, orDSPlogic’s RC Toolbox. However,these tools handle only the FPGA sideof the application, and each machinehas its own application interface tocall those functions. At present,Mitrion C and Handel-C support theSGI RASC, while Mitrion C, Impulse

C, and RC Toolbox support the Cray XD1. Only alibrary-based parallel tool such as the message-passinginterface can handle scaling an application beyond onenode in a parallel system.

RESEARCH CHALLENGES AND THE EVOLVING HPRC COMMUNITY

FPGAs were first introduced as glue logic and even-tually became popular in embedded systems. WhenFPGAs were applied to computing, they were introducedas a back-end processing engine that plugs into a CPUbus. The CPU in this case did not participate in the com-putation, but only served as the front end (host) to facil-itate working with the FPGA.

The limitations of each of these scenarios left manyissues that have not been explored, yet they are of greatimportance to HPRC and the scientific applications ittargets. These issues include the need for programmingtools that address the overall parallel architecture. Suchtools must be able to exploit the synergism betweenhardware and software execution and should be able tounderstand and exploit the multiple granularities andlocalities in such architectures.

The need for parallel and reconfigurable performanceprofiling and debugging tools also must be addressed.With the multiplicity of resources, operating system sup-port and middleware layers are needed to shield usersfrom having to deal with the hardware’s intricate details.Further, application-portability issues should be thor-oughly investigated. In addition, new chip architecturesthat can address the floating-point requirements of sci-entific applications should be explored. Portablelibraries that can support scientific applications must be

HPRCs are

parallel computing

systems that contain

multiple

microprocessors

and multiple FPGAs.

Page 3: High- Performance Reconfigurable Computing

March 2007 25

FPGA Architecture Ross Freeman, one of the founders of Xilinx (www.

xilinx.com), invented field-programmable gate arraysin the mid-1980s. 1 Other current FPGA vendorsinclude Altera (www.altera.com), Actel(www.actel.com), Lattice Semiconductor (www.latticesemi.com), and Atmel (www.atmel.com).

As Figure A shows, an FPGA is a semiconductordevice consisting of programmable logic elements,interconnects, and input/output (I/O) blocks(IOBs)—all runtime user-configurable—that allowimplementing complex digital circuits. The IOBsform a ring around the outer edge of the microchip;each IOB provides individually selectable I/O accessto one of the I/O pins on the exterior of the FPGApackage. A rectangular array of logic blocks liesinside the IOB ring.

A typical FPGA logic block consists of a four-inputlookup table (LUT) and a flip-flop. Modern FPGAdevices also include higher-level functionalityembedded into the silicon, such as generic DSPblocks, high-speed IOBs, embedded memories, andembedded processors. Programmable interconnectwiring is implemented so that it’s possible to connectlogic blocks to logic blocks and IOBs to logic blocksarbitrarily.

A slice (using Xilinx terminology) or adaptivelogic module (using Altera terminology), whichcontains a small set of basic building blocks—forexample, two LUTs, two flip-flops, and some controllogic—is the basic unit area when determining anFPGA-based design’s size. Configurable logic blocks(CLBs) consist of multiple slices. Modern FPGAsconsist of tens of thousands of CLBs and a program-mable interconnection network arranged in a rec-tangular grid.

Unlike a standard application-specific integratedcircuit that performs a single specific function for achip’s lifetime, an FPGA chip can be reprogrammed toperform a different function in a matter of microsec-onds. Typically, either source code written in a hard-ware description language, such as VHDL or Verilog, ora schematic design provides the functionality that anFPGA assumes at runtime.

As Figure B shows, in the first step, a synthesisprocess generates a technology-mapped netlist. Amap, place, and route process then fits the netlistto the actual FPGA architecture. The process gener-ates a bitstream—the final binary configurationfile—that can be used to reconfigure the FPGA.Timing analysis, simulation, and other verificationmethodologies can validate the map, place, androute results.

Reference

1. S.M. Trimberger, ed., Field-Programmable Gate Array Tech-nology, Kluwer Academic, 1994.

IOB

IOB

IOB

IOB IOB

CLB CLB CLB CLB

BRAM BRAM BRAM

DSP DSP DSP

CLBCLB

CLBCLB

CLB CLB CLB CLBCLBCLB

CLB CLB CLB CLBCLBCLB

CLB CLB

IOB IOB IOB IOB

IOB

IOB

IOB

IOB

IOB IOB IOB IOB IOB

IOB

IOB

IOB

IOB

Verification

Postsynthesissimulation

Functionalsimulation

Timingsimulation

HDLimplementation

Algorithm

Netlist

Bitstream

SynthesisSynthesis

Implementation:map, place,and route

Figure A. FPGA internal structure based on the Xilinx architecture

style. An FPGA can be described as “islands” of (reconfigurable)

logic in a “sea” of (reconfigurable) connectors.

Figure B.Typical FPGA design flow.

Page 4: High- Performance Reconfigurable Computing

26 Computer

sought, and the need for more closely integrated micro-processor and FPGA architectures to facilitate the data-intensive hardware/software interactions should befurther studied.

As researchers pursue developments to meet a widerange of HPRC requirements, the failure to incorporatestandardization into some of these efforts would bedetrimental. It can be particularly useful if academia,industry, and government work together to create a com-munity that can approach these problems with the fullintellectual intensity it deserves, subject to the needs ofthe end users and the experience of the implementers.

Some of this community-forming has been alreadyobserved. On the one hand, OpenFPGA (www.openfpga.org) has recently been formed as a consortium thatmainly pursues standardization. Onthe other, the NSF has recentlygranted to the University of Floridaand George Washington Universityan Industry/University Center forHigh-Performance ReconfigurableComputing (http://chrec.ufl.edu)award. The center includes morethan 20 industry and governmentmembers who will guide the univer-sity research projects.

IN THIS ISSUEWe have selected five articles for this special issue that

represent the latest trends and developments in theHPRC field. The first two cover particularly importanttopics: a C-to-FPGA compiler and a library frameworkfor code portability across different RC platforms. Thethird article describes an extensive collection of FPGAsoftware development patterns, and the last two describeHPRC applications.

In “Trident: From High-Level Language to HardwareCircuitry,” Justin Tripp, Maya Gokhale, and KristopherPeterson describe an effort undertaken at the LosAlamos National Laboratory to build Trident, a high-level-language to hardware-description-language com-piler that translates C language programs to FPGAhardware circuits. While several such compilers are com-mercially available, Trident’s unique characteristicsinclude its open source availability, open framework,ability to use custom floating-point libraries, and abilityto retarget to new FPGA board architectures. Theauthors enumerate the compiler framework’s buildingblocks and provide some results obtained on the CrayXD1 platform.

“V-Force: An Extensible Framework for Recon-figurable Computing” by Miriam Leeser and her col-leagues and students from Northeastern University andthe College of the Holy Cross outlines their efforts toimplement the Vforce framework. Based on the object-oriented VSIPL++ standard, Vforce encapsulates hard-

ware-specific implementations behind a standard API,thus insulating application-level code from hardware-specific details. As a result, as long as the third-partyhardware-specific implementation is available, the sameapplication code can run on different reconfigurablecomputer architectures with no change. The authorsinclude examples of applications and results from usingVforce for application development.

In “Achieving High Performance with FPGA-BasedComputing,” Martin Herbordt and his students fromBoston University share a valuable collection of FPGAsoftware design patterns. The authors start with anobservation that the performance of HPC applicationsaccelerated with FPGA coprocessors is “unusually sen-sitive” to the quality of the implementation. They exam-

ine reasons for such a “sensitivity,”list numerous methods and tech-niques to avoid generating “imple-mentational heat,” and provide afew application examples thatgreatly benefit from the uncovereddesign patterns.

“Sparse Matrix Computations on Reconfigurable Hardware,” byGerald Morris and Viktor Prasannadescribes implementations of conju-gate gradient and Jacobi sparse

matrix solvers. In “Using FPGA Devices to AccelerateBiomolecular Simulations,” Sadaf Alam and her col-leagues from the Oak Ridge National Laboratory andSRC Computers describe an effort to port a productionsupercomputing application, a molecular dynamics codecalled Amber, to a reconfigurable supercomputer plat-form. Although the speedups obtained while porting theseapplications—highly optimized for the conventionalmicroprocessors—to an SRC-6 reconfigurable computerare not spectacular, these articles accurately capture theoverall trend.

Reconfigurable supercomputing has demonstrated itspotential to accelerate computationally demanding appli-cations and is rapidly entering the mainstream HPC world.

H igh-performance reconfigurable computing hasdemonstrated its potential to accelerate demandingcomputational applications. Much, however, must

be done before this technology becomes a mainstreamcomputing paradigm. The articles in this issue highlighta small subset of challenging problems that must beaddressed. We encourage you to get involved with HPRCand contribute to this newly developing field. ■

References

1. D.A. Buell, J.M. Arnold, and W.J. Kleinfelder, eds., Splash 2:FPGAs in a Custom Computing Machine, IEEE CS Press, 1996.

High-performance

reconfigurable computing

has demonstrated its

potential to accelerate

demanding computational

applications.

Page 5: High- Performance Reconfigurable Computing

2. M.B. Gokhale and P.S. Graham, Reconfigurable Computing:Accelerating Computation with Field-Programmable GateArrays, Springer, 2005.

3. S.M. Trimberger, ed., Field-Programmable Gate Array Tech-nology, Kluwer Academic, 1994.

4. T. El-Ghazawi et al., “Reconfigurable Supercomputing Tutorial,” Int’l Conf. High-Performance Computing, Net-working, Storage and Analysis (SC06); http://sc06.supercomputing.org/schedule/event_detail.php?evid=5072.

Duncan Buell is a professor in the Department of Com-puter Science and Engineering at the University of SouthCarolina, Columbia. Buell received a PhD in mathe-matics from the University of Illinois at Chicago. Con-tact him at [email protected].

Tarek El-Ghazawi is a professor in the Department of Elec-trical and Computer Engineering at the George WashingtonUniversity, Washington, D.C. El-Ghazawi received a PhDin electrical and computer engineering from New MexicoState University. Contact him at [email protected].

Kris Gaj is an associate professor in the Department of Electrical and Computer Engineering at George Mason University, Fairfax, Virginia. Gaj received a PhD in electri-cal engineering from Warsaw University of Technology,Poland. Contact him at [email protected].

Volodymyr Kindratenko is a senior research scientist at theNational Center for Supercomputing Applications, Uni-versity of Illinois at Urbana-Champaign, Urbana. Hereceived a DSc in analytical chemistry from the Universityof Antwerp, Belgium. Contact him at [email protected].

March 2007 27