IEEE Design & Test of Computers
September–October 2006, Volume 23, Number 5

Electronic System-Level Design
Component-Based Design • Platform-Based Taxonomy • Improving Transition Delay Test

SPECIAL ITC SECTION

Cover art by Alexander Torres, 2006
Features

335 Guest Editors' Introduction: The True State of the Art of ESL Design
    Sandeep K. Shukla, Carl Pixley, and Gary Smith

338 A Component-Based Design Environment for ESL Design
    Patrick Schaumont and Ingrid Verbauwhede

348 Modeling Embedded Systems: From SystemC and Esterel to DFCharts
    Ivan Radojevic, Zoran Salcic, and Partha S. Roop

359 A Platform-Based Taxonomy for ESL Design
    Douglas Densmore, Roberto Passerone, and Alberto Sangiovanni-Vincentelli

375 The Challenges of Synthesizing Hardware from C-Like Languages
    Stephen A. Edwards

ITC Special Section

388 Guest Editor's Introduction: ITC Helps Get More out of Test
    Kenneth M. Butler

390 Extracting Defect Density and Size Distributions from Product ICs
    Jeffrey E. Nelson, Thomas Zanon, Jason G. Brown, Osei Poku, R.D. (Shawn) Blanton, Wojciech Maly, Brady Benware, and Chris Schuermyer

402 Improving Transition Delay Test Using a Hybrid Method
    Nisar Ahmed and Mohammad Tehranipoor

414 Impact of Thermal Gradients on Clock Skew and Testing
    Sebastià A. Bota, Josep L. Rosselló, Carol de Benito, Ali Keshavarzi, and Jaume Segura
September–October 2006, Volume 23, Number 5
http://www.computer.org/dt
Copublished by the
IEEE Computer Society
and the IEEE Circuits and
Systems Society
ISSN 0740-7475
Cover design by Alexander Torres
Departments

333 From the EIC
387 Counterpoint
425 TTTC Newsletter
426 Book Reviews
428 Standards
430 CEDA Currents
432 The Last Byte
Staff Editor: Rita Scanlan
IEEE Computer Society, 10662 Los Vaqueros Circle, Los Alamitos, CA 90720-1314
Phone: +1 714 821 8380; Fax: +1 714 821 [email protected]

Group Managing Editor: Janet Wilson, [email protected]
Assoc. Staff Editor: Ed Zintel
Magazine Assistant: [email protected]
Contributing Editors: Thomas Centrella, Noel Deeley, Tim Goldman, Louise O'Donald, Joan Taylor
Art Direction: Joseph Daigle
Cover Design: Alexander Torres
Publisher: Angela Burgess
Associate Publisher: Dick Price
Membership/Circulation Marketing Manager: Georgann Carter
Business Development Manager: Sandy Brown
Advertising Coordinator: Marian Anderson

Editor in Chief: Kwang-Ting (Tim) Cheng, Univ. of California, Santa Barbara
Editor in Chief Emeritus: Rajesh K. Gupta, Univ. of California, San Diego
Associate EIC: Magdy Abadir, Freescale Semiconductor

CS Publications Board: Jon Rokne (chair), Michael R. Blaha, Mark Christensen, Frank Ferrante, Roger U. Fujii, Phillip Laplante, Bill N. Schilit, Linda Shafer, Steven L. Tanimoto, Wenping Wang

CS Magazine Operations Committee: Bill N. Schilit (chair), Jean Bacon, Pradip Bose, Arnold (Jay) Bragg, Doris L. Carver, Kwang-Ting (Tim) Cheng, Norman Chonacky, George V. Cybenko, John C. Dill, Robert E. Filman, David Alan Grier, Warren Harrison, James Hendler, Sethuraman (Panch) Panchanathan, Roy Want
Submission Information: Submit a Word, pdf, text, or PostScript version of your
submission to Manuscript Central, http://cs-ieee.manuscriptcentral.com.
Editorial: Unless otherwise stated, bylined articles and columns, as well as product and service
descriptions, reflect the author’s or firm’s opinions. Inclusion in IEEE Design & Test of Computers
does not necessarily constitute endorsement by the IEEE Computer Society or the IEEE Circuits
and Systems Society. All submissions are subject to editing for clarity and space considerations.
Copyright and reprint permissions: Copyright ©2006 by the Institute of Electrical and
Electronics Engineers, Inc. All rights reserved. Abstracting is permitted with credit to the
source. Libraries are permitted to photocopy beyond the limits of US Copyright Law for
private use of patrons: (1) those post-1977 articles that carry a code at the bottom of the first
page, provided the per-copy fee indicated in the code is paid through the Copyright
Clearance Center, 222 Rosewood Drive, Danvers, MA 01923; (2) for other copying, reprint, or
republication permission, write to Copyrights and Permissions Department, IEEE Publications
Administration, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331.
IEEE Design & Test of Computers (ISSN 0740-7475) is copublished bimonthly by the IEEE
Computer Society and the IEEE Circuits and Systems Society. IEEE Headquarters: 345 East 47th
St., New York, NY 10017-2394. IEEE Computer Society Publications Office: 10662 Los Vaqueros
Circle, PO Box 3014, Los Alamitos, CA 90720-1314; phone +1 714 821 8380. IEEE Computer
Society Headquarters: 1730 Massachusetts Ave. NW, Washington, DC 20036-1903. IEEE Circuits
and Systems Society Executive Office, 445 Hoes Lane, Piscataway, NJ 08854; phone +1 732 465
5853. Annual subscription: $38 for CS members and $68 for other IEEE society members in
addition to IEEE and Computer Society dues; $69 for members of other technical organizations
outside the IEEE. Back issues: $20 for members and $96 for nonmembers. The Biomedical
Engineering Citation Index on CD-ROM lists IEEE Design & Test of Computers articles.
Postmaster: Send undelivered copies and address changes to IEEE Design & Test of
Computers, Circulation Dept., PO Box 3014, Los Alamitos, CA 90720-1314. Periodicals postage
paid at New York, NY, and at additional mailing offices. Canadian GST #125634188. Canada Post
Corp. (Canadian distribution) Publications Mail Agreement #40013885. Return undeliverable
Canadian addresses to 4960-2 Walker Road; Windsor, ON N9A 6J3. Printed in USA.
TECHNICAL AREAS

Analog and Mixed-Signal Test: Michel Renovell, LIRMM; [email protected]
CAE/CAD: Dwight Hill, Synopsys; [email protected]
Configurable Computing: Fadi Kurdahi, University of California, Irvine; [email protected]
Deep-Submicron IC Design and Analysis: Sani Nassif, IBM; [email protected]
Defect and Fault Tolerance: Michael Nicolaidis, iRoC Technologies; [email protected]
Defect-Based Test: Adit Singh, Auburn University; [email protected]
Design for Manufacturing, Yield, and Yield Analysis: Dimitris Gizopoulos, University of Piraeus; [email protected]
Design Reuse: Grant Martin, Tensilica; [email protected]
Design Verification and Validation: Carl Pixley, Synopsys; [email protected]
Economics of Design and Test: Magdy Abadir, Freescale; [email protected]
Embedded Systems and Software: Sharad Malik, Princeton University; [email protected]
Embedded Test: Cheng-Wen Wu, National Tsing Hua University; [email protected]
Infrastructure IP: André Ivanov, University of British Columbia; [email protected]
Low Power: Anand Raghunathan, NEC USA; [email protected]
Memory Test: Fabrizio Lombardi, Northeastern University; [email protected]
Microelectronic IC Packaging: Bruce Kim, University of Alabama; [email protected]
Nanotechnology Architectures and Design Technology: Seth Goldstein, Carnegie Mellon University; [email protected]
Performance Issues in IC Design: Sachin Sapatnekar, University of Minnesota; [email protected]
SoC Design: Soha Hassoun, Tufts University; [email protected]
System Specification and Modeling: Sandeep Shukla, Virginia Polytechnic Institute and State University; [email protected]
Member at Large: Kaushik Roy, Purdue University; [email protected]
DEPARTMENTS

Book Reviews: Scott Davidson, Sun Microsystems, [email protected]; Grant Martin, Tensilica, [email protected]; and Sachin Sapatnekar, Univ. of Minnesota, [email protected]
Conference Reports and Panel Summaries: Yervant Zorian, Virage Logic; [email protected]
DATC Newsletter: Joe Damore; [email protected]
Interviews: Ken Wagner, Design Implementation and Ottawa Design Centre, PMC Sierra; [email protected]
The Last Byte: Scott Davidson, Sun Microsystems; [email protected]
Perspectives: Alberto Sangiovanni-Vincentelli, University of California, Berkeley, [email protected]; and Yervant Zorian, Virage Logic, [email protected]
The Road Ahead: Andrew Kahng, University of California, San Diego; [email protected]
Roundtables: William H. Joyner Jr., Semiconductor Research Corp.; [email protected]
Standards: Victor Berman, Cadence Design Systems; [email protected]
TTTC Newsletter: Bruce Kim, University of Alabama; [email protected]
D&T ALLIANCE PROGRAM

DTAP Chair: Yervant Zorian, Virage Logic; [email protected]
Asia: Hidetoshi Onodera, Kyoto University; [email protected]
CANDE: Richard C. Smith, EDA and Application Process Consulting; [email protected]
DAC: Luciano Lavagno, Politecnico di Torino, [email protected]; and Andrew Kahng, University of California, San Diego
DATC: Joe Damore; [email protected]
DATE: Ahmed Jerraya, TIMA; [email protected]
Europe: Bernard Courtois, TIMA-CMP; [email protected]
Latin America: Ricardo Reis, Universidade Federal do Rio Grande do Sul; [email protected]
TTTC: André Ivanov, University of British Columbia; [email protected]
ADVISORY BOARD

Anthony Ambler, University of Texas at Austin
Ivo Bolsens, Xilinx
William Mann
Tom Williams, Synopsys
Yervant Zorian, Virage Logic
From the EIC

The new world of ESL design
DESIGNERS ARE HUNGRY for electronic system-
level (ESL) methodologies and supporting tools that
can raise the abstraction level of design entry and
enhance the global analysis and exploration of design
trade-offs. A recent report by Gartner Dataquest on
worldwide EDA market trends forecasted a strong
growth rate for ESL tools over the next five years.
However, existing solutions remain inadequate, and a
comprehensive ESL design infrastructure brings with
it several challenges that design and test professionals
must solve. This issue of IEEE Design & Test discusses
some of these challenges and their corresponding solu-
tions. Guest editors Sandeep Shukla, Carl Pixley, and
Gary Smith have collected a set of interesting articles
concerning languages, tools, and methodologies of
ESL design. I’d like to thank them for the great job
they’ve done in putting together this strong issue.
In addition, we are happy to present a special sec-
tion highlighting the 2006 International Test Conference
(ITC). In the sub-65-nanometer technology era, in which
electronic products encounter a far wider variety of fail-
ure sources and a higher failure rate than ever, test has
gradually expanded its role in the semiconductor indus-
try. Test is no longer limited to defect detection. It has
become a critical technology for debugging, yield
improvement, and design for reliability as well. This
trend inspired this year’s ITC theme, “Getting More out
of Test.” Guest editor Ken Butler, 2005 ITC program
chair, has selected three articles for this special section
that highlight this theme.
We also have some exciting plans for the next few
issues of D&T. Special-issue themes will include impor-
tant industry topics such as process variation and sto-
chastic design and test, biochips, functional validation,
and IR drop and power supply noise effects on design
and test. We will also present exciting roundtables, such
as the one moderated by Andrew Kahng at the 43rd
Design Automation Conference (DAC 06), on design
and tool challenges for next-generation multimedia,
game, and entertainment platforms. In addition, at the
6th International Forum on Application-Specific Multi-
Processor SoC (MPSoC 06), Roundtables editor Bill
Joyner moderated a roundtable on single-chip multi-
processor architectures, which we will include in a
future issue of D&T. You will also see interesting inter-
views with key technologists, such as Texas Instruments’
Hans Stork, keynote speaker at this year’s DAC.
If you’d like to participate in a future D&T issue,
please submit your theme or nontheme manuscript as
soon as it is ready. To serve as a guest editor, submit
your special-issue proposal for evaluation by the D&T
editorial board. See D&T’s Web site (http://computer.
org/dt) for guidelines. For additional information or
clarification, please feel free to contact me directly.
Kwang-Ting (Tim) Cheng
Editor in Chief
IEEE Design & Test
Guest Editors' Introduction: The True State of the Art of ESL Design

Sandeep K. Shukla, Virginia Polytechnic Institute and State University
Carl Pixley, Synopsys
Gary Smith, Gartner Dataquest
ESL, OR ELECTRONIC SYSTEM LEVEL, is a new
buzzword in the EDA industry. It has found its way into
the mainstream EDA vocabulary in the past few years
because of increased interest in finding new ways to raise
the abstraction level for the design entry point during the
electronic-systems design process.
In hardware design over the past three decades, the
design entry point has moved upward in the abstraction
hierarchy—from hand-drawn schematics to gate-level
design, to RTL descriptions. As hardware design com-
plexity has become increasingly unmanageable, find-
ing ways to design hardware ICs at higher abstraction
levels and developing tools to automatically create the
circuits’ actual layouts has gained more importance in
industry and academia. This upward trend in abstrac-
tion has enabled engineers to exploit the scientific and
engineering advances that have tracked Moore’s law
quite closely.
Since the late 1980s, the design entry point’s abstrac-
tion level had remained almost stagnant at the structur-
al RTL. Behavioral synthesis had remained mostly
elusive, with some domain-specific success areas, such
as DSP chips. But by the late 1990s, recognition of the so-
called “productivity gap problem” led to various
attempts at abstraction enhancement. These attempts
gave rise to various languages for system-level design,
such as SpecC, SystemC, and variants of these. Tools and
methodologies for using these languages for design entry
have emerged in the market and in academic circles.
In the meantime, integrated systems such as cell
phones, network routers, consumer electronics, and per-
sonal electronic devices like PDAs started to dominate
the electronics landscape. In contrast to earlier comput-
ing devices such as microcontrollers and general-purpose
microprocessors (GPPs), these systems had one thing in
common: you could distribute their functionality into
hardware or software with sufficient fluidity based on var-
ious trade-offs in performance, power, cost, and so on.
This development broke the hardware abstraction for
software development hitherto used in traditional com-
puting platforms, such as GPPs, as illustrated by the
Windows and Intel platforms in desktop computing.
Another phenomenon that had occurred for decades
in avionics, automotive, and industrial-control systems
also gained increased attention among EDA researchers.
The design of such systems’ embedded software was typ-
ically at a much higher abstraction level, using synchro-
nous languages, Argos-like visual environments, and so
on to describe required control trajectories. Control-sys-
tems engineers had also used Matlab and similar mathe-
matical and visual enhancements of such tools for
decades to design, validate, and even synthesize their
software. In the meantime, increasingly more computing
devices were mixtures of software and hardware, and
there was increased flexibility for deciding the hardware-
software partitioning. Consequently, architectural explo-
ration at the functional and architectural level became
increasingly critical for finding the right trade-off points
in the design. It’s best to perform such explorations
before the designers commit to the RTL hardware logic
design, or before the embedded-software writers commit
to the embedded-software code.
These evolutionary trajectories of electronic-system
design led to the introduction of ESL design. According
to the popular “ESL Now!” Web site (http://www.
esl-now.com), ESL design concerns the following:
■ “the development of product architectures and spec-
ifications, including the incorporation and configu-
ration of IP,”
■ “the mapping of applications to a product specifica-
tion, including hardware/software partitioning and
processor optimization,”
■ “the creation of pre-silicon, virtual hardware plat-
forms for software development,”
■ “the determination/automation of a hardware imple-
mentation for that architecture,” and
■ “the development of reference models for verifying
the hardware.”
In this special issue, we explore recent developments
in ESL languages, tools, techniques, and methodologies
related to improving productivity or enhancing design
quality. We wanted this issue to answer the following
key questions regarding ESL design:
■ What is ESL design, and what are the current lan-
guages that support ESL features?
■ What tool chains and design flows are appropriate
for ESL-based design and validation?
■ What new validation techniques and methodologies
are available if ESL abstractions are used in a design
flow? Are there any test technology benefits?
■ Are there major industrial projects today that have
been successful due to ESL usage?
■ What are the market indicators and forces that might
make or break ESL design?
Although the articles in this special issue don’t nec-
essarily answer all these questions, they address some
key issues and are quite thought-provoking. In the first
article, Patrick Schaumont and Ingrid Verbauwhede
focus on two properties they see as key to ESL design:
abstraction and reuse. They present an ESL design flow
using the Gezel language, and they show with several
very different design examples how Gezel supports their
case for reuse and abstraction.
The second article, by Ivan Radojevic, Zoran Salcic,
and Partha Roop, considers the need for directly express-
ing heterogeneous, hierarchical behaviors for modeling
specific embedded systems. The authors examined two
existing ESL languages: SystemC and Esterel. Their analy-
sis led them to create a new computation model as well
as a graphical language to gain the direct expressivity
they need for their model. Although there have been var-
ious attempts at changing SystemC and Esterel to fit mod-
eling requirements, these authors mainly consider
standard SystemC and Esterel here.
In the next article, Douglas Densmore, Roberto
Passerone, and Alberto Sangiovanni-Vincentelli attempt
to stem the seemingly ever-increasing tide of confusion
that permeates the ESL world. Not only are software
developers and hardware designers having difficulty find-
ing a common language—verbally, as well as design-
wise—but communication failures are common within
those communities as well. Traditionally, there are three
rules of design: First, there is a methodology, then there
is a design flow, and last there are the tools necessary to
fill that flow. But, as this article points out, we seem to
have approached ESL backward. We have built tools, but
we have no flow. And, it goes without saying, we have no
methodology. No wonder then that the predictions of ESL
taking off in the next four years seem to be overly opti-
mistic. Still, the customer demand is there. But these cus-
tomers have had to fill the need with internally
developed ESL tools. The University of California,
Berkeley, has long been the champion of platform-based
design, and these authors base their taxonomy on a com-
bination of UC Berkeley’s platform work and Dan Gajski’s
Y-chart work (at UC Irvine). Hopefully, this taxonomy will
help stem the tide of confusion and enable the design
community to turn around its ESL efforts.
Finally, the article by Stephen Edwards presents one
side of an ongoing debate on the appropriateness of C-
like languages as hardware description languages. In
the ESL landscape, it is often assumed that a high-level
programming language can substitute for a higher-
abstraction-level hardware description language. This
article attempts to deconstruct such a myth about the C
programming language by extensively documenting the
shortcomings of such an approach and by identifying
the features that an ESL language should have. A brief
alternative opinion by John Sanguinetti immediately fol-
lows this article.
ESL DESIGN, METHODOLOGIES, LANGUAGES, AND
TOOLS are still not clearly identified and taxonomized,
and the articles in this special issue attempt to reduce
some of the confusion regarding the term ESL. However,
we believe that we are still in the early stages of ESL-
based design. Many more discussions, expository arti-
cles, and debates must take place before it can find its
permanent design entry point in industry.
The articles in this special issue could not cover
everything. Although many of the synthesis technologies
mentioned address algorithm design—for instance, for
DSP—technologies to synthesize high-level control logic
are necessary for ESL design to address the breadth of
circuits designed by hardware engineers. In the recent
past, researchers insufficiently addressed behavioral syn-
thesis, but this segment is now showing increased activ-
ity. Bluespec (http://www.bluespec.com), for example,
offers new technology to raise the abstraction level for
complex control logic and to synthesize RTL design from
these descriptions. Other behavioral-synthesis solutions
are coming to the market as transaction-level models.
We hope you find the articles in this special issue
interesting. We encourage you to send us critiques,
comments, or questions about this special issue. Letters
to the editor for publication in future issues are also
encouraged. Finally, we thank the authors, the review-
ers, and the editorial staff at IEEE Design & Test for their
help in making this issue possible. ■
Sandeep K. Shukla is an assistant professor of computer engineering at Virginia Tech. He is also founder and deputy director of the Center for Embedded Systems for Critical Applications (CESCA), and he directs the Fermat (Formal Engineering Research with Models, Abstractions, and Transformations) research lab. His research interests include design automation for embedded-systems design, especially system-level design languages, formal methods, formal specification languages, probabilistic modeling and model checking, dynamic power management, application of stochastic models and model analysis tools for defect-tolerant system design, and reliability measurement of defect-tolerant systems. Shukla has a PhD in computer science from the State University of New York (SUNY) at Albany. He has been elected a College of Engineering Faculty Fellow at Virginia Tech, and he is on the editorial board of IEEE Design & Test.

Carl Pixley is a group director at Synopsys. His pioneering achievements include model checking based on binary decision diagrams (BDDs), Boolean equivalence, alignability equivalence, constraint-based verification, and C-to-RTL verification. Pixley has a PhD in mathematics from SUNY at Binghamton. He is a member of the IEEE and the Mathematical Association of America, and is verification editor for IEEE Design & Test.

Gary Smith is a chief analyst at Gartner Dataquest, where he is part of the Design & Engineering Group and serves in the Electronic Design Automation Worldwide program. His research interests include design methodologies, ASICs, and IC design. Smith has a BS in engineering from the United States Naval Academy in Annapolis, Maryland. He is a member of the Design Technology Working Group for the International Technology Roadmap for Semiconductors (ITRS).

Direct questions or comments about this special issue to Sandeep K. Shukla, Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, [email protected]; Carl Pixley, Synopsys, 2025 NW Cornelius Pass Rd., Hillsboro, OR 97124, [email protected]; or Gary Smith, Gartner Dataquest, 281 River Oaks Pkwy, San Jose, CA 95134, [email protected].
For further information on this or any other computing
topic, visit our Digital Library at http://www.computer.org/
publications/dlib.
Electronic System-Level Design

A Component-Based Design Environment for ESL Design

Patrick Schaumont, Virginia Tech
Ingrid Verbauwhede, Katholieke Universiteit Leuven

Editor's note: This article focuses on two key properties that the authors see as critical to ESL design: abstraction and reuse. The authors present an ESL design flow using the Gezel language. Using several very different design examples, they show how this design flow supports their case for abstraction and reuse.
—Carl Pixley, Synopsys
RECENTLY, there has been a growing variety of target architecture options for digital electronics
design. Whereas the driving applications for these archi-
tectures are often governed by standards and thus tend
to be regularized, there is still a lot of design freedom in
the target architectures themselves. There is a wide range
of programmable-processor architectures,1,2 and with any
given application, designers must balance performance,
power consumption, time to market, and silicon cost.3
The obvious question is how to choose the most appro-
priate target architecture for a given application.
In this article, we present Gezel, a component-based,
electronic system-level (ESL) design environment for
heterogeneous designs. Gezel consists of a simple but
extendable hardware description language (HDL) and
an extensible simulation-and-refinement kernel. Our
approach is to create a system by designing, integrating,
and programming a set of programmable components.
These components can be processor models or hard-
ware simulation kernels. Using Gezel, designers can
clearly distinguish between component design, plat-
form integration, and platform programming, thus sep-
arating the roles of component builder, platform
builder, and platform user.
Embedded applications have driven the develop-
ment of this ESL design environment. To demonstrate
the broad scope of our component-based approach, we
discuss three applications that use our environment; all
are from the field of embedded security.
ESL design has many faces

A common definition for ESL design
is the collection of design techniques for
selecting and refining an architecture.
But ESL design has many aspects and
forms. Even within a single application
domain, system-level design can show
wide variations that are difficult to cap-
ture with universal design languages and architectures.
Therefore, you can also think of ESL design as the abil-
ity to successfully assemble a system out of its con-
stituent parts, regardless of their heterogeneity or nature.
Consider the following three examples. All of them
closely relate to design for secure embedded systems,
but they also require very different design configura-
tions. Thus, these examples show the need for a more
general approach, which we achieve using Gezel.
Example 1: Public-key cryptography on 8-bit microcontrollers
Sensor networks and radio-frequency identification
tags are examples of the next generation of distributed
wireless and portable applications requiring embedded
privacy and authentication. Public-key systems are
preferable because they allow a more scalable, flexible
key distribution compared to secret-key cryptosystems.
Unfortunately, public-key systems are computationally
intensive and hence consume more power. Recent pro-
posals suggest replacing the RSA (Rivest-Shamir-
Adleman) system with more economical solutions such
as elliptic-curve cryptosystems (ECCs) or hyper-elliptic-
curve cryptosystems (HECCs). ECCs and HECCs provide
security levels equivalent to RSA but with shorter word
lengths (a 1,024-bit RSA key is equivalent to a 160-bit
ECC key and an 83-bit HECC key), at the expense of
highly complex arithmetic. Figure 1 shows the hierar-
chy and mapping of such a system. On top is the HECC
point multiplication oper-
ation, which consists of a
sequence of basic elliptic-
curve point operations.
Each of these basic ellip-
tic-curve operations con-
sists of a sequence of
more elementary opera-
tions in the underlying
Galois field. For HECC,
this field is 83 bits. If the
system were an ECC, this
field would be 160 bits.
We implemented this design on an 8051 microcontroller, extended with
a hardware acceleration
unit. The 8-bit microcon-
troller interfaces are quite
narrow compared to HECC
word lengths. Therefore,
when building a hardware acceleration unit, it is crucial to
consider overall system performance. Because of the hier-
archy in the calculations, there are multiple ways to accel-
erate the HECC operations—in contrast to secret-key
algorithms, which have fewer hierarchy layers and thus
offer fewer implementation choices. As a stand-alone opti-
mized C implementation, an HECC point multiplication
takes 192 seconds to calculate. A small hardware accel-
erator, requiring only 480 extra FPGA lookup tables
(LUTs) and 100 bytes of RAM, improves this time by a fac-
tor of 80, to only 2.5 seconds. Figure 1 indicates the result-
ing split between hardware and software, which is not yet
optimal for an 8051.
Hardware acceleration makes HECC public key pos-
sible on small, embedded microcontrollers. But the
optimal implementation depends on the selection of
the mathematical algorithms and the system-level archi-
tecture. Only a platform-based design approach makes
this design space exploration possible and discloses
opportunities for global improvement.
Example 2: Concurrent codesign for secure partitioning
The design of secure embedded systems leads to
design cases requiring tight interaction between hard-
ware and software—even down to the single-statement
level. Figure 2 shows a fingerprint authentication design,
the ThumbPod-2 system, which is resistant to side-chan-
nel attacks; we implemented and tested this design in
silicon.4 The protocol, shown in Figure 2a, accepts an
input fingerprint and compares it to a prestored, secret
template. The matching algorithm must treat this tem-
plate as a secret, and the ThumbPod-2 system stores it
in a secure circuit style that is resistant to side-channel
attacks. However, because the matching algorithm
manipulates the template, part of the algorithm’s circuit
must also migrate to a secure circuit style. Because this
secure circuit style consumes twice the area of normal
circuits, mapping the complete matching protocol to it
would be inefficient. We therefore separated the proto-
col into an insecure software partition and a secure
hardware partition, and we ended up with the imple-
mentation in Figure 2b. The software reads the input fin-
gerprint and feeds the data to the oracle inside the
secure partition. The oracle compares each input minu-
tia with the template minutia, returning only a global-
matching result: reject or accept. It is impossible for an
attacker with full access to the untrusted software to
determine how the oracle has obtained this decision.
The design and verification of the secure protocol
requires continuous covalidation between hardware
and software. We evaluated various attack scenarios
that attempt to extract the secret template from the
secure hardware partition, assuming that the attacker
can arbitrarily choose the software program at the inse-
cure partition. This led to an iterative refinement of the
oracle interface and the driving software, which we
designed completely within the Gezel environment.
339September–October 2006
P0 P1
CTL DATA
Software
Hardware
Hyper-elliptic-curvecryptography (HECC)
Scalar multiplication
Point or divisoroperations
CombinedGalois field (2nelements)
operations
BasicGalois field (2n elements)
operations
Galois field (2n)coprocessor
8051 CPU
API
C code
Assemblylanguageroutines
Microcodesequences
Data path
Figure 1. Public-key cryptography on an 8-bit microcontroller.
Example 3: Accelerated embedded virtual machines
For a third application, shown in Figure 3, we had to
provide hardware acceleration of a cryptographic
library for an embedded virtual machine.5 We used a
Java embedded virtual machine, the Kilobyte Virtual
Machine (KVM), extended with native methods that
allow hardware access directly from a Java application.
We integrated an advanced encryption standard (AES)
coprocessor into the Java virtual machine’s host proces-
sor, and we triggered execution of the coprocessor
using a native method. The virtual machine handles all
data management and synchronization. As Figure 3b
shows, hardware acceleration can improve perfor-
mance by two orders of magnitude. Moreover, data
movement from Java, to and from the coprocessor, has
two orders of magnitude of overhead compared to actu-
al hardware execution. A combined optimization of the
Java-native API, the coprocessor, and the coprocessor
interface is necessary to avoid design errors and, more
importantly, security holes in the final system.
All three examples are practical design problems
from the field of embedded security. There is no unified
design platform or unified design language that could
solve all of them. However, it’s still possible to general-
ize their underlying design principles by using a com-
ponent-based approach.
Component-based ESL design

Each programmable architecture comes with a specif-
ic set of design techniques. ESL design, therefore, is no
tightly knit set of techniques, tools, and data models.
Unlike RTL design, which logic synthesis enabled, ESL
design doesn’t offer a standard design flow. In fact, ESL
design might never be unified in a single design flow, given
the architectural scope, the complexities in capturing all
facets of an application, and the daunting task of devel-
oping tools for these different facets. Still, all ESL tech-
nologies share two fundamental objectives: facilitating
design reuse and supporting design abstraction. These two
objectives have guided every major technology step that
has turned transistors into gates, and gates into systems.
Reuse and abstraction for ESL design, however, are
unique and different from other technology transitions.
In ESL design, reuse relates not only to architectures but
also to design environments. For example, when a
designer builds a SoC architecture around a micro-
processor, the microprocessor’s compiler and the instruc-
tion-set simulator (ISS) are as critical to the design’s
success as the actual microprocessor implementation.
Electronic System-Level Design
340 IEEE Design & Test of Computers
(b)(a)Root of trust
ThumbPod-2 client
Minutiaeextraction
Matchingalgorithm
Reject Accept
Loadbogus
Loadmaster
Session key Sk
Secure circuit style
Template
Masterkey
Template
Oracle
Masterkey
Cryptographymodule
RAM orFlash
Leon-2processor
Bridge AMBA UART
Out port In port
Chip commandinterface
Matching
algorithm
AMBAUART
Advanced Microcontroller Bus ArchitectureUniversal asynchronous receiver transmitter
Figure 2. Partitioning for security in the ThumbPod-2 system: protocol for session key generation (a), and
implementation (b).
Authorized licensed use limited to: COMSATS INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on May 18,2010 at 06:52:39 UTC from IEEE Xplore. Restrictions apply.
The compiler and the simulator are
reused design environments, and the
microprocessor is a reused design artifact.
As another example, consider SystemC.
You can view SystemC as a reusable
design environment for RTL hardware
components. As a C++ library, it can link
to any environment that needs RTL hard-
ware design capability; thus, the SystemC
library itself is a reusable component.
Abstraction in ESL design concerns
not only the masking of implementation
details but also platform programming
mechanisms. Finding successful system-
level abstractions is extremely difficult,
because abstractions tend to restrict the
scope of coverable target architectures.
For example, C is a successful program-
ming abstraction for a single-core system,
but it becomes cumbersome in multi-
core systems. Despite the multitude of
system-level design languages, none has
so far been able to unify the architecture
design space in a single programming
abstraction.
These two elements of ESL design—its reuse of
design environments and design artifacts, and the com-
ponent-specific nature of its programming abstrac-
tions—guided us toward a component-based approach
in system design. In ESL design, we define a component
as a single programmable element included in a plat-
form. For example, a microprocessor, reconfigurable
hardware, a software virtual machine, and the SystemC
simulation kernel are all programmable components.
Figure 3. Accelerated embedded virtual machine: general structure (a) and performance improvements and associated overhead (b). The chart shows a 109× performance gain against a 160× integration overhead.

Figure 4. Three phases for ESL design automation: component, platform, and platform based.

As Figure 4 shows, a component-based model for ESL
design requires a design flow with three phases of
design: component, platform, and platform based. These
phases correspond to the creation, integration, and use
of programmable components. Several different engi-
neers might work in each design phase, each with his
own perspective on an application. These engineers gen-
erally fall into one of three categories: design automa-
tion, hardware design, or software design. Figure 4 offers
the perspective of the design automation engineer.
In component design, a design automation engineer
develops a design environment for a single program-
mable component. The engineers can do this indepen-
dent of the application. Two interfaces—integration and
programming—characterize a programmable compo-
nent. Through the integration interface, a component
connects to another (possibly heterogeneous) compo-
nent. Between these two is a simulation-and-refinement
kernel. Component design can be very elaborate,
including, for instance, the development of an ISS and
a C compiler for a new processor.
In platform design, a design engineer or design
automation engineer selects various programmable
components and combines them into a single platform
by interconnecting their integration interfaces. Platform
design requires the creation of a platform system sched-
uler to coordinate the individual components’ activi-
ties. This phase also requires the creation of
communication channels between components. The
notion of a platform as an articulation point between
application and architecture is a well-known concept.6,7
In platform-based design, a design engineer devel-
ops an application by writing application programs for
each programmable component in the platform. The
platform simulator lets the designer instantiate a partic-
ular application and tweak overall system performance.
For heterogeneous components, it’s important to bring
the individual components’ programming semantics
sufficiently close together so that a designer can easily
migrate between them.
Designers have used
component-based design
approaches, typically in
software development, to
address problems requir-
ing high architectural
flexibility. For example,
Cesario et al. present a component-based approach for multiprocessor SoC (MPSoC) design,8
based on four types of components: software tasks,
processor cores, IP cores, and interconnects.
Designing and integrating FSMD components with Gezel
The Gezel design environment (http://rijndael.ece.vt.
edu/gezel2) supports the modeling and design of hard-
ware components. By integrating the Gezel kernel with
other simulators (such as ISSs), we obtain a platform
simulator. The three examples we discussed all rely on
custom hardware design, each with a different platform.
We’ve combined Gezel with other programmable com-
ponents, such as 32- and 8-bit cores. We’ve also com-
bined it with other types of programming environments,
including the SystemC simulation kernel and Java. For
the parts of the design described in the Gezel language,
the Gezel design environment automatically creates
VHDL, enabling technology mapping into FPGA or
standard cells.
Platform-based design using Gezel

The Gezel language captures hardware using a cycle-
based description paradigm based on the finite-state
machine with data path (FSMD) model. Widely used for
RTL hardware design, this model has been popularized
through SpecCharts and SpecC.9 The FSMD model
expresses a single hardware module as a combination
of a data path and its controller. You can combine sev-
eral different FSMDs into a network, as Figure 5a shows.
A pure FSMD network is only of limited value for a plat-
form simulator, because such a network supports only
communication between FSMDs. Such a network does-
n’t have the means to communicate with any part of a
platform that is not captured as an FSMD.
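To make the FSMD paradigm concrete, the listing below sketches a greatest-common-divisor module in the Gezel language. This is an illustrative sketch following the publicly documented Gezel syntax, not necessarily the EUCLID design reported later in Table 2: a datapath (dp) declares registers and instruction groups (sfg), and a companion finite-state machine (fsm) decides which instructions execute in each clock cycle.

    // Sketch of a Gezel FSMD: GCD by repeated subtraction.
    // Every datapath output must be defined in every cycle,
    // so each sfg also assigns gcd.
    dp euclid(in m_in, n_in : ns(16);
              out gcd : ns(16)) {
      reg m, n : ns(16);

      sfg load     { m = m_in; n = n_in; gcd = 0; }
      sfg reduce   { m = (m >= n) ? m - n : m;
                     n = (n > m)  ? n - m : n;
                     gcd = 0; }
      sfg complete { gcd = m; }

      // Controller: one state transition (and sfg selection) per cycle.
      fsm euclid_ctl(euclid) {
        initial s0;
        state s1, s2;
        @s0 (load) -> s1;
        @s1 if (n) then (reduce)   -> s1;
            else        (complete) -> s2;
        @s2 (complete) -> s2;
      }
    }

Wrapped in a system block with a small testbench, a description like this runs cycle by cycle in the stand-alone Gezel simulator, and the same source can drive the VHDL code generator mentioned earlier.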
To employ FSMDs as platform components, Gezel
supports extended FSMD networks, as Figure 5b shows.
Such an extended FSMD network also includes a sec-
ond type of module called an IP block.

Figure 5. Finite-state machine with data path (FSMD) network: pure (a) and extended (b).

An IP block has an interface similar to that of an FSMD, but the IP block
is implemented outside the Gezel lan-
guage. A similar concept of capturing
heterogeneity also exists in Ptolemy.10
Technically, an IP block is implemented
as a shared library in C++ and thus
can include arbitrary programming con-
structs within the boundaries of a
cycle-based interface. To the Gezel pro-
grammer, the IP block looks like a simu-
lation primitive. The platform designer
defines the IP block’s behavior. In a
component-based design model, these
IP blocks implement communication
channels, which connect Gezel to a
wide range of other components, such
as ISSs, virtual machines, and system
simulation engines.
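From the Gezel source, an IP block is declared with only its cycle-based port interface plus iptype and ipparm strings that bind it to a C++ implementation. The fragment below is a sketch; the iptype name fifo_channel and its depth parameter are invented here for illustration.

    // Sketch of an IP block declaration. The behavior lives in a C++
    // shared library selected by the iptype string; "fifo_channel" and
    // "depth" are hypothetical names used only for illustration.
    ipblock tx_channel(in data   : ns(32);
                       in write  : ns(1);
                       out full  : ns(1)) {
      iptype "fifo_channel";    // selects the C++ implementation
      ipparm "depth = 16";      // implementation-specific parameter
    }

To the surrounding FSMD network, tx_channel then behaves like any other module with a cycle-based interface; only the platform designer sees the C++ behind it.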
Platform design using Gezel

Figure 6 illustrates a platform simula-
tor that uses the Gezel kernel and sever-
al ISSs. Each component simulator exists
as an individual (C++) library, linked
together in a system simulation. For this
platform simulator, we use IP blocks to implement the
cosimulation interfaces between the Gezel model and
the ISS. In addition, a system scheduler calls all the
included component simulators. We implement the
platform simulator in C++.
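As a source-level illustration of Figure 6, the sketch below couples an FSMD to an ARM ISS through memory-mapped IP blocks. The iptype names armsystem, armsystemsource, and armsystemsink follow the Gezel distribution's SimIt-ARM coupling as we recall it; the executable name, the addresses, and the trivial datapath are placeholders.

    // Sketch: one ARM ISS plus one hardware module in a single
    // Gezel platform simulation. Names and addresses are placeholders.
    ipblock my_arm {                  // instruction-set simulator
      iptype "armsystem";
      ipparm "exec = driver";         // cross-compiled ARM program
    }
    ipblock req(out data : ns(32)) {  // memory-mapped ARM -> hardware
      iptype "armsystemsource";
      ipparm "core = my_arm";
      ipparm "address = 0x80000000";
    }
    ipblock rsp(in data : ns(32)) {   // memory-mapped hardware -> ARM
      iptype "armsystemsink";
      ipparm "core = my_arm";
      ipparm "address = 0x80000004";
    }
    dp accel(in d : ns(32); out q : ns(32)) {
      always { q = d + 1; }           // stand-in for a real coprocessor
    }
    system S {                        // cycle-true scheduler ties it together
      my_arm;
      req(bus_in);
      rsp(bus_out);
      accel(bus_in, bus_out);
    }

The driver program on the ARM side then reads and writes the two memory-mapped addresses like ordinary volatile memory locations.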
The extended FSMD network in Gezel, combined
with the component-based design model, offers essen-
tial advantages over a traditional HDL- or SystemC-
based approach. VHDL has no means to natively
support a simulation setup like the one in Figure 6,
because it lacks the equivalent of an IP block construct.
Consequently, an HDL-based design flow usually imple-
ments such a simulation setup at the HDL level. This
needlessly increases simulation detail and penalizes
simulation performance.
It’s also possible to implement such a simulation
setup in SystemC. But the platform and the application
are no longer distinguishable, because SystemC cap-
tures everything in C++. This complicates the synthesis
of the application onto the final platform. In other
words, SystemC does not distinguish between the plat-
form and platform-based design phases.
Table 1 lists several platform components that
we’ve used with Gezel to create platform simulators.
They include 8- and 32-bit ISSs, Java (through its native
interface), and SystemC. We coupled each of these
simulators to the Gezel FSMD model using IP blocks.
There are two categories of IP blocks, corresponding to two different design scenarios. IP blocks in the first category model a processor's bus or a dedicated communication port, supporting a coprocessor design scenario like the one in Figure 7a; IP blocks in the second category capture a complete component.
Designers can also use the Gezel IP block construct
to explore multiprocessor architectures, such as the
PicoBlaze microcontrollers shown in Figure 7b. In the
multiprocessor design scenario, the Gezel model cap-
tures the complete platform, clearly improving flexibil-
ity. In addition, this model allows dynamically selecting
the number and types of cores. The Gezel language cap-
tures synchronous, single-clock hardware designs. The
platform simulators in Table 1, however, can accom-
modate multiple clock frequencies to the individual
processors included within the simulation.
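Because cores are instantiated from the Gezel source itself, moving from the coprocessor scenario of Figure 7a to the multiprocessor scenario of Figure 7b largely amounts to declaring more core IP blocks. Below is a sketch that reuses the ARM iptype from the previous example rather than the PicoBlaze coupling (whose iptype string we do not reproduce here); the executable names are placeholders.

    // Sketch: the number and types of cores are plain declarations,
    // so they can be changed without touching the simulator itself.
    ipblock cpu0 {
      iptype "armsystem";
      ipparm "exec = producer";   // placeholder per-core software image
    }
    ipblock cpu1 {
      iptype "armsystem";
      ipparm "exec = consumer";   // placeholder per-core software image
    }
    // An FSMD network between cpu0 and cpu1 would then implement the
    // on-chip interconnect, as in Figure 7b.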
Many of the environments in Table 1 are open
source, which greatly eases the construction of platform
simulators. In commercial environments, open source
might still be an unattainable goal, but there are still sig-
nificant benefits from using an open interface. Several
of our cosimulators (including TSIM and SH-ISS) use
commercial, closed-source components, built on the
basis of an open interface.
Figure 6. Gezel platform simulator.
Systematic reuse with a component-based approach
We can also implement IP management with Gezel.
IP transfer is notoriously difficult because reuse inter-
faces are hard to define. Microprocessor buses have tra-
ditionally been the reuse interface of choice. New
industry efforts such as the Open Core Protocol IP (OCP-
IP, http://www.ocpip.org) and the Spirit consortium
(http://www.spiritconsortium.com) have focused on
generically packaging IP components rather than using
standard buses. Spirit’s approach is to provide a meta-
data model that encapsulates existing IP components
(expressed in VHDL or
SystemC, for example).
The metadata provides
additional language-neu-
tral information on the IP
interface. However, a
component-based design
flow with Gezel does not
need this encapsulation,
because the language
directly models the reuse
interfaces. Indeed, these
reuse interfaces corre-
spond to the set of IP
blocks that connect the
Gezel models to other plat-
form components.
Consider the case in
which multiple parties
participate in the plat-
form-based design phase.
For example, for the simu-
lator of Figure 6, assume
that an IP developer cre-
ates hardware components in Gezel, and a system inte-
grator creates the system (embedded) software. In such
a case, the IP developer expects a reasonable level of
IP protection before releasing the actual implementa-
tion, whereas the system integrator wants access to the
hardware components as detailed and as soon as pos-
sible. Gezel can support this scenario, as Figure 8 shows.
We define two phases in the IP transfer. In IP creation
and evaluation, the IP developer provides a cycle-based
simulation model of the hardware IP as a black box to
the system integrator; this model provides a nonsyn-
thesizable simulation view of the IP.

Table 1. Platform simulators using Gezel.

                                  Cross-compiler         IP block interface
Component      Simulation engine*  or assembler          Core    Port or bus
8-bit cores
  Atmel AVR    Avrora             GNU avr-gcc            •
  PicoBlaze    kpicosim           KCPSM3 assembler       •       •
  8051         Dalton ISS         SDCC, Keil CC          •       •
32-bit cores
  ARM          Simit-ARM          GNU arm-linux-gcc      •       •
  Leon2-Sparc  TSIM               GNU sparc-rtems-gcc            •
  SH3-mobile   SH-ISS             GNU sh-elf-gcc                 •
Simulation engines
  Java         JVM 1.4            javac                          •
  SystemC      SystemC 2.0.1      GNU g++                        •

* Information on simulation engines is available as follows:
Avrora: http://compilers.cs.ucla.edu/avrora (open source);
kpicosim: http://www.xs4all.nl/~marksix (open source);
Dalton ISS (Dalton 8051): http://www.cs.ucr.edu/~dalton/i8051 (open source);
Simit-ARM: http://sourceforge.net/projects/simit-arm (open source);
TSIM (TSIM 1.2; cross-compiler: sparc-rtems-gcc 2.95.2): http://www.gaisler.com;
SH-ISS (Renesas SH3DSP simulator and debugger, v3.0; cross-compiler: sh-elf-gcc 3.3): http://www.kpitgnutools.com

Figure 7. Application of different IP block categories: coprocessor (a) and multiprocessor (b) design scenarios.

When the system
integrator decides to
acquire the hardware IP,
the second phase of the IP
transfer begins. Now the
IP developer provides a
synthesizable version of
the hardware IP in VHDL.
The component-based
approach of Gezel is well-
suited for this IP design
flow. We model black
boxes as IP blocks. The IP
block simulation views are
in binary format as shared
libraries, and thus disclose little about the underlying
implementation. We wrote two code
generators for FSMD net-
works in Gezel. The first
converts FSMDs into
equivalent IP block simu-
lation views. The second converts FSMDs
into synthesizable VHDL code. The IP
developer can use them together to
implement the design flow of Figure 8.
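In Gezel terms, the black box delivered in the first phase looks to the system integrator like any other IP block: the ports of the original FSMD survive, while the body reduces to an iptype that resolves to the generated, binary simulation view. Here is a sketch, with an assumed iptype naming convention:

    // Sketch: black-box simulation view of the euclid module from the
    // earlier example. "euclid_view" is an assumed iptype naming
    // convention; the generated view ships as a binary shared library.
    ipblock euclid_bb(in m_in, n_in : ns(16);
                      out gcd : ns(16)) {
      iptype "euclid_view";   // nonsynthesizable, cycle-true view
    }

When the IP transfer completes, the integrator swaps this declaration for the synthesizable VHDL delivered by the second code generator.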
Table 2 shows several examples of IP
modules designed in Gezel. They range
from simple components, such as an
Internet packet check-sum evaluation
module (CHKSUM) to complex IP mod-
ules, such as an AES module and a high-
speed Gaussian-noise generator for
bit-error-rate measurements (BOXMUL).
For each module, Table 2 lists the line
counts of the original Gezel design and the amount of
generated code in C++ and VHDL. We also mapped the
VHDL code onto an FPGA, and Table 2 gives the area
and speed of the results. We expect the numbers shown
to be close to those of manually written VHDL. For
example, a comparable AES design by Usselman on
Xilinx Spartan3 technology lists a LUT count of 3,497.
Design examples revisited

Now, we briefly discuss how we used our compo-
nent-based approach to support the three design exam-
ples presented earlier.
Public-key cryptography

The platform simulator for the HECC application
consisted of two components: the Gezel kernel and the
8051 ISS (http://www.cs.ucr.edu/~dalton/i8051/). Using
IP block models, we designed communication links
between the 8051 ISS and the coprocessor. We devel-
oped the driver software running on the 8051 using the
Keil tool suite. The platform simulator maps the HECC
mathematical formulas into a combination of C, assem-
bly language, and hardware. After obtaining a suitable
partitioning, we converted the hardware coprocessor
into VHDL. We then combined this coprocessor with a
synthesizable view of the 8051 processor and mapped
it into an FPGA.
Security partitioning for an embedded fingerprint authentication design
This platform contains the Leon2 ISS and the Gezel
kernel.

Figure 8. IP reuse in the platform-based design phase.

Table 2. IP model complexity. (NCLOC: noncommented source line of code)

          Model line count (NCLOC)
Design    Gezel   C++ (IP blocks)   VHDL    Area (no. of LUTs)*   Speed (ns)**
CHKSUM    149     1,564             907     131                   9.19
EUCLID    69      710               62      557                   560.00
JPEG      526     8,091             719     5,514                 14.62
AES       292     2,653             1,807   3,332                 8.29
BOXMUL    763     6,105             6,282   4,225                 20.30

* Target platform was Xilinx Virtex4, speed grade 12.
** Speed is the clock period we recorded after place and route.

We constructed it in a process similar to that of
constructing the public-key cryptography platform. We
developed software using the GNU tool suite. In a later
design phase, we used the VHDL code generator to con-
vert the Gezel design into VHDL, eventually leading to
a tested and fully functional chip.4 This design, howev-
er, requires fitting the hardware coprocessor onto a non-
standard synthesis design flow based on logic for
resisting side-channel attacks. So that chip designers
could verify their custom synthesis flows, we extended
the platform simulator to record trace stimuli for indi-
vidual hardware modules. We can also provide this
capability using the IP block approach. It is important
to separate design flow issues, such as the stimuli
recording facility, from actual design issues. The design
flow in Figure 4 also supports this concept by distin-
guishing between the platform builder and the platform
user. Gezel lets users write new IP blocks in C++ accord-
ing to a standard template, and more advanced Gezel
users can develop them as library plug-ins.
Acceleration of embedded virtual machines

For the third design, we integrated three components:
a port of the Java-embedded virtual machine, the SH3-
DSP ISS, and the Gezel kernel. We developed software in
Java, C, and assembly language. In addition, this design
required a considerable number of cryptographic sup-
port libraries. This kind of design demonstrates the impor-
tance of varying the design abstraction level within a
single platform. The entire cryptographic application in
Java can take millions of clock cycles, and the hardware
coprocessor is active for a fraction of the time. On the one
hand, we need increased simulation efficiency (and
decreased simulation detail) for much of the design, but
on the other hand, at a few select places we must observe
every bit that toggles in every gate. A component-based
design approach can cope with this heterogeneity.
HETEROGENEOUS SYSTEM architectures will continue
to dominate in applications that require dedicated,
high-performance, and energy-efficient processing. The
challenge at the electronic system level will be to design
these architectures in increasingly shorter design cycles.
New tools will have to quickly create not only deriva-
tive platforms but also entirely new platforms. We are
exploring novel mechanisms in Gezel to further accel-
erate platform construction, and we are presently work-
ing on such a platform designer for FPGA technology.
We'd also like to stress that ESL design requires not only new tools but also a change in design culture. Designers of heterogeneous architectures will inevitably encounter new design cultures and practices, introduced not only by novel ESL tools but also by their colleague designers. ■
Acknowledgments
We thank the reviewers for their constructive feed-
back. We also thank the many students who have
experimented with Gezel and whose designs we’ve
mentioned in this article. This research has been made
possible with the support of STMicroelectronics,
Atmel, the National Science Foundation, University of
California Microelectronics and Computer Research
Opportunities (UC Micro), SRC, and FWO (Fonds voor
Wetenschappelijk Onderzoek).
References
1. C. Rowen and S. Leibson, Engineering the Complex SoC: Flexible Design with Configurable Processors, Prentice Hall, 2004.
2. T.J. Todman et al., "Reconfigurable Computing: Architectures and Design Methods," Proc. IEE, vol. 152, no. 2, Mar. 2005, pp. 193-207.
3. D. Talla et al., "Anatomy of a Portable Digital Mediaprocessor," IEEE Micro, vol. 24, no. 2, Mar.-Apr. 2004, pp. 32-39.
4. K. Tiri et al., "A Side-Channel Leakage Free Coprocessor IC in 0.18um CMOS for Embedded AES-Based Cryptographic and Biometric Processing," Proc. 42nd Design Automation Conf. (DAC 05), ACM Press, 2005, pp. 222-227.
5. Y. Matsuoka et al., "Java Cryptography on KVM and Its Performance and Security Optimization Using HW/SW Co-design Techniques," Proc. Int'l Conf. Compilers, Architecture, and Synthesis for Embedded Systems (CASES 04), ACM Press, 2004, pp. 303-311.
6. T. Claassen, "System on a Chip: Changing IC Design Today and in the Future," IEEE Micro, vol. 23, no. 3, May-June 2003, pp. 20-26.
7. A. Sangiovanni-Vincentelli, "Defining Platform-Based Design," EE Times, Feb. 2002, http://www.eetimes.com/news/design/showArticle.jhtml?articleID=16504380.
8. W.O. Cesario et al., "Multiprocessor SoC Platforms: A Component-Based Design Approach," IEEE Design & Test, vol. 19, no. 6, Nov.-Dec. 2002, pp. 52-63.
9. D. Gajski et al., SpecC: Specification Language and Methodology, Kluwer Academic Publishers, 2000.
10. E. Lee, "Overview of the Ptolemy Project," tech. memo UCB/ERL M03/25, Dept. of Electrical Eng. and Computer Science, Univ. of California, Berkeley, 2003.
Patrick Schaumont is an assistant professor in the Electrical and Computer Engineering Department at Virginia Tech. His research interests include design methods and architectures for embedded systems, with an emphasis on demonstrating new methodologies in practical applications. Schaumont has an MS in computer science from Ghent University, Belgium, and a PhD in electrical engineering from the University of California, Los Angeles. He is a senior member of the IEEE.

Ingrid Verbauwhede is an associate professor at the University of California, Los Angeles, and an associate professor at Katholieke Universiteit Leuven, in Belgium. Her research interests include circuits, processor architectures, and design methodologies for real-time, embedded systems in application domains such as security, cryptography, DSP, and wireless. Verbauwhede has an electrical engineering degree and a PhD in applied sciences, both from Katholieke Universiteit Leuven. She is a senior member of the IEEE.
Direct questions or comments about this article to Patrick Schaumont, 302 Whittemore Hall (0111), Virginia Tech, VA 24061; [email protected].
Modeling Embedded Systems: From SystemC and Esterel to DFCharts

Ivan Radojevic, Zoran Salcic, and Partha S. Roop
University of Auckland

Editor's note:
This article addresses the need for directly expressing heterogeneous, hierarchical behaviors for modeling specific embedded systems. After analyzing two existing ESL languages, SystemC and Esterel, the authors created a new model of computation and a graphical language to gain the direct expressivity they need for their model. Although researchers have suggested various changes to SystemC and Esterel to fit modeling requirements, this article considers mainly standard SystemC and Esterel.
—Sandeep K. Shukla, Virginia Polytechnic Institute and State University
THE DESIGN PRODUCTIVITY of engineers has not
kept pace with rapid improvements in silicon technolo-
gy. This has resulted in what is commonly known as the
productivity gap. To close this gap, researchers have intro-
duced various system-level design languages (SLDLs) to
raise the design abstraction level by focusing on a sys-
tem’s behavior rather than low-level implementation
details. A major challenge that SLDLs face stems from the
behavioral heterogeneity of most embedded systems. For
example, one part of an embedded system might perform
intensive computations on samples that regularly arrive
from an analog-to-digital converter. Another part of the
same system might perform only minor computations
while being ready to quickly respond to events that arrive
asynchronously from the environment.
An embedded system’s behavior usually involves a
set of concurrent, communicating processes. A model
of computation (MoC) defines the rules for communi-
cation and synchronization between processes.
Different MoCs are suitable for different behaviors. For
example, hierarchical concurrent finite-state machines
(HCFSMs), which the statecharts family uses,1 are suit-
able for describing control-dominated behavior, where-
as dataflow models are good for data-dominated
behavior. SLDLs must support multiple
MoCs to successfully cope with embed-
ded systems’ behavioral heterogeneity.
Using a case study of a practical, het-
erogeneous embedded system called fre-
quency relay, we evaluate the modeling
capabilities of two popular system-level
languages, SystemC and Esterel.2,3 Based
on this case study, we establish an
expanded set of system-level language
requirements, against which we evaluate
the strengths and weaknesses of these two languages.
Because of these languages’ limitations, we suggest a
new MoC for heterogeneous systems called DFCharts,
which SystemC and Esterel should follow to support bet-
ter modeling of heterogeneous embedded systems.
(The “Related work” sidebar discusses other efforts to
compare languages for embedded-systems design.)

Related work
There have been a few other attempts to describe and compare languages for embedded-systems design. Edwards reviews hardware description languages, programming languages, and system-level languages.1 Cai et al. compare the specification languages SpecC and SystemC.2 Gorla et al. compare several languages for system specification.3 They also use the case study of a practical heterogeneous embedded system to illustrate relevant concepts. Brisolara et al. use the same case study to compare two variants of the Unified Modeling Language with Simulink.4

The key difference between our work and these is that we closely concentrate on the link between the specification languages and the models of computation (MoCs) suitable for heterogeneous systems. Moreover, we introduce a new MoC, DFCharts, to model heterogeneous systems.

Other models that target heterogeneous embedded systems include Reactive Process Networks,5 FunState,6 Composite Signal Flow,7 and Mode Automata.8

References
1. S. Edwards, Languages for Digital Embedded Systems, Kluwer Academic Publishers, 2000.
2. L. Cai, S. Verma, and D.D. Gajski, Comparison of SpecC and SystemC Languages for System Design, tech. report CECS-03-11, Center for Embedded Computer Systems, Univ. of California, Irvine, 2003.
3. G. Gorla et al., "System Specification Experiments on a Common Benchmark," IEEE Design & Test, vol. 17, no. 3, July-Sept. 2000, pp. 22-32.
4. L. Brisolara et al., "Comparing High-Level Modeling Approaches for Embedded System Design," Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC 05), ACM Press, 2005, pp. 986-989.
5. M. Geilen and T. Basten, "Reactive Process Networks," Proc. 4th ACM Int'l Conf. Embedded Software (EMSOFT 04), ACM Press, 2004, pp. 137-146.
6. K. Strehl et al., "FunState—An Internal Design Representation for Codesign," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 9, no. 4, Aug. 2001, pp. 524-544.
7. A. Jantsch and P. Bjureus, "Composite Signal Flow: A Computational Model Combining Events, Sampled Streams, and Vectors," Proc. Design, Automation and Test in Europe Conf. (DATE 00), IEEE CS Press, 2000, pp. 154-160.
8. F. Maraninchi and Y. Remond, "Mode-Automata: A New Domain-Specific Construct for the Development of Safe Critical Systems," Science of Computer Programming, vol. 46, no. 3, Mar. 2003, pp. 219-254.
DFCharts targets heterogeneous embedded systems
by combining a data-dominated MoC called synchro-
nous dataflow (SDF) with a control-dominated MoC
called Argos (which, like statecharts, is based on
HCFSMs).4,5 In terms of MoCs that are combined,
DFCharts is similar to *charts,6 which also uses HCFSMs
and SDF. However, *charts allows only hierarchical
refinement of one model by another. At each hierar-
chical level, blocks must obey the semantics of a single
MoC, but internally a designer can refine each block
into a system that behaves according to some other
model. The major problem with this approach has to do
with the communication between hierarchical levels,
which can lead to the loss of some of a given MoC’s orig-
inal characteristics. Unlike *charts, DFCharts lets SDFs
and FSMs coexist at the same hierarchical level, and a
rendezvous mechanism of communicating sequential
processes (CSPs) enables communication between
them.7 In this way, each model retains its characteris-
tics, and there is more flexibility in modeling.
Initial system-level requirements for SystemC and Esterel
SystemC, which an industry consortium is proposing, has no formal semantics, whereas Esterel has formal semantics and formal verification capabilities. The two languages thus represent differing perspectives on system-level modeling. Some of the key modeling requirements at the system level are as follows:
■ Separation of communication and computation. This
makes the model suitable for reuse in an environ-
ment involving several independently developed,
concurrent components.
■ Concurrency and communication primitives at a high
abstraction level. The purpose of the system-level
design is to create a model involving several com-
ponents, each having its own MoC. Therefore, the
modeling language must combine several MoCs and
facilitate communication among them.
■ Functional hierarchy. The modeling language might
need to express a particular functionality hierarchi-
cally to enable succinct specification. Hierarchy
should allow mixing different MoCs that exist at dif-
ferent hierarchical levels. This requirement is also
called hierarchical heterogeneity.8
■ Exception handling. Because exceptions are critical
to embedded systems, the language must provide
direct support to capture and handle exceptions.
In light of the frequency relay case study, we will
expand these requirements.
Case study: Frequency relay
Power systems need protection from overloading.
When a power system is overloaded, it’s necessary to dis-
connect some loads to prevent damage. A significant
decrease in the main AC signal’s frequency level (the nor-
mal value is 50 Hz) indicates a dangerously overloaded
system. The same problem also occurs when the AC sig-
nal’s rate of change (ROC) is too fast. The frequency relay
is a system that measures the frequency and its ROC in a
power network, comparing measurement results against
a set of thresholds that a control system can modify via
the Internet. If the current thresholds indicate that the fre-
quency is too low or that its ROC is too fast, the frequen-
cy relay disconnects some loads from the network by
opening one or more switches (three in the case we pre-
sent here), as determined by a decision algorithm. The
system gradually reconnects loads if the frequency and
its ROC improve.
Figure 1 illustrates the main operation that we just described, mode1. Data-dominated processes perform a DSP operation similar to autocorrelation; this operation is necessary for frequency calculation. Control-dominated processes perform various decision-making and minor computations. The parameter-settings process monitors the interface with the Internet. The frequency calculation and ROC calculation processes determine the frequency and its ROC.

Figure 1. Main operation of frequency relay mode1. (Clear boxes indicate data-dominated processes. Shaded boxes indicate control-dominated processes.)
Figure 2 shows the switch-control process, repre-
senting it as an FSM with four states. The initial state is
S3. Each state determines how many switches are
closed. For example, three switches are closed in S3,
whereas all three switches are open in S0. The state tran-
sitions come from inputs t1, t2, and t3, which indicate
whether certain thresholds have been exceeded. The
input from timer to (time-out) is also a factor. The
switch-control block can restart the timer by emitting
output st.

Figure 2. Switch-control process.
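To make the transition structure concrete, the following minimal C++ sketch encodes the switch-control process as a step function. The encodings and the particular transitions shown are illustrative; the complete transition relation is the one in Figure 2.

// Minimal sketch of the switch-control FSM as a step function. The signal
// encodings and the transitions shown are illustrative; the complete
// transition relation is the one given in Figure 2.
enum State { S0, S1, S2, S3 };          // state index = number of closed switches

struct Inputs { bool t1, t2, t3, to; }; // threshold flags and timer time-out

// Returns the next state; *st is set when the timer must be restarted.
State step(State s, Inputs in, bool* st) {
    *st = false;
    switch (s) {
    case S3:
        if (in.t1) { *st = true; return S2; } // threshold exceeded: open a switch
        break;
    case S2:
        if (in.t2) { *st = true; return S1; } // further degradation
        if (in.to) { *st = true; return S3; } // time-out: reconnect a load
        break;
    // ... the remaining cases follow Figure 2 ...
    default:
        break;
    }
    return s;
}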
Figure 3 shows the frequency relay’s global states.
The initial state, initialize, configures some system parameters. When initialization completes (event init_done), the next state is mode1, in which the main operation occurs (as described by the processes in Figure 1). If reset occurs,
the system reinitializes. When off occurs, mode1 termi-
nates and mode2 begins. Nothing happens in this state;
the system simply stops, and all switches close. If on
occurs, the system enters mode1 again. The FSM in
Figure 3 represents the frequency relay’s top level. (The
processes in Figure 1 are one level below this.)

Figure 3. Global states of the frequency relay.
The arrows between the processes in Figure 1 denote
directions of communication, but so far we have not dis-
cussed the communication semantics. Before writing
the specification in SystemC and Esterel, we need to
state the required communication mechanisms.
Furthermore, we need to state how the computations
inside the processes will occur. By identifying the
required models for computation and communication,
we can make a complete list of requirements against
which we will evaluate SystemC and Esterel.
The three data-dominated blocks perform intensive
computations on samples that regularly arrive from the
power system. Lee and Messerschmitt have successful-
ly applied SDF for this type of behavior.4 In SDF, process-
es communicate through asynchronous FIFO buffers.
Each process can fire when its firing rule is satisfied; the firing rule determines how many tokens must be present in the input buffers.
grammers can write in C, for instance) describe the
algorithms inside the processes. FSMs, such as the one
for the switch-control process in Figure 2, can effectively
capture the control-dominated processes’ behavior.
FSMs can also be hierarchical. The most convenient
communication model among concurrent FSMs
appears to be synchronous reactive (SR).9 (In fact, most
variants of statecharts use SR.1) Thus, we need hierar-
chical, concurrent FSMs with SR communication. In
addition, we need imperative statements for minor com-
putations performed on state transitions. Finally, we can
use CSP-like rendezvous for communication between
peak detection and frequency calculation processes. This high-level communication mechanism guarantees lossless transmission of data without buffers.
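To make the SDF firing rule concrete, the following minimal C++ sketch (all names are ours, chosen for illustration) lets an actor fire only when its input buffer holds enough tokens:

// Minimal illustration of an SDF firing rule (all names invented): an actor
// may fire only when its input queue holds at least tokens_in tokens, and
// each firing consumes and produces a fixed number of tokens.
#include <cstddef>
#include <deque>

struct SdfActor {
    std::size_t tokens_in;   // tokens consumed per firing (the firing rule)
    std::size_t tokens_out;  // tokens produced per firing

    bool can_fire(const std::deque<int>& in) const {
        return in.size() >= tokens_in;
    }
    void fire(std::deque<int>& in, std::deque<int>& out) {
        int sum = 0;
        for (std::size_t i = 0; i < tokens_in; ++i) { sum += in.front(); in.pop_front(); }
        for (std::size_t i = 0; i < tokens_out; ++i) out.push_back(sum); // example computation
    }
};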
The models listed thus far (HCFSM, SR, SDF, CSP,
and imperative statements) cover the majority of mod-
els that Edwards et al. discuss.9 An important model that
we haven’t discussed is discrete event. Although high-
ly expressive, discrete-event models are very difficult to
synthesize.9
Suitability of SystemC and Esterel for modeling heterogeneous embedded systems
Based on the frequency relay case study, we expand
the system-level language requirements given earlier
into the following six requirements:
■ concurrent processes, an essential requirement and
a precondition for all other points that follow;
■ rendezvous communication;
■ support for dataflow, including buffered communi-
cation between processes and specification of firing
rules for dataflow modules;
■ support for HCFSM models with synchronous com-
munication;
■ imperative statements to describe data transforma-
tions inside SDF actors, as well as smaller computa-
tions performed by FSMs; and
■ hierarchy and preemption, multiple processes inside
a hierarchical state, and instant termination of lower-
level processes when any transition leaves the hier-
archical state.
The first five requirements relate to the first two
requirements given earlier (separation of communica-
tion and computation, and concurrency and commu-
nication primitives at a high abstraction level). The last
requirement, hierarchy and preemption, relates to the
last two requirements (functional hierarchy and excep-
tion handling) from the earlier list.
Evaluation of SystemC and Esterel based on these requirements
Now, we evaluate the level of support of SystemC and
Esterel for each of the six expanded system-level require-
ments. Table 1 summarizes the results of this evaluation.

Table 1. Level of support provided by SystemC and Esterel, on a scale of 0 to 3, with 3 being the highest level of support.

Requirement                   SystemC   Esterel
Concurrent processes          3         3
Rendezvous communication      2         2
Support for dataflow          2         0
Support for HCFSMs            2         3
Data transformations          3         3
Hierarchy and preemption      0         3
Concurrent processes. SystemC relies on implicitly
assumed concurrency; processes defined in a single
module are concurrent. When multiple modules con-
nect at any hierarchical level, they always execute con-
currently. In fact, specifying the execution order of
modules, as in sequential or pipelined execution (avail-
able in some other languages), is not possible in
SystemC. The designer would have to use control sig-
nals to manipulate the execution order of modules.
Esterel lets programmers explicitly create concur-
rency using parallel operator || at any hierarchical level.
The || operator creates concurrent threads that commu-
nicate and synchronize using the synchronous broad-
cast. This approach is based on the SR MoC, which
assumes there is a global clock. Esterel generates inputs and corresponding outputs in the same tick of the global clock, leading to a logical zero-delay model. Also,
Esterel broadcasts events generated in any thread to all
other threads. Clever programming would be necessary
for any other form of concurrency, however.
Rendezvous communication. SystemC has no higher-
level construct to implement rendezvous directly.
However, creating rendezvous between two processes
using wait and notify statements should not be difficult.
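A minimal sketch of such a construction appears below. It assumes a single writer and a single reader, and the channel class itself is our illustration, not part of the SystemC library.

// Sketch of a one-place rendezvous channel built from SystemC events and
// wait/notify, assuming one writer thread and one reader thread. The channel
// class is illustrative; it is not part of the SystemC library.
#include <systemc.h>

template <typename T>
class rendezvous {
    T data;
    bool ready;
    sc_event put_ev, get_ev;
public:
    rendezvous() : ready(false) {}
    void put(const T& v) {       // blocks until a reader has taken the value
        data = v; ready = true;
        put_ev.notify();
        wait(get_ev);
    }
    T get() {                    // blocks until a writer has offered a value
        if (!ready) wait(put_ev);
        ready = false;
        get_ev.notify();
        return data;
    }
};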
Esterel does not allow direct specification of ren-
dezvous. Instead, programmers must create rendezvous
using a combination of appropriately employed await
and emit statements.
Support for dataflow. In SystemC, primitive channel
sc_fifo can implement FIFO buffers. Because of constant
data rates, it’s best to implement data-dominated blocks
as method processes. There’s no need to use less efficient
thread processes. However, only thread processes that
are dynamically scheduled by the SystemC kernel can
use sc_fifo buffers. Hence, implementing static schedul-
ing with the firing rules of the SDF model is difficult.
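For example, a thread process reading from an sc_fifo suspends automatically while the FIFO is empty, which is exactly the blocking behavior dataflow communication needs (a minimal sketch):

// Minimal sketch of dataflow-style communication over sc_fifo: the blocking
// read() suspends the thread process until a token is available.
#include <systemc.h>

SC_MODULE(Consumer) {
    sc_fifo_in<int> in;   // blocking FIFO input port
    int sum;

    void body() {
        for (;;)
            sum += in.read();   // blocks while the FIFO is empty
    }
    SC_CTOR(Consumer) : sum(0) { SC_THREAD(body); }
};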
Esterel allows the implementation of a FIFO buffer as
a separate process (C function), thus separating compu-
tation and communication. However, the FIFO process
would still synchronize with the tick signal. Thus, the
abstraction level would be lower than in asynchronous
SDF buffers. In the frequency relay, the SDF blocks per-
forming signal processing must be reactive, like all other
processes in the system. The event to which they react is a
sample from the analog-to-digital converter. The problem
is that all processes must align with a single tick signal—
that is, they must read inputs and produce outputs at the
same time instant. The most efficient solution for the SDF
processes is to have the tick signal coincide with the AC
input signal’s sampling frequency. The ticks must be fre-
quent enough to capture all system inputs. Thus, the
process with the fastest system inputs determines the tick
signal rate. The result is an implementation that is likely to be inefficient, because the data-dominated blocks work faster than they would otherwise need to work. A more efficient implementation would specify data-dominated blocks as asynchronous tasks, taking more than one tick to complete computations. However, using asynchronous tasks leads to integration problems.
Support for HCFSMs. SystemC lets you describe FSMs using switch-case constructs, which can be nested for hierarchical FSMs. This involves using multiple state variables. Signal sensitivities and the wait statement support reactivity. However, SystemC cannot match powerful preemption statements such as abort and trap in the SR-based Esterel language.
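The nesting pattern looks roughly as follows. Only the outer states follow Figure 3; the inner state variable and its transitions are invented for illustration.

// Sketch of a hierarchical FSM coded as nested switch-case statements, with
// one state variable per hierarchical level. The outer states mirror the
// frequency relay's global FSM; the inner FSM is invented for illustration.
enum Top { INITIALIZE, MODE1, MODE2 };
enum Sub { RUN, HOLD };

void fsm_step(Top& top, Sub& sub, bool reset, bool off, bool on) {
    if (reset) { top = INITIALIZE; sub = RUN; return; }
    switch (top) {
    case INITIALIZE:
        top = MODE1;                // init_done
        break;
    case MODE1:
        if (off) { top = MODE2; break; }
        switch (sub) {              // inner FSM, active only inside MODE1
        case RUN:  sub = HOLD; break;
        case HOLD: sub = RUN;  break;
        }
        break;
    case MODE2:
        if (on) top = MODE1;        // nothing else happens in this state
        break;
    }
}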
Esterel obviously completely supports SR communi-
cation. Statements such as abort and trap can naturally
describe preemption. Although Esterel’s imperative state-
ments can easily describe an FSM, using visual syntax is
probably more convenient in most cases. This is where
SyncCharts (http://www.esterel-technologies.com) com-
plements Esterel.
Data transformations. SystemC, as an imperative lan-
guage, provides excellent support for describing
sequential algorithms. In Esterel, C is available as a host
language; hence, Esterel can specify complex algo-
rithms for data transformations inside transformational
blocks similar to the way SystemC does. However,
Esterel requires you to assume that computation of time-
consuming algorithms is instantaneous.
Hierarchy and preemption. In SystemC, there is no
direct way to implement exceptions modeled by exits
from higher-level hierarchical states. We indicated earli-
er that hierarchy in an FSM could be modeled by using
nested switch-case statements; however, this type of mod-
eling is not applicable here, because it’s not possible to
instantiate processes inside a case branch. Because pre-
empting processes is not possible, one or more control
signals must control each process. Consequently, the
global-state FSM in Figure 3 must be at the same hierar-
chical level as the processes in Figure 1 (see Figure 4).
Figure 4. Modified frequency relay model for SystemC implementation.
Esterel supports behavioral hierarchy and has sev-
eral statements that enable preemption. For example,
concurrent blocks can run inside the body of an abort
statement.
Additional analysis of SystemC and Esterel
Tables 2 and 3 give the SystemC and Esterel specifi-
cations of the frequency relay. The Esterel specification
is a mixture of Esterel files with an .strl extension, and
C files with a .c extension. Neither specification com-
pletely follows the model in Figure 1.
The total code size for the SystemC specification,
excluding the testbench file, was 1,102 lines. The total
code size for the Esterel specification was 901 lines. This
difference is not significant, considering that the
SystemC specification has more files and thus more dec-
larations. Each SystemC file contains one process. The
first three files in Table 2 contain thread processes; all
others contain method processes.

Table 2. SystemC files for frequency relay specification (effective lines of source code).

SystemC files                 Code size
averaging_filter.cpp          85
symmetry_function.cpp         95
peak_detection.cpp            66
frequency_calculation.cpp     93
roc_calculation.cpp           100
parameter_settings.cpp        239
switch_control.cpp            135
timer.cpp                     38
frequency_relay.cpp           251
testbench.cpp                 412

Table 3. Esterel files for frequency relay specification (effective lines of source code).

Esterel files                 Code size
dataflow.strl                 76
averaging_filter.c            34
symmetry_function.c           41
measurement.strl              77
freq_average.c                31
roc_average.c                 43
parameter_settings.strl       251
switch_control.strl           139
frequency_relay.strl          209
Although the time required to prepare a simulation is
important, a more critical factor is the actual simulation
time. The Esterel simulation took close to 4 hours,
whereas the SystemC simulation took only 5 minutes.
We performed both simulations on the same platform.
For SystemC, we used Microsoft Visual C++ version 6
with SystemC class library 2.0.1. For Esterel, we used
Esterel Studio version 4, which supports Esterel version
5. The recently released Esterel version 7 allows multiclock designs that are globally asynchronous, locally synchronous (GALS).
Several factors might account for the huge difference
in actual simulation times, but the most interesting one
concerns modeling in Esterel. The entire system must
run on one clock because Esterel doesn’t support mul-
tiple clocks. The process with the fastest changing
inputs—the parameter-setting block—determines the
system speed. This speed is unnecessarily high for data-
dominated parts, which need to read inputs only when
a sample arrives. Consequently, there are many ticks
with absent inputs in this part of the system.
Although simulation is the most widely used valida-
tion method, it is not the only one. The other method is
formal verification, which Esterel specifications (unlike
SystemC) may employ. However, formal verification is
not particularly helpful for the frequency relay, because
any useful properties that could be verified would relate
to data-dependent internal activities rather than inputs
and outputs. It would be difficult to define such prop-
erties using Esterel observers, which check properties
only in the control part.
DFCharts
Because of the limitations of SystemC and Esterel,
we introduced DFCharts as a model they should sup-
port to capture heterogeneous embedded systems. (We
explain the detailed semantics of DFCharts else-
where.10) DFCharts combines two well-known models,
SDF and Argos,4,5 in a novel way. SDF is suitable for data-
dominated systems. Argos is suitable for control-domi-
nated systems.
SDF belongs to the family of dataflow models. In
SDF, each process operates on streams of tokens in fir-
ings. A process’ firing rule specifies how many tokens
each firing consumes and produces. In SDF, unlike
dynamic dataflow models, those numbers must be con-
stant, which limits buffer size and makes it possible to
construct efficient static schedules. Because of static
scheduling, the iteration of an SDF graph is clearly iden-
tifiable: It is a series of process firings that return the
buffers to their original state. In Figure 5a, some possi-
ble schedules that create a single iteration are BCA,
CBA, or C and B running concurrently before A. The
numbers next to the processes describe their firing
rules. SDF is suitable for a wide range of signal-process-
ing systems with constant data rates.

Figure 5. Example specifications of the two models used in DFCharts: SDF (a) and Argos (b).
Argos models consist of parallel and hierarchical
compositions of FSMs. Argos execution is based on the
synchrony hypothesis, which states that all computations
and communications in the system are instantaneous.
As a result, there is no delay between inputs and outputs;
they are synchronous. Model execution involves a series
of instants (called ticks) of a global clock. In each tick,
Argos reads inputs and instantaneously produces out-
puts. Because all components react simultaneously,
there is no need for scheduling. The three main opera-
tors that Argos uses to construct the HCFSM model are
refinement for hierarchy, synchronous parallel for con-
currency, and hiding for synchronization. Figure 5b
shows a simple Argos specification, which refines state S1
into two concurrent FSMs that synchronize using event c.
When S1 is active and event b occurs, FSM2 makes the
transition and emits c, causing the transition in FSM3 in
the same instant, which in turn emits d. In the instant
when the refined FSM leaves the hierarchical state, the
refining FSMs can react. Thus, d is emitted even if signal
a is present. (This corresponds to the notion of weak pre-
emption, called weak abort in Esterel).
Like Argos, DFCharts has synchronous parallel,
refinement, and hiding operators. However, it also has
the additional asynchronous parallel operator, which it
uses to connect an SDF graph with one or more FSMs.
This operator is asynchronous because the SDF graph
operates independently of FSMs. The SDF graph syn-
chronizes with FSMs only between two iterations: when
it’s receiving inputs for the next iteration and sending
outputs produced during the previous iteration. SDF
graphs can be at any level in the hierarchy of FSMs.
All FSMs in a DFCharts specification use the same set
of ticks (clock). When a tick occurs, every FSM makes a
transition. However, SDF graphs operate at their own
speed. This produces a system with multiple clock
domains, a different domain for each SDF graph and a
single clock domain for all FSMs. This type of mixed syn-
chronous and asynchronous specification supports effi-
cient implementation. Moreover, because DFCharts
allows FSMs and an SDF graph at the same hierarchical
level, each retains its own characteristics.
The example in Figure 6 illustrates the features of
DFCharts. At the top level, state S2 is refined into two par-
allel FSMs that synchronize by local event e. S1 is also
refined into two FSMs, connected by the synchronous
parallel operator; in addition, the asynchronous parallel
operator connects these two FSMs with SDF graph SDF1.
The communication between the SDF graph and the
FSMs passes through channels ch1 and ch2. The arrows
indicate the direction of data exchange. For the SDF
graph, ch1 is an output channel, and ch2 is an input chan-
nel. The communication through each channel occurs
when both the SDF graph and the relevant FSM are ready
for it. (The SDF graph and the FSM meet using CSP-style
rendezvous operations.) If the sender attempts to send
when the receiver is not ready, the sender will block
itself. Similarly, if the receiver attempts to read while the
sender is not ready, the receiver will block itself.

Figure 6. Example of DFCharts model.
FSMs communicate with SDF graphs from rendezvous
states, which cannot be refined. A rendezvous state is one
that has an outgoing transition triggered by a rendezvous
action. In Figure 6, the rendezvous states are S7 and S9.
When FSM4 is in S7, it is ready to receive data from SDF1
through ch1, as evident from transition ch1?x. We use CSP
notation,7 where “?” denotes a read action, and “!” denotes
a write action. When SDF1 is ready to send data, the com-
munication occurs, triggering transition ch1?x. The data
received from SDF1 is stored in variable x, event h is emit-
ted, and state S8 begins. S8 can also follow S7 when event m
is present, preempting rendezvous on ch1. On the other
hand, FSM5 remains blocked in S9 until SDF1 is ready to
receive data through ch2 from variable y. Figure 7 shows
how DFCharts represents the frequency relay.

Figure 7. Frequency relay in DFCharts.
Property verification in a DFCharts model is similar
to that in Argos. In the latter, combining FSMs removes
hierarchy and concurrency. The result is a single, flat
FSM, whose behavior is equivalent to the original
model. In DFCharts, it is also necessary to integrate SDF
graphs. DFCharts accomplishes this by representing the
operation of each SDF graph as an equivalent HCFSM.
In general, the top-level FSM representing an SDF graph
has two states: io (I/O) and iterate. Figure 8 gives a sim-
ple example of an SDF graph with one input channel
and one output channel.

Figure 8. FSM representing the operation of a two-channel SDF graph.
The io state is refined by as many concurrent FSMs as
there are inputs and outputs. The input FSM, which con-
sists of two states, receives data through channel cin, and
stores it into variable din. The output FSM sends data from
variable dout through channel cout, as the transition from
so2 to so3 indicates. If no iteration has occurred yet, which
the presence of init indicates, there is nothing to send,
and the output FSM enters so3 immediately after so1.
Otherwise, init is absent (denoted by an overbar over init in Figure 8), and
so2 is entered from so1. When the input and output FSMs
enter si2 and so3, respectively, ioc becomes present (io
complete), and the top-level FSM enters iterate, thus
completing a single iteration of the SDF graph. An FSM
representing a particular schedule can further refine this
state. However, this refinement isn’t necessary for the
global analysis.
Comparison between DFCharts and other models
Besides DFCharts, the only other model that com-
bines FSMs and SDFs is *charts,6 which is a part of
Ptolemy.8 The Ptolemy environment hierarchically com-
bines several MoCs. At each hierarchical level, blocks
must obey a single MoC’s semantics, but a designer can
internally refine each block into a system that behaves
according to some other model. The closest subset of
Ptolemy to DFCharts is *charts, which focuses on mix-
ing FSMs with other models. With hierarchical hetero-
geneity, it might be difficult in *charts to devise a
meaningful communication mechanism between outer
and inner models. The inner model might lose some
properties while adjusting to the outer model. For exam-
ple, if a network of SR blocks refines an SDF block, the
refining blocks receive their inputs through blocking
reads, so they are not really reactive. Conversely, if an
SDF network refines an SR block, the SDF network must
conform to the synchrony hypothesis. This means
*charts will assume its iteration is instantaneous and
will synchronize it to all SR blocks in the upper hierar-
chical level. Such assumptions are likely to produce
inefficient implementations. With the parallel hetero-
geneity used in DFCharts, FSMs are free to react to exter-
nal events, and SDF graphs can run at their own speed.
The Communicating Reactive State Machines
(CRSM) language also extends Argos with an asyn-
chronous parallel operator, which uses rendezvous
channels to connect parallel FSMs.11 Thus, DFCharts has
more in common with CRSM than Argos. However, the
purpose of the asynchronous parallel operator in CRSM
is to connect parts in a distributed system, whereas in
DFCharts this operator serves to connect physically
close control-dominated and data-dominated parts.
Another important difference is that in CRSM the asyn-
chronous parallel operator can function only at the top
level (in a GALS manner), whereas in DFCharts it can
function at any hierarchical level.
Feature extensions of SystemC and Esterel
According to our analysis, SystemC provides only partial support, or none at all, for the expanded system-level requirements of rendezvous communication, dataflow, HCFSMs, and hierarchy and preemption. A
designer can construct a rendezvous channel using wait
and notify statements to create the necessary request
and acknowledge lines for the rendezvous protocol, but
this could take some effort. Ideally, SystemC would provide a standard rendezvous channel in the library of channels that includes sc_fifo, sc_signal, and so on.
Asynchronous thread processes that communicate
through FIFO channels using blocking reads provide a
good foundation for dataflow models.
However, it’s also still difficult in SystemC to specify
firing rules and construct static-scheduling orders, so
improvements are necessary in this area as well.
Synchronous processes can be created in SystemC, and
this is essential for HCFSM support. It’s also possible to
model reactivity using signal sensitivities and wait and
notify statements. But the absence of preemption is a
serious disadvantage when modeling control-dominat-
ed behavior. Processes cannot be instantaneously ter-
minated or interrupted, which is necessary for the
hierarchy and preemption requirement. Overcoming
this fundamental limitation would require making deep
changes in SystemC’s simulation semantics.
SystemC-H is an extension of SystemC that incorpo-
rates some of these desired changes.12 SystemC-H has
an extended SystemC kernel to better support SDF, CSP,
and FSM models. Constructing static schedules for SDF
models is possible, and this increases simulation effi-
ciency. Another important addition is hierarchical het-
erogeneity with SDF and FSM models. In its current
form, though, SystemC-H probably wouldn’t be able to
support DFCharts entirely, because the former adheres
to purely hierarchical heterogeneity, as in Ptolemy,
whereas DFCharts represents a mixture of hierarchical
and parallel heterogeneity.
Like SystemC, Esterel does not directly support ren-
dezvous, but using await and emit statements, a design-
er could construct rendezvous. The main problem with
Esterel is its complete lack of support for the third
expanded system-level requirement: support for
dataflow, including buffered communication between
processes and specification of firing rules for dataflow
modules. The assumption made by the synchrony
hypothesis (that all computations are instantaneous) is
seldom valid for data-dominated systems. Furthermore,
Esterel syntax is not appropriate for dataflow. It would
be possible to design a dataflow network inside an asyn-
chronous task. But, describing something in an asyn-
chronous task means going outside Esterel and its
development tools. Creating a solid basis for an inte-
grated environment requires defining a MoC (such as
SDF) for asynchronous tasks and interfacing this MoC
with the SR model.
WE INTEND TO CREATE a graphical environment
for designing embedded systems using DFCharts.
Therefore, we’ve implemented a Java class library to
execute DFCharts specifications. This library incorpo-
rates methods for analyzing SDF graphs from Ptolemy
II. In fact, this was one of the reasons we chose Java for
the implementation. The next step is to create a graph-
ical interface. Another direction of research, which is
the focus of this article, is to modify widely accepted
system-level languages such as SystemC and Esterel to
support DFCharts. ■
References
1. M. von der Beeck, "A Comparison of Statecharts Variants," Proc. Formal Techniques in Real-Time and Fault-Tolerant Systems, LNCS 863, Springer-Verlag, 1994, pp. 128-148.
2. Open SystemC Initiative, SystemC Version 2.0 User's Guide; http://www.systemc.org.
3. G. Berry and G. Gonthier, "The Esterel Synchronous Programming Language: Design, Semantics, Implementation," Science of Computer Programming, vol. 19, no. 2, Nov. 1992, pp. 87-152.
4. E.A. Lee and D.G. Messerschmitt, "Synchronous Data Flow," Proc. IEEE, vol. 75, no. 9, Sept. 1987, pp. 1235-1245.
5. F. Maraninchi and Y. Remond, "Argos: An Automaton-Based Synchronous Language," Computer Languages, vol. 27, nos. 1-3, 2001, pp. 61-92.
6. A. Girault, B. Lee, and E. Lee, "Hierarchical Finite State Machines with Multiple Concurrency Models," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, June 1999, pp. 742-760.
7. C.A.R. Hoare, "Communicating Sequential Processes," Comm. ACM, vol. 21, no. 8, Aug. 1978, pp. 666-677.
8. J. Eker et al., "Taming Heterogeneity—The Ptolemy Approach," Proc. IEEE, vol. 91, no. 1, Jan. 2003, pp. 127-144.
9. S. Edwards et al., "Design of Embedded Systems: Formal Methods, Validation, and Synthesis," Proc. IEEE, vol. 85, no. 3, Mar. 1997, pp. 366-390.
10. I. Radojevic, Z. Salcic, and P. Roop, "Modeling Heterogeneous Embedded Systems in DFCharts," Proc. Forum Design and Specification Languages (FDL 05), European Chips and Systems Initiative, 2005, pp. 441-452.
11. S. Ramesh, "Communicating Reactive State Machines: Design, Model and Implementation," Proc. IFAC Workshop Distributed Computer Control Systems, Pergamon Press, 1998; http://www.cfdvs.iitb.ac.in/projects/crsm/ifac.ps.
12. H. Patel and S. Shukla, SystemC Kernel Extensions for Heterogeneous System Modeling: A Framework for Multi-MoC Modeling & Simulation, Kluwer Academic Publishers, 2004.
Ivan Radojevic is a PhD candidate in the Department of Electrical and Computer Engineering at the University of Auckland in New Zealand. His research interests include design languages, models of computation, and hardware-software codesign for embedded systems. Radojevic has a BE in electrical engineering from the University of Auckland.
Zoran Salcic is a professor of computer systems engineering at the University of Auckland. His research interests include complex digital-systems design, custom-computing machines, reconfigurable systems, FPGAs, processor and computer systems architectures, embedded systems and their implementation, design automation tools for embedded systems, hardware-software codesign, new computing architectures and models of computation for heterogeneous embedded systems, and related areas in computer systems engineering. Salcic has a BE, an ME, and a PhD in electrical and computer engineering from the University of Sarajevo. He did most of his PhD research at the City University of New York (CUNY). He is a senior member of the IEEE.

Partha S. Roop is a senior lecturer in the Department of Electrical and Computer Engineering at the University of Auckland. His research interests include the design and verification of embedded systems, especially formal verification techniques such as model checking and module checking, and their applications in embedded systems. Roop has a BE in engineering from Anna University, Madras, India; an MTech from the Indian Institute of Technology, Kharagpur, India; and a PhD in computer science from the University of New South Wales, Sydney, Australia.

Direct questions or comments about this article to Ivan Radojevic, Department of Electrical and Computer Engineering, University of Auckland, 38 Princess St., Auckland, New Zealand; [email protected].
A Platform-Based Taxonomy for ESL Design

Douglas Densmore
University of California, Berkeley

Roberto Passerone
University of Trento

Alberto Sangiovanni-Vincentelli
University of California, Berkeley

Editor's note:
This article presents a taxonomy for ESL tools and methodologies that combines UC Berkeley's platform-based design terminologies with Dan Gajski's Y-chart work. This is timely and necessary because in the ESL world we seem to be building tools without first establishing an appropriate design flow or methodology, thereby creating a lot of confusion. This taxonomy can help stem the tide of confusion.
—Gary Smith, Gartner Dataquest
THE GROWTH OF THE EDA INDUSTRY has been less
than satisfactory in the past few years. For example, in
2005 growth was only 0.6%,1 and in 2006 it is predicted to
be less than 3%.2 The reasons are varied and are beyond
the scope of this article. However, one of the main issues
is the failure of EDA to address new customers. New cus-
tomers represent revenue potential that does not cannibalize existing business, thus allowing real industry growth.
Traditionally, EDA has served the IC industry, where the
demand for tools has been rampant since the early 1980s.
An obvious adjacent market for EDA growth is electron-
ic system-level (ESL) design. (See the “Trends affecting
the ESL design market” sidebar for a brief history and
explanation of how various market factors have con-
tributed to developments in ESL design.)

Trends affecting the ESL design market
The number of electronic system-level (ESL) designers is reportedly several orders of magnitude larger than the number of IC designers. However, until the late 1990s, the system-level design market had been highly fragmented. Consumers were unwilling to pay a high price for tools, so EDA companies produced relatively simple tools. For most of the products in this market, the end product's complexity was not a limiting factor.

In the late 1990s, the situation began to change dramatically as system complexity reached an inflection point with the appearance of increasingly powerful electronic devices. Demand increased for demonstrably safe, efficient, and fault-tolerant operation of transportation systems such as automobiles and airplanes. Demand also increased for greater functionality in IT and communication devices, such as computing equipment and cell phones. During the past 10 years, several recalls (consider those from BMW and Daimler-Chrysler alone in the past two years, for example) and delays in the launch of previously announced products in the consumer electronics sectors demonstrated that new design methods, tools, and flows were sorely needed to prevent expensive fixes in the field and to bring new products to the market more quickly and reliably.

This situation created the conditions for the birth of new tool companies and new offerings in established EDA companies to address the needs of a changing market. However, because the system industry landscape is very diverse (with companies varying as widely as Nokia and General Motors, Boeing and Otis Elevators, and Hewlett-Packard and ABB), a design approach that could satisfy all these diverse needs would have required a large investment, with a high risk of failure. Hence, the bulk of the ESL design effort (with a few notable exceptions) has come from academia and some small start-up companies trying to address a subset of the many problems and geared toward a limited number of potential customers.

For years, Gartner Dataquest has predicted dramatic growth in ESL tool revenues, which unfortunately has failed to materialize. One of the reasons for unrealized growth is the lack of a vision in EDA of what system-level design ought to be and of how various tools fit in an overall methodology that the system industry at large could satisfactorily adopt. Consequently, there is confusion about the very definition of ESL and about what role it could play in the overall design of electronic products. Some companies have adopted ESL methodologies and tools, developed either internally or in academic circles, integrating some commercial tools as well. However, we are certainly at a relatively early stage of adoption.
The 2004 International Technology Roadmap for
Semiconductors (ITRS) placed ESL “a level above RTL,”
including both hardware and software design. The ITRS
defined ESL to “consist of a behavioral (before HW/SW
partitioning) and architectural level (after)” and claimed
it would increase productivity by 200,000 gates per
designer-year. The ITRS states that ESL will improve pro-
ductivity by 60% over an “Intelligent Testbench”
approach—the previously proposed ESL design improve-
ment.3 Although these claims cannot yet be verified and
seem quite aggressive, most agree that
ESL’s overarching benefits include
■ raising the abstraction level at which
designers express systems,
■ enabling new levels of design reuse,
and
■ providing for design chain integration
across tool flows and abstraction levels.
The purpose of this article is to paint the ESL design
landscape by providing a unified framework for plac-
ing and analyzing existing and future tools in the con-
text of an extensible design flow. This approach should
help designers use tools more efficiently, clarify their
flow’s entry and exit points, and highlight areas in the
design process that could benefit from additional tools
and support packages. This framework is based on plat-
form-based design concepts.4,5 Using this framework,
we’ve classified more than 90 different academic and
industrial ESL offerings and partitioned the tool space
into metaclasses that span an ideal design flow.
(Although we try to cover as much of the ESL tool space
as possible, we make no claim of completeness. We
apologize in advance to the authors of tools we have
inadvertently ignored. Also, we don’t analyze the exten-
sive literature that describes these tools; rather, we iden-
tify Web sites that contain relevant information.)
We used this framework to explore three design sce-
narios to demonstrate how those involved in ESL design
at various levels and roles can effectively select tools to
accomplish their tasks more efficiently than in a tradi-
tional IC design flow. The ability to study design sce-
narios goes beyond mere classification, because our
framework exposes the relationships and constraints
among different classes to the designer, who may wish
to implement a specific integration flow. (The “Related
work” sidebar discusses other efforts to categorize ESL
design approaches.)
The ESL classification framework
The design framework shown in Figure 1 is based on
the platform-based design (PBD) paradigm presented
by Sangiovanni-Vincentelli and Martin.5 This framework
treats the design process as a sequence of steps that
repeat themselves as the design moves from higher
abstraction levels to implementation. The primary struc-
ture is a Y shape; thus, it is similar to the famous Y-chart
introduced by Gajski. The left branch expresses the
functionality (what) that the designer wishes to imple-
ment; the right branch expresses the elements the
designer can use to realize this functionality (how); and
the lower branch identifies the elements the designer
will use to implement the functionality (the mapping).6
In this context, the right branch is the platform, and it
includes
■ a library of elements, including IP blocks and com-
munication structures, and composition rules that
express which elements can be combined and how;
and
■ a method to assess the quantities associated with
each element—for example, power consumed or
time needed to carry out a computation.
Each legal composition of elements from the plat-
form is a platform instance. Mapping involves selecting
the design components (choosing the platform
instance) and assigning functionality parts to each ele-
ment, thus realizing the complete functionality, possi-
bly with overlaps. Designers optimize this process according to a set of metrics and constraints defined from the cost figures, or quantities, mentioned earlier.
The designers then use these metrics to evaluate the
design’s feasibility and quality.
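To make this vocabulary concrete, the following minimal C++ sketch models a platform as a library of elements with composition rules and cost figures, and a mapping as an assignment of functionality parts to elements. All names are invented for illustration; this is not any particular tool's API.

// Illustrative data structures for the platform-based design vocabulary
// (all names invented; not any particular tool's API). A platform is a
// library of elements plus composition rules and per-element cost figures;
// a platform instance is one legal composition; a mapping assigns each part
// of the functionality to an element.
#include <map>
#include <string>
#include <vector>

struct Element {                 // IP block or communication structure
    std::string name;
    double power_mw;             // quantity: power consumed
    double latency_ns;           // quantity: time to carry out a computation
};

struct Platform {
    std::vector<Element> library;
    // Composition rule: may these two elements be connected?
    bool (*can_connect)(const Element&, const Element&);
};

struct PlatformInstance {        // one legal composition of library elements
    std::vector<Element> chosen;
};

// Mapping: functionality part -> element that implements it.
typedef std::map<std::string, Element> Mapping;

// Example metric for evaluating a mapped design's quality.
double total_power(const Mapping& m) {
    double p = 0;
    for (std::map<std::string, Element>::const_iterator it = m.begin();
         it != m.end(); ++it)
        p += it->second.power_mw;
    return p;
}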
This view of the design process is basically an
abstraction of a process that designers have used implic-
itly for years at particular abstraction levels. For exam-
Electronic System-Level Design
360 IEEE Design & Test of Computers
Trends affecting the ESL design market

The number of electronic system-level (ESL) designers is reportedly several orders of magnitude larger than the number of IC designers. However, until the late 1990s, the system-level design market had been highly fragmented. Consumers were unwilling to pay a high price for tools, so EDA companies produced relatively simple tools. For most of the products in this market, the end product's complexity was not a limiting factor.

In the late 1990s, the situation began to change dramatically as system complexity reached an inflection point with the appearance of increasingly powerful electronic devices. Demand increased for demonstrably safe, efficient, and fault-tolerant operation of transportation systems such as automobiles and airplanes. Demand also increased for greater functionality in IT and communication devices, such as computing equipment and cell phones. During the past 10 years, several recalls (consider those from BMW and Daimler-Chrysler alone in the past two years, for example) and delays in the launch of previously announced products in the consumer electronics sectors demonstrated that new design methods, tools, and flows were sorely needed to prevent expensive fixes in the field and to bring new products to the market more quickly and reliably.

This situation created the conditions for the birth of new tool companies and new offerings in established EDA companies to address the needs of a changing market. However, because the system industry landscape is very diverse—with companies varying as widely as Nokia and General Motors, Boeing and Otis Elevators, and Hewlett-Packard and ABB—a design approach that could satisfy all these diverse needs would have required a large investment, with a high risk of failure. Hence, the bulk of the ESL design effort (with a few notable exceptions) has come from academia and some small start-up companies trying to address a subset of the many problems and geared toward a limited number of potential customers.

For years, Gartner Dataquest has predicted dramatic growth in ESL tool revenues, which unfortunately has failed to materialize. One of the reasons for unrealized growth is the lack of a vision in EDA of what system-level design ought to be and of how various tools fit in an overall methodology that the system industry at large could satisfactorily adopt. Consequently, there is confusion about the very definition of ESL and about what role it could play in the overall design of electronic products. Some companies have adopted ESL methodologies and tools, developed either internally or in academic circles, integrating some commercial tools as well. However, we are certainly at a relatively early stage of adoption.
ple, interpreting the logic synthesis process in this
framework, we find the following:
■ RTL code or Boolean functions represent the
design’s functionality.
■ The platform includes a library of gates, or higher-
complexity logic blocks.
■ Mapping is the actual logic synthesis step that imple-
ments the functionality as an interconnection of
gates (platform instance) optimizing a set of metrics
involving area, power, and timing; the synthesis tool
then exports the mapped design (gate-level netlist)
to the layout phase, and the physical design tool
maps this representation to a physical platform.
The PBD paradigm applies equally well to the applica-
tion and algorithmic levels, where functionality can be a
mathematical description—for example, a Moving Picture
Experts Group (MPEG) encoding algorithm. Also, the plat-
form can be a set of subalgorithms for implementing each
functional block of the encoding method. The result of the
mapping process then goes to a lower level, where the left
branch is a mapped platform instance, and the right
Related work

We are not the first to realize the importance of categorizing ESL design approaches. Smith and Nadamuni used two axes for this purpose.1 The first axis contains three methodology components: an algorithmic methodology, a processor and memory methodology, and a control-logic methodology. Each refers to the way in which a designer thinks about the design or its components. The second axis includes the abstraction levels to express the designs: behavioral, architectural, and platform based. Smith and Nadamuni examined approximately 50 approaches in this framework.

Maniwa presented a similar approach, also based on two axes, to categorize industrial tools.2 The first axis is the design style: embedded software, SoC (hardware), behavioral, or component. The second axis is the language (for example, C, C++, or Verilog) to describe the design. Maniwa examined approximately 41 approaches.

Gries also used two axes to classify ESL tools developed in academia and industry.3 The axes in this case related to abstraction levels (for example, system level and microarchitectural level) and design stages (such as application, architecture, and exploration). Gries examined approximately 19 approaches.

Finally, Bailey, Martin, and Anderson provided a comprehensive set of taxonomies: a model taxonomy, a functional-verification taxonomy, a platform-based design taxonomy, and a hardware-dependent software taxonomy.4 To the best of our knowledge, their book provides the best classification of high-level design tools, and we follow its definitions when appropriate. Compared to their approach, our paradigm places tools in a more general design context and gives guidelines on how to connect the available tools, and IP blocks and their models, in a design flow.

References
1. G. Smith and D. Nadamuni, "ESL Landscape 2005," Gartner Dataquest, 2005.
2. T. Maniwa, "Focus Report: Electronic System-Level (ESL) Tools," Chip Design, Apr./May 2004, http://www.chipdesignmag.com/display.php?articleId=23&issueId=4.
3. M. Gries, "Methods for Evaluating and Covering the Design Space during Early Design Development," Integration: The VLSI J., vol. 38, no. 2, Dec. 2004, pp. 131-138.
4. B. Bailey, G. Martin, and T. Anderson, Taxonomies for the Development and Verification of Digital Systems, Springer, 2005.
[Figure: Y-shaped diagram with branches labeled Functionality (F) and Platform (P) joining at Mapping (M).]
Figure 1. Platform-based design classification framework elements. Functionality indicates functional representations of a design completely independent of implementation architectures. Platform concerns the modules used to implement the functional description—for example, processors, memories, and custom hardware. Mapping refers to instances of the design in which the functionality has been assigned to a set of correctly interconnected modules.
branch is a new set of elements for implementing the
mapped platform instance. This process repeats until the
result of the mapping process is a fully implemented solu-
tion. Thus, the design process is partitioned into levels,
where each level represents a particular abstraction. The
corresponding platform and mapping process optimizes
specific aspects of the design.
This framework prescribes a unified design method-
ology and hence is useful for identifying where existing
tools and flows fit and how to integrate them in the over-
all system design process.
Classifying ESL tools
We use the PBD paradigm to classify several ESL-relat-
ed tools. Doing so casts present system-level design efforts
in a global framework that serves as a unifying element.
Of course, existing approaches may fall into more than
one classification category because they cover more than
one step of PBD. We could consider this a fault of the
classification method, because a classification is effec-
tive only if it can cleanly partition the various objects
being classified. However, partitioning the design steps
rather than the tool coverage is more powerful because it
identifies the tools’ roles in the overall design paradigm.
Indeed, the classification criteria can provide hints on
how to connect different tools to yield an encompassing
design flow. We’ve developed an environment for design
space exploration called Metropolis, which completely
reflects the design paradigm followed here. Metropolis
can serve as the unifying framework for system design,
where tool developers can embed tools, libraries, and
approaches if the appropriate interfaces are built.
The classification classes reflect the Y-shaped diagram,
with an additional classification criterion related to the
abstraction level at which the tools work (see Figure 1):
Bin F consists of functional representations of a
design independent of implementation architectures
and with no associated physical quantity, such as time
or power. For example, a Simulink diagram expressing
an algorithm for automotive engine control and a
Ptolemy II description of an MPEG-decoding algorithm
both belong to this bin. These diagrams could be refine-
ments of more abstract representations such as meta-
models, as in Metropolis. To this bin, we assign tools that
manipulate, simulate, and formally or informally ana-
lyze functional descriptions.
Bin P represents the library of modules for imple-
menting the functional description. The modules are
architectural elements such as processors, memories,
coprocessors, FPGAs, custom hardware blocks, and
interconnections (buses, networks, and so on). The ele-
ments also include middleware, such as operating sys-
tems for processors and arbitration protocols for buses,
because these software components present the archi-
tectural services that the hardware offers to the applica-
tion software. To this bin, we assign tools for connecting
or manipulating the modules, as well as tools for ana-
lyzing the properties of the complete or partial platform
instances obtained.
Bin M represents mapped instances of the design in
which the designer or an automatic mapping tool has
assigned functionality to a set of correctly intercon-
nected modules. The connection between bins F, P,
and M represents the mapping process. In this bin, we
classify any tool that assigns architectural elements to
functionality or generates the design’s mapped view.
For example, bin M would include a high-level synthesis
tool because the designer has assigned, perhaps man-
ually, part of the functionality to a virtual hardware com-
ponent in the platform and is asking the tool to generate
the lower-level view, in this case an RTL description of
the design. By the same token, we can classify a code
generation tool in bin M because the designer has
assigned (perhaps manually) part of the functionality
to a software-programmable element of the library and
is asking the tool to generate the lower-level view. In this
case, the view is a software program—whether assem-
bly language, C, or a higher-level language—which is
then compiled to move toward implementation. In this
article, we consider the compilation phase and the syn-
thesis from RTL to gates to be part of a traditional design
flow and thus not part of our ESL tool classification.
Some tools can handle two or even all three aspects
of the PBD paradigm. To classify these tools, we intro-
duce metaclasses (or metabins), indicated by combi-
nations of F, P, and M. For example, in metabin FM, we
assign a synthesis tool that handles functional compo-
nents along with their mappings to platform compo-
nents. Tools classified in metaclasses cover several parts
of the PBD design flow. Designers using these tools can
benefit from the design view we propose by clearly
decoupling function from architecture and mapping.
Doing so can enhance reusability and help the design-
er reach a correct implementation efficiently.
To make the partitioning of the tools finer, we intro-
duced another, orthogonal criterion for classification:
the abstraction level at which the tools operate.
Whereas PBD doesn’t limit the abstraction levels that
designers use per se, most of the tools we reviewed
work at three levels, listed here from highest to lowest:
■ System level S corresponds to heterogeneous designs
that use different models of computation (MoCs) to
represent function, platforms, and mappings.
■ Component level C involves subsystems containing
homogeneous components.
■ Implementation level I comprises the final design
step, when the design team considers the job
complete.
We now present our classification, beginning with
tools that fall into individual bins—those meant to be
part of a larger tool flow or that work in a very specific
application domain. We then address tools that cover
larger portions of the design flow space.
Bin F
Tools in this bin often serve to capture designs and
their specifications quickly without making any assump-
tions about the underlying implementation details (see
Tables 1-3). At this level, the descriptions might include
behavioral issues such as concurrency, or communi-
cation concepts such as communication protocols.
Some tools handle only one MoC—for example, finite-
state machines (FSMs). Others are more general, han-
dling a set of MoCs or having no restrictions. For
example, the Simulink representation language handles
discrete dataflow and continuous time. Hence, it is a
limited heterogeneous modeling-and-analysis tool.
Ptolemy II, with its actor-oriented abstract semantics,
Table 1. Tools in bin F: Industrial. (C: component level; I: implementation level; S: system level)
Provider | Tools | Focus | Abstraction | Web site
MathWorks | Matlab | High-level technical computing language and interactive environment for algorithm development, data visualization, analysis, and numeric computation | S: Matlab language, vector, and matrix operations | http://www.mathworks.com/products/matlab
Scilab | Scicos | Graphically model, compile, and simulate dynamic systems | S: Hybrid systems | http://www.scilab.org
Novas Software | Verdi | Debugging for SystemVerilog | I: Discrete event | http://www.novas.com
Mentor Graphics | SystemVision | Mixed-signal and high-level simulation | S: VHDL-AMS, Spice, C | http://www.mentor.com/products/sm/systemvision
EDAptive Computing | EDAStar | Military and aerospace system-level design | S: Performance models | http://www.edaptive.com
Time Rover | DBRover, TemporalRover, StateRover | Temporal rules checking, pattern recognition, and knowledge reasoning | C: Statecharts assertions | http://www.time-rover.com
Maplesoft | Maple | Mathematical problem development and solving | S: Mathematical equations | http://www.maplesoft.com
Wolfram Research | Mathematica | Graphical mathematical development and problem solving with support for Java, C, and .Net | S: Mathematical equations | http://www.wolfram.com
Mesquite Software | CSIM 19 | Process-oriented, general-purpose simulation toolkit for C and C++ | S: C, C++ | http://www.mesquite.com
Agilent Technologies | Agilent Ptolemy | Functional verification | C: Timed synchronous dataflow | http://www.agilent.com
National Instruments | LabView | Test, measurement, and control application development | S: LabView programming language | http://www.ni.com/labview
can handle all MoCs. Depending on the MoC support-
ed, design entry for each tool could start at a higher or a
lower abstraction level.
Bin P
This category includes providers of platforms or plat-
form components, as well as tools and languages that
describe, manipulate, or analyze unmapped platforms
(see Tables 4 and 5). Similar to tools in bin F, those in bin
P can span several abstraction layers and support differ-
ent kinds of architectural components. For example,
Xilinx and Altera mainly concern programmable hard-
ware devices, whereas Tensilica focuses on configurable
processors. Others, such as Sonics and Beach Solutions,
focus on integration and communication components.
This category’s main characteristic is configurability,
which ensures the applicability of a platform or compo-
nents to a wide variety of applications and design styles.
Bin M
This bin contains tools dedicated to refining a func-
tional description into a mapped platform instance,
including its performance evaluation and possibly the
synthesis steps required to proceed to a more detailed
abstraction level (see Tables 6-8). The tools in bin M
vary widely in particular design style, MoC, and sup-
ported application area. To provide the necessary qual-
ity of results, the tools are typically very specific.
Table 2. Tools in bin F: Academic.
Provider | Tools | Focus | Abstraction | Web site
Univ. of California, Berkeley | Ptolemy II | Modeling, simulation, and design of concurrent, real-time, embedded systems | S: All MoCs | http://ptolemy.eecs.berkeley.edu
Royal Inst. of Technology, Sweden | ForSyDe | System design starts with a synchronous computational model, which captures system functionality | C: Synchronous MoC | http://www.imit.kth.se
Mozart Board | Mozart | Advanced development platform for intelligent, distributed applications | S: Object-oriented GUI using Oz | http://www.mozart-oz.org
Table 3. Tools in bin F: Languages.
Provider | Tools | Focus | Abstraction | Web site
Celoxica | Handel-C | Compiling programs into hardware images of FPGAs or ASICs | C: Communicating sequential processes | NA
Univ. of California, Irvine | SpecC | ANSI-C with explicit support for behavioral and structural hierarchy, concurrency, state transitions, timing, and exception handling | C: C language based | http://www.ics.uci.edu/~specc
Inria | Esterel | Synchronous-reactive programming language | C: Synchronous reactive | http://www-sop.inria.fr/meije/esterel/esterel-eng.html
Univ. of Kansas | Rosetta | Compose heterogeneous specifications in a single declarative semantic environment | S: All MoCs | http://www.sldl.org
Mozart Board | Oz | Advanced, concurrent, networked, soft real-time, and reactive applications | C: Dataflow synchronization | http://www.mozart-oz.org
Various | ROOM | Real-time object-oriented modeling | S: Object oriented | NA
Metabin FP
This category consists of languages that can express
both functionality and architecture (see Tables 9 and 10
on p. 368). Typically, they express algorithms and differ-
ent styles of communication and structure for different
MoCs. Assertions, or constraints, complement the platform
description. In the case of Unified Modeling Language
(UML), the semantics are often left unspecified.
Metabin FM
This metabin reflects tools that provide some com-
bination of functional description and analysis capa-
bilities plus mapping and synthesis capabilities (see
Table 11 on p. 368). In this case, the platform architec-
ture is typically fixed. This lack of flexibility is offset by
the often superior quality of achievable implementation
results.
Metabin PM
This metabin includes tools that combine architec-
tural services and mapping (see Tables 12-14 on pp. 369-
370). These tools have a tight coupling between the
services they provide and how functionality can map to
these services. They require the use of other tools for
some aspect of system design (often in the way the
design functionality is specified).
Metabin FPM
Entries in this category are the frameworks that sup-
port the PBD paradigm (see Tables 15 and 16 on p. 371).
Table 4. Tools in bin P: Industrial.
Provider | Tools | Focus | Abstraction | Web site
Prosilog | Nepsys | Standards-based IP libraries and support tools (SystemC) | C: RTL and transaction-level SystemC; VHDL for SoCs | http://www.prosilog.com
Beach Solutions | EASI-Studio | Solutions to package and deploy IP in a repeatable, reliable manner | C: Interconnection | http://www.beachsolutions.com
Altera | Quartus II | FPGAs, CPLDs, and structured ASICs | I: IP blocks, C, and RTL; FPGAs | http://www.altera.com
Xilinx | Platform Studio | IP integration framework | C: IP blocks, FPGAs | http://www.xilinx.com
Mentor Graphics | Nucleus | Family of real-time operating systems and development tools | S: Software | http://www.mentor.com/products/embedded_software/nucleus_rtos
Sonics | Sonics Studio | On-chip interconnection infrastructure | I: Bus-functional models | http://www.sonicsinc.com
Xilinx | ISE, EDK, XtremeDSP | FPGAs, CPLDs, and structured ASICs | I: IP blocks, C, and RTL; FPGAs | http://www.xilinx.com
Design and Reuse | Hosted Extranet Services | IP delivery systems | S: All types of IP | http://www.design-reuse.com
Stretch | Software Configurable Processor compiler | Compile a subset of C into hardware for instruction extensions | C: Software-configurable processors | http://www.stretchinc.com
ProDesign | CHIPit | Transaction-based verification platform | C: FPGA-based rapid prototyping | http://www.prodesign-usa.com
Table 5. Tools in bin P: Languages.
Provider | Tools | Focus | Abstraction | Web site
Spirit Consortium | Spirit | IP exchange and integration standard written in XML | S: Various IP levels | http://www.spiritconsortium.com
In particular, Metropolis fully embodies this paradigm,
covering all bins and all abstraction layers. In this cate-
gory, we include design space exploration tools and lan-
guages that can separately describe the functionality on
the one hand, and the possible architectures for an
implementation on the other. These tools can also map
the functionality onto the platform instances to obtain
metrics for the implementation’s performance.
Design scenarios
Here, we use the PBD framework of Figure 1 to map
three design flow scenarios on the tool landscape.
Figure 2 (see p. 372) shows the metabins and the hier-
archical levels where activities take place.
Scenario 1: New application design from specification
The requirements of this scenario include the need
to start from a high-level specification; the desire to cap-
ture and modify the initial specification quickly; the
ability to express concurrency, constraints, and other
behavior-specific characteristics efficiently; and the
ability to capture useful abstract services for imple-
menting high-level specifications into a more detailed
functional view. The flow thus starts at the higher
abstraction levels in bin F of our classification. We can
expand these levels into a Y diagram of the same struc-
ture as the one described in Figure 1. This structure
offers
■ flexible specification capture—no ties to a particu-
lar implementation style or platform;
■ services that help move the abstract design toward a
more constrained version (for example, algorithms
that can implement functionality); and
■ independent mapping of functionality onto algo-
rithmic structures that enable reuse of the functional
specification.
Table 6. Tools in bin M: Industrial, set I.
Provider | Tools | Focus | Abstraction | Web site
MathWorks | Real-Time Workshop | Code generation and embedded-software design | S: Simulink-level models | http://www.mathworks.com
dSpace | TargetLink | Optimized code generation and software development | S: Simulink models | http://www.dspace.com
ETAS | Ascet | Modeling, algorithm design, code generation, and software development, with emphasis on the automotive market | S: Ascet models | http://en.etasgroup.com/products/ascet/index.shtml
Y Explorations | eXCite | Take virtually unrestricted ISO or ANSI-C with channel I/O behavior and generate Verilog or VHDL RTL output for logic synthesis | S: C language input | http://www.yxi.com
AccelChip | AccelChip and AccelWare | DSP synthesis; Matlab to RTL | C: Matlab | http://www.accelchip.com
Forte Design Systems | Cynthesizer | Behavioral synthesis | C: SystemC to RTL | http://www.forteds.com
Future Design Automation | System Center Co-development Suite | ANSI-C to RTL synthesis toolset | C: C to RTL | http://www.future-da.com
Catalytic | DeltaFX, RMS | Synthesis of DSP algorithms on processors or ASICs | I: Matlab algorithms | http://www.catalytic-inc.com
ACE Associated Compiler Experts | CoSy | Automatic generation of compilers for DSPs | I: DSP-C and embedded-C language extensions | http://www.ace.nl
Tenison | VTOC | RTL to C++ or SystemC | I: RTL, transactional | http://www.tenison.com
Let’s examine an example in the multimedia
domain: the implementation of a JPEG encoder on a
heterogeneous multiprocessor architecture such as the
Intel MXP5800. This architecture has eight image signal
processors (ISP1 to ISP8) connected with programmable
quad ports (eight per processor).7 The encoder com-
presses raw image data and emits a compressed bit-
stream. The first step in the scenario is to choose a
Table 7. Tools in bin M: Industrial, set II.
Provider | Tools | Focus | Abstraction | Web site
Sequence Design | ESL Power Technology, Power Theater, CoolTime, CoolPower | Power analysis and optimization | I: SystemC level | http://www.sequencedesign.com
PowerEscape (with CoWare) | PowerEscape Architect, PowerEscape Synergy, PowerEscape Insight | Memory hierarchy design, code performance analysis, complete profiling | C: C code | http://www.coware.com/products/powerescape.php
CriticalBlue | Cascade | Design flow for application-specific hardware acceleration coprocessors for ARM processors | I: C code to Verilog or VHDL | http://www.criticalblue.com
Synfora | PICO Express | C to RTL, or C to SystemC (transaction-level models) | I: Pipeline processor arrays | http://www.synfora.com
Actis | AccurateC | Static code analysis for SystemC | C: C syntax and semantic checking | http://www.actisdesign.com
Impulse Accelerated Technologies | CoDeveloper | C to FPGA | C: C code | http://www.impulsec.com
Poseidon Design Systems | Triton Tuner, Triton Builder | Design flow for application-specific hardware acceleration coprocessors | C: C and SystemC | http://www.poseidon-systems.com
SynaptiCAD | SynaptiCAD line | Testbench generators and simulators | C: RTL and SystemC | http://www.syncad.com
Avery Design Systems | TestWizard | Verilog HDL, VHDL, and C-based testbench automation | I: RTL and C | http://www.avery-design.info
Emulation and Verification Engine | ZeBu | Functional verification | I: Hardware emulation | http://www.eve-team.com
Table 8. Tools in bin M: Academic.
Provider | Tools | Focus | Abstraction | Web site
Univ. of Illinois at Urbana-Champaign | Impact Compiler | Compilation development for instruction-level parallelism | S: C code for high-performance processors | http://www.crhc.uiuc.edu/Impact
particular MoC to describe the design’s functionality.
To be more efficient in applying our proposed design
paradigm, the designer should use a MoC that is also
suitable for describing the architecture’s capabilities.
Doing so eases the mapping task and the
analysis of the mapped design’s properties. In addition,
a synthesis step could execute the mapping process
automatically.
Because this is a data-streaming application that
maps onto a highly concurrent architecture, it is natur-
al to use a Kahn process networks (KPN) representa-
tion. In KPN, a set of processes communicate through
one-way FIFO channels. Reads from channels are
blocked when no tokens are present; processes cannot
query the channel status. However, this model is Turing
complete, so scheduling and buffer size are undecid-
able. The KPN model of the JPEG encoder algorithm is
completely independent of the target architecture,
Table 9. Tools in metabin FP: Industrial.
Provider | Tools | Focus | Abstraction | Web site
MathWorks | Simulink, State Flow | Modeling, algorithm design, and software development | S: Timed dataflow, FSMs | http://www.mathworks.com
Table 10. Tools in metabin FP: Languages.
Provider | Tools | Focus | Abstraction | Web site
Open SystemC Initiative | SystemC | Provide hardware-oriented constructs within the context of C++ | S: Transaction level to RTL | http://www.systemc.org
Object Management Group | Unified Modeling Language | Specify, visualize, and document software system models | S: Object-oriented, diagrams | http://www.uml.org
Accellera | SystemVerilog | Hardware description and verification language extension of Verilog | S: Transaction level, RTL, assertions | http://www.systemverilog.org
Table 11. Tools in metabin FM: Industrial.
Provider | Tools | Focus | Abstraction | Web site
Celoxica | DK Design Suite, Agility Compiler, Nexus-PDK | Algorithmic design entry, behavioral design, simulation, and synthesis | C: Handel-C based | http://www.celoxica.com
BlueSpec | BlueSpec Compiler, BlueSpec Simulator | BlueSpec SystemVerilog rules and libraries | S: SystemVerilog and term-rewriting synthesis | http://www.bluespec.com
I-Logix | Rhapsody and Statemate | Real-time UML-embedded applications | S: UML based | http://www.ilogix.com
Mentor Graphics | Catapult C | C++ to RTL synthesis | C: Untimed C++ | http://www.mentor.com
Esterel Technologies | SCADE, Esterel Studio | Code generation for safety-critical applications such as avionics and automotive | I: Synchronous | http://www.esterel-technologies.com
Calypto | SLEC System | Functional verification between system level and RTL | C: SystemC, RTL | http://www.calypto.com
satisfying the requirements for this scenario. We could use
Ptolemy II to capture this model and simulate the select-
ed algorithm’s behavior.
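To make the KPN communication rules concrete, here is a minimal sketch in C with POSIX threads (our own illustration, unrelated to Ptolemy II or the MXP5800 tool chains; compile with -pthread). Two processes communicate through a one-way FIFO whose reads block when no tokens are present and which offers no way to query channel status. True KPN channels are unbounded; the fixed depth here is a practical simplification:

    #include <pthread.h>
    #include <stdio.h>

    #define DEPTH 4  /* bounded approximation of an unbounded KPN channel */

    typedef struct {
        int buf[DEPTH];
        int head, count;
        pthread_mutex_t lock;
        pthread_cond_t nonempty, nonfull;
    } channel_t;

    static channel_t ch = { .lock = PTHREAD_MUTEX_INITIALIZER,
                            .nonempty = PTHREAD_COND_INITIALIZER,
                            .nonfull  = PTHREAD_COND_INITIALIZER };

    static void chan_put(channel_t *c, int v) {
        pthread_mutex_lock(&c->lock);
        while (c->count == DEPTH)          /* write blocks when bounded FIFO is full */
            pthread_cond_wait(&c->nonfull, &c->lock);
        c->buf[(c->head + c->count) % DEPTH] = v;
        c->count++;
        pthread_cond_signal(&c->nonempty);
        pthread_mutex_unlock(&c->lock);
    }

    static int chan_get(channel_t *c) {    /* blocking read; no polling allowed */
        pthread_mutex_lock(&c->lock);
        while (c->count == 0)
            pthread_cond_wait(&c->nonempty, &c->lock);
        int v = c->buf[c->head];
        c->head = (c->head + 1) % DEPTH;
        c->count--;
        pthread_cond_signal(&c->nonfull);
        pthread_mutex_unlock(&c->lock);
        return v;
    }

    static void *producer(void *arg) {     /* e.g., a raw-pixel source */
        (void)arg;
        for (int i = 0; i < 8; i++) chan_put(&ch, i);
        return NULL;
    }

    static void *consumer(void *arg) {     /* e.g., an encoder stage */
        (void)arg;
        for (int i = 0; i < 8; i++) printf("token %d\n", chan_get(&ch));
        return NULL;
    }

    int main(void) {
        pthread_t p, q;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&q, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(q, NULL);
        return 0;
    }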
To allow a better analysis and to refine the model
toward implementation, we can map this model into
another dataflow model, similar to cyclostatic dataflow,8
which permits only one writer per channel but allows
multiple reader processes. For all channels, each reader
process can read each data token exactly once. Also, this
dataflow model allows limited forms of data-dependent
communication. To enable the execution of multiple
processes on a single processing element, this MoC sup-
ports multitasking. In particular, the system may suspend
a process only between firings. Because of the limitations
just discussed, this MoC lets designers decide scheduling,
buffer sizing, and mapping. It is easy to express the model
in Ptolemy II and to describe it in Simulink or the Signal
Processing Worksystem (SPW). This first step—mapping
a more flexible model for the functionality into a more
restricted one that is easier to implement and analyze—
is critical in any system-level design.
Subsequently, the mapped specification becomes
the functional representation for the diagram in Figure
1. So, the flow can continue at lower abstraction levels
with tools in metabin FM for an integrated solution, or
in bin F followed by M for a multitool solution. Because
most of the architecture is fixed, an efficient, special-
ized approach is more appropriate. Figure 2a shows a
Table 12. Tools in metabin PM: Industrial, set I.
Provider | Tools | Focus | Abstraction | Web site
ARM | RealView MaxSim | Embedded microprocessors and development tools; system-level development tools | C: C++ ARM processor development | http://www.arm.com
Tensilica | Xtensa, XPRES | Programmable solutions with specialized Xtensa processor description from native C and C++ code | C: Custom ISA processor, C and C++ code | http://www.tensilica.com
Summit | System Architect, Visual Elite | Efficiently design and analyze the architecture and implementation of multicore SoCs and large-scale systems | C: SystemC | http://www.sd.com
VaST Systems Technology | Comet, Meteor | Very high-performance processor and architecture models | S: Virtual processor, bus, and peripheral devices | http://www.vastsystems.com
Virtio | Virtio Virtual Platform | High-performance software model of a complete system | I: Virtual platform models at SystemC level | http://www.virtio.com
Cadence | Incisive | Integrated tool platform for verification, including simulation, formal methods, and emulation | S: RTL and SystemC assertions | http://www.cadence.com
Mentor | Platform Express | XML-based integration environment | C: XML-based structure | http://www.mentor.com
SpiraTech | Cohesive | Protocol abstraction transformers | C: Transaction level, IP blocks | http://www.spiratech.com
ARC International | ARC | Embedded microprocessors and development tools | I: ISA extensions, microarchitectural level | http://www.arc.com
Arithmatica | CellMath Tool Suite | Proprietary improvements for implementing silicon computational units | I: Microarchitectural datapath computation elements and design | http://www.arithmatica.com
potential traversal of the framework. For our JPEG case,
we can map the functionality onto the MXP5800 using
the Metropolis environment to analyze potential prob-
lems with the architecture or to optimize the applica-
tion’s coding for the chosen platform instance.
Scenario 2: New integration platform development
This scenario describes the development of a new
integration platform: a hardware architecture, embed-
ded-software architecture, design methodologies
Table 13. Tools in metabin PM: Industrial, set II.
Provider | Tools | Focus | Abstraction | Web site
Target Compiler Technologies | Chess (compiler), Checkers (ISS) | Retargetable tool suite for developing, programming, and verifying embedded IP cores | I: Mapping of C code to processors written in nML | http://www.retarget.com
Arteris | Danube, NoCexplorer | Synthesis of NoC | C: NoC dataflow | http://www.arteris.net
ChipVision Design Systems | Orinoco | Pre-RTL power prediction for behavioral synthesis | C: SystemC algorithm input | http://www.chipvision.com
Wind River Systems | Various platform solutions | Provide various platforms for different design segments (auto, consumer) | I: Software API | http://www.windriver.com
CoWare | ConvergenSC | Capture, design, and verification for SystemC | S: SystemC functionality input; SystemC, HDL services | http://www.coware.com
Carbon Design Systems | VSP | Presilicon validation flow | C: Verilog and VHDL, bus protocols | http://www.carbondesignsystems.com
GigaScale IC | InCyte | Chip estimation and architecture analysis | S: High-level chip information (gate count, I/O, IP blocks) | http://www.chipestimate.com
Virtutech | Virtutech Simics | Build, modify, and program new virtual systems | I: C language and ISAs | http://www.virtutech.com
National Instruments | LabView 8 FPGA | Create custom I/O and control hardware for FPGAs | C: LabView graphical programming | http://www.ni.com/fpga
CoWare | LisaTek | Embedded-processor design tool suite | C: Lisa architecture description language | http://www.coware.com
Table 14. Tools in metabin PM: Academic.
Provider | Tools | Focus | Abstraction | Web site
Carnegie Mellon Univ. | MESH | Enable heterogeneous microdesign through new simulation, modeling, and design strategies | C: C input; programmable, heterogeneous multiprocessors | http://www.ece.cmu.edu/~mesh
Univ. of California, Los Angeles | xPilot | Automatically synthesize high-level behavioral descriptions for silicon platforms | C: C, SystemC | http://cadlab.cs.ucla.edu/soc
(authoring and integration), design guidelines and
modeling standards, virtual-components characteri-
zation and support, and design verification (hardware-
software, hardware prototype), focusing on a
particular target application.9 Unlike the first scenario,
this one is not concerned with the design of a particu-
lar application but rather with the development of a
substrate to realize several applications. Characteristic
of this scenario is the service- and mapping-centric
requirements that concern tools in metabin PM for
development and analysis at the desired abstraction
level. The platform developer builds the substrate, or
platform, and uses the tools in metabin PM. The plat-
form user proceeds in metabin FM to map the desired
Table 15. Tools in metabin FPM: Industrial.
Provider | Tools | Focus | Abstraction | Web site
CoFluent Design | CoFluent Studio | Design space exploration through Y-chart modeling of functional and architectural models | S: Transaction-level SystemC | http://www.cofluentdesign.com
MLDesign Technologies | MLDesigner | Integrated platform for modeling and analyzing the architecture, function, and performance of high-level system designs | S: Discrete event, dynamic dataflow, and synchronous dataflow | http://www.mldesigner.com
Mirabilis Design | VisualSim product family | Multidomain simulation kernel and extensive modeling library | S: Discrete event, synchronous dataflow, continuous time, and FSM | http://www.mirabilisdesign.com
Synopsys | System Studio | Algorithm and architecture capture, performance evaluation | S: SystemC | http://www.synopsys.com
Table 16. Tools in metabin FPM: Academic.
Provider | Tools | Focus | Abstraction | Web site
Univ. of California, Berkeley | Metropolis | Operational and denotational functionality and architecture capture, mapping, refinement, and verification | S: All MoCs | http://www.gigascale.org/metropolis
Seoul National Univ. | Peace | Codesign environment for rapid development of heterogeneous digital systems | S: Object-oriented C++ kernel (Ptolemy based) | http://peace.snu.ac.kr
Vanderbilt Univ. | GME, Great, Desert | Metaprogrammable tool for navigating and pruning large design spaces | S: Graph transformation, UML and XML based, and external component support | http://repo.isis.vanderbilt.edu
Delft Univ. of Technology | Artemis, Compaan and Laura, Sesame, Spade | Workbench enabling methods and tools to model applications and SoC-based architectures | C: Kahn process networks | http://ce.et.tudelft.nl/artemis
Univ. of California, Berkeley | Mescal | Programming of application-specific programmable platforms | S: Extended Ptolemy II, network processors | http://www.gigascale.org/mescal
functionality to the selected platform instance. Figure
2b illustrates the metabin flows that support these
development requirements.
Consider as a test case the development of a new
electronic control unit (ECU) platform for an automo-
tive engine controller. The application designers have
already developed the application code for the plat-
form, but a Tier 1 supplier wants to improve the cost and
performance of its part of the platform to avoid losing
an important original equipment manufacturer (OEM)
customer. If the designers employ the paradigm
described in this article, the application becomes as
independent of the ECU platform as possible. Next, in
collaboration with a Tier 2 supplier (a chip maker), the
Tier 1 supplier determines qualitatively that a dual-core
architecture would offer better performance at a lower
manufacturing cost. A platform designer then uses a
tool for platform development, such as LisaTek, to cap-
ture the dual-core architecture. If the dual core is based
on ARM processing elements, the designers and the Tier
1 supplier can also use ARM models and tool chains. An
appropriate new real-time operating system could
exploit the implementation’s multicore nature. At this
point, the designers map the application onto one of the
possible dual-core architectures, considering the num-
ber of bits supported by the CPU, the set of peripherals
to integrate, and the interconnect structure. For each
choice, the designers simulate the mapped design with
the engine control software or a subset of it to stress the
architecture. These simulations can employ the ARM
tools or VaST offerings to rapidly obtain important sta-
tistics such as interconnect latency and bandwidth,
overall system performance, and power consumption.
At the end of this exercise, the Tier 2 supplier is fairly
confident that its architecture is capable of supporting a
full-fledged engine control algorithm. Any other Tier 1
supplier can use this product now for its engine control
offering.
[Figure: three Y-chart traversals over bins F, P, and M across two hierarchical levels (F0, P0, M0 refining into F1, P1, M1). Annotations include "Step 1: FPM-based tool"; "Step 1: PM-based tool"; "Step 2: FM (augmented functionality)"; "Option 1: P-tool at appropriate abstraction level"; "Option 1a: F (multitools)"; "Option 1b: M (multitools)"; "Option 2: FM (integrated tools)"; and "Option 2: FP with synthesis to lower-level flows".]
Figure 2. Metabins and hierarchical levels for three design scenarios: new application design from specification (a), new integration platform development (b), and legacy design integration (c).
Scenario 3: Legacy design integration
The final scenario represents a common situation
for many companies wishing to integrate their exist-
ing designs into new ESL flows. In this case, it’s diffi-
cult to separate functionality and architecture,
because in most embedded systems the documenta-
tion refers to the final implementation, not to its orig-
inal specifications and the relative implementation
choices. If modifying the design is necessary to imple-
ment additional features, it’s very difficult to deter-
mine how the new functionality will affect the existing
design. This situation calls for reverse engineering to
extract functionality from the final implementation.
The most effective way to do this might be to start the
description of the functionality from scratch, using
tools in bin F. An alternative might be an effective
encapsulation of the legacy part of the design so that
the new part interacts cleanly with the legacy part. We
could then consider existing components as archi-
tectural elements that we must describe using tools in
bin P. This, in turn, is possible at different abstraction
levels. Because legacy components typically support
a specific application, mapping is often unnecessary,
and functional or architectural cosimulation can val-
idate a new design. Metabin FP at the system level is
therefore the appropriate flow model in this case.
Figure 2c illustrates this scenario.
ESL WILL EVENTUALLY BE in the limelight of the
design arena. But structural conditions in the EDA
and electronics industry must change to offer a suffi-
ciently receptive environment that will allow the
birth of new companies and the evolution of present
ones into this exciting area. An important technical
prerequisite is industry and academia agreement on
a holistic view of the design process in which to cast
existing and future tools and flows. Our unified
design framework can act as a unifying element in
the ESL domain. However, standardization of system-
level design will take years and require significant
effort to fully materialize. ■
Acknowledgments
We thank the following for their support in reviewing
this article and in helping to classify the various ESL
approaches. Without them, this article would not have
been possible: Abhijit Davare, Alessandro Pinto, Alvise
Bonivento, Cong Liu, Gerald Wang, Haibo Zeng, Jike
Chong, Kaushik Ravindran, Kelvin Lwin, Mark McKelvin,
N.R. Satish, Qi Zhu, Simone Gambini, Wei Zheng, Will
Plishker, Yang Yang, and Yanmei Li. A special thanks
goes to Guang Yang and Trevor Meyerowitz for their valu-
able feedback. This work was done under partial sup-
port from the Center for Hybrid Embedded Software
Systems and the Gigascale Systems Research Center.
References
1. G. Smith et al., Report on Worldwide EDA Market
Trends, Gartner Dataquest, Dec. 2005.
2. J. Vleeschouwer and W. Ho, “The State of EDA: Just
Slightly up for the Year to Date Technical and Design
Software,” The State of the Industry, Merrill Lynch report,
Dec. 2005.
3. International Technology Roadmap for Semiconductors
2004 Update: Design, 2004, http://www.itrs.net/Links/
2004Update/2004_01_Design.pdf.
4. A. Sangiovanni-Vincentelli, “Defining Platform-Based
Design,” EE Times, Feb. 2002, http://www.eetimes.com/
news/design/showArticle.jhtml?articleID=16504380.
5. A. Sangiovanni-Vincentelli and G. Martin, “Platform-
Based Design and Software Design Methodology for
Embedded Systems,” IEEE Design & Test, vol. 18, no. 6,
Nov.-Dec. 2001, pp. 23-33.
6. D.D. Gajski and R.H. Kuhn, “Guest Editors’ Introduction:
New VLSI Tools,” Computer, vol. 16, no. 12, Dec. 1983,
pp. 11-14.
7. A. Davare et al., “JPEG Encoding on the Intel
MXP5800: A Platform-Based Design Case Study,” Proc.
3rd Workshop Embedded Systems for Real-Time Multi-
media (ESTIMedia 05), IEEE CS Press, 2005, pp. 89-94.
8. G. Bilsen et al., “Cyclo-Static Dataflow,” IEEE Trans. Sig-
nal Processing, vol. 44, no. 2, Feb. 1996, pp. 397-408.
9. H. Chang et al., Surviving the SOC Revolution: A Guide
to Platform-Based Design, Kluwer Academic Publishers,
1999.
Douglas Densmore is a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interests focus on system-level architecture modeling, with emphasis on architecture refinement techniques for system-level design. Densmore has a BS in computer engineering from the University of Michigan, Ann Arbor, and an MS in electrical engineering from the University of California, Berkeley. He is a member of the IEEE.
Roberto Passerone is an assistant professor in the Department of Information and Communication Technology at the University of Trento, Italy. His research interests include system-level design, communication design, and hybrid systems. Passerone has a Laurea degree in electrical engineering from Politecnico di Torino, Italy, and an MS and a PhD in electrical engineering and computer sciences from the University of California, Berkeley. He is a member of the IEEE.
Alberto Sangiovanni-Vincentelli holds the Buttner Endowed Chair of the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley. His research interests include design tools and methodologies, large-scale systems, embedded controllers, and hybrid systems. Sangiovanni-Vincentelli has a PhD in engineering from Politecnico di Milano. He is cofounder of Cadence and Synopsys, an IEEE Fellow, a member of the General Motors Scientific and Technology Advisory Board, and a member of the National Academy of Engineering.
Direct questions or comments about this article to Douglas Densmore, Dept. of Electrical Engineering and Computer Sciences, Univ. of California, Berkeley, 545Q Cory Hall (DOP Center), Berkeley, CA 94720; [email protected].
The Challenges of Synthesizing Hardware from C-Like Languages

Stephen A. Edwards
Columbia University

Editor's note: This article presents one side of an ongoing debate on the appropriateness of C-like languages as hardware description languages. The article examines various features of C and their mapping to hardware, and makes a cogent argument that vanilla C is not the right language for hardware description if synthesis is the goal.
—Sandeep K. Shukla, Virginia Polytechnic and State University
THE MAIN REASON people have proposed C-like lan-
guages for hardware synthesis is familiarity. Proponents
claim that by synthesizing hardware from C, we can effec-
tively turn every C programmer into a hardware design-
er. Another common motivation is hardware-software
codesign: Designers often implement today’s systems as a
mix of hardware and software, and it’s often unclear at
the outset which portions can be hardware and which
can be software. The claim is that using a single language
for both simplifies the migration task.
I argue that these claims are questionable and that
pure C is a poor choice for specifying hardware. On the
contrary, the semantics of C and similar imperative lan-
guages are distant enough from hardware that C-like
thinking might be detrimental to hardware design.
Instead, successful hardware synthesis from C seems to
involve languages that vaguely resemble C, mostly its
syntax. Examples of these languages include Celoxica’s
Handel-C1 and NEC’s Behavior Description Language
(BDL).2 You can think of executing C code on a tradi-
tional sequential processor as synthesizing hardware
from C, but the techniques presented here strive for
more highly customized implementations that exploit
greater parallelism, hardware’s main advantage.
Unfortunately, the C language has no support for user-
specified parallelism, and so either the synthesis tool
must find it (a difficult task) or the
designer must use language extensions
and insert explicit parallelism. Neither
solution is satisfactory, and the latter
requires that C programmers think dif-
ferently to design hardware.
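The following fragment illustrates the difference (my own example, not drawn from the article's references; the variables are assumed to be declared elsewhere, and the par block is Handel-C-style extension syntax, not standard C):

    /* Standard C: these statements are sequential by definition,
       so a synthesis tool must prove them independent before it
       can build parallel hardware. */
    sum  = a + b;
    prod = c * d;

    /* Handel-C-style extension (not standard C): the designer
       asserts explicitly that the statements run concurrently. */
    par {
        sum  = a + b;
        prod = c * d;
    }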
My main point is that giving C pro-
grammers tools is not enough to turn
them into reasonable hardware design-
ers. Efficient hardware is usually very difficult to describe
in an unmodified C-like language, because the language
inhibits specification or automatic inference of adequate
concurrency, timing, types, and communication. The
most successful C-like languages, in fact, bear little
semantic resemblance to C, effectively forcing users to
learn a new language (but perhaps not a new syntax).
As a result, techniques for synthesizing hardware from
C either generate inefficient hardware or propose a lan-
guage that merely adopts part of C syntax.
Here, I focus only on the use of C-like languages for
hardware synthesis and deliberately omit discussion of
other important uses of a design language, such as vali-
dation and algorithm exploration. C-like languages are far
more compelling for these tasks, and one in particular,
SystemC, is now widely used, as are many ad hoc variants.
A short history of C
Dennis Ritchie developed C in the early 1970s,3
based on experience with Ken Thompson’s B language,
which had evolved from Martin Richards’ Basic
Combined Programming Language (BCPL). Ritchie
described all three as “close to the machine” in the
sense that their abstractions are similar to data types and
operations supplied by conventional processors.
A core principle of BCPL is its memory model: an
undifferentiated array of words. BCPL represents inte-
gers, pointers, and characters all in a single word; the
language is effectively typeless. This made perfect sense
on the word-addressed machines BCPL was targeting,
but it wasn’t acceptable for the byte-addressed PDP-11
on which C was first developed.
Ritchie modified BCPL’s word array model to add the
familiar character, integer, and floating-point types now
supported by virtually every general-purpose processor.
Ritchie considered C’s treatment of arrays to be charac-
teristic of the language. Unlike other languages that have
explicit array types, arrays in C are almost a side effect
of its pointer semantics. Although this model leads to
simple, efficient implementations, Ritchie observed that
the prevalence of pointers in C means that compilers
must use careful dataflow techniques to avoid aliasing
problems while applying optimizations.
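A short C program (my own example) makes the array-pointer relationship and the resulting aliasing problem concrete; the subscript a[i] is defined as *(a + i):

    #include <stdio.h>

    int main(void) {
        int a[4] = { 10, 20, 30, 40 };
        int *p = a;                /* the array name decays to a pointer */

        printf("%d %d\n", a[2], *(p + 2));   /* prints 30 30 */

        /* An innocuous-looking pointer write silently modifies a[1];
           exactly this aliasing forces compilers into careful
           dataflow analysis before optimizing. */
        *(p + 1) = 99;
        printf("%d\n", a[1]);                /* prints 99 */
        return 0;
    }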
Ritchie listed a number of infelicities in the language
caused by historical accident. For example, the use of
break to separate cases in switch statements arose
because Ritchie copied an early version of BCPL; later
versions used endcase. The precedence of bitwise-AND
is lower than the equality operator because the logical-
AND operator was added later.
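That precedence accident still bites, which is worth a small illustration (my own example). Because == binds tighter than &, a bit test written without parentheses quietly becomes a different expression:

    #include <stdio.h>

    int main(void) {
        int x = 6;                 /* binary 110: bit 2 is set */

        if (x & 4 == 0)            /* parses as x & (4 == 0), i.e., x & 0 */
            printf("never reached\n");

        if ((x & 4) == 0)          /* the intended test */
            printf("bit 2 clear\n");
        else
            printf("bit 2 set\n"); /* this line prints */
        return 0;
    }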
Many aspects of C are greatly simplified from their
BCPL counterparts because of limited memory on the
PDP-11 (24 Kbytes, of which 12 Kbytes were devoted to
the nascent Unix kernel). For example, BCPL allowed
the embedding of arbitrary control flow statements with-
in expressions. This facility doesn’t exist in C, because
limited memory demanded a one-pass compiler.
Thus, C has at least four defining characteristics: a set
of types that correspond to what the processor directly
manipulates, pointers instead of a first-class array type,
several language constructs that are historical accidents,
and many others that are due to memory restrictions.
These characteristics are well-suited to systems soft-
ware programming, C’s original application. C compil-
ers have always produced efficient code because the C
semantics closely match the instruction set of most gen-
eral-purpose processors. This also makes it easy to
understand the compilation process. Programmers rou-
tinely use this knowledge to restructure source code for
efficiency. Moreover, C’s type system, while generally
very helpful, is easily subverted when needed for low-
level access to hardware.
These characteristics are troublesome for synthesiz-
ing hardware from C. Variable-width integers are natur-
al in hardware, yet C supports only four integer sizes,
none smaller than a byte. C's memory model is a large, undifferenti-
ated array of bytes, yet hardware is most effective with
many small, varied memories. Finally, modern compil-
ers can assume that available memory is easily 10,000
times larger than that available to Ritchie.
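As an illustration of the width mismatch (my own example): a datapath that needs a 12-bit counter gets exactly 12 wires in hardware, while C can only approximate such a width with a bitfield, which exists only as a struct member with implementation-defined layout:

    #include <stdio.h>

    /* Closest C approximation of a 12-bit hardware register. */
    struct {
        unsigned counter : 12;
    } r = { 4095 };

    int main(void) {
        r.counter++;                           /* wraps 4095 -> 0, like hardware */
        printf("%u\n", (unsigned)r.counter);   /* prints 0 */
        printf("%zu bytes\n", sizeof r);       /* still padded to whole bytes */
        return 0;
    }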
C-like hardware synthesis languages
Table 1 lists some of the C-like hardware languages
proposed since the late 1980s (see also De Micheli4).
One of the earliest was Cones, from Stroud et al.5 From
a strict subset of C, it synthesized single functions into
combinational blocks. Figure 1 shows such a function.
Cones could handle conditionals; loops, which it
unrolled; and arrays treated as bit vectors.
Ku and De Micheli developed HardwareC6 for input
to their Olympus synthesis system.7 It is a behavioral
hardware language with a C-like syntax and has exten-
sive support for hardware-like structure and hierarchy.
Performance or bust

Throughout this article, I assume that optimizing performance—for example, speed under area and power constraints—is the main goal of hardware synthesis (beyond, of course, functional correctness). This assumption implicitly shapes all my criticisms of using C for hardware synthesis and should definitely be considered carefully.

On the one hand, performance optimization has obvious economic advantages: An efficient circuit solves problems faster, is cheaper to manufacture, requires less power, and so forth. Historically, this has been the key focus of logic synthesis, high-level synthesis, and other automated techniques for generating circuits.

On the other hand, optimization can have disadvantages such as design time and nonrecurring engineering costs. The distinction between full-custom ICs and ASICs illustrates this. A company like Intel, for example, is willing to invest an enormous number of hours in designing and hand-optimizing its next microprocessor's layout because of the volume and margins the company commands. A company like Cisco, however, might implement its latest high-end router on an FPGA because it doesn't make economic sense to design a completely new chip. Both approaches are reasonable.

A key question, then, is: What class of problems does hardware synthesis from C really target? This article assumes an audience of traditional hardware designers who want to design hardware more quickly, but other articles target designers who would otherwise implement their designs in software but need faster results. The soundness of my conclusions may well depend on which side of this fence you're on.
Figure 2 shows the greatest common divisor (GCD) algo-
rithm in HardwareC.
Galloway’s Transmogrifier C is a fairly small C subset
that supports integer arithmetic, conditionals, and loops.8
Unlike Cones, it generates sequential designs by inferring
a state at function calls and at the beginning of while
loops. Figure 3 shows a decoder in Transmogrifier C.
Table 1. C-like languages for hardware synthesis.
Language Comment
Cones Early, combinational only
HardwareC Behavioral synthesis centered
Transmogrifier C Limited scope
SystemC Verilog in C++
Ocapi Algorithmic structural descriptions
C2Verilog Comprehensive
BDL Many extensions and restrictions (NEC)
Handel-C C with CSP (Celoxica)
SpecC Resolutely refinement based
Bach C Untimed semantics (Sharp)
CASH Synthesizes asynchronous circuits
Catapult C ANSI C++ subset (Mentor Graphics)
INPUTS: IN[5];
OUTPUT: OUT[3];
rd53()
{
    int count, i;
    count = 0;
    for (i=0 ; i<5 ; i++)
        if (IN[i] == 1)
            count = count + 1;
    for (i=0 ; i<3 ; i++) {
        OUT[i] = count & 0x01;
        count = count >> 1;
    }
}

Figure 1. A function that returns a count of the number of 1's in a five-bit vector in Cones. The function is translated into a combinational circuit.
#define SIZE 8
process gcd (xi, yi, rst, ou)
    in port xi[SIZE], yi[SIZE];
    in port rst;
    out port ou[SIZE];
{
    boolean x[SIZE], y[SIZE];

    write ou = 0;
    if ( rst ) <
        x = read(xi);
        y = read(yi);
    >
    if ((x != 0) & (y != 0))
        repeat {
            while (x >= y)
                x = x - y;
            <
                x = y; /* swap x and y */
                y = x;
            >
        } until (y == 0);
    else
        x = 0;
    write ou = x;
}

Figure 2. Greatest common divisor algorithm in HardwareC. Statements within a < > block run in parallel; statements within a { } block execute in parallel when data dependencies allow.
#pragma intbits 8
seven_seg(x)
#pragma intbits 4
int x;
{
#pragma intbits 8
    int result;
    x = x & 0xf;
    result = 0;
    if (x == 0x0) result = 0xfc;
    if (x == 0x1) result = 0x60;
    if (x == 0x2) result = 0xda;
    if (x == 0x3) result = 0xf2;
    if (x == 0x4) result = 0x66;
    if (x == 0x5) result = 0xb6;
    if (x == 0x6) result = 0xbe;
    if (x == 0x7) result = 0xe0;
    if (x == 0x8) result = 0xfe;
    if (x == 0x9) result = 0xf6;
    return(~result);
}

twodigit(y)
int y;
{
    int tens;
    int leftdigit, rightdigit;
    outputport(leftdigit, 37, 44, 40, 29, 35, 36, 38, 39);
    outputport(rightdigit, 41, 51, 50, 45, 46, 47, 48, 49);

    tens = 0;
    while (y >= 10) {
        tens++;
        y -= 10;
    }
    leftdigit = seven_seg(tens);
    rightdigit = seven_seg(y);
}

Figure 3. Two-digit decimal-to-seven-segment decoder in Transmogrifier C. Output-port declarations assign pin numbers.
SystemC is a C++ dialect that supports hardware and
system modeling.9 Its popularity stems mainly from its
simulation facilities (it provides concurrency with light-
weight threads), but a subset of the language can be syn-
thesized. SystemC uses the C++ class mechanism to
model hierarchical structure and describes hardware
through combinational and sequential processes, much
as Verilog and VHDL do. Cynlib, from Forte Design
Systems, is similar. Figure 4 shows a decoder in SystemC.
The Ocapi system from IMEC (the Interuniversity
Microelectronics Center in Belgium) is also C++ based
but takes a different approach.10 Instead of being
parsed, analyzed, and synthesized, the C++ program is
run to generate in-memory data structures that repre-
sent the hardware system’s structure. Supplied classes
provide mechanisms for specifying data paths, finite-
state machines (FSMs), and similar constructs. These
data structures are then translated into languages such
as Verilog and passed to conventional synthesis tools.
Figure 5 shows an FSM in Ocapi.
The C2Verilog compiler developed at CompiLogic
(later called C Level Design and, since November 2001,
part of Synopsys) is one of the few compilers that can
claim broad support of ANSI C. It can translate pointers,
recursion, dynamic memory allocation, and other
thorny C constructs. Panchul, Soderman, and Coleman
hold a broad patent covering C-to-Verilog-like transla-
tion, which describes their compiler in detail.11
NEC’s Cyber system accepts BDL.2 Like HardwareC,
Cyber is targeted at behavioral synthesis. BDL has been
in industrial use for many years and deviates greatly
from ANSI C by including processes with I/O ports, hard-
ware-specific types and operations, explicit clock
cycles, and many synthesis-related pragmas.
Celoxica’s Handel-C is a C variant that extends the
language with constructs for parallel statements and
Occam-like rendezvous communication.1 Handel-C’s
timing model is uniquely simple: Each assignment state-
ment takes one cycle. Figure 6 shows a four-place buffer
in Handel-C.
Gajski et al.’s SpecC language12 is a superset of ANSI C,
augmented with many system- and hardware-modeling
constructs, including constructs for FSMs, concurrency,
pipelining, and structure. The latest language reference
manual lists 33 new keywords.13 SpecC imposes a refine-
ment methodology. Thus, the entire language is not direct-
ly synthesizable, but a series of manual and automated
rewrites can refine a SpecC description into one that can
be synthesized. Figure 7 shows a state machine described
in a synthesizable RTL dialect of SpecC.
#include "systemc.h"
#include <stdio.h>

struct decoder : sc_module {
    sc_in<sc_uint<4> > number;
    sc_out<sc_bv<7> > segments;

    void compute() {
        static sc_bv<7> codes[10] = {
            0x7e, 0x30, 0x6d, 0x79, 0x33,
            0x5b, 0x5f, 0x70, 0x7f, 0x7b };
        if (number.read() < 10)
            segments = codes[number.read()];
    }

    SC_CTOR(decoder) {
        SC_METHOD(compute);
        sensitive << number;
    }
};

struct counter : sc_module {
    sc_out<sc_uint<4> > tens;
    sc_out<sc_uint<4> > ones;
    sc_in_clk clk;

    void tick() {
        int one = 0, ten = 0;
        for (;;) {
            if (++one == 10) {
                one = 0;
                if (++ten == 10) ten = 0;
            }
            ones = one;
            tens = ten;
            wait();
        }
    }

    SC_CTOR(counter) {
        SC_CTHREAD(tick, clk.pos());
    }
};

Figure 4. A two-digit, decimal-to-seven-segment decoder in SystemC. The decoder produces combinational logic; the counter produces sequential logic.
[State diagram: S0 --/sfg1--> S1; S1 --eof/sfg2--> S1; S1 --!eof/sfg3--> S0]

fsm f;
initial s0;
state s1;

s0 << always << sfg1 << s1;
s1 << cnd(eof) << sfg2 << s1;
s1 << !cnd(eof) << sfg3 << s0;

Figure 5. FSM described in Ocapi. This is a declarative style executed to build data structures for synthesis rather than compiled in the traditional sense.
Like Handel-C, Sharp’s Bach C is an ANSI C variant
with explicit concurrency and rendezvous communi-
cation.14 However, Bach C only imposes sequencing
rather than assigning a particular number of cycles to
each operation. Also, although it supports arrays, Bach
C does not support pointers.
Budiu and Goldstein’s CASH compiler is unique
among the C synthesizers because it generates asyn-
chronous hardware.15 It accepts ANSI C, identifies
instruction-level parallelism (ILP), and generates an
asynchronous dataflow circuit.
Mentor Graphics’ recent (2004) Catapult C performs
behavioral synthesis from an ANSI C++ subset. Because
it is a commercial product, details of its features and lim-
itations are not publicly available. However, it appears
to be a strict subset of ANSI C++ (that is, with few, if any,
language extensions).
Concurrency
The biggest difference between hardware and software is the execution model. Software follows a sequen-
tial, memory-based execution model derived from
Turing machines, whereas hardware is fundamentally
concurrent. Thus, sequential algorithms that are effi-
cient in software are rarely the best choice in hardware.
This has serious implications for software programmers
designing hardware—their familiar toolkit of algorithms
is suddenly far less useful.
Why is so little software developed for parallel hard-
ware? The plummeting cost of parallel hardware
would make such software appear attractive, yet
concurrent programming has had limited success
compared with its sequential counterpart. One funda-
mental reason is that humans have difficulty conceiv-
ing of parallel algorithms, and thus many more
sequential algorithms exist. Another problem is disagreement about the preferred parallel-programming model (for example, shared memory versus message passing), as demonstrated by the panoply of parallel-programming languages, none of which has emerged as a clear winner.

const dw = 8;

void main(chan (in) c4 : dw, chan (out) c0 : dw)
{
    int d0, d1, d2, d3;
    chan c1, c2, c3;

    void e0() { while (1) { c1 ? d0; c0 ! d0; } }
    void e1() { while (1) { c2 ? d1; c1 ! d1; } }
    void e2() { while (1) { c3 ? d2; c2 ! d2; } }
    void e3() { while (1) { c4 ? d3; c3 ! d3; } }

    par {
        e0(); e1(); e2(); e3();
    }
}

Figure 6. Four-place buffer in Handel-C. The ? and ! operators are CSP-inspired receive and transmit operators.

behavior even(
    in event clk, in unsigned bit[1] rst,
    in bit[31:0] Inport, out bit[31:0] Outport,
    in bit[1] Start, out bit[1] Done,
    out bit[31:0] idata, in bit[31:0] iocount,
    out bit[1] istart, in bit[1] idone,
    in bit[1] ack_istart, out bit[1] ack_idone)
{
    void main(void) {
        bit[31:0] ocount;
        bit[31:0] mask;
        enum state { S0, S1, S2, S3 } state;

        state = S0;

        while (1) {
            wait(clk);
            if (rst == 1b) state = S0;
            switch (state) {
            case S0:
                Done = 0b;
                istart = 0b;
                ack_idone = 0b;
                if (Start == 1b) state = S1;
                else state = S0;
                break;
            case S1:
                mask = 0x0001;
                idata = Inport;
                istart = 1b;
                if (ack_istart == 1b) state = S2;
                else state = S1;
                break;
            case S2:
                istart = 0b;
                ocount = iocount;
                if (idone == 1b) state = S3;
                else state = S2;
                break;
            case S3:
                Outport = ocount & mask;
                ack_idone = 1b;
                Done = 1b;
                if (idone == 0) state = S0;
                else state = S3;
                break;
            }
        }
    }
};

Figure 7. State machine in a synthesizable RTL dialect of SpecC. The wait(clk) statement denotes a clock cycle boundary.
Rather than exposing concurrency to the program-
mer and encouraging the use of parallel algorithms, the
more successful approach has been to automatically
expose parallelism in sequential code. Because C does
not naturally support user-specified concurrency, such
a technique is virtually mandatory for synthesizing effi-
cient hardware from plain C. Unfortunately, these tech-
niques are limited.
Finding parallelism in sequential code
There are three main approaches to exposing paral-
lelism in sequential code, distinguished by their granu-
larity. Instruction-level parallelism (ILP) dispatches
groups of nearby instructions simultaneously. Although
this has become the preferred approach in the com-
puter architecture community, programmers recognize
that there are fundamental limits to the amount of ILP
that can be exposed in typical programs.16 Adding hard-
ware to approach these limits, usually through specu-
lation, results in diminishing returns.
The second approach, pipelining, requires less hard-
ware than ILP but can be less effective. A pipeline dis-
patches instructions in sequence but overlaps
them—the second instruction starts before the first com-
pletes. Like ILP, interinstruction dependencies and con-
trol-flow transfers tend to limit the maximum amount of
achievable parallelism. Pipelines work well for regular
loops, such as those in scientific or signal-processing
applications, but are less effective in general.
The third approach, process-level parallelism, dis-
patches multiple threads of control simultaneously. This
approach can be more effective than finer-grained par-
allelism, depending on the algorithm, but process-level
parallelism is difficult to identify automatically. Hall et
al. attempt to invoke multiple iterations of outer loops
simultaneously,17 but unless the code is written to avoid
dependencies, this technique might not be effective.
Exposing process-level parallelism is thus usually the pro-
grammer’s responsibility. Such parallelism is usually con-
trolled through the operating system (for example, Posix
threads) or the language itself (for example, Java).
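As a rough illustration (plain C; the arrays and names here are mine, not from any of the tools discussed), the first loop below has fully independent iterations that a parallelizing compiler can dispatch simultaneously, whereas the second carries a dependence from each iteration to the next and is essentially sequential as written:

void contrast(int a[8], int b[8], int y[8], int x[8])
{
    int i;

    /* Independent iterations: all eight multiplies can run in parallel. */
    for (i = 0; i < 8; i++)
        y[i] = a[i] * b[i];

    /* Loop-carried dependence: each iteration reads the previous
       iteration's result, so the loop resists automatic parallelization. */
    for (i = 1; i < 8; i++)
        x[i] = x[i-1] * a[i] + b[i];
}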
Approaches to concurrency
The C-to-hardware compilers considered here take
either of two approaches to concurrency. The first
approach adds parallel constructs to the language,
thereby forcing the programmer to expose most of the
concurrency. SystemC, BDL, and Ocapi all provide
process-level parallel constructs. HardwareC, Handel-
C, SpecC, and Bach C additionally provide statement-
level parallel constructs. SystemC’s parallelism
resembles that of standard hardware description lan-
guages (HDLs) such as Verilog, in which a system is a
collection of clock-edge-triggered processes. Hard-
wareC, Handel-C, SpecC, and Bach C’s approaches are
more like software, providing constructs that dispatch
collections of instructions in parallel.
The other approach lets the compiler identify paral-
lelism. Although the languages that provide parallel
constructs also identify some parallelism, Cones,
Transmogrifier C, C2Verilog, Catapult C, and CASH rely
on the compiler to expose all possible parallelism. The
Cones compiler takes the most extreme approach, flat-
tening an entire C function with loops and conditionals
into a single two-level combinational function evaluat-
ed in parallel. The CASH compiler takes an approach
closer to compilers for VLIW processors, carefully exam-
ining interinstruction dependencies and scheduling
instructions to maximize parallelism. None of these
compilers attempts to identify process-level parallelism.
Both approaches have drawbacks. The latter
approach places the burden on the compiler and there-
fore limits the parallelism achievable with normal,
sequential algorithms. Although carefully selecting eas-
ily parallelized algorithms could mitigate this problem,
such thinking is foreign to most software programmers
and may be more difficult than thinking in an explicitly
concurrent language.
The former approach, by adding parallel constructs
to C, introduces a fundamental and far-reaching change
to the language, again demanding substantially differ-
ent thinking by the programmer. Even for a programmer
experienced in concurrent programming with, say,
Posix threads, the parallel constructs in hardware-like
languages differ greatly from the thread-and-shared-
memory concurrency model typical of software.
A good hardware specification language must be
able to express parallel algorithms, because they are the
most efficient for hardware. Its inherent sequentiality
and often undisciplined use of pointers make C a poor
choice for this purpose.
Which concurrency model the next hardware design
language should employ remains an open question, but
the usual software model—asynchronously running
threads communicating through shared memory—is
clearly not the one.
Timing
The C language is mute on the subject of time. It
guarantees causality among most sequences of state-
ments but says nothing about the amount of time it
takes to execute each sequence. This flexibility simpli-
fies life for compilers and programmers alike but makes
it difficult to achieve specific timing constraints. C’s
compilation technique is transparent enough to make
gross performance improvements easy to understand
and achieve, and the relative efficiency of sequential
algorithms is a well-studied problem. Nevertheless,
wringing another 5% speedup from an arbitrary piece
of code can be difficult.
Achieving a performance target is fundamental to
hardware design. Miss a timing constraint by a few per-
centage points and the circuit will fail to operate or the
product will fail to sell. Achieving a performance target
under power and cost constraints is usually the only rea-
son to implement a particular function in hardware
rather than using an off-the-shelf processor. Thus, an ade-
quate hardware specification technique needs mecha-
nisms for specifying and achieving timing constraints.
This disparity leads to yet another fundamental ques-
tion in using C-like languages for hardware design: where
to put the clock cycles. Figure 8 shows a program frag-
ment that is interpreted in at least three different ways by
different compilers. Most of the compilers described here
generate synchronous logic in which the clock cycle
boundaries have been defined. There are only two
exceptions: Cones and CASH. Cones only generates com-
binational logic; CASH generates self-timed logic.
Compilers use various techniques for inserting clock
cycle boundaries, which range from fully explicit to
fully implicit. Ocapi’s clocks are the most explicit. The
designer specifies explicit state machines, and each
state gets a cycle. At some point in the SpecC refine-
ment flow, the state machines are also explicit, although
clock boundaries might not be explicit earlier in the
flow. The clocks in the Cones system are also explicit,
but in an odd way—because Cones generates only
combinational logic, clocks are implicit at function
boundaries. SystemC’s clock boundaries are also explic-
it; as in Cones, the clock boundaries of combinational
processes are at the edges, and in sequential processes,
explicit wait statements delay a prescribed number of
cycles. BDL takes a similar approach.
HardwareC lets the user specify clock constraints, an
approach common in high-level synthesis tools. For
example, the user can require that three particular state-
ments should execute in two cycles. This presents a
greater challenge to the compiler and is sometimes more
subtle for the designer, but it allows flexibility that can
lead to a better design. Bach C takes a similar approach.
Like HardwareC, the C2Verilog compiler also inserts
cycles using fairly complex rules and provides mecha-
nisms for imposing timing constraints. Unlike HardwareC,
however, these constraints are outside the language.
Transmogrifier C and Handel-C use fixed implicit
rules for inserting clocks. Handel-C’s are the simplest:
Each assignment and delay statement takes one cycle;
everything else executes in the same clock cycle.
Transmogrifier C’s rules are nearly as simple: Each loop
iteration and function call takes a cycle. Unfortunately,
such simple rules can make it difficult to achieve a par-
ticular timing constraint. To speed up a Handel-C spec-
ification, assignment statements might require fusing,
and Transmogrifier C might require loops to be manu-
ally unrolled.
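For instance (a sketch reusing the par construct from Figure 6; the variable names are mine), fusing sequential assignments into a parallel block cuts the cycle count under Handel-C's one-cycle-per-assignment rule:

// Three cycles: assignments execute in sequence, one cycle each.
a = b;
c = d;
e = f;

// One cycle: par dispatches the three assignments together.
par {
    a = b;
    c = d;
    e = f;
}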
The ability to specify or constrain detailed timing
in hardware is another fundamental requirement.
Whereas slow software is an annoyance, slow hardware
is a disaster. When something happens in hardware is
usually as important as what happens. This is another
big philosophical difference between software and
hardware, and again hardware requires different skills.
A good hardware specification language needs the
ability to specify detailed timing, both explicitly and
through constraints, but should not require the programmer to provide too many details. The best-effort
model of software is inadequate by itself.
for (i = 0 ; i < 8 ; i++) {
    a[i] = c[i];
    b[i] = d[i] || f[i];
}

Figure 8. It is not clear how many cycles it should take to execute this (contrived) loop written in C. Cones does it in one (it is combinational), Transmogrifier C chooses eight (one per iteration), and Handel-C chooses 25 (one per assignment). Others, such as HardwareC, allow the user to specify the number.

Types
Data types are another central difference between hardware and software languages. The most fundamental type in hardware is a single bit traveling through a memoryless wire. By contrast, each base type in C and C++ is one or more bytes stored in memory. Although C's
base types can be implemented in hardware, C has
almost no support for types smaller than a byte. (The one
exception is that the number of bits for each field in a
struct can be specified explicitly. Oddly, none of these
languages even mimics this syntax.) As a result, straight
C code can easily be interpreted as bloated hardware.
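For reference, that bit-field syntax is the following (standard C; the field names are illustrative):

struct control {
    unsigned ready : 1;  /* exactly 1 bit */
    unsigned mode  : 3;  /* exactly 3 bits */
    unsigned count : 4;  /* exactly 4 bits */
};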
Compilers take three approaches to introducing
hardware types to C programs. The first, and perhaps
the purest, neither modifies nor augments C’s types but
allows the compiler or designer to adjust the width of
the integer types outside the language. For example, the
C2Verilog compiler provides a GUI that lets the user set
the width of each variable in the program. In
Transmogrifier C, the user can set each integer’s width
through a preprocessor pragma.
The second approach is to add hardware types to the
C language. HardwareC, for instance, adds a Boolean
vector type. Handel-C, Bach C, and BDL add integers
with an explicit width. SpecC adds all these types and
many others that cannot be synthesized, such as pure
events and simulated time.
The third approach, used by C++-based languages,
is to provide hardware-like types through C++’s type sys-
tem. C++ supports a one-bit Boolean type by default,
and its class mechanism makes it possible to add more
types, such as arbitrary-width integers, to the language.
The SystemC libraries include variable-width integers
and an extensive collection of types for fixed-point frac-
tional numbers. Ocapi, because it is an algorithmic
mechanism for generating structure, also effectively
takes this approach, letting the user explicitly request
wires, buses, and so on. Catapult C presumably has a
similar library of hardware-like types.
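A small sketch of this third approach, using types from the SystemC library seen in Figure 4 (the widths chosen here are arbitrary):

#define SC_INCLUDE_FX   // enable SystemC's fixed-point types
#include "systemc.h"

sc_uint<5>     addr;    // 5-bit unsigned integer
sc_bv<7>       segs;    // 7-bit bit vector, as in Figure 4
sc_fixed<8, 3> gain;    // 8-bit fixed-point value with 3 integer bits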
Each approach, however, is a fairly radical departure
from C’s call-it-an-integer-and-forget-about-it approach.
Even the languages that support only C types compel a
user to provide each integer’s actual size. Worrying
about the width of each variable in a program is not
something a typical C programmer does.
Compared with timing and concurrency, however,
adding appropriate hardware types is a fairly easy prob-
lem to solve when adapting C to hardware. C++’s type sys-
tem is flexible enough to accommodate hardware types,
and minor extensions to C suffice. A larger question,
which none of the languages adequately addresses, is
how to apply higher-level types such as classes and inter-
faces to hardware description. SystemC has some facili-
ties for inheritance, but the inheritance mechanism is
simply the one used for software; it is not clear that this
mechanism is convenient for adding to or modifying the
behavior of existing hardware. Incidentally, SystemC has
supported more high-level modeling constructs such as
templates and more elaborate communication protocols
since version 2.0, but they are not typically synthesizable.
A good HDL needs a rich type system that allows pre-
cise definition of hardware types, but it should also
assist in ensuring program correctness. C++’s type sys-
tem is definitely an improvement over C’s in this regard.
Communication
C-like languages are built on the very flexible RAM
communication model. They implicitly treat all memo-
ry locations as equally costly to access, but this is not
true in modern memory hierarchies. At any point, it can
take hundreds or even thousands of times longer to
access certain locations. Designers can often predict the
behavior of these memories, specifically caches, and
use them more efficiently. But doing so is very difficult,
and C-like languages provide scant support for it.
Long, nondeterministic communication delays are
anathema in hardware. Timing predictability is manda-
tory, so large, uniform-looking memory spaces are rarely
the primary communication mechanism. Instead, hard-
ware designers use various mechanisms, ranging from
simple wires to complex protocols, depending on the
system’s needs. An important characteristic of this
approach is the need to understand a system’s com-
munication channels and patterns before it is running
because communication channels must be hardwired.
The problem with pointers
Communication patterns in software are often diffi-
cult to determine a priori because of the frequent use
of pointers. These are memory addresses computed at
runtime, and as such are often data dependent and can-
not be known completely before a system is running.
Implementing such behavior in hardware mandates, at
least, small memory regions.
Aliasing, when a single value can be accessed
through multiple sources, is an even more serious prob-
lem. Without a good understanding of when a variable
can be aliased, a hardware compiler must place that
variable into a large, central memory, which is neces-
sarily slower than a small memory local to the compu-
tational units that read and feed it.
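A minimal C example of the problem (the names are mine):

void update(int *a, int *b)
{
    *a = *a + 1;  /* if a and b alias, this write also changes *b */
    *b = *b * 2;
}
/* Unless analysis proves that a and b can never point to the same
   location, a hardware compiler must keep both variables behind one
   shared, and therefore slower, memory. */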
One of C’s strengths is its flexible memory model,
which allows complicated pointer arithmetic and essen-
tially uncontrolled memory access. Although very use-
ful for system programs such as operating systems, these
abilities make analyzing an arbitrary C program’s com-
munication patterns especially difficult. The problem is
so great, in fact, that software compilers often have an
easier time analyzing a Fortran program than an equiv-
alent C program.
Any technique that implements a C-like program in
hardware must analyze the program to understand all
possible communication pathways; resort to large, slow
memories; or do some combination of the two.
Séméria, Sato, and De Micheli applied pointer analy-
sis algorithms from the software compiler literature to esti-
mate the communication patterns of C programs for
hardware synthesis.18 Although this is an impressive body
of work, it illustrates the difficulty of the problem. Pointer
analysis identifies the data to which each pointer can
refer, allowing memory to be divided. Solving the point-
er analysis problem precisely is undecidable, so
researchers use approximations. These are necessarily
conservative and hence might miss opportunities to split
memory regions, leading to higher-cost implementations.
Finally, pointer analysis is a costly algorithm with
many variants.
Communication costs
Software's event-oriented communication style is
another key difference from hardware. Every bit of data
communicated among parts of a software program has
a cost (that is, a read or write operation to registers or
memory), and thus communication must be explicitly
requested in software. Communicating the first bit is
very costly in hardware because it requires the addition
of a wire, but after that, communication is actually more
costly to disable than to continue.
This difference leads to a different set of concerns.
Good hardware communication design tries to mini-
mize the number of pathways among parts of the
design, whereas good software design minimizes the
number of transactions. For example, good software
design avoids forwarding through copying, preferring
instead to pass a reference to the data being forwarded.
This is a good strategy for hardware that stores large
blocks of data in memory, but is rarely appropriate in
other cases. Instead, good hardware design considers
alternate data encodings, such as serialization.
Communication approaches
The languages considered here fall broadly into two
groups: those that effectively ignore C’s memory model
and look only at communication through variables, and
those that adopt the full C memory model.
Languages that ignore C’s memory model don’t sup-
port arrays or pointers. Instead they look only at how
local variables communicate between statements.
Cones is the simplest; all variables, arrays included, are
interpreted as wires. HardwareC and Transmogrifier C
don’t support arrays or memories. Ocapi also falls into
this class, although arrays and pointers can assist during
system construction. BDL is perhaps the richest of this
group, supporting multidimensional arrays, but it doesn’t
support pointers or dynamic memory allocation.
Languages in the second group go to great lengths to
preserve C’s memory model. The CASH compiler takes
the most brute-force approach. It synthesizes one large
memory and puts all variables and arrays into it. The
Handel-C and C2Verilog compilers can split memory into
multiple regions and assign each to a separate memory
element. Handel-C adds explicit constructs to the lan-
guage for specifying these elements. SystemC also sup-
ports explicit declaration of separate memory regions.
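As a sketch of the idea (a Handel-C-style declaration; treat the exact syntax as illustrative rather than authoritative):

ram int 8 buffer[64];  // 64-entry, 8-bit-wide array bound to its own RAM block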
Other languages provide communication primitives
whose semantics differ greatly from C’s memory style of
communication. HardwareC, Handel-C, and Bach C
provide blocking, rendezvous-style (unbuffered) com-
munication primitives for communicating between con-
currently running processes. SpecC and later versions
of SystemC provide a large library of communication
primitives.
Again, the difference between appropriate software
and hardware design is substantial. Software designers
usually ignore memory access patterns. Although this
can slow overall memory access speed, it is usually
acceptable. Good hardware design, in contrast, usual-
ly starts with a block diagram detailing every commu-
nication channel and attempts to minimize
communication pathways.
So, software designers usually ignore the funda-
mental communication cost issues common in hard-
ware. Furthermore, automatically extracting efficient
communication structures from software is challenging
because of the pointer problem in C-like languages.
Although pointer analysis can help mitigate the prob-
lem, it is imprecise and cannot improve an algorithm
with poor communication patterns.
A good hardware specification language should make
it easy to specify efficient communication patterns.
Metadata
A high-level construct can be implemented in many
different ways. However, because hardware is at a far
lower level than software, there are many more ways to
implement a particular C construct in hardware. For
example, consider an addition operation. A processor
probably has only one useful addition instruction,
whereas in hardware there are a dizzying number of dif-
ferent adder architectures—for example, ripple carry,
carry look-ahead, and carry save.
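To make the contrast concrete, here is a rough C sketch of what the ripple-carry scheme computes; each stage's carry feeds the next, whereas a carry-look-ahead adder computes the carries in parallel at the cost of extra logic:

/* Bit-level view of a 4-bit ripple-carry addition (a sketch, not a
   synthesis input): stage i cannot finish before stage i-1's carry. */
unsigned ripple_add4(unsigned a, unsigned b)
{
    unsigned sum = 0, carry = 0;
    int i;
    for (i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1;
        unsigned bi = (b >> i) & 1;
        sum |= (ai ^ bi ^ carry) << i;            /* sum bit i */
        carry = (ai & bi) | (carry & (ai ^ bi));  /* carry into stage i+1 */
    }
    return sum;  /* low four bits hold a + b mod 16 */
}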
The translation process for hardware therefore has
more decisions to make than translation for software.
Making many decisions correctly is difficult and compu-
tationally expensive. Furthermore, the right set of deci-
sions varies with design constraints. For example, a
designer might prefer a ripple-carry adder if area and
power are at a premium and speed is a minor concern,
but a carry-look-ahead adder if speed is a greater concern.
Much effort has gone into improving optimization
algorithms, but it remains unrealistic to expect all these
decisions to be automated. Instead, designers need
mechanisms that let them ask for exactly what they
want. Such designer guidance takes two forms: manu-
al rewriting of high-level constructs into the desired
lower-level ones (for example, replacing a “+” operator
with a collection of gates that implement a carry-look-
ahead adder) or annotations such as constraints or hints
about how to implement a particular construct. Both
are common RTL design approaches. Designers rou-
tinely specify complex data paths at the gate level
instead of using higher-level constructs. Constraint infor-
mation, often supplied in an auxiliary file, usually drives
logic optimization algorithms.
Although it might seem possible to use C++’s opera-
tor-overloading mechanism to specify, for example,
when a carry-look-ahead adder should implement an
addition, using this mechanism is probably very diffi-
cult. C++’s overloading mechanism uses argument types
to resolve ambiguities, which is natural when you want
to treat different data types differently. But the choice
of algorithm in hardware is usually driven by resource
constraints (such as area or delay) rather than data rep-
resentation (although, of course, data representation
does matter). Concurrency is the fundamental problem.
In software, there is little reason to have multiple imple-
mentations of the same algorithm, but it happens all the
time in hardware. Not surprisingly, C++ doesn’t support
this sort of thing.
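A sketch of why (the wrapper types here are hypothetical, written by me): to select between two adder implementations through C++ overloading, the operands must be wrapped in artificially distinct types, because overload resolution keys on types rather than on area or delay budgets:

struct ripple_int { int v; };  /* hypothetical "use ripple carry" wrapper */
struct cla_int    { int v; };  /* hypothetical "use carry look-ahead" wrapper */

ripple_int operator+(ripple_int a, ripple_int b) { ripple_int r = { a.v + b.v }; return r; }
cla_int    operator+(cla_int a,    cla_int b)    { cla_int    r = { a.v + b.v }; return r; }

/* Both overloads compute the same value; only a synthesis tool could
   map them to different gate-level adders, and the type split exists
   purely to steer overload resolution. */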
The languages considered here take two approach-
es to specifying such metadata. One group places it
within the program itself, hiding it in comments, prag-
mas, or added constructs. The other group places it out-
side the program, either in a text file or in a database
populated by the user through a GUI.
C has a standard way of supplying extra information
to the compiler: the #pragma directive. By definition, a
compiler ignores such lines unless it understands them.
Transmogrifier C uses the directive to specify integer
width, and Bach C uses it to specify timing and mapping
constraints. HardwareC provides three language-level
constructs: timing constraints, resource constraints, and
arbitrary string-based attributes, whose semantics are
much like a C #pragma. BDL has similar constructs.
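The pattern is simple: a compiler skips any #pragma it does not recognize, so hardware-oriented hints can ride along in code that still compiles as ordinary C. Figure 3 already showed the Transmogrifier C form:

#pragma intbits 8   /* read by Transmogrifier C; silently ignored elsewhere */
int result;         /* synthesized 8 bits wide; an ordinary int in software */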
SpecC takes the other approach; many tools for syn-
thesizing and refining SpecC require the user to speci-
fy, using a GUI, how to interpret various constructs.
Constructs such as addition, which are low level in
software, are effectively high level in hardware. Thus,
there must be a mechanism for conveying designer
intent to any hardware synthesis procedure, regardless
of the source language. A good hardware specification
language needs a way of guiding the synthesis proce-
dure to select among different implementations, trad-
ing off between, say, power and speed.
WHY BOTHER generating hardware from C? It is clearly
not necessary, because there are many excellent proces-
sors and software compilers, which are certainly the
cheapest and easiest way to run a C program. So why
consider using hardware? Efficiency is the logical answer.
Although general-purpose processors get the job done,
well-designed customized hardware can always do it
faster, using fewer transistors and less energy. Thus, the
utility of any hardware synthesis procedure depends on
how well it produces efficient hardware specialized for
an application. Table 2 summarizes the key challenges
of a successful hardware specification language.
Concurrency is fundamental for efficient hardware,
but C-like languages impose sequential semantics and
require the use of sequential algorithms. Automatically
exposing concurrency in sequential programs is limit-
ed in effectiveness, so a successful language requires
explicit concurrency, something missing from most
Table 2. The big challenges for hardware languages.
Challenge Comment
Concurrency model Specifying parallel algorithms
Specifying timing How many clock cycles?
Types Need bits and bit-precise vectors
Communication patterns Need isolated memories
Hints and constraints How to implement something
C-like languages. Adding such a construct is easy, but
teaching software programmers to use concurrent algo-
rithms is difficult.
Careful timing design is also required for efficient
hardware, but C-like languages provide essentially no
control over timing, so the language needs added tim-
ing control. The problem amounts to where to put the
clock cycles, and the languages offer a variety of solu-
tions, both implicit and explicit. The bigger problem,
though, is changing programmer habits to consider
such timing details.
Using software-like types is also a problem in hard-
ware, which wants to manipulate individual bits for effi-
ciency. The problem is easier to solve for C-like
languages. Some languages add the ability to specify
the number of bits used for each integer, for example,
and C++’s flexible type system allows hardware types to
be defined. The type problem is the easiest to address.
Communication also presents a challenge. C’s flexi-
ble global-memory communication model is not effi-
cient for hardware. Instead, memory should be broken
into smaller regions, often as small as a single variable.
Compilers can do so to a limited degree, but efficiency
often demands explicit control over this. A fundamen-
tal problem, again, is that C programmers generally
don’t worry about memory, and C programs are rarely
written with memory behavior in mind.
A high-level HDL must let the designer provide con-
straints or hints to the synthesis system because of the
wide semantic gap between a C program and efficient
hardware. There are many ways to implement a con-
struct such as addition in hardware, so the synthesis sys-
tem needs a way to select an implementation.
Constraints and hints are the two main ways to control
the algorithm, but standard C has no such facility.
Although presenting designers with a higher level of
abstraction is obviously desirable, presenting them with
an inappropriate level of abstraction—one in which
they cannot effectively ask for what they want—is not
much help. Unfortunately, C-like languages, because
they provide abstractions geared toward the generation
of efficient software, do not naturally lend themselves
to the synthesis of efficient hardware.
The next great hardware specification language
won’t closely resemble C or any other familiar software
language. Software languages work well only for soft-
ware, and a hardware language that does not produce
efficient hardware is of little use. Another important
issue will be the language’s ability to build systems from
existing pieces (known as IP-based design), which none
of these languages addresses. This ability appears nec-
essary to raise designer productivity to the level need-
ed for the next generation of chips.
Looming over all these issues, however, is verification.
What we really need are languages that let us create cor-
rect systems faster by making it easier to check for, iden-
tify, and correct mistakes. Raising the abstraction level
and facilitating efficient simulation are two well-known
ways to achieve this, but are there others? ■
Acknowledgments
Edwards is supported by the National Science
Foundation, Intel, Altera, the SRC, and New York
State’s NYSTAR program.
References
1. Handel-C Language Reference Manual, RM-1003-4.0,
Celoxica, 2003.
2. K. Wakabayashi and T. Okamoto, “C-Based SoC Design
Flow and EDA Tools: An ASIC and System Vendor Per-
spective,” IEEE Trans. Computer-Aided Design of Inte-
grated Circuits and Systems, vol. 19, no. 12, Dec. 2000,
pp. 1507-1522.
3. D.M. Ritchie, “The Development of the C Language,”
History of Programming Languages-II, T.J. Bergin Jr.
and R.G. Gibson Jr., eds., ACM Press and Addison-Wes-
ley, 1996.
4. G. De Micheli, “Hardware Synthesis from C/C++
Models,” Proc. Design, Automation and Test in Europe
(DATE 99), IEEE Press, 1999, pp. 382-383.
5. C.E. Stroud, R.R. Munoz, and D.A. Pierce, “Behavioral
Model Synthesis with Cones,” IEEE Design & Test, vol.
5, no. 3, July 1988, pp. 22-30.
6. D.C. Ku and G. De Micheli, HardwareC: A Language for
Hardware Design, Version 2.0, tech. report CSTL-TR-
90-419, Computer Systems Lab, Stanford Univ., 1990.
7. G. De Micheli et al., “The Olympus Synthesis System,”
IEEE Design & Test, vol. 7, no. 5, Oct. 1990, pp. 37-53.
8. D. Galloway, “The Transmogrifier C Hardware Descrip-
tion Language and Compiler for FPGAs,” Proc. Symp.
FPGAs for Custom Computing Machines (FCCM 95),
IEEE Press, 1995, pp. 136-144.
9. T. Grötker et al., System Design with SystemC, Kluwer
Academic Publishers, 2002.
10. P. Schaumont et al., “A Programming Environment for
the Design of Complex High Speed ASICs,” Proc. 35th
Design Automation Conf. (DAC 98), ACM Press, 1998,
pp. 315-320.
11. Y. Panchul, D.A. Soderman, and D.R. Coleman, System
for Converting Hardware Designs in High-Level
Programming Language to Hardware Implementations,
US patent 6,226,776, Patent and Trademark Office,
2001.
12. D.D. Gajski et al., SpecC: Specification Language and
Methodology, Kluwer Academic Publishers, 2000.
13. R. Dömer, A. Gerstlauer, and D. Gajski, SpecC
Language Reference Manual, Version 2.0, SpecC Con-
sortium, 2001.
14. T. Kambe et al., “A C-Based Synthesis System, Bach,
and Its Application,” Proc. Asia South Pacific Design
Automation Conf. (ASP-DAC 01), ACM Press, 2001, pp.
151-155.
15. M. Budiu and S.C. Goldstein, “Compiling Application-
Specific Hardware,” Proc. 12th Int’l Conf. Field-Program-
mable Logic and Applications (FPL 02), LNCS 2438,
Springer-Verlag, 2002, pp. 853-863.
16. D.W. Wall, “Limits of Instruction-Level Parallelism,” Proc.
4th Int’l Conf. Architectural Support for Programming Lan-
guages and Operating Systems (ASPLOS 91), Sigplan
Notices, vol. 26, no. 4, ACM Press, 1991, pp. 176-189.
17. M.W. Hall et al., “Detecting Coarse-Grain Parallelism
Using an Interprocedural Parallelizing Compiler,” Proc.
Supercomputing Conf., IEEE Press, 1995, p. 49.
18. L. Séméria, K. Sato, and G. De Micheli, “Synthesis of
Hardware Models in C with Pointers and Complex Data
Structures,” IEEE Trans. Very Large Scale Integration
(VLSI) Systems, vol. 9, no. 6, Dec. 2001, pp. 743-756.
Stephen A. Edwards is an associate professor in the Computer Science Department of Columbia University. His research interests include embedded-system design, domain-specific languages, and compilers. Edwards has a BS from the California Institute of Technology and an MS and a PhD from the University of California, Berkeley, all in electrical engineering. He is an associate editor of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. He is a senior member of the IEEE.

Direct questions and comments about this article to Stephen A. Edwards, Dept. of Computer Science, Columbia University, 1214 Amsterdam Ave. MC 0401, New York, NY 10027; [email protected].
A different view: Hardware synthesis from SystemC is a maturing technology
John Sanguinetti
Forte Design Systems
IN “THE CHALLENGES of Synthesizing Hardware
from C-Like Languages,” Stephen Edwards has provided a
good survey of the many attempts to adapt C to hardware
creation. His thesis, that “pure C is a poor choice for spec-
ifying hardware,” was recognized by all the people doing
this work. Unfortunately, he does not recognize the evo-
lution of these C variations. The last in this line, SystemC,
has satisfactorily addressed all the language issues.
Edwards acknowledges that using pragmas to direct the
synthesis process is a satisfactory way to provide the nec-
essary additional information for efficient hardware cre-
ation, yet he criticizes the language for not providing a
different means of doing so. In Table 2 of the article, two of
the five listed challenges are language issues (concurren-
cy and data types), and three are synthesis issues. Edwards
acknowledges that SystemC adequately solves the con-
currency model and types challenges but seems unaware
that existing modern synthesis products have solved the
other three—specifying timing, communication patterns,
and hints and constraints—with pragmas.
Problems have largely been solved
Confusing language with the synthesis process,
Edwards comes to the conclusion that C-like languages
“do not naturally lend themselves to the synthesis of effi-
cient hardware.” That is simply wrong. Commercial Sys-
temC synthesis tools routinely produce more efficient
hardware than handwritten RTL code typically pro-
duces. Edwards argues that properties of C-like lan-
guages make this synthesis process computationally
hard and time-consuming. Although some of the prop-
erties he has cited do make synthesis more difficult,
those problems have largely been solved. Fundamen-
tally, the complexity imposed on these synthesis prod-
ucts results from starting at a higher abstraction level,
not from the language.
Little trouble for competent hardware designer
Edwards says, "My main point is that giving C pro-
grammers tools is not enough to turn them into reason-
able hardware designers.” This statement is unarguably
true. Giving people C compilers is not enough to turn
them into reasonable programmers either. Tuning code
for performance has long been recognized as a separate
skill, closely related to the underlying target processor. For
efficient performance, vector, SIMD (single instruction,
multiple data), SMP (symmetric multiprocessing), and
VLIW (very long instruction word) machines all require
special techniques, encompassing both coding style and
pragmas. It should surprise no one that when the under-
lying target processor is raw gates, additional skill and
knowledge are required. In fact, a competent hardware
designer has little trouble creating efficient hardware
using SystemC and a modern synthesis product.
IN THE END, though, Edwards’ thesis is beside the
point. As IC capacity increases, it is becoming routine
to implement increasingly larger algorithms in hard-
ware, for the performance and efficiency reasons
Edwards cites. Those algorithms nearly always start out
in C or C++. It is far better to operate on the original ver-
sion directly than to manually translate it to a different
language before beginning to transform it into hard-
ware. Recognition of this fact has motivated most of the
efforts surveyed here. Sure, there are challenges, but the
benefits are worth it. ■
ITC Special Section
Guest Editor's Introduction: ITC Helps Get More out of Test
Kenneth M. Butler
Texas Instruments
THIS SPECIAL SECTION of IEEE Design & Test of
Computers, along with the International Test Conference
2006, highlights the value that test adds to the electronics
manufacturing business. It leads us to think about test in
a whole new way.
The theme for ITC 2006 is “Getting More out of Test,”
which is very appropriate in light of recent advances
and changes in our industry. These days, everybody is
talking about things like design for manufacturability
(DFM), yield enhancement technologies, test-based out-
lier techniques, and the like. Based on these concepts,
whole companies have been founded and have pros-
pered, such as PDF Solutions, whose CEO was keynote
speaker at ITC 2005. What makes these developments
truly exciting is the role test plays in all of these new
technologies. Test is truly the cornerstone on which the
disciplines of yield and reliability engineering are built.
And we’re not just talking about characterization test or
an occasional product lot, but large production vol-
umes analyzed with new and ever-more powerful data
mining and data reduction techniques.
We have also had to rethink what it means for a die,
chip, board, or system to “pass” or “fail” a test. In the
early days, particularly for digital products, we could
always devise a test whose results were clear indicators
of good or bad units. Yes, there was (and is) the peren-
nial question of the test’s coverage or thoroughness. But,
that aspect related more to the effort level expended to
incorporate good test-access mechanisms into the
design and less to the technology in which the product
was manufactured. Today, however, we see ample evi-
dence of electronics failure mechanisms’ increasingly
subtle nature. We can view this problem from two per-
spectives: the “time zero” or “test escape” question, and
the separate but equally important reliability aspect.
A good example of the former is the relatively recent
proliferation of fault models and test approaches that
various groups are advocating. Everybody continues to
rely on the workhorse stuck-at fault model for bulk sta-
tic defect coverage. But how long will that strategy con-
tinue to work for us? At what point must we supplement,
or dare I say replace, stuck-at testing with other candi-
date test techniques such as N-detect tests, extracted
bridging fault tests, or other nontraditional forms of test-
ing? Authors in this magazine, at ITC, and at other
venues continue to grapple with this question.
On the reliability side, the underlying mechanisms,
such as channel hot carrier (CHC) effects and negative
bias temperature instability (NBTI), have always been
there. We have known about them for decades, but
their impact on quality and product lifetime was rela-
tively invisible to us. Unfortunately, that statement is no
longer true. NBTI and other reliability mechanisms
degrade product lifetime and performance and
demand that we add margins for their occurrence. So,
again, we must call on test to help us identify these prob-
lems when they occur, quantify the magnitude of the
yield/reliability impact, and screen the material before
it gets into the consumer’s hands. Overall, therefore, we
can see that test must play an ever-more-important role
in more aspects of the electronics business.
The first article in this special section, “Extracting
Defect Density and Size Distributions from Product ICs”
by Jeffrey Nelson et al., is a classic example of learning
all you can about the manufacturing process via pro-
duction test. Today, the cost to construct and populate
an IC wafer fabrication facility is measured in billions
of dollars, and the cost of a mask set in an advanced
technology is approaching or can exceed $1 million.
The inevitable outcome of these spiraling costs is that
fewer companies can afford to maintain captive IC man-
ufacturing sites and thus are moving to fabless, foundry-
based business models. But how do you learn and
respond to important yield and defect Pareto informa-
tion when design and manufacturing are in two com-
pletely separate companies, often geographically
distant from each other, without having to devote cost-
ly wafer volume to test vehicles? This article addresses
that important and timely question.
“Improving Transition Delay Test Using a Hybrid
Method” by Nisar Ahmed and Mohammad Tehranipoor
deals with the increasingly complex subject of delay test.
Starting somewhere around the 130-nm technology
node, and perhaps spurred by the advent of copper met-
allization, delay defects suddenly became something
that, left untested, could result in too large an escape rate
as seen by the customer. The industry responded in
earnest by applying delay test techniques to large num-
bers of production ICs. Immediately, users of this tech-
nology discovered issues with things like pattern volume,
realizable coverage, and test generation tool runtimes.
This article is an example of the types of new thinking
being applied to this problem to make delay test more
tractable and more usable, thus getting more out of it.
The final article, “Impact of Thermal Gradients on
Clock Skew and Testing” by Sebastià Bota et al., in some
sense turns the ITC theme on its ear. To get more out of
test, we must fundamentally understand not only its
capabilities but also its limitations. As die sizes grow
increasingly larger and clock rates continue to climb,
so, too, do power requirements, driving die tempera-
tures higher as well. Within-die thermal gradients can
have negative effects on timing and clocking, which
degrade testing’s accuracy and results. This article sys-
tematically examines the issue of thermal effects, intro-
duces a methodology for quantifying them, and
proposes a design technique for counteracting them.
TAKEN AS A WHOLE, the articles demonstrate the
changing role of test in the entire electronics industry
and how it’s not just for pass/fail anymore. Contributors
to ITC, IEEE Design & Test, and numerous other IEEE
test conferences and workshops are continually invent-
ing and demonstrating new ways in which the test
process can increase our rate of product and process
learning, speed products to yield and reliability entitle-
ment, and generally contribute more to our collective
bottom line. I hope that this information will inspire you
to come to ITC, see the presentations of articles like
these, interact with their authors, visit the exhibits floor
and see the new products that leverage the best test has
to offer, and, most importantly, share your thoughts and
ideas on how we can get more out of test.
I would like to take this opportunity to thank Editor-
in-Chief Tim Cheng and the entire IEEE D&T editorial
staff for their encouragement and assistance in pro-
ducing this special issue. ■
Kenneth M. Butler is a TI Fellow at Texas Instruments in Dallas. His research interests include outlier techniques for quality and reliability and test-data-driven decision making. Butler has a BS from Oklahoma State University and an MS and a PhD from the University of Texas at Austin, all in electrical engineering. He was the program chair of ITC 2005 and currently serves on the program and steering committees. He is a Senior Member of the IEEE and a member of the ACM.

Direct questions and comments about this special section to Kenneth M. Butler, Texas Instruments, 13121 TI Boulevard, MS 366, Dallas, TX 75243; [email protected].
ITC Special Section
Extracting Defect Density and Size Distributions from Product ICs

Jeffrey E. Nelson, Thomas Zanon, Jason G. Brown, Osei Poku, R.D. (Shawn) Blanton, and Wojciech Maly
Carnegie Mellon University

Brady Benware and Chris Schuermyer
LSI Logic

Editor's note:
Defect density and size distributions are difficult to characterize, especially if you have little or no access to test vehicles specifically designed for the purpose. The authors propose a new methodology for extracting that information directly from production test data on actual products.
—Ken Butler, Texas Instruments
DEFECTS FREQUENTLY OCCUR during IC manufac-
ture. Modeling the resulting yield loss is an important part
of any design-for-manufacturability strategy. Of the many
mechanisms that cause yield loss, some have sufficiently
accurate models and are well understood, whereas oth-
ers are unpredictable and difficult to characterize. Current
yield-related research focuses mainly on systematic
defects. In contrast, this article addresses random spot
defects, which affect all processes and currently require a
heavy silicon investment to characterize. We propose a
new approach for characterizing random spot defects in
a process. This approach enables accurate measurement
of parameters for the critical-area yield model—the work-
horse of modern yield-learning strategies.
IC manufacturers often neglect the need to tune the
yield model—that is, to continuously update yield
model parameters—because of the silicon area
required to characterize a process. But the inherently
stochastic nature of yield makes frequent process char-
acterization necessary for accurate yield models. We
present a system that overcomes the obstacle of silicon
area overhead by using available wafer sort test results
to measure critical-area yield model parameters. We use
only wafer sort test results, so no additional silicon area
is required. Our strategy uses the most realistic charac-
terization vehicle for the product IC—the product
itself—rather than memory or special-
ized test structures that waste silicon area
and often do not represent the product’s
design style.
Background
Defect density and size distributions
(DDSDs) are important parameters for
characterizing spot defects in a process. A DDSD tells
us what the defect density is for a given defect radius—
that is, the number of defects per unit area. The distrib-
ution gives this information for all defect radii. Typically,
though, as defect radius increases, defect density quick-
ly decreases. Thus, we can generally curtail the distrib-
ution and measure only defect density for a range of
defect radii, because larger defects have a density
approaching zero. This inherent feature becomes use-
ful in attempting to discretize the DDSD.
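For illustration, a discretized DDSD can be held as defect densities sampled at a grid of radii and curtailed once the density becomes negligible. The sketch below uses a hypothetical power-law shape and made-up parameter values, purely to show the representation:

```python
import numpy as np

# Hypothetical discretized DDSD: defect density (defects/cm^2) sampled at a
# grid of defect radii (microns). The power-law shape and the parameters
# d0 and p are assumptions for illustration only.
radii = np.arange(0.2, 2.01, 0.05)        # defect radii, 0.2 to 2.0 microns
d0, p = 1.0, 3.0                          # assumed density scale and power
density = d0 * (radii[0] / radii) ** p    # density falls quickly with radius

# Curtail the distribution: keep only radii whose density is non-negligible.
keep = density > 1e-3 * density.max()
ddsd = dict(zip(np.round(radii[keep], 2), density[keep]))
print(f"kept {len(ddsd)} of {len(radii)} radii after curtailing")
```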
We can subdivide the distributions characterizing a
process beyond defect size. Each metal layer of the
process can potentially have a different DDSD. Ideally,
we’d like to measure each layer’s DDSD rather than
attempt to characterize all layers simultaneously with a
single distribution. These distributions are parameters
for the critical-area yield model.1-3
IC manufacturers measure DDSDs primarily with spe-
cialized test structures on a wafer. Test structures con-
tain geometries specifically designed to observe defects.
When a defect occurs in a particular region of a test
structure, that structure observes the defect, making it
easy for the process engineer to identify what the defect
mechanism is, where it occurred, and to learn about the
defect’s size. The price we pay for this convenience is
that test structures consume silicon area on the wafer.
Thus, test structures impose a trade-off between area
cost and defect observability.
Consider the three wafers in Figure 1. In Figure 1a,
the entire wafer is dedicated to test structures. This con-
figuration allows excellent defect observability, but the
obvious drawback is that no product can be manufac-
tured from it—product volume is zero. Manufacturers
typically use a full wafer of test structures only during
the earliest yield-learning phase, when the yield
improvement realized from these structures significantly
outweighs manufacturing cost.
In Figure 1b, products have replaced many of the test
structures, raising volume to a medium level. However,
observability has decreased because now there is a sig-
nificant amount of area where defects can occur with
no direct ability to characterize them. The wafer in
Figure 1b also contains test structures in the scribe lines.
This configuration is a compromise between defect
observability and volume. Manufacturers typically use
it during yield ramp, when volume is necessary, but the
ability to characterize defects—particularly systematic
defects—is still required.
Finally, the wafer configuration shown in Figure 1c
uses the entire silicon area to manufacture products.
The scribe lines still contain test structures because they
don’t affect product volume. As in the Figure 1b con-
figuration, this configuration provides limited area to
observe defects, but it is even more extreme because it
relegates the test structures to the scribe lines. This con-
figuration is used most during the volume phase of yield
ramp, when characterization of random spot defects is
most important for predicting yield.
The observability-versus-area trade-off has led to
research that seeks the best of both worlds: high observ-
ability and low (or no) area overhead. In particular,
researchers have used SRAMs to extract DDSDs.4 This
technique requires no additional overhead, because
the characterization vehicle (the SRAM) is a useful
product itself. SRAMs, however, have undesirable
characteristics as characterization vehicles, such as confinement
to a few metal layers, which limits the scope of observable
defects. SRAMs’ extremely regular structure means that
if the replicated cell has a narrow scope of geometric
features for defect observation, this limitation will
extend over the entire chip. These limitations matter most
when DDSDs extracted from memories are used to predict
yield loss for random-logic circuits. A
preferable defect characterization vehicle in such cases
is a random-logic product.
Other researchers have suggested using a random-
logic product to estimate the defect Pareto in a process
using only test results.5 That work, in conjunction with
the SRAM work, inspired the initial idea that we could
extract a DDSD for each process layer using a random-
logic product IC as a virtual test structure.6 The first pub-
lication describing an investigation of this idea
appeared in March 2006.7 Here, we elaborate on that
publication and present new findings from an experi-
ment conducted on test data from silicon wafers pro-
vided by LSI Logic.
Proposed approach
Our system accurately characterizes spot defects that
contribute to yield loss by measuring defect density in
each metal IC layer, without the silicon overhead
required by current techniques. The various geometries
and line spacing in a typical layout lead to defects of dif-
ferent sizes with varying effects on the IC (some small
Figure 1. Wafers with different test structure
configurations and varying levels of defect
observability (gray areas and scribe lines
represent test structures): all test structures and
no products (a), some test structures replaced by
products (b), and entire area used for products,
with test structures in scribe lines only (c).
defects may have a negligible impact). Therefore, in
addition to defect density, we must measure the distri-
bution of defect sizes.
The strategy for achieving this goal is straightfor-
ward.6-8 By nature, each spot defect affects only a small
subset of nodes in close proximity to one another. Each
spot defect leads to a unique, defective circuit response.
Likewise, given a circuit response, there are some poten-
tial spot defects that cause that response. Using results
from structural testing, we can estimate the probability
of a particular circuit response and consequently the
probabilities of defect occurrence. By grouping respons-
es according to specific characteristics, such as the size
of a defect necessary to cause that circuit response, we
can determine the occurrence probabilities of defects
of that size.
Using a modeling strategy to predict faulty circuit
responses as a function of defect characteristics in the
process, we can mathematically derive defect charac-
teristics that minimize the difference between the mod-
eled test response probabilities and the estimated test
response probabilities. Thus, the calculated defect char-
acteristics must represent the actual defect characteris-
tics in the process. Of course, for this to be true, certain
conditions must be met. We propose a defect charac-
terization methodology based on this concept. That is,
we develop and apply a modeling strategy that predicts
probabilities of test responses depending on a DDSD,
and then we find the DDSD that leads to agreement
between circuit test responses measured by a tester and
test responses predicted by the model.
To accomplish this, we have developed a modeling
technique that relates the analyzed IC’s test responses
to defect characteristics that could cause such test
responses. We will describe two mappings: one
between defect probabilities and fault probabilities, and
one between faults and test responses.
Microevents and macroevents
A spot of extra conducting material deposited in a
metal layer can introduce an extra, unwanted bridge
connection between nonequipotential metal regions in
the layer. In most cases, a bridge will affect the circuit’s
electrical behavior. An instance of a bridge that con-
nects two or more nonequipotential metal islands is
called a microevent.4 Each microevent involves a set of
circuit nodes, S = {n1, n2, …, nm}, that are bridged by the
spot defect of a specific radius. We can calculate the
probability of a single, independent microevent using
the critical-area yield model.7 Equation 1 shows the prob-
ability that microevent i will occur, where Ci is the
microevent’s critical area, and Dj(ri) is the defect densi-
ty for defects of radius ri (the same radius as microevent
i) in layer j, the layer in which microevent i occurs.
$p_{\mu E_i} = 1 - e^{-C_i D_j(r_i)}$   (1)
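A minimal sketch of this calculation, assuming the Poisson form of the critical-area yield model and hypothetical numbers for the critical area and defect density:

```python
import math

def microevent_probability(critical_area_um2, defect_density_cm2):
    """Equation 1 under a Poisson spot-defect model:
    p = 1 - exp(-C_i * D_j(r_i)). The critical area is converted from
    um^2 to cm^2 so the units match the defect density."""
    c_cm2 = critical_area_um2 * 1e-8      # 1 um^2 = 1e-8 cm^2
    return 1.0 - math.exp(-c_cm2 * defect_density_cm2)

# Hypothetical microevent: 50 um^2 of critical area in a layer whose
# defect density at this radius is 2 defects/cm^2.
print(microevent_probability(50.0, 2.0))  # ~1e-6
```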
Here, we define microevent μEi as a bridge, thus limit-
ing our scope to spot defects causing bridges. We do
this for two reasons: First, it is important that the physics
of the investigated yield loss mechanism be well under-
stood, which is indeed the case for bridges. Second,
spots of extra conducting material are still a major rea-
son for IC malfunctions in many processes.
An IC’s vulnerability to random spot defects greatly
depends on the layout. The critical-area concept was
developed to provide a metric of design sensitivity to
defects.1,9 Critical area is the layout region where, if a
spot defect of radius r occurs, a circuit can fail. Figure
2 shows a small portion of a sample layout with signal
lines in metal 1 and metal 2. The figure illustrates six
microevents: four in metal 1 and two in metal 2. Four
sample spot defects demonstrate how a microevent can
occur. Each microevent has an associated critical area
for a specific defect radius. For example, microevents
μE1 to μE3 have critical area for a defect of radius r1, rep-
resented by the solid boxes associated with each
microevent label. Likewise, microevents μE4 to μE6 have
critical area for radius r2, represented by the dashed
boxes. This example shows that even within a single
metal layer, microevents involving the same circuit
node set S can occur in several discrete regions. In this
Figure 2. Sample layout with six microevents: four in metal
layer 1 (a), and two in metal layer 2 (b). Microevents μE1 to
μE3 have radius r1 (solid boxes) and μE4 to μE6 have radius r2
(dashed boxes), where r1 < r2. Spot defects are circles.
case, S = {b, c}. Each discrete region of critical area rep-
resents a separate microevent. In addition, microevents
involving the same set of circuit nodes can exist in dif-
ferent metal layers.
Critical-area measurement occurs in steps. First, we
measure critical area for all potential microevents in a
layout for a given radius, rstart. In each subsequent step,
the defect radius is incremented by a small amount and
the first step repeated for the new radius. This process
repeats, continuing over a specified range of defect radii
until reaching rend.
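The stepping procedure might be sketched as follows; critical_area_of is a hypothetical stand-in for a real layout-analysis routine:

```python
import numpy as np

def critical_area_of(layout, microevent, radius_um):
    """Placeholder: a real implementation would intersect the layout
    geometry with a defect of the given radius. Hypothetical stand-in."""
    return 0.0

def critical_area_function(layout, microevent, r_start=0.2, r_end=2.0, step=0.05):
    """Sweep the defect radius from r_start to r_end (microns) in fixed
    increments, recording the microevent's critical area at each step."""
    radii = np.arange(r_start, r_end + step / 2, step)
    return {round(float(r), 2): critical_area_of(layout, microevent, r)
            for r in radii}
```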
We can now define a macroevent as the set of all
microevents that exist for the same set of circuit nodes
S. As mentioned, many microevents involving S can
exist in different layers for different defect radii. So, a
collection of independent microevents describes each
macroevent. Figure 2 shows a single macroevent, occur-
ring between lines b and c, which consists of
microevents 1 through 6. Because a macroevent is a set
of independent microevents, the probability of a
macroevent involving S is one minus the product of the
probabilities of each microevent involving S not occur-
ring. Thus, in this example, the probability of the
macroevent involving b and c occurring is one minus
the product of the probabilities of each of the six
microevents not occurring.
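A short sketch of this combination rule, with hypothetical microevent probabilities standing in for values computed from Equation 1:

```python
def macroevent_probability(microevent_probs):
    """One minus the product of the probabilities of each constituent
    (independent) microevent not occurring."""
    p_none = 1.0
    for p in microevent_probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# Hypothetical probabilities for the six microevents of the macroevent
# between nodes b and c in Figure 2.
print(macroevent_probability([1e-6, 2e-6, 5e-7, 1e-6, 3e-6, 2e-7]))
```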
Critical-area extraction for a range of defect radii pro-
vides a list of microevents and their associated critical
areas. With those measurements, we can calculate
microevent probabilities, and thus macroevent proba-
bilities, as a function of defect densities. Because a
macroevent represents a multiline bridge, we have in
fact extracted a list of potential bridge defects along
with their occurrence probability. This results in the first
mapping between defects and faults.
Logic-level modeling
The final modeling stage necessary for mapping
defect characteristics to test responses is a mapping
between the macroevent list and the test responses. This
mapping is embodied by the T matrix, which we calcu-
late by simulating the entire test set against each
macroevent. Because simulation time for a large number
of macroevents (even a small circuit can have hundreds
of thousands) can be enormous, we model them as logic-
level faults, making efficient simulation possible. To
maintain accuracy when simulating at the logic level, we
first derive an accurate gate-level model of the circuit.
Typical standard-cell representations obscure the
cell’s internal workings, causing the omission of impor-
tant signal lines from the logic-level netlist. This netlist
includes only standard-cell ports, even if the standard
cell contains several CMOS logic gates. Therefore, we
map a standard-cell layout to a logic-level description
that captures the structure of static CMOS gates in the
cell, using the gate primitives NAND, NOR, and NOT.
This change lets us consider gate outputs routed in
metal 1 in a standard cell during microevent extraction
and tie them to logic signals in the netlist.
An AND-gate standard cell illustrates this issue.
Typically, an AND gate is implemented in CMOS by a
NAND gate followed by an inverter, with the connec-
tion between the two routed in metal 1. Microevents
involving the internal metal 1 routing might occur, but
without the layout-to-logic mapping used here, we have
no basis for forming a logic-level fault model that
includes this metal line. With our mapping, we can effi-
ciently handle critical area that involves all metal lines
in a standard cell (which can account for a significant
portion of the chip’s total critical area).
However, some standard cells might still contain
metal structures that are not mapped to the logic level.
These polygonal structures are metal lines that don’t
correspond to a CMOS logic gate’s output (these struc-
tures do not include power and ground, which easily
map to logic 1 and 0). They are typically in complex
CMOS gates such as AND-OR-INVERT gates, multiplex-
ers, and other complex logic functions. Although we
could ignore macroevents involving these polygons,
doing so would introduce an additional source of error. We
developed a technique to handle the polygons by map-
ping their logic functions to standard cell ports, and we
used this technique in the silicon experiment that we
describe later.
The extracted macroevents represent bridges that
can involve two or more signal lines. Test engineers
commonly use bridge faults10 to model two-line bridge
defects, but because macroevents can involve more
than two lines, more-advanced fault models are neces-
sary. We use the voting-bridge fault model,11 in which
pull-up and pull-down network drive strengths deter-
mine the erroneous lines.
We form a voting model for each macroevent by sep-
arately summing the drive strengths of all lines in the
macroevent driven to logic 0 and logic 1. We then com-
pare the two sums to determine which logic value will
be imposed on the other lines. An error occurs on each
line with the weaker logic value. To implement the vot-
ing model described here, we use fault tuples, a gener-
alized fault representation mechanism.12 Despite the
complex models we use, the behavior of real spot
defects is unpredictable and therefore can be a source
of error.
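A simplified sketch of the voting resolution described above (the drive strengths and node names are hypothetical; a real implementation would operate on fault tuples):

```python
def voting_bridge_values(line_values, drive_strengths):
    """Voting-bridge resolution: sum the drive strengths of the bridged
    lines driven to 0 and to 1; the stronger side imposes its value, and
    lines on the weaker side are erroneous."""
    pull_down = sum(s for n, s in drive_strengths.items() if line_values[n] == 0)
    pull_up = sum(s for n, s in drive_strengths.items() if line_values[n] == 1)
    winner = 1 if pull_up > pull_down else 0
    resolved = {n: winner for n in line_values}
    erroneous = [n for n, v in line_values.items() if v != winner]
    return resolved, erroneous

# Hypothetical three-line macroevent: two weak drivers at 1, one strong at 0.
vals, errs = voting_bridge_values({"a": 1, "b": 1, "c": 0},
                                  {"a": 1.0, "b": 1.2, "c": 3.0})
print(vals, errs)  # the strong 0 on line c wins; a and b are erroneous
```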
To simulate the macroevents modeled as voting-
bridge faults, we use FATSIM, a concurrent fault simu-
lator for fault tuples.12 To determine which test vectors
detect which macroevents, we use no fault dropping
during simulation. The resulting data is stored in the T
matrix, which has the following form:

$$T = \begin{bmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,M} \\ \vdots & \vdots & \ddots & \vdots \\ t_{V,1} & t_{V,2} & \cdots & t_{V,M} \end{bmatrix}$$
where V is the number of test vectors simulated, M is the
total number of macroevents, and ts,i is a 1 (0), indicat-
ing that macroevent i is detected (undetected) by test
vector s. The T matrix provides the mapping between
logic-level faults and circuit test responses.
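A sketch of how T might be assembled; detects is a hypothetical stand-in for a call into a concurrent fault simulator:

```python
import numpy as np

def build_t_matrix(test_vectors, macroevents, detects):
    """Assemble the V x M matrix T with no fault dropping: T[s, m] is 1 if
    test vector s detects macroevent m, and 0 otherwise. `detects` stands
    in for a call into a concurrent fault simulator."""
    T = np.zeros((len(test_vectors), len(macroevents)), dtype=np.uint8)
    for s, vec in enumerate(test_vectors):
        for m, event in enumerate(macroevents):
            if detects(vec, event):
                T[s, m] = 1
    return T
```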
We have verified qualitatively that an inaccurate T
matrix can significantly decrease the overall accuracy
of our DDSD extraction approach. When we use a ran-
dom T matrix, the resulting DDSDs have no resem-
blance to the expected distribution. Therefore, it is
critical that macroevents be modeled precisely and sim-
ulated correctly; otherwise, the T matrix’s quality will
be questionable. Simulation techniques that are more
detailed than a logic-level model (for example, transis-
tor-level Spice simulation) could possibly lead to greater
accuracy, but they would increase the required simu-
lation time considerably.
DDSD extraction
As discussed earlier, we can measure DDSDs by min-
imizing the difference between the predicted and the
observed probability of passing tests (yield per test). We
have described the various components necessary to
predict probability pi of test i passing. We adapt the crit-
ical-area yield model for this task, using critical-area
functions of macroevents, and the DDSD per layer as
parameters of the model. After measuring the T matrix
and critical-area functions of macroevents, the DDSDs
are the only unknown parameters of the model. We can
easily measure observed yield per test p̂i from tester
results as the ratio of the number of chips that pass test
i to the total number of chips manufactured.
We can find the DDSDs that minimize the error
between pi and p̂i by using linear regression. The key
idea is to abandon the concept of individual DDSDs per
layer. Because we will capture each distribution dis-
cretely using some number of points, we can concate-
nate all the DDSDs’ defect densities into a single vector.
The linear regression’s output will be this vector, which
can then be split into a DDSD for each metal layer. We
present a detailed mathematical description of these
steps elsewhere.7,8
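As an illustration only (the article uses principal component regression; this sketch substitutes ordinary least squares and assumes the yield loss per test is small enough that −ln p̂i is approximately linear in the concatenated defect densities):

```python
import numpy as np

def extract_ddsd(T, binned_ca, observed_yield_per_test):
    """Least-squares sketch of DDSD extraction. Assumes the model
    -ln(p_hat_s) ~ (row s of T @ binned_ca) dot d, where d concatenates
    the per-layer, per-bin defect densities into one vector.
    T: (V, M) 0/1 detection matrix.
    binned_ca: (M, B) binned critical area of each macroevent (cm^2).
    observed_yield_per_test: (V,) fraction of chips passing each test."""
    A = T @ binned_ca                      # (V, B) design matrix
    b = -np.log(observed_yield_per_test)   # (V,) observed yield loss per test
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d                               # split this vector per layer

# Hypothetical toy instance: 3 tests, 2 macroevents, 2 bins.
T = np.array([[1, 0], [1, 1], [0, 1]])
ca = np.array([[2e-8, 1e-8], [0.0, 3e-8]])
p_hat = np.array([0.99, 0.985, 0.995])
print(extract_ddsd(T, ca, p_hat))
```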
Simulation experiment
To evaluate the proposed approach, we performed
an experiment based on a simulated, artificial process.
We assumed DDSDs for each layer of the artificial
process and inserted defects into the process based on
these distributions. We measured the estimated yield
per test vector by emulating a tester. We then applied
the DDSD extraction strategy to the circuit and com-
pared the extracted DDSDs with the inserted DDSDs.
Demonstration circuit
For this experiment, we used circuit c3540 from the
1985 International Symposium on Circuits and Systems
(ISCAS) benchmark suite.13 We logically optimized the
c3540 implementation and technology-mapped it to a
0.18-micron commercial standard-cell library. The final
layout was routed in five metal layers and used approx-
imately 100 μm × 100 μm of area.
In modern manufacturing processes, a design of this
size would typically be free of defects because of rela-
tively low defect densities. To ease the simulation bur-
den, we assumed that a single die consisted of 10,000
parallel instances of c3540, with each instance retain-
ing its original controllability and observability. As a
result, each die had an area of approximately 1 cm2 and
could still be tested with a test set for a single instance
of c3540. Although this die had a total critical area com-
parable to typical chips, it lacked the diverse geometri-
cal features that a die would normally exhibit. However,
the impact of design diversity on the DDSD extraction
technique was not the experiment’s focus.
After preparing the demonstration circuit, we extract-
ed macroevents, modeled them using fault tuples, and
simulated them with FATSIM to generate the T matrix.
The production test set consisted of 155 stuck-at test pat-
terns. During macroevent extraction, we determined crit-
ical area for a range of defect sizes to build a critical-area
function for each macroevent. For metal layers 1 through
4, the critical-area function domain was 0.2 micron to 2
microns, and for metal layer 5, it was 0.34 micron
to 2 microns, with samples spaced at 50-nm intervals.
This resulted in 182 critical-area points. We determined
the limits on the basis of minimum line spacing for the
lower bound and selected the upper
bound to capture a sufficient portion of
the DDSD’s tail. Figure 3 shows the total
discretized critical-area function (sum of
critical area functions of all microevents
involving the layer) for each of the five
metal layers for one instance of c3540.
Tester emulation
In the proposed DDSD extraction
methodology, we measure the yield per
test from the structural test results of a
real tester. In the simulation experiment,
we substituted tester emulation for actu-
al test results. We generated defects
according to a stochastic Poisson
process in which each potential defect is
an independent event. The assumed
DDSD followed the well-known power
law, with the defect densities shown in
Table 1. We increased defect densities to
levels well beyond realistic figures to
reduce the simulation time required for
test emulation.
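A sketch of this emulation under the stated assumptions (independent Poisson occurrence per macroevent, no masking); the rates and matrices below are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)

def emulate_yield_per_test(T, rates, n_dies=50_000):
    """Tester-emulation sketch: each macroevent occurs on a die as an
    independent Poisson event with rate lambda_m (derived from its
    critical-area function and the assumed DDSD). A die fails test s if
    any occurring macroevent is detected by s (no masking assumed)."""
    V, M = T.shape
    passes = np.zeros(V, dtype=np.int64)
    for _ in range(n_dies):
        occurred = rng.poisson(rates) > 0        # macroevents on this die
        detected = T[:, occurred].any(axis=1)    # tests that catch any of them
        passes += ~detected
    return passes / n_dies                       # observed yield per test

# Hypothetical: 3 tests, 2 macroevents with small occurrence rates.
T = np.array([[1, 0], [1, 1], [0, 1]], dtype=bool)
print(emulate_yield_per_test(T, rates=np.array([0.01, 0.02]), n_dies=5_000))
```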
We consider each macroevent’s
occurrence an independent Poisson
process because we assume that each
defect’s occurrence is independent of all
others. As a result, each macroevent occurs with a fre-
quency dictated by a Poisson process at a rate deter-
mined from the critical-area function of the macroevent
and the DDSDs. Table 2 shows the percentage of dies
containing zero, one, two, or three macroevents in a
sample size of 50,000 for this experiment. From this
table, we reach two conclusions:
■ Because the occurrence rates of the number of
macroevents per die align with the theoretical occur-
rence rates, 50,000 dies are sufficient.
■ Multiple macroevents affect only a small percentage
of the simulated dies.
From the artificial process simulation, we knew
which macroevents occurred on each faulty die. We
then obtained the yield per test by inspecting the T
matrix. The yield per test varied slightly around an aver-
age of 98% for each test. We assume that no masking
effects occur for dies affected by multiple macroevents.
Thus, if a test detects any of the individual macroevents,
we assume that the test will fail. Table 2 shows that the
assumption that no masking occurs applies to about
0.16% of all dies; thus, any impact from this assumption
is minimal.
Figure 3. Critical-area functions (white symbols) extracted from all metal
layers of a single instance of circuit c3540 from the ISCAS 85 benchmark
suite. Black symbols represent critical-area functions after combining a
range of defect sizes.
Table 1. Injected defect density and size distributions (DDSDs) following the power-law distribution, with power parameter p and peak-probability parameter X0 = 0.05 μm for each metal layer. D0 (cm−2) represents defect density.

                 Metal layer
Parameter    1    2    3    4    5
D0 (cm−2)    1    2    2    1    3
p            3    4    3    2    3
Table 2. Occurrence rates for the number of macroevents per die for a sample size of 50,000.

No. of macroevents per die    0        1       2       3
Percentage of dies            94.17    5.67    0.15    0.01
DDSD extraction
We formulated the DDSD extraction process as a
minimization problem to be solved using linear regres-
sion analysis. Here, we detail the regression procedure
for the demonstration circuit.
As already mentioned, the total number of critical-area
points from the critical-area analysis for all layers is 182.
It is natural to likewise want to discretize the DDSDs by
solving for their values at the same points as the critical
area points. Each of these is referred to as a bin. The
individual defect densities in the 182 bins comprise the
DDSD vector we wish to derive. However, given that
there are only 155 test vectors, we can obtain only 155
yields per test. Consequently, there are more unknowns
than equations, which means the minimization is an
underdetermined problem with an infinite number of solu-
tions.
To reformulate the problem so that it is solvable, we
grouped sample points for defect size ranges into
fewer, wider bins, thus reducing the overall number of
densities to be derived. Figure 3 shows the 19 bins
used for this experiment. We recalculated critical-area
functions for the new bin arrangements, represented
by the black symbols in Figure 3. This reconstruction
doesn’t affect the T matrix, so there is no need to res-
imulate the faults. We used principal component
regression to find the values for the 19 bins that make
up the DDSDs. We obtained 95% confidence intervals
for the extracted DDSDs, using standard bootstrapping
techniques.14
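A sketch of the bootstrap step, assuming per-die pass/fail records and the simplified least-squares extraction used in the earlier sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_ddsd_ci(pass_fail, T, binned_ca, n_boot=200, alpha=0.05):
    """Resample dies with replacement, recompute the observed yield per
    test, re-run the (simplified least-squares) extraction each time, and
    take percentiles of the replicates as confidence intervals.
    pass_fail: (N, V) 0/1 pass result of each die on each test."""
    n = len(pass_fail)
    A = T @ binned_ca                                  # fixed design matrix
    replicates = []
    for _ in range(n_boot):
        sample = pass_fail[rng.integers(0, n, size=n)]
        p_hat = sample.mean(axis=0).clip(1e-6, 1.0)    # avoid log(0)
        d, *_ = np.linalg.lstsq(A, -np.log(p_hat), rcond=None)
        replicates.append(d)
    lo, hi = np.percentile(replicates,
                           [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return lo, hi
```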
Figure 4 shows the final extracted results of the analy-
sis for all five metal layers. The triangles represent the
19 extracted DDSD vector components, and the small
circles represent the assumed DDSD components.
Although the results aren’t perfect, the inserted DDSD
and the extracted DDSD correlate well—a positive and
promising result. Figure 4 also shows the 95% confi-
dence intervals for each DDSD component. Some of the
confidence intervals are quite large. The source of this
variance can be traced to the properties of the critical-
area functions and the T matrix. Specifically, critical-
area functions that contribute to one test’s failing
correlate strongly with critical-area functions con-
tributing to other test patterns.
Silicon experiment
After the success of the simulation experiment, we
conducted a similar experiment on a chip manufac-
tured in a commercial facility. The chip is an array of
64-bit ALUs manufactured in a 0.11-micron process. LSI
Logic designed the chip as a process development and
silicon-debugging vehicle closely mimicking the design
style of the company’s other digital-logic products.
Hence, the chip is ideally suited for testing and validat-
ing our DDSD extraction strategy. Each die contains 384
ALUs, each independently controllable and observable
(similar to the assumption made in the simulation
experiment).
The chip’s structure is convenient from the perspec-
tive of scale because the die is partitioned into many
small blocks, each a single ALU. Although not all
designs are this regular, large designs are frequently par-
titioned into smaller blocks and tested separately with
scan chains. Analyzing each block independently or
limiting the analysis to just a handful of blocks is one
strategy for coping with the large number of macro-
events associated with an industrial design.
We performed the experiment in almost the same
manner as that of the simulation experiment. We
adjusted the critical-area bins to account for the small-
er feature size. The bin edges were 0.1, 0.2, 0.4, 1, and
2 microns. The silicon chip was routed in six layers
rather than five and thus required 23 bins (like metal
layer 5, metal layer 6 was captured with only three
bins). Another difference in this experiment was that
we used real test results for a test set containing 262
patterns provided by the manufacturer. We extracted
the results using 451 failing ALUs; the part’s yield is proprietary,
so we don’t disclose the total number of manufac-
tured ALUs.
Figure 5 shows the extracted DDSDs for the six
metal layers. We did not simply parameterize an
assumed model, yet the extracted curve for each layer
follows a power law distribution, a DDSD shape typi-
cally found in manufacturing processes. This strongly
indicates that these results are meaningful. Addition-
ally, the plots indicate that although the distributions
don’t vary widely, there are differences in defect den-
sities from layer to layer. The y-axis in each graph has
the same range, making plot comparisons easier. Final-
ly, the large confidence intervals for the smallest
defect sizes in metal layers 5 and 6 occur because
there is very little critical area for small defects in the
higher metal layers, as Figure 6 shows. This can be the
result of either design rules that force lines to be far-
ther apart or simply the decreased routing density in
those layers. Either way, there is limited ability to
observe small defects in those layers—hence, the large
confidence intervals.
The results of the experiment on chips fabricated in
Figure 4. Assumed and extracted DDSDs for all metal layers and corresponding 95% confidence
intervals: metal 1 (a), metal 2 (b), metal 3 (c), metal 4 (d), and metal 5 (e).
silicon confirm the results of the simulation experiment:
We can measure DDSDs that characterize a process in
ordinary digital circuits using only slow, structural test
results from the product.
RATHER THAN DISCARDING pass/fail test results once a
part has been sorted, we can derive valuable process
characteristics from the test data. Our strategy extracts
DDSDs consistent with those we’d expect to see for a
modern manufacturing process—an achievement not
previously accomplished without using additional sili-
con area. Our ongoing research is looking for ways to
improve accuracy by using high-fidelity fault models and
greater data volume, as well as by accounting for yield
loss due to other defect types such as open circuits.
Many manufacturers continue to rely on inspection
techniques whose quality degrades with every new
process generation. Our approach to extracting process
characteristics doesn’t suffer from the same degrada-
tion. Although manufacturers stand to gain much from
using this approach, our strategy also offers an oppor-
tunity for fabless companies to gain insight into the fab-
rication of their chips. For the first time, such companies
can independently compute their products’ defect char-
acteristics and improve design yield by tuning designs
for a given fabline. ■
Acknowledgments
Semiconductor Research Corporation supported
this work under contract 1172.001.
Figure 5. Extracted DDSDs for all metal layers in a fabricated 64-bit ALU test chip, and corresponding 95%
confidence intervals. Defect densities are hidden to protect IP, but the scale of all plots is identical. Metal 1 (a),
metal 2 (b), metal 3 (c), metal 4 (d), metal 5 (e), and metal 6 (f).
References
1. W. Maly and J. Deszczka, “Yield Estimation Model for
VLSI Artwork Evaluation,” Electronics Letters, vol. 19, no.
6, Mar. 1983, pp. 226-227.
2. D. Schmitt-Landsiedel et al., “Critical Area Analysis for
Design-Based Yield Improvement of VLSI Circuits,” Qual-
ity and Reliability Eng. Int’l, vol. 11, 1995, pp. 227-232.
3. D.J. Ciplickas, X. Li, and A.J. Strojwas, “Predictive Yield
Modeling of VLSICs,” Proc. 5th Int’l Workshop Statistical
Metrology (WSM 00), IEEE Press, 2000, pp. 28-37.
4. J. Khare, D. Feltham, and W. Maly, “Accurate Estimation
of Defect-Related Yield Loss in Reconfigurable VLSI Cir-
cuits,” IEEE J. Solid-State Circuits, vol. 28, no. 2, Feb.
1993, pp. 146-156.
5. Y.J. Kwon and D.M.H. Walker, “Yield Learning via Func-
tional Test Data,” Proc. Int’l Test Conf. (ITC 95), IEEE
Press, 1995, pp. 626-635.
6. W. Maly, Spot Defect Size Measurements Using Results
of Functional Test for Yield Loss Modeling of VLSI IC,
white paper, Carnegie Mellon Univ., 2004.
7. J.E. Nelson et al., “Extraction of Defect Density and Size
Distributions from Wafer Sort Test Results,” Proc.
Design, Automation and Test in Europe (DATE 06),
IEEE Press, 2006, pp. 913-918.
8. J.E. Nelson et al., Extraction of Defect Density and Size
Distributions from Wafer Probe Test Results, tech. report
CSSI 05-02, Center for Silicon System Implementation,
Carnegie Mellon Univ., 2005.
9. C.H. Stapper, “Modeling of Integrated Circuit Defect
Sensitivities,” IBM J. Research and Development, vol.
27, no. 6, Nov. 1983, pp. 549-557.
10. K.C.Y. Mei, “Bridging and Stuck-at Faults,” IEEE Trans.
Computers, vol. 23, no. 7, July 1974, pp. 720-727.
11. R.C. Aitken and P.C. Maxwell, “Biased Voting: A Method
for Simulating CMOS Bridging Faults in the Presence of
Variable Gate Logic Thresholds,” Proc. Int’l Test Conf.
(ITC 93), IEEE Press, 1993, pp. 63-72.
12. R.D. Blanton, Methods for Characterizing, Generating
Test Sequences for, and Simulating Integrated Circuit
Faults Using Fault Tuples and Related Systems and
Computer Program Products, US Patent 6,836,856,
Patent and Trademark Office, 2004.
13. F. Brglez and H. Fujiwara, “A Neutral Netlist of 10 Com-
binational Benchmark Circuits and a Target Translator
in Fortran,” Proc. Int’l Symp. Circuits and Systems
(ISCAS 85), IEEE Press, 1985, pp. 695-698.
14. B. Efron and R.J. Tibshirani, An Introduction to the Boot-
strap, Chapman & Hall, 1993.
Figure 6. Total critical-area functions per layer extracted from all metal layers of a 64-bit ALU.
Jeffrey E. Nelson is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include process characterization and testing of digital systems. He has a BS and an MS in electrical and computer engineering from Rutgers University and Carnegie Mellon University, respectively. He is a member of the IEEE.

Thomas Zanon is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University and a yield ramping consulting engineer at PDF Solutions, in San Jose, California. His research interests include defect and process characterization based on test results. Zanon has a Dipl. Ing. degree in electrical engineering and information technology from the Technische Universitaet Muenchen. He is a member of the IEEE and EDFAS.

Jason G. Brown is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include defect-based test, inductive fault analysis, and layout-driven diagnosis. He has a BS in electrical engineering from Worcester Polytechnic Institute and an MS in computer engineering from Carnegie Mellon University.

Osei Poku is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include various aspects of test and diagnosis of VLSI circuits, such as automatic test pattern generation, volume diagnosis, and diagnosis-based yield learning. Poku has a BS in electrical engineering from Hampton University and an MS in electrical and computer engineering from Carnegie Mellon University.

R.D. (Shawn) Blanton is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University, where he is the associate director of the Center for Silicon System Implementation (CSSI). His research interests include test and diagnosis of integrated, heterogeneous systems. He has a BS in engineering from Calvin College, an MS in electrical engineering from the University of Arizona, and a PhD in computer science and engineering from the University of Michigan, Ann Arbor.

Wojciech Maly is the Whitaker Professor of Electrical and Computer Engineering at Carnegie Mellon University. His research interests focus on the interfaces between VLSI design, testing, and manufacturing, with emphasis on the stochastic nature of phenomena relating these three VLSI domains. Maly has an MSc in electronic engineering from the Technical University of Warsaw and a PhD from the Institute of Applied Cybernetics, Polish Academy of Sciences.

Brady Benware is a staff engineer in the Product Engineering group at LSI Logic, where his current focus is on developing defect-based test methods to achieve very low defective-parts-per-million levels. Benware has a PhD in electrical engineering from Colorado State University.

Chris Schuermyer is an engineer in the Advanced Defect Screening group at LSI Logic. His research interests include test for yield and defect learning, defect-based testing, and logic diagnosis. He has a BS in physics and a BS and an MS in electrical engineering, all from Portland State University.

Direct questions or comments about this article to R.D. Blanton, Dept. of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213; [email protected].
ITC Special Section
Improving Transition Delay Test Using a Hybrid Method

Editor’s note: Structured delay test using scan transition tests is becoming commonplace. But high coverage and compact tests can still be elusive in some situations. The authors propose a novel technique combining the cost-effectiveness of launch-from-capture test with the coverage/pattern volume advantages of launch-from-shift.
—Ken Butler, Texas Instruments

Nisar Ahmed and Mohammad Tehranipoor
University of Connecticut
THIS TRANSITION-FAULT-TESTING TECHNIQUE
combines the launch-off-shift method and an enhanced
launch-off-capture method for scan-based designs. The
technique improves fault coverage and reduces pattern
count and scan-enable design effort. It is practice orient-
ed, suitable for low-cost testers, and implementable with
commercial ATPG tools.
Scan-based structural tests increasingly serve as a cost-
effective alternative to the at-speed functional-pattern
approach to transition delay testing.1,2 Transition fault
testing involves applying a pattern pair (V1, V2) to the
circuit under test. V1 is the initialization pattern, and V2
is the launch pattern. V2 launches the desired signal tran-
sition (0 → 1 or 1 → 0) at the target node, and the
response of the circuit under test is captured at func-
tional speed (the rated clock period). The entire oper-
ation consists of three cycles:
■ initialization—a scan-in operation applies V1;
■ launch—a transition is launched at the target gate ter-
minal (V2 is applied); and
■ capture—the transition is captured at an observable
point.
Transition fault test patterns can be generated and
applied in three ways: the launch-off-shift (LOS)
or skewed-load method, the launch-off-capture (LOC) or
broadside method, or the enhanced-scan method. In
this article, we focus only on the first two methods. In LOS,
the transition at a target gate output is
launched in the last shift cycle during
the shift operation. Figure 1a shows the
waveforms during a LOS operation’s
cycles. The launch cycle is part of the
shift operation and is immediately fol-
lowed by a fast capture pulse. The time
period for the scan-enable signal (SEN)
to make this 1 → 0 transition corre-
sponds to the functional frequency. Hence, LOS
requires that SEN be timing critical. In LOC, the transi-
tion is launched and captured through the functional
pin (D) of any flip-flop in the scan chain.
Figure 1b shows the waveforms of the LOC method,
which separates the launch cycle from the shift opera-
tion. Because launch pattern V2 depends on the func-
tional response of initialization vector V1, the launch
path is less controllable, so test coverage is low. LOC
relaxes the at-speed constraint on SEN and adds dead
cycles after the last shift to provide enough time for SEN
to settle low.
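The contrast between the two launch mechanisms can be illustrated with a toy model; the 4-bit state and next-state function below are hypothetical:

```python
def los_launch(v1, scan_in_bit):
    """Launch-off-shift: V2 is V1 shifted one position along the scan
    chain, with a new bit entering at scan-in during the last shift cycle."""
    return [scan_in_bit] + v1[:-1]

def loc_launch(v1, next_state):
    """Launch-off-capture: V2 is the circuit's functional response to V1,
    captured through the flip-flops' functional (D) inputs."""
    return next_state(v1)

# Hypothetical 4-bit design whose next state rotates the state and inverts
# the bit that wraps around.
rotate_invert = lambda s: [1 - s[-1]] + s[:-1]
v1 = [0, 1, 1, 0]
print(los_launch(v1, 0))              # [0, 0, 1, 1]: fully scan-controllable
print(loc_launch(v1, rotate_invert))  # [1, 0, 1, 1]: constrained by logic
```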
As device frequencies become higher, production
test equipment capabilities limit the ability to test a
device at speed. Rather than purchasing a more expen-
sive tester, test engineers use one of several on-chip DFT
alternatives, such as an on-chip clock generator for at-
speed clock, pipeline SEN generation, or on-chip at-
speed SEN generation3 for LOS transition fault testing.
The LOS method is preferable to the LOC method in
terms of ATPG complexity and pattern count. However,
because of increasing design sizes, the SEN fan-out
exceeds any other net in the design. LOS constrains SEN
to be timing critical, requiring a design effort that makes
it difficult to implement products in reasonable turn-
around times. That’s the main reason for the widespread
use of the LOC method, especially on very low-cost
testers.2 In this article, we propose a hybrid technique
that uses both LOS and LOC in scan-based designs, pro-
viding higher fault coverage and lower pattern count
with a small scan-enable design effort. (The “Related
work” sidebar discusses other approaches to improving
transition delay test quality.)
Overview
Our proposed scan architecture controls a small sub-
set of selected scan cells by the LOS method, and con-
trols the remaining scan cells by the enhanced
launch-off-capture, or ELOC, method (see the “Related
work” sidebar). We use an efficient ATPG-based con-
trollability-and-observability measurement approach to
select the scan cells controlled by LOS or ELOC. The
selection criteria improve fault coverage and reduce the
overall pattern count. Because a few scan cells are LOS
controlled, only a small subset of the scan chains’ SEN
signals must be timing closed; this reduces the scan-
enable design effort. The method is robust and practice
oriented, and it uses existing commercial ATPG tools.4
To control the scan chain operation mode (LOS or
ELOC), two new cells called local scan-enable genera-
tors (LSEGs) generate on-chip SEN signals. The scan-
enable control information for the launch and capture
cycles is embedded in the test data itself. The LSEGs can
be inserted anywhere in the scan chain with negligible
hardware area overhead. The proposed technique is
suitable for low-cost testers because it doesn’t require
external at-speed SEN.
Motivation
ELOC improves the controllability of launching a
transition through either the scan path or the functional
path.5 However, it provides less observability than LOS
does because a scan chain working in shift mode to
launch a transition is not observable at the time of cap-
ture (SEN is held high during the launch and capture
cycles). Therefore, ELOC’s fault coverage is less than
that of LOS but greater than that of LOC. Figure 2a shows
fault coverage analysis for the three tran-
sition fault methods. A common set of transition faults
is detected by both LOS and LOC, and some faults in the
LOC transition fault set are not detected by LOS, such
as shift-dependency untestable faults.6,7 However, ELOC
covers LOC’s entire transition fault set and also detects
some extra faults in the LOS-detected fault set. This is
because LOC is a special case in which all local SEN sig-
nals are held at 0 during the launch and capture cycles.
ELOC provides an intermediate fault coverage point
between LOS and the conventional LOC method.5
To improve fault coverage and identify the union of
fault sets detected in both the LOS and ELOC modes,
the scan cells must be controllable in both modes. Also,
to reduce the design effort for at-speed, scan-enable sig-
nal (required for LOS), we must determine the mini-
mum number of scan cells that require very high
controllability and observability during pattern genera-
tion. We must control the resulting smaller subset of
scan cells in LOS mode, and the remaining scan cells in
ELOC mode. This reduces the design effort to timing-
close the SEN signal at speed as required for LOS-con-
trolled scan flip-flops.
Figure 2b shows an example of a hybrid scan archi-
tecture with eight scan chains. The LOS-controlled scan
flip-flops are stitched in separate scan chains. A fast SEN
signal controls the first three scan chains containing
LOS-controlled flip-flops, and a slow SEN signal controls
the remaining scan chains in ELOC mode. Moreover,
this architecture also requires configuring the LOS-con-
trolled scan chains in functional mode because some
faults are detected only by LOC and not by LOS.
Local SEN generation
The new method for testing transition faults provides
more controllability in launching a transition but
requires an independent SEN for each scan chain. We
can use multiple scan-enable ports, but this increases
Figure 1. Transition delay fault pattern generation methods:
launch-off-shift (LOS) (a) and launch-off-capture (LOC) (b).
Related work
Wang, Liu, and Chakradhar propose a hybrid scan architecture that controls a small subset of selected scan cells by launch-off shift (LOS), and the rest by launch-off capture (LOC).1 The authors have designed a fast scan-enable signal (SEN) generator that drives the LOS-controlled scan flip-flops. The selection criteria of the LOS-controlled scan flip-flops determine the method's effectiveness. In some cases, the number of patterns generated by the hybrid method exceeds the LOC pattern count. Moreover, the LOS-controlled flip-flops cannot be used in LOC mode. Figure A1 shows the SEN waveforms of this hybrid technique.

In a new scan-based, at-speed test called enhanced launch-off-capture (ELOC), the ATPG tool deterministically targets the transition launch path either through a functional path or the scan path.2 The technique improves transition fault testing controllability and fault coverage, and it does not require SEN to change at speed. Figure A2 shows SEN waveforms in the ELOC technique. The SEN signal of a subset of scan chains stays at 1 (SEN1) during the launch and capture cycles to launch the transition only. The second SEN signal (SEN2) controls the remaining scan chains to launch a transition through the functional path during the launch cycle and capture the response during the capture cycle. Figure A3 shows a circuit with two scan chains, chain 1 acting as a shift register, and chain 2 acting in functional mode. The conventional LOC method is a special condition of the ELOC method in which the SEN signals of all chains are 0 during the launch and capture cycles.

Two other proposed techniques improve LOS fault coverage by reducing shift dependency.3,4 A technique by Li et al. reorders the scan flip-flops to minimize the number of undetectable faults, and restricts the distance by which a scan flip-flop can be moved to create the new scan chain order.
Figure A. Previously proposed techniques: SEN waveforms in hybrid scan
technique (1), SEN waveforms in enhanced LOC (ELOC) technique (2); ELOC
controllability—chain 1 used in shift mode, and chain 2 in functional mode (3).
the number of pins. Two types of SEN sig-
nals must be generated on chip. The
scan-enable control information for the
scan flip-flops differs only during the pat-
tern’s launch and capture cycles. Hence,
we can use the low-speed SEN signal
from the external tester for the scan shift
operation and internally generate the
scan-enable control information for only
the launch and capture cycles.
LSEG cells
Because our hybrid technique uses
both LOS and enhanced LOC tech-
niques, we must generate both fast and
slow local SEN signals. We propose two
LSEG cells to generate on-chip local SENs
using a low-speed external SEN generat-
ed by a low-cost tester.
Slow scan-enable generator (SSEG). We designed an LSEG to control a scan
flip-flop’s transition launch path.5 In this
article, we refer to this cell as the slow
scan-enable generator (SSEG) because
the local SEN signal does not make an at-
speed transition. Figure 3a shows the
SSEG cell architecture. It consists of a single flip-flop that
loads the control information required for the launch
and capture cycles. The input scan-enable (SENin) pin
connected to the external SEN signal from the tester is
called global scan-enable (GSEN). An additional output scan-enable pin (SENout) provides the local scan-enable (LSEN) signal. After the control state Q is loaded at the end of the shift operation (that is, after GSEN is deasserted), LSEN remains in this state until GSEN asynchronously sets it to 1. The SSEG cell essentially holds the value 0 or 1 loaded at the end of the shift operation (GSEN = 1) for the launch and capture cycles:

LSEN = GSEN + Q = { 1 if GSEN = 1; Q if GSEN = 0 }
Figure 2. Hybrid method analysis and architecture: fault analysis of LOS, LOC, and ELOC techniques (a), and hybrid scan architecture with LOS-controlled scan chains using a fast SEN signal and ELOC-controlled scan chains using a slow SEN signal (b).
Gupta et al. propose a technique that inserts dummy flip-flops and reorders scan flip-flops, considering wire length costs to improve path delay fault coverage.4 Wang and Chakradhar propose using a special ATPG to identify pairs of adjacent flip-flops and inserting test points (dummy gates or flip-flops) between them.5
References
1. S. Wang, X. Liu, and S.T. Chakradhar, "Hybrid Delay
Scan: A Low Hardware Overhead Scan-Based Delay Test
Technique for High Fault Coverage and Compact Test
Sets," Proc. Design, Automation and Test in Europe
(DATE 04), IEEE Press, 2004, pp. 1296-1301.
2. N. Ahmed, M. Tehranipoor, and C.P. Ravikumar,
“Enhanced Launch-off-Capture Transition Fault Testing,”
Proc. Int’l Test Conf. (ITC 05), IEEE Press, 2005, pp. 246-
255.
3. W. Li et al., “Distance Restricted Scan Chain Reordering
to Enhance Delay Fault Coverage,” Proc. 18th Int’l Conf.
VLSI Design, IEEE Press, 2005, pp. 471-478.
4. P. Gupta et al., “Layout-Aware Scan Chain Synthesis for
Improved Path Delay Fault Coverage,” Proc. Int’l Conf.
Computer-Aided Design (ICCAD 03), IEEE Press, 2003,
pp. 754-759.
5. S. Wang and S.T. Chakradhar, “Scalable Scan-Path Test
Point Insertion Technique to Enhance Delay Fault
Coverage for Standard Scan Designs,” Proc. Int’l Test
Conf. (ITC 03), IEEE Press, 2003, pp. 574-583.
Table 1 shows the SSEG cell's operation modes. GSEN = 1 represents the pattern's normal shift operation. When GSEN = 0 and Q = 1, LSEN = 1 and the controlled scan flip-flops act in shift mode to launch the transitions only (the shift-launch, no-capture mode). There is no capture because the LSEN signal stays at 1 (LSEN = 1 → 1 at the launch edge); the other observable scan flip-flops perform the capture. The LSEN-controlled scan flip-flops act in the conventional LOC mode when GSEN = 0 and Q = 0 (the functional-launch-capture mode).
Fast scan-enable generator (FSEG). Figure 3b shows
our new local, at-speed, scan-enable generator architec-
ture, which we call the
fast scan-enable generator
(FSEG). Table 2 shows
the FSEG cell’s operation
modes. As in SSEG cell
operation, GSEN = 1 rep-
resents the pattern’s nor-
mal shift operation. When
GSEN = 0 and Q = 1, LSEN
= 1 and the scan flip-flops
act in the shift-launch-cap-
ture mode to launch the
transition from the scan
path and capture the
response at the next cap-
ture cycle (conventional
LOS method). The LSEN from the FSEG cell makes a
1 → 0 at-speed transition at the launch cycle. The LSEN-
controlled scan flip-flops act in the conventional LOC
mode when GSEN = 0 and Q = 0 (functional-launch-
capture mode).
LSEG cell operation
LSEG cells inserted in the scan chains pass control
information as part of the test data. The scan-enable con-
trol information is part of each test pattern and is stored
in the tester’s memory. Figure 4a shows the normal scan
architecture with a single SEN signal from the external
tester. The scan chain contains eight scan flip-flops, and
the shifted test pattern is 10100110. Figure 4b shows the
same circuit, which generates an LSEN signal from the test
pattern data for the hybrid transition fault test method. The
main objective is to deassert the external GSEN signal after
the entire shift operation and then generate the LSEN sig-
nal from the test data during the launch and capture
cycles. In this case, the shifted pattern is modified to
[C]10100110, where C is the scan-enable control bit stored
in the LSEG cell at the end of the scan operation.
The GSEN signal asynchronously controls the shift
operation. GSEN is deasserted after the nth shift (ini-
tialization) cycle, where n = 9; n is the length of the
scan chain after insertion of the LSEG cell. After the
GSEN signal is deasserted at the end of the shift opera-
tion, the scan-enable control during the launch and
capture cycles is control bit C stored in the LSEG. At the
end of the capture cycle, GSEN asynchronously sets the
LSEN signal to 1 for scanning out the response.
Figure 4c shows the process of test pattern applica-
tion and the timing waveforms for the two LSEG cells,
SSEG and FSEG.
Figure 3. LSEG cells: slow scan-enable generator (SSEG) cell (a) and fast scan-enable generator (FSEG) cell (b).
Table 1. SSEG operation, where GSEN is the global scan-enable signal, Q
is the flip-flop’s state, and LSEN is the local scan-enable signal.
GSEN Q LSEN Operation
1 X 1 Shift
0 1 1 → 1 Shift-launch (no capture)
0 0 0 → 0 Functional launch and capture
Table 2. FSEG operation.
GSEN Q LSEN Operation
1 X 1 Shift
0 1 1 → 0 Shift-launch capture
0 0 0 → 0 Functional launch and capture
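Tables 1 and 2 map directly onto a small behavioral model. The Python sketch below is illustrative only: it encodes the two truth tables, not the authors' gate-level cells, and the way the FSEG model clears its flip-flop after the launch cycle is simply one assumed mechanism for producing the 1 → 0 at-speed transition described above.

    # Behavioral sketch of the two LSEG cells (assumed model, per Tables 1 and 2).
    def sseg_lsen(gsen_seq, c):
        # SSEG: LSEN = GSEN + Q; the loaded control bit c is held through
        # launch and capture, so LSEN never makes an at-speed transition.
        q, out = None, []
        for gsen in gsen_seq:
            if gsen == 1:
                q = c          # shift mode: control bit being loaded, LSEN = 1
                out.append(1)
            else:
                out.append(q)  # launch/capture: LSEN = Q, held constant
        return out

    def fseg_lsen(gsen_seq, c):
        # FSEG: same loading, but Q is assumed cleared after the launch cycle,
        # giving the 1 -> 0 at-speed LSEN transition of the LOS mode.
        q, out = None, []
        for gsen in gsen_seq:
            if gsen == 1:
                q = c
                out.append(1)
            else:
                out.append(q)
                q = 0
        return out

    gsen = [1] * 9 + [0, 0]         # 9 shift cycles ([C] + 8 bits), launch, capture
    print(sseg_lsen(gsen, 1)[-2:])  # [1, 1]: shift-launch, no capture
    print(fseg_lsen(gsen, 1)[-2:])  # [1, 0]: shift launch, then capture (LOS)
    print(fseg_lsen(gsen, 0)[-2:])  # [0, 0]: functional launch and capture (LOC)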
Flip-flop selection: Measuring controllability and observability
In the LOS technique, the fault activation path (scan
path), unlike the functional path used in the LOC
method, is fully controllable from the scan chain input.
Hence, in most cases, for the same detected fault, a LOS
pattern requires fewer care bits than a LOC pattern. The
controllability measure of a scan flip-flop is the per-
centage of patterns in the entire pattern set (P) for
which a care bit is required in the scan flip-flop to
enable activation or propagation of a fault effect. Figure
5 shows a scan flip-flop with an input (observability)
and output (controllability) logic cone. A large output
logic cone implies that the scan flip-flop will control a
greater number of faults; that is, a care bit will be
required in their activation or propagation. Similarly,
the input logic cone determines a scan flip-flop’s observ-
ability. We define this observability as the percentage
of patterns in the entire pattern set (P) for which a valid
care bit is observed in the scan flip-flop.
In a transition fault test pattern pair (V1, V2), initial-
ization pattern V1 is essentially an IDDQ pattern to initial-
ize the target gate to a known state. In the next time
frame, pattern V2 is a stuck-at-fault test pattern to acti-
vate and propagate the required transition at the target
node to an observable point. Therefore, to find the con-
trollability-observability measure of a scan flip-flop, we
use an ATPG tool to generate stuck-at patterns and force
it to fill in don’t-care (X) values for scan flip-flops that
don’t affect any fault’s activation or propagation. The
Figure 4. LSEG cell operation: scan chain architecture (a), LSEN generation using LSEG (b), and LSEN generation process and waveforms (c).
Figure 5. Scan flip-flop controllability-and-observability measure.
ith scan flip-flop's controllability is Ci = pc/P, where pc
is the number of patterns with a care bit in the scan flip-
flop during scan-in, and P is the total number of stuck-
at patterns. Similarly, observability is Oi = po/P, where po
is the number of patterns with a care bit in the scan flip-
flop during scan-out.
We then use each scan flip-flop's measured controllability and observability factors to determine the cost function CFi = Ci × Oi. The scan flip-flops are arranged in decreasing order of cost function, and a subset with very high cost functions is selected as the LOS-controlled flip-flops. This ATPG-based controllability-observability measurement overcomes a limitation of the SCOAP-based method8 used by Wang, Liu, and Chakradhar:6 the SCOAP metric can select a scan flip-flop that has high 0 (1) controllability even though the ATPG tool never actually controls it to 0 (1) during pattern generation.
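To make the measure concrete, the sketch below (a hypothetical data layout; an actual flow would parse the ATPG tool's pattern files) computes Ci = pc/P and Oi = po/P from care-bit matrices and ranks flip-flops by CFi = Ci × Oi:

    # Illustrative ranking of scan flip-flops by cost function CFi = Ci * Oi.
    def rank_flip_flops(scan_in_bits, scan_out_bits):
        # scan_in_bits[p][i] and scan_out_bits[p][i] hold '0', '1', or 'X'
        # for pattern p and flip-flop i; 'X' means no care bit.
        P = len(scan_in_bits)                  # total number of stuck-at patterns
        ranked = []
        for i in range(len(scan_in_bits[0])):
            pc = sum(bits[i] != 'X' for bits in scan_in_bits)   # care bits scanned in
            po = sum(bits[i] != 'X' for bits in scan_out_bits)  # care bits observed
            ci, oi = pc / P, po / P            # controllability, observability
            ranked.append((ci * oi, i))
        ranked.sort(reverse=True)              # highest cost function first
        return ranked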
Case study
The following case study illustrates DFT insertion and
ATPG flow of our hybrid scan transition fault-testing tech-
nique. It includes an analysis of extra detected faults.
Test architecture
The LSEG-based solution presented
here provides a method of generating
internal LSEN signals from pattern data,
and GSEN signals from the tester. The
overhead of generating the LSEN signal is
the additional LSEG (SSEG or FSEG) cell
in the scan chain. An LSEG cell’s area
overhead is a few extra gates, which is
negligible in modern designs. We assume
that the area overhead of the buffer tree
required to drive all the LOS-controlled
scan flip-flops through the LSEG cells is
equal to the overhead of applying an at-
speed GSEN signal from external ATE.
Figure 6 shows a multiple-scan-chain
architecture with n scan chains. The LOS-
controlled scan flip-flops determined by
the controllability-observability measurement are stitched
in separate scan chains. Each scan chain i, where 1 ≤ i ≤ n,
consists of an LSEG (FSEG or SSEG) cell that generates
signal LSENi for the respective scan chain. The GSEN sig-
nal connects only to the SENin port of the LSEG cells.
Study description
In this case study, we experimented with a subchip
of an industrial-strength design with the characteristics
listed in Table 3. One LSEG cell is inserted per scan
chain. The test strategy was to get the highest possible
transition fault test coverage. When generating test pat-
terns for transition faults, we targeted only faults in the
same clock domain. During pattern generation, only
one clock is active during the launch and capture
cycles. Hence, only faults in that particular clock
domain are tested. All primary inputs remain un-
changed, and all primary outputs are unobservable dur-
ing test-pattern generation for transition faults, because low-cost testers are not fast enough to provide PI values and strobe POs at speed.
DFT insertion
We measure a scan flip-flop's cost function (controlla-
bility × observability) using the ATPG-based technique
explained earlier. Figure 7 shows the cost function of each
scan flip-flop in our design. Only about 20% to
30% of the flip-flops require very high controllability and
observability. Hence, SEN need not be at speed for all
scan flip-flops. We arrange the scan flip-flops in decreas-
ing order of cost function, and we use this order during
scan insertion.
Figure 6. Hybrid scan test architecture: FSEG cells driving LOS-controlled scan chains, and SSEG cells driving ELOC-controlled scan chains.
Table 3. Case study design characteristics.
Characteristics No.
Clock domains 6
Scan chains 16
Scan flip-flops 10,477
Nonscan flip-flops 13
Transition delay faults 320,884
In the new order of scan chains, the few
initial chains consist of very high control-
lability-observability flip-flops, and we
select them for LOS according to their aver-
age cost function. We measure a scan
chain’s average cost function as ∑CFi/N,
where CFi = Ci × Oi is the cost function of
the ith scan flip-flop in the chain, and N is
the number of flip-flops in the scan chain.
Figure 8 shows each chain’s average cost
function for normal scan insertion and after
scan insertion based on controllability and
observability. As Figure 8b shows, after this
scan insertion, the average cost function of
the first five scan chains is very high (due
to scan flip-flops with very high cost func-
tions) and very low for the rest of the
chains. Therefore, we can design the first
five chains’ SEN signal to be at speed (con-
trolled by the FSEG cell), and the rest of the
scan chains can use a slow-speed SEN
(controlled by the SSEG cell).
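The chain-level step then reduces to an average over each chain; a minimal sketch, again assuming the per-flip-flop cost functions are already available:

    # Select the highest average-cost chains for LOS (FSEG) control.
    def partition_chains(chain_costs, num_los_chains):
        # chain_costs[c] is the list of CFi values for the flip-flops in chain c.
        avg = sorted(((sum(cfs) / len(cfs), c) for c, cfs in enumerate(chain_costs)),
                     reverse=True)
        los = [c for _, c in avg[:num_los_chains]]    # fast SEN, FSEG-driven
        eloc = [c for _, c in avg[num_los_chains:]]   # slow SEN, SSEG-driven
        return los, eloc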
We used the Synopsys DFT Compiler for scan chain
insertion.4 To insert the LSEG cells, the synthesis tool must
recognize the LSEG cell as a scan cell and stitch it into the
chain. This means that the LSEG cell must be defined as
a new library cell with scan cell attributes. A workaround
is to design the LSEG cell as a module, instantiate it, and
declare it as a scan segment of length 1. The GSEN signal
is connected to all LSEG cell SENin pins. During scan inser-
tion, we specify only the LSEG cell in the scan path
because the tool will stitch the rest of the cells, including
the LSEG cell, and balance the scan chain, depending on
the longest scan chain length parameter. Additionally, the
tool provides the flexibility to hook up each LSEG cell’s
SENout port in a particular chain to all the SENin ports of the
scan flip-flops in the respective chain.
ATPG
The ATPG tool must understand the LSEG methodol-
ogy and deterministically choose the transition fault acti-
vation path. We used a commercial ATPG tool, Synopsys
TetraMax,4 which supports two ATPG modes: basic scan
and sequential. Basic-scan ATPG is a combinational-only
mode with only one capture clock between pattern
scan-in and response scan-out; the sequential mode uses
a sequential time-frame ATPG algorithm. By default,
when generating test patterns for the transition fault
model in functional launch mode, the ATPG tool uses a
two-clock ATPG algorithm that has some features of both
the basic-scan and sequential engines. The tool under-
stands the LSEG technique and can choose the launch
path for the target transition fault deterministically.
Hence, there is no fundamental difference in ATPG
methodology when we use the LSEG-based solution.
The SEN signal for the flip-flops in the launch and cap-
ture cycles comes from an internally generated signal.
The OR gate in the LSEG cell generates the LSEN signal
through a logical OR of the flip-flop’s GSEN and Q output
(see Figure 3). The GSEN signal is high during scan shift
operation. The tool determines each chain’s LSEN and
shifts the control value into the LSEG cell during pattern
shift for launch and capture. It also deterministically
decides the combination of scan chains to work in shift
or functional launch mode, to activate a transition fault.
Table 4 shows results for conventional LOS and LOC
(normal scan insertion), ELOC, and hybrid transition
delay ATPG on the case study design. We see that LOS
gave approximately 3% higher fault coverage than LOC.
ELOC gave approximately 1.9% higher fault coverage than
the LOC method. The hybrid technique gave better fault
coverage than the other methods and provided a better
pattern count than the LOC and ELOC methods. The pat-
tern count was greater than that of LOS, but with the advantage of less scan-enable design effort: only five scan chains needed to be timing closed for at-speed SEN. (The hybrid
scan technique proposed by Wang, Liu, and Chakradhar6
sometimes gives a greater pattern count than the LOC
technique.) Our hybrid method used more CPU time than
Figure 7. Cost functions of scan flip-flops in case study design.
the other techniques because for hard-to-detect faults, the
ATPG tool must do more processing to determine the pos-
sible combinations of the SSEG-controlled scan chains in
shift register mode or functional mode.
Analysis of extra detected faults
As Rearick discusses,
the detection of function-
ally untestable faults poses
a potential yield loss prob-
lem.9 We analyzed the
additional faults detected
by the hybrid scan archi-
tecture over the conven-
tional LOC technique. To
determine the nature of
these extra faults, we per-
formed conventional LOC
ATPG on them. For exam-
ple, for ITC99 benchmark
circuit b17, the hybrid scan
method detected 17,926
extra faults. LOC ATPG on
these faults showed all of
them as nonobservable
faults—faults that can be
controlled but cannot be
propagated to an observ-
able point.
It can be argued that
some of these nonobserv-
able detected faults can
result in yield loss because
some of them might be
functionally untestable.
However, some of these
faults are actually func-
tionally testable but
become nonobservable
because of low-cost tester
ATPG constraints such as
no primary input changes
or no primary output mea-
sures. For example, of the
17,926 extra faults detect-
ed by hybrid scan in the
nonobservable class, 1,155
were detectable without
the low-cost tester con-
straints. Also, Lai, Krstic,
and Cheng show that functionally untestable nonob-
servable faults might not need testing if the defect does-
n’t cause a delay exceeding twice the clock period.10
With technology scaling and increasing operating fre-
Figure 8. Average cost function before (a) and after (b) scan insertion based on controllability and observability.
quencies, detecting multicycle delay faults might
become important, and more than two vectors are
required to detect such faults.10 The hybrid scan tech-
nique can be advantageous because it eases ATPG and
detects multicycle faults using a two-vector pair.
Experimental results
We experimented with our hybrid scan technique
on the three largest 1999 International Test Conference
(ITC) benchmark circuits and on four more industrial
designs ranging in size from 10,000 to 100,000 flip-flops.
We inserted 16 scan chains in each design. For the LOS
and LOC techniques, we used the Synopsys DFT
Compiler to perform normal scan insertion. For the
ELOC and hybrid techniques, we performed scan inser-
tion based on controllability and observability, and we
inserted one LSEG cell in each scan chain. In the case
of ELOC, we inserted only SSEG cells in each scan
chain. In the hybrid technique, we selected only the first
four scan chains to be LOS controlled (FSEG) after con-
trollability-observability measurement; the remaining
scan chains were ELOC controlled (SSEG). This
reduced the at-speed scan-enable design effort signifi-
cantly because the SEN signal to only one fourth of the
scan flip-flops needed to be timing closed.
During ATPG, the faults related to clocks, scan-enable,
and set or reset pins, referred to as untestable faults, are
not added to the fault list. Table 5 shows the ATPG results,
comparing the LOS, LOC, ELOC, and hybrid methods. The
ELOC method provides higher fault coverage than the
LOC method (up to 15.6% for design b19), and in most
cases an intermediate fault coverage and pattern count
between LOS and LOC. The hybrid method provides bet-
ter coverage than all other methods because it has the
flexibility to use combinations of functional and scan
paths for launching a transition. Its fault coverage exceeds that of LOS by up to 2.68% (design D) and that of LOC by up to 19.12% (design b19).
In a worst-case analysis, the lower bound for ELOC
is LOC with no extra faults detected over LOC, and the
upper bound is LOS. Similarly, for the hybrid technique,
the lower bound is ELOC, and the upper bound can be
greater than or equal to LOS. However, in the worst-case
scenario, for a given fault coverage, the hybrid method
will still benefit in test-pattern count reduction com-
pared to LOC, thereby reducing test time, with mini-
mum scan-enable design effort. In some cases, the CPU
time for the hybrid or ELOC method is greater than that
of the LOC method because the ATPG tool needs a larg-
er search space to find the transition launch activation
path for hard-to-detect faults.
Typically, in an ASIC design flow, scan insertion takes
place in a bottom-up manner, independent of a physical
synthesis step. The DFT insertion tool stitches the scan
chains based on the alphanumeric order of scan flip-flop
names in each module. The resulting scan chains are
then reordered during physical synthesis to reduce the
scan chain routing area. At the top level, the module-level
Table 4. Case study ATPG results.
Parameter           LOS      LOC      ELOC     Hybrid
Detected faults     292,342  282,658  288,681  295,288
Test coverage (%)   91.30    88.27    90.15    91.92
Fault coverage (%)  91.11    88.09    89.96    91.74
Pattern count       1,112    2,145    2,014    1,799
CPU time (s)        329.30   896.96   924.74   1,014.60
Table 5. ATPG results for 1999 International Test Conference (ITC) benchmark circuits and industrial designs. For each method, the columns give fault coverage (%), pattern count, and CPU time (s).
Design  FFs (1,000s)  LOS: cov/patterns/CPU    LOC: cov/patterns/CPU    ELOC: cov/patterns/CPU    Hybrid: cov/patterns/CPU
b17     1.4           95.09 / 1,088 / 95.4     81.02 / 1,190 / 1,000.8  94.29 / 1,328 / 325       96.50 / 1,179 / 187.9
b18     3.3           92.67 / 1,451 / 279.7    77.50 / 1,309 / 1,020.9  93.01 / 1,876 / 726       95.18 / 1,334 / 336.6
b19     6.6           85.98 / 2,280 / 645.3    69.21 / 1,153 / 1,050.4  84.81 / 1,422 / 1,000     88.33 / 1,590 / 1,000.9
A       10            91.11 / 1,112 / 329      88.09 / 2,145 / 896      89.96 / 2,014 / 924       91.74 / 1,799 / 1,014
B       30            87.94 / 4,305 / 3,569    85.14 / 8,664 / 7,800    86.57 / 8,539 / 8,702     88.03 / 8,062 / 6,611
C       50            81.10 / 6,869 / 8,415    79.42 / 12,073 / 22,930  80.48 / 11,583 / 25,642   83.29 / 8,134 / 14,451
D       104           92.15 / 5,933 / 6,559    91.56 / 10,219 / 12,088  92.28 / 12,505 / 47,788   94.83 / 9,674 / 18,410
scan chains are stitched together. Similarly, in our bot-
tom-up scan insertion flow, the scan chains in each mod-
ule are stitched based on the decreasing order of scan
flip-flops’ cost functions, and the resulting scan chains
are reordered during physical synthesis to reduce the
scan chain routing area. Therefore, the new scan inser-
tion method will not be affected significantly because
scan insertion and physical synthesis are not performed
for the entire chip. Although it can be argued that our
scan chain stitch for controllability and observability
might slightly increase the scan chain routing area in
some cases, the decreases in scan-enable design effort
and area overhead compared with LOS are significant.
Moreover, the technique has the flexibility to shuffle and
reorder the different groups of scan chains (LOS con-
trolled and ELOC controlled) if any scan-chain-routing
problem arises.
THE PROPOSED HYBRID TECHNIQUE significantly
reduces the design effort and eases the timing closure
by selecting a small subset of scan chains to be con-
trolled using LOS. The experimental results also show
that the pattern count is reduced and fault coverage is
considerably increased. A statistical analysis is required
to find the optimum number of LOS-controlled scan
chains. Minimizing the number of LOS-controlled scan
chains will even further reduce the design effort, and
future work must focus on this issue. ■
Acknowledgments
Mohammad Tehranipoor's work was supported in
part by SRC grant no. 2005-TJ-1322. Nisar Ahmed per-
formed the implementation work at Texas Instruments,
India.
References
1. X. Lin et al., "High-Frequency, At-Speed Scan Testing,"
IEEE Design & Test, vol. 20, no. 5, Sept.-Oct. 2003, pp.
17-25.
2. J. Saxena et al., “Scan-Based Transition Fault Testing—
Implementation and Low Cost Test Challenges,” Proc. Int’l
Test Conf. (ITC 02), IEEE Press, 2002, pp. 1120-1129.
3. N. Ahmed et al., “At-Speed Transition Fault Testing with
Low Speed Scan Enable,” Proc. 24th VLSI Test Symp.
(VTS 05), IEEE Press, 2005, pp. 42-47.
4. User Manual for Synopsys Toolset Version 2005.09,
Synopsys, 2005.
5. N. Ahmed, M. Tehranipoor, and C.P. Ravikumar,
“Enhanced Launch-off-Capture Transition Fault Testing,”
Proc. Int’l Test Conf. (ITC 05), IEEE Press, 2005, pp.
246-255.
6. S. Wang, X. Liu, and S.T. Chakradhar, “Hybrid Delay
Scan: A Low Hardware Overhead Scan-Based Delay
Test Technique for High Fault Coverage and Compact
Test Sets,” Proc. Design, Automation and Test in Europe
(DATE 04), IEEE Press, 2004, pp. 1296-1301.
7. S. Wang and S.T. Chakradhar, “Scalable Scan-Path
Test Point Insertion Technique to Enhance Delay Fault
Coverage for Standard Scan Designs,” Proc. Int’l Test
Conf. (ITC 03), IEEE Press, 2003, pp. 574-583.
8. L.H. Goldstein and E.L. Thigpen, “SCOAP: Sandia Con-
trollability/Observability Analysis Program,” Proc. 17th
Design Automation Conf. (DAC 80), IEEE Press, 1980,
pp. 190-196.
9. J. Rearick, "Too Much Delay Fault Coverage Is a Bad
Thing,” Proc. Int’l Test Conf. (ITC 01), IEEE Press, 2001,
pp. 624-633.
10. W.C. Lai, A. Krstic, and K.T. Cheng, “On Testing the
Path Delay Faults of a Microprocessor Using Its Instruc-
tion Set,” Proc. 19th VLSI Test Symp. (VTS 00), IEEE
Press, 2000, pp. 15-20.
Nisar Ahmed is a PhD student in the Electrical and Computer Engineering Department of the University of Connecticut. His research interests include design for testability, at-speed testing, and CAD. Ahmed has an MS in electrical engineering from the University of Texas at Dallas. He is a member of the IEEE.

Mohammad Tehranipoor is an assistant professor in the Electrical and Computer Engineering Department at the University of Connecticut. He has a PhD in electrical engineering from the University of Texas at Dallas. His research interests include computer-aided design and test, DFT, delay fault testing, test resource partitioning, and test and defect tolerance for nanoscale devices. He is a member of the IEEE, the ACM, and ACM SIGDA.
Direct questions and comments about this article to Mohammad Tehranipoor, ECE Dept., Univ. of Connecticut, Storrs, CT 06268; [email protected].
For further information on this or any other computing
topic, visit our Digital Library at http://www.computer.org/
publications/dlib.
Special ITC Section

Impact of Thermal Gradients on Clock Skew and Testing

Sebastià A. Bota, Josep L. Rosselló, and Carol de Benito, University of the Balearic Islands
Ali Keshavarzi, Intel
Jaume Segura, University of the Balearic Islands

Editor's note:
It is a well-known phenomenon that test-mode switching activity and power consumption can exceed that of mission mode. Thus, testing can induce localized heating and temperature gradients with deleterious results. The authors quantify this problem and propose a novel design scheme to circumvent it.
—Ken Butler, Texas Instruments
CMOS TECHNOLOGY SCALING has brought circuit
applications using hundreds of millions of transistors with
dimensions below 65 nm and operating frequencies
beyond 4 GHz. Among the many challenges imposed by
this scaling race during the past decade, increasing power
consumption from generation to generation is a major
concern. Two factors have caused most of the increase
in total circuit power consumption: a scaling model
based on supply voltage reduction, forcing the same
trend on transistor threshold voltage, and an increase in
operating frequency. The first factor contributes to static
or leakage power increase because of the exponential
dependence of the transistor’s off-state current on the
threshold voltage. The second factor determines active
power because of short-circuit and capacitor charg-
ing/discharging current components. Researchers have
pursued the development of advanced techniques to con-
trol IC total power consumption; these techniques span
many domains, including manufacturing technology,
device design, circuit design, and architecture.
In addition to increasing overall power, a related
effect drawing significant attention is increasing power
density. This increase is due to circuit critical-dimension
reduction, which packs more active devices per unit area
and therefore increases both static and dynamic power
density components. This trend has a direct impact on
circuit junction temperature, with a resulting increase of
overall average operating temperature. Power density
relates closely to circuit activity and gen-
erally is not uniformly distributed within
the circuit. As a result, thermal gradients
between circuit regions can be as high as
40°C to 50°C in high-performance designs,
creating nonuniform thermal maps.1 This
phenomenon can lead to hot spots in
localized IC regions.
The main challenges to the accurate prediction and control of power density distribution stem from a lack
of tools capable of handling the various mechanisms
that determine hot-spot appearance. Such capabilities
would include accurate layout-based determination of
induced activity and resulting power distribution, cir-
cuit thermal-impedance computation, and heat flux dis-
tribution determination.
Power containment tools and methods have tradi-
tionally targeted overall mean power or peak power esti-
mation and reduction and in general are not valid to
predict hot spots. Predetermination of circuit hot spots
is important not only for reliability (for example, an
increase in wire temperature accelerates interconnect
electromigration), but also because of the circuit’s delay
dependency on temperature. Hot spots can slow specific
circuit regions with respect to other blocks or the clock
line and can cause circuit failure because of timing-rule
violation. Circuit hot spots can also directly affect the
clock line at a given point, causing timing violations.
These problems pose two concerns for circuit testing:
■ normal circuit operation can induce a given thermal
map that is not reproduced during circuit testing,
and
■ activity induced during circuit testing can lead to
modified thermal maps that can cause a circuit to
erroneously pass or fail the test.
Differences in thermal-map distribution between nor-
mal and test mode operations lead to a nonuniform effect
on relative path delay within logic blocks. Test-induced
hot spots can artificially slow noncritical paths or accel-
erate critical ones with respect to the clock, causing the
entire die to fail (pass) delay testing for a good (bad) part.
Therefore, if designers don’t properly consider higher
activity during test mode and its effect on the clock net-
work, a given percentage of dies can fail during test due
to test-induced thermal-map modification. This would
cause increased yield loss because the thermal map’s
impact on path delay during normal operation is different
from that induced during test. This article shows that clock
circuit distribution plays an important role in determining
the effect of these mechanisms on circuit behavior.
The evolution of VLSI chips toward larger die sizes
and faster clock speeds makes clock design an increas-
ingly important issue. In a synchronous digital IC, the
clock network significantly influences circuit speed, area,
and power dissipation. Because the clock function is vital
to a synchronous system’s operation, clock signal char-
acteristics and distribution networks have drawn much
attention. Any uncertainty in clock arrival times between
two points, especially if these points are near each other,
can limit overall circuit performance or even cause func-
tional errors. Clock signals typically carry the largest fan-
outs, travel over the longest distances, and operate at the
highest speeds of any signal, either control or data, in the
entire chip. Furthermore, technology scaling particular-
ly affects clock signals because long global interconnect
lines become more resistive. In addition, as technology
feature size shrinks, global metal layers that carry the
clock signal are closer to the substrate while the use of
low-k dielectrics for intralevel gap filling can significant-
ly increase thermal effects because these dielectrics have
lower thermal conductivity than SiO2. Both effects con-
tribute to a higher impact of substrate temperature
nonuniformities on the clock line thermal distribution.
Therefore, designers must investigate the possibility that
the nonuniform substrate temperature’s effect on clock
skew is a new delay fault mechanism, even with exact
zero-skew clock-routing algorithms.
In this article, we analyze the impact of within-die
thermal gradients on clock skew, considering tempera-
ture’s effect on active devices and the interconnect sys-
tem. This effect, along with the fact that the test-induced
thermal map can differ from the normal-mode thermal
map, motivates the need for a careful consideration of
the impact of temperature gradients on delay during
test. After our analysis, we propose a dual-VDD clocking
strategy that reduces temperature-related clock skew
effects during test.
Clock networks and clock skew
Clock network design is a critical task in developing
high-performance circuits because circuit performance
and functionality depend directly on this subsystem’s
performance. When distributing the clock signal over
the chip, clock edges might reach various circuit regis-
ters at different times. The difference in clock arrival time
between the first and last registers receiving the signal is
called clock skew. With tens of millions of transistors
integrated on the chip, distributing the clock signal with
near-zero skew introduces important constraints in the
clock distribution network’s physical implementation
and affects overall circuit power and area.
Researchers have done extensive work on automat-
ic clock network design to minimize the effect of unbal-
anced clock path delays resulting from routing or
differences in capacitive loading at the clock sinks.2 Most
clock distribution schemes exploit the irrelevance of the
absolute delay from a central clock source to clocking
elements—only the relative phase between two clock-
ing points is important. Early methods used symmetric
structures such as H trees or balanced trees. Figure 1
shows the H-tree clock topology, which consists of
trunks (vertical lines) and branches (horizontal lines).
In nonbuffered trees, top-level interconnect segments
are wider than lower-level segments. Furthermore, top-
Figure 1. Symmetric three-level H-tree layout for clock distribution. D is the length of the H tree.
level global interconnect segments are routed through
upper metal layers, whereas low-level local segments are
routed through lower metal layers.
In addition to zero skew, a second important require-
ment for a clock network is obtaining a high slew rate
to get sharp clock edges. Designers achieve this by
inserting buffers and repeaters in the clock network, cre-
ating a multistage clock tree, to isolate downstream
capacitance and reduce transition times. Clock net-
works with several buffer stages are common in high-
performance digital designs. Researchers have also
proposed approaches that incorporate uneven loading
and buffering effects resulting in non-H-tree topologies.3
Current designs incorporate clock distribution net-
works consisting of two parts: a global clock network
and a local network. The global clock network distrib-
utes the clock signal from the clock source to local
regions and usually has a symmetric structure. The local
distribution network delivers clock signals to registers
in a local area using a nonsymmetric structure because
register location in the circuit is typically not regular.
Any phenomenon that affects a net’s delay can con-
tribute to skew, so we can no longer ignore the portion
of clock skew caused by process variations in nanome-
ter technologies. Process variations—such as effective
gate length, doping concentrations, oxide thickness,
and interlayer dielectric thickness—cause uncertain
device and interconnect characteristics and can be a
source of significant clock skew. Dynamic variations—
such as power supply variations, coupling noise, and
junction temperature—can contribute to additional
skew during circuit operation.4 Temperature is difficult
to model and predict because of the switching activi-
ties of the various blocks composing the circuit and
their variation over time. Thus temperature is an impor-
tant source of skew. A nonuniform temperature gradi-
ent created by a hot spot can significantly impact clock
tree performance and worsen worst-case clock skew.
Algorithms used to design zero-skew clock tree net-
works usually don’t consider process variations or
nonuniform thermal distributions as possible sources
for clock skew. Researchers have proposed grid-based
clock networks driven by one or more lines of buffers
as an alternative to tree topologies. This method has
proved highly effective in reducing sensitivity to process
variations and environmental effects, typically at the
cost of consuming more wire resources and power. A
recent trend is to use hybrid structures formed by a sym-
metric tree and a mesh for the global clock network.5
Mori et al. demonstrated that adding a mesh to bottom-
level leaves of an H tree helps significantly reduce clock
skew caused by process variations.6
We focus on the relative impact of temperature and
nonuniform thermal maps on hybrid clock networks, as
they are widely used to achieve low clock skew and
power consumption.
Temperature effects on delay
The impact of environmental variations on skew is
difficult to analyze given its dependence on circuit
activity that changes over time. The two major sources
of environmental variations are power supply variations
and temperature. Power supply variations are the main
source of jitter, whereas temperature is a source of skew
(typical time constants for temperature changes are on
the order of milliseconds).
Temperature affects the delay of both interconnect
lines and clock buffers. The main sources of tempera-
ture generation in the chip are switching activities of the
cells over the substrate and joule heating of the inter-
connects when current passes through them. In a high-
performance design, junction temperature can vary
more than 50°C and reach an absolute temperature of
120°C in some circuit regions. To explain these mecha-
nisms, we introduce the temperature dependence of
interconnect and buffer parameters.
Interconnect temperature dependence
Interconnect delay relates to metal resistance and
the parasitic capacitance of wires that connect gates.
An interconnect’s resistance has a polynomial relation-
ship to its temperature. Assuming a first-order approxi-
mation, this dependence is
R(T) = r0(1 + β(T – T0)) (1)
where r0 is the unit length resistance at reference temper-
ature T0, and β is the temperature coefficient of resistance
(°C–1). The dependence of capacitance on temperature
is usually small and is not comparable to resistance vari-
ations. Deutsch et al. reported that temperature variation has a marked impact on wire delay for long interconnects that are basically resistance limited in terms of delay (as compared with capacitive and inductive components).7 Interconnect line resistance changes by about 20% for a 75°C rise above ambient temperature.
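A quick numeric check of Equation 1 reproduces roughly that figure; the β value below is the AlCu coefficient quoted later in this article, and the 25°C ambient is an assumption:

    # Equation 1: R(T) = r0 * (1 + beta * (T - T0)).
    BETA = 3e-3   # 1/degC for AlCu (value taken from the nonbuffered-tree study)

    def wire_resistance(r0, t, t0=25.0):
        return r0 * (1.0 + BETA * (t - t0))

    rise = wire_resistance(1.0, 100.0) - 1.0            # 75 degC above ambient
    print(f"relative resistance increase: {rise:.1%}")  # 22.5%, near the 20% cited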
Buffer temperature dependence
Buffer delay also changes with temperature through
transistor parameters’ dependence on junction tem-
perature. These parameters include threshold voltage
(VT), mobility (μ), and silicon energy band gap (Eg).
Energy band gap thermal variations are usually small
and not comparable to VT and μ variations. The expres-
sions for the relationships of these last two components
with temperature are
VT(T) = VT(T0) – κ(T – T0)
and
μ(T) = μ(T0)(T/T0)–M
where T0 is room temperature (T0 = 300 K); κ is the
threshold voltage temperature coefficient, whose typi-
cal value is 2.5 mV/K, and M is the temperature expo-
nent, whose typical value is 1.5.
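For a first-order feel for these dependences, the sketch below evaluates both expressions with the typical values quoted; the 0.2-V nominal threshold is an assumed placeholder:

    # VT(T) = VT(T0) - kappa*(T - T0); mu(T) = mu(T0) * (T/T0)^(-M).
    T0 = 300.0  # K, room temperature

    def threshold_voltage(t, vt0=0.2, kappa=2.5e-3):
        return vt0 - kappa * (t - T0)      # VT drops as T rises

    def mobility_ratio(t, m=1.5):
        return (t / T0) ** (-m)            # normalized mobility drops as T rises

    for t in (300.0, 350.0, 400.0):
        print(f"T = {t:.0f} K: VT = {threshold_voltage(t):.3f} V, "
              f"mu/mu(T0) = {mobility_ratio(t):.2f}")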
Junction temperature variation is an important
source of driver resistance variation and can have a sig-
nificant impact on buffer propagation delay. Figure 2
shows the variation of high-to-low and low-to-high prop-
agation time for a 70-nm inverter, obtained from elec-
trical simulations using Berkeley Predictive Technology
models (http://www.eas.asu.edu/~ptm).
The switching speed of CMOS inverters used as buffers
is basically a function of resistance-capacitance (RC) time
constants. To determine switching speed in Figure 2, we
measured the 50% transition delay of an inverter loaded
with another inverter stage and ideal wires. We assume
that capacitance is temperature independent. Figure 2
shows that a model similar to the one in Equation 1 can
approximate driver resistance variation with temperature.
Our analysis of interconnect and buffer delay varia-
tion with temperature makes clear that a uniform
increase of IC junction temperature results in a net
increase in absolute delay through the clock distribu-
tion path (clock latency). In balanced trees, this effect is
irrelevant because the main parameter for setting the
system clock period is the worst-case delay of logic
blocks between two consecutive register stages. The key
parameter affecting skew is the relative arrival of the
clock edge at registers at the end of each clock path.
Nonuniform thermal map effects
As mentioned earlier, an IC's power dissipation dis-
tribution is not uniform and depends on device and
interconnect electrical characteristics, layout circuit
placement, and the relative switching activity of differ-
ent chip blocks. In this sense, dynamic thermal gradi-
ents are inevitable during normal circuit operation.
Here, we compare temperature effects on nonbuffered
and buffered clock tree networks.
Nonbuffered trees
We model nonbuffered trees using a lumped-RC tree.
Figure 3a shows an example RC tree. We assume that the
Figure 2. Delay versus temperature in a 70-nm low-leakage inverter gate (high-to-low and low-to-high propagation times).
Figure 3. RC tree used to compute Elmore delay (a) and equivalent one-level H tree (b).
tree has been designed such that the only sources of
skew are process variations and environmental condi-
tions. Using an Elmore delay metric, the delay from root
node n0 to sink node ni in the RC tree is
Di = Σj Rj (ΣCk)   (2)
where Rj is the set of resistances in the path between
the source (root) and node ni, and ∑Ck is the down-
stream capacitance at j, defined as the sum of all
capacitances at any node k such that the unique path
in the tree from k to the root must pass through j. As an
example, we can compute the delays from root node n0 to nodes n1 and n3 in the H tree of Figure 3b as follows:

D1 = R1(C1 + ... + C7) + R2(C2 + C3 + C4) + R3C3
D3 = R1(C1 + ... + C7) + R5(C5 + C6 + C7) + R6C6
Tree symmetry leads us to assume that at the refer-
ence temperature, R2(T0) = R5(T0) = RL1,0, C2 = C5 = CL1,
R3(T0) = R4(T0) = R6(T0) = R7(T0) = RL2,0, and C3 = C4 = C6
= C7 = CL2; therefore, there is no skew between nodes n1,
n2, n3, and n4. Given that resistances are temperature
dependent and parameter β is positive, performance
degrades with increasing temperature (worsening the
effective signal delay). In addition, because a nonuni-
form thermal profile doesn’t impact all regions of the
clock network distribution but slows only a restricted
area, it has a major effect on skew. Therefore, as a result
of temperature nonuniformities, the H tree’s symmetry
cannot guarantee zero skew.
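The effect is easy to reproduce numerically. The sketch below evaluates D1 and D3 for the Figure 3b tree, heating only the branch segments that feed n3 per Equation 1; all R and C values are made up for illustration:

    BETA = 3e-3  # 1/degC interconnect temperature coefficient (assumed AlCu)

    def heated(r, dt):
        return r * (1 + BETA * dt)     # Equation 1 applied to one segment

    R1, RL1, RL2 = 10.0, 5.0, 2.0      # segment resistances (illustrative units)
    C = [1.0] * 7                      # node capacitances C1..C7 (illustrative)

    def delay_n1():
        # D1 = R1*(C1 + ... + C7) + R2*(C2 + C3 + C4) + R3*C3, branch at T0
        return R1 * sum(C) + RL1 * (C[1] + C[2] + C[3]) + RL2 * C[2]

    def delay_n3(dt):
        # D3 with its branch (R5, R6) hotter by dt degrees
        return (R1 * sum(C) + heated(RL1, dt) * (C[4] + C[5] + C[6])
                + heated(RL2, dt) * C[5])

    print(delay_n3(50.0) - delay_n1())  # nonzero: thermal skew in a symmetric tree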
For simplicity and without loss of generality, we con-
sidered a symmetric three-level H-tree clock structure
to evaluate and compare the effects of variability and
temperature gradients in nonbuffered structures. The
area covered by the tree is 5 mm × 5 mm. We consid-
ered circuit parameters for AlCu interconnects with
β = (3 ×10–3)°C–1, rsh = 0.077 Ω/sq at T0, and csh = 7.68×10–18
F/μm2 as unit sheet resistance and unit area capaci-
tance, respectively. We analyzed clock tree structures
with three different designs:
■ Design A is a clock tree using minimum-width inter-
connects.
■ Design B has interconnect widths computed with
Chen and Wong’s algorithm,8 which optimizes for
both clock delay and minimum skew.
■ Design C is the same as Design B except that it has a
grid shorting the H tree’s sink nodes. This modifica-
tion has moderate impact on mean delay but pro-
vides significant skew reduction.
We investigated the impacts of parameter variation
and temperature gradients on skew for each structure.
Table 1 shows mean delay, mean skew, sigma skew,
and maximum skew obtained from Monte Carlo simu-
lations for 1,000 samples at a uniform room tempera-
ture. Both mean delay and skew from design A (wmean =
0.45 μm, 3σ = 20%) are much higher than those
obtained from design B, which used the optimization
algorithm. Design C provides the best values for sigma
and maximum skew distributions, while providing
about one third of additional overall delay with respect
to design B. Redundancy created by mesh loops
smoothes out undesirable variations between signal
nodes spatially distributed over the chip.9
Figure 4 shows skew induced by a local hot spot of
radius D/8 (D is the length of the H tree shown in Figure
1) when located at different positions of the H tree
obtained for designs A and B. Our most significant
observations from these experiments are as follows:
■ Total skew depends on hot-spot position.
■ In nonoptimized trees, worst-case skew occurs when
the hot spot appears near the clock driver.
■ A design algorithm to optimize clock tree skew also
optimizes the impact of thermal-induced hot spots.
■ In optimized clock trees, depending on the hot spot’s
magnitude and size, its impact can be about 20% of
the skew from parameter variations.
Figure 5 compares worst-case clock skew caused by
hot spots affecting one whole quadrant for designs B (no
grid) and C. For C, we considered an ideal grid (no para-
sitic capacitance) and a realistic grid (with parasitic
capacitance). The amount of worst-case skew caused by
a 10°C difference is of the same order of magnitude as the
Table 1. Comparison of unbuffered clock tree designs.
Design style  Mean delay (ps)  Mean skew (ps)  Sigma skew (ps)  Maximum skew (ps)
A             356.11           35.84           15.09            110.1
B             68.95            4.18            0.88             7.45
C             107.0            1.06            0.33             2.60
delay of one clock buffer, while the skew caused by a
50°C difference is of the same order of magnitude as the
skew from process parameter variations. Figure 5 also
shows that inserting a grid reduces skew resulting from
nonuniform thermal maps.
Buffered trees
Buffers isolate downstream capacitance in the clock
network (see Equation 2), thus reducing latency and
transition times. In these networks, buffers are a prima-
ry source of total clock skew for two reasons. First,
device parameter variation with temperature is much
larger than interconnect variation. Delay degradation
caused by temperature effects on the driver’s on-resis-
tance are far more severe than delay caused by inter-
connect resistance thermal dependency. Second, delay
related to wiring length between two consecutive buffer
stages is independent of the RC parameters of previous
and subsequent wiring stages.
We designed a buffered H-tree clock network (design
A) and a clock network with a grid shorting the buffered H
tree’s sink nodes (design B) in a 1-V nominal supply volt-
age, 70-nm technology (http://www.eas.asu.edu/~ptm).
For design B we considered an ideal grid B1 (no parasitic
capacitance) and a realistic grid B2 (with parasitic capac-
itance). We considered a 2-mm × 2-mm chip and synthe-
sized a three-level symmetric H tree using the method
described by Cheng et al.,10 obtaining five buffer stages
between the clock source and any of the 64 sinks.
To compute process variability’s influence on skew,
we repeated the Monte Carlo analysis described earlier
(a 3σ variation of 30% in threshold voltage and 20% in
interconnection width). Table 2 shows mean delay,
mean skew, sigma skew, and maximum skew at a uni-
form room temperature. Again, redundancy created by
mesh loops noticeably reduces undesirable variations
between signal nodes spatially distributed over the chip.
Figure 4. Hot-spot-induced skew at different clock tree positions for design A (a) and design B (b). The skew is computed across the whole clock tree; only the quadrant where the hot spot is placed is shown for simplicity.
Figure 5. Impact of a hot spot on skew in one clock network quadrant for an optimized clock tree without a meshing grid, with an ideal grid without parasitic capacitances, and with a grid including parasitic capacitances.
Comparing designs B1 and B2 shows the impact of the
additional capacitance caused by the grid.
Figure 6 plots total skew’s dependency on the mag-
nitude of temperature increase between two different
clock paths (we assume that the hot spot affects all
stages of one path, while the other path remains at the
reference temperature). The figure shows that skew is
roughly proportional to ΔT.
A comparison of results in Table 2 and Figure 6 indi-
cates that in a clock network without a grid, skew relat-
ed to a hot spot that increases temperature by 30°C can
be as much as 20% of overall clock latency (mean
delay). The skew plotted in Figure 6 is due only to the
thermal gradient’s effect; if the combined effect of ther-
mal maps and process variability is included, skew
increases 1.3 times in case A and 2.0 times in case B2.
The interconnect system plays a fundamental role in
overall delay (which drops from 403 ps to 142 ps, about a 65% reduction, if the interconnect is neglected through zero
wire resistance and capacitance). Despite this benefit,
the interconnect system’s impact on thermal-induced
skew is around 7%. Therefore, although overall delay is
interconnect dominated, its heat-related variation is
mainly due to active devices.
We also ran two experiments to investigate the rel-
ative impact on delay of the number of inverters rela-
tive to spot size. In the first experiment (the
nongradual case), we computed the skew caused by
eight equal-size inverters, five at the same hot-spot-ele-
vated temperature, and three at a reference tempera-
ture. In the second experiment, we considered a
chain affected by a gradual hot spot—not all invert-
ers affected by the hot spot had the same temperature,
but the chain had a nonuniform, gradual thermal pro-
file in terms of the hot spot’s peak temperature T
above Tref. Temperature distribution decreased from
the central inverters to the side inverters. We consid-
ered eight inverters on the chain at the following
respective temperature increments: 1/4T, 1/2T, 3/4T,
T, T, 3/4T, 1/2T, 1/4T. Note that the sum of all temper-
ature values is 5T, the same as the sum of all temper-
ature increments for the nongradual case with five
inverters at temperature Tref + T.
Figure 7 compares skew results obtained for the nongradual and gradual cases, showing that skew is almost identical in the two cases. This suggests that we can compute the additional delay of n buffers (Dn), each at temperature Ti, as

Dn ∝ Σi=1..n (Ti – Tref)
Table 2. Comparison of buffered clock tree designs.
Design style  Mean delay (ps)  Mean skew (ps)  Sigma skew (ps)  Maximum skew (ps)
A             403              15.77           4.36             32.6
B1 (C = 0)    403              0.17            0.23             1.2
B2            466              1.054           0.39             2.6
Figure 6. Impact of a hot spot on skew. Measurements used 27°C as the reference temperature with buffers biased at nominal supply voltage.
Figure 7. Skew due to five inverters at a hot spot of temperature T in an eight-inverter chain (nongradual), and eight inverters at temperatures 1/4T, 1/2T, 3/4T, T, T, 3/4T, 1/2T, and 1/4T, respectively (gradual).
where Tref is a reference temperature.
Therefore, the skew between two different clock sinks $i$ and $j$ is proportional to

$$D_i - D_j \propto \sum_{k} \left( T_{i,k} - T_{j,k} \right),$$

where the sum runs over all tree stages, $T_{i,k}$ is the junction temperature of the $k$th stage in the path from the root to sink $i$, and $T_{j,k}$ is that of the $k$th stage in the path from the root to sink $j$.
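For illustration, the following sketch applies this per-stage proportionality to two root-to-sink buffer paths. It is a minimal sketch: the sensitivity constant K_D and the helper thermal_skew_ps are hypothetical placeholders, not values or routines from our experiments.

# Sketch: thermally induced skew between two clock sinks, using the
# per-stage proportionality derived above. K_D is a hypothetical
# placeholder (ps per degree C per stage), not a measured value.
K_D = 0.1

def thermal_skew_ps(path_i_temps, path_j_temps, k_d=K_D):
    """Skew between sinks i and j from per-stage junction temperatures.

    path_i_temps[k] and path_j_temps[k] are the junction temperatures
    (in degrees C) of stage k on each root-to-sink path; the reference
    temperature cancels out of the difference.
    """
    if len(path_i_temps) != len(path_j_temps):
        raise ValueError("paths must traverse the same number of stages")
    return k_d * sum(ti - tj for ti, tj in zip(path_i_temps, path_j_temps))

# Example: path i crosses a hot spot (+30 degrees C on two of three stages).
print(thermal_skew_ps([57.0, 57.0, 27.0], [27.0, 27.0, 27.0]))  # 6.0 (ps)

Note that only the per-stage temperature differences matter, which is why, in buffered trees, the hot spot’s exact spatial profile need not be resolved.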
Finally, from our comparison of buffered and non-
buffered clock trees, we conclude the following:
■ In buffered trees, skew is less dependent on the hot
spot’s position in the tree.
■ The relative impact of thermal gradients on skew with
respect to parameter-variation-induced skew is greater
in buffered clock trees than in nonbuffered trees.
Temperature impact on testing
Operating frequency and circuit activity are the two
main factors that determine a circuit’s active power and,
therefore, contribute to nonuniformities in junction tem-
perature distribution. Active power increases almost lin-
early with operating frequency, but circuit activity’s
effect on relative temperature at different operating fre-
quencies has not been investigated. This issue is impor-
tant in comparing a circuit’s relative temperature
increase during normal and test modes.
Typically, a circuit working in normal mode operates
at its maximum frequency, but only a small fraction of its
internal blocks are active. Designers determine power
constraints for normal circuit operation, usually assum-
ing that random logic blocks will have about 20% to 30%
activity with respect to the clock signal. On the other
hand, circuit activity is substantially higher in test mode
than in normal operating mode, although the effective
operating frequency is much lower because test stimuli
must be scanned in and out through DFT structures. Such
switching activity increases the device’s overall energy,
peak power, and average power consumption. The result-
ing elevated average power will affect the chip’s temperature distribution: it might not only increase overall chip temperature but also promote the appearance of new hot spots.
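As a rough illustration of why test mode can run hotter despite its lower effective frequency, the following toy calculation plugs assumed numbers into the standard dynamic-power expression P = α·f·C·V²; every value below is an illustrative assumption, not a measurement from our experiments.

# Toy comparison of dynamic power in normal vs. test mode, using the
# standard CMOS dynamic-power expression P = alpha * f * C * V^2.
# All numbers are illustrative assumptions, not measured values.

def dynamic_power(alpha, f_hz, c_farads, vdd_volts):
    """Average switching power of a block with activity factor alpha."""
    return alpha * f_hz * c_farads * vdd_volts ** 2

C_SW = 1e-9   # assumed total switched capacitance: 1 nF
VDD = 1.2     # assumed supply voltage: 1.2 V
F_MAX = 1e9   # assumed maximum clock frequency: 1 GHz

# Normal mode: full speed, ~25% activity in random logic.
p_normal = dynamic_power(0.25, F_MAX, C_SW, VDD)
# Test mode: scan shifting at half speed, but ~80% activity.
p_test = dynamic_power(0.80, 0.5 * F_MAX, C_SW, VDD)

print(f"normal mode: {p_normal * 1e3:.0f} mW")  # 360 mW
print(f"test mode:   {p_test * 1e3:.0f} mW")    # 576 mW

Even with the clock at half rate, the higher activity yields more average power, and hence higher junction temperatures, than normal operation.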
Figure 8 compares relative temperature increase
with internal circuit activity at two different operating
frequencies (50% and 90% of maximum frequency) for
a circuit constructed from a 7 × 7 array of c432 ISCAS
benchmark circuits. We obtained these results with
Rosselló et al.’s thermal and power computation mod-
els.11 We used the resulting power density map to obtain a thermal map and calculate the temperature increase. The results show that a similar junction-temperature increase can be obtained by running the circuit near full speed with a typical circuit activity of 20% (normal mode) or at half speed with activity increased to about 80%. Therefore, since we can achieve similar thermal
levels during normal and test mode operations, it is
worthwhile to investigate the effect of thermal maps on
delay during test mode. Our results are in line with other work showing the relative impact of increased power dissipation during test mode.12

Figure 8. Temperature increase versus activity, while controlling inputs of an array of independent logic circuits. (The figure plots temperature increase, in °C, against activity, in percent, at 90% and 50% of fmax.)
Researchers have proposed strategies for limiting
test-induced power excess by controlling either peak or
average power. Some propose a proper selection of test
vectors to reduce power dissipation and energy con-
sumption while achieving high fault coverage. Many of
these techniques rely on power-constrained test-sched-
uling algorithms and focus on reducing or maintaining
circuit power consumption within safe operating mar-
gins. These methods don’t pursue uniform power dis-
tribution over the die and therefore don’t guarantee a
uniform thermal map.
We have explored a method for avoiding the delay impact of thermal maps artificially created by test activity, which can otherwise mask test results. Bellaouar et al. have shown that the rate of driver
resistance variations due to temperature fluctuations
is strongly dependent on power supply voltage, and that
an optimum bias voltage (VDDopt) minimizes these vari-
ations.13 We have proposed a dual-supply-voltage clock
tree to reduce skew related to temperature gradients.14
Figure 9 shows such a tree. The high-to-low converter (HLconverter) is a buffer that converts the clock signal entering the chip from the standard swing to a lower voltage swing. The HLconverter’s structure is relatively straightforward: to convert the clock swing from the
straightforward. To convert the clock swing from the
standard voltage range to a lower voltage range, we use
a conventional buffer driven by supply voltage VDDopt.
The clock signal is then transmitted on the chip as a low-voltage signal. At the utilization points (the sink flip-flops), the low-to-high converter (LHconverter) restores the signal to the higher voltage swing used by the logic network. The LHconverter’s structure is more involved; some design examples appear in other works.15

Figure 9. Dual-voltage clock scheme. (An HLconverter drives the VDDopt region of the clock tree; an LHconverter restores the VDD swing at the sink flip-flops.)
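To see why an optimal bias voltage exists at all, note that rising temperature degrades carrier mobility (slowing a gate) but also lowers the threshold voltage (speeding it up); near one supply voltage the two effects cancel. The sketch below sweeps the supply in a first-order alpha-power-law delay model. Every parameter is an assumed, textbook-style value rather than a fit to our 130-nm technology, so the optimum it reports is illustrative only.

# Sketch: find the supply voltage at which gate delay is least sensitive
# to temperature, using a first-order alpha-power-law delay model:
#     delay ~ V / (mu(T) * (V - Vt(T))**ALPHA)
# Mobility mu falls with temperature; threshold voltage Vt falls too.
# All parameters are assumed textbook-style values, not fitted data.

ALPHA = 1.3               # velocity-saturation exponent (assumed)
VT0, K_VT = 0.35, 1.8e-3  # threshold at 300 K (V) and its slope (V/K), assumed
T0 = 300.0                # reference temperature (K)

def delay(vdd, temp_k):
    mu = (temp_k / T0) ** -1.5        # mobility degradation with temperature
    vt = VT0 - K_VT * (temp_k - T0)   # threshold drop with temperature
    return vdd / (mu * (vdd - vt) ** ALPHA)

def temp_sensitivity(vdd, t_lo=300.0, t_hi=380.0):
    """Relative delay change over an 80 K temperature swing."""
    return abs(delay(vdd, t_hi) - delay(vdd, t_lo)) / delay(vdd, t_lo)

# Sweep the supply from 0.6 V to 1.2 V and pick the least sensitive point.
candidates = [0.6 + 0.01 * i for i in range(61)]
v_opt = min(candidates, key=temp_sensitivity)
print(f"VDDopt ~ {v_opt:.2f} V")  # roughly 0.8 V for these assumed values

Below this optimum, the threshold effect dominates and circuits speed up with temperature; above it, the mobility effect dominates and they slow down; at the optimum, delay is, to first order, temperature insensitive.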
We performed a simulation experiment on a 130-nm
technology to test a multiple-supply-voltage scheme that
uses a bias supply selected to compensate for tempera-
ture-related effects. As Figure 10 shows, changing the clock buffers’ supply voltage from VDD to VDDopt significantly reduced total skew.

Figure 10. Skew versus temperature increase for a three-level buffered clock tree without grid, biased at nominal supply voltage VDD (1.2 V) and at VDDopt (0.8 V) for various hot-spot temperatures.
A related advantage of this clock scheme is a reduc-
tion in power consumption. However, side effects such
as noise on the supply network could be significant.
Also, an increased impact of process parameter varia-
tions on delay at the reduced supply voltage16 could
compromise the compensation effects.
Our results suggest an alternative clock skew opti-
mization approach: introducing a clock grid in the
clock region biased at VDDopt during testing. This option would minimize the impact of process parameter variations and noise on clock distribution. It would also limit power dissipation, because only this low-voltage section of the clock distribution would contain a grid mesh, avoiding the roughly 40% power penalty of full mesh-based architectures.5
To verify this optimization, we compared the average power per cycle dissipated by the clock distribution, and the resulting clock skew, in two designs: one without a grid, and one with a grid on the portion of the clock distribution at the reduced optimal supply voltage. Figure 11 shows the results, which confirm the overall skew reduction for an isopower comparison between these two design alternatives. In the isopower scheme, the upper graph in Figure 11 sets the power budget (2.1 mW in this case); a horizontal line at that power crosses each curve at the supply voltage for which that design dissipates the budgeted power. The intersection of this isopower line with the with-grid curve determines the clock tree’s supply voltage, and the lower graph in Figure 11 then gives the corresponding skew. The isopower scheme doesn’t deliver the full achievable gain, because the supply voltage imposed by the isopower requirement (around 0.97 V) lies above the optimal supply voltage for this technology (0.8 V). Nevertheless, skew decreased from 12 ps to less than 1 ps.

Figure 11. Isopower skew improvement gained by using clock grid design for the low-voltage section of the clock tree shown in Figure 9. (The upper graph plots average power per cycle, in watts, against VDD; the lower graph plots skew for a 10°C hot spot against VDD, with and without grid.)
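The isopower selection just described is easy to mechanize. The sketch below interpolates the supply voltage at which the with-grid design meets the power budget and then reads the skew at that voltage; the sample points are made-up stand-ins for the curves in Figure 11, not our simulation data.

# Sketch of the isopower selection procedure: pick the supply voltage at
# which the with-grid design meets the power budget, then read the skew
# there. The sample curves are illustrative stand-ins for Figure 11.

P_LIMIT_W = 2.1e-3  # power budget per cycle (2.1 mW, as in the text)

# (vdd in V, average power per cycle in W) for the with-grid design.
power_curve = [(0.7, 0.8e-3), (0.8, 1.2e-3), (0.9, 1.7e-3),
               (1.0, 2.3e-3), (1.1, 3.0e-3), (1.2, 3.5e-3)]
# (vdd in V, skew in ps) for the with-grid design.
skew_curve = [(0.7, 0.5), (0.8, 0.4), (0.9, 0.6),
              (1.0, 0.9), (1.1, 2.0), (1.2, 3.5)]

def interp(x, curve):
    """Piecewise-linear interpolation over sorted (x, y) samples."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside sampled range")

def isopower_vdd(p_limit, curve):
    """Supply voltage at which the design dissipates exactly p_limit."""
    inverted = [(p, v) for v, p in curve]  # power is monotonic in vdd
    return interp(p_limit, inverted)

v_iso = isopower_vdd(P_LIMIT_W, power_curve)  # ~0.97 V for these samples
print(f"isopower supply: {v_iso:.2f} V, "
      f"skew: {interp(v_iso, skew_curve):.2f} ps")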
CLOCK SKEW has as much impact on overall paramet-
ric yield as any propagation delay. Large clock skews
can cause timing violations by eroding setup or hold margins. Researchers have reported that
process parameter variations, parasitics, and noise
effects such as crosstalk affect the delay of each clock
tree branch. We have shown that temperature gradients
can also be an important source of clock skew, causing
spatially correlated variations.
Nonbuffered and buffered clock-tree networks
respond differently to nonuniform thermal maps. In
nonbuffered trees, a hot spot’s relative location in the
tree structure has a high impact on overall thermal
skew. The temperature of the clock distribution network is difficult to evaluate because its parasitics come from resistive components distributed across different metal layers, at levels different from those of the main power sources. In buffered trees, the main contributions to
skew are differences in clock tree buffer delay, even if
the overall delay magnitude is interconnect dominat-
ed. In this case, the hot spot’s relative position has much
less impact than in nonbuffered trees. Interestingly, we
have also observed that in buffered trees the hot spot’s
impact on delay can be quantified without computing
the hot spot’s exact thermal spatial profile with respect
to the buffers. This might significantly affect future CAD
tool development.
Our results show the importance of having a tem-
perature-aware clock tree design. The combination of
cross-link insertion and multiple-supply-voltage clock
schemes is likely to provide the best trade-off between
skew reduction and power-conscious design. ■
Acknowledgments
This work was partially supported by the Spanish Ministry of Science and Technology and the European Regional Development Fund under EU project TEC2005-05712/MIC, and by Intel Research Labs.
References
1. S. Borkar et al., “Parameter Variations and Impact on Circuits and Microarchitecture,” Proc. 40th Design Automation Conf. (DAC 03), ACM Press, 2003, pp. 338-342.
2. B. Lu et al., “Process Variation Aware Clock Tree Routing,” Proc. Int’l Symp. Physical Design (ISPD 03), ACM Press, 2003, pp. 174-181.
3. G.E. Tellez and M. Sarrafzadeh, “Minimal Buffer Insertion in Clock Trees with Skew and Slew Rate Constraints,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 4, Apr. 1997, pp. 333-342.
4. G. Bai, S. Bobba, and I.N. Hajj, “Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay,” Proc. 38th Design Automation Conf. (DAC 01), ACM Press, 2001, pp. 295-300.
5. C. Yeh et al., “Clock Distribution Architectures: A Comparative Study,” Proc. 7th Int’l Symp. Quality Electronic Design (ISQED 06), IEEE Press, 2006, pp. 85-91.
6. M. Mori et al., “A Multiple Level Network Approach for Clock Skew Minimization with Process Variations,” Proc. Asia South Pacific Design Automation Conf. (ASP-DAC 04), ACM Press, 2004, pp. 263-268.
7. A. Deutsch et al., “On-Chip Wiring Design Challenges for Gigahertz Operation,” Proc. IEEE, vol. 89, no. 4, Apr. 2001, pp. 529-555.
8. Y. Chen and D. Wong, “An Algorithm for Zero-Skew Clock Tree Routing with Buffer Insertion,” Proc. European Design and Test Conf. (ED&TC 96), IEEE Press, 1996, pp. 230-236.
9. A. Rajaram, J. Hu, and R. Mahapatra, “Reducing Clock Skew Variability via Cross Links,” Proc. 41st Design Automation Conf. (DAC 04), ACM Press, 2004, pp. 18-23.
10. C.K. Cheng et al., Interconnect Analysis and Synthesis, Wiley-Interscience, 2000.
11. J. Rosselló et al., “A Fast Concurrent Power-Thermal Model for Sub-100 nm Digital ICs,” Proc. Design, Automation and Test in Europe (DATE 05), vol. 1, IEEE Press, 2005, pp. 206-211.
12. E. Larsson and Z. Peng, “Power-Aware Test Planning in the Early System-on-Chip Design Exploration Process,” IEEE Trans. Computers, vol. 55, no. 2, Feb. 2006, pp. 227-239.
13. A. Bellaouar et al., “Supply Voltage Scaling for Temperature Insensitive CMOS Circuit Operation,” IEEE Trans. Circuits and Systems II, vol. 45, no. 3, Mar. 1998, pp. 415-417.
14. S. Bota et al., “Within Die Thermal Gradient Impact on Clock-Skew: A New Type of Delay-Fault Mechanism,” Proc. Int’l Test Conf. (ITC 04), IEEE Press, 2004, pp. 1276-1284.
15. J. Pangjun and S. Sapatnekar, “Low-Power Clock Distribution Using Multiple Voltages and Reduced Swings,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 10, no. 3, June 2002, pp. 309-318.
16. S. Bota et al., “Low VDD vs. Delay: Is It Really a Good Correlation Metric for Nanometer ICs?” Proc. 24th VLSI Test Symp. (VTS 06), IEEE Press, 2006, pp. 358-363.
Sebastià A. Bota is an associate professor in the Electronic Technology Group of the University of the Balearic Islands, Palma de Mallorca, Spain. His research interests include very large-scale integration design and test and low-temperature CMOS design. Bota has a PhD in physics from the University of Barcelona, Spain.

Josep L. Rosselló is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. His research interests include device and circuit modeling, very large-scale integration design and test, and low-temperature CMOS design. Rosselló has a PhD in physics from the University of the Balearic Islands.

Carol de Benito is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. Her research interests include device and circuit modeling and low-temperature CMOS design. De Benito has an MS in physics from the University of the Balearic Islands.

Ali Keshavarzi is a research scientist at Circuit Research Laboratories (CRL) of Intel. His research interests include low-power/high-performance circuit techniques and transistor device structures for future generations of microprocessors. He has a PhD in electrical engineering from Purdue University.

Jaume Segura is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. His research interests include device and circuit modeling and very large-scale integration design and test. Segura has a PhD in physics from the Polytechnic University of Catalunya.

Direct questions and comments about this article to Sebastià A. Bota or Jaume Segura, Electronic Technology Group, Univ. Illes Balears, Cra. Valldemossa, km. 7.5, 07122 Palma de Mallorca, Spain; [email protected], [email protected].
TEST TECHNOLOGY TC NEWSLETTER

UPCOMING TTTC EVENTS

12th International Workshop on Thermal Investigations of ICs and Systems
27-29 September 2006, Nice, France
http://tima.imag.fr/conferences/therminic/
Therminic workshops are offered annually to address the essential thermal questions of microelectronic microstructures, and of electronic parts in general. This year’s workshop discusses issues in thermal simulation, monitoring, and cooling.

21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT 06)
4-6 October 2006, Arlington, Va.
http://netgroup.uniroma2.it/DFT06/cfp.html
DFT provides an open forum for discussing defect and fault tolerance in VLSI systems, including emerging technologies. Topics include all aspects of design, manufacturing, test, reliability, and availability affected by defects during manufacturing or by faults during system operation.

International Test Conference (ITC 06)
24-26 October 2006, Santa Clara, Calif.
http://www.itctestweek.org/
ITC is the world’s premier conference on the electronic test of devices, boards, and systems. It covers the complete cycle from design verification, test, diagnosis, and failure analysis to process and design improvement. At ITC, test and design professionals can confront the challenges the industry faces and learn how academia, design-tool and equipment suppliers, designers, and test engineers address these challenges.

IEEE International Workshop on Current & Defect Based Testing (DBT 06)
26-27 October 2006, Santa Clara, Calif.
http://www.cs.colostate.edu/~malaiya/dbt.html
To develop more appropriate fault models, designers and test engineers must have a good handle on both systematic and random defect mechanisms to support the manufacturability of ICs for defect-based test approaches. Because of increasing design complexity and process variability, the focus is shifting to such approaches. This workshop addresses these issues.

First IEEE International Design and Test Workshop (IDT 06)
19-20 November 2006, Dubai, United Arab Emirates
http://www.tttc-idt.org/index_files/IDT.CFP.06.pdf
This event provides a unique forum in the Middle East and Africa region for researchers and practitioners of VLSI design, test, and fault tolerance to discuss new research ideas and results. IDT will run in conjunction with the annual Innovations of IT Conference and in parallel with Global IT Exhibitions (GITEX).

7th International Workshop on Microprocessor Test and Verification
4-5 December 2006, Austin, Texas
http://mtv.ece.ucsb.edu/MTV
This workshop brings together researchers and practitioners from verification and test to discuss today’s difficult challenges in the processor and SoC design environments. It’s the ideal environment for joint test and verification experiences and innovative solutions.

NEWSLETTER EDITOR’S INVITATION
I’d appreciate input and suggestions about the newsletter from the test community. Please forward your ideas, contributions, and information on awards, conferences, and workshops to Bruce C. Kim, Dept. of Electrical and Computer Engineering, Univ. of Alabama, 317 Houser Hall, Tuscaloosa, AL 35487-0286; [email protected].
Bruce C. Kim
Editor, TTTC Newsletter

CONTRIBUTIONS TO THIS NEWSLETTER: Send contributions to Bruce C. Kim, Dept. of Electrical and Computer Engineering, Univ. of Alabama, 317 Houser Hall, Tuscaloosa, AL 35487-0286; [email protected]. For more information, see the TTTC Web page: http://tab.computer.org/tttc/.
Book Reviews

A comprehensive EDA handbook
Scott Davidson, Sun Microsystems
IN THE November-December 2004
Last Byte, I bemoaned the fact that
design has become so complex that no
one person can understand all of it, and
that EDA tools have become so diverse
and complicated that we confine our-
selves to a small subset of their function-
ality. The massive book under review
here, Electronic Design Automation
for Integrated Circuits Handbook, repre-
sents the best way I know to address this
problem.
This two-volume set contains 49 articles on EDA,
ranging from high-level design to technology CAD. The
first volume, EDA for IC System Design, Verification, and
Testing, has five sections: An introductory section out-
lines and summarizes the design process. A section on
system-level design discusses modeling languages,
processor and system modeling, performance metrics,
and system-level power management. The microarchi-
tectural design section describes performance estima-
tion, power management, and design planning at this
level. Six chapters on logic verification cover design and
verification languages, and various verification meth-
ods. The final section, on test, focuses on DFT, test gen-
eration, and analog test.
The second volume, EDA for IC Implementation,
Circuit Design, and Process Technology, focuses on the
second part of the IC design flow. It includes sections
on synthesis, place and route, analog and mixed-signal
design, physical verification, and technology CAD.
Chapters within these sections cover topics such as syn-
thesis, power management at all levels, design rule
checking, design for manufacturability, timing analysis,
noise analysis, and libraries.
I confess that I did not read all the chapters in this
book, wishing to complete the review before we move to
biochips. In fact, when I first received this text, I was cer-
tain that I’d never finish it. But I’m not so sure now. I have
already read more chapters than I’d originally intended,
and I think the reason for this explains why this hand-
book is a success. In most cases, the material covers the
important points without going into so much detail or
length as to be intimidating. Chapters range from seven
pages to 33 pages, with an average of 15 to 20, each
including an extensive list of references. This seemed just
right for the surveys making up this handbook. You can-
not completely learn EDA from a book like this, of course,
but you can learn quite a lot about EDA.
There are three types of chapters throughout the
book. Some are introductory in nature, surveying a
topic such as design flow at a high level. Some target
EDA users, showing the types of tools that are available
and putting them into context. Others target EDA devel-
opers, describing the algorithms underlying the tools,
with information on the benefits of each. Some subjects
are covered from several angles. Most subjects could
be, of course, but that would balloon this work into
three or four volumes. For the most part, I was happy
with the choice of angle; only in a few cases, such as the
chapter on design rule checking, would I have preferred
a more user-oriented approach.
Reviewed in this issue: Electronic Design Automation for Integrated Circuits Handbook, edited by Louis Scheffer, Luciano Lavagno, and Grant Martin (CRC Press, 2006, ISBN 0-849-33096-3, 2 vols., 1152 pp., $149.95).
One danger of a handbook approach is repetition,
as important subjects tend to get covered more than
once. However, I found very little redundancy in this
book. There was practically none in the sections on test.
The editors must have done an excellent job reviewing
chapter outlines.
The most important thing, though, is how good the
individual chapters are. So, I will give my impressions of
some of the ones I read. The first full chapter is “The
Integrated Design Process and Electronic Design
Automation.” This chapter starts a bit abruptly, but quick-
ly progresses into an excellent overview of the design
process. I’d recommend it to everyone who reads these
volumes. I wish it had some pointers to subsequent chap-
ters, however. Chapter 5, “SoC Block-Based Design and
IP Assembly,” is an excellent tutorial focusing on real
issues, especially in the area of verification.
Chapter 8, “Processor Modeling and Design Tools,”
provides a taxonomy and survey of architecture descrip-
tion languages (ADLs). The taxonomy is excellent,
describing very clearly what ADLs are and what they are
not. However, I would have liked to see more of an
industrial focus in the survey. This chapter had a bit
more of an academic slant than most of the others. The
chapter on “Design and Verification Languages,” on the
other hand, covers commercially available languages
with excellent examples. An outline gives the salient
points of each language, with strong points and weak
points, and includes a taste of how to do coding in each
language. At 28 pages, this is one of the longer chapters,
but hardly a word is wasted.
There are three chapters on test, two of which I’d like
to discuss here. Chapter 21, “Design-for-Test,” is one of
the longest (35 pages) in the book. It contains even
more text than the page count indicates, in fact,
because it includes absolutely no figures or diagrams.
This chapter covers a lot, from the objectives and his-
tory of DFT; through scan, BIST, and compression for
logic testing; to memory test. It ends with a short section
on FPGA test (which could easily have been cut). The
reader of this chapter might have a hard time distin-
guishing which of these concepts are truly important
and which are minor. In addition, there is often too
much detail for an introductory survey. For example,
there is almost an entire page in the section on logic
BIST about structural dependencies and scan chain
lengths. These are issues, but they could have been
eliminated to make the chapter shorter and more
readable. The chapter on “Automatic Test Pattern
Generation” is more developer oriented, with a survey
of ATPG algorithms. It’s somewhat academic, with a
large section on Boolean satisfiability (SAT) solvers for
test generation, but this is balanced by an excellent sec-
tion on applications for ATPGs beyond test generation.
The chapter on “Logic Synthesis” (in volume 2) is 15
pages and has 11 references. It gives a very high-level
view of a well-known subject. I think the author was
right to avoid trying to cover all aspects of this area in
depth, instead pointing the reader to places for further
study. Chapter 6, “Static Timing Analysis,” is one of the
best chapters I read. It is at the right length and depth,
and it provides helpful pseudocode for the major algo-
rithms discussed. Chapter 9, “Exploring Challenges of
Libraries for Electronic Design,” considers not cell
libraries, but IP libraries. At eight pages, it is very short
and superficial. The last three subsections are basical-
ly only outlines.
The last chapter I want to highlight is the one on
“Design Databases.” This chapter is excellent. It targets
users, but displays a deep knowledge of the implemen-
tations of design databases. It is also very readable.
EVERY DESIGN GROUP should have a copy of this
handbook in its library. It is an excellent reference text.
It can also serve as outstanding background reading
for new engineers exposed to some of these areas for
the first time. The material here is better organized and
better written than what could be found on the Web.
Putting together such a high-quality, substantive work
is quite an achievement. I’ll be reading more chapters
for quite some time to come. ■
Direct questions and comments about this department to Scott Davidson, Sun Microsystems, 910 Hermosa Court, M/S USUN05-217, Sunnyvale, CA 94085; [email protected].
For further information on this or any other computing
topic, visit our Digital Library at http://www.computer.org/
publications/dlib.
Standards

DASC sees moves toward formality in design
Victor Berman, Cadence Design Systems
AT A RECENT IEEE Design Automation Standards
Committee (DASC) meeting (http://www.dasc.org/
meetings/2006-07/20060727_DASC_Minutes.doc), we
discussed two interesting standardization proposals:
Rosetta and Esterel version 7. Both are based on tech-
nology that has been under development for a long
time, and both target the formalization of system-level
design and verification. But, otherwise, they take very
different approaches. We hear a lot of talk about move-
ment to more abstract design paradigms. Are these
proposals confirmation of this trend, or are they yet
another false start? Read these brief outlines, and
decide for yourself.
The Rosetta language
System-level design involves consolidating informa-
tion from multiple domains to predict the effects of
design decisions. To support system-level design, a lan-
guage must allow heterogeneous specification while
providing mechanisms to compose information across
domains.
The goal of the Rosetta system-level design language
(http://www.sldl.org/standards.htm) is to compose het-
erogeneous specifications in a single semantic envi-
ronment. Rosetta provides modeling support for
different design domains, employing semantics and syn-
tax appropriate for each. Thus, Rosetta lets designers
write individual specifications with semantics and
vocabulary appropriate for their domains. Users com-
pose information across specification domains by defin-
ing interactions between them. To achieve this end,
Rosetta provides a collection of domains, called facets,
for describing system models. Interactions provide a
mechanism for defining constraints between domains.
Facets define system models from one engineering
perspective. Users can write facets by extending a
domain that provides vocabulary and semantics for the
model. Using the design abstractions that its domain
provides, a facet describes a system’s requirements,
behavior, constraints, or function. Domains provide
vocabulary and semantics for defining facets. Each
domain provides mechanisms for describing data, com-
putation, and communication models appropriate for
one area of systems design.
Interactions define how information from one engi-
neering domain is reflected in another. Domains don’t
share a common set of semantics, but rather share infor-
mation when necessary using interactions. Thus,
Rosetta defines each design facet by using appropriate
design abstractions from that facet’s domain rather than
forcing a common design model across all facets.
Facet algebra expressions use facets, domains, and
interactions to compose models into system descrip-
tions. Users can evaluate local design decisions from a
systems perspective by using interactions to understand
how these decisions impact other system domains.
Work on Rosetta is ongoing, with this Web site serv-
ing as a clearinghouse for language definition and
usage information. The various Web pages provide def-
inition and tutorial documents, as well as examples and
standardization information.
The Esterel language
Esterel (http://www.esterel-technologies.com) is a for-
mal synchronous language for unambiguously specify-
ing and implementing hardware and software embedded
systems. Esterel was initially developed in academia, with
strong cooperation by industrial users. The Esterel devel-
oper community has developed the current Esterel ver-
sion 7 language as a proposed standard. The developers
derived this version from the previous Esterel v5 acade-
mic version by adding new features necessary for hard-
ware design. Because of the formal character of the
language and semantic kernels, you can fully and faith-
fully translate Esterel programs either to hardware circuit
descriptions written in conventional hardware descrip-
tion languages (HDLs) or to equivalent conventional soft-
ware programs, with the very same behavior in both
cases. It’s also possible to translate Esterel programs to
input for formal-verification systems (for example, model
checkers) so that verified properties will be guaranteed
to hold in hardware and software implementations.
The proposed project will create an initial IEEE stan-
dard based on Esterel v7, ensuring unambiguous defi-
nition of the language syntax and semantics and,
therefore, full interoperability between Esterel-based
program implementation, static analysis, and verifica-
tion tools. The output of the project will be the standard
Esterel Language Reference Manual.
This project’s purpose is to provide the EDA, semi-
conductor, and systems-design communities with a well-
defined, official IEEE definition of the Esterel language.
This is necessary because Esterel is not a minor variant
of existing languages that could be defined with an
addendum to existing standards. Rather, Esterel is
unique in the way it formally merges sequencing as typ-
ically only software languages do, uses single-clock or
multiclock concurrency as typically only HDLs do, and
employs unique temporal primitives that drive the life
and death of activities within programs. Esterel also sup-
ports formal definition of data paths based on arbitrary
precision and exact arithmetic, bit vectors, and arrays of
arbitrary dimensions and types. These language primi-
tives facilitate, by at least one order of magnitude, the
expression of complex behavior, providing the user with
unmatched clarity and productivity for specification,
design, and verification activities. Esterel lets you obtain
equivalent hardware and software targets from a single
source, so hardware simulation using software is more
tenable. Esterel also lets you perform late choices
between hardware and software final implementation.
The key technical objective is to stabilize and fully
define the language’s syntax and semantics. The tech-
nical aspects to be scrutinized concern the data path’s
arbitrary-precision and exact-arithmetic features, the
temporal statements particular to Esterel, and the life
and death of activities and signals. I have no doubt that
developers can solve all the involved questions in a
completely rigorous way, thus providing a fully solid
basis for both users and tool builders, and ensuring full
interoperability between tools from diverse origins.
A derived objective is to ensure that it’s possible to
effectively compile a given Esterel design to other stan-
dardized languages such as VHDL, Verilog, SystemVerilog,
C, and SystemC, with the same guaranteed behavior for
all these different targets. This will require checking that
all Esterel constructs are synthesizable in hardware or soft-
ware, up to well-identified limitations of back-end syn-
thesis or compilation tools. ■
Direct questions and comments about this department to Victor Berman, Cadence Design Systems, 270 Billerica Road, Chelmsford, MA 01824; [email protected].
CEDA Currents
A Conversation with Robert Brayton
On the occasion of Robert
Brayton receiving the 2006 EDAA
(European Design and Automation
Association) Lifetime Achievement
Award and the 2006 IEEE Emanuel
R. Piore Award, Karti Mayaram
from CEDA Newsletter spoke to
him about his career, achieve-
ments, and moments of inspira-
tion. Brayton also had some practical advice for young
researchers.
It was a pleasure talking with Bob Brayton. All of us
who have been affiliated with the EDA field are well
aware of the many fundamental contributions he’s
made. His impact on the industry has been tremendous.
The early years
Bob grew up in Ames, Iowa, and attended Iowa State
University, where he graduated with a BS in electrical
engineering in 1956. After a 6-month stint in the US
Army, he went to MIT to pursue a PhD in mathematics.
He chose math because he thought he lacked mathe-
matical foundations and he had a strong interest in this
field. Bob believes this unique combination of an
undergraduate EE degree and a PhD in math has been
a major contributor to his success. He not only has a
good understanding of the application area but also a
strong foundation in mathematical tools to solve rele-
vant problems.
A year before completing the PhD program, Bob
accepted a summer job at IBM Research (T.J. Watson
Research Center) in the Mathematical Sciences
Department. This was such a wonderful experience that
he joined the department upon completing his PhD.
Looking back, he realizes that both MIT and IBM
Research were very influential in his life. He had the
opportunity to work with exceptional people. Moreover,
at IBM Research he had freedom to work on the
research topics that most interested him. This combi-
nation of wonderful colleagues and flexibility to pursue
appealing subjects helped shape Bob’s career.
Back to school
After spending 26 years at IBM Research, Bob start-
ed a second career as a professor in the Electrical
Engineering and Computer Sciences Department at the
University of California, Berkeley, in 1987. He had spent
a year at UC Berkeley on sabbatical from IBM Research
in 1985, during which he had worked with some very
talented students on logic synthesis and the develop-
ment of industrial-quality tools. When he returned in
1986, IBM Research was offering early retirement, which
he decided to accept, turning his sights toward acade-
mia. When UC Berkeley—his top choice—made him
an offer, he accepted, and he has continued to make
important contributions in logic synthesis.
The ‘Aha!’ moment
The most thrilling moment in his career was the
development of the Sparse Tableau Approach (STA) for
assembling and solving circuit equations. He and his col-
league, Gary Hachtel (now a professor at the University
of Colorado in Boulder), were having a conversation
after a game of tennis. They’d been thinking about an
elegant solution for assembling circuit equations for
some time. Suddenly, all the pieces of the puzzle were
falling into place. This was the start of STA. Before their
work, circuit equation assembly required different kinds
of manipulations and reductions. STA provided a sim-
ple way to assemble electrical-circuit equations. There
was no need for reducing equations; with STA, you
could directly apply Gaussian elimination. This work
was one of the cornerstones of IBM’s circuit simulator,
Astap (Advanced Statistical Analysis Program).
Some thoughts on EDA developments
I also asked Bob what he thought were the most
exciting developments in EDA. He said there is always a
progression of things, and newer developments over-
shadow some of the developments of the past. But he
named two topics that he saw as step functions. One
was binary decision diagrams. BDDs provided a way for
efficiently manipulating large logic equations, and they
proved important for logic synthesis and verification.
The other topic was the work on solving stiff differential
equations, performing equation assembly and solution,
and integrating these techniques as packages—such as
the Astap and Spice circuit simulators—for solving cir-
cuit problems.
What lies ahead
Bob named deep-submicron and nanometer design
as the greatest challenges facing EDA. We no longer
have the luxury of working on independent problems
that can be solved separately. Electrical interference
and manufacturing variations make very low nanome-
ter CMOS design a difficult problem. Then there are the
new technologies that will require effective design tools.
Some practical advice
His advice to young researchers in a challenging
funding environment is to work on relevant problems
and to keep putting out proposals. Being able to solve
relevant problems in interesting ways can be a big moti-
vator. This is what motivates Bob. He is able to identify
such problems and find interesting ways to solve
them—just like putting together the pieces of a puzzle.
The whole notion of fads driving research funding and
publications is not appealing to him. Such an approach
takes away resources from basic research. There should
be more emphasis on fundamental work.
Asked how he felt about receiving the two recent
awards, he replied, “surprised.” He also thanked the
people who took the time and effort to put together the
nominations for these awards. Bob is an extremely mod-
est and wonderful person who has made seminal con-
tributions to EDA. We all are happy to see him get the
recognition that he deserves.
Upcoming Research Funding Opportunities

US Department of Defense
Experimental and Theoretical Development of Quantum Information Science
Deadline: 11 December 2006
http://www.arl.army.mil/main/Main/DownloadedInternetPages/CurrentPages/DoingBusinesswithARL/research/QC06Final6Jul06.pdf
National Science Foundation
Power, Controls and Adaptive Networks (PCAN)
Deadline: 7 September - 7 October 2006
http://nsf.gov/funding/pgm_summ.jsp?pims_id=13380
Foundations of Computing Processes and Artifacts (NSF 06-585)
Deadline: 10 October 2006
http://www.nsf.gov/pubs/2006/nsf06585/nsf06585.htm
Upcoming CEDA Events
Please see these Web sites for upcoming events:
■ CODES+ISSS: http://www.esweek.org
■ Nano-Net: http://www.nanonets.org
■ FMCAD: http://www.cs.utexas.edu/users/hunt/FMCAD
■ ICCAD: http://www.iccad.com
■ PATMOS: http://www.patmos-conf.org
■ VLSI-SoC: http://tima.imag.fr/conferences/VLSI-SoC06
CEDA Distinguished Speaker Reception
The Council’s Distinguished Speaker Series features
detailed presentations of the most significant research
results in EDA over the past year, as demonstrated by
awards at our top conferences and journals. The sec-
ond presentation in this series took place at the
Moscone Center in San Francisco during DAC 2006. The
featured article was by Janusz Rajski, Jerzy Tyszer, Mark
Kassab, and Nilanjan Mukherjee, the authors of this
year’s IEEE Transactions on Computer Aided Design
Donald O. Pederson Best Paper Award. Their presenta-
tion, which covered several aspects of VLSI testing, had
significant tutorial value and will be archived at the
Council’s Web site (http://www.c-eda.org).
CEDA Currents is a publication of the IEEE Council on Electronic Design Automation. Please send contributions to Kartikeya Mayaram ([email protected]) or Preeti Ranjan Panda ([email protected]).
The Last Byte

Getting more out of ITC
Anne Gattiker, IBM Austin Research Lab
THE 2006 INTERNATIONAL TEST CONFERENCE
theme encourages us to consider ways for “getting more
out of test.” How about getting more out of the
International Test Conference? Technical paper sessions
are the heart and soul of ITC—and there’s something
there for everyone, from classic microprocessor and ATE
sessions to delay, test compression, test power, and more.
But there are plenty of ways to get more out of ITC.
We’ve changed the structure of ITC and Test Week
(22-27 October) for this year’s new site, Silicon Valley.
The new format offers some great opportunities. First,
be sure to arrive in time for Monday’s test Q&A panel
(23 October), starting at 4:45 p.m. Come hear the
experts discuss diverse test topics unrehearsed. Remind
yourself that even the experts don’t have all the answers;
there’s still plenty to debate on every topic.
Get up the next morning to attend the Tuesday ple-
nary, which starts at 9:30 a.m. The plenary kicks off a
day specially organized to include material for those
who manage test. Don’t miss Tuesday afternoon’s exec-
utive test panel, which boasts an impressive array of
participants sharing unique perspectives on the cost of
quality. Watch the users of silicon debate the providers,
and find out their views on how we can get more out
of test. Afterward, enjoy the welcome reception, where
you can meet friends and colleagues and find out
which panelists’ perspectives they plan to take home
with them.
Be sure to schedule enough time to visit the exhibit
floor. How else can you improve your standing with
your children by bringing home all sorts of nifty gadgets
and at the same time find out about the latest offerings
from the key vendors in test-related fields? Don’t forget
the free lunch Tuesday, Wednesday, and Thursday on
the exhibit floor. If the line looks long, grab a few col-
leagues and discuss the latest developments you’ve
heard. Afterward, take advantage of an opportunity to
hear industry authorities address your favorite topic and
mine—test, of course—at each day’s invited address,
conveniently located adjacent to the exhibit hall.
ITC has only one regular panel slot this year, so be
sure not to miss it—and don’t forget the wine-and-
cheese party afterward. In addition to these treats, we
have our usual outstanding set of papers, so you can
learn what advances are on the way. We also have an
interesting lecture series, providing you with informa-
tion that you can take back to work to use right away.
This year marks ITC’s first visit to Silicon Valley. For
those who work in the area, this is the easiest ITC to
attend yet. Getting to the Santa Clara Convention Center
might be a shorter commute than going to work—and
parking is free.
I LOOK FORWARD to seeing you at ITC. Let’s learn some,
do some business, have some laughs, and get inspired to
get more out of test. You can find out all about ITC and
Test Week at http://itctestweek.org. ■
Anne Gattiker is a research staff member at IBM Austin Research Lab. Contact her at gattiker@us.

Direct questions, comments, and contributions about this department to Scott Davidson, Sun Microsystems, 910 Hermosa Court, M/S USUN05-217, Sunnyvale, CA 94085; [email protected].