
https://doi.org/10.1007/s10836-019-05840-w

Fault Localization and Testability Approaches for FPGA Fabric Aware Canonic Signed Digit Recoding Implementations

Ayan Palchaudhuri1 · Anindya Sundar Dhar1

Received: 23 January 2019 / Accepted: 23 October 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract

Canonic signed digit (CSD) recoding finds applications in real time VLSI signal processing. In this paper, we propose optimized FPGA implementations of CSD recoding techniques starting from a two’s complement input and a redundant signed digit (SD) input. The architectures exploit the fast, hardwired fabric resources of the FPGA logic elements to give rise to a circuit realization optimized for speed and area. The underutilized logic elements configuring the original design are further targeted to append suitable fault localization circuitry without any compromise in speed and area. This makes the designs attractive for implementation in an era where reliability issues of semiconductor chips are on the rise owing to extensive miniaturization of physical device dimensions. A design approach based on primitive instantiation and constrained placement allows us to conveniently select the logic area for mapping, or to detect and bypass any physical FPGA slice coordinates if deemed faulty.

Keywords Canonic signed digit · Redundant signed digit · FPGA · Fault localization · C-testability · Monitor signals · Alternating logic

1 Introduction

A minimal weighted, positional signed number system can be achieved using canonic signed digit (CSD) representation, as it guarantees a sparse representation of a number without adjacent non-zero digits [1]. Such a representation finds several applications in real time VLSI implementations of digital signal processing algorithms. Hardware implementation of the conversion to the CSD number system, starting from any unsigned, two’s complement or other signed digit (SD) number without the sparsity characteristic, is the first and fundamental step in the domain of VLSI signal processing. Previous research articles on such conversion architectures have been conceived primarily keeping CMOS or ASIC implementations in consideration [2–7]. FPGA specific structural optimization strategies for the same conversion circuits have not been elaborated in previous literature, except for our previous work in [8], which we extend in this paper by appending fault localization circuitry without any hardware overhead or speed penalty, and by elaborating on other testability measures.

Responsible Editor: L. Cassano

1 Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India

With rapid advances in VLSI fabrication technology, device size miniaturization has been made possible. However, certain reliability challenges have cropped up along the path, where issues such as bias temperature instability, dielectric breakdown, hot carrier injection and aging related factors [9–14] need to be further addressed by VLSI designers. A tool for identifying stuck-at faults in non-scan sequential circuits with feedback loops has been developed in [15], for generating tests yielding maximum fault efficiency of embedded logic modules. Identification of register input logic stuck-on faults starting from the Register Transfer Level (RTL) description of a circuit has been discussed in [16]. Model-checking for detecting untestable stuck-at faults at RTL for synchronous sequential circuits is proposed in [17]. Testability analysis and test pattern generation for single event upsets (SEUs) in SRAM based FPGAs have proved to be of utmost importance [18–20].

Journal of Electronic Testing (2019) 35:779–796

Published online: 13 November 2019


http://orcid.org/0000-0002-4338-6404


The readback mechanism in FPGAs simplifies detection of SEUs by enabling read back of the configuration data that typically constitute the configuration memory content of the FPGA logic elements. This check necessitates a simulator design for extracting the state of the circuit in order to compare it for equality with a desired golden response. Care should be taken that the clock is de-asserted at the correct cycle so that, under fault free conditions, the exact snapshot of the logic state of the configured FPGA after a designated clock cycle may be captured without any error [21]. Significant research efforts have also been tailored towards error detection, fault localization and fault tolerance in FPGAs [22–29], many of which are applicable for permanent faults, involving both their detection and localization. Methodologies are being proposed to facilitate in-system testing of Xilinx FPGA logic [23]. A revamp in FPGA architecture with a user configurable scan chain is only a proposed solution from Xilinx, and is not yet available in the form of a manufactured chip. Existing solutions for error detection on FPGA centric architectures primarily rely on duplicating existing instances of configured LUTs and comparing the duplicated responses by a comparator [26]. However, this incurs long, cascadable and programmable routing delay with a twofold or fourfold increase in area, along with inviting speed bottlenecks [12, 30].

Circuit implementations on FPGA often call for a thorough understanding of the underlying fabric for optimized architecture mapping. Following a design methodology of directly configuring the physical FPGA primitives, known as primitive instantiation [31], the circuits may be realized in an optimized fashion, often starting from first principles [32]. Such a practice allows the designer to control the placement coordinates on which the configuration logic is to be mapped, which can eventually lead to a high speed implementation. The reason behind this may be attributed to mapping the subcircuits within close physical adjacency to reduce interconnection delays. Faulty FPGA zones can also be conveniently traced out if the placement coordinates of the circuit implementation are known a priori. Another advantage of this approach is that the designer can conveniently prevent the CAD tool from configuring certain logic nodes, if deemed faulty, by issuing appropriate placement coordinates for mapping a design. As the physical primitives of the FPGA are of a fixed size and capacity, a configured primitive may be only partially utilized for realizing the original design. This may give room for appending additional functionalities to the original design without any discernible compromise in speed and area. Such a practice has been adopted in our CSD recoding circuits, where we append fault localization strategies without any change in area or speed compared to the original design. As an extension to [8], two additional variants of FPGA amenable conversion architectures from redundant SD input to CSD have been elaborated through certain architectural or encoding alterations. Corresponding faster look-ahead (LA) techniques that achieve a 4× speed-up in the LA generator, as compared to the 2× fast LA proposed in [8], have been discussed for two’s complement to CSD recoding.

FPGA synthesis tools may at times produce the best optimized architecture for a functionality described using a certain subset of high level operators, with which adder, subtractor and comparator logic may be expressed. However, the CSD recoding functions are not commonplace functions, nor can they be expressed using high level operators. Additionally, manual optimization is possible only if the gate level design matches closely with the FPGA logic slice framework, and hence this matching forms a crucial design optimization step in designing efficient CSD recoders. Testability and fault localization structures, as deemed amenable, have been studied for all the configurations. The contributions of this paper are as follows:

– FPGA amenable architectures for CSD recoding of two’s complement input and redundant signed digit (SD) input for ripple carry (RC) and look-ahead (LA) modes have been proposed. A total of three variants of the conversion architectures have been elaborated in this paper.

– The choice of an appropriate binary encoding of redundant SD inputs for FPGA optimized conversion circuitry has been studied in this paper, such that the circuit framework can be mapped onto the fast FPGA hardwired fabric resources.

– Fault localization and testability approaches, namely signal monitoring, C-testability, self dual complement based alternating logic and scan based design, have been studied for every architecture and their subsequent variants, keeping in mind that no hardware overhead is incurred and speed is not worsened compared to the original circuit without the fault localization capability. This is often governed by the nature of the FPGA circuit implementation for the original recoding circuit mapping.

– The state-of-the-art conversion architectures for CSD recoding have been comfortably outperformed by our proposed FPGA fabric aware architectures in terms of slice count (area) and speed.

The remainder of the paper is structured as follows. In Section 2, we present our proposed FPGA implementations of the CSD recoding circuits. Section 3 elaborates on the fault localization principles and testability measures for our proposed CSD recoding implementations. The FPGA implementation results and subsequent insights into the design methodology are illustrated in Section 4. The concluding remarks are presented in Section 5.

2 FPGA Implementations of CSD Recoding Architectures

FPGAs, typically from Xilinx, comprise Configurable Logic Blocks which are fractured into slices [33]. Each slice contains four dual output LUTs that can be configured as 6-input 1-output or 5-input 2-output logic entities. LUT outputs may be optionally fed to the inputs of three wide function multiplexers (WFMs), F7AMUX, F7BMUX and F8MUX, for realizing higher input functions, taking advantage of the intra-slice routing fabric which provides a minimum delay path from the output of the LUTs to the input of the WFMs. Additionally, there is a carry chain comprising hardwired multiplexers and XOR gates to compute logic functions and, most importantly, facilitate accelerated signal propagation, provided the mapped circuit conforms to the Boolean logic specification essential for carry chain mapping. Several such carry chains may be cascaded for supporting higher input wordlengths. The sequential logic comprises eight D flip-flops (FFs) in every slice, which makes compact pipelined implementations favourable for FPGAs. The Virtex-7 slice logic is depicted in Fig. 1. FPGA fabric conscious design, which explicitly instantiates primitives for configuration during the design entry phase prior to synthesis, has been widely adopted for realization of arithmetic circuits [31]. This is because the input description of an arithmetic logic strongly influences the quality of implementation on FPGA platforms [34]. Optimized implementation calls for selection of the appropriate arithmetic algorithm and the necessary Boolean logic reformulations to conform to the in-built circuit topology specifications of the FPGA slice architecture, which a high level behavioral code may not always be able to infer. Important datapath and control architectures have been realized following such a design paradigm to ensure high performance implementation [35–40]. In all these works, a 6-input LUT, which is the state of the art, has been deployed for the high speed architecture configurations, often involving manual optimizations.

Our design philosophy is generic, with Xilinx FPGAs chosen as a representative example. It may be argued that integrating high-end CPUs with FPGAs may ultimately lead to modification of the FPGA fabric. However, the current trend of architectural enhancements in Xilinx FPGAs hints towards an increase of logic capacity and routing fabric without any compromise in the existing functionalities. An example is the migration from 4-input LUTs (several generations out of date) to 6-input LUTs, which further enhances logic compaction, thereby ensuring lower area and higher speed, along with added design flexibility.

Fig. 1 Slice architecture for Xilinx Virtex-7 FPGAs [33].

2.1 Ripple Carry Based Architecture for Converting a Two’s Complement Input to its CSD Representation

CSD encoding rules adopting the binary encoding of the digits 0, 1 and -1 (or $\bar{1}$) as $CS_i = \{CS_i^d, CS_i^s\}$ = “00”, “01” and “10” respectively [3] have been listed in Table 1. Assuming $M$ as the two’s complement input, the logic can be derived as a cascade of two stages. The first stage computes the signal $s_i$ using the carry chain as $s_{i+1} = M_{i+1}\,\overline{(M_{i+1} \oplus M_i)} + (M_{i+1} \oplus M_i)\,s_i$. Using its outputs, the second stage of logic computes the CSD recoded bits as $CS_i^d = (s_i \oplus M_i)\,M_{i+1}$ and $CS_i^s = (s_i \oplus M_i)\,\overline{M_{i+1}}$, where a single LUT computes the $i$th CSD digit comprising two bits [8]. The outputs of the first stage are registered using all the intra slice FFs, thereby facilitating pipelining. The final recoded outputs can be registered in a similar way. A pipelined FPGA implementation of an RC based architecture for converting a two’s complement input to its CSD representation is shown in Fig. 2.

Table 1 CSD encoding rules

M_{i+1}  M_i  s_{i−1}  CS_i  s_i  Comments (status up to the ith position)
0        0    0        0     0    String of 0’s
0        0    1        1     0    M_i = 0 with a previous carry, ith CSD output digit is 1
0        1    0        1     0    Single 1 amidst zeros
0        1    1        0     1    String of 1s from ith index to a lower index value
1        0    0        0     0    String of 0s from ith to a lower index
1        0    1        -1    1    String of 1s from (i − 1)th to lower index, 0 at index i, and 1 beyond
1        1    0        -1    1    Beginning of 1s from ith index
1        1    1        0     1    String of 1’s
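The two-stage recurrence of Section 2.1 can be exercised in software. The following is a minimal behavioural sketch in Python, not the FPGA netlist. It assumes an initial signal value $s_0 = 0$ and a sign-extended bit $M_n = M_{n-1}$ for the most significant digit, neither of which is stated explicitly in the text; the function names `to_bits` and `csd_recode` are illustrative only.

```python
def to_bits(value, n):
    """Two's complement bits of value, LSB first, as a list of n ints."""
    return [(value >> i) & 1 for i in range(n)]


def csd_recode(value, n):
    """Recode an n-bit two's complement value into n CSD digits (LSB first)."""
    m = to_bits(value, n)
    m.append(m[-1])                      # assumption: sign extension supplies M_n
    s = 0                                # assumption: initial signal s_0 = 0
    digits = []
    for i in range(n):
        nonzero = s ^ m[i]               # s_i XOR M_i
        cs_d = nonzero & m[i + 1]        # CS_i^d, set for digit -1 ("10")
        cs_s = nonzero & (1 - m[i + 1])  # CS_i^s, set for digit +1 ("01")
        digits.append(cs_s - cs_d)
        select = m[i + 1] ^ m[i]         # carry chain multiplexer select
        s = s if select else m[i + 1]    # s_{i+1}
    return digits


print(csd_recode(7, 4))   # 7 = 8 - 1, digits listed LSB first
```

For the 4-bit input 7 the sketch yields the digits -1, 0, 0, 1 (least significant first), that is 8 − 1, with no two adjacent non-zero digits.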

2.2 Look Ahead Based Architecture for Converting a Two’s Complement Input to its CSD Representation

From the RC based pipelined CSD recoding architecture of Fig. 2, it is discernible that the 6-input LUTs driving the carry chain inputs have only two of their inputs utilized. This leaves a sufficient margin to accommodate additional inputs, which has been exploited to our advantage for speeding up signal propagation. Speeding up by an LA based technique for FPGA based adder architectures was introduced in [38]. We adopt a similar philosophy for our CSD recoding implementations. The CSD recoding circuit is partitioned into equal halves, each half accepting one half wordlength of the two’s complement number as its inputs. The upper half of the first logic level of the recoder receives its carry equivalent signal input from the fast carry/signal LA generator as shown in Fig. 3a. Here, the LA generator can speed up the signal generation using the carry chain by a factor that depends on the availability of vacant inputs in a configured LUT compared to the RC method [41]. In Fig. 2, each configured LUT in the first level of logic has two of its inputs utilized. Twice as fast signal generation is feasible as shown in Fig. 3b, where every LUT has three of its inputs utilized. Hence each multiplexer of the carry chain computes $s_{i+1}$ from $s_{i-1}$, governed by $s_{i+1} = \overline{(M_{i+1} \oplus M_i)(M_i \oplus M_{i-1})}\,(M_{i+1} + M_{i-1})\,M_i + (M_{i+1} \oplus M_i)(M_i \oplus M_{i-1})\,s_{i-1}$. This is made possible by coalescing the logic spread across two LUTs and two multiplexers of the carry chain in the RC method of Fig. 2 into a single LUT and carry chain multiplexer pair. In a similar fashion, the signal generation can be made four times as fast following the scheme shown in Fig. 3c by using all the vacant inputs of the configured LUTs. The logic now computes $s_4 = \overline{(M_4 \oplus M_3)(M_3 \oplus M_2)(M_2 \oplus M_1)(M_1 \oplus M_0)}\,\bigl(M_4(M_3 + M_2M_1) + M_3(M_2 + M_1M_0)\bigr) + (M_4 \oplus M_3)(M_3 \oplus M_2)(M_2 \oplus M_1)(M_1 \oplus M_0)\,s_0$ using a single LUT and one multiplexer of the carry chain.

Fig. 2 Pipelined implementation of RC based architecture for converting a two’s complement input to its CSD representation
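One way to gain confidence in the coalesced look-ahead expressions is to compare them exhaustively against repeated application of the ripple stage. The sketch below does this in Python under the assumption that the multiplexer selects the incoming signal when the XOR product is 1 and the generate term otherwise; `ripple_step`, `la2`, `la4` and `ripple` are illustrative names.

```python
from itertools import product


def ripple_step(m_hi, m_lo, s):
    """One carry chain stage of the ripple design: s_{i+1} from s_i."""
    return s if (m_hi ^ m_lo) else m_hi


def la2(m2, m1, m0, s):
    """2x look-ahead: two ripple stages merged into one LUT and multiplexer."""
    select = (m2 ^ m1) & (m1 ^ m0)
    generate = (m2 | m0) & m1
    return s if select else generate


def la4(m4, m3, m2, m1, m0, s):
    """4x look-ahead: four ripple stages merged into one LUT and multiplexer."""
    select = (m4 ^ m3) & (m3 ^ m2) & (m2 ^ m1) & (m1 ^ m0)
    generate = (m4 & (m3 | (m2 & m1))) | (m3 & (m2 | (m1 & m0)))
    return s if select else generate


def ripple(bits, s):
    """bits = (M_k, ..., M_0), MSB first; apply k ripple stages to s."""
    lsb_first = bits[::-1]
    for i in range(len(lsb_first) - 1):
        s = ripple_step(lsb_first[i + 1], lsb_first[i], s)
    return s


la2_ok = all(la2(*b, s) == ripple(b, s)
             for b in product((0, 1), repeat=3) for s in (0, 1))
la4_ok = all(la4(*b, s) == ripple(b, s)
             for b in product((0, 1), repeat=5) for s in (0, 1))
print(la2_ok, la4_ok)
```

Both comparisons cover every input combination, so agreement here indicates that the look-ahead forms are functionally identical to two and four cascaded ripple stages respectively.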

2.3 RC Based Architecture for Converting a Redundant Signed Digit Input to its CSD Representation

Redundant SD to two’s complement recoding has been extensively studied for ASIC and CMOS based implementations [42–47], without much insight into FPGA based designs. An optimized FPGA implementation for converting a redundant SD input into its CSD equivalent calls for a two-step conversion, which first converts the redundant SD input to its two’s complement number, following which its CSD equivalent is generated [48]. This two-step conversion can facilitate maximal usage of the fast, hardwired FPGA resources and permit a three stage pipelined implementation. The first step, conversion from redundant SD to two’s complement, can be approached in different ways. However, the binary encoding for the signed digits dictates the circuit logic, which in turn plays a crucial role in deciding whether the circuit will conform to the carry chain mapping.

Fig. 3 FPGA implementations of look ahead architectures for CSD recoding

– Approach-I: A 2-bit $(H_i, G_i)$ two’s complement encoding for the signed digits -1, 0 and 1 is adopted. The positive and negative magnitudes are distinctly separated and a two’s complement subtract operation is performed. The carry signal $s_i$ is evaluated by the carry chain fabric, governed by a recursive logic following a multiplexer based realization as $s_{i+1} = (G_i + H_i)\,\overline{G_i} + \overline{(G_i + H_i)}\,s_i$. The XOR gates of the carry chain evaluate the two’s complement output as $M_i = \overline{(G_i + H_i)} \oplus s_i$. The corresponding logic implementation is shown in Fig. 4a.

– Approach-II: For the same encoding strategy, a different algorithm was proposed in [42] that was suitable for an all-NAND implementation in ASIC, which facilitates VLSI testing. The same circuit can be mapped onto the FPGA with one departure from the existing methodology: the carry chain multiplexers are made to evaluate an inverted carry signal for amenability to carry chain mapping, as $\overline{s_{i+1}} = \overline{G_i}\,H_i + \overline{H_i}\,\overline{s_i}$. The final two’s complement output remains unaffected, as the XOR gate of the carry chain computing it has both of its inputs driven by complemented values, $M_i = H_i \oplus s_i = \overline{H_i} \oplus \overline{s_i}$. This scheme is illustrated in Fig. 4b.

– Approach-III: This approach encodes 0, 1 and -1 as $(G_i, H_i)$ = 01, 11 and 00 respectively, following the formula $(-1)^{(1 - G_iH_i)}(G_i - H_i + 1)$, or as 10, 11 and 00 respectively, governed by $(-1)^{(1 - G_iH_i)}(H_i - G_i + 1)$. This methodology also calls for generation of a complemented carry signal as $\overline{s_{i+1}} = H_iG_i(H_i \odot G_i) + \overline{s_i}(H_i \oplus G_i)$. The XOR gates compute the two’s complement output as $M_i = \overline{G_i \oplus H_i} \oplus s_i = G_i \oplus H_i \oplus \overline{s_i}$ as shown in Fig. 4c.
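As an illustration of how such a carry chain mapping behaves, the following Python sketch models Approach-III with its first encoding, $(G_i, H_i)$ = 01, 11 and 00 for the digits 0, 1 and -1. It is a behavioural sketch under stated assumptions, not the authors' circuit: the chain value `c` stands for the complemented carry, its initial value is assumed to be 1, and an extra zero digit is assumed at the top to produce the sign bit. The names `ENCODE`, `sd_to_twos_complement` and `value` are illustrative.

```python
from itertools import product

# digit -> (G, H), first encoding listed for Approach-III
ENCODE = {0: (0, 1), 1: (1, 1), -1: (0, 0)}


def sd_to_twos_complement(digits):
    """digits: SD digits in {-1, 0, 1}, LSB first. Returns n+1 bits, LSB first."""
    c = 1                              # assumption: initial chain value (complemented carry)
    out = []
    for d in list(digits) + [0]:       # assumption: extra zero digit yields the sign bit
        g, h = ENCODE[d]
        select = g ^ h                 # LUT output, 1 for a zero digit
        out.append(select ^ c)         # carry chain XOR gate gives M_i
        c = c if select else (g & h)   # carry chain multiplexer
    return out


def value(bits):
    """Interpret LSB-first bits as a two's complement integer."""
    n = len(bits)
    return sum(b << i for i, b in enumerate(bits[:-1])) - (bits[-1] << (n - 1))


all_ok = all(value(sd_to_twos_complement(ds)) == sum(d << i for i, d in enumerate(ds))
             for ds in product((-1, 0, 1), repeat=4))
print(all_ok)
```

The check enumerates all 81 four-digit SD inputs and compares the converted two's complement value with the numeric value of the digit string.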

The conversion circuitry from a redundant SD input to its CSD output is the ensemble of the architecture that converts redundant SD to two’s complement and the architecture that finally recodes the result to CSD. The first step of the conversion can be realized using any one of the three approaches. The corresponding architecture following Approach-I is shown in Fig. 5. However, it can be conveniently replaced by Approach-II and Approach-III based designs, which are at par with each other with respect to ease of FPGA implementation. All the encoding schemes discussed above have been selected keeping FPGA centric logic function mapping in mind. The situation may be slightly different if we adopt an encoding strategy for 0, 1 and -1 as 00, 01 and 10, as was adhered to in [45]. It may be verified that such an encoding scheme gives rise to a circuit implementation which is not able to harness the carry chain fabric for its realization.

Fig. 4 FPGA based architecture design for conversion of redundant SD to two’s complement using three approaches

2.4 Look Ahead Based Architecture for Converting a Redundant SD Input to its CSD Representation

To obtain the LA based equivalents of the RC based architectures, we need to implement the LA versions of the RC architectures that convert a redundant SD input to two’s complement, and a two’s complement input to its CSD representation, individually. The LA based equivalent for the RC based architectures shown in Fig. 4 is depicted in Fig. 6. Corresponding to Approach-I, a single LUT and carry chain multiplexer pair computes $s_{i+1} = \overline{G_i}\,(H_i + \overline{G_{i-1}}\,H_{i-1})\,(G_i + H_i + G_{i-1} + H_{i-1}) + \overline{(G_i + H_i + G_{i-1} + H_{i-1})}\,s_{i-1}$. Similarly for Approach-II, the LA is governed by $\overline{s_{i+1}} = (\overline{G_i}\,H_i + \overline{G_{i-1}}\,H_{i-1}\,\overline{H_i})\,(H_i + H_{i-1}) + \overline{(H_i + H_{i-1})}\;\overline{s_{i-1}}$. The LA generation for Approach-III is governed by $\overline{s_{i+1}} = \bigl(G_iH_i + (G_i \oplus H_i)\,G_{i-1}H_{i-1}\bigr)\,\overline{(G_i \oplus H_i)(G_{i-1} \oplus H_{i-1})} + (G_i \oplus H_i)(G_{i-1} \oplus H_{i-1})\,\overline{s_{i-1}}$. The LA based architecture for the next stage of computing is implemented identically to that shown in Section 2.2. The LA generator for the redundant SD input to two’s complement converter can achieve a maximum speed-up of two for signal propagation owing to the fixed capacity of the LUTs. Hence, for the final CSD recoding stage we also choose the LA configuration that accelerates the signal propagation by a factor of two, as depicted in Fig. 3b, instead of that depicted in Fig. 3c, to equalize the delay of the critical paths across the two pipeline stages computing carry equivalent signals.

Fig. 5 Three stage pipelined RC based redundant SD input to CSD recoder
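The look-ahead expression for Approach-III can be cross-checked against two cascaded ripple stages of the same approach. The Python sketch below assumes that the chain carries the complemented carry and that the multiplexer passes it through whenever $G \oplus H = 1$; `rc_stage` and `la_stage` are illustrative names, not identifiers from the design.

```python
from itertools import product


def rc_stage(g, h, c):
    """One ripple stage of Approach-III; c is the chain value (complemented carry)."""
    return c if (g ^ h) else (g & h)


def la_stage(g1, h1, g0, h0, c):
    """Two ripple stages merged into a single LUT and multiplexer pair."""
    select = (g1 ^ h1) & (g0 ^ h0)
    generate = (g1 & h1) | ((g1 ^ h1) & g0 & h0)
    return c if select else generate


la_matches = all(
    la_stage(g1, h1, g0, h0, c) == rc_stage(g1, h1, rc_stage(g0, h0, c))
    for g1, h1, g0, h0, c in product((0, 1), repeat=5)
)
print(la_matches)
```

Since all 32 input combinations are enumerated, agreement indicates that the merged stage reproduces the ripple behaviour while halving the number of multiplexers on the critical path.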

3 Fault Localization Principles and Testability Measures for the Recoding Architectures and their FPGA Implementations

Design considerations for fault localization and testability approaches go almost hand in hand with the original design. Often, the original gate level design may have to be reworked for ease of inserting fault localization circuitry. As a variety of fault localization approaches exist, the choice of a specific approach often depends on the design application, as speed and area bottlenecks may arise. The nature of the original design implementation also influences the choice of an appropriate fault localization circuitry. In our designs, we insert the fault localization circuitry and discuss various testability approaches in such a manner that there is no area overhead or speed penalty with respect to the original design. This has been made possible by utilizing the underutilized space of the configured logic elements to insert the appropriate fault localization circuitry.

Fig. 6 FPGA implementations of LA based redundant SD to two’s complement conversion using three approaches

We have primarily settled for four testability and fault localization mechanisms, namely self dual complement based alternating logic, insertion of scan logic for FFs, monitoring of selected signals and C-testability. Alternating logic based fault localization calls for EX-ORing the original function with its self dual complement, such that the output bits flip whenever the inputs are complemented [49]. Scan based design calls for multiplexing the combinational output responses that load a bank of registers in parallel, in such a way that the parallel-in parallel-out register functionality transforms into a serial-in serial-out register. This makes it convenient to read out the state of a circuit, as well as to carry out an equality check between the serial-out and serial-in data. Scan design in FPGA can serve to access the internal FFs of a design which do not have direct controllability or observability. It can verify the functional correctness of the implemented logic, insert a data stream into the FFs to serve as test vectors for those sub-circuits which derive their inputs from the internal registers, check for stuck-at faults by observing whether an alternating serial input data stream 0 → 1 → 0 → 1 is faithfully reproduced at the output of the scan chain, or facilitate initializing the circuit to a desirable state. The signal monitoring technique involves broadcasting a signal over the contours of a circuit to observe whether it is faithfully reproduced at any chosen circuit output(s). C-testability, on the other hand, applies to certain iterative logic arrays (ILAs) which can be tested pseudo-exhaustively using a constant number of test vectors irrespective of the input wordlength of the ILA [50].
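The stuck-at check through a scan chain described above can be illustrated with a small behavioural model. The Python sketch below is a simplified assumption of how a serial-in serial-out chain behaves, with all FFs initially cleared and an optional FF output forced to a constant; `scan_out` and `chain_is_healthy` are hypothetical helper names.

```python
def scan_out(chain_len, stream, stuck=None):
    """Shift `stream` through a serial-in serial-out chain of `chain_len` FFs.

    stuck: optional (index, value) pair forcing one FF output to a constant.
    Returns the bit observed at the serial output in every clock cycle.
    """
    ffs = [0] * chain_len              # assumption: all FFs start cleared
    observed = []
    for bit in stream:
        observed.append(ffs[-1])       # serial output before the clock edge
        ffs = [bit] + ffs[:-1]         # shift by one position
        if stuck is not None:
            ffs[stuck[0]] = stuck[1]   # emulate a stuck-at fault
    return observed


def chain_is_healthy(chain_len, stuck=None):
    """Feed 0 -> 1 -> 0 -> 1 ... and compare the delayed output with the input."""
    pattern = [i % 2 for i in range(4 * chain_len)]
    observed = scan_out(chain_len, pattern, stuck)
    return observed[chain_len:] == pattern[:-chain_len]


print(chain_is_healthy(8), chain_is_healthy(8, stuck=(2, 0)))
```

In the fault free case the alternating stream reappears at the output after a latency equal to the chain length, whereas a stuck-at-0 or stuck-at-1 FF replaces it with a constant.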

3.1 C-testability Analysis of Redundant Binary to Two’s Complement FPGA Implementations

The redundant binary to two’s complement converter circuits in the RC mode are C-testable. We examine the feasibility of C-testability of the circuits through a formal state diagram. C-testability can be formally defined from the graph theoretic point of view using the state transition diagram of a finite state machine, wherein the state diagram of a C-testable ILA can be regarded as an ensemble of disjoint subsets of branches, with each subset defining a closed cycle [50]. We consider every LUT6 or LUT6_2 coupled with a single multiplexer from the carry chain of Fig. 4 as a single cell of an ILA. Here, q0 and q1 represent the states corresponding to the carry chain multiplexer outputs $s_i$.

Figure 7 shows the state diagrams for the RC based architectures (refer to Fig. 4) converting redundant SD numbers to two’s complement form for all three methods. The inputs $\langle H_i, G_i \rangle$ represent the transition conditions for either remaining in the same state or moving to the next state. Similarly, Fig. 8 represents the state diagrams for the LA architectures (refer to Fig. 6) which accelerate conversion of redundant SD numbers to two’s complement form. Here the input order $\langle H_{i+1}, H_i, G_{i+1}, G_i \rangle$ represents the transition conditions for either remaining in the same state or moving to the next state. In either case, the inputs are applied such that the initial carry-in and the intermediate or final carry-out signals are either the same or inverted versions of one another.

Fig. 7 State diagram for two’s complement conversion starting from redundant SD input using RC architectures corresponding to Approach-I (Fig. 4a), Approach-II (Fig. 4b) and Approach-III (Fig. 4c)

Fig. 8 State diagram for two’s complement conversion starting from redundant SD input for the LA logic corresponding to Approach-I (Fig. 6a), Approach-II (Fig. 6b) and Approach-III (Fig. 6c)

3.2 Alternating Logic and Signal Monitoring Based Fault Localization Support on Ripple Carry and Look-Ahead Based Architectures for Converting a Two’s Complement Input to its CSD Representation

Alternating logic based circuit realization calls for realizing the circuit which computes the self dual complement, and EX-ORing it with the original function. In the normal mode, when TD = 0, the original function is computed. In the test mode, the alternating logic output is fed to successive inputs. In order to realize this circuit without excess hardware, one LUT input must be vacant to feed in the test mode signal, which calls for 16.67% and 20% underutilized inputs for single and double output LUTs respectively. The RC based CSD recoder conforms to this underutilized input space, so appending such circuitry does not incur additional logic and can be conveniently implemented as shown in Fig. 9a.

The LA generator that computes the signal twice as fast as the RC version can also be supplemented with the alternating logic circuitry, as the requisite percentage of underutilized inputs to support the system is available. The LUTs in the LA generator have 40% underutilized inputs, with the other inputs populated by $M_{i+2}$, $M_{i+1}$ and $M_i$, where the self dual complement for the O6 and O5 outputs may be computed as $\delta(X)_{O6} = M_{i+2}$ and $\delta(X)_{O5} = M_{i+2}M_{i+1}M_i$ respectively. The corresponding architecture is shown in Fig. 9b.
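The alternating behaviour can be illustrated for the O6 output of the 2× LA generator. The Python sketch below assumes that a self dual complement $\delta$ is any function for which $f \oplus \delta$ is self dual, and that the test mode input simply enables the EX-OR with $\delta$. It uses $\delta(X)_{O6} = M_{i+2}$ from the text; the truth-table model and the names `f`, `delta`, `lut`, `GOOD`, `FAULTY` and `alternates` are illustrative.

```python
from itertools import product


def f(m2, m1, m0):
    """O6 of the 2x look-ahead LUT: the carry chain multiplexer select."""
    return (m2 ^ m1) & (m1 ^ m0)


def delta(m2, m1, m0):
    """Self dual complement for O6 as given in the text."""
    return m2


def lut(table, m2, m1, m0, t):
    """Evaluate a 16-entry truth table indexed by (m2, m1, m0, t)."""
    return table[(m2 << 3) | (m1 << 2) | (m0 << 1) | t]


# Fault free LUT contents: f in normal mode (t = 0), f XOR delta in test mode (t = 1)
GOOD = [f(a, b, c) ^ (t & delta(a, b, c)) for a, b, c, t in product((0, 1), repeat=4)]


def alternates(table):
    """In test mode, complementing every input must complement the output."""
    return all(lut(table, a, b, c, 1) != lut(table, 1 - a, 1 - b, 1 - c, 1)
               for a, b, c in product((0, 1), repeat=3))


FAULTY = list(GOOD)
FAULTY[1] ^= 1          # corrupt the test mode entry for inputs (0, 0, 0)
print(alternates(GOOD), alternates(FAULTY))
```

The fault free table alternates for every input pair, while the table with one corrupted test mode entry fails the check, which is the observable symptom used for localization in this sketch.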

For the LA generator that computes the signal four times as fast as the RC version, shown in Fig. 3c, there are no vacant inputs available to insert alternating logic based functionality. C-testability is also ruled out in such a situation, as inputs to every cell of an ILA must be independent of each other to qualify for C-testability. Since every LUT shares one input each with the preceding and the succeeding LUT, $M_{4i-8}$ and $M_{4i-4}$ respectively, the LA generator is no longer C-testable. Under this circumstance, we have the option of tying the common input(s) (shared amongst adjacent LUTs) either to logic zero or to logic one, and applying requisite test vectors to the unshared inputs. When the common inputs are tied to logic zero, $s_{4i+4} = s_{4i}$ is satisfied if $\{s_{4i}, M_{4i-1:4i-3}\}$ = {00XX, 010X, 1101, 111X}, and $s_{4i+4} \neq s_{4i}$ if $\{s_{4i}, M_{4i-1:4i-3}\}$ = {011X, 10XX, 1100}. For the case of logic one, $s_{4i+4} = s_{4i}$ if $\{s_{4i}, M_{4i-1:4i-3}\}$ = {000X, 0010, 101X, 11XX}, and $s_{4i+4} \neq s_{4i}$ occurs if $\{s_{4i}, M_{4i-1:4i-3}\}$ = {0011, 01XX, 100X}.
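The listed test vector sets can be reproduced by enumerating the 4× look-ahead function with its two shared inputs tied to a constant. The Python sketch below assumes local input names m4 to m0 for one LUT, with m4 and m0 being the shared inputs and the vector order being (s, m3, m2, m1); `la4`, `expand`, `classify` and `LISTED` are illustrative names.

```python
from itertools import product


def la4(m4, m3, m2, m1, m0, s):
    """4x look-ahead stage: one LUT plus one carry chain multiplexer."""
    select = (m4 ^ m3) & (m3 ^ m2) & (m2 ^ m1) & (m1 ^ m0)
    generate = (m4 & (m3 | (m2 & m1))) | (m3 & (m2 | (m1 & m0)))
    return s if select else generate


def expand(patterns):
    """Expand strings such as '01X' (X = don't care) into a set of bit tuples."""
    result = set()
    for p in patterns:
        choices = [(0, 1) if ch == "X" else (int(ch),) for ch in p]
        result |= set(product(*choices))
    return result


def classify(tie):
    """Split all (s, m3, m2, m1) vectors by whether the output equals s."""
    same, differ = set(), set()
    for s, m3, m2, m1 in product((0, 1), repeat=4):
        out = la4(tie, m3, m2, m1, tie, s)   # shared inputs tied to `tie`
        (same if out == s else differ).add((s, m3, m2, m1))
    return same, differ


# Vector sets as listed in the text, keyed by the tied logic value
LISTED = {
    0: (["00XX", "010X", "1101", "111X"], ["011X", "10XX", "1100"]),
    1: (["000X", "0010", "101X", "11XX"], ["0011", "01XX", "100X"]),
}

sets_agree = all(classify(tie) == (expand(same), expand(differ))
                 for tie, (same, differ) in LISTED.items())
print(sets_agree)
```

Under these assumptions the enumeration partitions the sixteen vectors exactly as listed for both tie values.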

3.3 C-testability and Scan FF Based Approach for Fault Localization on FPGA Architectures for Two’s Complement to CSD Encoding

The first level logic of Fig. 2 clearly shows adjacent LUTs sharing inputs, which disqualifies it from being C-testable. We can, however, append external inputs to eliminate the input sharing characteristic. As only two out of six inputs are populated in the original design, we configure the LUTs in the dual output mode and use the remaining vacant LUT inputs to add a controllable input vector E, as shown in Fig. 10, to establish C-testability. The second level of logic can be monitored using scan mode, where the register contents can be populated with a serial input data stream SD, and it can be observed whether the data is faithfully reproduced at the output. Test time can be further reduced through a multiple scan path arrangement, where the scan path lengths can be conveniently decided by the designer. As the outputs of each second level LUT are registered using a pair of registers, we have a distinct scan path for the individual registers in every pair. Each of the LUTs in the second logic level has 60% unutilized inputs, which we harness to feed the two serial data streams and the control input that switches the mode of operation from normal to test and vice versa.

Fig. 9 Alternating logic based fault localization support on RC and LA based architectures for converting a two’s complement input to its CSD representation

Fig. 10 C-testability and scan FF architecture for fault localization in two’s complement to CSD recoder

3.4 Alternating Logic Based Fault Localization Support on Architectures for Converting a Redundant Signed Digit Input to its CSD Representation

On adopting Approach-I for the redundant SD to CSD encoding, the alternating logic for the RC scheme can be implemented in a straightforward manner as shown in Fig. 11. The initial carry-in SD for the first logic level is set to logic 1 in the normal mode and logic 0 in the test mode. The successive logic levels are implemented identically to those discussed in Section 3.2. Similar arguments hold true if Approach-II or Approach-III is adopted in the design. The LA generators for any of the three approaches can be supplemented with the alternating logic circuitry by computing the self dual complements of the logic functions. Each of the LUTs in the LA generators has 20% unused inputs, through which we drive the test mode control input. The alternating logic based LA generator for each of the three methods may be implemented as shown in Fig. 12. The self dual complement $\delta(X)$ may be individually computed for the dual outputs O6 and O5 of a LUT. For the O6 output, it may be computed as $\delta(X)_{O6} = G_{i+1}H_{i+1}G_iH_i$, whereas for the O5 output, $\delta(X)_{O5} = G_{i+1}H_{i+1}(G_i + H_i)$, corresponding to Approach-I. Similarly for Approach-II, $\delta(X)_{O6} = H_{i+1}H_i$, and $\delta(X)_{O5} = H_i(G_{i+1}(H_{i+1} + G_i) + G_{i+1}G_iH_{i+1}$. For Approach-III, $\delta(X)_{O6} = H_i$, whereas $\delta(X)_{O5} = G_iH_i(G_{i+1} \oplus H_{i+1})$.

3.5 C-testability and Scan Based Fault Localization Support on Architectures for Converting a Redundant Signed Digit Input to its CSD Representation

The first level of logic shown in Fig. 13 refers to a scan based design for the redundant SD to two’s complement converter circuit. In the scan (test) mode, the serial data is fed in as input to the carry chain and gets registered into the FFs through the LUT and carry chain based arrangement. The second logic level can be designed to receive the scanned-in data inputs, which in turn can serve as test vectors along with data input E for a C-testable architecture. The last level of logic includes dual output LUTs with registered outputs that can be tested in the scan mode. Multiple scan path arrangements are also feasible for such topologies. As the LA generators do not require FFs for their realization, scan FF based design is ruled out in that case. Hence the LA generators may be tested using the C-testable technique, and the circuit accepting the upper half input word may be tested using scan FF based logic, as shown in Fig. 14.
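C-testability means that an iterative array of identical cells can be tested with a number of vectors that does not grow with the array length. A minimal sketch follows, assuming a chain of full-adder cells as a stand-in for the recoder cells (the actual cell functions are not reproduced here). It checks only that each cell receives every input combination; cell outputs are assumed to be observable.

```python
def full_adder(a, b, cin):
    """One cell of the array: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def cell_inputs_seen(a_bits, b_bits, cin):
    """Ripple the carry through the array and record what each cell receives."""
    seen = []
    for a, b in zip(a_bits, b_bits):
        seen.append((a, b, cin))
        _, cin = full_adder(a, b, cin)
    return seen


def c_tests(n):
    """Eight test vectors (a_bits, b_bits, carry_in), independent of n."""
    alt01 = [i % 2 for i in range(n)]    # 0,1,0,1,...
    alt10 = [1 - v for v in alt01]       # 1,0,1,0,...
    zeros, ones = [0] * n, [1] * n
    return [
        (zeros, zeros, 0), (ones, ones, 1),
        (zeros, ones, 0), (zeros, ones, 1),
        (ones, zeros, 0), (ones, zeros, 1),
        (alt01, alt01, 1),   # cells alternate between (0,0,1) and (1,1,0)
        (alt10, alt10, 0),   # cells alternate between (1,1,0) and (0,0,1)
    ]


def coverage(n):
    """For each cell, the set of input combinations applied by the eight tests."""
    per_cell = [set() for _ in range(n)]
    for a_bits, b_bits, cin in c_tests(n):
        for i, combo in enumerate(cell_inputs_seen(a_bits, b_bits, cin)):
            per_cell[i].add(combo)
    return per_cell


for n in (4, 64):
    print(n, len(c_tests(n)), all(len(s) == 8 for s in coverage(n)))
```

For both array lengths the same eight vectors drive all eight input combinations into every cell, so the test set size stays constant as the word length grows.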

Fig. 11 Fault localization using alternating logic principles on FPGA for redundant SD to CSD recoder


Fig. 12 Alternating logic based circuitry for the LA generator of redundant SD to two’s complement converter

4 Results and Discussions

The architectures were implemented on a Xilinx Virtex-7 FPGA (device XC7VX330T, package FFG1157, speed grade -2) using the Xilinx ISE 14.7 design environment, and post place and route results are reported for all cases. The circuits have been realized by directly instantiating the FPGA primitives, namely LUTs, carry chains and FFs. The primitive instantiation based approach also lets the designer specify the physical FPGA coordinates on which the logic is to be mapped. Such an approach not only yields a high speed, compact implementation [8, 32], but also makes it possible to determine the exact slice location from which a faulty output has emanated. The fault localization capability was verified through post-route simulation of a design in which a particular Look-Up Table (LUT) is initialized with an incorrect truth table to emulate the presence of a fault. On switching the circuit to the test mode of operation and driving it with appropriate primary input test vectors (as governed by the fault localization algorithm), the post-route simulation results revealed the incorrect output, which can be easily ascertained by comparing it with the golden (fault-free) response. As the placement coordinates of the circuit primitives are known a priori, the position of the faulty response in the output vector, or in any intermediate output vector, pinpoints the defect-laden FPGA slice coordinates where the faulty LUT, carry chain or flip-flop may be located. In this manner, fault localization is achieved.

Fig. 13 C-testability approach coupled with fault localization using scan FF based FPGA design for redundant SD to CSD recoder in RC mode

Fig. 14 C-testability approach coupled with fault localization using scan FF based FPGA design for LA generator of redundant SD to CSD recoder
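The localization step described above, comparing the observed response with the golden response and mapping the mismatching bit positions back to slice coordinates, can be sketched as follows. The placement rule (output bit i driven by LUT i, four LUTs per slice, slices stacked in the y-direction) and the names `slice_of_bit` and `localize` are assumptions made for illustration; the actual mapping is whatever the placement constraints of the design specify.

```python
LUTS_PER_SLICE = 4  # a 7 series slice contains four 6-input LUTs


def slice_of_bit(bit_index, origin=(0, 0)):
    """Hypothetical columnar placement: bit i sits in LUT i, four LUTs per slice."""
    x0, y0 = origin
    return (x0, y0 + bit_index // LUTS_PER_SLICE)


def localize(golden, observed, origin=(0, 0)):
    """Return the slice coordinates whose output bits differ from the golden response."""
    faulty_bits = [i for i, (g, o) in enumerate(zip(golden, observed)) if g != o]
    return sorted({slice_of_bit(i, origin) for i in faulty_bits})


golden = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
observed = list(golden)
observed[6] ^= 1                      # emulate a LUT with a wrong truth-table entry

for x, y in localize(golden, observed, origin=(10, 20)):
    print(f"SLICE_X{x}Y{y}")          # SLICE_X10Y21
```

Because the placement is fixed a priori, the lookup from bit position to slice is a pure function and needs no search.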

Table 2 presents the implementation results for the FPGA based two’s complement to CSD converter. We have compared our results with the closest possible match proposed in [2], where an architecture for a processing element facilitating binary to CSD conversion was introduced to fit into a Virtex-4 FPGA slice containing 2 LUTs. The processing element is shown in Fig. 15. Such processing elements, which compute the carry equivalent signal $p_i$, cannot be mapped onto modern FPGAs in a speed-area efficient manner using the LUT and carry chain cascades. The Virtex-4 platform for which the circuit was originally proposed in [2] is now outdated with respect to modern architectural support. Both the RC and the LA versions outperform the behavioral realization of the circuit proposed in [2] (shown in Fig. 15) in terms of speed. The LA versions have been reported for both the cases shown in Fig. 3b and c, corresponding to the 2× and 4× versions respectively. The behavioral design could not be pipelined and is hence denoted by NP, in contrast to our proposed implementation, which is pipelined without requiring any staging delays to synchronize the arrival times of inputs and outputs. However, in all subsequent discussions, the LA generator with 2× fast signal propagation capability is referred to, unless otherwise mentioned. This is because the LA generator of the redundant SD to two’s complement converter logic can only provide twice the speed, in contrast to the 4× speed of the LA generator of the two’s complement to CSD converter, and hence the inherent speed-up advantage of the sole two’s complement to CSD encoder gets overshadowed in the larger setup accepting redundant SD inputs. Additionally, following the 2× LA configuration equalizes the signal propagation delay across the multiple pipeline stages of the redundant SD to CSD converter. The 2× and 4× LA versions come with a marginal logic overhead of 12.5% and 6.25% respectively, compared to the RC versions, with a marked improvement in speed.
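The quoted overhead figures can be reproduced from the 32-bit rows of Table 2. The short calculation below is a worked check rather than part of the original flow; the dictionary keys and function names are chosen here for illustration.

```python
# 32-bit rows of Table 2: (LUT count, conversion frequency in MHz)
table2_32bit = {
    "Behav. [2]": (46, 401.77),
    "Prop. RC": (64, 852.51),
    "Prop. LA (x2)": (72, 937.21),
    "Prop. LA (x4)": (68, 987.17),
}


def lut_overhead_pct(design, baseline="Prop. RC"):
    """Extra LUTs of `design` relative to `baseline`, in percent."""
    luts, _ = table2_32bit[design]
    base, _ = table2_32bit[baseline]
    return 100.0 * (luts - base) / base


def speedup(design, baseline="Behav. [2]"):
    """Ratio of conversion frequencies, design over baseline."""
    return table2_32bit[design][1] / table2_32bit[baseline][1]


print(lut_overhead_pct("Prop. LA (x2)"))   # 12.5
print(lut_overhead_pct("Prop. LA (x4)"))   # 6.25
print(speedup("Prop. RC") > 2.0)           # True
```

For the 32-bit case, even the RC version runs at more than twice the frequency of the behavioral design.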

Table 2 Post place and route implementation results for FPGA architectures facilitating two’s complement to CSD conversion

Op. Width   Design Style    #FF   #LUT   #Slice   #Pipeline stages   Conversion Freq. (MHz)
32          Behav. [2]      –     46     29       NP §               401.77
32          Prop. RC        127   64     16       2                  852.51
32          Prop. LA (×2)   127   72     18       2                  937.21
32          Prop. LA (×4)   127   68     17       2                  987.17
48          Behav. [2]      –     70     44       NP §               316.25
48          Prop. RC        191   96     24       2                  720.46
48          Prop. LA (×2)   191   108    27       2                  813.67
48          Prop. LA (×4)   191   104    26       2                  850.48
64          Behav. [2]      –     94     61       NP §               292.48
64          Prop. RC        255   128    32       2                  624.22
64          Prop. LA (×2)   255   144    36       2                  720.46
64          Prop. LA (×4)   255   136    34       2                  781.25
96          Behav. [2]      –     142    85       NP §               278.78
96          Prop. RC        383   192    48       2                  491.64
96          Prop. LA (×2)   383   216    54       2                  583.09
96          Prop. LA (×4)   383   204    51       2                  645.99
128         Behav. [2]      –     190    112      NP §               223.26
128         Prop. RC        511   256    64       2                  402.41
128         Prop. LA (×2)   511   288    72       2                  485.67
128         Prop. LA (×4)   511   272    68       2                  550.36

§ The behavioral designs are essentially combinational circuits. The delay and the frequency of operation were obtained by inserting FFs at the primary input and output ports of the circuits

Table 3 presents the FPGA implementation results of the redundant SD to CSD converter. Our proposed implementations outperform another CSD recoder circuit proposed in [7], where it was shown to be amenable to ASIC implementation. However, the circuit proposed in [7] cannot efficiently harness the fast carry chains and is not amenable to forward path pipelining. Hence our proposed implementations, in both the RC and LA based configurations, comfortably outperform the behavioral implementation of the algorithm proposed in [7]. Tables 4 and 5 present the implementation results of the same recoding circuitry with the fault localization circuitry appended to the original circuit versions. The results show that, compared to the original design implementations, the circuits consume no extra hardware and operate at almost the same speed. The multi-scan path arrangements can be accommodated as conveniently as a single scan path, without any speed deterioration or hardware overhead. The generation of the design descriptions of the CSD recoding circuits in terms of the hardware primitives, as well as the placement constraints of the primitives on the FPGA silicon fabric, has been automated through C programs with a computational complexity of O(n), where n is the input digit wordlength. The instantiated logic primitives such as LUTs, FFs and carry chains are essentially part of a bit-sliced design, where an entire sub-circuit is partitioned into logic cells that are identically configured in functionality for a particular pipeline stage but accept different pairs of inputs. Hence, the circuit descriptions for the individual bit slices are easily generated through iterative loops in a high level program, which makes the design automation simple and straightforward. The corresponding placement directives for the individual bit slices can also be conveniently automated. The user may give as input the location coordinates of the FPGA logic slice starting from which the circuit has to be mapped. When mapped in a columnar fashion, the logic coordinates are incremented in the y-direction for mapping each pipeline stage, as in this example. Subsequent pipelined stages may be implemented by incrementing the x-coordinate.
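The loop-based generation of placement directives can be sketched as below. The authors report using C programs; Python is used here only for brevity. The instance naming scheme, the assumption of one LUT per bit per pipeline stage, and the choice of UCF-style `LOC`/`BEL` lines are illustrative assumptions, not the authors' generator.

```python
LUTS_PER_SLICE = 4
BELS = ("A6LUT", "B6LUT", "C6LUT", "D6LUT")   # LUT sites within a 7 series slice


def placement_constraints(width, stages, origin=(0, 0), prefix="csd"):
    """Emit UCF-style LOC/BEL lines for a bit-sliced, pipelined design.

    Instance names of the form <prefix>_s<stage>_lut<bit> are hypothetical.
    """
    x0, y0 = origin
    lines = []
    for stage in range(stages):          # each pipeline stage gets its own column
        for bit in range(width):         # a single pass over the bits: O(n) per stage
            inst = f"{prefix}_s{stage}_lut{bit}"
            x = x0 + stage                        # next stage: increment x
            y = y0 + bit // LUTS_PER_SLICE        # next slice in the column: increment y
            bel = BELS[bit % LUTS_PER_SLICE]
            lines.append(f'INST "{inst}" LOC = SLICE_X{x}Y{y};')
            lines.append(f'INST "{inst}" BEL = {bel};')
    return lines


for line in placement_constraints(width=8, stages=2, origin=(4, 10))[:4]:
    print(line)
```

The user supplies only the starting slice coordinates; every other coordinate follows from the bit index and the stage index.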

Our work primarily focuses on testable logic insertion without any speed and area overhead. Hypothesizing a fault model to describe and cover the effects of varied physical failures at higher levels of logic, such as the gate level, the register transfer level or the functional block level, is an important step towards defining the test problem. It brings with it other overheads of design for testability (DFT), such as test time, test data volume, test control complexity, test generation complexity and additional input/output pins. A suitable trade-off may be necessary, depending upon the application and the requirements of the test designer; this can be explored as future work.

Fig. 15 One digit binary-to-CSD conversion proposed in [2]


Table 3 Post place and route implementation results for FPGA architectures facilitating redundant signed digit to CSD conversion

Op. Width   Design Style   #FF   #LUT   #Slice   #Pipeline stages   Conversion Freq. (MHz)
32          Behav. [7]     –     131    90       NP §               245.46
32          Prop. RC       158   96     24       3                  735.29
32          Prop. LA       158   112    28       3                  773.99
48          Behav. [7]     –     195    126      NP §               199.52
48          Prop. RC       238   144    36       3                  633.31
48          Prop. LA       238   168    42       3                  688.71
64          Behav. [7]     –     259    171      NP §               181.40
64          Prop. RC       318   192    48       3                  556.48
64          Prop. LA       318   224    56       3                  606.43
96          Behav. [7]     –     387    245      NP §               175.38
96          Prop. RC       478   288    72       3                  447.03
96          Prop. LA       478   336    84       3                  496.28
128         Behav. [7]     –     515    350      NP §               172.03
128         Prop. RC       638   384    96       3                  370.92
128         Prop. LA       638   448    112      3                  413.91

§ All the behavioral designs are pure combinational circuits proposed in [7]. The delay and the frequency of operation were obtained by registering the inputs and outputs

Table 4 Post place and route implementation results for FPGA architectures facilitating two’s complement to CSD conversion with suitable appendage of fault localization circuitry

Op. Width   Mode   Fault Loc. Technique   #FF   #LUT   #Slice   # Scan Paths   Conversion Freq. (MHz)
32          RC     Alt. Logic             127   64     16       –              852.51
32          RC     Sig. monitor+Scan      127   64     16       1              852.51
32          CLA    Alt. Logic             127   72     18       –              937.21
32          CLA    Sig. monitor+Scan      127   72     18       1              937.21
48          RC     Alt. Logic             191   96     24       –              720.46
48          RC     Sig. monitor+Scan      191   96     24       1              720.46
48          CLA    Alt. Logic             191   108    27       –              813.67
64          RC     Alt. Logic             255   128    32       –              624.22
64          RC     Sig. monitor+Scan      255   128    32       1              624.22
64          RC     Sig. monitor+Scan      255   128    32       2              624.22
64          CLA    Alt. Logic             255   144    36       –              720.46
64          CLA    Sig. monitor+Scan      255   144    36       2              720.46
96          RC     Alt. Logic             383   192    48       –              491.64
96          RC     Sig. monitor+Scan      383   192    48       1              491.64
96          RC     Sig. monitor+Scan      383   192    48       2              491.64
96          RC     Sig. monitor+Scan      383   192    48       3              491.64
96          CLA    Alt. Logic             383   216    54       –              583.09
96          CLA    Sig. monitor+Scan      383   216    54       1              583.09
96          CLA    Sig. monitor+Scan      383   216    54       2              583.09
96          CLA    Sig. monitor+Scan      383   216    54       3              583.09
128         RC     Alt. Logic             511   256    64       –              402.41
128         RC     Sig. monitor+Scan      511   256    64       1              402.41
128         RC     Sig. monitor+Scan      511   256    64       2              402.41
128         RC     Sig. monitor+Scan      511   256    64       4              402.41
128         CLA    Alt. Logic             511   288    72       –              485.67
128         CLA    Sig. monitor+Scan      511   288    72       1              485.67
128         CLA    Sig. monitor+Scan      511   288    72       2              485.67
128         CLA    Sig. monitor+Scan      511   288    72       4              485.67

Table 5 Post place and route implementation results for FPGA architectures facilitating redundant SD to CSD conversion with suitable appendage of fault localization circuitry

Op. Width   Mode   Fault Loc. Technique   #FF   #LUT   #Slice   # Scan Paths   Conversion Freq. (MHz)
32          RC     Alt. Logic             158   96     24       –              731.53
32          RC     C-Test+Scan            158   96     24       –              735.29
32          CLA    Alt. Logic             158   112    28       –              773.99
32          CLA    C-Test+Scan            158   112    28       1              768.64
48          RC     Alt. Logic             238   144    36       –              630.52
48          RC     C-Test+Scan            238   144    36       1              633.31
48          CLA    Alt. Logic             238   168    42       –              688.71
48          CLA    C-Test+Scan            238   168    42       1              668.90
64          RC     Alt. Logic             318   192    48       –              554.32
64          RC     C-Test+Scan            318   192    48       1              556.48
64          RC     C-Test+Scan            318   192    48       2              556.48
64          CLA    Alt. Logic             318   224    56       –              603.86
64          CLA    C-Test+Scan            318   224    56       1              602.77
64          CLA    C-Test+Scan            318   224    56       2              596.66
96          RC     Alt. Logic             478   288    72       –              445.63
96          RC     C-Test+Scan            478   288    72       1              447.03
96          RC     C-Test+Scan            478   288    72       2              447.03
96          RC     C-Test+Scan            478   288    72       3              447.03
96          CLA    Alt. Logic             478   336    84       –              496.28
96          CLA    C-Test+Scan            478   336    84       1              484.50
96          CLA    C-Test+Scan            478   336    84       2              495.79
96          CLA    C-Test+Scan            478   336    84       3              490.44
128         RC     Alt. Logic             638   384    96       –              369.96
128         RC     C-Test+Scan            638   384    96       1              370.92
128         RC     C-Test+Scan            638   384    96       2              370.92
128         RC     C-Test+Scan            638   384    96       4              370.92
128         CLA    Alt. Logic             638   448    112      –              413.91
128         CLA    C-Test+Scan            638   448    112      1              413.56
128         CLA    C-Test+Scan            638   448    112      2              413.56
128         CLA    C-Test+Scan            638   448    112      4              413.56


5 Conclusion

In this paper, we have addressed FPGA fabric aware approaches and a design methodology to implement CSD recoding circuits that accept a two’s complement or a redundant SD input. The fabric aware approach helps us to optimally utilize the FPGA slice logic resources by carefully exploiting the logic capacity of every single resource and inserting the most appropriate feasible circuitry supporting fault localization. The circuit descriptions have been automated, making this design philosophy a commercially viable option in an era where circuit reliability issues may be on the rise. As our proposed implementations follow a generic design philosophy, they are often backward compatible with the 6 series FPGA families such as Virtex-6, and are scalable to other 7 series FPGAs and the modern UltraScale families.

References

1. Parhi KK (2007) VLSI digital signal processing systems: design and implementation. Wiley India Pvt. Limited

2. Faust M, Gustafsson O, Chang CH (2011) Fast and VLSI efficient binary-to-CSD encoder using bypass signal. Electron Lett 47(1):18–20

3. Ruiz GA, Granda M (2011) Efficient canonic signed digit recoding. Microelectron J 42(9):1090–1097

4. Herrfeld A, Hentschke S (1995) Look-ahead circuit for CSD-code carry determination. Electron Lett 31(6):434–435

5. Koc KC (1996) Parallel canonical recoding. Electron Lett 32(22):2063–2065

6. He Y, Zhang Z, Ma B, Li J, Zhen S, Luo P, Li Q (2015) A fast and energy efficient binary-to-pseudo CSD converter. In: Proc. IEEE international symposium on circuits and systems (ISCAS), pp 838–841

7. Tanaka Y (2016) Efficient signed-digit-to-canonical-signed-digit recoding circuits. Microelectron J 57:21–25

8. Palchaudhuri A, Dhar AS (2018) High speed FPGA fabric aware CSD recoding with run-time support for fault localization. In: Proc. 31st international conference on VLSI design (VLSID), pp 186–191

9. Palchaudhuri A, Dhar AS (2017) Built-in fault localization circuitry for high performance FPGA based implementations. J Electron Test 33(4):529–537

10. Naouss M, Marc F (2016) Modelling delay degradation due to NBTI in FPGA look-up tables. In: Proc. 26th international conference on field programmable logic and applications (FPL), pp 1–4

11. Naouss M, Marc F (2016) FPGA LUT delay degradation due to HCI: experiment and simulation result. Microelectron Reliab 64:31–35

12. Palchaudhuri A, Dhar AS (2017) Redundant arithmetic based high speed carry free hybrid adders with built-in scan chain on FPGAs. In: Proc. 24th IEEE international conference on high performance computing (HiPC), pp 104–113

13. Basha BC, Pillement S, Piestrak SJ (2015) Fault-aware configurable logic block for reliable reconfigurable FPGAs. In: Proc. IEEE international symposium on circuits and systems, pp 2732–2735

14. Rao PMB, Amouri A, Kiamehr S, Tahoori MB (2013) Altering LUT configuration for wear-out mitigation of FPGA-mapped designs. In: Proc. 23rd international conference on field programmable logic and applications, pp 1–8

15. Raik J, Rannaste A, Jenihhin M, Villukas T, Ubar R, Fujiwara H (2011) Constraint-based hierarchical untestability identification for synchronous sequential circuits. In: Proc. 16th IEEE European test symposium (ETS), pp 147–152

16. Raik J, Ubar R, Krivenko A, Kruus M (2007) Hierarchical identification of untestable faults in sequential circuits. In: Proc. 10th Euromicro conference on digital system design architectures, methods and tools (DSD), pp 668–671

17. Raik J, Fujiwara H, Ubar R, Krivenko A (2008) Untestable fault identification in sequential circuits using model-checking. In: Proc. 17th Asian test symposium (ATS), pp 21–26

18. Bernardeschi C, Cassano L, Domenici A, Sterpone L (2016) Ua2TPG: an untestability analyzer and test pattern generator for SEUs in the configuration memory of SRAM-based FPGAs. Integration the VLSI Journal 55:85–97

19. Bernardeschi C, Cassano L, Domenici A, Sterpone L (2013) Unexcitability analysis of SEUs affecting the routing structure of SRAM-based FPGAs. In: Proc. 23rd ACM international conference on great lakes symposium on VLSI (GLSVLSI), pp 7–12

20. Bernardeschi C, Cassano L, Domenici A, Sterpone L (2012) SEU-X: a SEU un-excitability prover for SRAM-FPGAs. In: Proc. 18th IEEE international on-line testing symposium (IOLTS), pp 25–30

21. Tiwari A, Tomko KA (2003) Scan-chain based watch-points for efficient run-time debugging and verification of FPGA designs. In: Proc. 8th Asia and South Pacific design automation conference (ASP-DAC), pp 705–711

22. Gupte A, Vyas S, Jones PH (2015) A fault-aware toolchain approach for FPGA fault tolerance. ACM Trans Design Automation Electron Syst 20(2):32:1–32:22

23. Modi H, Athanas P (2015) In-system testing of Xilinx 7-series FPGAs: part 1-logic. In: Proc. IEEE international conference for military communications, pp 477–482

24. Devlin BS, Camarota RC (2017) Circuit for and method of implementing a scan chain in programmable resources of an integrated circuit. United States Patent Application Publication Patent 20 170 373 692

25. Palchaudhuri A, Amresh AA, Dhar AS (2017) Efficient automated implementation of testable cellular automata based pseudorandom generator circuits on FPGAs. J Cell Autom 12(3–4):217–247

26. Nazar GL, Carro L (2012) Fast error detection through efficient use of hardwired resources in FPGAs. In: Proc. 17th IEEE European test symposium, pp 1–6

27. Kyriakoulakos K, Pnevmatikatos D (2009) A novel SRAM-based FPGA architecture for efficient TMR fault tolerance support. In: Proc. 19th international conference on field programmable logic and applications, pp 193–198

28. Palchaudhuri A, Dhar AS (2016) Efficient implementation of scan register insertion on integer arithmetic cores for FPGAs. In: Proc. 29th international conference on VLSI design, pp 433–438

29. Lala PK, Burress AL (2003) Self-checking logic design for FPGA implementation. IEEE Trans Instrum Meas 52(5):1391–1398

30. Palchaudhuri A, Dhar AS (2019) Design and automation of VLSI architectures for bidirectional scan based fault localization approach in FPGA fabric aware cellular automata topologies. J Parallel Distributed Comput 130:110–125

31. Ehliar A (2010) Optimizing Xilinx designs through primitive instantiation. In: Proc. 7th FPGAworld conference, pp 20–27


32. Palchaudhuri A, Chakraborty RS (2016) High performance integer arithmetic circuit design on FPGA: architecture, implementation and design automation. Springer, India

33. Xilinx Inc. (2016) 7 series FPGAs configurable logic block user guide UG474 (v1.8). [Online]. Available: https://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf

34. Verma AK, Brisk P, Ienne P (2009) Challenges in automatic optimization of arithmetic circuits. In: Proc. 19th IEEE symposium on computer arithmetic, pp 213–218

35. Kumm M, Abbas S, Zipf P (2015) An efficient softcore multiplier architecture for Xilinx FPGAs. In: Proc. 22nd IEEE symposium on computer arithmetic, pp 18–25

36. Kumm M, Kleinlein M, Zipf P (2016) Efficient sum of absolute difference computation on FPGAs. In: Proc. 26th international conference on field programmable logic and applications (FPL), pp 1–4

37. Palchaudhuri A, Dhar AS (2019) VLSI architectures for Jacobi symbol computation. In: Proc. 32nd international conference on VLSI design, pp 335–340

38. Zicari P, Perri S (2010) A fast carry chain adder for Virtex-5 FPGAs. In: Proc. 15th IEEE Mediterranean electrotechnical conference (MELECON), pp 304–308

39. Palchaudhuri A, Dhar AS (2016) High performance bit-sliced pipelined comparator tree for FPGAs. In: Proc. 20th international symposium on VLSI design and test (VDAT), pp 1–6

40. Kallstrom P, Gustafsson O (2016) Fast and area efficient adder for wide data in recent Xilinx FPGAs. In: Proc. 26th international conference on field programmable logic and applications (FPL), pp 1–4

41. Palchaudhuri A, Dhar AS (2018) Fast carry chain based architectures for two’s complement to CSD recoding on FPGAs. In: Proc. 14th international symposium on applied reconfigurable computing (ARC), pp 537–550

42. Ye SM, Laih CS, Chen CH, Lee JY (1992) An efficient redundant-binary number to binary number converter. IEEE Journal of Solid-State Circuits 27(1):109–112

43. Herrfeld A, Hentschke S (1995) Conversion of redundant binary into two’s complement representations. Electron Lett 31(14):1132–1133

44. Wang G, Tull MP (2004) A new redundant binary number to 2’s-complement number converter. In: Proc. region 5 conference: annual technical and leadership workshop, pp 141–143

45. He Y, Chang CH (2008) A Power-Delay efficient hybrid carry-lookahead/carry-select based redundant binary to two’s complement converter. IEEE Trans on Circ Sys I 55(1):336–346

46. Sahoo SK, Gupta A, Asati AR, Shekhar C (2010) A novel redundant binary number to natural binary number converter. J Signal Process Sys 59(3):297–307

47. Barik RK, Pradhan M, Panda R (2017) Efficient conversion technique from redundant binary to nonredundant binary representation. J Circ Sys Comput 26(9):1–8

48. Palchaudhuri A, Dhar AS (2018) Redundant binary to two’s complement converter on FPGAs through fabric aware scan based encoding approach for fault localization support. In: Proc. IEEE international parallel and distributed processing symposium (IPDPS) workshops, pp 218–221

49. Saposhnikov VV, Saposhnikov VV, Dmitriev A, Goessel M (1998) Self-dual duplication for error detection. In: Proc. seventh Asian test symposium, pp 296–300

50. Friedman AD (1994) A functional approach to efficient fault detection in iterative logic arrays. IEEE Transactions on Computers 43(12):1365–1375

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ayan Palchaudhuri is currently pursuing his Ph.D. from the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur. He received his M.S. degree from the Department of Computer Science and Engineering of the same institute in 2015. His research interests include VLSI architecture design for high performance computer arithmetic applications.

Anindya Sundar Dhar received his Bachelor of Engineering degree in Electronics and Telecommunication Engineering from Bengal Engineering College, Shibpur, Howrah, India (presently known as Indian Institute of Engineering Science and Technology) in 1987. He received his M.Tech. degree in Electronics and Electrical Communication Engineering with specialization in Integrated Circuits and Systems Engineering from Indian Institute of Technology, Kharagpur, India in 1989. He received his Ph.D. degree in 1994 from the same Institute, where he is presently a Professor in the Department of Electronics and Electrical Communication Engineering. His research interests include VLSI architecture design for real-time signal processing.
