vlsi abstract

Upload: alex-sagar

Post on 22-Dec-2015

Hades InfoTech Pvt. Ltd

G2, Metha Complex, Little Mount, Saidapet, Chennai-15

Ph: 044-22200258, Mobile: 9840989556, 9952050233

Mail: [email protected], [email protected], www.hades.in

IEEE VLSI ABSTRACT 2013-14

HARDWARE IMPLEMENTATION

HVL01

A Current Consumption Measurement Approach for FPGA-Based

Embedded Systems

Abstract:

An approach for field programmable gate array (FPGA) based embedded system

dynamic current consumption measurement is presented in the paper. The measurement

method is based on current conversion to time interval. The time interval is then

measured using timer implemented in FPGA. Measurement uncertainty budget analysis is

performed. It reveals key components and their parameters mostly affecting current

measurement uncertainty. System architecture for incorporating measurement setup to the

standard FPGA development flow is suggested.
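
The current-to-time idea can be illustrated with a minimal sketch. The abstract does not describe the actual conversion circuit, so the sketch assumes a hypothetical front end in which the measured current charges a capacitor C across a voltage window dV, giving I = C * dV / t, with t obtained from an FPGA timer count. The function names, component values, and clock frequency are illustrative assumptions, and the +/-1 count quantization shown is only one possible contributor to an uncertainty budget.

```python
def interval_from_count(count: int, f_clk_hz: float) -> float:
    """Time interval represented by a timer count at clock frequency f_clk_hz."""
    return count / f_clk_hz


def current_from_count(count: int, f_clk_hz: float,
                       c_farads: float, delta_v: float) -> float:
    """Average current under the ASSUMED model I = C * dV / t."""
    if count <= 0:
        raise ValueError("timer count must be positive")
    return c_farads * delta_v / interval_from_count(count, f_clk_hz)


def quantization_bounds(count: int, f_clk_hz: float,
                        c_farads: float, delta_v: float) -> tuple[float, float]:
    """Current range implied by a +/-1 count timer quantization error."""
    low = current_from_count(count + 1, f_clk_hz, c_farads, delta_v)
    high = current_from_count(max(count - 1, 1), f_clk_hz, c_farads, delta_v)
    return low, high


if __name__ == "__main__":
    # Hypothetical values: 100 MHz timer, 1 uF capacitor, 0.1 V window.
    print(current_from_count(1000, 100e6, 1e-6, 0.1))
    print(quantization_bounds(1000, 100e6, 1e-6, 0.1))
```

Under this assumed model a longer interval means a smaller current, so the relative effect of a one-count timing error shrinks as the count grows.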

HVL02

Design of a Real-Time FPGA-Based Data Acquisition Architecture for

the LabPET

Abstract:

The LabPET II detector block was designed to achieve submillimeter spatial

resolution in small animal PET imaging. Each detection block consists of two arrays of

4 × 8 avalanche photodiodes (APD) individually coupled to an 8 × 8

scintillator array, to form 64 independent detectors with parallel readout channels. This

new detection block entails an eightfold increase in pixel density compared to the

LabPET I. A 64-channel mixed-signal application-specific integrated circuit (ASIC) was

designed to extract relevant PET data in real time from the LabPET II detection blocks.

In order to interface the ASICs forming the PET camera with the storage units, a real-

time FPGA-based digital data acquisition (DAQ) system was designed. The DAQ system

allows event harvesting, processing and transmission to a host computer for data storage

as well as system programming and calibration. Real-time event processing embedded in

the DAQ includes time trigger, energy computation using a time-over-threshold (TOT)

conversion scheme, timing corrections, and event sorting trees. In the standard DAQ

mode, a real-time coincidence engine analyzes events and only keeps relevant

information to minimize data throughput and post-acquisition data processing. The

architecture consists of three FPGA-based electronic layers wired through gigabit links: a

Front-End layer extracts time and energy along with the pixel address, a custom Hub

layer chronologically sorts incoming events, and a Coincidence engine matches

coincident events and computes an estimate of the random events rate. Every FPGA in

the different layers is accessible through an Ethernet link. The real-time digital

architecture sustains the required throughput of ~111 million events/s for a

~37 000-channel scanner configuration.
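
The sorting and coincidence stages described above can be sketched in software. This is only an illustration of the idea, not the paper's FPGA logic: the `Event` fields, the pairing rule (consecutive sorted events from different pixels within a window), and the window value are assumptions.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp_ns: float  # time trigger produced by the front end
    energy: float        # e.g. derived from time-over-threshold
    pixel: int           # pixel address


def sort_events(events: list[Event]) -> list[Event]:
    """Hub-layer role: order incoming events chronologically."""
    return sorted(events, key=lambda e: e.timestamp_ns)


def find_coincidences(events: list[Event],
                      window_ns: float) -> list[tuple[Event, Event]]:
    """Pair consecutive sorted events from different pixels within window_ns."""
    ordered = sort_events(events)
    pairs = []
    i = 0
    while i < len(ordered) - 1:
        first, second = ordered[i], ordered[i + 1]
        if (second.timestamp_ns - first.timestamp_ns <= window_ns
                and first.pixel != second.pixel):
            pairs.append((first, second))
            i += 2  # both events are consumed by the match
        else:
            i += 1
    return pairs
```

Dividing the quoted figures, ~111 million events/s over ~37 000 channels corresponds to roughly 3000 events/s per channel on average.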

HVL03

Project-Based Learning in Embedded Systems Education Using an

FPGA Platform

Abstract:

With embedded systems becoming ubiquitous, there is a growing need to teach

and train engineers to be well-versed in their design and development. The

multidisciplinary nature of such systems makes it challenging to give students exposure

to and experience in all their facets. This paper proposes a generic architecture,

containing multiple processors, that allows easy integration of custom and/or predefined

peripherals. The architecture allows students to explore both the hardware and software

issues associated with real-time and embedded systems. Furthermore, the architecture can

be extended to train students in advanced concepts in embedded multiprocessor systems.

This generic architecture has been used for two courses at the National University of

Singapore—one on real-time embedded systems and the other emphasizing the hardware

aspects of embedded systems. The project in the real-time embedded systems course has

students develop a five-a-side soccer system on multiple field-programmable gate array

(FPGA) boards using embedded processors. In the embedded hardware design course

project, students use an embedded processor-based system to perform decryption of a

block encrypted image, accelerated through a custom co-processor. The use of displays

gives students a visual/interactive experience and a sense of accomplishment, while

reinforcing the theoretical concepts. Both qualitative and quantitative assessment results

are presented, showing how students perceived these projects and met the learning

objectives.

HVL04

A Reliable and Cost-Effective Sand Monitoring System on the Field

Programmable Gate Array

Abstract:

Sand monitoring gives the benefits of avoiding equipment erosion and production

failure in the oil industry. This paper presents the design and implementation of a reliable

and cost-effective sand monitoring system for measuring sand production in gas and oil

flows in real time. The designed monitoring system involves two acoustic emission (AE)

sensors and one Doppler sensor for gauging the sand impaction and the velocity of sand

particles in a nonintrusive manner, respectively. It is implemented on a field

programmable gate array (FPGA) as a prototype for real-time data acquisition and

processing, and evaluated using a testbed pump skid available in our laboratory. Relying

on a low-cost FPGA board to integrate all acquisition and processing functionality, our

monitoring system can measure the sand production amount on-the-fly reliably with high

accuracy, according to our experimental evaluation. It is likely to achieve better accuracy

than one without the Doppler sensor. The proposed monitoring system is affordable for

wide deployment, given its high accuracy, good resilience to surrounding noise, and low

cost (when compared with commercially available systems whose price tags can be some

tenfold higher).

HVL05

Simple Closed Loop DC Speed Control System On FPGA Platform for

VHDL Beginner

Abstract:

Students develop an interest in understanding the theory around a project that fascinates

them. The motivation of this paper is to devise a model for teaching the VHDL

language to embedded systems students. The teaching model is based on a practical example of a

simple DC motor control application conducted in lab sessions. A series of lab

modules, leading gradually from basic digital logic to closed-loop DC motor speed

control, is presented as lab experiments to maintain the interest of first-time learners of

VHDL. The theory of VHDL language constructs is covered separately in

classroom lectures, in conjunction with the lab practice sessions.

HVL06

FPGA Based Real Time Systems For Position Tracking

Abstract:

Position tracking systems supported with RISS and gyroscopes are found to be

better solutions in places where GPS is unusable or

unavailable. Generally, systems that rely on GPS for position tracking face many

problems in areas where line of sight is hard to achieve, i.e. GPS-denied environments

such as dense terrestrial areas, subways, tunnels and hidden places. This system provides

continuous and highly reliable position tracking by synchronising the real-time signals

obtained from the sensors with the actual GPS values. The core processor of the system is

built on an FPGA, which is used in the system kernel. The key reasons for using an FPGA in

the system are its customisable core and its flexibility in interfacing with the sensors. The

core employs the hybrid Kalman filter for estimating displacement and position. In

this system we integrate the 3D-RISS with GPS to achieve reliable and uninterrupted

position tracking. In these systems the processor estimates the position of the object

based on four inputs taken from the RISS and the odometer: velocity,

acceleration, orientation and position. Here the system integrates the offline inertial data

(i.e. data gathered while the GPS is unavailable) with the actual GPS data. The system starts to

compute the position and velocity using the initial data provided by the GPS at the instant

it was lost. Position tracking systems of this kind are used in various kinds of moving

objects such as aircraft, guided missiles, land rovers, and marine navigation systems.

HVL07

FPGA based Real Time 3D-RISS / GPS Integrated System for Position

Tracking

Abstract:

Navigation algorithms integrating measurements from multi-sensor systems

overcome the problems that arise from using GPS navigation systems in standalone

mode. Algorithms that integrate the data from a 2D low-cost reduced inertial sensor

system (RISS), consisting of a gyroscope and an odometer or wheel encoders, with that of

a GPS receiver via a Kalman filter have proved capable of providing a consistent and

more reliable navigation solution than standalone GPS receivers. It has also been

shown to be beneficial, especially in GPS-denied environments such as urban canyons

and tunnels. The main objective of this paper is to narrow the idea-to-implementation gap

that follows the algorithm development by realizing a low-cost real-time embedded

navigation system capable of computing the data-fused positioning solution. The role of

the developed system is to synchronize the measurements from the three sensors, relative

to the pulse per second signal generated from the GPS, after which the navigation

algorithm is applied to the synchronized measurements to compute the navigation

solution in real-time. Employing a customizable soft-core processor on an FPGA in the

kernel of the navigation system provided the flexibility for communicating with the

various sensors and the computation capability required by the Kalman filter integration

algorithm.
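
A minimal sketch of the dead-reckoning step behind a 2D RISS may help here: the gyroscope's yaw rate updates the heading and the odometer speed advances the position. The function and parameter names are illustrative assumptions, and `blend_with_gps` uses a fixed, hypothetical gain in place of the covariance-based gain that a real Kalman filter would compute.

```python
import math


def dead_reckon(x, y, heading_rad, speed_mps, yaw_rate_rps, dt):
    """Advance a 2D pose one step from odometer speed and gyroscope yaw rate."""
    heading_rad = heading_rad + yaw_rate_rps * dt
    x = x + speed_mps * math.cos(heading_rad) * dt
    y = y + speed_mps * math.sin(heading_rad) * dt
    return x, y, heading_rad


def blend_with_gps(predicted, gps, gain):
    """Scalar correction step: move the prediction toward a GPS fix.

    A full Kalman filter derives the gain from covariances; here it is
    a fixed, hypothetical constant between 0 and 1.
    """
    return predicted + gain * (gps - predicted)
```

When GPS is unavailable only `dead_reckon` runs, so errors accumulate until the next fix is blended in.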

HVL08

FPGA based Braille to text & speech for blind persons

Abstract:

Blind people are an integral part of society. However, their disabilities have

given them less access to computers, the Internet, and high-quality educational

software than people with clear vision. Consequently, they have not been able to

improve their own knowledge or to have significant influence and impact on

economic, commercial, and educational ventures in society. One way to narrow this

widening gap and reverse this trend is to develop a system that is within their

economic reach and that will empower them to communicate freely and widely using

the Internet or any other information infrastructure. Over time, the Braille system has

been used by the visually impaired for communication and contact with the outside

world. This paper presents the implementation of Braille to Text/Speech Converter on

FPGA Spartan3 kit. Braille input is converted into ordinary English

text. The input is given through a Braille keypad, which provides different

combinations of cells. This input goes to the FPGA Spartan3 kit. According to the

combinations given, the FPGA converts the input into the corresponding English text through

decoding logic written in VHDL. After decoding, the corresponding letter is

converted to speech through an algorithm. It is also displayed on an LCD by interfacing the

LCD to the Spartan3 kit.
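
The decoding step can be sketched as a lookup from pressed dots to letters. The table below follows the standard six-dot Braille alphabet (dots 1-3 in the left column, 4-6 in the right); the function names and the idea of passing each keypad cell as a set of dot numbers are assumptions for illustration, not the paper's VHDL design.

```python
# Dot patterns for a-j in the standard six-dot Braille cell.
FIRST_DECADE = {
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5}, "e": {1, 5},
    "f": {1, 2, 4}, "g": {1, 2, 4, 5}, "h": {1, 2, 5}, "i": {2, 4},
    "j": {2, 4, 5},
}


def build_table():
    """Build the dots-to-letter table for the 26 basic letters."""
    table = {frozenset(dots): letter for letter, dots in FIRST_DECADE.items()}
    # k-t add dot 3 to the patterns for a-j.
    for base, letter in zip("abcdefghij", "klmnopqrst"):
        table[frozenset(FIRST_DECADE[base] | {3})] = letter
    # u, v, x, y, z add dots 3 and 6 to the patterns for a-e.
    for base, letter in zip("abcde", "uvxyz"):
        table[frozenset(FIRST_DECADE[base] | {3, 6})] = letter
    table[frozenset({2, 4, 5, 6})] = "w"  # w does not follow the pattern
    return table


BRAILLE_TO_LETTER = build_table()


def decode_cell(pressed_dots):
    """Return the letter for one cell, or '?' for an unknown combination."""
    return BRAILLE_TO_LETTER.get(frozenset(pressed_dots), "?")


def decode(cells):
    """Decode a sequence of cells into text."""
    return "".join(decode_cell(c) for c in cells)
```

In hardware the same mapping would typically be a small combinational decoder or ROM indexed by the six key states.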

HVL09

Implementation of FPGA based PID Controller for DC Motor Speed

Control System

Abstract:

In this paper, the implementation of software module using ‘VHDL’ for Xilinx

FPGA (XC3S400) based PID controller for DC motor speed control system is presented.

The tools used for building and testing the software modules are Xilinx ISE 9.2i and

ModelSim XE III 6.3c. Before verifying the design on FPGA the complete design is

simulated using the ModelSim simulation tool. A test bench is written in which the set speed

can be changed for the motor. It is observed that the motor speed gradually changes to the

set speed and locks to the set speed.
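
The behaviour described, with the speed moving gradually to the set speed and locking there, can be reproduced with a small discrete PID loop. This is a sketch under stated assumptions: the first-order motor model, its time constant, and the controller gains are hypothetical and are not taken from the paper.

```python
class PID:
    """Textbook discrete PID controller with a fixed sample time dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error + self.ki * self.integral
                + self.kd * derivative)


def simulate(set_speed, steps=2000, dt=0.001):
    """Drive a hypothetical first-order motor model toward set_speed."""
    pid = PID(kp=2.0, ki=20.0, kd=0.0, dt=dt)  # assumed gains
    speed = 0.0
    tau, gain = 0.05, 1.0  # assumed motor time constant (s) and DC gain
    for _ in range(steps):
        u = pid.update(set_speed, speed)
        speed += dt * (gain * u - speed) / tau  # forward-Euler motor update
    return speed
```

Calling `simulate` with a different `set_speed` plays the role of the test bench changing the set speed; the integral term is what removes the steady-state error.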

HVL010

Implementation of Hamming code using VLSI

Abstract:

This paper explains the implementation of the Hamming code using VLSI. In

today's world the field of communication has many applications, and in every

one the data is encoded at the transmitter, transferred over a communication channel,

and decoded at the receiver. During transmission the data might

be corrupted by noise on the channel, so the receiver needs

some function that can detect errors in the received data. The Hamming code is one

such forward error-correcting code, and it has many applications. In this paper the

algorithm for the Hamming code is discussed and then implemented in Verilog

to obtain results. The Hamming code is an improvement over the parity-check method. Here a

code is implemented in Verilog in which 4 bits of information data are transmitted with 3

redundancy bits. To find the values of these redundancy bits, code is written in

Verilog and simulated in Xilinx 9.1 software. The results of simulation and test

bench waveforms are also shown.
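
The 4-data-bit, 3-redundancy-bit scheme corresponds to the classic Hamming(7,4) code. The sketch below uses the conventional bit layout with parity bits at positions 1, 2 and 4; the paper's own Verilog bit ordering is not given, so this layout and the function names are assumptions.

```python
def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit; return (data_bits, error_position).

    error_position is the 1-based index of the corrected bit, or 0 if the
    syndrome indicates no error.
    """
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1  # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]], pos
```

The syndrome directly spells out the position of a single flipped bit, which is what lets the receiver correct the error rather than merely detect it as a single parity bit would.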

HVL011

FPGA Based Critical Patient Health Monitoring Using Fuzzy Neural

Network

Abstract:

We have designed an FPGA-based system and trained a fuzzy neural network for

early diagnosis of a patient. The system employs a fuzzy interface cascaded with a feed-

forward neural network in order to obtain an optimum decision regarding the future

pathophysiological state of a patient. The neurons that are considered in the

proposed network are devoid of self-connections, in contrast to the commonly used self-

connected neurons. The system has been trained and tested with renal data of patients

taken at 10-day intervals. Applying the methodology, a

critical renal condition of a patient has been predicted accurately, 30 days ahead of

actually attaining the critical condition. The system has also been tested for

pathophysiological state prediction of patients at multiple time steps ahead, and the prediction

at the next instant of time proves to be the most accurate. The fuzzy interface

discussed here performs fuzzification of patient data. Patient data such as

height or weight cannot always be trusted, as they are subject to the quality and

accuracy of measuring units and the skill of the technician. Moreover, based on a single

data, it would be highly uncertain to make an accurate decision about the future

physiological state of the patient. So the patient data have been fuzzified with the

objective of transformation of periodic measures into likelihoods that the body mass

index, blood glucose, urea, creatinine, systolic and diastolic blood pressure of the patient

is high, low or moderate.
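
Fuzzification of a single measurement into "low", "moderate" and "high" likelihoods can be sketched with piecewise-linear membership functions. The shape of the functions and the breakpoints are hypothetical placeholders chosen for illustration; they are not the paper's membership functions and not clinical thresholds.

```python
def fuzzify(value, low_max, mid, high_min):
    """Return membership degrees in the 'low', 'moderate' and 'high' sets.

    low_max < mid < high_min are hypothetical breakpoints: values at or
    below low_max are fully 'low', mid is fully 'moderate', and values at
    or above high_min are fully 'high'.
    """
    if value <= low_max:
        return {"low": 1.0, "moderate": 0.0, "high": 0.0}
    if value >= high_min:
        return {"low": 0.0, "moderate": 0.0, "high": 1.0}
    if value <= mid:
        m = (value - low_max) / (mid - low_max)
        return {"low": 1.0 - m, "moderate": m, "high": 0.0}
    h = (value - mid) / (high_min - mid)
    return {"low": 0.0, "moderate": 1.0 - h, "high": h}
```

A reading halfway between two breakpoints is reported as partly in each set, so a small measurement error shifts the memberships slightly instead of flipping a hard classification.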

HVL012

VIDEO acquisition through I2C using VHDL

Abstract:

With an I2C implementation of a video acquisition system, devices can communicate

with each other more quickly than with any other communication protocol. The main

purpose of this technique is not only to let the devices communicate but also to keep track of

every operation that can be performed with the help of this protocol. I2C uses just

two lines for data communication, making it lightweight, economical, and omnipresent.

The design is used in an acquisition system to make the overall system more efficient and

accurate; with the use of I2C the data transmission rate is also increased. An OV7620 single-chip

CMOS VGA colour digital camera is used in the design, which is built with VHDL on an FPGA.

HVL013

FPGA and GSM Implementation of Advanced Home

Security System

Abstract:

Home and industrial security today needs to make use of the latest technological

components. This paper presents the design and implementation of a remote

sensing, control and home security system based on GSM (Global System for

Mobile). This system offers a complete, low-cost, powerful and user-friendly way of providing 24-hour

real-time monitoring and remote control of home and industrial security. The

system works as a remote sensor for the electrical appliances at home to check whether

they are on or off; at the same time the user can control the electrical appliances at home by

sending an SMS (Short Messaging Service) message to the system, for example turning on

the AC before returning home. In case of a fire or security event the chip receives signals from

the different sensors in the monitored place and acts according to the received signal by

sending an SMS message to the user's mobile phone. It also provides automatic and

immediate reporting to the user in case of emergency for home security, as well as

immediate and automatic reporting to the fire brigade and police station according to the

activated sensor, to decrease the time required for taking action. The design has been

described using VHDL (VHSIC Hardware Description Language) and implemented in

hardware using an FPGA (Field Programmable Gate Array).

HVL014

FPGA Implementation of Picoblaze based Embedded System for

Monitoring Applications

Abstract:

PicoBlaze is an 8-bit soft core microprocessor developed by Xilinx that can be

synthesized in some FPGA families. This paper presents a set of peripherals that have

been developed to interface with PicoBlaze: VGA control, serial communication, PS/2

keyboard port and LCD control. To demonstrate its capabilities, the system has been

implemented on an FPGA board, and some typical control and monitoring systems have

been developed. The design approach of the peripherals and details of the integration of

the systems are explained.

HVL015

Design and Development of FPGA Based Data Acquisition System for

Process Automation

Abstract:

This paper presents a novel approach to the design of a data acquisition system for

process applications. The core of the proposed system is a Field Programmable Gate

Array (FPGA), which is configured and programmed to acquire a maximum of 16 MB of

real time data. For the real time validation of the designed system, a process plant with

three parameters i.e. pressure, temperature and level is considered. Real time data from

the process is acquired using suitable temperature, pressure and level sensors. Signal

conditioners are designed for each sensor and are tested in real time. The designed FPGA-based

data acquisition system, along with the corresponding signal conditioners, is validated

in real-time by running the process and comparing the same with the corresponding

references. The data acquired in real time compares well with the references.

HVL016

Intelligent Car Parking Management System On FPGA

Abstract:

Car parking has become an immense issue, especially in big cities. There are two

main reasons: first, the growth in population, and second, security. Moreover, car

theft has become an evil art haunting drivers. In this paper, we provide an interface and a

software/ hardware module for Intelligent Car Park Management System (ICPMS). The

ICPMS will provide an extensive management for vehicles including parking facilities

and security. The ICPMS is validated using a test case scenario and extensive

experimentation proves the feasibility of the approach.

HVL017

FPGA Based Implementation of Flat Panel Display Controller with DVI

Interface

Abstract:

Digital displays are a fast-growing market comprising LCD, plasma, and rear

projection television technologies as well as smaller displays for mobile handsets and

automobiles, in addition to many other applications. Digital image processing enhances

the overall viewing aesthetics of the displayed image and can differentiate our product.

This paper deals with the design and development of a flat panel display controller

using the Xilinx Spartan-3AN development board, intended for display panel

applications, to assist in developing products for this market. The display solution FPGA

consists of a DVI input interface, colour temperature correction, precise gamma

correction, an image dithering engine, and a Low-Voltage Differential Signaling (LVDS)

Transmit (TX) output interface.

HVL018

Design Of FPGA Based PWM Solar Power Inverter For Livelihood

Generation In Rural

Abstract:

The paper presents the development of a utility-interface solar power converter

(inverter) in a grid / DG power supply for a solar lighting system used in rural homes in

Indian villages. The power supply system comprises a solar (PV) array, a PWM converter

incorporating a PWM control strategy, and energy storage battery devices. A model of the

system has been designed for its operation, together with a prototype solar power converter. The

system simulation of PWM pulse generation has been done on a Xilinx-based FPGA

Spartan 3E board using VHDL code. The test results from simulation of the PWM generation program

after synthesis and compilation were recorded and verified on a prototype sample.
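
PWM pulse generation on an FPGA is commonly built from a free-running counter and a comparator. The sketch below models that counter-compare scheme in software; `duty_count` and `period_count` are hypothetical parameter names, and the paper's actual VHDL design and PWM strategy are not described in the abstract.

```python
def pwm_waveform(duty_count, period_count, cycles=1):
    """Counter-compare PWM: output is 1 while the counter is below duty_count.

    Returns one sample per clock tick over the requested number of periods.
    """
    if not 0 <= duty_count <= period_count:
        raise ValueError("duty_count must lie between 0 and period_count")
    return [1 if counter < duty_count else 0
            for _ in range(cycles)
            for counter in range(period_count)]


def duty_cycle(samples):
    """Fraction of clock ticks for which the output is high."""
    return sum(samples) / len(samples)
```

Changing `duty_count` between periods is how a controller would modulate the average output, for instance to approximate a sinusoid in an inverter.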

HVL019

Real Time Design & Implementation of Digital Speedometer On FPGA

Abstract:

In this paper, a Digital Speedometer is designed and implemented using FPGA

(Field Programmable Gate Array). Here, the FPGA used is a SmartFusion FPGA. Such

devices suit hardware and embedded designs that need a true system-on-chip (SoC)

solution: they offer more flexibility than traditional fixed-function microcontrollers,

without the excessive cost of soft processor cores on traditional FPGAs. Initially,

the speedometer is designed using Verilog Hardware Description Language. Synthesis-

software algorithmically transforms the Verilog source code into a netlist, a logically-

equivalent description consisting only of elementary logic primitives (AND, OR, NOT,

flip-flops, etc.) that are available in a specific FPGA or VLSI technology. The designed

speedometer offers more features than existing speedometers. In particular,

the measured velocity is accurate not only at normal

speeds but also at exceedingly small velocities.

HVL020

Design of Control Module for Serial DAC Based on FPGA

Abstract:

In order to increase the flexibility of control for serial DAC, a new control method

for DAC based on FPGA is proposed in this paper. A state transition diagram can be

drawn according to the timing diagram of the DAC, which can be realized in an FPGA using

Very High-Speed Integrated Circuit Hardware Description Language (VHDL). The simulation

results show that the logic in the FPGA is consistent with the requirements. The module based

on FPGA can be modified just by modifying software, not the hardware.

HVL021

FPGA-Based Evolvable Hardware Systems

Abstract:

Since 1992, the year in which Hugo de Garis published the first paper on Evolvable

Hardware (EHW), a period of intense creativity has followed. EHW has been actively

researched, developed and applied to various problems. Different approaches have been

proposed that created three main classifications: extrinsic, mixtrinsic and intrinsic EHW.

Each of these solutions has a real interest. Nevertheless, although the extrinsic evolution

generates some excellent results, the intrinsic systems are not so advanced. This paper

suggests 3 possible solutions to implement the run-time configuration intrinsic EHW

system: FPGA-based Run-Time Configuration system, JBits-based Run-Time

Configuration system and Multi-board functional-level Run-Time Configuration system.

The main characteristic of the proposed architectures is that they are implemented on

Field Programmable Gate Array. A comparison of proposed solutions demonstrates that

multi-board functional-level run-time configuration is superior in terms of scalability,

flexibility and ease of implementation.

HVL022

Implementation of Maximum Power Point Tracking Using Kalman

Filter for Solar Photovoltaic Array on FPGA

Abstract:

This paper proposes an FPGA implementation of a novel approach to tracking the

maximum power point of a solar photovoltaic array. The approach uses a Kalman filter

algorithm to track the maximum power point. Using this approach, tracking becomes much

faster than using the generic Perturb & Observe algorithm in case of sudden weather

changes. In this paper output of the proposed algorithm on FPGA is provided.

Experimentation was performed under optimal conditions as well as under cloudy

conditions i.e. falling irradiance levels. Using the proposed technique the maximum

power point of a solar PV array is tracked with an efficiency of 97.11%. Moreover, the

maximum power point has been tracked much faster, i.e. in 4.5 ms, using the

proposed algorithm than with the existing generic Perturb and Observe approach.
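
For context, the baseline Perturb and Observe method can be sketched in a few lines: perturb the operating voltage by a fixed step, and reverse direction whenever the power drops. The power curve below is a toy quadratic with an assumed maximum at 17 V, used only to make the sketch runnable; it is not a PV model and none of the numbers come from the paper. The Kalman-filter tracker itself is not reproduced here.

```python
def pv_power(v):
    """Toy power-voltage curve with a single maximum at v = 17 (assumed)."""
    return max(0.0, 60.0 - 0.5 * (v - 17.0) ** 2)


def perturb_and_observe(v_start, step, iterations):
    """Fixed-step Perturb and Observe hill climbing on pv_power."""
    v = v_start
    p_prev = pv_power(v)
    direction = 1.0
    for _ in range(iterations):
        v += direction * step        # perturb the operating voltage
        p = pv_power(v)              # observe the resulting power
        if p < p_prev:
            direction = -direction   # power fell, so reverse direction
        p_prev = p
    return v
```

With a fixed step the operating point needs many iterations to reach the maximum and then keeps oscillating around it, which is the kind of behaviour a faster tracker aims to improve on when irradiance changes suddenly.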

HVL023

An FPGA Based Implementation of a Flexible Digital PID Controller

For a Motion Control System

Abstract:

Implementation of digital controllers in an embedded environment suffers from the

inherent problems associated with interfacing analog and digital signals in hard real time;

therefore, the control algorithms are invariably subjected to approximations. This paper

presents a novel technique for implementation of an efficient FPGA based digital

Proportional-Integral-Derivative (PID) controller for the motion control of a permanent

magnet DC motor. The implementation technique circumvents the problem of

interfacing analog and digital systems in real-time. The controller is used in a speed

control loop. The hardware implementation has been done on a Xilinx Spartan 3 FPGA

chip. A novel technique has been adopted for the generation of the control input as a

PWM signal for controlling the motor driver circuit and decoding the optical encoder

data for using it for the speed feedback in the PID control loop. The VHDL algorithm for

the proposed implementation has also been presented in this paper. A comparison of the

experimental results with the Matlab®-based simulation shows the effectiveness of the

proposed method.

HVL024

Design of an Oximeter Based on LED-LED Configuration and FPGA

Technology

Abstract:

A fully digital photoplethysmographic (PPG) sensor and actuator has been

developed. The sensing circuit uses one Light Emitting Diode (LED) for emitting light

into human tissue and one LED for detecting the light reflected from human tissue. A

Field Programmable Gate Array (FPGA) is used to control the LEDs and determine the

PPG and Blood Oxygen Saturation (SpO2). Configurations with two LEDs and four

LEDs are developed for measuring the PPG signal and SpO2. An

N-LED configuration is proposed for multichannel SpO2 measurements. The approach

resulted in better spectral sensitivity, increased and adjustable resolution, reduced noise,

small size, low cost and low power consumption.

HVL025

Smart Camera Based on FPGA Oriented to Embedded Image

Processing

Abstract:

This paper presents an image processing system based on a smart camera platform,

whose two principal elements are a Wide-VGA CMOS Sensor and a Field Programmable

Gate Array (FPGA). The latter is used to control the various sensor parameter

configurations and, where desired, to receive and process the images captured by the

CMOS sensor. With the advent of today's highly integrated Field Programmable Gate

Array (FPGA) it is possible to have a software programmable processor and hardware

computing resources on the same chip. Apart from having sufficient logic blocks on

which the hardware is implemented these chips also have an embedded processor with

system software to implement the application software around it. In this paper, the

Spartan-3A DSP based Xilinx VSK platform is used for developing the proposed

extensible hardware-software video streaming and processing modules. In order to

develop the required hardware and software in an integrated fashion, Xilinx Embedded

Development Kit (EDK) design tool has been used. A number of Xilinx provided IPs are

customized to realize the hardware modules in the FPGA fabric. Copyright © 2013 Praise

Worthy Prize S.r.l. - All rights reserved.

HVL026

Real Time Vehicular Camera Vision Acquisition System Using FPGA

Abstract:

With the advent of active safety technologies in the automotive industry, a need to

record and replay the actual on-road vehicular scenario has risen, especially in systems

involving camera-based vision. The primary objective of the paper is to propose a design

of a system for real-time video acquisition. Hence, a design for a Camera Hardware

simulator has been proposed in this paper. The system involves a camera that captures

visual information through its image sensor. The system is designed such that it can do

direct display; that is, it can generate vertical and horizontal synchronization signals, as

per the specification of the camera and it can buffer the pixel clock coming from the

camera and send it to another system that uses the video information being received such

as an in-vehicle display to display it. It also includes the ability to record the incoming

data stream in a computer for offline processing. As the aforementioned functionality is

to be achieved for high incoming data rates and also handy interfacing to any recording

device is required, we are using a Universal Serial Bus (USB) 2.0 (high speed). Many

subsystems are to be designed on the same chip, so we propose the use of a Field

Programmable Gate Array (FPGA) based system for fast data processing and

miniaturization of the system. The system under consideration is comprised of a

Complementary Metal-Oxide Semiconductor (CMOS) camera at the input and a high-

speed USB interface at the output. The FPGA is programmed as a Video Graphics Array

(VGA) Controller, buffer and a USB Controller. FPGA consumes less power when

compared to any other embedded-based system, making it more usable inside an

automobile. This system can be used for offline video processing and simulation of the

exact on-road scenario in lab for vision based active safety system's testing purpose or for

active safety algorithm improvement to achieve desired results (by tweaking the

algorithm for desired results, based on the observations made from the recorded data

without having the need to go on road again and again).

HVL027

Constructivist Multi-Access Lab Approach in Teaching FPGA Systems

Design with LabVIEW

Abstract:

Embedded systems play a vital role in modern applications [1]. They can be found

in autos, washing machines, electrical appliances and even in toys. FPGAs are the most

recent computing technology that is used in embedded systems. There is an increasing

demand on FPGA based embedded systems, in particular, for applications that require

rapid time responses. Engineering education curricula need to respond to the increasing

industrial demand for FPGAs by introducing new syllabi for teaching and learning

this subject. This paper describes the development of new course material for teaching

FPGA-based embedded systems design by using ‘G’ Programming Language of

LabVIEW. A general overview of the role of FPGAs in engineering education is provided. A

survey of available hardware programming languages for FPGAs is presented. A survey

of LabVIEW use in engineering education is also presented; this is followed by a

section on the motivation for using LabVIEW graphical programming in teaching and on its

capabilities. Then, the choice of a suitable kit for the course is discussed. Later, a

constructivist closed-loop model for the FPGA course is proposed in accordance with

[2-4; 80,86,89,92]. The paper proposes a pedagogical framework for FPGA teaching;

pedagogical evaluation will be conducted in future studies. The complete study has been

done at the Faculty of Electrical and Electronic Engineering, Aleppo University.

HVL028

FPGA-Based Educational Platform for Real-Time Image Processing

Experiments

Abstract:

In this paper, an implementation of an educational platform for real-time linear

and morphological image filtering using a Nexys II FPGA board (Xilinx® Spartan-3E) is

described. The system is connected to a USB port of a personal computer, which in that

way form a powerful and low-cost design station for educational purposes. The FPGA-

based system is accessed through a MATLAB graphical user interface, which handles the

communication setup and data transfer. The system allows the students to perform

comparisons between results obtained from MATLAB simulations and FPGA-based real-

time processing. Concluding remarks derived from course evaluations and lab reports are

presented.

HVL029

Real-Time Traffic Sign Detection and Recognition on FPGA

Abstract:

This paper presents the implementation of an embedded automotive system that

detects and recognizes traffic signs within a video stream. In addition, it discusses the

recent advances in driver assistance technologies and highlights the safety motivations

for smart in-car embedded systems. An algorithm is presented that processes RGB image

data, extracts relevant pixels, filters the image, labels prospective traffic signs and

evaluates them against template traffic sign images. A reconfigurable hardware system is

described which uses the Virtex-5 Xilinx FPGA and hardware/software co-design tools in

order to create an embedded processor and the necessary hardware IP peripherals. The

implementation is shown to have robust performance results, both in terms of timing and

accuracy.

HVL030

Design and Implementation of a Secure RFID System on FPGA

Abstract:

Radio Frequency Identification (RFID) systems are now widely used in many

applications. Data security has therefore become an important issue in RFID

communication, in order to prevent unauthorized parties from decrypting communication data.

Considering this problem, a secure RFID system is designed in this study. An RFID

communication at 868 MHz is demonstrated by programming transceiver modules via

Spartan-3E FPGA kits. To increase communication security between reader and tag, a 2-

stage authentication protocol is used. A Tiny Encryption Algorithm (TEA) module is

used as an IP core in the FPGA to provide data encryption, and the system design has been

completed.
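
For readers unfamiliar with TEA, the block cipher named above can be sketched in a few lines of Python. This is a software reference of the standard published algorithm (64-bit block, 128-bit key, 32 cycles), not the paper's IP core, and it does not model the 2-stage authentication protocol. The key and data values in the usage lines are arbitrary examples.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9   # TEA's fixed key-schedule constant
ROUNDS = 32          # standard number of TEA cycles


def _mix(v, total, ka, kb):
    """One TEA mixing term, reduced to 32 bits."""
    return (((v << 4) + ka) ^ (v + total) ^ ((v >> 5) + kb)) & MASK32


def tea_encrypt_block(v0, v1, key):
    """Encrypt one 64-bit block given as two 32-bit words with a 4-word key."""
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK32
        v0 = (v0 + _mix(v1, total, k0, k1)) & MASK32
        v1 = (v1 + _mix(v0, total, k2, k3)) & MASK32
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    """Invert tea_encrypt_block by running the cycles backwards."""
    k0, k1, k2, k3 = key
    total = (DELTA * ROUNDS) & MASK32
    for _ in range(ROUNDS):
        v1 = (v1 - _mix(v0, total, k2, k3)) & MASK32
        v0 = (v0 - _mix(v1, total, k0, k1)) & MASK32
        total = (total - DELTA) & MASK32
    return v0, v1


if __name__ == "__main__":
    key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # example key only
    block = (0xDEADBEEF, 0x00C0FFEE)                        # example data only
    cipher = tea_encrypt_block(*block, key)
    print(tea_decrypt_block(*cipher, key) == block)
```

TEA uses only shifts, additions and XORs, which is one reason it maps well to small FPGA logic.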

SIMULATION ONLY

HVL031

A 2.63 Mbit/s VLSI Implementation of SISO Arithmetic Decoders for

High Performance Joint Source Channel Codes

Abstract:

This paper highlights the implementation challenges faced by the current high

performing error resilient joint source channel coding (JSCC) techniques based on the

concept of soft-input soft-output (SISO) decoding of arithmetic codes (AC). Further, it

proposes several efficient algorithmic and very large scale integration (VLSI)

architectural techniques to improve the throughput performance of SISO for JSCC. The

VLSI hardware implementation of the proposed algorithm, when implemented in a 90-nm

standard-cell technology running at 588 MHz, achieves a decoding throughput of up

to 2.63 Mbit/s, capable of decoding QCIF format for video conferencing.

HVL032

3-D Mesh-Based Optical Network-on-Chip for Multiprocessor System-

on-Chip

Abstract:

Optical networks-on-chip (ONoCs) are emerging communication architectures

that can potentially offer ultrahigh communication bandwidth and low latency to

multiprocessor systems-on-chip (MPSoCs). In addition to ONoC architectures, 3-D

integrated technologies offer an opportunity to continue performance improvements with

higher integration densities. In this paper, we present a 3-D mesh-based ONoC for

MPSoCs, and new low-cost nonblocking 4 × 4, 5 × 5, 6 × 6, and 7 × 7 optical routers for

dimension-order routing in the 3-D mesh-based ONoC. Besides, we propose an

optimized floorplan for the 3-D mesh-based ONoC. The floorplan follows the regular 3-

D mesh topology but implements all optical routers in a single optical layer. The

floorplan is optimized to minimize the number of extra waveguide crossings caused when

merging the 3-D ONoC to one optical layer. Based on a set of real applications and

uniform traffic pattern, we develop a SystemC-based cycle-accurate NoC simulator and

compare the 3-D mesh-based ONoC with the matched 2-D mesh-based ONoC and 2-D

electronic NoC for performance and energy efficiency. Additionally, we quantitatively

analyze thermal effects on the 3-D 8 × 8 × 2 mesh-based ONoC.
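
Dimension-order routing in a 3-D mesh can be illustrated with a short Python sketch. The abstract does not state the order in which dimensions are corrected, so X, then Y, then Z is an assumption here, as are the function name and the coordinate convention.

```python
def dimension_order_route(src, dst):
    """Return the nodes visited from src to dst, correcting X, then Y, then Z."""
    path = [tuple(src)]
    cur = list(src)
    for axis in range(3):                       # 0 = X, 1 = Y, 2 = Z (assumed order)
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            cur[axis] += step                   # one hop along the current dimension
            path.append(tuple(cur))
    return path


if __name__ == "__main__":
    for node in dimension_order_route((0, 0, 0), (2, 1, 1)):
        print(node)
```

Because each packet finishes one dimension before starting the next, the hop count always equals the Manhattan distance between source and destination.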

HVL033

A Built-In Repair Analyzer With Optimal Repair Rate for Word-

Oriented Memories

Abstract:

This paper presents a built-in self repair analyzer with the optimal repair rate for

memory arrays with redundancy. The proposed method requires only a single test, even

in the worst case. By performing the must-repair analysis on the fly during the test, it

selectively stores fault addresses, and the final analysis to find a solution is performed on

the stored fault addresses. To enumerate all possible solutions, existing techniques use

depth first search using a stack and a finite-state machine. Instead, we propose a new

algorithm and its combinational circuit implementation. Since our formulation for the

circuit allows us to use the parallel prefix algorithm, it can be configured in various ways

to meet area and test time requirements. The total area of our infrastructure is dominated

by the number of content addressable memory entries to store the fault addresses, and it

only grows quadratically with respect to the number of repair elements. The

infrastructure is also extended to support various types of word-oriented memories.
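
The must-repair rule mentioned above can be sketched in Python as follows. This is the textbook rule only (a row with more faults than the remaining spare columns must take a spare row, and vice versa); it does not model the paper's content addressable memory, its parallel prefix circuit, or the final solution search. Function and variable names are hypothetical.

```python
from collections import Counter


def must_repair(faults, spare_rows, spare_cols):
    """Apply must-repair analysis to a set of (row, col) fault addresses.

    Returns (rows_repaired, cols_repaired, remaining_faults).
    """
    remaining = set(faults)
    rows, cols = [], []
    while True:
        free_rows = spare_rows - len(rows)
        free_cols = spare_cols - len(cols)
        row_count = Counter(r for r, _ in remaining)
        col_count = Counter(c for _, c in remaining)
        # A row with more faults than the free spare columns cannot be fixed by columns.
        bad_row = next((r for r, n in sorted(row_count.items()) if n > free_cols), None)
        # Symmetric condition for columns.
        bad_col = next((c for c, n in sorted(col_count.items()) if n > free_rows), None)
        if bad_row is not None and free_rows > 0:
            rows.append(bad_row)
            remaining = {(r, c) for r, c in remaining if r != bad_row}
        elif bad_col is not None and free_cols > 0:
            cols.append(bad_col)
            remaining = {(r, c) for r, c in remaining if c != bad_col}
        else:
            return rows, cols, remaining


if __name__ == "__main__":
    faults = {(0, 0), (0, 1), (0, 2), (3, 5), (4, 5), (5, 5), (7, 7)}
    print(must_repair(faults, spare_rows=2, spare_cols=2))
```

Only the faults left in `remaining_faults` need to be stored for the final analysis, which is the motivation for doing must-repair on the fly.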

HVL034

A Fast-Locking All-Digital Deskew Buffer With Duty-Cycle Correction

Abstract:

In this paper, a fast-locking all-digital deskew buffer with duty cycle correction is

proposed and implemented. A cyclic time-to-digital converter is introduced to decrease

the locking time in conventional register-controlled delay-locked loop to only two input

clock cycles in coarse tuning. With the aid of the three half delay lines technique, the

mismatch between half delay lines causing the duty cycle distortion can be alleviated by

interpolation. A balanced edge combiner to achieve a precise 50% output clock is also

presented. A test chip is fabricated in 0.18-μm technology to demonstrate the feasibility

of the proposed architecture. The circuit can accept the input clock rates from 250 to 625

MHz with the duty cycle variation within 30% and 70% to generate 50% output clocks. It

preserves the capability of closed-loop control with a small area and power consumption.

HVL035

A High-Speed Low-Complexity Modified FFT Processor for High Rate

WPAN Applications

Abstract:

This paper presents a high-speed low-complexity modified radix-2^5 512-point

fast Fourier transform (FFT) processor using an eight data-path pipelined approach for

high rate wireless personal area network applications. A novel modified radix-2^5 FFT

algorithm that reduces the hardware complexity is proposed. This method can reduce the

number of complex multiplications and the size of the twiddle factor memory. It also uses

a complex constant multiplier instead of a complex Booth multiplier. The proposed FFT

processor achieves a signal-to-quantization noise ratio of 35 dB at 12-bit internal word

length. The proposed processor has been designed and implemented using 90-nm CMOS

technology with a supply voltage of 1.2 V. The results demonstrate that the total gate

count of the proposed FFT processor is 290 K. Furthermore, the highest throughput rate

is up to 2.5 GS/s at 310 MHz while requiring much less hardware complexity.
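
The quoted throughput is consistent with eight parallel data paths each accepting one sample per clock cycle, as the short Python calculation below shows. The one-sample-per-path-per-cycle assumption and the function names are not taken from the abstract, and pipeline fill latency is ignored.

```python
def fft_throughput_gsps(data_paths, clock_mhz):
    """Samples per second (in GS/s) if every data path takes one sample per clock."""
    return data_paths * clock_mhz / 1000.0


def transform_time_ns(points, data_paths, clock_mhz):
    """Time to stream one transform through the data paths, ignoring pipeline latency."""
    cycles = points / data_paths
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    print(fft_throughput_gsps(8, 310))        # 2.48 GS/s, i.e. about 2.5 GS/s
    print(transform_time_ns(512, 8, 310))     # 64 cycles at 310 MHz
```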

HVL036

A Low-Complexity Turbo Decoder Architecture for Energy-Efficient

Wireless Sensor Networks

Abstract:

Turbo codes have recently been considered for energy-constrained wireless

communication applications, since they facilitate a low transmission energy consumption.

However, in order to reduce the overall energy consumption, lookup table-log-BCJR

(LUT-Log-BCJR) architectures having a low processing energy consumption are

required. In this paper, we decompose the LUT-Log-BCJR architecture into its most

fundamental add compare select (ACS) operations and perform them using a novel low-

complexity ACS unit. We demonstrate that our architecture employs an order of

magnitude fewer gates than the most recent LUT-Log-BCJR architectures, facilitating a

71% energy consumption reduction. Compared to state-of-the-art maximum logarithmic

Bahl-Cocke-Jelinek-Raviv implementations, our approach facilitates a 10% reduction in

the overall energy consumption at ranges above 58 m.

HVL037

A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal

Quantum Circuits

Abstract:

We present an algorithm for computing depth-optimal decompositions of logical

operations, leveraging a meet-in-the-middle technique to provide a significant speedup

over simple brute force algorithms. As an illustration of our method, we implemented this

algorithm and found factorizations of commonly used quantum logical operations into

elementary gates in the Clifford+T set. In particular, we report a decomposition of the

Toffoli gate over the set of Clifford and T gates. Our decomposition achieves a total T-

depth of 3, thereby providing a 40% reduction over the previously best known

decomposition for the Toffoli gate. Due to the size of the search space, the algorithm is

only practical for small parameters, such as the number of qubits, and the number of

gates in an optimal implementation.

HVL038

AC-Plus Scan Methodology for Small Delay Testing and

Characterization

Abstract:

Small delay defects escaping traditional delay testing could cause a device to

malfunction in the field and thus detecting these defects is often necessary. To address

this issue, we propose three test modes in a new methodology called AC-plus scan, in

which versatile test clocks can be generated on the chip by embedding an all-digital

phase-locked loop (ADPLL) into the circuit under test (CUT). AC-plus scan can be

executed on an in-house wireless test platform called HOY system. The first test mode of

our AC-plus scan provides a more efficient way to measure the longest path delay

associated with each test pattern. Experimental results show that our method could

greatly reduce the test time by 81.8%. The second test mode is designed for volume

production test. It could effectively detect small delay defects and provide fast

characterization on those defective chips for further processing. This mode could be used

to help predict which chips are more likely to fall victim to operational failure in the

field. The third test mode is to extract the waveform of each flip-flop's output in a real

chip. This is made possible by taking advantage of the almost unlimited test memory our

HOY test platform provides, so that we could easily store a great volume of data and

reconstruct the waveform for post-silicon debugging. We have successfully fabricated a

Viterbi decoder chip with such an AC-plus scan methodology inside to demonstrate its

capability.

HVL039

Addressing Transient and Permanent Faults in NoC With Efficient

Fault-Tolerant Deflection Router

Abstract:

Continuing decrease in the feature size of integrated circuits leads to increases in

susceptibility to transient and permanent faults. This paper proposes a fault-tolerant

solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism

to detect both transient and permanent faults, a hybrid automatic repeat request, and

forward error correction link-level error control scheme to handle transient faults and a

reinforcement-learning-based fault-tolerant deflection routing (FTDR) algorithm to

tolerate permanent faults without deadlock and livelock. A hierarchical-routing-table-

based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR

router. Synthesized results show that, compared with the FTDR router, the FTDR-H

router can reduce the area by 27% in an 8 × 8 network. Simulation results demonstrate that

under synthetic workloads, in the presence of permanent link faults, the throughput of an

8 × 8 network with FTDR and FTDR-H algorithms are 14% and 23% higher on average

than that with the fault-on-neighbor (FoN) aware deflection routing algorithm and the

cost-based deflection routing algorithm, respectively. Under real application workloads,

the FTDR-H algorithm achieves a 20% lower hop count on average than the FoN

algorithm. For transient faults, the performance of the FTDR router can achieve graceful

degradation even at a high fault rate. We also implement the fault-tolerant deflection

router which can achieve 400 MHz in TSMC 65-nm technology.

HVL040

All-Digital Fast-Locking Pulsewidth-Control Circuit With

Programmable Duty Cycle

Abstract:

This paper proposes an all-digital fast-locking pulsewidth-control circuit with

programmable duty cycle. In comparison with prior state-of-the-art methods, our use of

two delay lines and a time-to-digital detector allows the pulsewidth-control circuit to

operate over a wide frequency range with fewer delay cells, while maintaining the same

level of accuracy. This paper presents a new duty-cycle setting circuit that calculates the

desired output duty cycle without the need for a look-up table. The circuit was fabricated

under the TSMC 0.18-μm CMOS process. Results show that the

proposed circuit performs well for an input operating frequency ranging from 200 to 600

MHz, and an input duty cycle ranging from 30% to 70%. It achieves a programmable

output duty cycle ranging from 31.25% to 68.75% in increments of 6.25%.
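
The programmable range quoted above can be enumerated exactly with a few lines of Python. The step of 6.25% equals 1/16, so the settings run from 5/16 to 11/16. Reading this as a 4-bit style setting is only an inference and is not stated in the abstract; the function name is hypothetical.

```python
from fractions import Fraction


def duty_cycle_settings(lo=Fraction(5, 16), hi=Fraction(11, 16), step=Fraction(1, 16)):
    """List every programmable duty cycle from lo to hi inclusive, as exact fractions."""
    settings = []
    duty = lo
    while duty <= hi:
        settings.append(duty)
        duty += step
    return settings


if __name__ == "__main__":
    for duty in duty_cycle_settings():
        print(f"{duty} = {float(duty * 100)}%")
```

Using `Fraction` avoids floating-point drift when accumulating the 6.25% step.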

HVL041

An Analytical Latency Model for Networks-on-Chip

Abstract:

We propose an analytical model based on queueing theory for delay analysis in a

wormhole-switched network-on-chip (NoC). The proposed model takes as input an

application communication graph, a topology graph, a mapping vector, and a routing

matrix, and estimates average packet latency and router blocking time. It works for

arbitrary network topology with deterministic routing under arbitrary traffic patterns.

This model can estimate per-flow average latency accurately and quickly, thus enabling

fast design space exploration of various design parameters in NoC designs. Experimental

results show that the proposed analytical model can predict the average packet latency

more than four orders of magnitude faster than an accurate simulation, while the

computation error is less than 10% in non-saturated networks for different system-on-

chip platforms.

HVL042

An Energy-Efficient L2 Cache Architecture Using Way Tag

Information Under Write-Through Policy

Abstract:

Many high-performance microprocessors employ cache write-through policy for

performance improvement and at the same time achieving good tolerance to soft errors in

on-chip caches. However, write-through policy also incurs large energy overhead due to

the increased accesses to caches at the lower level (e.g., L2 caches) during write

operations. In this paper, we propose a new cache architecture referred to as way-tagged

cache to improve the energy efficiency of write-through caches. By maintaining the way

tags of L2 cache in the L1 cache during read operations, the proposed technique enables

L2 cache to work in an equivalent direct-mapping manner during write hits, which

account for the majority of L2 cache accesses. This leads to significant energy reduction

without performance degradation. Simulation results on the SPEC CPU2000 benchmarks

demonstrate that the proposed technique achieves 65.4% energy savings in L2 caches on

average with only 0.02% area overhead and no performance degradation. Similar results

are also obtained under different L1 and L2 cache configurations. Furthermore, the idea

of way tagging can be applied to existing low-power cache design techniques to further

improve energy efficiency.
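
The way-tagging idea can be illustrated with a toy Python model that counts how many L2 ways are activated. It is a simplified sketch, not the paper's architecture: addresses are cache-line numbers, replacement is round-robin, each L1 copy is assumed to keep a still-valid L2 way tag captured on the read fill, and all class and function names are hypothetical. The counts say nothing about the energy figures reported in the abstract.

```python
class ToyL2:
    """Set-associative L2 that counts how many ways each access activates."""

    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.next_victim = [0] * num_sets
        self.ways_activated = 0

    def _index(self, line_addr):
        return line_addr % self.num_sets, line_addr // self.num_sets

    def read_fill(self, line_addr):
        """Read: probe all ways, allocate on a miss, return the way holding the line."""
        s, tag = self._index(line_addr)
        self.ways_activated += self.num_ways
        if tag in self.tags[s]:
            return self.tags[s].index(tag)
        way = self.next_victim[s]
        self.next_victim[s] = (way + 1) % self.num_ways   # round-robin replacement
        self.tags[s][way] = tag
        return way

    def write(self, line_addr, way_tag=None):
        """Write-through write: one way if a valid way tag is supplied, else all ways."""
        s, tag = self._index(line_addr)
        if way_tag is not None and self.tags[s][way_tag] == tag:
            self.ways_activated += 1                      # direct-mapped style access
        else:
            self.ways_activated += self.num_ways          # conventional probe


def count_activations(lines, writes_per_line, use_way_tags, num_sets=64, num_ways=8):
    l2 = ToyL2(num_sets, num_ways)
    l1_way_tags = {}                                      # way tag kept next to each L1 line
    for line in lines:
        l1_way_tags[line] = l2.read_fill(line)
        for _ in range(writes_per_line):
            l2.write(line, l1_way_tags[line] if use_way_tags else None)
    return l2.ways_activated


if __name__ == "__main__":
    print(count_activations(range(32), 4, use_way_tags=False))
    print(count_activations(range(32), 4, use_way_tags=True))
```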

HVL043

An On-Chip Network Fabric Supporting Coarse-Grained Processor

Array

Abstract:

Coarse-grained arrays (CGAs) with run-time reconfigurability play an important

role in accelerating reconfigurable computing applications. It is challenging to design on-

chip communication networks (OCNs) for such CGAs with dynamic run-time

reconfigurability whilst satisfying the tight budgets of power and area for an embedded

system. This paper presents a silicon-proven design of a 64-PE circuit-switched OCN

fabric with a dynamic path-setup scheme capable of supporting an embedded coarse-

grained processor array. A proof-of-concept test chip fabricated in a 0.13 μm CMOS

process occupies a silicon area of 23 mm² and consumes a peak power of 200 mW @

128 MHz and 1.2 Vcc, at room temperature. The OCN overhead consumes 9.4% of the

area and 18% of the power of the total chip. Experimental results and analysis show that

the proposed OCN fabric with its dynamic path-setup is suitable for use in an embedded

CGA supporting fast run-time reconfigurability.

HVL044

An Ultrasynchronization Checking Method With Trace-Driven

Simulation for Fast and Accurate MPSoC Virtual Platform Simulation

Abstract:

Efficiency and accuracy are two critical concerns in multiprocessor system on

chip (MPSoC) virtual platform simulation. Traditional simulation approaches that

achieve high efficiency usually sacrifice accuracy. On the other hand, the cycle-accurate

simulation algorithms are generally slow in speed. In this paper, we propose an

ultrasynchronization checking method with an efficient trace-driven simulation

mechanism that can not only improve simulation speed, but also maintain simulation

accuracy. We build a SystemC-based MPSoC virtual platform to evaluate the

effectiveness of our approach. The experimental results show that our proposed

simulation scheme can improve simulation speed up to 156× over the traditional

clock-step simulation method. Furthermore, our proposed method can also guarantee the

cycle-accurate simulation result.

HVL045

Application Space Exploration of a Heterogeneous Run-Time

Configurable Digital Signal Processor

Abstract:

This paper describes the application space exploration of a heterogeneous digital

signal processor with dynamic reconfiguration capabilities. The device is built around

three reconfigurable engines featuring different flavours and computation granularities

that make it suitable for a wide range of signal processing application domains such as

video coding, image processing, telecommunications, and cryptography. Performance of

signal processing applications is evaluated from measurements performed on a CMOS 90

nm prototype. In order to characterize the application space of the processor, performance

is compared with state-of-the-art devices, taking programmability, computational

capabilities, and energy efficiency as the main metrics. The device exploits performance

and energy efficiency significantly more than general purpose processors, while still

maintaining a user-friendly programming approach that mainly relies on software-

oriented languages. The device is able to achieve 1.2 to 15 GOPS with an energy

efficiency from 2 to 50 GOPS/W when running the selected applications.

HVL046

Architecture for Real-Time Nonparametric Probability Density

Function Estimation

Abstract:

Adaptive systems are increasing in importance across a range of application

domains. They rely on the ability to respond to environmental conditions, and hence real-

time monitoring of statistics is a key enabler for such systems. Probability density

function (PDF) estimation has been applied in numerous domains; computational

limitations, however, have meant that proxies are often used. Parametric estimators

attempt to approximate PDFs based on fitting data to an expected underlying distribution,

but this is not always ideal. The density function can be estimated by rescaling a

histogram of sampled data, but this requires many samples for a smooth curve. Kernel-

based density estimation can provide a smoother curve from fewer data samples. We

present a general architecture for nonparametric PDF estimation, using both histogram-

based and kernel-based methods, which is designed for integration into streaming

applications on field-programmable gate arrays (FPGAs). The architecture employs

heterogeneous resources available on modern FPGAs within a highly parallelized and

pipelined design, and is able to perform real-time computation on sampled data at speeds

of over 250 million samples per second, while extracting a variety of statistical

properties.
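
The two nonparametric estimators contrasted above can be sketched in plain Python. This is a software illustration only, not the paper's FPGA architecture. The Gaussian kernel, the bandwidth value and the function names are assumptions, since the abstract does not specify them.

```python
import math


def histogram_pdf(samples, lo, hi, bins):
    """Estimate the PDF by rescaling a histogram so that its area is 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        if lo <= x < hi:
            counts[min(bins - 1, int((x - lo) / width))] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]


def gaussian_kde_pdf(samples, points, bandwidth):
    """Estimate the PDF at each point as an average of Gaussian kernels (assumed kernel)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for x in samples)
        for p in points
    ]


if __name__ == "__main__":
    data = [0.1, 0.2, 0.2, 0.3, 0.7]
    print(histogram_pdf(data, 0.0, 1.0, 4))
    print(gaussian_kde_pdf(data, [0.0, 0.25, 0.5, 0.75, 1.0], bandwidth=0.1))
```

With only five samples the histogram is blocky, while the kernel estimate varies smoothly between the evaluation points, which is the trade-off the abstract describes.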

HVL047

Built-In Generation of Functional Broadside Tests Using a Fixed

Hardware Structure

Abstract:

Functional broadside tests are two-pattern scan-based tests that avoid overtesting

by ensuring that a circuit traverses only reachable states during the functional clock

cycles of a test. In addition, the power dissipation during the fast functional clock cycles

of functional broadside tests does not exceed that possible during functional operation.

On-chip test generation has the added advantage that it reduces test data volume and

facilitates at-speed test application. This paper shows that on-chip generation of

functional broadside tests can be done using a simple and fixed hardware structure, with a

small number of parameters that need to be tailored to a given circuit, and can achieve

high transition fault coverage for testable circuits. With the proposed on-chip test

generation method, the circuit is used for generating reachable states during test

application. This alleviates the need to compute reachable states offline.

HVL048

A Built-In Self-Repair Scheme for 3-D RAMs With Interdie

Redundancy

Abstract:

3-D integration using through silicon via is an emerging technology for integrated

circuit designs. Random access memory (RAM) is one good candidate for the application

of 3-D integration technology. However, yield will be a key challenge for the volume

production of 3-D RAMs. In this paper, we present yield-enhancement techniques for 3-

D RAMs. An interdie redundancy scheme is proposed to improve the yield of 3-D

RAMs. Three stacking flows with respect to different bonding technologies for 3-D

RAMs with interdie redundancy are proposed as well. Finally, a built-in self-repair

(BISR) scheme is proposed to perform the repair of 3-D RAMs with interdie

redundancies. The BISR circuits in two stacked dies can work together to allocate

interdie redundancies. Simulation results show that the proposed yield-enhancement

techniques can effectively improve the yield of 3-D RAMs.

HVL049

Checkpointing for Virtual Platforms and SystemC-TLM

Abstract:

Integrating simulation models created using different simulation systems is a

common problem when constructing virtual platforms. Different companies and different

departments can create models, and virtual platforms for different purposes using

different tools. There are also existing models that need to be integrated into new tools, or

the other way around. The simulators can be quite different in details, even in the case of

transaction-level models. We present work in integrating SystemC transaction-level

models into two typical full-system simulation environments, QEMU and Simics. We

present issues in reconciling the semantics of the different platforms, and our proposed

solutions. In the Simics integration, we additionally enable checkpointing in the models,

based on the Simics checkpoint mechanism.

HVL050

Circuit-Level Timing Error Tolerance for Low-Power DSP Filters and

Transforms

Abstract:

In this paper, we present a novel circuit-level timing error mitigation technique,

which aims to increase energy-efficiency of digital signal processing datapaths without

loss of robustness. Timing errors are detected using razor flip-flops on critical-paths, and

the error-rate feedback is used to control a dynamic voltage scaling control loop. In place

of conventional razor error correction by replay, we propose a new approach to bound the

magnitude of intermittent timing errors at the circuit level. A timing guard-band is

created by shaping the path delay distribution such that the critical paths correspond to a

group of least-significant bit registers. These end-points are ensured to be critical by

modifying the topology of the final stage carry-merge adder, and by using tool-based

device sizing. Hence, timing violations lead to weakly correlated logical errors of small

magnitude in a mean-squared-error sense. We examine this approach in a finite-impulse

response (FIR) filter and a 2-D discrete cosine transform implementation, in 32-nm

CMOS. Power saving compared to a conventional design at iso-frequency is 21%-23% at

the typical corner, while retaining a voltage guard-band to protect against fast transient

changes in switching activity and supply noise. The impact on minimum clock period is

small (16%-20%), as it does not necessitate the use of ripple-carry adders and also

requires only a bare minimum of additional design effort.
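
To illustrate why confining timing errors to least-significant bits keeps the mean-squared error small, the following minimal Python sketch compares single-bit flips restricted to the four LSBs of a 16-bit word with flips at any bit position. The word width, the number of LSBs, and the helper names `flip_bit` and `mean_squared_error` are illustrative assumptions and are not taken from the paper.

```python
import random

def flip_bit(value: int, bit: int) -> int:
    """Invert one bit of value, modelling a single register bit that latched too early."""
    return value ^ (1 << bit)

def mean_squared_error(samples, bit_positions, rng):
    """MSE when every sample suffers one bit flip drawn from bit_positions."""
    total = 0
    for sample in samples:
        corrupted = flip_bit(sample, rng.choice(bit_positions))
        total += (corrupted - sample) ** 2
    return total / len(samples)

rng = random.Random(0)  # fixed seed so the comparison is repeatable
samples = [rng.randrange(1 << 16) for _ in range(1000)]

# Assumption: errors confined to the 4 LSBs versus errors anywhere in a 16-bit word.
mse_lsb = mean_squared_error(samples, range(0, 4), rng)
mse_any = mean_squared_error(samples, range(0, 16), rng)
```

A flip in bit i changes the value by exactly 2^i, so errors limited to bits 0 to 3 can never contribute more than 64 to the squared error, whereas a flip in the top bit contributes 2^30.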

HVL051

Combined Architecture/Algorithm Approach to Fast FPGA Routing

Abstract:

We propose a new field-programmable gate array (FPGA) routing approach,

which, when combined with a low-cost architecture change, results in a 40% reduction in

router runtime, at the cost of a 6% area overhead and with no increase in critical path

delay. Our approach begins with PathFinder-style routing, which we run on a coarsened

representation of the routing architecture. This leads to fast generation of a partial routing

solution where the signals are assigned to groups of wire segments rather than individual

wire segments. A Boolean satisfiability (SAT)-based stage follows, generating a legal

routing solution from the partial solution. We explore approximately 165 000 FPGA

switch block architectures, showing that the choice of the architecture has a significant

impact on the complexity of the SAT formulation, and by extension, on routing runtime.

Our approach points to a new research direction, namely, reducing FPGA computer-aided

design runtime by exploring FPGA architectures and algorithms together.

HVL052

CORDIC Designs for Fixed Angle of Rotation

Abstract:

Rotation of vectors through fixed and known angles has wide applications in

robotics, digital signal processing, graphics, games, and animation. However, we do not find

any optimized coordinate rotation digital computer (CORDIC) design for vector-rotation

through specific angles. Therefore, in this paper, we present optimization schemes and

CORDIC circuits for fixed and known rotations with different levels of accuracy. For

reducing the area- and time-complexities, we have proposed a hardwired pre-shifting

scheme in barrel-shifters of the proposed circuits. Two dedicated CORDIC cells are

proposed for the fixed-angle rotations. In one of those cells, micro-rotations and scaling

are interleaved, and in the other they are implemented in two separate stages. Pipelined

schemes are suggested further for cascading dedicated single-rotation units and bi-

rotation CORDIC units for high-throughput and reduced latency implementations. We

have obtained the optimized set of micro-rotations for fixed and known angles. The

optimized scale-factors are also derived and dedicated shift-add circuits are designed to

implement the scaling. The fixed-point mean-squared-error of the proposed CORDIC

circuit is analyzed statistically, and strategies for reducing the error are given. We have

synthesized the proposed CORDIC cells by Synopsys Design Compiler using TSMC 90-

nm library, and shown that the proposed designs offer higher throughput, less latency and

less area-delay product than the reference CORDIC design for fixed and known angles of

rotation. We find similar results of synthesis for different Xilinx field-programmable

gate-array platforms.
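
The idea of rotating through a fixed, known angle can be sketched in Python as follows. Because the angle is known in advance, the direction of every micro-rotation can be precomputed once, which is roughly what allows hardware to hardwire the shifts. This sketch uses the conventional CORDIC micro-rotation sequence and floating-point arithmetic; it does not reproduce the optimized micro-rotation sets, scale-factor circuits, or fixed-point behaviour of the paper, and the function names are assumptions.

```python
import math

def micro_rotation_directions(angle_rad: float, n_iter: int = 16):
    """Precompute the +1/-1 direction of each micro-rotation for a fixed angle."""
    directions = []
    residual = angle_rad
    for i in range(n_iter):
        d = 1 if residual >= 0 else -1
        directions.append(d)
        residual -= d * math.atan(2.0 ** -i)
    return directions

def cordic_gain(n_iter: int) -> float:
    """Constant magnitude growth introduced by n_iter micro-rotations."""
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

def rotate_fixed_angle(x: float, y: float, directions):
    """Rotate (x, y) with shift-and-add style micro-rotations, then undo the gain."""
    for i, d in enumerate(directions):
        # Multiplying by 2**-i stands in for a hardwired right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    gain = cordic_gain(len(directions))
    return x / gain, y / gain

# Assumed example angle: 30 degrees, well inside the CORDIC convergence range.
DIRECTIONS_30_DEG = micro_rotation_directions(math.pi / 6)
x_rot, y_rot = rotate_fixed_angle(1.0, 0.0, DIRECTIONS_30_DEG)
```

With 16 iterations the residual angle is bounded by atan(2^-15), so the rotated vector agrees with (cos 30°, sin 30°) to about four decimal places.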

HVL053

Design and Implementation of an On-Chip Permutation Network for

Multiprocessor System-On-Chip

Abstract:

This paper presents the silicon-proven design of a novel on-chip network to

support guaranteed traffic permutation in multiprocessor system-on-chip applications.

The proposed network employs a pipelined circuit-switching approach combined with a

dynamic path-setup scheme under a multistage network topology. The dynamic path-

setup scheme enables runtime path arrangement for arbitrary traffic permutations. The

circuit-switching approach offers a guarantee of permuted data and its compact overhead

enables the benefit of stacking multiple networks. A 0.13-μm CMOS test-chip validates

the feasibility and efficiency of the proposed design. Experimental results show that the

proposed on-chip network achieves 1.9× to 8.2× reduction of silicon overhead compared

to other design approaches.

HVL054

Design of Digit-Serial FIR Filters: Algorithms, Architectures, and a

CAD Tool

Abstract:

In the last two decades, many efficient algorithms and architectures have been

introduced for the design of low-complexity bit-parallel multiple constant multiplications

(MCM) operation, which dominates the complexity of many digital signal processing

systems. On the other hand, little attention has been given to the digit-serial MCM design

that offers alternative low-complexity MCM operations albeit at the cost of an increased

delay. In this paper, we address the problem of optimizing the gate-level area in digit-

serial MCM designs and introduce high-level synthesis algorithms, design architectures,

and a computer-aided design tool. Experimental results show the efficiency of the

proposed optimization algorithms and of the digit-serial MCM architectures in the design

of digit-serial MCM operations and finite impulse response filters.
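
As background for the MCM operation, a constant multiplication can be realized without a multiplier by decomposing the constant into signed powers of two and using only shifts, additions, and subtractions. The Python sketch below does this for a single constant using the canonical signed digit (CSD) form. It is only an illustration of the shift-add building block: it does not share partial products across several constants and does not model digit-serial operation, and the names `csd_terms` and `shift_add_multiply` are assumptions.

```python
def csd_terms(constant: int):
    """Return [(shift, sign), ...] so that constant == sum(sign * 2**shift).

    Uses the canonical signed digit form of a non-negative constant,
    in which no two adjacent digits are non-zero.
    """
    terms = []
    shift = 0
    while constant != 0:
        if constant & 1:
            digit = 2 - (constant & 3)  # +1 when constant % 4 == 1, -1 when % 4 == 3
            terms.append((shift, digit))
            constant -= digit
        constant >>= 1
        shift += 1
    return terms

def shift_add_multiply(x: int, constant: int) -> int:
    """Multiply x by constant using only shifts, additions, and subtractions."""
    return sum(sign * (x << shift) for shift, sign in csd_terms(constant))

# Example: 7 = 8 - 1 needs one subtractor instead of two adders for 4 + 2 + 1.
terms_for_7 = csd_terms(7)
```

The number of adders or subtractors needed for one constant is the number of terms minus one, which is the quantity that MCM algorithms try to reduce further by reusing intermediate results.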

HVL055

Efficient Implementation of Reconfigurable Warped Digital Filters

With Variable Low-Pass, High-Pass, Bandpass, and Bandstop Responses

Abstract:

In this brief, an efficient implementation of a reconfigurable warped digital filter

with variable low-pass, high-pass, bandpass, and bandstop responses is presented. The

warped filters, obtained by replacing each unit delay of a digital filter with an all-pass

filter, are widely used for various audio processing applications. However, warped filters

require first-order all-pass transformation to obtain variable low-pass or high-pass

responses, and second-order all-pass transformation to obtain variable bandpass or

bandstop responses. To overcome this drawback, the proposed method combines the

warped filters with the coefficient decimation technique. The proposed architecture

provides variable low-pass or high-pass responses with fine control over cut-off

frequency and variable bandwidth bandpass or bandstop responses at an arbitrary center

frequency without updating the filter coefficients or filter structure. The design example

shows that the proposed variable digital filter is simple to design and offers substantial

savings in gate counts and power consumption over other approaches.

HVL056

Error Detection in Majority Logic Decoding of Euclidean Geometry

Low Density Parity Check (EG-LDPC) Codes

Abstract:

In a recent paper, a method was proposed to accelerate the majority logic

decoding of difference set low density parity check codes. This is useful as majority logic

decoding can be implemented serially with simple hardware but requires a large decoding

time. For memory applications, this increases the memory access time. The method

detects whether a word has errors in the first iterations of majority logic decoding, and

when there are no errors the decoding ends without completing the rest of the iterations.

Since most words in a memory will be error-free, the average decoding time is greatly

reduced. In this brief, we study the application of a similar technique to a class of

Euclidean geometry low density parity check (EG-LDPC) codes that are one step

majority logic decodable. The results obtained show that the method is also effective for

EG-LDPC codes. Extensive simulation results are given to accurately estimate the

probability of error detection for different code sizes and numbers of errors.

HVL057

Exploration and Optimization of 3-D Integrated DRAM Subsystems

Abstract:

Energy efficiency is the major optimization criterion for systems-on-chip (SoCs)

for mobile devices (smartphones and tablets). Through-silicon via (TSV) technology

enables 3-D integration of dies and the heterogeneous stacking of multiple memory or

logic layers, allowing increased bandwidth and lower energy consumption of the memory

interface compared to traditional approaches. In this paper, we explore the 3-D-DRAM

architecture design space. The result is an optimized 2 Gb 3-D-DRAM, which shows an

83% lower energy/bit than a 2 Gb device. Furthermore, we propose a highly energy-

efficient DRAM subsystem for next-generation 3-D-integrated SoCs, consisting of an

SDR/DDR 3-D-DRAM controller and an attached 3-D-DRAM cube with fine-grained

access and a flexible (WIDE-IO) interface. We assess the energy efficiency using a

synthesizable model of the SDR/DDR 3-D-DRAM channel controller (CC) as well as

functional models of the 3-D-stacked DRAM, including an accurate power estimation

engine. We also investigate different DRAM families (WIDE IO SDR/DDR, LPDDR,

and LPDDR2) and densities from 256 Mb to 4 Gb per channel. The implementation

results of the proposed 3-D-DRAM subsystem show that energy optimized accesses to

the 3-D-DRAM enable up to 50% energy savings compared to standard accesses. To the

best of our knowledge this is the first design space exploration for 3-D-stacked DRAM

considering different technologies based on real-world physical data and the first design

of a 3-D-DRAM CC and 3-D-DRAM model featuring co-optimization of memory and

controller architecture.

HVL058

Glitch-Free NAND-Based Digitally Controlled Delay-Lines

Abstract:

The recently proposed NAND-based digitally controlled delay-lines (DCDL)

present a glitching problem that may limit their use in many applications. This

paper presents a glitch-free NAND-based DCDL that overcomes this limitation,

opening up the use of NAND-based DCDLs in a wide range of applications. The

proposed NAND-based DCDL maintains the same resolution and minimum delay as the

previously proposed NAND-based DCDL. A theoretical demonstration of the glitch-

free operation of the proposed DCDL is also derived in the paper. Following this analysis,

three driving circuits for the delay control-bits are also proposed. The proposed DCDLs have

been designed in a 90-nm CMOS technology and compared, in this technology, to the

state-of-the-art. Simulation results show that the novel circuits achieve the lowest resolution,

with a slight worsening of the minimum delay with respect to the previously proposed

DCDL with the lowest delay. Simulations also confirm the correctness of the developed

glitching model and sizing strategy. As an example application, the proposed DCDL is used to

realize an all-digital spread-spectrum clock generator (SSCG). The use of the proposed

DCDL in this circuit reduces the peak-to-peak absolute output jitter by more than

40% with respect to an SSCG using three-state inverter based DCDLs.

HVL059

Improved Trace Buffer Observation via Selective Data Capture Using

2-D Compaction for Post-Silicon Debug

Abstract:

This paper presents a novel technique for extending the capacity of trace buffers

when capturing debug data during post-silicon debug. It exploits the fact that it is not

necessary to capture error-free data in the trace buffer since that information can be

obtained from simulation. A selective data capture method is proposed in this paper that

only captures debug data during clock cycles in which errors are present. The proposed

debug method requires only three debug sessions. The first session estimates a rough

error rate, the second session identifies a set of suspect clock cycles where errors may be

present, and the third session captures the suspect clock cycles in the trace buffer. The

suspect clock cycles are determined through a 2-D compaction technique using multiple-

input signature register signatures and cycling register signatures. Intersecting both

signatures generates a small number of suspect clock cycles that the trace buffer

needs to capture. The effective observation window of the trace buffer can be expanded

significantly, by up to orders of magnitude. Experimental results indicate that very significant

increases in the effective observation window for a trace buffer can be obtained.

HVL060

IsoNet: Hardware-Based Job Queue Management for Many-Core

Architectures

Abstract:

Imbalanced distribution of workloads across a chip multiprocessor (CMP)

constitutes wasteful use of resources. Most existing load distribution and balancing

techniques employ very limited hardware support and rely predominantly on software for

their operation. This paper introduces IsoNet, a hardware-based conflict-free dynamic

load distribution and balancing engine. IsoNet is a lightweight job queue manager

responsible for administering the list of jobs to be executed, and maintaining load balance

among all CMP cores. By exploiting a micro-network of load-balancing modules, the

proposed mechanism is shown to effectively reinforce concurrent computation in many-

core environments. Detailed evaluation using a full-system simulation framework

indicates that IsoNet significantly outperforms existing techniques and scales efficiently

to as many as 1024 cores. Furthermore, to assess its feasibility, the IsoNet design is

synthesized, placed, and routed in 45-nm VLSI technology. Analysis of the resulting low-

level implementation shows that IsoNet's area and power overhead are almost negligible.

HVL061

Joint Decoding of LDPC Code and Phase Factors for OFDM Systems

With PTS PAPR Reduction

Abstract:

In this paper, we investigate a low-density parity-check (LDPC)-coded orthogonal

frequency-division multiplexing (OFDM) system with a peak-to-average power ratio

(PAPR) reduction using the partial transmit sequence (PTS), which does not transmit

PTS side information about the phase factors. We view the PTS processing as a stage of

coding and call the resulting code of LDPC coding and PTS processing a concatenated

LDPC-PTS code. Then, we derive the parity-check matrix of the concatenated LDPC-

PTS code. With the parity-check matrix, the LDPC code and phase factors can be jointly

decoded using belief propagation algorithms. Neither transmission of PTS side

information (phase factors) nor phase factor estimation before decoding is required by the

proposed scheme. Simulation results show that the proposed joint decoding provides

nearly perfect phase factor recovery and LDPC decoding for a small number of PTS

partitions.

HVL062

Latch-Based Performance Optimization for Field-Programmable Gate

Arrays

Abstract:

We explore using pulsed latches for timing optimization -- a first in the FPGA

community. Pulsed latches are transparent latches driven by a clock with a non-standard

(non-50%) duty cycle. We exploit existing functionality within commercial FPGA chips

to implement latch-based optimizations that do not have the power or area drawbacks

associated with other timing optimization approaches, such as clock skew and retiming.

We propose an algorithm that iteratively replaces certain flip-flops in a logic design with

latches for an improvement in circuit speed. Results show that much of the performance

improvement achieved by using multiple skewed clocks can also be achieved using a

single clock and latches. We also consider the impact of short delay paths (i.e. minimum

delays), which can cause hold-time violations. Under conservative minimum delay

assumptions, our latch-based optimization, operating on the routed design, provides a 5%

performance improvement, on average, essentially for "free" (i.e., without any

re-routing/delay padding). We show that short paths greatly hinder the ability to use

latches for speed improvement, motivating further work to reduce their effects.

HVL063

MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM

Systems

Abstract:

This paper presents a multipath delay commutator (MDC)-based architecture

and memory scheduling to implement fast Fourier transform (FFT) processors for

multiple input multiple output-orthogonal frequency division multiplexing (MIMO-

OFDM) systems with variable length. Based on the MDC architecture, we propose to use

radix-$N_{s}$ butterflies at each stage, where $N_{s}$ is the number of data streams, so

that there is only one butterfly needed in each stage. Consequently, a 100% utilization

rate in computational elements is achieved. Moreover, thanks to the simple control

mechanism of the MDC, we propose simple memory scheduling methods for input data

and output bit/set-reversing, which again results in a full utilization rate in memory

usage. Since the memory requirements usually dominate the die area of FFT/inverse fast

Fourier transform (IFFT) processors, the proposed scheme can effectively reduce the

memory size and thus the die area as well. Furthermore, to apply the proposed scheme in

practical applications, we let $N_{s}=4$ and implement a 4-stream FFT/IFFT processor

with variable length including 2048, 1024, 512, and 128 for MIMO-OFDM systems. This

processor can be used in IEEE 802.16 WiMAX and 3GPP long term evolution

applications. The processor was implemented with a UMC 90-nm CMOS technology

with a core area of 3.1 $\mathrm{mm}^{2}$. The power consumption at 40 MHz was

63.72/62.92/57.51/51.69 mW for 2048/1024/512/128-FFT, respectively, in the post-layout

simulation. Finally, we analyze the complexity and performance of the implemented

processor and compare it with other processors. The results show advantages of the

proposed scheme in terms of area and power consumption.

HVL064

Mining Hardware Assertions With Guidance From Static Analysis

Abstract:

We present GoldMine, a methodology for generating assertions automatically in

hardware. Our method involves a combination of data mining and static analysis of the

register transfer level (RTL) design. The RTL design is first simulated to generate data

about the design's dynamic behavior. The generated data is then mined for “candidate

assertions” that are likely to be invariants. The data mining algorithm is a decision-tree-

based supervised learning algorithm. These candidate assertions are then passed through

a formal verification engine to filter out the spurious candidates. The assertions that are

attested as true by the formal engine are system invariants. These are then evaluated by a

process of designer ranking that is provided as feedback to the data mining engine. We

demonstrate the scalability of GoldMine by showing assertion generation for the RTL of

Sun's OpenSparc T2 many-threaded processor. Our results show that GoldMine can

generate complex, high coverage assertions for sequential as well as combinational

designs in RTL, thereby minimizing human effort in this process. GoldMine assertions

distill the random input stimulus space and can be used for calibrating directed tests.

They can be used in a regression test suite of an evolving RTL. They are also useful in

providing differing perspectives from the designer, as well as hints to designers for

manually writing assertions.
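
The simulate, mine, and formally filter loop described above can be sketched on a toy example. In the Python sketch below, the design (`arbiter`), the brute-force miner, and the exhaustive "formal" check are all hypothetical stand-ins: GoldMine itself uses a decision-tree-based learner and a real formal verification engine on RTL, neither of which is modelled here.

```python
from itertools import product

def arbiter(req0: int, req1: int) -> int:
    """Hypothetical toy combinational design: gnt0 simply follows req0."""
    return int(req0 == 1)

def simulate(stimuli):
    """Run the design on the given input vectors and record a trace."""
    return [{"req0": a, "req1": b, "gnt0": arbiter(a, b)} for a, b in stimuli]

def mine_candidates(trace):
    """Propose 'input == value implies gnt0 == value' rules that hold on the trace."""
    candidates = set()
    for name in ("req0", "req1"):
        for value in (0, 1):
            outputs = {row["gnt0"] for row in trace if row[name] == value}
            if len(outputs) == 1:  # the trace never contradicts this rule
                candidates.add((name, value, outputs.pop()))
    return candidates

def formally_true(candidate) -> bool:
    """Stand-in for a formal engine: check the rule on every possible input."""
    name, value, expected = candidate
    for req0, req1 in product((0, 1), repeat=2):
        inputs = {"req0": req0, "req1": req1}
        if inputs[name] == value and arbiter(req0, req1) != expected:
            return False
    return True

# A deliberately thin simulation trace, so that some mined rules are spurious.
trace = simulate([(1, 1), (0, 0)])
candidates = mine_candidates(trace)
invariants = {c for c in candidates if formally_true(c)}
```

With only two simulated vectors the miner proposes four rules, including two about `req1` that merely reflect the limited stimulus; the exhaustive check discards those and keeps the two rules about `req0`.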

HVL065

NCTU-GR 2.0: Multithreaded Collision-Aware Global Routing with

Bounded-Length Maze Routing

Abstract:

Modern global routers employ various routing methods to improve routing speed

and quality. Maze routing is the most time-consuming process for existing global routing

algorithms. This paper presents two bounded-length maze routing (BLMR) algorithms

(optimal-BLMR and heuristic-BLMR) that perform much faster routing than traditional

maze routing algorithms. In addition, a rectilinear Steiner minimum tree aware routing

scheme is proposed to guide heuristic-BLMR and monotonic routing to build a routing

tree with shorter wirelength. This paper also proposes a parallel multithreaded collision-

aware global router based on a previous sequential global router (SGR). Unlike the

partitioning-based strategy, the proposed parallel router uses a task-based concurrency

strategy. Finally, a 3-D wirelength optimization technique is proposed to further refine

the 3-D routing results. Experimental results reveal that the proposed SGR uses less

wirelength and runs faster than most other state-of-the-art global routers with a

different set of parameters. Compared to the proposed SGR, the proposed parallel

router yields almost the same routing quality with average 2.71-fold and 3.12-fold speedups on

overflow-free and hard-to-route cases, respectively, when running on a 4-core system.
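
The notion of maze routing under a length bound can be sketched as a shortest-path search in which each search state also records the number of steps taken, so that paths exceeding the bound are never expanded. The Python sketch below is a generic illustration on a small grid with per-cell congestion costs. It is neither the optimal-BLMR nor the heuristic-BLMR algorithm of the paper, and the function name and cost model are assumptions.

```python
import heapq

def bounded_length_route(cost, src, dst, max_len):
    """Cheapest path cost from src to dst using at most max_len grid steps.

    cost[r][c] is the non-negative congestion cost of occupying cell (r, c).
    Returns (total_cost, steps), or None when no path fits the length bound.
    """
    rows, cols = len(cost), len(cost[0])
    start_cost = cost[src[0]][src[1]]
    best = {(src, 0): start_cost}  # (cell, steps) -> cheapest cost found so far
    heap = [(start_cost, 0, src)]
    while heap:
        total, steps, cell = heapq.heappop(heap)
        if cell == dst:
            return total, steps
        if total > best.get((cell, steps), float("inf")) or steps == max_len:
            continue  # stale heap entry, or the length bound is exhausted
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_total = total + cost[nr][nc]
                key = ((nr, nc), steps + 1)
                if new_total < best.get(key, float("inf")):
                    best[key] = new_total
                    heapq.heappush(heap, (new_total, steps + 1, (nr, nc)))
    return None

# Assumed example: a congested middle column forces a detour unless the bound forbids it.
grid = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
loose = bounded_length_route(grid, (0, 0), (0, 2), max_len=6)
tight = bounded_length_route(grid, (0, 0), (0, 2), max_len=2)
```

With a bound of six steps the search takes the cheap detour around the congested column; with a bound of two it is forced straight through the congestion at a higher cost.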

HVL066

Novel MIMO Detection Algorithm for High-Order Constellations in the

Complex Domain

Abstract:

A novel detection algorithm with an efficient VLSI architecture featuring efficient

operation over infinite complex lattices is proposed. The proposed design results in the

highest throughput, the lowest latency, and the lowest energy compared to the complex-

domain VLSI implementations to date. The main innovations are a novel complex-

domain means of expanding/visiting the intermediate nodes of the search tree on demand,

rather than exhaustively, as well as a new distributed sorting scheme to keep track of the

best candidates at each search phase. Its support of unbounded infinite lattice decoding

distinguishes the present method from previous K-Best strategies and also allows its

complexity to scale sublinearly with the modulation order. Since the expansion and

sorting cores are data-driven, the architecture is well suited for a pipelined parallel VLSI

implementation. The proposed algorithm is used to fabricate a 4×4, 64-QAM complex

multiple-input-multiple-output detector in a 0.13-μm CMOS technology, achieving a

clock rate of 417 MHz with the core area of 340 kgates. The chip test results prove that

the fabricated design can sustain a throughput of 1 Gb/s with energy efficiency of 110

pJ/bit, the best numbers reported to date.

HVL067

On the Fixed-Point Accuracy Analysis and Optimization of Polynomial

Specifications

Abstract:

Fixed-point accuracy analysis and optimization of polynomial data-flow graphs

with respect to a reference model is a challenging task in many digital signal processing

applications. Range and precision analysis are two important steps of this process to

assign suitable integer and fractional bit-widths to the fixed-point variables and constant

coefficients in a design such that no overflow occurs and a given error bound on

maximum mismatch (MM) or mean-square-error (MSE) and signal-to-quantization-noise

ratio (SQNR) is satisfied. This paper explores efficient optimization algorithms based on

robust analyses of MM and MSE/SQNR for fixed-point polynomial data-flow graphs.

Experimental results illustrate the robustness of our analyses and the efficiency of the

optimization algorithms compared to previous work.
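
The error measures named above can be made concrete with a small numerical experiment. The Python sketch below evaluates an assumed example polynomial in Horner form with every intermediate value rounded to a chosen number of fractional bits, and reports the maximum mismatch (MM), the mean-square-error (MSE), and the SQNR against a floating-point reference. The polynomial, the input range [-1, 1], and the sampling-based estimate are illustrative assumptions; the paper's analyses are not reproduced here, and range (overflow) analysis is not modelled.

```python
import math

def quantize(value: float, frac_bits: int) -> float:
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def poly_reference(x: float) -> float:
    """Floating-point reference model: 0.5*x**2 - 0.25*x + 0.125 in Horner form."""
    return (0.5 * x - 0.25) * x + 0.125

def poly_fixed(x: float, frac_bits: int) -> float:
    """Same data-flow graph with every node rounded to frac_bits fractional bits."""
    xq = quantize(x, frac_bits)
    acc = quantize(0.5 * xq, frac_bits)
    acc = quantize(acc - 0.25, frac_bits)
    acc = quantize(acc * xq, frac_bits)
    return quantize(acc + 0.125, frac_bits)

def error_metrics(frac_bits: int, n_points: int = 1001):
    """Estimate (MM, MSE, SQNR in dB) by sampling x uniformly over [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    errors = [poly_fixed(x, frac_bits) - poly_reference(x) for x in xs]
    max_mismatch = max(abs(e) for e in errors)
    mse = sum(e * e for e in errors) / n_points
    signal_power = sum(poly_reference(x) ** 2 for x in xs) / n_points
    sqnr_db = 10.0 * math.log10(signal_power / mse)
    return max_mismatch, mse, sqnr_db

mm8, mse8, sqnr8 = error_metrics(8)
mm12, mse12, sqnr12 = error_metrics(12)
```

Comparing the 8-bit and 12-bit results shows the trade-off that bit-width optimization works with: each additional fractional bit costs hardware but lowers MM and raises SQNR.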

HVL068

Pipelined Radix-$2^{k}$ Feedforward FFT Architectures

Abstract:

The appearance of radix-$2^{2}$ was a milestone in the design of pipelined FFT

hardware architectures. Later, radix-$2^{2}$ was extended to radix-$2^{k}$. However, radix-$2^{k}$

was only proposed for single-path delay feedback (SDF) architectures, but not for

feedforward ones, also called multi-path delay commutator (MDC). This paper presents

the radix-$2^{k}$ feedforward (MDC) FFT architectures. In feedforward architectures radix-$2^{k}$

can be used for any number of parallel samples which is a power of two. Furthermore,

both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be

used. In addition to this, the designs can achieve very high throughputs, which makes

them suitable for the most demanding applications. Indeed, the proposed radix-$2^{k}$

feedforward architectures require fewer hardware resources than parallel feedback ones,

also called multi-path delay feedback (MDF), when several samples in parallel must be

processed. As a result, the proposed radix-$2^{k}$ feedforward architectures not only offer an

attractive solution for current applications, but also open up a new research line on

feedforward structures.
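
As background for the decimation in frequency (DIF) decomposition mentioned above, the Python sketch below implements a plain recursive radix-2 DIF FFT and a direct DFT for comparison. It is a software illustration of the butterfly structure only: it does not model the radix-2^k twiddle-factor arrangement, the pipelining, or the parallel data paths of the proposed architectures, and the function names are assumptions.

```python
import cmath

def fft_dif(samples):
    """Radix-2 decimation-in-frequency FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    half = n // 2
    # DIF butterflies: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    sums = [samples[i] + samples[i + half] for i in range(half)]
    diffs = [
        (samples[i] - samples[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
        for i in range(half)
    ]
    out = [0j] * n
    out[0::2] = fft_dif(sums)
    out[1::2] = fft_dif(diffs)
    return out

def dft(samples):
    """Direct O(n^2) DFT used only as a reference for checking."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

signal = [1, 2, 3, 4, 0, 0, 0, 0]
spectrum = fft_dif(signal)
```

Each recursion level corresponds to one butterfly stage, which is why an N-point radix-2 transform needs log2(N) stages.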

HVL069

Pragmatic Integration of an SRAM Row Cache in Heterogeneous 3-D

DRAM Architecture Using TSV

Abstract:

As scaling DRAM cells becomes more challenging and energy-efficient DRAM

chips are in high demand, the DRAM industry has started to undertake an alternative

approach to address these looming issues, namely to vertically stack DRAM dies with

through-silicon-vias (TSVs) using 3-D-IC technology. Furthermore, this emerging

integration technology also makes heterogeneous die stacking in one DRAM package

possible. Such a heterogeneous DRAM chip provides a unique, promising opportunity for

computer architects to contemplate a new memory hierarchy for future system design. In

this paper, we study how to design such a heterogeneous DRAM chip for improving both

performance and energy efficiency. In particular, we found that, if we want to design an

SRAM row cache in a DRAM chip, simple stacking alone cannot address the majority of

traditional SRAM row cache design issues. In this paper, to address these issues, we

propose a novel floorplan and several architectural techniques that fully exploit the

benefits of 3-D stacking technology. Our multi-core simulation results with memory-

intensive applications suggest that, by tightly integrating a small row cache with its

corresponding DRAM array, we can improve performance by 30% while saving dynamic

energy by 31%.

HVL070

Reconfigurable Adaptive Singular Value Decomposition Engine Design

for High-Throughput MIMO-OFDM Systems

Abstract:

Singular value decomposition (SVD) is an optimal method to obtain spatial

multiplexing gain in multi-input multi-output (MIMO) channels. However, the high cost

of implementation and high decomposing latency of the SVD restricts its usage in current

wireless communication applications. In this paper, we present a complete adaptive SVD

algorithm and a reconfigurable architecture for high-throughput MIMO-orthogonal

frequency division multiplexing systems. There are several proposed architectural design

techniques: reconfigurable scheme, division-free adaptive step size scheme, early

termination scheme, and data interleaving scheme. The reconfigurable scheme can

support all antenna configurations in a MIMO system. The division-free adaptive step

size and early termination schemes are used to effectively reduce the decomposing

latency and improve hardware utilization. The data interleaving scheme helps to deal

with several channel matrices concurrently. Besides, we propose an orthogonal

reconstruction scheme to obtain more accurate SVD outputs, and then the system

performance will be greatly enhanced. We apply our SVD design to the IEEE 802.11n

applications. This design is implemented and fabricated in UMC 90 nm 1P9M CMOS

technology. The maximum operating frequency is measured to be at 101.2 MHz, and the

corresponding power dissipation is at 125 mW. The core size is 2.17 $\mathrm{mm}^{2}$

and the die size occupies 4.93 $\mathrm{mm}^{2}$. The chip result shows that the average

latency is only 0.33% of the wireless local area network coherence time. Hence, the

proposed reconfigurable adaptive SVD engine design is very suitable for high-throughput

wireless communication applications.

HVL071

Scalability Analysis of Memory Consistency Models in NoC-based

Distributed Shared Memory SoCs

Abstract:

We analyze the scalability of six memory consistency models in network-on-chip

(NoC)-based distributed shared memory multicore systems: 1) protected release

consistency (PRC); 2) release consistency (RC); 3) weak consistency (WC); 4) partial

store ordering (PSO); 5) total store ordering (TSO); and 6) sequential consistency (SC).

Their realizations are based on a transaction counter and an address-stack-based

approach. The scalability analysis is based on different workloads mapped on various

sizes of networks using different problem sizes. For the experiments, we use the Nostrum

NoC-based configurable multicore platform with a 2-D mesh topology and a deflection

routing algorithm. Under the synthetic workloads, the average execution time for the

PRC, RC, WC, PSO, and TSO models in the 8 × 8 network (64-cores) is reduced by

32.3%, 28.3%, 20.1%, 13.8%, and 9.9% over the SC model, respectively. For the

application workloads, as the network size grows, the average execution time under these

relaxed memory models decreases with respect to the SC model depending on the

application and its match to the architecture. The performance improvement of the PRC

and RC models over the SC model tends to be higher than 50% as observed in the

experiments, when the system is further scaled up. The area cost in the network interface

for the relaxed memory models is increased by less than 4% over the SC model.

HVL072

Scaling Energy Per Operation via an Asynchronous Pipeline

Abstract:

Statistical analysis of computations per unit energy in processors over the last 30

years is given that illustrates a sharp reduction in the rate of energy efficiency

improvements over the last several years resulting in the formation of an asymptotic

“wall” with our dataset; we use the measure of giga multiply accumulates per Joule. We

have developed an energy model which takes into account the realities of scaling,

specifically for asynchronous systems. Studies of an energy efficient asynchronous

pipeline show fabricated results of 17 Giga Operations per Joule in 0.6 μm at

subthreshold when fully pipelined, and simulations at a more modern 65 nm process

show a further order of magnitude improvement on that.

HVL073

Secure Dual-Core Cryptoprocessor for Pairings Over Barreto-Naehrig

Curves on FPGA Platform

Abstract:

This paper is devoted to the design and the physical security of a parallel dual-

core flexible cryptoprocessor for computing pairings over Barreto-Naehrig (BN) curves.

The proposed design is specifically optimized for field-programmable gate-array (FPGA)

platforms. The design explores the in-built features of an FPGA device for achieving an

efficient cryptoprocessor for computing 128-bit secure pairings. The work further

pinpoints the vulnerability of those pairing computations against side-channel attacks and

demonstrates experimentally that power consumptions of such devices can be used to

attack these ciphers. Finally, we suggest a suitable countermeasure to overcome the

respective weaknesses. The proposed secure cryptoprocessor needs 1 730 000, 1 206 000,

and 821 000 cycles for the computation of Tate, ate, and optimal-ate pairings,

respectively. The implementation results on a Virtex-6 FPGA device show that it

consumes 23 k Slices and computes the respective pairings in 11.93, 8.32, and 5.66 ms.
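
As a quick consistency check on the figures above, dividing each cycle count by the corresponding latency gives the clock frequency that the numbers imply. The sketch below is only arithmetic on the values quoted in the abstract; the variable and function names are illustrative, and the roughly 145 MHz result is derived here rather than stated in the abstract.

```python
# Cycle counts and latencies quoted in the abstract (Tate, ate, optimal-ate).
cycles = {"tate": 1_730_000, "ate": 1_206_000, "optimal_ate": 821_000}
latency_ms = {"tate": 11.93, "ate": 8.32, "optimal_ate": 5.66}


def implied_clock_mhz(n_cycles, t_ms):
    """Clock frequency in MHz implied by n_cycles taking t_ms milliseconds."""
    return n_cycles / (t_ms * 1e-3) / 1e6


implied = {name: implied_clock_mhz(cycles[name], latency_ms[name]) for name in cycles}
for name, freq in implied.items():
    print(f"{name}: {freq:.2f} MHz")
```

All three pairings land within a fraction of a megahertz of each other, which suggests the three latencies were measured at the same clock.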

HVL074

Selective Flexibility: Creating Domain-Specific Reconfigurable Arrays

Abstract:

Historically, hardware acceleration technologies have either been application-

specific, therefore lacking in flexibility, or fully programmable, thereby suffering from

notable inefficiencies on an application-by-application basis. To address the growing

need for domain-specific acceleration technologies, this paper describes a design

methodology (i) to automatically generate a domain-specific coarse-grained array from a

set of representative applications and (ii) to introduce limited forms of architectural

generality to increase the likelihood that additional applications can be successfully

mapped onto it. In particular, coarse-grained arrays generated using our approach are

intended to be integrated into customizable processors that use application-specific

instruction set extensions to accelerate performance and reduce energy; rather than

implementing these extensions using application-specific integrated circuit (ASIC) logic,

which lacks flexibility, they can be synthesized onto our reconfigurable array instead,

allowing the processor to be used for a variety of applications in related domains. Results

show that our array is around 2× slower and 15× larger than an ultimately efficient ASIC

implementation, and thus far more efficient than field-programmable gate arrays (FPGAs),

which are known to be 3-4× slower and 20-40× larger. Additionally, we estimate that our

array is usually around 2× larger and 2× slower than an accelerator synthesized using

traditional datapath merging, which has, if any, very limited flexibility beyond the design

set of DFGs.

HVL075

Self-Repairing Digital System With Unified Recovery Process Inspired

by Endocrine Cellular Communication

Abstract:

Self-repairing digital systems have recently emerged as the most promising

alternative for fault-tolerant systems. However, such systems are still impractical in many

cases, particularly due to the complex rerouting process that follows cell replacement.

They lose efficiency when the circuit size increases, due to the extra hardware in addition

to the functional circuit and the unutilization of normal operating hardware for fault

recovery. In this paper, we propose a system inspired by endocrine cellular

communication, which simplifies the rerouting process in two ways: 1) by lowering the

hardware overhead along with the increasing size of the circuit and 2) by reducing the

hardware unutilized for fault recovery while maintaining good fault-coverage. The

proposed system is composed of a structural layer and a gene-control layer. The structural

layer consists of novel modules and their interconnections. In each module of our system,

the encoded data, called the genome, contains information about the function and the

connection. Therefore, a faulty module can be replaced and the whole system's functions

and connections are maintained by simply assigning the same encoded data to a spare

(stem) module. In existing systems, a huge amount of hardware, such as a dynamic

routing system, is required for such an operation. The gene-control layer determines the

neighboring spare module in the structural layer to replace the faulty module without

collision. We verified the proposed mechanism by implementing the system with a field-

programmable gate array with the application of a digital clock whose status can be

monitored with light-emitting-diodes. In comparison with existing methods, the proposed

architecture and mechanism are efficient enough for application with real fault-tolerant

systems dealing with harsh and remote environments, such as outer space or deep sea.

HVL076

STBC-OFDM Downlink Baseband Receiver for Mobile WMAN

Abstract:

This paper proposes a space time block code-orthogonal frequency division

multiplexing downlink baseband receiver for mobile wireless metropolitan area network.

The proposed baseband receiver applied in the system with two transmit antennas and

one receive antenna aims to provide high performance in outdoor mobile environments. It

provides a simple and robust synchronizer and an accurate but hardware affordable

channel estimator to overcome the challenge of multipath fading channels. The coded bit

error rate performance for 16 quadrature amplitude modulation can achieve less than $10^{-6}$

under the vehicle speed of 120 km/hr. The proposed baseband receiver designed in 90-nm

CMOS technology can support up to 27.32 Mb/s uncoded data transmission under 10

MHz channel bandwidth. It requires a core area of 2.41 × 2.41 ${\rm mm}^{2}$ and dissipates 68.48

mW at 78.4 MHz with 1 V power supply.

HVL077

Techniques for Compensating Memory Errors in JPEG2000

Abstract:

This paper presents novel techniques to mitigate the effects of SRAM memory

failures caused by low voltage operation in JPEG2000 implementations. We investigate

error control coding schemes, specifically single error correction double error detection

code based schemes, and propose an unequal error protection scheme tailored for

JPEG2000 that reduces memory overhead with minimal effect on performance.

Furthermore, we propose algorithm-specific techniques that exploit the characteristics of

the discrete wavelet transform coefficients to identify and remove SRAM errors. These

techniques do not require any additional memory, have low circuit overhead, and more

importantly, reduce the memory power consumption significantly with only a small

reduction in image quality.

HVL078

Test Patterns of Multiple SIC Vectors: Theory and Application in BIST

Schemes

Abstract:

This paper proposes a novel test pattern generator (TPG) for built-in self-test. Our

method generates multiple single-input change (MSIC) vectors in a pattern, i.e., each

vector applied to a scan chain is an SIC vector. A reconfigurable Johnson counter and a

scalable SIC counter are developed to generate a class of minimum transition sequences.

The proposed TPG is flexible to both the test-per-clock and the test-per-scan schemes. A

theory is also developed to represent and analyze the sequences and to extract a class of

MSIC sequences. Analysis results show that the produced MSIC sequences have the

favorable features of uniform distribution and low input transition density. The

performances of the designed TPGs and the circuits under test with 45 nm are evaluated.

Simulation results with ISCAS benchmarks demonstrate that MSIC can save test power

and impose no more than 7.5% overhead for a scan design. It also achieves the target

fault coverage without increasing the test length.
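
To illustrate the single-input-change property mentioned above, here is a minimal sketch of a plain Johnson (twisted-ring) counter. It is not the reconfigurable Johnson counter or the full TPG from the paper; the function names and the 4-bit width are assumptions chosen for illustration. It only shows that consecutive Johnson-counter states differ in exactly one bit.

```python
def johnson_sequence(width):
    """Return the 2*width states of a Johnson counter, starting from all zeros."""
    state = [0] * width
    states = []
    for _ in range(2 * width):
        states.append(tuple(state))
        # Shift by one position and feed back the inverted last bit.
        state = [1 - state[-1]] + state[:-1]
    return states


def hamming(a, b):
    """Number of bit positions in which two states differ."""
    return sum(x != y for x, y in zip(a, b))


seq = johnson_sequence(4)
# Distances between consecutive states, including the wrap-around to the first state.
distances = [hamming(seq[i], seq[(i + 1) % len(seq)]) for i in range(len(seq))]
for state in seq:
    print("".join(str(bit) for bit in state))
```

Every entry of `distances` is 1, so each vector applied in sequence toggles a single input, which is the source of the low transition density.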

HVL079

The LUT-SR Family of Uniform Random Number Generators for

FPGA Architectures

Abstract:

Field-programmable gate array (FPGA) optimized random number generators

(RNGs) are more resource-efficient than software-optimized RNGs because they can take

advantage of bitwise operations and FPGA-specific features. However, it is difficult to

concisely describe FPGA-optimized RNGs, so they are not commonly used in real-world

designs. This paper describes a type of FPGA RNG called a LUT-SR RNG, which takes

advantage of bitwise xor operations and the ability to turn lookup tables (LUTs) into shift

registers of varying lengths. This provides a good resource–quality balance compared to

previous FPGA-optimized generators, between the previous high-resource high-period

LUT-FIFO RNGs and low-resource low-quality LUT-OPT RNGs, with quality

comparable to the best software generators. The LUT-SR generators can also be

expressed using a simple C++ algorithm contained within this paper, allowing 60 fully-

specified LUT-SR RNGs with different characteristics to be embedded in this paper,

backed up by an online set of very high speed integrated circuit hardware description

language (VHDL) generators and test benches.

HVL080

Theoretical Modeling of Elliptic Curve Scalar Multiplier on LUT-Based

FPGAs for Area and Speed

Abstract:

This paper uses a theoretical model to approximate the delay of different

characteristic two primitives used in an elliptic curve scalar multiplier architecture

(ECSMA) implemented on k-input lookup table (LUT)-based field-programmable gate

arrays. Approximations are used to determine the delay of the critical paths in the

ECSMA. This is then used to theoretically estimate the optimal number of pipeline stages

and the ideal placement of each stage in the ECSMA. This paper illustrates suitable

scheduling for performing point addition and doubling in a pipelined data path of the

ECSMA. Finally, detailed analyses, supported with experimental results, are provided to

design the fastest scalar multiplier over generic curves. Experimental results for

GF($2^{163}$) show that, when the ECSMA is suitably pipelined, the scalar multiplication can

be performed in only 9.5 μs on a Xilinx Virtex V. Notably the design has an area which is

significantly smaller than other reported high-speed designs, which is due to the better

LUT utilization of the underlying field primitives.

HVL081

A Flexible and Customizable Architecture for the Relaxation Labeling

Algorithm

Abstract:

This brief presents a flexible and customizable architecture for the probabilistic

relaxation labeling (PRL) algorithm. The algorithm has been restructured by using a

hardware-friendly process that is executed on the proposed architecture. This enables the

design to handle different numbers of objects and labels flexibly. Moreover, in the

design, the proposed PRL unit can be easily duplicated for K times according to the

available resources on the field-programmable gate array (FPGA). In this brief, K can be

scalable up to 10 by using a Virtex-6 FPGA XC6VLX240T platform. Compared with

existing architectures that are not suitable for a large number of objects, the proposed

architecture reduces the time complexity from O(N × M) to O(N) with the same O(N ×

$M^{2}$) space complexity, where N and M are the numbers of objects and labels,

respectively. The experimental results show that the execution time of our design is about

15 times less for five objects and about 35 times less for a 128 × 64 image block than the

software implementation running on a Quad-core Intel 32-nm machine.

HVL082

A Novel VLSI DHT Algorithm for a Highly Modular and Parallel

Architecture

Abstract:

A new very large scale integration (VLSI) algorithm for a $2^{N}$-length discrete

Hartley transform (DHT) that can be efficiently implemented on a highly modular and

parallel VLSI architecture having a regular structure is presented. The DHT algorithm

can be efficiently split into several parallel parts that can be executed concurrently.

Moreover, the proposed algorithm is well suited for the subexpression sharing technique

that can be used to significantly reduce the hardware complexity of the highly parallel

VLSI implementation. Using the advantages of the proposed algorithm and the fact that

we can efficiently share the multipliers with the same constant, the number of the

multipliers has been significantly reduced such that the number of multipliers is very

small compared with that of the existing algorithms. Moreover, the multipliers with a

constant can be efficiently implemented in VLSI.
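
For reference, the transform itself can be written directly from its textbook definition, H(k) = Σ x(n)·cas(2πnk/N) with cas(θ) = cos θ + sin θ. The sketch below is that direct O(N²) form, not the parallel algorithm proposed in the paper; the function name and the sample input are illustrative assumptions.

```python
import math


def dht(x):
    """Direct O(N^2) discrete Hartley transform of a real sequence."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        acc = 0.0
        for n in range(n_pts):
            angle = 2.0 * math.pi * n * k / n_pts
            acc += x[n] * (math.cos(angle) + math.sin(angle))  # cas = cos + sin
        out.append(acc)
    return out


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]  # length 2^3
X = dht(x)
# The DHT is its own inverse up to a factor of 1/N.
x_back = [v / len(x) for v in dht(X)]
```

Because the kernel is real and the transform is self-inverse up to scaling, the same hardware can serve both the forward and inverse directions.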

HVL083

An Adaptive Subsystem Based Algorithm for Channel Equalization in a

SIMO System

Abstract:

The principle of multiple input/output inversion theorem (MINT) has been

employed for multi-channel equalization. In this work, we propose to partition a single-

input multiple-output system into two subsystems. The equivalence between the

deconvoluted signals of the two subsystems is termed as auto-relation and we

subsequently exploit this relation as an additional constraint to the existing adaptive

MINT algorithm. In addition, we provide analysis of the auto-relation constraint and

show that this constraint confines the solution of equalization filters within a multi-

dimensional space. We also explain through the use of convergence analysis why our

proposed algorithm can achieve a higher rate of convergence compared to the existing

MINT-based algorithms. Simulation results, using both synthetic and recorded channel

impulse responses, show that our proposed auto-relation aided MINT algorithm can

achieve faster convergence than the existing MINT-based algorithms.

HVL084

Binary Discrete Cosine and Hartley Transforms

Abstract:

In this paper, a systematic method for developing a binary version of a given

transform by using the Walsh-Hadamard transform (WHT) is proposed. The resulting

transform approximates the underlying transform very well, while maintaining all the

advantages and properties of WHT. The method is successfully applied for developing a

binary discrete cosine transform (BDCT) and a binary discrete Hartley transform

(BDHT). It is shown that the resulting BDCT corresponds to the well-known sequency-

ordered WHT, whereas the BDHT can be considered as a new Hartley-ordered WHT.

Specifically, the properties of the proposed Hartley-ordering are discussed and a shift-

copy scheme is proposed for a simple and direct generation of the Hartley-ordering

functions. For software and hardware implementation purposes, a unified structure for the

computation of the WHT, BDCT, and BDHT is proposed by establishing an elegant

relationship between the three transform matrices. In addition, a spiral-ordering is

proposed to graphically obtain the BDHT from the BDCT and vice versa. The application

of these binary transforms in image compression, encryption and spectral analysis clearly

shows the ability of the BDCT (BDHT) in approximating the DCT (DHT) very well.
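
The sequency-ordered WHT referred to above can be sketched as follows: build the natural-order (Sylvester) Hadamard matrix, then sort its rows by their number of sign changes. This is a small illustrative construction with assumed function names, not the unified structure or the Hartley ordering proposed in the paper, and it expects an input whose length is a power of two.

```python
def hadamard(order):
    """Sylvester construction of the 2^order x 2^order Hadamard matrix (natural order)."""
    h = [[1]]
    for _ in range(order):
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sequency(row):
    """Number of sign changes along a row."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def sequency_ordered_wht(x):
    """Multiply x (length a power of two) by the Hadamard matrix with rows sorted by sequency."""
    order = len(x).bit_length() - 1
    rows = sorted(hadamard(order), key=sequency)
    return [sum(r * v for r, v in zip(row, x)) for row in rows]


rows = sorted(hadamard(3), key=sequency)
sequencies = [sequency(r) for r in rows]
y = sequency_ordered_wht([1, 0, 0, 0, 0, 0, 0, 0])
```

After sorting, row k has exactly k sign changes, which mirrors how the DCT basis functions are ordered by increasing frequency.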

HVL085

Computing Two-Pattern Test Cubes for Transition Path Delay Faults

Abstract:

Considering full-scan circuits, incompletely-specified tests, or test cubes, are used

for test data compression. When considering path delay faults, certain specified input

values in a test cube are needed only for determining the lengths of the paths associated

with detected faults. Path delay faults, and therefore, small delay defects, would still be

detected if such values are unspecified. The goal of this paper is to explore the possibility

of increasing the number of unspecified input values in a test set for path delay faults by

unspecifying such values in order to make the test set more amenable to test data

compression. Experimental results indicate that significant numbers of such values exist.

The proposed procedure unspecifies them gradually to obtain a series of test sets with

increasing numbers of unspecified values and decreasing path lengths. Experimental

results also indicate that filling the unspecified values randomly (as with some test data

compression methods) recovers some or all of the path lengths associated with detected

path delay faults. The procedure uses a matching of the sets of detected faults for the

comparison of path lengths.

HVL086

Design of Hardware Function Evaluators Using Low-Overhead

Nonuniform Segmentation With Address Remapping

Abstract:

In the piecewise function evaluation with polynomial approximation, nonuniform

segmentation can effectively reduce the size of lookup tables for some arithmetic

functions compared to uniform segmentation approaches, at the cost of the extra segment

address (index) encoder that results in area and delay overhead. Also, it is observed that

the nonuniform segmentation reflects a design tradeoff between the ROM size and the

area cost of the subsequent arithmetic computation hardware. In this paper, we propose a

new nonuniform segmentation method that searches for the optimal segmentation scheme

with the goal of minimized ROM, total area, or delay. For some high-variation arithmetic

functions, the proposed segmentation method achieves significant area reduction

compared to the uniform segmentation method. We also demonstrate the design tradeoff

among uniform and nonuniform segmentation, and degree-one and degree-two

polynomial approximations, with respect to precision ranging from 12 to 32 bits for the

elementary function of reciprocal.
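
The baseline that the abstract compares against, uniform segmentation with a degree-one polynomial per segment, can be sketched for the reciprocal on [1, 2) as below. This is a simplified floating-point illustration under stated assumptions: each segment uses the chord through its endpoints rather than an optimized fit, and the function names and segment counts are hypothetical. It does not implement the proposed nonuniform search or the address remapping.

```python
def build_table(num_segments):
    """Per-segment (slope, intercept) for 1/x on [1, 2), uniform segments, endpoint chords."""
    h = 1.0 / num_segments
    table = []
    for s in range(num_segments):
        x0, x1 = 1.0 + s * h, 1.0 + (s + 1) * h
        slope = (1.0 / x1 - 1.0 / x0) / h
        intercept = 1.0 / x0 - slope * x0
        table.append((slope, intercept))
    return table


def approx_reciprocal(x, table):
    """Select the segment from the leading fraction of x, then evaluate slope*x + intercept."""
    index = min(int((x - 1.0) * len(table)), len(table) - 1)
    slope, intercept = table[index]
    return slope * x + intercept


def max_error(num_segments, samples=4096):
    """Largest absolute error over a uniform grid of sample points in [1, 2)."""
    table = build_table(num_segments)
    xs = [1.0 + i / samples for i in range(samples)]
    return max(abs(approx_reciprocal(x, table) - 1.0 / x) for x in xs)


errors = {n: max_error(n) for n in (4, 16, 64)}
for n, err in errors.items():
    print(f"{n} segments: max error {err:.3e}")
```

The error shrinks roughly fourfold each time the segment count doubles, so the table (ROM) size grows quickly with precision. That growth is the cost nonuniform segmentation aims to reduce.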

HVL087

DS-CDMA Implementation With Iterative Multiple Access Interference

Cancellation

Abstract:

In this paper an implementation of iterative joint detection for multiple access

interference using direct-sequence code-division multiple-access (DS-CDMA) is

presented. Results for multiple field programmable gate array (FPGA) platforms and

multiple technology nodes for synthesized application specific integrated circuits (ASIC)

are presented. The joint detection is performed using a generalized version of interleave-

division multiple-access (IDMA) known as partition spreading (PS) CDMA. Decoding is

performed using iterative methods from turbo and sum-product decoding. The

synthesized ASIC system demonstrates a maximum aggregate throughput of 197 Mb/s

for a fully loaded 50-user system, while the implemented FPGA 50-user system has a

maximum aggregate throughput of 119 Mb/s.

HVL088

FPGA-Based 40.9-Gbits/s Masked AES With Area Optimization for

Storage Area Network

Abstract:

In order to protect “data-at-rest” in storage area networks from the risk of

differential power analysis attacks without degrading performance, a high-throughput

masked advanced encryption standard (AES) engine is proposed. However, this engine

usually adopts the unrolling technique which requires extremely large field

programmable gate array (FPGA) resources. In this brief, we aim to optimize the area for

a masked AES with an unrolled structure. We achieve this by mapping its operations

from GF($2^{8}$) to GF($2^{4}$) as much as possible. We reduce the number of mapping [GF($2^{8}$) to GF($2^{4}$)] and inverse

mapping [GF($2^{4}$) to GF($2^{8}$)] operations of the masked SubBytes step from ten to one. In order to be

compatible, the masked MixColumns, masked AddRoundKey, and masked ShiftRows

including the redundant masking values are carried over GF($2^{4}$). We also use FPGA block RAM

(BRAM) to further reduce hardware resources. Compared with a state-of-the-art design,

our implementation reduces the overall area by 36.2% (20.5% is contributed by the main

method, and 15.7% is contributed by the BRAM optimization). It achieves 40.9-Gbits/s at

4.5-Mbits/s/slice on the Xilinx XC6VLX240T platform. We have attacked the iterative

version of this masked AES in hardware. Results show that none of the bytes can be

guessed from the masked AES with the collected 10 000 power traces, but 14 out of 16

bytes can be guessed from the unprotected AES with the same number of traces.

HVL089

Low-Cost FIR Filter Designs Based on Faithfully Rounded Truncated

Multiple Constant Multiplication/Accumulation

Abstract:

Low-cost finite impulse response (FIR) designs are presented using the concept of

faithfully rounded truncated multipliers. We jointly consider the optimization of bit width

and hardware resources without sacrificing the frequency response and output signal

precision. Nonuniform coefficient quantization with proper filter order is proposed to

minimize total area cost. Multiple constant multiplication/accumulation in a direct FIR

structure is implemented using an improved version of truncated multipliers.

Comparisons with previous FIR design approaches show that the proposed designs

achieve the best area and power results.

HVL090

Low-Resolution DAC-Driven Linearity Testing of Higher Resolution

ADCs Using Polynomial-Fitting Measurements

Abstract:

A low-cost linearity test methodology for high-resolution analog-to-digital

converters (ADCs) is presented in this paper. Linearity testing of ADCs requires high-

precision digital-to-analog conversion (DAC) capability, commonly 3-bit higher

resolution than the ADC under test. Further, a large number of ADC output data samples

must be collected making conventional histogram testing impractical for high-resolution

ADCs with 18-24 bit precision. In the proposed test methodology, two low-precision and

low-cost DACs are used to generate a high-resolution ADC test stimulus. Significant

reductions in test cost and test time are achieved by using low-cost instrumentation and

by making fewer measurements than required for conventional histogram test. A least-

squares-based polynomial fitting approach is used to determine the transfer function of

the ADC under test. The generated transfer function is used to compute the non-linearity

of the ADC accurately. No assumption is made regarding the linearity of the lower

precision signal generators (DACs) used in the testing procedure. Software simulations

and hardware experiments are performed to validate the proposed test methodology.

HVL091

One Analog STBC-DCSK Transmission Scheme not Requiring Channel

State Information

Abstract:

Both the inherently wideband differential-chaos-shift-keying (DCSK) modulation

and the space-time block code (STBC) are techniques that can mitigate the effect of

multipath fading. By applying STBC at the chaotic segment level, a novel analog STBC-

DCSK scheme is proposed in this paper. The proposed scheme is a simple configuration

that combines the advantages of STBC and chaotic modulation. Due to the very low

correlation between different analog chaotic signals, the proposed scheme can

remarkably suppress the inter-transmit-antenna interference so as to recover the desired

information and to achieve the full diversity gain. The theoretical bit-error-rate (BER)

performance and the highly consistent simulation results demonstrate that the STBC-

DCSK scheme outperforms the conventional single-input-single-output (SISO)-DCSK

scheme by about 5 dB at a BER of $10^{-4}$. The performance superiority of the

proposed scheme is further demonstrated in a typical UWB channel by simulations. More

importantly, the proposed scheme maintains the same low transceiver cost as the SISO-

DCSK scheme. Consequently, this proposed scheme is a low-cost alternative for wireless

local area network (WLAN) applications.

HVL092

Reconfigurable Accelerator for the Word-Matching Stage of BLASTN

Abstract:

BLAST is one of the most popular sequence analysis tools used by molecular

biologists. It is designed to efficiently find similar regions between two sequences that

have biological significance. However, because the size of genomic databases is growing

rapidly, the computation time of BLAST, when performing a complete genomic database

search, is continuously increasing. Thus, there is a clear need to accelerate this process. In

this paper, we present a new approach for genomic sequence database scanning utilizing

reconfigurable field programmable gate array (FPGA)-based hardware. In order to derive

an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate

the computation of the word-matching stage. The experimental results show that the

FPGA implementation achieves a speedup around one order of magnitude compared to

the NCBI BLASTN software running on a general purpose computer.

HVL093

Reduced-Complexity LCC Reed–Solomon Decoder Based on Unified

Syndrome Computation

Abstract:

Reed-Solomon (RS) codes are widely used in digital communication and storage

systems. Algebraic soft-decision decoding (ASD) of RS codes can obtain significant

coding gain over the hard-decision decoding (HDD). Compared with other ASD

algorithms, the low-complexity Chase (LCC) decoding algorithm needs less computation

complexity with similar or higher coding gain. Besides employing complicated

interpolation algorithm, the LCC decoding can also be implemented based on the HDD.

However, the previous syndrome computation for $2^{\eta}$ test vectors and the key equation

solver (KES) in the HDD requires long latency and remarkable hardware. In this brief, a

unified syndrome computation algorithm and the corresponding architecture are

proposed. Cooperating with the KES in the reduced inversion-free Berlekamp-Massey

algorithm, the reduced-complexity LCC RS decoder can speed up by 57% and the area

will be reduced to 62% compared with the original design for η = 3.

HVL094

Scale-Free Hyperbolic CORDIC Processor and Its Application to

Waveform Generation

Abstract:

This paper presents a novel completely scaling-free CORDIC algorithm in

rotation mode for hyperbolic trajectory. We use most-significant-1 bit detection

technique for micro-rotation sequence generation to reduce the number of iterations. By

storing the sinh/cosh hyperbolic values at octant boundaries in a ROM, we can extend the

range of convergence to the entire coordinate space. Based on this, we propose a pipeline

hyperbolic CORDIC processor to implement a direct digital synthesizer (DDS). The DDS

is further used to derive an efficient arbitrary waveform generator (AWG), where a

pseudo-random number generator modulates the linear increments of phase to produce

random phase-modulated waveform. The proposed waveform generator requires only one

DDS for generating a variety of modulated waveforms, while existing designs require

separate DDS units for different types of waveforms, and multiple DDS units are required

to generate composite waveforms. Therefore, the area complexity of existing designs is

multiplied by the number of different types of waveforms they generate, while that of the

proposed design remains unchanged. The proposed AWG, when mapped on a Xilinx

Spartan 2E device, consumes 1076 slices and 2016 4-input LUTs. The proposed AWG

involves significantly less area and lower latency, with nearly the same throughput

compared to the existing CORDIC-based designs.
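
For context, the conventional hyperbolic CORDIC in rotation mode, which the scaling-free algorithm above sets out to improve on, can be sketched in floating point as follows. This is the standard scaled variant with repeated iterations at indices 4 and 13, not the paper's scale-free method or its most-significant-1 detection; the function names and the iteration count are assumptions for illustration, and the input range is limited to roughly |θ| < 1.1.

```python
import math


def iteration_indices(count):
    """Shift indices 1, 2, 3, 4, 4, 5, ..., 13, 13, ... (4 and 13 repeated for convergence)."""
    indices, i = [], 1
    while len(indices) < count:
        indices.append(i)
        if i in (4, 13) and len(indices) < count:
            indices.append(i)
        i += 1
    return indices


def cordic_sinh_cosh(theta, count=18):
    """Return (cosh(theta), sinh(theta)) using shift-and-add style micro-rotations."""
    indices = iteration_indices(count)
    # Each micro-rotation scales the vector by sqrt(1 - 2^(-2i)).
    gain = 1.0
    for i in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta  # pre-scale x so the accumulated gain cancels
    for i in indices:
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** (-i)
        x, y = x + d * y * shift, y + d * x * shift
        z -= d * math.atanh(shift)
    return x, y


c, s = cordic_sinh_cosh(0.5)
print(c, math.cosh(0.5))
print(s, math.sinh(0.5))
```

The `1.0 / gain` pre-scaling step is the scale-factor compensation that a scaling-free formulation avoids, and the limited input range is the convergence restriction that the ROM of octant-boundary values is meant to lift.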

HVL095

Scaling, Offset, and Balancing Techniques in FFT-Based BP Nonbinary

LDPC Decoders

Abstract:

An analysis of finite precision effects in nonbinary mixed-domain low-density

parity-check decoders is presented. It is shown how improved decoding performance can

be achieved by using an offset-based method and proper scaling techniques. In addition, a

novel fast Fourier transform (FFT)-based belief propagation (BP) decoder architecture is

proposed which balances the computational load between processing units. The results

show a 47% reduction in the number of required field-programmable gate array slices

compared to a standard FFT-based BP architecture.

HVL096

Two-Rate Based Low-Complexity Variable Fractional-Delay FIR Filter

Structures

Abstract:

This paper considers two-rate based structures for variable fractional-delay (VFD)

finite-length impulse response (FIR) filters. They are single-rate structures but derived

through a two-rate approach. The basic structure considered hitherto utilizes a regular

half-band (HB) linear-phase filter and the Farrow structure with linear-phase subfilters.

Especially for wide-band specifications, this structure is computationally efficient

because most of the overall arithmetic complexity is due to the HB filter which is

common to all Farrow-structure subfilters. This paper extends and generalizes existing

results. Firstly, frequency-response masking (FRM) HB filters are utilized which offer

further complexity reductions. Secondly, both linear-phase and low-delay subfilters are

treated and combined which offers trade-offs between the complexity, delay, and

magnitude response overshoot which is typical for low-delay filters. Thirdly, the HB

filter is replaced by a general filter which enables additional frequency-response

constraints in the upper frequency band which normally is treated as a don't-care band.

Wide-band design examples (90, 95, and 98% of the Nyquist band) reveal arithmetic

complexity savings between some 20 and 85% compared with other structures, including

infinite-length impulse response structures. Hence, the VFD filter structures proposed in

this paper exhibit the lowest arithmetic complexity among all hitherto published VFD

filter structures.
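
As an illustrative aside, the Farrow structure named in this abstract computes its output as a polynomial in the fractional delay d, where each polynomial coefficient is the output of a fixed FIR subfilter. The following is a minimal Python sketch under simplifying assumptions: it uses first-order (linear-interpolation) subfilters instead of the half-band or FRM designs of the paper, and the function name `farrow_output` and the coefficient layout `LINEAR` are hypothetical.

```python
def farrow_output(x, subfilters, d):
    """Evaluate y[n] = sum_k d**k * (h_k * x)[n] using Horner's rule.

    x          : list of input samples
    subfilters : list of FIR coefficient lists h_0..h_L of equal length
                 (hypothetical layout, not taken from the paper)
    d          : fractional delay
    """
    taps = len(subfilters[0])
    y = []
    for n in range(taps - 1, len(x)):
        # Output of each fixed subfilter at time n (direct-form convolution).
        v = [sum(h[m] * x[n - m] for m in range(taps)) for h in subfilters]
        # Horner evaluation of the polynomial in d.
        acc = 0.0
        for vk in reversed(v):
            acc = acc * d + vk
        y.append(acc)
    return y


# Linear interpolation written as a Farrow filter:
# y[n] = x[n] + d * (x[n-1] - x[n])
LINEAR = [[1.0, 0.0], [-1.0, 1.0]]
```

Only the multiplications by d depend on the delay value, so the subfilter coefficients stay fixed when d changes; this is the property that makes the structure attractive for variable-delay filtering.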

HVL097

VLSI Architectures for the 4-Tap and 6-Tap 2-D Daubechies Wavelet

Filters Using Algebraic Integers

Abstract:

This paper proposes a novel algebraic integer (AI) based multi-encoding of

Daubechies-4 and -6 2-D wavelet filters having error-free integer-based computation.

Digital VLSI architectures employing parallel channels are proposed, physically realized

and tested. The multi-encoded AI framework allows a multiplication-free and

computationally accurate architecture. It also guarantees a noise-free computation

throughout the multi-level multi-rate 2-D filtering operation. A single final reconstruction

step (FRS) furnishes filtered and down-sampled image outputs in fixed-point, resulting in

low levels of quantization noise. Comparisons are provided between Daubechies-4 and -6

designs in terms of SNR, PSNR, hardware structure, and power consumption, for

different word lengths. SNR and PSNR improvements of approximately 30% were

observed in favour of AI-based systems, when compared to 8-bit fixed-point schemes

(six fractional bits). Further, FRS designs based on canonical signed digit representation

and on expansion factors are proposed. The Daubechies-4 and -6 4-level VLSI

architectures are prototyped on a Xilinx Virtex-6 vcx240t-1ff1156 FPGA device at 282

MHz and 146 MHz, respectively, with dynamic power consumption of 164 mW and 339

mW, respectively, and verified on FPGA chip using an ML605 platform.

HVL098

VLSI Implementation of a Low-Cost High-Quality Image Scaling

Processor

Abstract:

In this brief, a low-complexity, low-memory-requirement, and high-quality

algorithm is proposed for VLSI implementation of an image scaling processor. The

proposed image scaling algorithm consists of a sharpening spatial filter, a clamp filter,

and a bilinear interpolation. To reduce the blurring and aliasing artifacts produced by the

bilinear interpolation, the sharpening spatial and clamp filters are added as prefilters. To

minimize the memory buffers and computing resources for the proposed image processor

design, T-model and inversed T-model convolution kernels are created for realizing the

sharpening spatial and clamp filters. Furthermore, two T-model or inversed T-model

filters are combined into a combined filter which requires only a one-line-buffer memory.

Moreover, a reconfigurable calculation unit is invented for decreasing the hardware cost

of the combined filter. In addition, the computing resources and hardware cost of the

bilinear interpolator can be efficiently reduced by algebraic manipulation and

hardware sharing techniques. The VLSI architecture in this work can achieve 280 MHz

with 6.08-K gate counts, and its core area is 30378 μm² synthesized by a 0.13-μm CMOS

process. Compared with previous low-complexity techniques, this work reduces gate

counts by more than 34.4% and requires only a one-line-buffer memory.
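
For illustration only, the bilinear interpolation stage named in this abstract can be sketched in Python as follows. This is a plain software version under stated assumptions: it uses corner-aligned coordinate mapping, it omits the sharpening and clamp prefilters and the T-model kernels, and the function name `bilinear_scale` is hypothetical rather than taken from the paper.

```python
def bilinear_scale(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row back to a source coordinate (corner-aligned).
        sy = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for j in range(out_w):
            sx = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            # Interpolate horizontally on the two neighbouring rows,
            # then vertically between the two results.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Each output pixel depends on only two adjacent source rows, which is consistent with the small line-buffer requirement the abstract emphasizes.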

HVL099

VLSI Implementation of a Multi-Mode Turbo/LDPC Decoder

Architecture

Abstract:

Low-density parity-check (LDPC) codes and convolutional Turbo codes are two

of the most powerful error correcting codes that are widely used in modern

communication systems. In a multi-mode baseband receiver, both LDPC and Turbo

decoders may be required. However, the different decoding approaches for LDPC and

Turbo codes usually lead to different hardware architectures. In this paper we propose a

unified message passing algorithm for LDPC and Turbo codes and introduce a flexible

soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the

trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and

Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes

where each super-code has a simpler trellis structure so that the MAP algorithm can be

easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of

LDPC and Turbo codes with a low hardware overhead (about 15% area and timing

overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder

architecture to support LDPC/Turbo codes decoding. Multiple such SISO modules can be

embedded into a parallel decoder for higher decoding throughput. As a case study, a

flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS

technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC

codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at 500 MHz

clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps

Turbo decoding.

HVL0100

An Efficient Interpolation-Based Chase BCH Decoder

Abstract:

BCH codes are adopted in many systems, such as flash memory, optical

communications, and digital video broadcasting. By trying 2^η test vectors, the soft-decision

Chase decoding algorithm of BCH codes can achieve significant coding gain

over hard-decision decoding. Previous one-pass Chase schemes find the error locators

based on Berlekamp's algorithm and need hardware-demanding selection methods to

decide which locator corresponds to the correct code word. In this brief, a novel

interpolation-based one-pass Chase decoder is proposed for BCH codes. By making use

of the binary property of BCH codes, an innovative yet low-complexity method is

developed to select the interpolation output leading to successful decoding without

bringing any performance loss. The code word recovery step is also significantly

simplified through nontrivial mathematical derivations. From architectural analysis, the

proposed decoder with η = 4 for a (4200, 4096) BCH code has 2.3 times higher efficiency

in terms of throughput-over-area ratio than the prior one-pass Chase decoder based on

Berlekamp's algorithm, while achieving the same error-correcting performance.
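
As a hedged illustration of the test-vector step, Chase decoding forms its 2^η candidates by flipping every combination of the η least reliable bits of the hard-decision word. The Python sketch below only enumerates those test vectors; it does not implement the interpolation-based decoder or the selection method of the paper, and the names `chase_test_vectors`, `hard_bits`, and `reliabilities` are hypothetical.

```python
from itertools import product


def chase_test_vectors(hard_bits, reliabilities, eta):
    """Return the 2**eta test vectors obtained by flipping every subset
    of the eta least reliable positions of the hard-decision word."""
    # Indices of the eta smallest reliability magnitudes.
    weakest = sorted(range(len(hard_bits)),
                     key=lambda i: abs(reliabilities[i]))[:eta]
    vectors = []
    for flips in product((0, 1), repeat=eta):
        v = list(hard_bits)
        for pos, flip in zip(weakest, flips):
            v[pos] ^= flip  # flip the bit when the pattern has a 1 here
        vectors.append(v)
    return vectors
```

With η = 4, as in the (4200, 4096) example of the abstract, this enumeration yields 16 test vectors per received word.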

HVL0101

An Efficient Multi-Standard LDPC Decoder Design Using Hardware-

Friendly Shuffled Decoding

Abstract:

This paper presents an efficient multi-standard low-density parity-check (LDPC)

decoder architecture using a shuffled decoding algorithm, where variable nodes are

divided into several groups. In order to provide sufficient memory bandwidth without the

need for using registers, a FIFO-based check-mode memory, which dominates the

decoder area, is used. Since two compensation factors, rather than a single factor, are

dynamically used in the offset Min-Sum algorithm, the number of quantization bits, and,

hence, the memory size, can be reduced without degradation in error performance. In

order to further reduce the memory size, artificial minimum values, which do not need to

be stored in memory, are used. We also propose an algorithm that can be used to partition

variable nodes such that the hardware cost can be minimized. Using the proposed

techniques, a multi-standard decoder that supports the LDPC codes specified in the ITU

G.hn, IEEE 802.11n, and IEEE 802.16e standards was designed and implemented using a

90-nm CMOS process. This decoder supports 133 codes, occupies an area of 5.529 mm²,

and achieves an information throughput of 1.956 Gbps.
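
For illustration, the check-node update of the offset Min-Sum algorithm named in this abstract can be sketched in Python as follows. This is a simplified version under an explicit assumption: it applies a single fixed offset `beta`, whereas the abstract describes two compensation factors used dynamically, which are not reproduced here. The function name `offset_min_sum_check` is hypothetical.

```python
def offset_min_sum_check(msgs, beta):
    """Check-node update of the offset Min-Sum algorithm.

    msgs : incoming variable-to-check LLRs (at least two)
    beta : non-negative offset (a single fixed value here, an assumption)
    """
    mags = [abs(m) for m in msgs]
    # Only the two smallest magnitudes are needed to form all outputs.
    order = sorted(range(len(msgs)), key=lambda i: mags[i])
    min1_idx, min1, min2 = order[0], mags[order[0]], mags[order[1]]
    # Parity of the negative inputs gives the overall sign product.
    total_sign = -1 if sum(m < 0 for m in msgs) % 2 else 1
    out = []
    for i, m in enumerate(msgs):
        # Each edge sees the minimum over all *other* edges.
        mag = min2 if i == min1_idx else min1
        mag = max(mag - beta, 0.0)
        # Removing edge i from the sign product equals multiplying by its sign.
        sign = total_sign * (-1 if m < 0 else 1)
        out.append(sign * mag)
    return out
```

Because every output is derived from the two smallest magnitudes, the index of the smallest, and the sign bits, a hardware check node needs to store only those values per row, which is one reason Min-Sum variants are popular when memory dominates decoder area.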