detecting background setting for dynamic …goaltechnologies.in/vlsi project.pdf · a median filter...

DETECTING BACKGROUND SETTING

FOR DYNAMIC SCENE

ABSTRACT

Real-time processing of image sequences is now possible thanks to advances in digital signal

processing, wide-band communication, and high-performance VLSI. With developments in video

technology, a surveillance system can be built from low-cost devices such as a web camera. As crime

rates rise, people need security and safety, and video surveillance has become an important means of

countering the threats of crime and terrorism.

The most fundamental part of surveillance is foreground detection, that is, the retrieval of an object of

interest. The object of interest can be obtained with the common background subtraction technique. A

problem with this technique is that the background changes constantly because of variations in the

light source.

Pixel intensities change while object detection takes place. These intensity changes lead to improper

foreground detection, in which background is detected as a foreground object.

This paper proposes a method to model and update the background of the scene using an

intersection-solving method.
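
As a rough illustration of the background subtraction problem described above, the sketch below keeps a background model as an exponential running average and flags pixels that differ from it by more than a threshold. It is a generic baseline, not the intersection-solving method the paper proposes; the function names, `alpha`, and `threshold` are assumptions made for this example.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (exponential running average)."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark a pixel as foreground when it differs from the model by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


# Hypothetical 2x2 grayscale frames: slight lighting drift plus one bright object pixel.
background = [[100.0, 100.0], [100.0, 100.0]]
frame = [[104, 101], [200, 99]]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

A small `alpha` lets the model follow slow lighting changes without absorbing a moving object too quickly; the right value depends on the scene.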

LOSSLESS IMPLEMENTATION OF

DAUBECHIES 8-TAP WAVELET TRANSFORM

ABSTRACT

A new mapping scheme for computing the Daubechies 8-tap wavelet transform without error, together

with its hardware implementation, is presented. The multidimensional technique maps the irrational

transform basis coefficients to integers and results in a considerable reduction in hardware and power

consumption. When implemented in a Xilinx FPGA, the scheme costs 518 logic cells and 186 registers

and runs at a frequency of 71 MHz. Compared with a finite-precision architecture, the proposed scheme

yields a reduction of 15% in hardware and 41% in power consumption for similar image reconstruction,

and a noticeable improvement in image reconstruction quality.

PERFORMANCE ANALYSIS OF INTEGER WAVELET TRANSFORM FOR IMAGE COMPRESSION

ABSTRACT

For image compression, the chosen transform must reduce the size of the resultant data compared with

the original data set. In this paper, a new lossless image compression method is proposed.

For both continuous-time and discrete-time cases, the wavelet transform and the wavelet packet

transform have emerged as popular techniques. While the integer wavelet transform using the lifting

scheme significantly reduces the computation time, we propose a completely new approach for further

speeding up the computation.

First, wavelet packet transform (WPT) and lifting scheme (LS) are described. Then an application of the

LS to WPT is presented which leads to the generation of integer wavelet packet transform (IWPT).

The proposed method, the Integer Wavelet Packet Transform (IWPT), yields a representation which can

be lossless, as it maps an integer-valued sequence onto integer-valued coefficients. The idea of the

Wavelet Packet Tree is used to transform still and color images.

The IWPT tree can be built by iterating the single wavelet decomposition step on both the low-pass

and high-pass branches, with rounding in order to achieve integer transforms. Thus, the proposed

method provides a good compression ratio.
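
The integer-to-integer property described above can be sketched with the simplest lifting pair, the integer Haar (S) transform. This is an illustrative choice: the abstract does not say which wavelet filters the paper uses, and the function names are assumptions. The rounding (a floor via `>> 1`) is undone exactly by the inverse, and the packet tree splits both branches at every level.

```python
def forward_lift(x):
    """One level of the integer Haar (S) transform via lifting; len(x) must be even."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]             # predict step (high-pass)
    approx = [e + (d >> 1) for e, d in zip(even, detail)]   # update step with floor rounding
    return approx, detail


def inverse_lift(approx, detail):
    """Undo forward_lift exactly by reversing the two lifting steps."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    x = [0] * (2 * len(even))
    x[0::2], x[1::2] = even, odd
    return x


def packet_decompose(x, levels):
    """Wavelet packet tree: split both the low-pass and the high-pass branch at each level."""
    if levels == 0:
        return [x]
    approx, detail = forward_lift(x)
    return packet_decompose(approx, levels - 1) + packet_decompose(detail, levels - 1)


signal = [12, 15, 9, 200, 7, 7, 255, 0]
approx, detail = forward_lift(signal)
restored = inverse_lift(approx, detail)
leaves = packet_decompose(signal, 2)
```

Because every intermediate value stays an integer and each lifting step is inverted exactly, `restored` equals `signal` bit for bit, which is what makes a lossless coder possible.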

A MEDIAN FILTER FPGA WITH HARVARD ARCHITECTURE

ABSTRACT

To improve the speed of the image processing chip, to gain market share quickly, and to reduce costs,

this paper designs a chip based on the Harvard architecture and an FPGA.

The chip also uses a new hardware algorithm. With this chip, the processing time is 13.2? less than

that of a chip with the Von Neumann architecture. The filter uses 13% of the total FPGA gates, which is

less than the share claimed for it in the multi-image processing chip.

AUTOMATIC ROAD EXTRACTION USING HIGH RESOLUTION SATELLITE IMAGES BASED ON LEVEL SET AND

MEAN SHIFT METHODS

ABSTRACT

Analysis of high resolution satellite images has been an important research topic for urban analysis. One

of the important features of urban areas in urban analysis is the automatic road network extraction.

Two approaches for road extraction based on Level Set and Mean Shift methods are proposed.

It is difficult and computationally expensive to extract roads from an original image because of the

presence of other road-like features with straight edges. The image is preprocessed to improve the

tolerance by reducing the noise (buildings, parking lots, vegetation regions and other open spaces),

and roads are first extracted as elongated regions. Non-linear noise segments are removed using a

median filter (based on the fact that road networks consist of a large number of small linear

structures). Road extraction is then performed using the Level Set and Mean Shift methods.

Finally, the accuracy of the extracted road images is evaluated based on quality measures. 1 m

resolution IKONOS data has been used for the experiment.

A NEW ADAPTIVE WEIGHT ALGORITHM FOR SALT AND PEPPER NOISE REMOVAL

ABSTRACT

A new adaptive weight algorithm is developed for the removal of salt and pepper noise. It consists of

two major steps: first, noise pixels are detected according to the correlations between image pixels;

then, different methods are applied depending on the noise level.

For low noise levels, the mean of the neighboring signal pixels is used to remove the noise, and for

high noise levels, an adaptive weight algorithm is used.

Experiments show that the proposed algorithm has advantages over regularizing methods in terms of

both edge preservation and noise removal. Even for heavily contaminated images with noise levels as

high as 90%, it still achieves significant performance.

REMOVAL OF HIGH DENSITY SALT AND PEPPER NOISE THROUGH MODIFIED DECISION BASED

UNSYMMETRIC TRIMMED MEDIAN FILTER

ABSTRACT

A modified decision based unsymmetrical trimmed median filter algorithm for the restoration of

grayscale and color images that are highly corrupted by salt and pepper noise is proposed in this paper.

The proposed algorithm replaces the noisy pixel with the trimmed median value when the selected

window contains pixel values other than 0 and 255. When all the pixel values in the window are 0's and

255's, the noisy pixel is replaced by the mean value of all the elements present in the selected window.

This proposed algorithm shows better results than the Standard Median Filter (MF), Decision Based

Algorithm (DBA), Modified Decision Based Algorithm (MDBA), and Progressive Switched Median Filter

(PSMF).

The proposed algorithm is tested against different grayscale and color images and it gives better Peak

Signal-to-Noise Ratio (PSNR) and Image Enhancement Factor (IEF).
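
The decision rule described in this abstract can be sketched as follows. This is a minimal reading of the abstract rather than the authors' implementation: the 3x3 window, the clipped border handling, the integer rounding, and the function name are all assumptions.

```python
from statistics import median


def mdbutmf(image):
    """Restore an 8-bit grayscale image (list of rows) corrupted by salt-and-pepper noise."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(h):
        for c in range(w):
            if image[r][c] not in (0, 255):
                continue  # pixel is treated as noise-free and left unchanged
            # 3x3 window, clipped at the image border (assumed border handling)
            window = [image[i][j]
                      for i in range(max(0, r - 1), min(h, r + 2))
                      for j in range(max(0, c - 1), min(w, c + 2))]
            trimmed = [p for p in window if p not in (0, 255)]
            if trimmed:
                out[r][c] = int(median(trimmed))         # trimmed median of the clean pixels
            else:
                out[r][c] = sum(window) // len(window)   # window is all 0/255: use its mean
    return out


noisy = [[10, 12, 11],
         [13, 255, 12],
         [0, 11, 10]]
restored = mdbutmf(noisy)
saturated = mdbutmf([[0, 255], [255, 0]])
```

Trimming the 0 and 255 values before taking the median keeps the estimate from being pulled toward the noise itself, which matters most at high noise densities.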

OPERATION IMPROVEMENT OF INDOOR ROBOT BY GESTURE RECOGNITION

ABSTRACT

Recently, the demand for indoor robots has increased. As a result, more people now have the

opportunity to operate robots. However, many people find it difficult to operate a robot using

conventional methods such as remote control.

To solve this problem, we propose a robot operation system using hand gesture recognition. Our

method pays attention to the direction and movement of the hand. We were able to recognize several

gestures in real time.

ADIABATIC TECHNIQUE FOR ENERGY EFFICIENT LOGIC CIRCUITS DESIGN

ABSTRACT

Energy dissipation in conventional CMOS circuits can be minimized through the adiabatic technique.

With this technique, dissipation in the PMOS network can be minimized, and some of the energy stored

in the load capacitance can be recycled instead of being dissipated as heat.

However, the adiabatic technique is highly dependent on parameter variation. With the help of TSPICE

simulations, the energy consumption is analyzed under variation of the parameters.

In the analysis, two logic families, ECRL (Efficient Charge Recovery Logic) and PFAL (Positive Feedback

Adiabatic Logic), are compared with conventional CMOS logic for inverter and 2:1 multiplexer circuits.

It is found that the adiabatic technique is a good choice for low-power applications in the specified

frequency range.

DESIGN AND FPGA IMPLEMENTATION OF MODIFIED DISTRIBUTIVE ARITHMETIC BASED DWT-IDWT

PROCESSOR FOR IMAGE COMPRESSION

ABSTRACT

Image compression is one of the major image processing techniques and is widely used in medical,

automotive, consumer and military applications. The discrete wavelet transform (DWT) is the most

popular transformation technique adopted for image compression.

The complexity of the DWT is always high due to the large number of arithmetic operations. In this

work, a modified Distributive Arithmetic (DA) based DWT architecture is proposed and implemented on

an FPGA. The modified approach occupies 6% of the area of a Virtex-II Pro FPGA and operates at

134 MHz.

The modified DA-DWT architecture has a latency of 44 clock cycles and a throughput of 4 clock cycles.

This design is twice as fast as the reference design and is thus suitable for applications that require

high-speed image processing algorithms.

AN FPGA-BASED ARCHITECTURE FOR LINEAR AND MORPHOLOGICAL IMAGE FILTERING

ABSTRACT

Field Programmable Gate Array (FPGA) technology has become a viable target for the implementation of

real time algorithms suited to video image processing applications.

The unique architecture of the FPGA has allowed the technology to be used in many applications

encompassing all aspects of video image processing. Among those algorithms, linear filtering based on a

2D convolution, and non-linear 2D morphological filters, represent a basic set of image operations for a

number of applications.

In this work, an implementation of linear and morphological image filtering on a Nexys II FPGA board

(Xilinx Spartan 3E), intended for educational purposes, is presented. The system is connected to a USB

port of a personal computer, and together they form a powerful and low-cost design station.

The FPGA-based system is accessed through a MATLAB graphical user interface, which handles the

communication setup. A comparison between results obtained from MATLAB simulations and the

described FPGA-based implementation is presented.

DESIGN OF A LOW POWER FLIP-FLOP USING CMOS DEEP SUBMICRON TECHNOLOGY

ABSTRACT

This paper presents a low-power, high-speed flip-flop design that has fewer transistors, only one of

which is clocked by a short pulse train: a true single phase clocking (TSPC) flip-flop.

Compared to a conventional flip-flop, it has 5 transistors, one of which is clocked, and thus has a

smaller size and lower power consumption. It can be used in various applications such as digital VLSI

clocking systems, buffers, registers, and microprocessors.

Various flip-flops and latches are analyzed for power dissipation and propagation delay at 0.13 µm

and 0.35 µm technologies. The leakage power increases as the technology is scaled down. The leakage

power is reduced by using the best of the run-time techniques, namely MTCMOS.

A comparison of different conventional flip-flops, latches and the TSPC flip-flop in terms of power

consumption, propagation delay, and the product of power dissipation and propagation delay is

presented with SPICE simulation results.

LOW-POWER AND AREA-EFFICIENT

CARRY SELECT ADDER

ABSTRACT

Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to

perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for

reducing the area and power consumption in the CSLA.

This work uses a simple and efficient gate-level modification to significantly reduce the area and power

of the CSLA. Based on this modification, 8-, 16-, 32-, and 64-b square-root CSLA (SQRT CSLA)

architectures have been developed and compared with the regular SQRT CSLA architecture.

The proposed design has reduced area and power compared with the regular SQRT CSLA, with only a

slight increase in the delay. This work evaluates the performance of the proposed designs in terms of

delay, area, power, and their products by hand with logical effort and through custom design and layout

in 0.18-µm CMOS process technology. Analysis of the results shows that the proposed CSLA structure

is better than the regular SQRT CSLA.
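
For readers unfamiliar with the baseline, the sketch below models a regular carry select adder at the bit level: each group computes its sum twice, once per possible carry-in, and the incoming carry selects the result. The gate-level modification proposed in the paper is not described in the abstract and is not modeled here; the group sizes (2, 2, 3, 4, 5) are an assumed square-root style partition for 16 bits.

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry add of two little-endian bit lists; returns (sum_bits, carry_out)."""
    carry, out = carry_in, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def carry_select_add(a, b, width=16, groups=(2, 2, 3, 4, 5)):
    """Regular carry select addition; returns (sum modulo 2**width, carry_out)."""
    assert sum(groups) == width
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry, pos = [], 0, 0
    for size in groups:
        ga, gb = a_bits[pos:pos + size], b_bits[pos:pos + size]
        sum0, c0 = ripple_add(ga, gb, 0)   # result if the incoming carry is 0
        sum1, c1 = ripple_add(ga, gb, 1)   # result if the incoming carry is 1
        result += sum1 if carry else sum0  # multiplexer driven by the incoming carry
        carry = c1 if carry else c0
        pos += size
    value = sum(bit << i for i, bit in enumerate(result))
    return value, carry
```

The duplicated ripple adder in every group is the redundancy that leaves scope for reducing area and power.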

A PIPELINE VLSI ARCHITECTURE FOR

HIGH-SPEED COMPUTATION OF

THE 1-D DISCRETE WAVELET TRANSFORM

ABSTRACT

In this paper, a scheme for the design of a high-speed pipeline VLSI architecture for the computation of

the 1-D discrete wavelet transform (DWT) is proposed. The main focus of the scheme is on reducing the

number and period of clock cycles for the DWT computation with little or no overhead on the hardware

resources by maximizing the inter- and intrastage parallelisms of the pipeline.

The interstage parallelism is enhanced by optimally mapping the computational load associated with the

various DWT decomposition levels to the stages of the pipeline and by synchronizing their operations.

The intrastage parallelism is enhanced by decomposing the filtering operation equally into two subtasks

that can be performed independently in parallel and by optimally organizing the bitwise operations for

performing each subtask so that the delay of the critical data path from a partial-product bit to a bit of

the output sample for the filtering operation is minimized.

It is shown that an architecture designed based on the proposed scheme requires a smaller number of

clock cycles compared to that of the architectures employing comparable hardware resources. In fact,

the requirement on the hardware resources of the architecture designed by using the proposed scheme

also gets improved due to a smaller number of registers that need to be employed.

Based on the proposed scheme, a specific example of designing an architecture for the DWT

computation is considered. In order to assess the feasibility and the efficiency of the proposed scheme,

the architecture thus designed is simulated and implemented on a field-programmable gate-array

board.

It is seen that the simulation and implementation results conform to the stated goals of the proposed

scheme, thus making the scheme a viable approach for designing a practical and realizable architecture

for real-time DWT computation.

DUAL STACK METHOD: A NOVEL APPROACH

TO LOW LEAKAGE AND SPEED

POWER PRODUCT VLSI DESIGN

ABSTRACT

The development of digital integrated circuits is challenged by higher power consumption. The

combination of higher clock speeds, greater functional integration, and smaller process geometries has

contributed to significant growth in power density. Scaling improves transistor density and functionality

on a chip.

Scaling helps to increase the speed and frequency of operation and hence gives higher performance. As

voltages scale downward with the geometries, threshold voltages must also decrease to gain the

performance advantages of the new technology, but leakage current increases exponentially.

Today leakage power has become an increasingly important issue in processor hardware and software

design. In 65 nm and below technologies, leakage accounts for 30-40% of processor power.

In this paper, we propose a new dual stack approach for reducing both leakage and dynamic power.

Moreover, the novel dual stack approach shows the lowest speed-power product when compared to the

existing methods.

POWER MANAGEMENT OF MIMO NETWORK INTERFACES ON MOBILE SYSTEMS

Very Large Scale Integration (VLSI) Systems,

IEEE Transactions on

ABSTRACT

High-speed wireless network interfaces are among the most power-hungry components on mobile

systems. This is particularly true for multiple-input-multiple-output (MIMO) network interfaces which

use multiple RF chains simultaneously.

In this paper, we present a novel power management solution for MIMO network interfaces on mobile

systems, called antenna management. The key idea is to adaptively disable a subset of antennas and

their RF chains to reduce circuit power consumption, when the capacity improvement of using a large

number of antennas is small. Antenna management judiciously determines the number of active

antennas to minimize energy per bit while satisfying the data rate requirement.
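
The selection step described above can be illustrated with a small exhaustive search. This is only a sketch of the idea, not the paper's algorithm: the linear power model (a fixed base power plus a per-chain cost), the achievable rates, and all names and numbers below are hypothetical.

```python
def pick_active_antennas(rates_mbps, chain_power_mw, base_power_mw, required_mbps):
    """Return (antenna_count, energy_nj_per_bit) minimizing energy per bit, or None.

    rates_mbps[n - 1] is the achievable data rate with n active antennas.
    """
    best = None
    for n, rate in enumerate(rates_mbps, start=1):
        if rate < required_mbps:
            continue                                     # violates the data rate requirement
        power_mw = base_power_mw + n * chain_power_mw    # assumed linear power model
        energy = power_mw / rate                         # mW / Mbps equals nJ per bit
        if best is None or energy < best[1]:
            best = (n, energy)
    return best


# Hypothetical numbers: rate gains flatten as antennas are added.
rates = [65, 120, 150, 165]
low_demand = pick_active_antennas(rates, chain_power_mw=300, base_power_mw=200, required_mbps=60)
high_demand = pick_active_antennas(rates, chain_power_mw=300, base_power_mw=200, required_mbps=140)
```

With these numbers, two antennas give the lowest energy per bit when the rate requirement is loose, and a third is enabled only when the requirement forces it.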

This work provides both the theoretical framework and the system design of antenna management. We

first present an algorithm that efficiently solves the problem of minimizing energy per bit, and then

offer its 802.11n-compliant system designs.

We employ both Matlab-based simulation and prototype-based experiment to validate the energy

efficiency benefit of antenna management. The results show that antenna management can achieve

21% one-end energy per bit reduction to the front end of the MIMO network interface, compared to a

static MIMO configuration that keeps all antennas active.

HIGH-SPEED LOW-POWER VITERBI DECODER DESIGN FOR TCM DECODERS

Very Large Scale Integration (VLSI) Systems


ABSTRACT

High-speed, low-power design of Viterbi decoders for trellis coded modulation (TCM) systems is

presented in this paper. It is well known that the Viterbi decoder (VD) is the dominant module

determining the overall power consumption of TCM decoders.

We propose a pre-computation architecture incorporating the T-algorithm for the VD, which can

effectively reduce the power consumption without degrading the decoding speed much. A general

solution to derive the optimal pre-computation steps is also given in the paper.

Implementation results of a VD for a rate-3/4 convolutional code used in a TCM system show that,

compared with the full trellis VD, the precomputation architecture reduces the power consumption by

as much as 70% without performance loss, while the degradation in clock speed is negligible.
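
The T-algorithm mentioned in this abstract prunes the trellis by keeping only the states whose path metric lies within a threshold T of the best metric. The sketch below shows that pruning step alone, assuming smaller metrics are better; it does not model the pre-computation architecture, and the function name and data are hypothetical.

```python
def t_algorithm_prune(path_metrics, threshold):
    """Keep only the states whose path metric is within threshold of the smallest metric."""
    best = min(path_metrics.values())
    return {state: pm for state, pm in path_metrics.items() if pm - best <= threshold}


# Hypothetical path metrics for a four-state trellis at one decoding step.
metrics = {0: 3.0, 1: 9.5, 2: 4.2, 3: 12.0}
survivors = t_algorithm_prune(metrics, threshold=2.0)
```

Fewer surviving states means fewer add-compare-select operations in the next step, which is where the power saving comes from.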

PVT VARIATION TOLERANT CURRENT SOURCE WITH ON-CHIP DIGITAL SELF-CALIBRATION



ABSTRACT

A current source with a small current error has been proposed to maintain the bandwidth of the system

without an increase in power consumption for a margin. It minimizes the current error under process,

supply voltage, and temperature (PVT) variations.

Because the on-resistance of the nMOS array is self-calibrated digitally by an on-chip digital PVT

detector, a current error of only ±2% is achieved.

The current source has been implemented in an 80-nm CMOS process, occupies 0.018 mm² and

consumes 94.9 µW at a supply voltage of 1.0 V.

LOW-COMPLEXITY SEQUENTIAL SEARCHER FOR ROBUST SYMBOL SYNCHRONIZATION IN OFDM SYSTEMS



ABSTRACT

Based on the frequency-domain analog-to-digital conversion (FD ADC), this work builds a low-complexity

sequential searcher for robust symbol synchronization in a 4 × 4 FD multiple-input multiple-output

orthogonal frequency-division multiplexing (MIMO-OFDM) modem.

The proposed scheme adopts a symbol-rate sequential search with a simple cross-correlation metric to

recover symbol timing over the frequency domain. Simulation results show that the detection error is

less than 2% at signal-to-noise ratio (SNR) ≤ 5 dB. Performance loss is not significant when carrier

frequency offset (CFO) ≤ 100 ppm.

Using an in-house 65-nm CMOS technology, the proposed solution occupies 84.881 k gates and

consumes 5.2 mW at 1.0 V supply voltage. This work makes the FD ADC more attractive for adoption in

high-throughput OFDM systems.

AN AUTONOMOUS VECTOR/SCALAR FLOATING POINT COPROCESSOR FOR FPGAS

ABSTRACT

We present a Floating Point Vector Coprocessor that works with the Xilinx embedded processors. The

FPVC is completely autonomous from the embedded processor, exploiting parallelism and exhibiting

greater speedup than alternative vector processors.

The FPVC supports scalar computation so that loops can be executed independently of the main

embedded processor. Floating point addition, multiplication, division and square root are implemented

with the Northeastern University VFLOAT library.

The FPVC is parameterized so that the number of vector lanes and maximum vector length can be easily

modified. We have implemented the FPVC on a Xilinx Virtex 5 connected via the Processor Local Bus

(PLB) to the embedded PowerPC. Our results show more than five times improved performance over

the PowerPC augmented with the Xilinx Floating Point Unit on applications from linear algebra: QR and

Cholesky decomposition.

BUILDING AN AMBA AHB COMPLIANT MEMORY CONTROLLER

ABSTRACT

Microprocessor performance has improved rapidly in recent years. In contrast, memory latencies and

bandwidths have improved little. As a result, memory access time has become a bottleneck that limits

system performance.

A memory controller (MC) is designed and built to attack this problem. The memory controller is the

part of the system that controls the memory. It is normally integrated into the system chipset.

This paper shows how to build an Advanced Microcontroller Bus Architecture (AMBA) compliant MC as

an Advanced High-performance Bus (AHB) slave.

The MC is designed for system memory control with the main memory consisting of SRAM and ROM.

Additionally, the problems met in the design process are discussed and the solutions are given in the

paper.

4 BIT SFQ MULTIPLIER

BASED ON BOOTH ENCODER

ABSTRACT

We have designed a 2-bit Booth encoder with Josephson Transmission Lines (JTLs) and Passive

Transmission Lines (PTLs) by using cell-based techniques and tools. The Booth encoding method is one

of the algorithms to obtain partial products.

With this method, the number of partial products is halved compared with the AND array method. We

have fabricated a test chip for a multiplier with a 2-bit Booth encoder with JTLs and PTLs. It has a

processing frequency of 20 GHz with a bias margin of ±25%.

The frequency of this circuit rises to 45 GHz when the bias voltage is increased by 25% above the

design voltage. The circuit area of the multiplier designed with the Booth encoder method is compared

to that designed with the AND array method.
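
The halving of partial products can be seen in a software model of Booth recoding. The sketch below assumes that the 2-bit Booth encoder corresponds to standard radix-4 Booth recoding of a two's-complement multiplier; it is an arithmetic illustration only and says nothing about the SFQ circuit, and the function names are assumptions.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier of even width into digits in {-2, ..., 2}."""
    assert bits % 2 == 0
    y = multiplier & ((1 << bits) - 1)
    digits, prev = [], 0                     # prev is the implicit bit below the LSB
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)    # one digit per pair of multiplier bits
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Return (product, number_of_partial_products) using radix-4 Booth digits."""
    digits = booth_radix4_digits(multiplier, bits)
    partial_products = [(d * multiplicand) << (2 * k) for k, d in enumerate(digits)]
    return sum(partial_products), len(partial_products)
```

For a 4-bit multiplier this produces two partial products, whereas an AND array produces one per multiplier bit, that is, four.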

HIGH-ACCURACY FIXED-WIDTH MODIFIED BOOTH MULTIPLIERS FOR LOSSY APPLICATIONS

ABSTRACT

The fixed-width multiplier is attractive for many multimedia and digital signal processing systems in

which it is desirable to maintain a fixed format and a small accuracy loss in the output data is

acceptable. This paper presents the design of high-accuracy fixed-width modified Booth multipliers.

To reduce the truncation error, we first slightly modify the partial product matrix of Booth multiplication

and then derive an effective error compensation function that makes the error distribution more

symmetric about, and more concentrated around, zero error, giving the fixed-width modified Booth

multiplier very small mean and mean-square errors.

In addition, a simple compensation circuit mainly composed of the simplified sorting network is also

proposed. Compared to the previous circuits, the proposed error compensation circuit can achieve a tiny

mean error and a significant reduction in mean-square error (e.g., at least 12.3% reduction for the 16-bit

fixed-width multiplier) while maintaining approximately the same hardware overhead.

Furthermore, experimental results on two real-life applications also demonstrate that the proposed

fixed-width multipliers can improve the average peak signal-to-noise ratio of output images by at least

2.0 dB and 1.1 dB, respectively.

EFFICIENT WEIGHTED MODULO 2^n+1 ADDERS BY PARTITIONED PARALLEL-PREFIX COMPUTATION AND

ENHANCED CIRCULAR CARRY GENERATION

ABSTRACT

In this paper, we propose a low complexity design of a weighted modulo 2^n+1 adder, derived by

decomposition of the parallel-prefix computation into several blocks of smaller input bit-widths.

In addition, we propose a novel enhanced circular carry generation (ECCG) unit to process the carry

bits produced by all the parallel-prefix computation units (of small input bit-widths) to obtain the final

modulo sum efficiently in terms of area-delay product.

We have implemented the proposed adders using 0.13 µm CMOS technology, and from the synthesis

results we find that our proposed adder outperforms the previously reported weighted modulo 2^n+1

adders. It offers a saving in area-delay product of up to 49% over the existing methods.

DESIGN AND CHARACTERIZATION OF PARALLEL PREFIX ADDERS USING FPGAS

ABSTRACT

Parallel-prefix adders (also known as carry-tree adders) are known to have the best performance in VLSI

designs. However, this performance advantage does not translate directly into FPGA implementations

due to constraints on logic block configurations and routing overhead.

This paper investigates three types of carry-tree adders (the Kogge-Stone, sparse Kogge-Stone, and

spanning tree adder) and compares them to the simple Ripple Carry Adder (RCA) and Carry Skip Adder

(CSA).

These designs of varied bit-widths were implemented on a Xilinx Spartan 3E FPGA and delay

measurements were made with a high-performance logic analyzer. Due to the presence of a fast carry-

chain, the RCA designs exhibit better delay performance up to 128 bits.

The carry-tree adders are expected to have a speed advantage over the RCA as bit widths approach 256.
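
As background for the comparison above, the sketch below models a Kogge-Stone adder in software: generate and propagate signals are combined over distances 1, 2, 4, ... so that every carry is available after log2(width) prefix levels. It is a behavioral model for illustration, not the FPGA design measured in the paper, and the function name is an assumption.

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned integers with a Kogge-Stone prefix network; returns (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]   # bit generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]   # bit propagate
    G, P = g[:], p[:]
    distance = 1
    while distance < width:
        new_G, new_P = G[:], P[:]
        for i in range(distance, width):
            # Combine the group ending at bit i with the group ending at bit i - distance.
            new_G[i] = G[i] | (P[i] & G[i - distance])
            new_P[i] = P[i] & P[i - distance]
        G, P = new_G, new_P
        distance *= 2
    carries = [0] + G[:-1]     # carry into bit i is the group generate of bits 0..i-1
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, G[-1]
```

The network needs only four prefix levels for 16 bits, but it uses many more cells and wires than a ripple carry chain, which is consistent with the routing overhead the abstract mentions for FPGAs.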

HIGH SPEED ASIC DESIGN OF COMPLEX MULTIPLIER USING VEDIC MATHEMATICS

ABSTRACT

Vedic Mathematics is the ancient methodology of Indian mathematics which has a unique technique of

calculations based on 16 Sutras (Formulae). A high speed complex multiplier design (ASIC) using Vedic

Mathematics is presented in this paper.

The idea for designing the multiplier and adder/subtractor unit is adopted from ancient Indian

mathematics “Vedas”. By means of those formulas, the partial products and sums are generated in

one step, which reduces the carry propagation from LSB to MSB.

The implementation of Vedic mathematics and its application to the complex multiplier ensure a

substantial reduction of propagation delay in comparison with the DA based architecture and the

parallel adder based implementation, which are the most commonly used architectures.

The functionality of these circuits was checked, and performance parameters such as propagation

delay and dynamic power consumption were calculated with Spectre (SPICE) using standard 90 nm

CMOS technology.

The propagation delay of the resulting (16, 16)×(16, 16) complex multiplier is only 4 ns, and it

consumes 6.5 mW of power. We achieved almost 25% improvement in speed over earlier reported

complex multipliers, e.g. parallel adder and DA based architectures.

A LIGHTWEIGHT HIGH-PERFORMANCE FAULT DETECTION SCHEME FOR THE ADVANCED ENCRYPTION

STANDARD USING COMPOSITE FIELDS

ABSTRACT

The faults that accidentally or maliciously occur in the hardware implementations of the Advanced

Encryption Standard (AES) may cause erroneous encrypted/decrypted output. The use of appropriate

fault detection schemes for the AES makes it robust to internal defects and fault attacks.

In this paper, we present a lightweight concurrent fault detection scheme for the AES. In the proposed

approach, the composite field S-box and inverse S-box are divided into blocks and the predicted parities

of these blocks are obtained.

Through exhaustive searches among all available composite fields, we have found the optimum

solutions for the least overhead parity-based fault detection structures. Moreover, through our error

injection simulations for one S-box (respectively inverse S-box), we show that the total error coverage of

almost 100% for 16 S-boxes (respectively inverse S-boxes) can be achieved.

Finally, it is shown that both the application-specific integrated circuit and field-programmable gate-

array implementations of the fault detection structures using the obtained optimum composite fields,

have better hardware and time complexities compared to their counterparts.

IMPLEMENTATION AND PERFORMANCE ANALYSIS OF SEAL ENCRYPTION ON FPGA, GPU AND MULTI-

CORE PROCESSORS

ABSTRACT

Accelerators, such as field programmable gate arrays (FPGAs) and graphics processing units (GPUs), are

special purpose processors designed to speed up compute-intensive sections of applications. FPGAs are

highly customizable, while GPUs provide massive parallel execution resources and high memory

bandwidth.

In this paper, we compare the performance of these architectures, presenting a performance study of

SEAL, a fast, software-oriented encryption algorithm on a Virtex-6 FPGA, a Graphics Processor Unit

(GPU), and Intel Core i7, a 2-way hyper-threaded, 4-core processor.

We show that each platform has relative competitive advantages in encrypting an input plaintext using

SEAL.

ON THE TRANSMISSION METHOD FOR SHORT RANGE MIMO COMMUNICATIONS

ABSTRACT

This paper investigates a transmission scheme that is suitable for short-range multiple-input-multiple-

output (MIMO) transmission. Since the distance between two array antennas that face each other is

comparable with the size of the array antenna aperture in short-range MIMO, the propagation

characteristics are greatly different from those in conventional MIMO.

Unlike conventional MIMO, the optimal element spacing, which maximizes channel capacity, exists in

short-range MIMO. Moreover, the channel capacity with optimal antenna spacing exceeds the ergodic

capacity of independent identically distributed (i.i.d.) channels since optimal eigenvalue distribution,

which can maximize channel capacity, is obtained in the short-range MIMO.

In this paper, we focus on the actual transmission methods, because complex transmission schemes

such as eigenmode transmission or maximum-likelihood detection are required to obtain ideal channel

capacity. We clarify that the channel capacity obtained by zero forcing (ZF) at the receiver without

beamforming at the transmitter is almost the same as that using eigenmode transmission when

considering the optimal element spacing.

The effectiveness of short-range MIMO communication is also clarified using a 4 × 4 MIMO testbed with

actual signals based on the IEEE 802.11n standard. Simulated and measured results show that optimal

element spacing is a key parameter in the short-range MIMO communication. We found that designing

antenna arrays with optimal element spacing is a very effective approach to achieving a simple

hardware configuration.

DESIGN AND IMPLEMENTATION OF CORDIC PROCESSOR FOR COMPLEX DPLL

ABSTRACT

Nowadays, various Digital Signal Processing systems are implemented on a platform of programmable

signal processors or on application specific VLSI chips. The Coordinate Rotation Digital Computer

(CORDIC) algorithm has turned out to be well suited to this kind of programmable signal processor.

In recent times, it has been a widely researched topic in the field of vector rotated Digital Signal

Processing (DSP) applications due to its simplicity. This paper presents the design of a pipelined

architecture for the coordinate rotation algorithm for the computation of the loop performance of a

complex Digital Phase Locked Loop (DPLL) in an in-phase and quadrature channel receiver.

The design of CORDIC in the vector rotation mode results in high system throughput due to its pipelined

architecture, where latency is reduced in each of the pipelined stages.

For on-chip application, the area reduction in the proposed design is achieved through optimization of

the number of micro rotations. For better loop performance of the first order complex DPLL and to

minimize quantization error, the number of iterations is also optimized.
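
The micro rotations mentioned above can be sketched with a floating-point model of CORDIC in rotation mode. Hardware would use fixed-point shifts and adds with one pipeline stage per micro rotation; the floating-point arithmetic, the function name, and the default of 16 iterations are assumptions made for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by angle radians (roughly |angle| <= 1.74) using CORDIC micro rotations."""
    # Each micro rotation stretches the vector; the accumulated gain is divided out at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0                        # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)                      # table of arctan(2^-i) in hardware
    return x / gain, y / gain


cos_30, sin_30 = cordic_rotate(1.0, 0.0, math.pi / 6)
```

Each extra iteration adds roughly one bit of angular accuracy at the cost of one more stage, which is the trade-off behind optimizing the number of micro rotations.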

DIRECT DIGITAL FREQUENCY SYNTHESIZER USING NONUNIFORM PIECEWISE-LINEAR APPROXIMATION

ABSTRACT

This paper investigates a novel direct digital frequency synthesizer architecture, based on piecewise

linear approximation with segments of nonuniform length.

The new approach allows reducing the total number of segments with respect to the well-known

uniform segmentation. In this way the size of the coefficient ROM is also reduced with beneficial effects

in terms of speed and power.

We show that the optimal nonuniform segmentation (that maximizes the spurious-free dynamic range

for a given number of nonuniform segments) can be obtained as the solution of a mixed-integer linear

programming problem.

Three simple, suboptimal, nonuniform segmentation schemes (which lend themselves to efficient

hardware implementation) are proposed in this paper. We also present several design examples and

VLSI implementation results, which demonstrate the effectiveness of the developed technique.

A ROTATION-BASED BIST WITH SELF-FEEDBACK LOGIC TO ACHIEVE COMPLETE FAULT COVERAGE

ABSTRACT

This paper presents a deterministic BIST technique that can efficiently achieve complete fault coverage

without using any storage devices. A novel test structure containing a self-feedback logic unit and a

circular shift register is proposed by which all the required deterministic patterns can be generated on-

chip in real time.

Experiments on ISCAS 85 benchmark circuits show that compared with previous work addressing the

same problem our technique requires much less test time to achieve 100% fault coverage for all testable

stuck-at faults.

TECHNIQUE OF LFSR BASED TEST GENERATOR SYNTHESIS FOR DETERMINISTIC AND PSEUDORANDOM

TESTING

ABSTRACT

The structure of a test system based on the application of built-in self-test (BIST) circuitry is proposed.

The main idea is to minimize hardware overhead, and the work deals with the automation of BIST

circuitry generation.

A test generator based on a linear feedback shift register (LFSR) provides two types of testing:

pseudorandom and deterministic. The proposed modified Berlekamp-Massey algorithm is used to

generate the LFSR polynomial coefficients.

Experimental results of applying the technique to some ISCAS'89 benchmark circuits are shown.
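
The pseudorandom mode of such a generator can be sketched with a small Fibonacci LFSR. The example covers only pattern generation; the modified Berlekamp-Massey synthesis of the polynomial is not modeled. The 4-bit width, the taps (4, 3), and the function name are assumptions chosen to keep the example small.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield count successive states of a Fibonacci LFSR (taps are 1-based bit positions)."""
    state = seed
    mask = (1 << width) - 1
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1           # XOR of the tapped bits
        state = ((state << 1) | feedback) & mask         # shift left, feed back into the LSB


# Taps (4, 3) correspond to the primitive polynomial x^4 + x^3 + 1.
patterns = list(lfsr_patterns(seed=0b0001, taps=(4, 3), width=4, count=16))
```

With a primitive polynomial, the register cycles through all 15 nonzero states before repeating, so a 4-bit circuit input sees every pattern except all zeros.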

TASK MIGRATION IN MESH NOCS OVER VIRTUAL POINT-TO-POINT CONNECTIONS

Processor allocation in today's many-core MPSoCs is a challenging task, especially since the order and

requirements of incoming applications are unknown at the design stage. To improve network

performance, balance the workload across processing cores, or mitigate the effect of hot processing

elements in thermal management methodologies, task migration is a method that has attracted much

attention in recent years.

Runtime task migration was first proposed for multicomputers, with load balancing as the major

objective. However, specific NoC properties such as a limited amount of communication buffers, greater

sensitivity to implementation complexity, and tight latency and power consumption constraints bring

new challenges in using task migration mechanisms in NoCs.

As a consequence, the efficiency and applicability of traditional migration mechanisms (developed for

multicomputers) are in question. Given the limited resource budget in NoC-based MPSoCs as well as

the tight performance constraints of running applications, in this paper we propose an efficient

methodology based on virtual point-to-point (VIP for short) connections.

These dedicated VIP connections provide low-latency and low-power paths for the heavy communication

flows created by task migration mechanisms. Analysis of the results shows that the proposed scheme

reduces message latency by 13% and migration latency by 14%, while 10% power savings can be

achieved compared to the previously proposed task migration strategy (known as

Gathering-Routing-Scattering) for mesh multiprocessors.
