DETECTING BACKGROUND SETTING
FOR DYNAMIC SCENE
ABSTRACT
Real-time processing of image sequences is now possible because of technological advances in digital signal processing, wide-band communication, and high-performance VLSI.
With developments in video technology, a surveillance system can be built from low-cost devices such as a web camera. With rising crime rates, people need security and safety, and video surveillance has become an important means of countering the threats of crime and terrorism.
The most fundamental part of surveillance is foreground detection, that is, the retrieval of an object of interest. The object of interest can be obtained with the common background subtraction technique. However, this technique has a problem: because the light source varies, the background constantly changes.
The intensity of each pixel changes while object detection takes place. These intensity changes lead to improper foreground detection, in which the background is detected as a foreground object.
This paper proposes a method to model and update the background of the scene by an intersection solving method.
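The abstract does not detail the intersection solving method, so, purely as a point of comparison, the sketch below shows a common baseline: a running-average background model with selective update, in which pixels classified as background slowly absorb illumination drift. The function name `update_background` and the parameters `alpha` (learning rate) and `tau` (foreground threshold) are hypothetical choices, not taken from the paper.

```python
# Baseline running-average background subtraction (NOT the paper's
# intersection solving method). Names and default values are hypothetical.

def update_background(bg, frame, alpha=0.05, tau=25):
    """Return (new_background, foreground_mask) for one grayscale frame.

    bg and frame are equally sized 2-D lists of pixel intensities.
    """
    new_bg, mask = [], []
    for brow, frow in zip(bg, frame):
        nrow, mrow = [], []
        for b, f in zip(brow, frow):
            is_fg = abs(f - b) > tau          # large deviation -> foreground
            mrow.append(is_fg)
            # Selective update: only background pixels adapt, so gradual
            # lighting changes are absorbed but foreground objects are not.
            nrow.append(b if is_fg else (1 - alpha) * b + alpha * f)
        new_bg.append(nrow)
        mask.append(mrow)
    return new_bg, mask


if __name__ == "__main__":
    bg = [[100.0] * 3 for _ in range(3)]
    frame = [[100] * 3 for _ in range(3)]
    frame[1][1] = 200                         # an object enters the scene
    bg, mask = update_background(bg, frame)
    print(mask[1][1], mask[0][0])             # True False
```

The baseline also shows the weakness the abstract describes: a sudden lighting change larger than `tau` marks every pixel as foreground and is never absorbed, which is the case a more robust background model has to handle.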
LOSSLESS IMPLEMENTATION OF
DAUBECHIES 8-TAP WAVELET TRANSFORM
ABSTRACT
A new mapping scheme and its hardware implementation for error-free computation of the Daubechies 8-tap wavelet transform are presented. The multidimensional technique maps the irrational transform basis coefficients to integers, which considerably reduces hardware and power consumption.
When implemented on a Xilinx FPGA, the scheme uses 518 logic cells and 186 registers and runs at 71 MHz. Compared with a finite-precision architecture, the proposed scheme reduces hardware by 15% and power consumption by 41% for similar image reconstruction, and noticeably improves image reconstruction quality.
PERFORMANCE ANALYSIS OF INTEGER WAVELET TRANSFORM FOR IMAGE COMPRESSION
ABSTRACT
For image compression, the selected transform should reduce the size of the resultant data compared with the original data set. In this paper, a new lossless image compression method is proposed.
For the continuous- and discrete-time cases, the wavelet transform and the wavelet packet transform have emerged as popular techniques. While the integer wavelet transform using the lifting scheme significantly reduces computation time, we propose a completely new approach for further speeding up the computation.
First, the wavelet packet transform (WPT) and the lifting scheme (LS) are described. Then an application of the LS to the WPT is presented, which leads to the integer wavelet packet transform (IWPT).
The proposed IWPT yields a representation that can be lossless, as it maps an integer-valued sequence onto integer-valued coefficients. The wavelet packet tree is used to transform still and color images.
An IWPT tree can be built by iterating the single wavelet decomposition step on both the low-pass and high-pass branches, with rounding to obtain integer transforms. The proposed method provides a good compression ratio.
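The integer-to-integer property comes from rounding inside each lifting step. As a minimal sketch, assuming the simplest integer wavelet (the Haar-based S-transform) rather than whatever filter the paper uses, the code below applies one predict/update lifting step with floor rounding and iterates it on both the low-pass and high-pass branches to build a packet tree; the inverse undoes the steps exactly. All function names are hypothetical.

```python
# Integer wavelet packet transform sketch using Haar S-transform lifting
# steps (an assumed choice of wavelet, for illustration only).
# Python's >> on negative ints floors, like a hardware arithmetic shift.

def lift_forward(x):
    """One integer lifting step: returns (low-pass s, high-pass d)."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]            # predict
    s = [e + (di >> 1) for e, di in zip(even, d)]     # update, floor(d/2)
    return s, d


def lift_inverse(s, d):
    """Exactly undo lift_forward using the same integer rounding."""
    even = [si - (di >> 1) for si, di in zip(s, d)]
    odd = [di + e for di, e in zip(d, even)]
    x = []
    for e, o in zip(even, odd):
        x.extend((e, o))
    return x


def iwpt_forward(x, levels):
    """Packet tree: split BOTH branches at every level.

    A node is a (low, high) tuple; a leaf is a list of integers.
    """
    if levels == 0:
        return list(x)
    if len(x) % 2:
        raise ValueError("length must be divisible by 2**levels")
    s, d = lift_forward(x)
    return (iwpt_forward(s, levels - 1), iwpt_forward(d, levels - 1))


def iwpt_inverse(tree):
    if isinstance(tree, tuple):
        return lift_inverse(iwpt_inverse(tree[0]), iwpt_inverse(tree[1]))
    return list(tree)


if __name__ == "__main__":
    signal = [12, 15, 14, 10, 9, 9, 30, 31]
    tree = iwpt_forward(signal, levels=2)
    print(tree)
    assert iwpt_inverse(tree) == signal               # lossless round trip
```

Splitting the high-pass branch as well as the low-pass one is what distinguishes the packet tree from an ordinary dyadic wavelet decomposition.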
A MEDIAN FILTER FPGA WITH HARVARD ARCHITECTURE
ABSTRACT
To improve the speed of image processing chips, gain market share quickly, and reduce costs, this paper designs a chip using the Harvard architecture on an FPGA.
The chip also uses a new hardware algorithm. With this chip, the processing time is 13.2? less than that of a chip with the Von Neumann architecture. The filter units use 13% of the total FPGA gates, which is less than the portion claimed for the multi-image processing chip.
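The abstract does not describe the new hardware algorithm, so the sketch below shows one well-known hardware-friendly way to compute a 3×3 median with a fixed comparator structure: sort each row of the window, then take the median of (maximum of the row minima, median of the row medians, minimum of the row maxima). This is an assumed illustration, not the paper's design; copying border pixels unchanged is also an assumption.

```python
# 3x3 median via a fixed compare/select structure (illustrative only).

def med3(a, b, c):
    """Median of three values using only min/max (comparator) operations."""
    return max(min(a, b), min(max(a, b), c))


def median9(window):
    """Exact median of a 3x3 window given as three rows of three values."""
    rows = [sorted(r) for r in window]        # one 3-input sorter per row
    lows = [r[0] for r in rows]
    mids = [r[1] for r in rows]
    highs = [r[2] for r in rows]
    return med3(max(lows), med3(*mids), min(highs))


def median_filter3x3(img):
    """Filter a 2-D list; border pixels are copied unchanged (assumption)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)]
            out[r][c] = median9(window)
    return out


if __name__ == "__main__":
    img = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
    print(median_filter3x3(img)[1][1])        # 10
```

Because every step is a fixed min/max, the structure maps to a comparator network with no data-dependent control, which is why it is popular in FPGA filters.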
AUTOMATIC ROAD EXTRACTION USING HIGH RESOLUTION SATELLITE IMAGES BASED ON LEVEL SET AND
MEAN SHIFT METHODS
ABSTRACT
Analysis of high-resolution satellite images has been an important research topic for urban analysis. One of the important tasks in urban analysis is automatic road network extraction.
Two approaches for road extraction, based on the Level Set and Mean Shift methods, are proposed.
Extracting roads from an original image is difficult and computationally expensive because of other road-like features with straight edges. The image is preprocessed to improve tolerance by reducing noise (buildings, parking lots, vegetation regions, and other open spaces), and roads are first extracted as elongated regions; non-linear noise segments are then removed using a median filter (based on the fact that road networks consist of a large number of small linear structures). Road extraction is then performed using the Level Set and Mean Shift methods.
Finally, the accuracy of the extracted roads is evaluated using quality measures. 1 m resolution IKONOS data were used for the experiment.
A NEW ADAPTIVE WEIGHT ALGORITHM FOR SALT AND PEPPER NOISE REMOVAL
ABSTRACT
A new adaptive weight algorithm is developed for removing salt and pepper noise. It consists of two major steps: first, noise pixels are detected according to the correlations between image pixels; then, different methods are applied depending on the noise level.
For low noise levels, the mean of the neighborhood signal pixels is used to remove the noise; for high noise levels, an adaptive weight algorithm is used.
Experiments show that the proposed algorithm has advantages over regularizing methods in terms of both edge preservation and noise removal; even for heavily contaminated images with noise levels as high as 90%, it still achieves significant performance.
REMOVAL OF HIGH DENSITY SALT AND PEPPER NOISE THROUGH MODIFIED DECISION BASED
UNSYMMETRIC TRIMMED MEDIAN FILTER
ABSTRACT
A modified decision-based unsymmetric trimmed median filter algorithm for restoring grayscale and color images that are highly corrupted by salt and pepper noise is proposed in this paper.
The proposed algorithm replaces a noisy pixel with the trimmed median value when pixel values other than 0 and 255 are present in the selected window; when all pixel values in the window are 0s and 255s, the noisy pixel is replaced by the mean value of all elements in the selected window.
The proposed algorithm shows better results than the Standard Median Filter (MF), Decision Based Algorithm (DBA), Modified Decision Based Algorithm (MDBA), and Progressive Switched Median Filter (PSMF).
Tested on different grayscale and color images, the proposed algorithm gives better Peak Signal-to-Noise Ratio (PSNR) and Image Enhancement Factor (IEF).
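The replacement rule described above can be sketched directly for a grayscale image. Assumptions not stated in the abstract: a fixed 3×3 window, replicated borders, windows read from the noisy input rather than from already-restored pixels, and only the values 0 and 255 treated as noise candidates. The function name `mdbutmf` is hypothetical.

```python
from statistics import median

# Sketch of the decision-based unsymmetric trimmed median rule.
# Window size, border handling and the function name are assumptions.

def mdbutmf(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            if img[r][c] not in (0, 255):
                continue                      # noise-free pixel: unchanged
            # 3x3 window with replicated borders, read from the noisy input.
            window = [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            trimmed = [v for v in window if v not in (0, 255)]
            if trimmed:
                out[r][c] = round(median(trimmed))          # trimmed median
            else:
                out[r][c] = round(sum(window) / len(window))  # all 0s/255s
    return out


if __name__ == "__main__":
    img = [[10, 20, 30], [40, 255, 60], [70, 80, 0]]
    print(mdbutmf(img)[1][1])                 # 40
```

The mean fallback matters only at very high noise densities, where a window can contain nothing but 0s and 255s and a trimmed median would be undefined.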
OPERATION IMPROVEMENT OF INDOOR ROBOT BY GESTURE RECOGNITION
ABSTRACT
Recently, the demand for indoor robots has increased, and with it the opportunities for many people to operate them. However, many people find it difficult to operate a robot using conventional methods such as a remote control.
To solve this problem, we propose a robot operation system using hand gesture recognition. Our method focuses on the direction and movement of the hand, and we were able to recognize several gestures in real time.
ADIABATIC TECHNIQUE FOR ENERGY EFFICIENT LOGIC CIRCUITS DESIGN
ABSTRACT
Energy dissipation in conventional CMOS circuits can be minimized through the adiabatic technique. With this technique, dissipation in the PMOS network can be minimized, and some of the energy stored at the load capacitance can be recycled instead of being dissipated as heat.
However, the adiabatic technique is highly dependent on parameter variations. Using TSPICE simulations, the energy consumption is analyzed under parameter variation.
In the analysis, two logic families, ECRL (Efficient Charge Recovery Logic) and PFAL (Positive Feedback Adiabatic Logic), are compared with conventional CMOS logic for inverter and 2:1 multiplexer circuits. It is found that the adiabatic technique is a good choice for low-power applications in a specified frequency range.
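Why the saving depends on frequency can be seen from the usual first-order model (a textbook approximation, not a result from this paper). When a load capacitance C_L is charged abruptly to V_DD through the PMOS network, half of the energy drawn from the supply is dissipated; when it is charged by a slow ramp of duration T through an effective resistance R, the loss falls roughly in proportion to R C_L / T:

```latex
E_{\text{conv}} = \tfrac{1}{2}\, C_L V_{DD}^{2}
\qquad
E_{\text{adiabatic}} \approx \frac{R\, C_L}{T}\, C_L V_{DD}^{2},
\quad T \gg R\, C_L
```

Since the available ramp time T shrinks as the clock frequency rises, the advantage holds only below some operating frequency, consistent with the conclusion that adiabatic logic suits a specified frequency range.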
DESIGN AND FPGA IMPLEMENTATION OF MODIFIED DISTRIBUTIVE ARITHMETIC BASED DWT-IDWT
PROCESSOR FOR IMAGE COMPRESSION
ABSTRACT
Image compression is one of the major image processing techniques and is widely used in medical, automotive, consumer, and military applications. The discrete wavelet transform (DWT) is the most popular transformation technique adopted for image compression.
The complexity of the DWT is high because of the large number of arithmetic operations. In this work, a modified Distributive Arithmetic (DA) based DWT architecture is proposed and implemented on an FPGA. The modified approach occupies 6% of the area of a Virtex-II Pro FPGA and operates at 134 MHz.
The modified DA-DWT architecture has a latency of 44 clock cycles and a throughput of 4 clock cycles. This design is twice as fast as the reference design and is thus suitable for applications that require high-speed image processing.
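Distributed arithmetic removes the multipliers from a filter tap sum by precomputing every possible sum of coefficients in a lookup table and processing the inputs one bit-slice at a time. A minimal bit-serial sketch, assuming integer (pre-scaled) coefficients and two's-complement inputs; the paper's modified DA structure is not reproduced, and the coefficient values in the example are hypothetical.

```python
# Bit-serial distributed-arithmetic inner product (illustrative sketch).

def build_da_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[k] for every set bit k of addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs, xs, bits):
    """Compute sum(c * x) with a LUT; xs are `bits`-bit two's-complement."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):                       # one bit-slice per cycle
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k         # bit b of every input
        term = lut[addr] << b                   # shift-accumulate
        # In two's complement the MSB slice carries negative weight.
        acc += -term if b == bits - 1 else term
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]      # hypothetical integer-scaled filter taps
    xs = [10, -20, 33, -1]
    print(da_inner_product(coeffs, xs, bits=8))   # 359
```

The LUT grows as 2^K for K taps, which is why practical DA designs split long filters into several smaller tables.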
AN FPGA-BASED ARCHITECTURE FOR LINEAR AND MORPHOLOGICAL IMAGE FILTERING
ABSTRACT
Field Programmable Gate Array (FPGA) technology has become a viable target for implementing real-time algorithms suited to video image processing applications.
The unique architecture of the FPGA has allowed the technology to be used in many applications encompassing all aspects of video image processing. Among these algorithms, linear filtering based on 2D convolution and non-linear 2D morphological filters represent a basic set of image operations for many applications.
In this work, an implementation of linear and morphological image filtering on a NexysII board (Xilinx Spartan 3E FPGA) for educational purposes is presented. The system is connected to a USB port of a personal computer, and together they form a powerful, low-cost design station.
The FPGA-based system is accessed through a MATLAB graphical user interface, which handles the communication setup. Results obtained from MATLAB simulations are compared with those of the FPGA-based implementation.
DESIGN OF A LOW POWER FLIP-FLOP USING CMOS DEEP SUBMICRON TECHNOLOGY
ABSTRACT
This paper presents a low-power, high-speed flip-flop design with fewer transistors and only one transistor clocked by a short pulse train: the true single-phase clocking (TSPC) flip-flop.
Compared with a conventional flip-flop, it has 5 transistors, only one of which is clocked, and thus has a smaller size and lower power consumption. It can be used in applications such as digital VLSI clocking systems, buffers, registers, and microprocessors.
Power dissipation and propagation delay of various flip-flops and latches are analyzed at 0.13 µm and 0.35 µm technologies. Leakage power increases as technology is scaled down; it is reduced using MTCMOS, the best among the run-time techniques.
A comparison of conventional flip-flops, latches, and the TSPC flip-flop in terms of power consumption, propagation delay, and power-delay product, based on SPICE simulation results, is presented.
LOW-POWER AND AREA-EFFICIENT
CARRY SELECT ADDER
ABSTRACT
The Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. The structure of the CSLA shows that there is scope for reducing its area and power consumption.
This work uses a simple and efficient gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, 8-, 16-, 32-, and 64-bit square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture.
The proposed design has reduced area and power compared with the regular SQRT CSLA, with only a slight increase in delay. This work evaluates the performance of the proposed designs in terms of delay, area, power, and their products, by hand with logical effort and through custom design and layout in a 0.18-µm CMOS process technology. The analysis shows that the proposed CSLA structure is better than the regular SQRT CSLA.
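The carry-select principle is easy to model at the bit level: each block after the first computes two candidate results, one assuming a carry-in of 0 and one assuming 1, and a multiplexer picks the right one once the real carry arrives. The sketch uses a 2-2-3-4-5 block partition for a 16-bit SQRT CSLA (an assumption here). The abstract does not say what the gate-level modification is; the hypothetical `use_increment` option shows one widely cited alternative, deriving the carry-in-1 result by incrementing the carry-in-0 result (a binary-to-excess-1 style converter), and should be read as an assumption, not as the paper's circuit.

```python
# Behavioral square-root carry-select adder (illustrative sketch).

def ripple_carry(a, b, cin, width):
    """Bit-level ripple-carry adder: returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def carry_select_add(a, b, block_sizes, use_increment=False):
    total, carry, offset = 0, 0, 0
    for k, w in enumerate(block_sizes):
        mask = (1 << w) - 1
        ab, bb = (a >> offset) & mask, (b >> offset) & mask
        if k == 0:
            s, cout = ripple_carry(ab, bb, 0, w)   # carry-in known: 0
        else:
            s0, c0 = ripple_carry(ab, bb, 0, w)    # candidate for cin = 0
            if use_increment:
                inc = ((c0 << w) | s0) + 1         # cin = 1 result by +1
                s1, c1 = inc & mask, inc >> w
            else:
                s1, c1 = ripple_carry(ab, bb, 1, w)  # second RCA, cin = 1
            s, cout = (s1, c1) if carry else (s0, c0)  # the multiplexer
        total |= s << offset
        carry = cout
        offset += w
    return total, carry


if __name__ == "__main__":
    total, cout = carry_select_add(0xBEEF, 0x1234, [2, 2, 3, 4, 5])
    print(hex(total), cout)                        # 0xd123 0
```

Growing block sizes let each block's two candidates finish just as the selecting carry arrives from the previous block, which is what the "square-root" partition achieves.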
A PIPELINE VLSI ARCHITECTURE FOR
HIGH-SPEED COMPUTATION OF
THE 1-D DISCRETE WAVELET TRANSFORM
ABSTRACT
In this paper, a scheme for the design of a high-speed pipeline VLSI architecture for the computation of
the 1-D discrete wavelet transform (DWT) is proposed. The main focus of the scheme is on reducing the
number and period of clock cycles for the DWT computation with little or no overhead on the hardware
resources by maximizing the inter- and intrastage parallelisms of the pipeline.
The interstage parallelism is enhanced by optimally mapping the computational load associated with the
various DWT decomposition levels to the stages of the pipeline and by synchronizing their operations.
The intrastage parallelism is enhanced by decomposing the filtering operation equally into two subtasks
that can be performed independently in parallel and by optimally organizing the bitwise operations for
performing each subtask so that the delay of the critical data path from a partial-product bit to a bit of
the output sample for the filtering operation is minimized.
It is shown that an architecture designed with the proposed scheme requires fewer clock cycles than architectures employing comparable hardware resources. In fact, its hardware resource requirement is also reduced, because fewer registers need to be employed.
Based on the proposed scheme, a specific example of designing an architecture for the DWT
computation is considered. In order to assess the feasibility and the efficiency of the proposed scheme,
the architecture thus designed is simulated and implemented on a field-programmable gate-array
board.
It is seen that the simulation and implementation results conform to the stated goals of the proposed
scheme, thus making the scheme a viable approach for designing a practical and realizable architecture
for real-time DWT computation.
DUAL STACK METHOD: A NOVEL APPROACH
TO LOW LEAKAGE AND SPEED
POWER PRODUCT VLSI DESIGN
ABSTRACT
The development of digital integrated circuits is challenged by higher power consumption. The
combination of higher clock speeds, greater functional integration, and smaller process geometries has
contributed to significant growth in power density. Scaling improves transistor density and functionality on a chip.
Scaling helps increase the speed and frequency of operation and hence performance. As supply voltages scale down with geometries, threshold voltages must also decrease to gain the performance advantages of the new technology, but leakage current then increases exponentially.
Today, leakage power has become an increasingly important issue in processor hardware and software design. In 65 nm and smaller technologies, leakage accounts for 30-40% of processor power.
In this paper, we propose a new dual stack approach for reducing both leakage and dynamic power. Moreover, the novel dual stack approach shows the lowest speed-power product compared with existing methods.
POWER MANAGEMENT OF MIMO NETWORK INTERFACES ON MOBILE SYSTEMS
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ABSTRACT
High-speed wireless network interfaces are among the most power-hungry components on mobile
systems. This is particularly true for multiple-input-multiple-output (MIMO) network interfaces which
use multiple RF chains simultaneously.
In this paper, we present a novel power management solution for MIMO network interfaces on mobile
systems, called antenna management. The key idea is to adaptively disable a subset of antennas and
their RF chains to reduce circuit power consumption, when the capacity improvement of using a large
number of antennas is small. Antenna management judiciously determines the number of active
antennas to minimize energy per bit while satisfying the data rate requirement.
This work provides both a theoretical framework and a system design for antenna management. We first present an algorithm that efficiently solves the problem of minimizing energy per bit, and then offer an 802.11n-compliant system design.
We employ both MATLAB-based simulation and prototype-based experiments to validate the energy efficiency benefit of antenna management. The results show that antenna management can achieve a 21% reduction in one-end energy per bit for the front end of the MIMO network interface, compared with a static MIMO configuration that keeps all antennas active.
HIGH-SPEED LOW-POWER VITERBI DECODER DESIGN FOR TCM DECODERS
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ABSTRACT
High-speed, low-power design of Viterbi decoders for trellis coded modulation (TCM) systems is
presented in this paper. It is well known that the Viterbi decoder (VD) is the dominant module
determining the overall power consumption of TCM decoders.
We propose a pre-computation architecture incorporated with the T-algorithm for the VD, which can effectively reduce power consumption without significantly degrading decoding speed. A general solution to derive the optimal number of pre-computation steps is also given in the paper.
Implementation results of a VD for a rate-3/4 convolutional code used in a TCM system show that, compared with the full-trellis VD, the pre-computation architecture reduces power consumption by as much as 70% without performance loss, while the degradation in clock speed is negligible.
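The T-algorithm saves power by extending only the trellis states whose path metric lies within a threshold T of the best metric at each step. The sketch below shows one add-compare-select (ACS) step followed by pruning for a toy trellis, described as a hypothetical list of (previous state, next state, branch label) transitions; the pre-computation architecture itself is not modeled. Note that pruning needs the minimum path metric first, and in a straightforward implementation that search sits on the critical path.

```python
import math

# One ACS step plus T-algorithm pruning on a toy trellis (illustrative).

def acs_step(metrics, transitions, branch_metric):
    """metrics: {state: path metric}; returns metrics after one step."""
    new = {}
    for prev, nxt, label in transitions:
        if prev not in metrics:
            continue                                  # pruned: not extended
        cand = metrics[prev] + branch_metric(label)   # add
        if cand < new.get(nxt, math.inf):             # compare
            new[nxt] = cand                           # select survivor
    return new


def t_prune(metrics, threshold):
    """Keep only states within `threshold` of the best path metric."""
    best = min(metrics.values())                      # minimum search
    return {s: m for s, m in metrics.items() if m <= best + threshold}


if __name__ == "__main__":
    trans = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    received = 1
    m = acs_step({0: 0.0, 1: 4.0}, trans, lambda lab: abs(lab - received))
    print(m, t_prune(m, 0.5))
```

A smaller threshold prunes more states, and fewer extended states means fewer ACS operations and less switching power, at the cost of possible decoding loss if the correct path is pruned.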
PVT VARIATION TOLERANT CURRENT SOURCE WITH ON-CHIP DIGITAL SELF-CALIBRATION
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ABSTRACT
A current source with a small current error is proposed to maintain the system bandwidth without increasing power consumption to cover a design margin. It minimizes the current error under process, supply voltage, and temperature (PVT) variations.
Because the on-resistance of the nMOS array is self-calibrated digitally by an on-chip digital PVT detector, a current error of only ±2% is achieved.
The current source has been implemented in an 80-nm CMOS process, occupies 0.018 mm², and consumes 94.9 µW at a supply voltage of 1.0 V.
LOW-COMPLEXITY SEQUENTIAL SEARCHER FOR ROBUST SYMBOL SYNCHRONIZATION IN OFDM SYSTEMS
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ABSTRACT
Based on frequency-domain analog-to-digital conversion (FD ADC), this work builds a low-complexity sequential searcher for robust symbol synchronization in a 4 × 4 FD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) modem.
The proposed scheme adopts a symbol-rate sequential search with a simple cross-correlation metric to recover symbol timing in the frequency domain. Simulation results show that the detection error is less than 2% at a signal-to-noise ratio (SNR) ≤ 5 dB. Performance loss is not significant when the carrier frequency offset (CFO) ≤ 100 ppm.
Using an in-house 65-nm CMOS technology, the proposed solution occupies 84.881k gates and consumes 5.2 mW at a 1.0 V supply voltage. This work makes the FD ADC more attractive for adoption in high-throughput OFDM systems.
AN AUTONOMOUS VECTOR/SCALAR FLOATING POINT COPROCESSOR FOR FPGAS
ABSTRACT
We present a Floating Point Vector Coprocessor (FPVC) that works with Xilinx embedded processors. The FPVC is completely autonomous from the embedded processor, exploiting parallelism and exhibiting greater speedup than alternative vector processors.
The FPVC supports scalar computation, so that loops can be executed independently of the main embedded processor. Floating-point addition, multiplication, division, and square root are implemented with the Northeastern University VFLOAT library.
The FPVC is parameterized so that the number of vector lanes and the maximum vector length can be easily modified. We have implemented the FPVC on a Xilinx Virtex-5 connected via the Processor Local Bus (PLB) to the embedded PowerPC. Our results show more than five times better performance than the PowerPC augmented with the Xilinx Floating Point Unit on linear algebra applications: QR and Cholesky decomposition.
BUILDING AN AMBA AHB COMPLIANT MEMORY CONTROLLER
ABSTRACT
Microprocessor performance has improved rapidly in recent years. In contrast, memory latency and bandwidth have improved little. As a result, memory access time has become a bottleneck that limits system performance.
A memory controller (MC) is designed and built to address this problem. The memory controller is the part of the system that controls the memory, and it is normally integrated into the system chipset.
This paper shows how to build an Advanced Microcontroller Bus Architecture (AMBA) compliant MC as an Advanced High-performance Bus (AHB) slave.
The MC is designed for system memory control, with a main memory consisting of SRAM and ROM. Additionally, the problems encountered in the design process are discussed, and their solutions are given.
4 BIT SFQ MULTIPLIER
BASED ON BOOTH ENCODER
ABSTRACT
We have designed a 2-bit Booth encoder with Josephson Transmission Lines (JTLs) and Passive Transmission Lines (PTLs) using cell-based techniques and tools. Booth encoding is one of the algorithms for obtaining partial products.
With this method, the number of partial products is halved compared with the AND array method. We have fabricated a test chip for a multiplier with a 2-bit Booth encoder with JTLs and PTLs. It operates at 20 GHz with a bias margin of ±25%.
The operating frequency of this circuit increases up to 45 GHz when the bias voltage is raised 25% above the design voltage. The circuit area of the multiplier designed with the Booth encoder method is compared with that of the multiplier designed with the AND array method.
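Booth encoding halves the partial products by recoding the multiplier two bits at a time into digits in {-2, -1, 0, +1, +2}, each chosen from three overlapping multiplier bits. The behavioral sketch below assumes that the 2-bit Booth encoder in the abstract corresponds to this radix-4 recoding; it says nothing about the SFQ circuit itself, and the function names are hypothetical.

```python
# Radix-4 (2-bit) Booth recoding and multiplication, behavioral model.

def booth_radix4_digits(y, bits):
    """Recode a two's-complement multiplier y (`bits` wide, even).

    Digit i = -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    digits, prev = [], 0
    for i in range(0, bits, 2):
        y0 = (y >> i) & 1
        y1 = (y >> (i + 1)) & 1
        digits.append(-2 * y1 + y0 + prev)
        prev = y1
    return digits


def booth_multiply(x, y, bits):
    """Sum of bits/2 partial products x * digit * 4**i."""
    digits = booth_radix4_digits(y, bits)
    partials = [(x * d) << (2 * i) for i, d in enumerate(digits)]
    return sum(partials)


if __name__ == "__main__":
    print(booth_radix4_digits(-3, 4), booth_multiply(5, -3, 4))  # [1, -1] -15
```

For a 4-bit multiplier this gives two partial products instead of the four rows of an AND array, and each one is only a shift and/or negation of the multiplicand.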
HIGH-ACCURACY FIXED-WIDTH MODIFIED BOOTH MULTIPLIERS FOR LOSSY APPLICATIONS
ABSTRACT
The fixed-width multiplier is attractive for many multimedia and digital signal processing systems that need to maintain a fixed format and can tolerate a small accuracy loss in the output data. This paper presents the design of high-accuracy fixed-width modified Booth multipliers.
To reduce the truncation error, we first slightly modify the partial product matrix of Booth multiplication and then derive an effective error compensation function that makes the error distribution more symmetric and centered at zero, giving the fixed-width modified Booth multiplier very small mean and mean-square errors.
In addition, a simple compensation circuit composed mainly of a simplified sorting network is proposed. Compared with previous circuits, the proposed error compensation circuit achieves a tiny mean error and a significant reduction in mean-square error (e.g., at least 12.3% for the 16-bit fixed-width multiplier) while keeping approximately the same hardware overhead.
Furthermore, experimental results on two real-life applications demonstrate that the proposed fixed-width multipliers improve the average peak signal-to-noise ratio of output images by at least 2.0 dB and 1.1 dB, respectively.
EFFICIENT WEIGHTED MODULO 2^N+1 ADDERS BY PARTITIONED PARALLEL-PREFIX COMPUTATION AND
ENHANCED CIRCULAR CARRY GENERATION
ABSTRACT
In this paper, we propose a low-complexity design of a weighted modulo 2^n+1 adder, derived by decomposing the parallel-prefix computation into several blocks of smaller input bit-widths.
In addition, we propose a novel enhanced circular carry generation (ECCG) unit that processes the carry bits produced by all the parallel-prefix computation units (of small input bit-widths) to obtain the final modulo sum efficiently in terms of area-delay product.
We have implemented the proposed adders in 0.13 µm CMOS technology, and synthesis results show that our adder outperforms previously reported weighted modulo 2^n+1 adders, saving up to 49% in area-delay product over existing methods.
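The circular carry follows from the identity 2^n ≡ -1 (mod 2^n+1): any carry out of the low n bits re-enters the sum with negative weight. A behavioral reference model of weighted modulo 2^n+1 addition, with operands in [0, 2^n], makes this explicit; it is a sketch for checking a design, not the partitioned parallel-prefix/ECCG circuit from the paper, and the function name is hypothetical.

```python
# Behavioral reference model of weighted modulo 2^n+1 addition.

def weighted_mod_add(a, b, n):
    """Return (a + b) mod (2^n + 1) for a, b in [0, 2^n]."""
    m = (1 << n) + 1
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must lie in [0, 2^n]")
    s = a + b                      # at most 2^(n+1)
    low = s & ((1 << n) - 1)       # the n least-significant bits
    high = s >> n                  # carries out of bit n-1: 0, 1 or 2
    # Each unit of `high` is worth 2^n, which is -1 modulo 2^n+1,
    # so the carry is fed back around and subtracted (circular carry).
    r = low - high
    if r < 0:
        r += m                     # final correction into [0, 2^n]
    return r


if __name__ == "__main__":
    print(weighted_mod_add(16, 16, 4))   # 15, since 32 mod 17 = 15
```

Because the fed-back term is at most 2, a single conditional correction suffices, which is what keeps the hardware carry-feedback path cheap.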
DESIGN AND CHARACTERIZATION OF PARALLEL PREFIX ADDERS USING FPGAS
ABSTRACT
Parallel-prefix adders (also known as carry-tree adders) are known to have the best performance in VLSI
designs. However, this performance advantage does not translate directly into FPGA implementations
due to constraints on logic block configurations and routing overhead.
This paper investigates three types of carry-tree adders (the Kogge-Stone, sparse Kogge-Stone, and
spanning tree adder) and compares them to the simple Ripple Carry Adder (RCA) and Carry Skip Adder
(CSA).
These designs of varied bit-widths were implemented on a Xilinx Spartan 3E FPGA and delay
measurements were made with a high-performance logic analyzer. Due to the presence of a fast carry-
chain, the RCA designs exhibit better delay performance up to 128 bits.
The carry-tree adders are expected to have a speed advantage over the RCA as bit widths approach 256.
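A carry-tree adder computes all carries with a logarithmic-depth prefix network over generate/propagate pairs. The sketch below models the Kogge-Stone network (a prefix node at every bit on each of the log2(width) levels) at the bit level with a carry-in of 0; it is a functional model for illustration, not an FPGA mapping, and the function name is hypothetical.

```python
# Bit-level functional model of a Kogge-Stone parallel-prefix adder.

def kogge_stone_add(a, b, width):
    """Return (sum, carry_out) of two unsigned `width`-bit numbers, cin = 0."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < width:                # log2(width) prefix levels
        nG, nP = G[:], P[:]
        for i in range(dist, width):   # every bit gets a prefix node
            nG[i] = G[i] | (P[i] & G[i - dist])
            nP[i] = P[i] & P[i - dist]
        G, P = nG, nP
        dist *= 2
    # After the tree, G[i] is the group generate of bits 0..i,
    # i.e. the carry out of bit i.
    s = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[width - 1]


if __name__ == "__main__":
    print(kogge_stone_add(200, 100, 8))      # (44, 1): 300 = 256 + 44
```

The full-density prefix network is what makes Kogge-Stone fast in ASICs but wiring-heavy; on an FPGA that wiring maps to general routing, while a ripple-carry adder uses the dedicated carry chain, which is consistent with the measurements reported above.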
HIGH SPEED ASIC DESIGN OF COMPLEX MULTIPLIER USING VEDIC MATHEMATICS
ABSTRACT
Vedic Mathematics is the ancient methodology of Indian mathematics, with a unique calculation technique based on 16 Sutras (formulae). A high-speed complex multiplier design (ASIC) using Vedic Mathematics is presented in this paper.
The idea for designing the multiplier and adder/subtractor unit is adopted from the ancient Indian mathematics of the "Vedas". With these formulae, the partial products and sums are generated in one step, which reduces carry propagation from LSB to MSB.
Applying Vedic mathematics to the complex multiplier substantially reduces propagation delay compared with DA-based and parallel-adder-based implementations, which are the most commonly used architectures.
The functionality of these circuits was checked, and performance parameters such as propagation delay and dynamic power consumption were obtained with SPICE Spectre using a standard 90 nm CMOS technology.
The propagation delay of the resulting (16, 16)×(16, 16) complex multiplier is only 4 ns, and it consumes 6.5 mW. We achieved almost 25% improvement in speed over earlier reported complex multipliers, e.g. parallel-adder and DA-based architectures.
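The sutra most often used for Vedic multipliers is Urdhva Tiryagbhyam ("vertically and crosswise"), in which every column of the product is formed at once from crosswise bit products and carries are resolved afterwards; the abstract does not name the sutra, so treating it as the one used here is an assumption. The sketch applies it to 16-bit operands in sign-magnitude form (also an assumption) and builds a complex product from four real products using (a + jb)(c + jd) = (ac - bd) + j(ad + bc). Function names are hypothetical.

```python
# Urdhva Tiryagbhyam (vertically and crosswise) multiplication, bit level.
# The sutra choice and sign-magnitude handling are assumptions.

def to_bits(v, width):
    """LSB-first list of the `width` low bits of a non-negative int."""
    return [(v >> i) & 1 for i in range(width)]


def vedic_mul_unsigned(x, y, width):
    xb, yb = to_bits(x, width), to_bits(y, width)
    # Column k collects all crosswise products x[i] * y[k-i] at once.
    cols = [0] * (2 * width - 1)
    for k in range(2 * width - 1):
        for i in range(max(0, k - width + 1), min(k, width - 1) + 1):
            cols[k] += xb[i] & yb[k - i]
    # Resolve the column sums into product bits, passing carries upward.
    result, carry = 0, 0
    for k, col in enumerate(cols):
        total = col + carry
        result |= (total & 1) << k
        carry = total >> 1
    return result | (carry << len(cols))


def vedic_mul_signed(x, y, width=16):
    mag = vedic_mul_unsigned(abs(x), abs(y), width)
    return -mag if (x < 0) != (y < 0) else mag


def complex_mul(a, b, c, d, width=16):
    """(a + jb)(c + jd) = (ac - bd) + j(ad + bc) with four real products."""
    re = vedic_mul_signed(a, c, width) - vedic_mul_signed(b, d, width)
    im = vedic_mul_signed(a, d, width) + vedic_mul_signed(b, c, width)
    return re, im


if __name__ == "__main__":
    print(complex_mul(3, -4, 5, 2))    # (23, -14)
```

All column products are independent, so in hardware they are generated in parallel; the remaining delay lies in adding the columns and propagating carries.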
A LIGHTWEIGHT HIGH-PERFORMANCE FAULT DETECTION SCHEME FOR THE ADVANCED ENCRYPTION
STANDARD USING COMPOSITE FIELDS
ABSTRACT
Faults that accidentally or maliciously occur in hardware implementations of the Advanced Encryption Standard (AES) may cause erroneous encrypted/decrypted output. Appropriate fault detection schemes make the AES robust to internal defects and fault attacks.
In this paper, we present a lightweight concurrent fault detection scheme for the AES. In the proposed approach, the composite field S-box and inverse S-box are divided into blocks, and the predicted parities of these blocks are obtained.
Through exhaustive searches among all available composite fields, we have found the optimum solutions for the least-overhead parity-based fault detection structures. Moreover, through error injection simulations for one S-box (respectively, inverse S-box), we show that a total error coverage of almost 100% can be achieved for 16 S-boxes (respectively, inverse S-boxes).
Finally, it is shown that both the application-specific integrated circuit and field-programmable gate-array implementations of the fault detection structures using the obtained optimum composite fields have better hardware and time complexities than their counterparts.
IMPLEMENTATION AND PERFORMANCE ANALYSIS OF SEAL ENCRYPTION ON FPGA, GPU AND MULTI-
CORE PROCESSORS
ABSTRACT
Accelerators, such as field programmable gate arrays (FPGAs) and graphics processing units (GPUs), are
special purpose processors designed to speed up compute-intensive sections of applications. FPGAs are
highly customizable, while GPUs provide massive parallel execution resources and high memory
bandwidth.
In this paper, we compare the performance of these architectures through a performance study of SEAL, a fast, software-oriented encryption algorithm, on a Virtex-6 FPGA, a Graphics Processing Unit (GPU), and an Intel Core i7, a 4-core processor with 2-way hyper-threading.
We show that each platform has its own competitive advantages for encrypting input plaintext using SEAL.
ON THE TRANSMISSION METHOD FOR SHORT RANGE MIMO COMMUNICATIONS
ABSTRACT
This paper investigates a transmission scheme that is suitable for short-range multiple-input-multiple-
output (MIMO) transmission. Since the distance between two array antennas that face each other is
comparable with the size of the array antenna aperture in short-range MIMO, the propagation
characteristics are greatly different from those in conventional MIMO.
Unlike in conventional MIMO, an optimal element spacing that maximizes channel capacity exists in short-range MIMO. Moreover, the channel capacity with optimal antenna spacing exceeds the ergodic capacity of independent identically distributed (i.i.d.) channels, because the optimal eigenvalue distribution, which maximizes channel capacity, is obtained in short-range MIMO.
In this paper, we focus on practical transmission methods, because complex transmission schemes such as eigenmode transmission or maximum-likelihood detection are required to obtain the ideal channel capacity. We show that, with the optimal element spacing, the channel capacity obtained by zero forcing (ZF) at the receiver without beamforming at the transmitter is almost the same as that obtained with eigenmode transmission.
The effectiveness of short-range MIMO communication is also demonstrated using a 4 × 4 MIMO testbed with actual signals based on the IEEE 802.11n standard. Simulated and measured results show that optimal element spacing is a key parameter in short-range MIMO communication. We found that designing antenna arrays with optimal element spacing is a very effective approach to achieving a simple hardware configuration.
DESIGN AND IMPLEMENTATION OF CORDIC PROCESSOR FOR COMPLEX DPLL
ABSTRACT
Nowadays, various Digital Signal Processing (DSP) systems are implemented on programmable signal processors or on application-specific VLSI chips. The Coordinate Rotation Digital Computer (CORDIC) algorithm has emerged as one such programmable signal-processing technique.
In recent years, it has been widely researched for vector-rotation-based DSP applications because of its simplicity. This paper presents the design of a pipelined architecture for the coordinate rotation algorithm to compute the loop performance of a complex Digital Phase Locked Loop (DPLL) in an in-phase and quadrature channel receiver.
The CORDIC design in vector rotation mode achieves high system throughput because of its pipelined architecture, in which latency is reduced in each pipeline stage.
For on-chip application, area reduction in the proposed design is achieved by optimizing the number of micro-rotations. For better loop performance of the first-order complex DPLL and to minimize quantization error, the number of iterations is also optimized.
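In rotation mode, CORDIC rotates a vector through a sequence of fixed micro-rotations by ±atan(2^-i), each needing only shifts and adds, so the iteration count directly trades accuracy against area and latency. A floating-point sketch follows (a hardware pipeline would use fixed-point values and one stage per iteration); the default of 16 iterations is an assumption, and the function name is hypothetical.

```python
import math

# CORDIC rotation mode, floating-point model (illustrative sketch).

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians using CORDIC rotation mode.

    Each loop iteration corresponds to one pipeline stage that uses
    only shifts (2**-i), additions and a sign decision.
    """
    if abs(angle) > math.pi / 2:
        raise ValueError("angle outside the convergence range used here")
    # Micro-rotation angles atan(2^-i), held in a small ROM in hardware.
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Constant gain of the micro-rotations, compensated once at the end.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0               # drive residual angle to 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return x * gain, y * gain


if __name__ == "__main__":
    c, s = cordic_rotate(1.0, 0.0, 0.5)
    print(c, s)   # close to math.cos(0.5), math.sin(0.5)
```

Each extra iteration roughly halves the residual angle error, so the number of micro-rotations can be cut to the minimum that keeps the quantization error acceptable for the loop.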
DIRECT DIGITAL FREQUENCY SYNTHESIZER USING NONUNIFORM PIECEWISE-LINEAR APPROXIMATION
ABSTRACT
This paper investigates a novel direct digital frequency synthesizer architecture, based on piecewise
linear approximation with segments of nonuniform length.
The new approach reduces the total number of segments compared with the well-known uniform segmentation. This also reduces the size of the coefficient ROM, with beneficial effects on speed and power.
We show that the optimal nonuniform segmentation (which maximizes the spurious-free dynamic range for a given number of nonuniform segments) can be obtained as the solution of a mixed-integer linear programming problem.
Three simple, suboptimal nonuniform segmentation schemes (which lend themselves to efficient hardware implementation) are proposed in this paper. We also present several design examples and VLSI implementation results that demonstrate the effectiveness of the developed technique.
A ROTATION-BASED BIST WITH SELF-FEEDBACK LOGIC TO ACHIEVE COMPLETE FAULT COVERAGE
ABSTRACT
This paper presents a deterministic BIST technique that can efficiently achieve complete fault coverage
without using any storage devices. A novel test structure containing a self-feedback logic unit and a circular shift register is proposed, with which all the required deterministic patterns can be generated on-chip in real time.
Experiments on the ISCAS'85 benchmark circuits show that, compared with previous work addressing the same problem, our technique requires much less test time to achieve 100% fault coverage for all testable stuck-at faults.
TECHNIQUE OF LFSR BASED TEST GENERATOR SYNTHESIS FOR DETERMINISTIC AND PSEUDORANDOM
TESTING
ABSTRACT
A test system structure based on built-in self-test (BIST) circuitry is proposed. The main idea is to minimize hardware overhead and to automate the generation of BIST circuitry.
A test generator based on a linear feedback shift register (LFSR) provides two types of testing: pseudorandom and deterministic. A modified Berlekamp-Massey algorithm is proposed for generating the LFSR polynomial coefficients.
Experimental results of applying the technique to several ISCAS'89 benchmark circuits are presented.
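For the deterministic mode, the key step is finding the shortest LFSR that reproduces a given bit sequence, which is what the Berlekamp-Massey algorithm does over GF(2). The sketch below is the standard (unmodified) algorithm together with a simple Fibonacci-style LFSR that regenerates the sequence; the paper's modifications are not shown, and the function names are hypothetical.

```python
# Standard Berlekamp-Massey over GF(2) plus a Fibonacci LFSR (sketch).

def berlekamp_massey(bits):
    """Return (L, c): shortest LFSR length and connection polynomial.

    c = [1, c1, ..., cL] with s[i] = c1*s[i-1] ^ ... ^ cL*s[i-L].
    """
    n = len(bits)
    c = [1] + [0] * n
    b = [1] + [0] * n
    L, m = 0, -1
    for i in range(n):
        d = bits[i]                          # discrepancy
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]         # c(x) += x^(i-m) * b(x)
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L, c[:L + 1]


def lfsr_sequence(taps, seed, length):
    """Extend `seed` with s[i] = XOR of taps[j-1] & s[i-j], j = 1..len(taps)."""
    s = list(seed)
    while len(s) < length:
        nxt = 0
        for j in range(1, len(taps) + 1):
            nxt ^= taps[j - 1] & s[-j]
        s.append(nxt)
    return s[:length]


if __name__ == "__main__":
    seq = lfsr_sequence([1, 0, 0, 1], [1, 0, 0, 0], 20)   # 1 + x + x^4
    L, c = berlekamp_massey(seq)
    print(L, c)
    assert lfsr_sequence(c[1:], seq[:L], len(seq)) == seq
```

Berlekamp-Massey needs at least 2L bits of the target sequence to identify an L-stage LFSR uniquely, which bounds how many test-pattern bits must be supplied per generator.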
TASK MIGRATION IN MESH NOCS OVER VIRTUAL POINT-TO-POINT CONNECTIONS
ABSTRACT
Processor allocation in today's many-core MPSoCs is a challenging task, especially since the order and requirements of incoming applications are unknown at design time. Task migration, a method for improving network performance, balancing the workload across processing cores, or mitigating the effect of hot processing elements in thermal management methodologies, has attracted much attention in recent years.
Runtime task migration was first proposed for multicomputers, with load balancing as the major objective. However, specific NoC properties, such as the limited number of communication buffers, greater sensitivity to implementation complexity, and tight latency and power consumption constraints, bring new challenges to using task migration mechanisms in NoCs.
As a consequence, the efficiency and applicability of traditional migration mechanisms (developed for multicomputers) are in question. Given the limited resource budget in NoC-based MPSoCs and the tight performance constraints of running applications, in this paper we propose an efficient methodology based on virtual point-to-point (VIP) connections.
These dedicated VIP connections provide low-latency, low-power paths for the heavy communication flows created by task migration mechanisms. The results show that the proposed scheme reduces message latency by 13% and migration latency by 14%, while achieving 10% power savings, compared with the previously proposed task migration strategy for mesh multiprocessors (known as Gathering-Routing-Scattering).