Download - IEEE 2013-2014 Project titles
VLSI 2013-2014 IEEE TITLES
Zuara Technologies
Battle with bugs
Zuara Technologies,
82, Station Road, Radha nagar, Chrompet,
Chennai – 44,
Contact No: 9677465689/9790891931
Mail : [email protected]
Web: www.zuaratech.com
LOW POWER VLSI
1. Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter With Low
Adaptation-Delay
In this paper, we present an efficient architecture for the implementation of a delayed least mean
square adaptive filter. For achieving lower adaptation-delay and area-delay-power efficient
implementation, we use a novel partial product generator and propose a strategy for optimized
balanced pipelining across the time-consuming combinational blocks of the structure. From
synthesis results, we find that the proposed design offers nearly 17% less area-delay product
(ADP) and nearly 14% less energy-delay product (EDP) than the best of the existing systolic
structures, on average, for filter lengths N=8, 16, and 32. We propose an efficient fixed-point
implementation scheme of the proposed architecture, and derive the expression for steady-state
error. We show that the steady-state mean squared error obtained from the analytical result
matches the simulation result. Moreover, we have proposed a bit-level pruning of the
proposed architecture, which provides nearly 20% saving in ADP and 9% saving in EDP over
the proposed structure before pruning without noticeable degradation of steady-state-error
performance.
2. Critical-Path Analysis and Low-Complexity Implementation of the LMS Adaptive
Algorithm
This paper presents a precise analysis of the critical path of the least-mean-square (LMS)
adaptive filter for deriving its architectures for high-speed and low-complexity implementation.
It is shown that the direct-form LMS adaptive filter has nearly the same critical path as its
transpose-form counterpart, but provides much faster convergence and lower register
complexity. From the critical-path evaluation, it is further shown that no pipelining is required
for implementing a direct-form LMS adaptive filter for most practical cases, and that the filter can
be realized with a very small adaptation delay in cases where a very high sampling rate is required.
Based on these findings, this paper proposes three structures of the LMS adaptive filter: (i) Design 1
having no adaptation delays, (ii) Design 2 with only one adaptation delay, and (iii) Design 3 with
two adaptation delays. Design 1 involves the minimum area and the minimum energy per sample
(EPS). The best of existing direct-form structures requires 80.4% more area and 41.9% more
EPS compared to Design 1. Designs 2 and 3 involve slightly more EPS than Design 1 but
offer nearly twice and thrice the maximum usable frequency (MUF) at a cost of 55.0% and 60.6%
more area, respectively.
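To illustrate what an adaptation delay means in the two LMS abstracts above, the following is a minimal floating-point sketch in Python. It is not the hardware architecture of either paper: the function names, the step size, and the three-tap test system are assumptions chosen only for illustration. With adaptation_delay = 0 the update is the standard LMS rule; with a delay of m samples the weights are updated using the error and input vector from m samples earlier.

```python
import random

def fir(x, h):
    """Output of an FIR filter with taps h for the input sequence x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def delayed_lms(x, d, num_taps, mu, adaptation_delay=0):
    """Adapt FIR weights so the filter output tracks the desired signal d.

    adaptation_delay = 0 is the standard LMS update; a value m > 0 applies
    each (input vector, error) pair m samples after it was computed.
    """
    w = [0.0] * num_taps
    pending = []                     # (input vector, error) pairs not yet applied
    errors = []
    for n in range(len(x)):
        # Newest sample first, zero-padded before the start of the sequence.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        e = d[n] - sum(wk * uk for wk, uk in zip(w, u))
        errors.append(e)
        pending.append((u, e))
        if len(pending) > adaptation_delay:
            u_old, e_old = pending.pop(0)
            w = [wk + mu * e_old * uk for wk, uk in zip(w, u_old)]
    return w, errors

if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    h = [0.5, -0.3, 0.2]             # hypothetical system to be identified
    d = fir(x, h)
    for m in (0, 1, 2):              # no delay, one delay, two delays
        w, _ = delayed_lms(x, d, num_taps=3, mu=0.05, adaptation_delay=m)
        print(m, [round(wk, 4) for wk in w])
```

For a small step size all three settings should converge to the same weights; the delay mainly affects how large a step size remains stable.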
3. Efficient Integer DCT Architectures for HEVC
In this paper, we present area- and power-efficient architectures for the implementation of
integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video
Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to
derive parallel architectures for 1-D integer DCT of different lengths. We also show that the
proposed structure could be reusable for DCT of lengths 4, 8, 16, and 32 with a throughput of 32
DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed
architecture could be pruned to reduce the complexity of implementation substantially with only
a marginal effect on the coding performance. We propose power-efficient structures for folded
and full-parallel implementations of 2-D DCT. From the synthesis result, it is found that the
proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy
per sample (EPS) compared to the direct implementation of the reference algorithm, on average,
for integer DCT of lengths 4, 8, 16, and 32. Also, an additional 19% saving in ADP and 20%
saving in EPS can be achieved by the proposed pruning algorithm with nearly the same
throughput rate. The proposed architecture is found to support ultrahigh definition 7680 × 4320
at 60 frames/s video, which is one of the applications of HEVC.
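As a small worked example of integer DCT by constant matrix multiplication, the Python sketch below applies the 4-point HEVC core transform matrix to one input vector, first as a plain matrix product and then with an even-odd (butterfly) decomposition that needs fewer multiplications. The matrix values are taken from the HEVC standard rather than from the abstract above, the function names are assumptions, and the scaling and rounding shifts of a real codec are left out.

```python
# 4-point HEVC core transform matrix (integer approximation of the DCT).
HEVC_DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def dct4_matrix(x):
    """Direct constant matrix multiplication: 16 multiplications."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]

def dct4_even_odd(x):
    """Same result using the even-odd decomposition of the matrix."""
    e0, e1 = x[0] + x[3], x[1] + x[2]     # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]     # odd part
    return [
        64 * (e0 + e1),
        83 * o0 + 36 * o1,
        64 * (e0 - e1),
        36 * o0 - 83 * o1,
    ]

if __name__ == "__main__":
    print(dct4_matrix([1, 2, 3, 4]))      # [640, -285, 0, -25]
    print(dct4_even_odd([1, 2, 3, 4]))    # [640, -285, 0, -25]
```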
4. An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply
Operator
Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications.
In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator for
increasing performance. We investigate techniques to implement the direct recoding of the sum
of two numbers in its Modified Booth (MB) form. We introduce a structured and efficient
recoding technique and explore three different schemes by incorporating them in FAM designs.
Comparing them with the FAM designs which use existing recoding schemes, the proposed
technique yields considerable reductions in terms of critical delay, hardware complexity and
power consumption of the FAM unit.
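For readers unfamiliar with Modified Booth form, the Python sketch below shows the conventional baseline that the abstract improves on: the sum A + B is computed first and then recoded into radix-4 MB digits, each in {-2, -1, 0, 1, 2}, which select the partial products of X * (A + B). The direct recoding schemes proposed in the paper are not reproduced here, and the function names and operand widths are assumptions.

```python
def modified_booth_digits(value, bits):
    """Radix-4 Modified Booth digits (least significant first) of a
    two's-complement number that fits in an even number of bits."""
    if bits <= 0 or bits % 2:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    u = value & ((1 << bits) - 1)              # two's-complement bit pattern

    def bit(i):
        return (u >> i) & 1 if i >= 0 else 0   # b[-1] is defined as 0

    # Digit j is -2*b[2j+1] + b[2j] + b[2j-1].
    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1)
            for j in range(bits // 2)]

def add_multiply(x, a, b, bits):
    """Compute x * (a + b) from the MB digits of the sum a + b.

    a and b are assumed to be signed numbers of width bits (even); the sum
    needs one extra bit, so two are added to keep the width even."""
    digits = modified_booth_digits(a + b, bits + 2)
    return sum(d * x * 4 ** j for j, d in enumerate(digits))

if __name__ == "__main__":
    print(modified_booth_digits(6, 4))         # [-2, 2], i.e. -2 + 2*4 = 6
    print(add_multiply(7, 100, -45, 8))        # 385
```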
5. Improved design of high-frequency sequential decimal multipliers
Hardware implementation of decimal arithmetic operations has become a hot topic for research
during the last decade. Among various operations, decimal multiplication is considered as one of
the most complicated dyadic operations, which requires high-cost hardware implementation.
Therefore, the processor industry has opted to use the sequential decimal multipliers to reduce
the high cost of parallel architectures. However, the main drawback of iterative multipliers is
their high latency. In this reported work, the focus has been on reducing the latency of decimal
sequential multipliers while maintaining a low cost of area. Consequently, a high-frequency
sequential decimal multiplier is proposed whose cycle time is reduced to the latency of a binary
half-adder plus that of a decimal multiply-by-two operation, which overall is less than that of a
decimal carry-save adder. The synthesis results reveal that the proposed sequential multiplier
works with a higher clock frequency than the fastest previous decimal multiplier which in turn
leads to overall latency advantage.
6. On-Chip Codeword Generation to Cope With Crosstalk
Capacitive and inductive coupling between bus lines results in crosstalk induced delays. Many
bus encoding techniques have been proposed to improve the performance. Existing
implementation techniques and mapping algorithms in the literature only apply the specific
encoding. This paper presents the first generalized framework for a stall-free on-chip codeword
generation strategy that is scalable and easy to automate. It is applicable to the coupling aware
encoding techniques that allow recursive codeword generation. The proposed implementation
strategy iteratively generates codewords without explicitly enumerating them. Codeword
mapping relies on graph-based representation that is unique to the given encoding technique. The
codewords are calculated on-chip using basic function blocks, such as adders and multiplexers.
Three encoding techniques were implemented using the proposed strategy. Experimental results
show significant reduction in the area overhead and power dissipation over the existing method
that uses random logic to implement the codec.
7. Effects of Random Delay Errors in Continuous-Time Semi-Digital Transversal
Filters
The implementation of transversal filters requires basic circuit elements such as adders,
multipliers and (unit) delay elements. The filters designed under infinite precision of these
elements may behave differently when implemented with components with limited accuracy. In
fact, the effects of the coefficient inaccuracies in analog and digital transversal filters have been
investigated extensively in the literature [1], [2]. On the other hand, the effects of the unit delays
with limited precision have not received similar attention. In this paper, we find that such effects
especially in very high frequency continuous-time semi-digital transversal filters may not be
ignored. As an example, we analyze the impact of delay errors in the implementation of the
direct modulation transmitter. Specifically, we provide the analytical statistical performance
bounds and confirm the results with simulations.
8. Digitally Synthesized Stochastic Flash ADC Using Only Standard Digital Cells
It is demonstrated in this paper that it is possible to synthesize a stochastic flash ADC entirely
from Verilog code and a standard digital library. An analog comparator is introduced that is
constructed from two cross-coupled 3-input digital NAND gates, and can be described in
Verilog. The synthesized comparators have random, Gaussian offsets that are used as virtual
voltage references to make a flash ADC. A piecewise-linear inverse Gaussian CDF function is
used to correct the nonlinearity introduced by the Gaussian offset distribution. The prototype IC
is fabricated in 90 nm CMOS and implements a 2047-comparator version of the proposed
architecture. All components, including the comparators, the ones adder, and the piecewise
inverse Gaussian function, are implemented in Verilog. Conventional digital synthesis and
place-and-route is then used to generate the physical layout, making this the first fully
synthesized ADC. SNDR of 35.9 dB (without calibration) is achieved at 210 MSPS from the
Verilog synthesized design.
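The principle of the stochastic flash ADC can be sketched behaviourally in a few lines of Python: each comparator has a random Gaussian offset, a ones adder counts how many comparators fire, and an inverse Gaussian CDF maps that count back to a voltage. This is only a behavioural sketch, not the Verilog design of the paper. The offset standard deviation, the input voltage, and the function names are assumptions, and the exact inverse CDF from the Python standard library stands in for the piecewise-linear approximation used in the paper.

```python
import random
from statistics import NormalDist

def make_comparator_offsets(count, sigma, seed=0):
    """Random Gaussian input-referred offsets, one per comparator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(count)]

def convert(v_in, offsets, sigma):
    """Return (ones count, estimated input voltage) for one sample."""
    # Ones adder: a comparator outputs 1 when the input exceeds its offset.
    ones = sum(1 for off in offsets if v_in > off)
    # The expected fraction of ones is the Gaussian CDF of v_in / sigma,
    # so the inverse CDF linearises the count. Clamp to avoid 0 and 1.
    n = len(offsets)
    fraction = min(max(ones, 0.5), n - 0.5) / n
    return ones, sigma * NormalDist().inv_cdf(fraction)

if __name__ == "__main__":
    sigma = 0.1                                  # assumed offset spread in volts
    offsets = make_comparator_offsets(2047, sigma)
    for v in (-0.1, 0.0, 0.05):
        ones, estimate = convert(v, offsets, sigma)
        print(v, ones, round(estimate, 4))
```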
9. Memory Footprint Reduction for Power-Efficient Realization of 2-D Finite Impulse
Response Filters
We have analyzed memory footprint and combinational complexity to arrive at a systematic
design strategy to derive area-delay-power-efficient architectures for two-dimensional (2-D)
finite impulse response (FIR) filter. We have presented novel block-based structures for
separable and non-separable filters with less memory footprint by memory sharing and memory-
reuse along with appropriate scheduling of computations and design of storage architecture. The
proposed structures involve L times less storage per output (SPO), and nearly L times less energy
consumption per output (EPO) compared with the existing structures, where L is the input block-
size. They involve L times more arithmetic resources than the best of the corresponding existing
structures, and produce L times more throughput with less memory band-width (MBW) than
others. We have also proposed separate generic structures for separable and non-separable filter-
banks, and a unified structure of filter-bank constituting symmetric and general filters. The
proposed unified structure for 6 parallel filters involves nearly 3.6L times more multipliers, 3L
times more adders, (N²-N+2) fewer registers than a similar existing unified structure, and computes
6L times more filter outputs per cycle with 6L times less MBW than the existing design, where
N is the FIR filter size in each dimension. ASIC synthesis results show that for filter size (4 × 4),
input-block size L=4, and image-size (512 × 512), the proposed block-based non-separable and
generic non-separable structures, respectively, involve 5.95 times and 11.25 times less area-
delay-product (ADP), and 5.81 times and 15.63 times less EPO than the corresponding existing
structures. The proposed unified structure involves 4.64 times less ADP and 9.78 times less EPO
than the corresponding existing structure.
10. Improved matrix multiplier design for high-speed digital signal processing
applications
A transistor level implementation of an improved matrix multiplier for high-speed digital signal
processing applications based on matrix element transformation and multiplication is reported in
this study. The improvement in speed was achieved by rearranging the matrix element into a
two-dimensional array of processing elements interconnected as a mesh. The edges of each row
and column were interconnected in torus structure, facilitating simultaneous implementation of
several multiplications. The functionality of the circuitry was verified and the performance
parameters, for example, propagation delay and dynamic switching power consumption, were
calculated using SPICE Spectre with 90 nm CMOS technology. The proposed methodology
ensures substantial reduction in propagation delay compared with the conventional algorithm,
systolic array and pseudo number theoretic transformation (PNTT)-based implementation, which
are the most commonly used techniques for matrix multiplication. The propagation delay of the
implemented 4 × 4 matrix multiplier was only ~2 μs, whereas the power consumption of the
implemented 4 × 4 matrix multiplier was ~3.12 mW only. Improvement in speed compared with
earlier reported matrix multipliers, for example, conventional algorithm, systolic array and
PNTT-based implementation, was found to be ~67, ~56 and ~65%, respectively.
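The abstract does not name the data-movement scheme it uses on the torus-connected mesh. One well-known scheme for that topology is Cannon's algorithm, sketched below in Python purely as an illustration of how wrap-around links let every processing element multiply in every step. It should not be read as the method of the paper; the function name is an assumption.

```python
def cannon_matmul(a, b):
    """Multiply two n x n matrices the way an n x n torus of processing
    elements would, with element (i, j) of each list held by PE (i, j)."""
    n = len(a)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    a_loc = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every PE multiplies its two local operands at the same time.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # A shifts one step left and B one step up, wrapping around the torus.
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c

if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(cannon_matmul(a, b))        # [[19, 22], [43, 50]]
```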
11. High Step-Up High-Efficiency Interleaved Converter With Voltage Multiplier
Module for Renewable Energy System
A novel high step-up converter, which is suitable for renewable energy systems, is proposed in
this paper. Through a voltage multiplier module composed of switched capacitors and coupled
inductors, a conventional interleaved boost converter obtains high step-up gain without operating
at extreme duty ratio. The configuration of the proposed converter not only reduces the current
stress but also constrains the input current ripple, which decreases the conduction losses and
lengthens the lifetime of the input source. In addition, due to the lossless passive clamp
performance, leakage energy is recycled to the output terminal. Hence, large voltage spikes
across the main switches are alleviated, and the efficiency is improved. Moreover, the low voltage
stress allows low-voltage-rated MOSFETs to be adopted, reducing conduction losses and
cost. Finally, the prototype circuit with 40-V input voltage, 380-V output, and 1000-W output
power is operated to verify its performance. The highest efficiency is 97.1%.
12. Ultra-High Throughput Low-Power Packet Classification
Packet classification is used by networking equipment to sort packets into flows by comparing
their headers to a list of rules, with packets placed in the flow determined by the matched rule. A
flow is used to decide a packet's priority and the manner in which it is processed. Packet
classification is a difficult task due to the fact that all packets must be processed at wire speed
and rulesets can contain tens of thousands of rules. The contribution of this paper is a hardware
accelerator that can classify up to 433 million packets per second when using rulesets containing
tens of thousands of rules with a peak power consumption of only 9.03 W when using a Stratix III
field-programmable gate array (FPGA). The hardware accelerator uses a modified version of the
HyperCuts packet classification algorithm, with a new pre-cutting process used to reduce the
amount of memory needed to save the search structure for large rulesets so that it is small
enough to fit in the on-chip memory of an FPGA. The modified algorithm also removes the need
for floating point division to be performed when classifying a packet, allowing higher clock
speeds and thus obtaining higher throughputs.
13. Low-Cost Low-Power ASIC Solution for Both DAB+ and DAB Audio Decoding
DAB+ is the upgraded version of digital audio broadcasting (DAB). DAB and DAB+ coexist in
many countries, so receivers are required to be compatible with both standards. In this paper, a
solution integrating an MPEG1-LayerII (MP2) decoder and an advanced audio coding
(AAC) low-complexity (AAC LC) decoder is proposed to provide basic audio decoding for both
DAB and DAB+. It also utilizes simple methods to improve high frequencies and stereo quality
instead of complicated spectrum band replication and parametric stereo. A highly integrated low-
power audio decoder design compatible with DAB/DAB+ and using a purely ASIC approach is
presented. As a result of the system structure optimization and hardware sharing, the audio
decoder is fabricated in 1P4M 0.18-μm CMOS technology using only 3.2 mm² silicon area
(including 147 456 bits RAM and 170 496 bits ROM). The power consumption of the audio
decoder is 10.4 mW for DAB audio decoding and 8.5 mW for DAB+ audio decoding.
Laboratory and field tests show that the function is correct and the audio quality is good for
receiving both DAB and DAB+. The audio decoder is thus proven to be a low-cost low-
power solution for the two existing DAB standards.
14. Low-Power Digital Signal Processor Architecture for Wireless Sensor Nodes
Radio communication exhibits the highest energy consumption in wireless sensor nodes. Given
their limited energy supply from batteries or scavenging, these nodes must trade data
communication for on-the-node computation. Currently, they are designed around off-the-
shelf low-power microcontrollers. But by employing a more appropriate processing element, the
energy consumption can be significantly reduced. This paper describes the design and
implementation of the newly proposed folded-tree architecture for on-the-node data processing
in wireless sensor networks, using parallel prefix operations and data locality in hardware.
Measurements of the silicon implementation show an improvement of 10-20× in terms of energy
as compared to traditional modern micro-controllers found in sensor nodes.
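A parallel prefix operation computes all running results of an associative operator over a data set using a binary tree. The Python sketch below shows one common formulation, the two-phase exclusive scan with an up-sweep and a down-sweep. It is included only to illustrate the kind of operation the abstract refers to; the abstract does not specify this exact formulation, and the function name and the power-of-two length restriction are assumptions.

```python
def exclusive_scan(values, op=lambda a, b: a + b, identity=0):
    """Tree-based exclusive prefix scan for a power-of-two number of inputs.

    Within each tree level the loop iterations are independent, so hardware
    could evaluate them in parallel."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    tree = list(values)
    # Up-sweep: combine pairs on the way from the leaves to the root.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            tree[i] = op(tree[i - step], tree[i])
        step *= 2
    # Down-sweep: push prefixes from the root back towards the leaves.
    tree[n - 1] = identity
    step = n // 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left_sum = tree[i - step]
            tree[i - step] = tree[i]             # left child inherits the prefix
            tree[i] = op(tree[i], left_sum)      # right child adds the left subtree
        step //= 2
    return tree

if __name__ == "__main__":
    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    # [0, 3, 4, 11, 11, 15, 16, 22]
```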
15. Area–Delay–Power Efficient Carry-Select Adder
In this brief, the logic operations involved in conventional carry select adder (CSLA) and binary
to excess-1 converter (BEC)-based CSLA are analyzed to study the data dependence and to
identify redundant logic operations. We have eliminated all the redundant logic operations
present in the conventional CSLA and proposed a new logic formulation for CSLA. In the
proposed scheme, the carry select (CS) operation is scheduled before the calculation of final-
sum, which is different from the conventional approach. Bit patterns of two anticipating carry
words (corresponding to $c_{\mathrm{in}} = 0$ and $1$) and fixed $c_{\mathrm{in}}$ bits are used for
logic optimization of CS and generation units. An efficient CSLA design is obtained using
optimized logic units. The proposed CSLA design involves significantly less area and delay than
the recently proposed BEC-based CSLA. Due to the small carry-output delay, the proposed
CSLA design is a good candidate for square-root (SQRT) CSLA. A theoretical estimate shows
that the proposed SQRT-CSLA involves nearly 35% less area–delay–product (ADP) than the
BEC-based SQRT-CSLA, which is best among the existing SQRT-CSLA designs, on average,
for different bit-widths. The application-specific integrated circuit (ASIC) synthesis result
shows that the BEC-based SQRT-CSLA design involves 48% more ADP and consumes 50%
more energy than the proposed SQRT-CSLA, on average, for different bit-widths.
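As background for this abstract, the Python sketch below models the conventional carry-select adder at bit level: each block computes its sum for carry-in 0 and carry-in 1, and the real incoming carry then selects one of the two. It shows only the conventional scheme, not the redundancy-free logic formulation the paper proposes; the function names and the block size of four bits are assumptions.

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry addition on LSB-first bit lists; returns (sum bits, carry out)."""
    out, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry

def carry_select_add(a, b, width, block=4):
    """Add two unsigned width-bit numbers; returns (sum mod 2**width, carry out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = [], 0
    for lo in range(0, width, block):
        hi = min(lo + block, width)
        # Both candidate results are computed without waiting for the carry.
        sum0, carry0 = ripple_add(a_bits[lo:hi], b_bits[lo:hi], 0)
        sum1, carry1 = ripple_add(a_bits[lo:hi], b_bits[lo:hi], 1)
        # The incoming carry acts as the select signal of a multiplexer.
        if carry:
            result += sum1
            carry = carry1
        else:
            result += sum0
            carry = carry0
    return sum(bit << i for i, bit in enumerate(result)), carry

if __name__ == "__main__":
    print(carry_select_add(0xFFFF, 0x0001, 16))   # (0, 1)
    print(carry_select_add(1234, 4321, 16))       # (5555, 0)
```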
16. An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply
Operator
Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications.
In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator for
increasing performance. We investigate techniques to implement the direct recoding of the sum
of two numbers in its Modified Booth (MB) form. We introduce a structured and efficient
recoding technique and explore three different schemes by incorporating them in FAM designs.
Comparing them with the FAM designs which use existing recoding schemes, the proposed
technique yields considerable reductions in terms of critical delay, hardware complexity
and power consumption of the FAM unit.
17. Improved design of high-frequency sequential decimal multipliers
Hardware implementation of decimal arithmetic operations has become a hot topic for research
during the last decade. Among various operations, decimal multiplication is considered as one of
the most complicated dyadic operations, which requires high-cost hardware implementation.
Therefore, the processor industry has opted to use the sequential decimal multipliers to reduce
the high cost of parallel architectures. However, the main drawback of iterative multipliers is
their high latency. In this reported work, the focus has been on reducing the latency of decimal
sequential multipliers while maintaining a low cost of area. Consequently, a high-frequency
sequential decimal multiplier is proposed whose cycle time is reduced to the latency of a binary
half-adder plus that of a decimal multiply-by-two operation, which overall is less than that of a
decimal carry-save adder. The synthesis results reveal that the proposed sequential multiplier
works with a higher clock frequency than the fastest previous decimal multiplier which in turn
leads to overall latency advantage.
18. Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for
Efficient FIR Filter Implementation
The multiple constant multiplication (MCM) scheme is widely used for implementing transposed
direct-form FIR filters. While the research focus of MCM has been on more effective common
subexpression elimination, the optimization of adder-trees, which sum up the computed sub-
expressions for each coefficient, is largely omitted. In this paper, we have identified the resource
minimization problem in the scheduling of adder-tree operations for the MCM block, and
presented a mixed integer programming (MIP) based algorithm for more efficient MCM-based
implementation of FIR filters. Experimental results show that up to 15% reduction of area and
11.6% reduction of power (with an average of 8.46% and 5.96% respectively) can be achieved
on top of the already optimized adder/subtractor network of the MCM block.
19. Improved matrix multiplier design for high-speed digital signal processing
applications
A transistor level implementation of an improved matrix multiplier for high-speed digital signal
processing applications based on matrix element transformation and multiplication is reported in
this study. The improvement in speed was achieved by rearranging the matrix element into a
two-dimensional array of processing elements interconnected as a mesh. The edges of each row
and column were interconnected in torus structure, facilitating simultaneous implementation of
several multiplications. The functionality of the circuitry was verified and the performance
parameters, for example, propagation delay and dynamic switching power consumption, were
calculated using SPICE Spectre with 90 nm CMOS technology. The proposed methodology
ensures substantial reduction in propagation delay compared with the conventional algorithm,
systolic array and pseudo number theoretic transformation (PNTT)-based implementation, which
are the most commonly used techniques for matrix multiplication. The propagation delay of the
implemented 4 × 4 matrix multiplier was only ~2 μs, whereas the power consumption of the
implemented 4 × 4 matrix multiplier was ~3.12 mW only. Improvement in speed compared with
earlier reported matrix multipliers, for example, conventional algorithm, systolic array and
PNTT-based implementation, was found to be ~67, ~56 and ~65%, respectively.
20. A Novel Distortion Model and Lagrangian Multiplier for Depth Maps Coding
In three-dimensional video (3-DV) coding systems, depth maps are not used for viewing but for
rendering virtual views. Therefore, the traditional rate distortion criterion (including distortion
criterion, and Lagrangian multiplier) is not suitable for depth map coding. In order to design an
effective rate distortion criterion for depth maps, the relationship between the distortion of
synthesized virtual view and the coding error of depth maps is analyzed in detail. Through the
analysis, a polynomial model revealing the relationship between the coding error of depth maps
and the distortion of synthesized virtual view is derived. Model parameters are estimated by
utilizing camera parameters and features of the texture video corresponding to the depth map.
Based on the model, a virtual view-based Lagrangian multiplier for depth map coding is also
proposed. Experimental results demonstrated the accuracy of the model. The squared correlation
coefficients between the actual distortion of virtual view and the estimated distortion are all
larger than 0.98 for all tested sequences. When incorporating the proposed model and
Lagrangian multiplier into the mode decision procedure of joint model version 18.5 (JM18.5) of
H.264/AVC, a maximum 0.470 dB BD PSNR and an average 0.251 dB BD PSNR can be
achieved.
21. Dual-Basis Superserial Multipliers for Secure Applications and Lightweight
Cryptographic Architectures
Cryptographic algorithms utilize finite-field arithmetic operations in their computations. Due to
the constraints of the nodes which benefit from the security and privacy advantages of these
algorithms in sensitive applications, these algorithms need to be lightweight. One of the well-
known bases used in sensitive computations is dual basis (DB). In this brief, we present low-
complexity superserial architectures for the DB multiplication over GF(2^m). To the best of our
knowledge, this is the first time that such a multiplier is proposed in the open literature. We have
performed complexity analysis for the proposed lightweight architectures, and the results show
that the hardware complexity of the proposed superserial multiplier is reduced compared with
that of regular serial multipliers. This has been also confirmed through our application-specific
integrated circuit hardware- and time-equivalent estimations. The proposed superserial
architecture is a step forward toward efficient and lightweight cryptographic algorithms and is
suitable for constrained implementations of cryptographic primitives in applications such as
smart cards, handheld devices, life-critical wearable and implantable medical devices, and
constrained nodes in the blooming notion of Internet of nano-Things.
22. Multifunction Residue Architectures for Cryptography
A design methodology for incorporating Residue Number System (RNS) and Polynomial
Residue Number System (PRNS) in Montgomery modular multiplication in GF(p) or GF(2^n)
respectively, as well as a VLSI architecture of a dual-field residue arithmetic
Montgomery multiplier are presented in this paper. An analysis of input/output conversions
to/from residue representation, along with the proposed residue Montgomery multiplication
algorithm, reveals common multiply-accumulate data paths both between the converters and
between the two residue representations. A versatile architecture is derived that supports all
operations of Montgomery multiplication in GF(p) and GF(2^n), input/output conversions, Mixed
Radix Conversion (MRC) for integers and polynomials, dual-field modular exponentiation and
inversion in the same hardware. Detailed comparisons with state-of-the-art implementations
prove the potential of residue arithmetic exploitation in dual-field modular multiplication.
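For reference, plain Montgomery modular multiplication in GF(p) can be sketched in Python as below. This is the textbook integer form only; the RNS and PRNS decompositions, the GF(2^n) case, and the dual-field architecture of the paper are not modelled. The function names and the 16-bit example modulus are assumptions.

```python
def montgomery_constant(p, bits):
    """Return -p^(-1) mod R for R = 2**bits; p must be odd and below R."""
    r = 1 << bits
    if p % 2 == 0 or not 2 < p < r:
        raise ValueError("p must be odd with 2 < p < 2**bits")
    return (-pow(p, -1, r)) % r

def montgomery_reduce(t, p, bits, p_neg_inv):
    """Return t * R^(-1) mod p for 0 <= t < p * R, without dividing by p."""
    mask = (1 << bits) - 1
    m = ((t & mask) * p_neg_inv) & mask      # makes t + m*p divisible by R
    u = (t + m * p) >> bits
    return u - p if u >= p else u

def montgomery_modmul(a, b, p, bits):
    """Compute a * b mod p for 0 <= a, b < p via the Montgomery domain."""
    p_neg_inv = montgomery_constant(p, bits)
    r2 = (1 << (2 * bits)) % p               # R^2 mod p, used to enter the domain
    a_bar = montgomery_reduce(a * r2, p, bits, p_neg_inv)      # a*R mod p
    b_bar = montgomery_reduce(b * r2, p, bits, p_neg_inv)      # b*R mod p
    c_bar = montgomery_reduce(a_bar * b_bar, p, bits, p_neg_inv)
    return montgomery_reduce(c_bar, p, bits, p_neg_inv)        # leave the domain

if __name__ == "__main__":
    p = 65521                                # a prime below 2**16
    print(montgomery_modmul(12345, 54321, p, 16), (12345 * 54321) % p)
```

In a real design the conversions into and out of the Montgomery domain are amortised over many multiplications, for example inside a modular exponentiation.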
Zuara Technologies Battle with bugs
No.82, Station road, Radha nagar, Chrompet, Chennai-44. Mobile: 09677465689 Mail :[email protected].
Web site: www.zuaratech.com
82, Station road, Radha nagar, Chrompet, Chennai-44
Mob.No: 09677465689,
Mail id: [email protected]
1. Physical Layer Encryption in OFDM-PON Employing Time-Variable Keys From
ONUs
We propose and experimentally demonstrate a dynamic encryption method to realize physical
layer security for orthogonal frequency division multiplexing passive optical network (OFDM-
PON). In our scheme, encryption of the downstream signal is obtained by applying exclusive or
(xor) operation between optical network units' (ONUs') downstream signals and received
upstream signals at the optical line terminal side. The upstream signals are used as secure keys
for corresponding ONUs. Then the encrypted downstream signals are sent to the ONU sides,
where the downstream signal can be retrieved by applying xor operation again between the
encrypted downstream signal and the stored upstream signal. Since each ONU cannot obtain the
upstream signals of other ONUs, only the ONU itself can recover its downstream signal from the
encrypted downstream signal. Moreover, the secure key is dynamically changing along with the
upstream signal, significantly improving the security of the downstream signal for the OFDM-
PON system. A 5-Gb/s 16-quadrature amplitude modulation OFDMsignal with xor-based
encryption has been successfully implemented over a 25-km standard single-mode fiber.
Experimental results verify that the encryption scheme can effectively prevent eavesdropping by
malicious users.
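As a rough illustration of the xor step described in this abstract, the following Python sketch encrypts a downstream bit sequence with a key taken from a stored upstream sequence and recovers it by applying xor a second time. The function name and the bit values are illustrative assumptions; the actual scheme operates on OFDM signals at the physical layer, which is not modeled here.

```python
def xor_bits(data, key):
    """Bitwise xor of two equal-length bit sequences."""
    if len(data) != len(key):
        raise ValueError("data and key must have the same length")
    return [d ^ k for d, k in zip(data, key)]


# Hypothetical example values (not taken from the paper).
upstream_key = [1, 0, 1, 1, 0, 0, 1, 0]   # stored upstream bits of one ONU
downstream = [0, 1, 1, 0, 1, 0, 0, 1]     # downstream bits for that ONU

encrypted = xor_bits(downstream, upstream_key)   # done at the OLT side
recovered = xor_bits(encrypted, upstream_key)    # done at the ONU side
```

Because xor is its own inverse, only a receiver holding the same upstream key can undo the encryption.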
Signal processing & Communications
2. Channel Quantization Using Constellation Based Codebooks for Multiuser MIMO-
OFDM
In this paper, we propose clustered quantization techniques for multiuser multi-input/multi-
output (MIMO) orthogonal frequency division multiplexing (OFDM) using constellation based
codebooks. Constellation based codebooks provide scalability and efficient codeword search
capability, which are key features for practical multiuser MIMO-OFDM systems with a large
number of antennas. The proposed clustered quantization scheme quantizes consecutive
subcarriers into a single codeword that minimizes aggregated quantization errors. We base our
new clustering techniques on two constellation based quantization methods, namely equal-
magnitude angular quantization (EMAQ) and squared-lattice angular quantization. New efficient
codebook search algorithms are proposed for the clustered quantization. In addition, we propose
new constellations to guarantee different users quantize channels into distinct codewords. One is
a rotated M-PSK constellation suitable for randomly-distributed user scenarios, and the other is a
random phase equal-magnitude (RPEM) constellation suitable for ill-conditioned user scenarios.
Thus, full spatial multiplexing gain can be achieved even with a small number of users. Finally, a
near-sphere codeword search algorithm is proposed for the RPEM. In simulations, the proposed
clustered quantization shows up to 50% higher throughput compared to conventional fixed-pilot
channel quantization. Also, we show our new constellations for EMAQ improve throughput
almost 35% compared to the standard EMAQ.
3. Impulse Noise Estimation and Removal for OFDM Systems
Orthogonal Frequency Division Multiplexing (OFDM) is a modulation scheme that is widely
used in wired and wireless communication systems. While OFDM is ideally suited to deal with
frequency selective channels and AWGN, its performance may be dramatically impacted by the
presence of impulse noise. In fact, very strong noise impulses in the time domain might result in
the erasure of whole OFDM blocks of symbols at the receiver. Impulse noise can be mitigated by
considering it as a sparse signal in time, and using recently developed algorithms for sparse
signal reconstruction. We propose an algorithm that utilizes the guard band null subcarriers for
the impulse noise estimation and cancellation. Instead of relying on ℓ1 minimization as done
in some popular general-purpose compressive sensing schemes, the proposed method jointly
exploits the specific structure of this problem and the available a priori information for sparse
signal recovery. The computational complexity of the proposed algorithm is very competitive
with respect to sparse signal reconstruction schemes based on ℓ1 minimization. The proposed
method is compared with respect to other state-of-the-art methods in terms of achievable rates
for an OFDM system with impulse noise and AWGN.
4. A Low Complexity PAPR Reduction Scheme for OFDM Systems via Neural
Networks
Peak-to-average power ratio (PAPR) reduction is one of the key components in orthogonal
frequency division multiplexing (OFDM) systems. Among various PAPR reduction techniques,
artificial neural network (NN) has been one of the powerful techniques in reducing the PAPR
due to its good generalization properties with flexible modeling and learning capabilities. In this
letter, we propose a new method that uses NNs trained on the active constellation extension
(ACE) signals to reduce the PAPR of OFDM signals. Unlike other NN based techniques, the
proposed method employs a receiver NN unit, at the OFDM receiver side, achieving significant
bit error rate (BER) improvement with low computational complexity.
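For readers unfamiliar with the metric, the sketch below computes the PAPR of one OFDM symbol as the ratio of peak to mean instantaneous power, expressed in dB. It uses a plain inverse DFT without oversampling, and the function names are assumptions made for illustration; it does not implement the NN or ACE processing described in the abstract.

```python
import cmath
import math


def idft(symbols):
    """Inverse DFT: map frequency-domain symbols to time-domain samples."""
    n = len(symbols)
    return [
        sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def papr_db(samples):
    """Peak-to-average power ratio of a block of complex samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


# Worst case: identical symbols on all 8 subcarriers add up coherently.
worst_case = papr_db(idft([1] * 8))
# Best case: a single active subcarrier gives a constant-envelope signal.
best_case = papr_db(idft([0, 1, 0, 0, 0, 0, 0, 0]))
```

With N identical symbols the PAPR reaches 10·log10(N) dB, which is why reduction techniques matter for large subcarrier counts.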
5. An Adaptive Allocation Scheme in Multiuser OFDM Systems with Time-Varying
Channels
Previously, a scheme was proposed in [1] for the subcarrier, bit, and power allocation problem to
minimize the total transmit power for multiuser orthogonal frequency division multiplexing
systems in downlink transmission. However, it operates in batch mode, which may not be efficient in
terms of computational complexity for slowly time-varying channel environments. The solution
of the current frame can be obtained with slight modification from that of the previous frame in
an adaptive fashion. By utilizing this property, we propose a scheme to obtain the solution in the
adaptive fashion, which offers comparable performance with a reduced complexity compared to
the previously proposed method and other existing suboptimal methods. Based on a derived
expression composed of the channel gains, the numbers of assigned subcarriers, and the data
rates along with the new proposed processing procedures, the proposed adaptive scheme is able
to track the channel variation for the solution adjustment at a faster speed compared to the
original batch mode method. Simulation results reveal that the proposed adaptive scheme offers
competitive performance compared with the optimal and the existing schemes while the
computational complexity and the number of iterations are both reduced.
6. Channel estimation and symbol detection for OFDM systems using data-nulling
superimposed pilots
A novel data-nulling superimposed pilot scheme for orthogonal frequency division multiplexing
(OFDM) systems is proposed, where the input data vector is spread over all the subcarriers by a
precoding matrix and then nulled at certain subcarriers for the insertion of training pilots. This
method avoids the loss of the data rate for frequency-division multiplexed pilots, but results in
the distortion of input data. To mitigate the distortion introduced by the nulling operation, a
simple iterative reconstruction scheme is used to improve the detection performance.
7. Designing Hardware-Efficient Fixed-Point FIR Filters in an Expanding
Subexpression Space
This paper presents a practical method for designing fixed-point FIR filters. The proposed
method takes both the filter's magnitude response and its hardware cost into consideration in the
design process. The method constructs a basis set based on the fixed-point coefficients that have
been synthesized already. The elements in the basis set are used to synthesize the undetermined
fixed-point coefficients later. Thus, this basis set expands gradually along with the progress of
the coefficient design. The method employs some strategies to speed up the design process. For
example, a complexity estimation strategy helps us stop digging deeper in some branches of the
search tree, and a solution prediction strategy for high-order FIR filters helps us design fixed-
point FIR filters of length equal to a few hundreds. Applying the proposed method to design
twenty benchmark cases, we can obtain hardware-efficient results in a reasonable design time. In
two long filter design cases, our design results are better than those designed by the other
methods.
8. Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for
Efficient FIR Filter Implementation
Multiple constant multiplication (MCM) scheme is widely used for implementing transposed
direct-form FIR filters. While the research focus of MCM has been on more effective common
subexpression elimination, the optimization of adder-trees, which sum up the computed sub-
expressions for each coefficient, is largely omitted. In this paper, we have identified the resource
minimization problem in the scheduling of adder-tree operations for the MCM block, and
presented a mixed integer programming (MIP) based algorithm for more efficient MCM-based
implementation of FIR filters. Experimental results show that up to 15% reduction of area and
11.6% reduction of power (with an average of 8.46% and 5.96% respectively) can be achieved
on top of the already optimized adder/subtractor network of the MCM block.
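The idea behind multiple constant multiplication can be sketched in a few lines of Python: one input is multiplied by several constants using only shifts and additions, and a shared subexpression is computed once and reused. The constants 5, 20 and 45 and the function name are hypothetical choices for illustration; the MIP-based adder-tree scheduling proposed in the paper is not reproduced here.

```python
def mcm_block(x):
    """Multiply x by the constants 5, 20 and 45 using only shifts and adds."""
    x5 = (x << 2) + x       # 5x = 4x + x, the shared subexpression
    x20 = x5 << 2           # 20x = 4 * 5x, a shift only
    x45 = (x5 << 3) + x5    # 45x = 8 * 5x + 5x, one extra adder
    return x5, x20, x45
```

Three products cost two adders in total here, whereas computing each product independently would need more.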
9. Optimal Memory for Discrete-Time FIR Filters in State-Space
In this correspondence, we propose an efficient estimator of optimal memory (averaging
interval) for discrete-time finite impulse response (FIR) filters in state-space. Its crucial property
is that only real measurements and the filter output are involved with no reference and noise
statistics. Testing by the two-state polynomial model has shown a very good correspondence
with predicted values. Even in the worst case of the harmonic model, the estimator demonstrates
practical applicability.
10. On Efficient Design of High-Order Filters With Applications to Filter Banks and
Transmultiplexers With Large Number of Channels
This paper proposes a method for designing high-order linear-phase finite-length impulse
response (FIR) filters which are required as, e.g., the prototype filters in filter banks (FBs) and
transmultiplexers (TMUXs) with a large number of channels. The proposed method uses the
Farrow structure to express the polyphase components of the desired filter. Thereby, the only
unknown parameters in the filter design are the coefficients of the Farrow subfilters. The
number of these unknown parameters is considerably smaller than that of the direct filter design
methods. Besides these unknown parameters, the proposed method needs some predefined
multipliers. Although the number of these multipliers is larger than the number of unknown
parameters, they are known a priori. The proposed method is generally applicable to any linear-
phase FIR filter irrespective of its order being high, low, even, or odd as well as the impulse
response being symmetric or antisymmetric. However, it is more efficient for filters with high
orders as the conventional design of such filters is more challenging. For example, to design a
linear-phase FIR lowpass filter of order 131071 with a stopband attenuation of about 55 dB,
which is used as the prototype filter of a cosine modulated filter bank (CMFB) with 8192
channels, our proposed method requires only 16 unknown parameters. The paper gives design
examples for individual lowpass filters as well as the prototype filters for fixed and flexible
modulated FBs.
11. Frequency Estimation of Distorted and Noisy Signals in Power Systems by FFT-
Based Approach
This paper focuses on the accurate frequency estimation of power signals corrupted by a
stationary white noise. The noneven item interpolation FFT based on the triangular self-
convolution window is described. A simple analytical expression for the variance of noise
contribution on the frequency estimation is derived, which shows the variances of frequency
estimation are proportional to the energy of the adopted window. Based on the proposed method,
the noise level of the measurement channel can be estimated, and optimal parameters (e.g.,
sampling frequency and window length) of the interpolation FFT algorithm that minimize the
variances of frequency estimation can thus be determined. The application in a power quality
analyzer verified the usefulness of the proposed method.
12. Accurate and Efficient On-Chip Spectral Analysis for Built-In Testing and
Calibration Approaches
The fast Fourier transform (FFT) algorithm is widely used as a standard tool to carry out spectral
analysis because of its computational efficiency. However, the presence of multiple tones
frequently requires a fine frequency resolution to achieve sufficient accuracy, which imposes the
use of a large number of FFT points that results in large area and power overheads. In this paper,
an FFT method is proposed for on-chip spectral analysis of multi-tone signals with particular
harmonic and intermodulation components. This accurate FFT analysis approach is based on
coherent sampling, but it requires a significantly smaller number of points to make
the FFT realization more suitable for on-chip built-in testing and calibration applications that
require area and power efficiency. The technique was assessed by comparing the simulation
results from the proposed method of single and multiple tones with the simulation results
obtained from the FFT of coherently sampled tones. The results indicate that the proper selection
of test tone frequencies can avoid spectral leakage even with multiple narrowly spaced tones.
When low-frequency signals are captured with an analog-to-digital converter (ADC) for on-chip
analysis, the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth
limitations. Post-layout simulations of a 16-point FFT showed that third-order intermodulation
(IM3) testing with two tones can be performed with 1.5-dB accuracy for IM3 levels of up to 50
dB below the fundamental tones that are quantized with a 10-bit resolution. In a 45-nm CMOS
technology, the layout area of the 16-point FFT for on-chip built-in testing is 0.073 mm², and its
estimated power consumption is 6.47 mW.
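The role of coherent sampling can be illustrated with a small Python experiment: when a tone completes an integer number of cycles within the record, its energy falls into a single DFT bin (plus its mirror), whereas a non-integer number of cycles leaks into other bins. This is a generic sketch using a direct 16-point DFT; the helper names are assumptions, and it does not reproduce the reduced-point method proposed in the paper.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the DFT of a real or complex sample block."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def tone(cycles, n):
    """n samples of a cosine that completes `cycles` periods in the record."""
    return [math.cos(2 * math.pi * cycles * t / n) for t in range(n)]


N = 16
coherent = dft_magnitudes(tone(3, N))    # exactly 3 cycles: coherent sampling
leaky = dft_magnitudes(tone(3.5, N))     # 3.5 cycles: spectral leakage
```

In the coherent case only bins 3 and 13 are non-zero; in the other case even the DC bin picks up energy.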
13. A 16-Core Processor With Shared-Memory and Message-Passing Communications
A 16-core processor with both message-passing and shared-memory inter-core communication
mechanisms is implemented in 65 nm CMOS. Message-passing communication is enabled in a 3
× 6 Mesh packet-switched network-on-chip, and shared-memory communication is supported
using the shared memory within each cluster. The processor occupies 9.1 mm² and operates fully
functional at a clock rate of 750 MHz at 1.2 V and maximum 800 MHz at 1.3 V. Each core
dissipates 34 mW under typical conditions at 750 MHz and 1.2 V while executing embedded
applications such as an LDPC decoder, a 3780-point FFT module, an H.264 decoder and an LTE
channel estimator.
1. Subjective evaluation of HEVC and AVC/H.264 in mobile environments
This paper compares the quality of AVC/H.264 and HEVC encoded video in low bandwidth
mobile environments. In this study, the focus within the mobile environment is smart phones.
The key characteristics of a smart phone are smaller screen size, which is usually 3.5 inches
diagonal to 5.0 inches diagonal for high end smart phones and typical cellular network
bandwidth, which is 3G or faster. Subjective evaluations were conducted to evaluate the user
experience on a mobile device with a small screen size and video coded at 200 and 400 Kbps.
The studies showed compelling evidence that a user's experience in low bandwidth mobile
environments is very similar between HEVC and AVC/H.264. The results suggest the benefits of
HEVC over AVC/H.264 in a mobile environment with lower video bitrates and resolutions are
not as clear.
2. Efficient Integer DCT Architectures for HEVC
In this paper, we present area- and power-efficient architectures for the implementation of
integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video
Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to
derive parallel architectures for 1-D integer DCT of different lengths. We also show that the
proposed structure could be reusable for DCT of lengths 4, 8, 16, and 32 with a throughput of 32
DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed
Audio, Image and Video Processing
architecture could be pruned to reduce the complexity of implementation substantially with only
a marginal effect on the coding performance. We propose power-efficient structures for folded
and full-parallel implementations of 2-D DCT. From the synthesis result, it is found that the
proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy
per sample (EPS) compared to the direct implementation of the reference algorithm, on average,
for integer DCT of lengths 4, 8, 16, and 32. Also, an additional 19% saving in ADP and 20%
saving in EPS can be achieved by the proposed pruning algorithm with nearly the same
throughput rate. The proposed architecture is found to support ultrahigh definition 7680 × 4320
at 60 frames/s video, which is one of the applications of HEVC.
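To make the constant matrix-multiplication view concrete, the Python sketch below applies the 4-point HEVC core transform matrix to an input vector in two ways: as a direct matrix product and as an even/odd butterfly that needs fewer multiplications. The function names are assumptions, the scaling and rounding shifts of the standard are omitted, and this is not the architecture proposed in the paper.

```python
# 4-point integer DCT matrix of the HEVC core transform.
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct4_matrix(x):
    """Direct constant matrix multiplication (16 multiplications)."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]


def dct4_butterfly(x):
    """Same result via even/odd decomposition (6 multiplications)."""
    e0, e1 = x[0] + x[3], x[1] + x[2]   # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]   # odd part
    return [
        64 * (e0 + e1),
        83 * o0 + 36 * o1,
        64 * (e0 - e1),
        36 * o0 - 83 * o1,
    ]
```

Since the constants are fixed, each multiplication can in turn be realized in hardware with shifts and adds.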
3. Improved Method to Select the Lagrange Multiplier for Rate-Distortion Based
Motion Estimation in Video Coding
The motion estimation (ME) process used in the H.264/AVC reference software is based on
minimizing a cost function that involves two terms (distortion and rate) that are properly
balanced through a Lagrangian parameter, usually denoted as λmotion. In this paper we propose
an algorithm to improve the conventional way of estimating λmotion and, consequently, the ME
process. First, we show that the conventional estimation of λmotion turns out to be significantly
less accurate when ME-compromising events, which make the ME process perform poorly,
happen. Second, with the aim of improving the coding efficiency in these cases, an efficient
algorithm is proposed that allows the encoder to choose between three different values of
λmotion for the Inter 16x16 partition size. To be more precise, for this partition size, the
proposed algorithm allows the encoder to additionally test λmotion=0 and λmotion arbitrarily
large, which corresponds to minimum distortion and minimum rate solutions, respectively. By
testing these two extreme values, the algorithm avoids making large ME errors. The
experimental results on video segments exhibiting this type of ME-compromising events reveal
an average rate reduction of 2.20% for the same coding quality with respect to the JM15.1
reference software of H.264/AVC. The algorithm has also been tested in comparison with a
state-of-the-art algorithm called context adaptive Lagrange multiplier. Additionally, two
illustrative examples of the subjective performance improvement are provided.
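The effect of the Lagrangian parameter on the motion search can be sketched as follows: each candidate motion vector has a distortion D and a rate R, and the search keeps the candidate minimizing J = D + λ·R. The candidate values and the function name are hypothetical; how the encoder finally chooses among the three λ values is specific to the paper and is not modeled.

```python
def best_candidate(candidates, lam):
    """Return the (distortion, rate) pair minimizing J = D + lam * R."""
    return min(candidates, key=lambda c: c[0] + lam * c[1])


# Hypothetical (distortion, rate) pairs for three motion vector candidates.
candidates = [(100, 2), (60, 10), (40, 30)]

min_distortion = best_candidate(candidates, 0)      # lambda = 0
balanced = best_candidate(candidates, 4)            # a conventional lambda
min_rate = best_candidate(candidates, 1e9)          # lambda arbitrarily large
```

Setting λ to 0 ignores the rate term and yields the minimum-distortion solution, while a very large λ yields the minimum-rate solution.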
4. Low Power Motion Estimation Based on Probabilistic Computing
As CMOS technology driven by Moore's law has approached device sizes in the range of 5-20
nm, noise immunity of such future technology nodes is predicted to decrease considerably,
eventually affecting the reliability of computations through them. A shift in the design paradigm
is expected from 100% accurate computations to probabilistic computing with accuracy
dependent on the target application or circuit specifications. One model developed for CMOS
technology that emulates the erroneous behavior predicted is termed probabilistic CMOS
(PCMOS). In this paper, we propose a PCMOS-based architecture implementation for
traditional motion estimation algorithms and show that up to 57% energy savings are possible for
different existing motion estimation algorithms. Furthermore, algorithmic modifications are
proposed that can enhance the energy savings to 70% with a PCMOS architectural
implementation. About 1.8-5 dB improvement in peak signal-to-noise ratio under energy savings
of 57% to 70% for two different motion estimation algorithms is shown, establishing the
resilience of the proposed algorithm to probabilistic computing over the comparable
conventional algorithm.
5. Two-layer motion estimation algorithm for video coding
A novel two-layer motion estimation which searches motion vectors on two layers with partial
distortion measures in order to reduce the overwhelming computational complexity
of motion estimation (ME) in video coding is proposed. A layer is an image which is derived
from the reference frame such that the summation of a block of pixels in the reference frame
determines the point of a layer. It has been noted on different video sequences that
many motion vectors on the layers are the same as those searched on the reference frame.
Experimental results on a wide variety of video sequences show that the proposed algorithm
achieves both fast speed and good motion prediction quality when compared with the state-of-
the-art fast block matching algorithms.
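The notion of a layer can be illustrated with a short Python sketch in which every layer point is the sum of one block of reference-frame pixels. The use of non-overlapping square blocks and the function name are assumptions made for this example; the abstract does not specify the block layout or the partial distortion measures.

```python
def build_layer(frame, block):
    """Build a layer whose points are sums over non-overlapping blocks."""
    rows, cols = len(frame), len(frame[0])
    return [
        [
            sum(frame[r + i][c + j] for i in range(block) for j in range(block))
            for c in range(0, cols - block + 1, block)
        ]
        for r in range(0, rows - block + 1, block)
    ]


# Hypothetical 4x4 reference frame with pixel values 1..16.
frame = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
layer = build_layer(frame, 2)
```

Searching on such a layer touches far fewer points than searching on the full reference frame.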
6. H.264-based hierarchical two-layer lossless video coding method
An efficient lossless coding technique is very important for storage and transmission applications
of error sensitive information such as medical, seismic and digital artistic data. In this study, the
authors propose an H.264 advanced video coding (H.264/AVC)-based hierarchical
lossless coding method, where the input video is first encoded by an H.264/AVC coder with
a quantisation parameter (QP) selector in the base layer and the coded error is encoded by a QP-
adaptive Rice coder in the enhancement layer. To reduce encoding time, the QP selector can be
simplified to select the nearly optimal QP. Simulation results show that the proposed hierarchical
lossless coding architecture achieves better compression ratio than the traditional H.264/AVC-
based lossless coding systems. Since the proposed system could provide both lossy and lossless
coding services at the same time, the proposed lossless video coding system has advantages of
efficiency and flexibility for practical applications. Experimental results show that the proposed
coding system can provide fewer coding bits and reduce coding complexity compared with H.264-
differential pulse code modulation.
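A Rice coder of the kind used in the enhancement layer can be sketched in Python as follows: a signed residual is mapped to a non-negative integer, the quotient by 2^k is written in unary and the remainder in k binary digits. The zigzag mapping and all function names are assumptions for illustration, and the QP-dependent choice of k described in the abstract is not modeled.

```python
def zigzag(v):
    """Map a signed residual to a non-negative integer (0, -1, 1, -2, ...)."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def rice_encode(value, k):
    """Rice code: unary quotient, a '0' terminator, then a k-bit remainder."""
    u = zigzag(value)
    q, r = u >> k, u & ((1 << k) - 1)
    remainder = format(r, "b").zfill(k) if k > 0 else ""
    return "1" * q + "0" + remainder


def rice_decode(bits, k):
    """Decode one Rice codeword produced by rice_encode."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return unzigzag((q << k) | r)
```

Small residuals produce short codewords, so a parameter k matched to the typical residual magnitude keeps the enhancement layer compact.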
7. A cache-aware motion estimation organization for a hardware-based H.264 encoder
The video resolution required for many types of video content has increased as technology has
advanced. For the real-time encoding of the high resolutions such as full high definition (FHD),
quad-FHD (QFHD) and beyond, various fast motion estimation (ME) algorithms have been
researched. Caches are used for many fast MEs in a hardware-based encoder, in order to increase
local memory utilization and thereby reduce external memory access. However, most previous
works do not pay attention to the amount of cache access from multiple MEs. In a multi-core
environment for high resolution videos, access conflicts directly affect the computation time. In
this paper, various types of caches are compared in terms of the size, hit ratio, cache port
conflicts and hardware overhead. To reduce the amount of cache access associated with the basic
shared cache, zigzag snake scan and selective data-storage schemes are proposed for integer and
fractional MEs, respectively. Additionally, the cache access arbitration hides the computation
delay which arises due to a cache port conflict in a pipeline system. The proposed schemes are
applicable to the existing cache design, achieving good scalability in a multi-core environment.
Simulation results show that the ME computation time reduced by the proposed schemes is
comparable to that of the dual-port shared cache which shows the least amount of port conflicts.
8. An Overview of Information Hiding in H.264/AVC Compressed Video
Information hiding refers to the process of inserting information into a host to serve specific
purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video
domain are surveyed. First, the general framework of information hiding is conceptualized by
relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by
using various data representation schemes such as bit plane replacement, spread spectrum,
histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which
information hiding takes place are then identified, including prediction process, transformation,
quantization, and entropy coding. Related information hiding methods at each venue are briefly
reviewed, along with the presentation of the targeted applications, appropriate diagrams, and
references. A timeline diagram is constructed to chronologically summarize the invention of
information hiding methods in the compressed still image and video domains since 1992. A
comparison among the considered information hiding methods is also conducted in terms of
venue, payload, bitstream size overhead, video quality, computational complexity, and video
criteria. Further perspectives and recommendations are presented to provide a better
understanding of the current trend of information hiding and to identify new opportunities for
information hiding in compressed video.
9. Estimation of Acoustic Reflection Coefficients Through Pseudospectrum Matching
Estimating the geometric and reflective properties of the environment is important for a wide
range of applications of space-time audio processing, from acoustic scene analysis to room
equalization and spatial audio rendering. In this manuscript, we propose a methodology for
frequency-subband in-situ estimation of the reflection coefficients of planar surfaces. This is a
rather challenging task, as the reflection coefficients depend on the frequency and the angle of
incidence and their estimate is highly sensitive to background noise and interfering sources. Our
method is based on the assumption that we know the geometry of the reflectors; the position and
the radiation pattern of the source; the position and the spatial response of the array. Applying
beamforming algorithms on a single set of measured sensor data, we estimate the angular
distribution of the acoustic energy (angular pseudospectrum) that impinges on a microphone
array. We then apply a two-step iterative estimation technique based on an Expectation-
Maximization (EM) algorithm. The first step estimates the scaling factors. The second one infers
the reflection coefficients from the scaling factors. Under the assumption of additive white
Gaussian noise, we finally determine the reflection coefficients with a Maximum Likelihood
(ML) estimation method. The effectiveness and the accuracy of the proposed technique are
assessed through experiments based on measured data.
10. Speech Processing on a Reconfigurable Analog Platform
We describe architectures for audio classification front ends on a reconfigurable analog platform.
Real-time implementations of audio processing algorithms involving discrete-time signals tend to
be power-intensive. We present an alternate continuous-time system implementation of a noise-
suppression algorithm on our reconfigurable chip, while detailing the design considerations. We
also describe a framework that enables future implementations of other
speech processing algorithms, classifier front ends, and hearing aids.
11. Nonlinear Audio Systems Identification Through Audio Input Gaussianization
Nonlinear audio system identification generally relies on Gaussianity, whiteness and stationarity
hypotheses on the input signal, although audio signals are non-Gaussian, highly correlated and
non-stationary. However, since the physical behavior of nonlinear audio systems is input-
dependent, they should be identified using natural audio signals (speech or music) as input,
instead of artificial signals (sweeps or noise) as usually done. We propose an identification
scheme that conditions audio signals to fit the desired properties for an efficient identification.
The identification system consists of (1) a Gaussianization step that makes the signal near-
Gaussian under a perceptual constraint; (2) a predictor filterbank that whitens the signal; (3) an
orthonormalization step that enhances the statistical properties of the input vector of the last step,
under a Gaussianity hypothesis; (4) an adaptive nonlinear model. The proposed scheme enhances
the convergence rate of the identification and reduces the steady state identification error,
compared to other schemes, for example the classical adaptive nonlinear identification.
12. Low Distortion Switching Amplifier With Discrete-Time Click Modulation
An all-digital Class-D amplifier based on a discrete-time implementation of the click modulator
is presented. The algorithm is able to generate binary signals with separated baseband, displacing
the harmonic content produced by the modulation process above a certain frequency chosen by the
designer. Perfect demodulation can be achieved by a simple low-pass filter. Previous
implementations of the discrete-time click modulator reported in the literature suffer from
aliasing in the frequency domain. The approach proposed here avoids aliasing, without the
necessity to increase (interpolate) the sampling frequency of the signals. Following a brief
theoretical introduction, the performance of the proposed architecture is demonstrated by
experimental measurements performed on an H-bridge amplifier. An 88 dB signal-to-noise ratio
(SNR) and a total harmonic distortion (THD) + N of less than 0.04% are attainable over the
entire audio band, extending from 20 Hz up to 20 kHz; on the other hand, no traces of IMD
appear above the predicted noise floor. These performance indices are obtained for switching
rates as low as 40 kHz. The reduction of the switching frequency provides more flexibility for
the design of the demodulation stage, allowing a trade-off between the complexity of the
demodulation filter and the achievable efficiency of the switching stage.
13. ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for
Real-Time Segmentation of High Definition Video
Background identification is a common feature in many video processing systems. This paper
proposes two hardware implementations of the OpenCV version of the Gaussian mixture model
(GMM), a background identification algorithm. The implemented version of the algorithm
allows a fast initialization of the background model while an innovative, hardware-oriented,
formulation of the GMM equations makes the proposed circuits able to perform real-time
background identification on high definition (HD) video sequences with frame size 1920 × 1080.
The first of the two circuits is designed with commercial field-programmable gate-array (FPGA)
devices as target. When implemented on Virtex6 vlx75t, the proposed circuit processes 91 HD fps
(frames per second) and uses 3% of FPGA logic resources. The second circuit is oriented to the
implementation in UMC-90 nm CMOS standard cell technology, and is proposed in two
versions. Both versions can process at a frame rate higher than 60 HD fps. The first version uses
the constant voltage scaling technique to provide a low power implementation. It provides silicon
area occupation of 28847 μm² and energy dissipation per pixel of 15.3 pJ/pixel. The second
version is designed to reduce silicon area utilization and occupies 21847 μm² with an energy
dissipation of 49.4 pJ/pixel.
14. VLSI Architecture Design of Guided Filter for 30 Frames/s Full-HD Video
Filtering is widely used in image and video processing for various applications. Recently, the
guided filter has been proposed and has become one of the popular filtering methods. In this paper, to
meet the computation demand of guided filtering in full-HD video, a double integral image
architecture for guided filter ASIC design is proposed. In addition, a reformulation of the guided
filter formula is proposed, which can prevent the error resulting from truncation in the fractional
part and modify the regularization parameter ε on user's demand. The hardware architecture of
the guided image filter is then proposed and can be embedded in mobile devices to achieve real-
time HD applications. To the best of our knowledge, this paper is also the first ASIC design for
guided image filter. With a TSMC 90-nm cell library, the design can operate at 100 MHz and
support Full-HD (1920 × 1080) 30 frame/s with 92.9K gate counts and 3.2 KB on-chip
memory. Moreover, for the hardware efficiency, our architecture is also the best compared to
other previous works with bilateral filter.
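The integral-image idea that the abstract builds on can be sketched in software. The Python sketch below is illustrative only and is not the paper's hardware architecture: it builds a summed-area table and then obtains a box mean, the basic operation inside a guided filter, with four table lookups per pixel. The function names and the clamping of the window at the image border are assumptions made for this example.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above this row plus the running sum of this row.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_mean(sat, y, x, r):
    """Mean over the (2r+1) x (2r+1) window centred at (y, x), clamped to the image."""
    h, w = len(sat) - 1, len(sat[0]) - 1
    y0, y1 = max(0, y - r), min(h, y + r + 1)
    x0, x1 = max(0, x - r), min(w, x + r + 1)
    # Four lookups give the window sum regardless of the radius r.
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((y1 - y0) * (x1 - x0))
```

Because the cost per pixel does not grow with the window radius, this kind of table is a natural fit for fixed-throughput video pipelines.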
15. Video Colorization Using Parallel Optimization in Feature Space
We present a new scheme for video colorization using optimization in rotation-aware Gabor
feature space. Most current methods of video colorization incur temporal artifacts and
prohibitive processing costs, while this approach is designed in a spatiotemporal manner to
preserve temporal coherence. The parallel implementation on graphics hardware is also
facilitated to achieve realtime performance of color optimization. By adaptively
clustering video frames and extending Gabor filtering to optical flow computation, we can
achieve real-time color propagation within and between frames. Temporal coherence is further
refined through user scribbles in video frames. The experimental results demonstrate that our
proposed approach is efficient in producing high-quality colorized videos.
16. Joint Non-Gaussian Denoising and Superresolving of Raw High Frame Rate Videos
High frame rate cameras capture sharp videos of highly dynamic scenes by trading off signal-
to-noise ratio and image resolution, so combined super-resolving and denoising is crucial for
enhancing high speed videos and extending their applications. The solution is nontrivial due to
the fact that two deteriorations co-occur during capturing and noise is nonlinearly dependent on
signal strength. To handle this problem, we propose conducting noise separation and super
resolution under a unified optimization framework, which models both spatiotemporal priors of
high quality videos and signal-dependent noise. Mathematically, we align the frames along
temporal axis and pursue the solution under the following three criteria: 1) the sharp noise-free
image stack is low rank with some missing pixels denoting occlusions; 2) the noise follows a
given nonlinear noise model; and 3) the recovered sharp image can be reconstructed well with
sparse coefficients and an overcomplete dictionary learned from high quality natural images. In
computation aspects, we propose to obtain the final result by solving a convex optimization
using the modern local linearization techniques. In the experiments, we validate the proposed
approach in both synthetic and real captured data.
17. Intuitive real-time platform for audio signal processing and musical instrument
response emulation
In recent years, the DSP group at the University of Manchester has developed a range of DSP
platforms for realtime filtering and processing of acoustic signals. These include Signal Wizard
2.5, Signal Wizard 3 and Vsound. These incorporate processors operating at 100 million
multiplication-accumulations per second (MMACs) for SW 2.5 and 600 MMACs for SW 3 and
Vsound. SW 3 features six input and eight output analogue channels, digital input/output in the
form of S/PDIF and a USB interface. For all devices, the software allows the user, with no
knowledge of filter theory or programming, to design and run standard or completely arbitrary
FIR, IIR and adaptive filters. Processing tasks are specified using the graphical icon based
interface. In addition, the system has the capability to emulate in real-time linear system behavior
such as sensors, instrument bodies, string vibrations, resonant spaces and electrical networks.
Tests have confirmed a high degree of fidelity between the behavior of the physical system and
its digitally emulated counterpart. In addition to the supplied software, the user may also
program the system using a variety of commercial packages via the JTAG interface.
Cryptography and Steganography
1. On the Relation of Random Grid and Deterministic Visual Cryptography
Visual cryptography is a special type of secret sharing. Two models of visual cryptography have
been independently studied: 1) deterministic visual cryptography, introduced by Naor and
Shamir, and 2) random grid visual cryptography, introduced by Kafri and Keren. In this paper,
we show that there is a strict relation between these two models. In particular, we show that
every random grid scheme corresponds to a deterministic scheme and vice versa. This allows us to
use results known in one model in the other model as well. By exploiting the (many) results known in
the deterministic model, we are able to improve several schemes and to provide many upper
bounds for the random grid model and by exploiting some results known for the random grid
model, we are also able to provide new schemes for the deterministic model. A side effect of this
paper is that future new results for either of the two models should not ignore, and in fact should be
compared with, the results known in the other model.
2. Efficient Algorithm and Architecture for Elliptic Curve Cryptography for
Extremely Constrained Secure Applications
Recently, considerable research has been performed in cryptography and security to optimize the
area, power, timing, and energy needed for the point multiplication operations over binary
elliptic curves. In this paper, we propose an efficient implementation of point multiplication on
Koblitz curves targeting extremely-constrained, secure applications. We utilize the Gaussian
normal basis (GNB) representation of field elements over GF(2^m) and employ an efficient bit-
level GNB multiplier. One advantage of this GNB multiplier is that we are able to reduce the
hardware complexity through sharing the addition/accumulation with other field additions. We
utilize the special property of normal basis representation, whereby squarings are implemented very
efficiently by only rewiring in hardware. We introduce a new technique for point addition in
affine coordinates which requires fewer registers. Based on this technique, we propose an
extremely small processor architecture for point multiplication. Through application-specific
integrated circuit (ASIC) implementations, we evaluate the area, performance, and energy
consumption of the proposed crypto-processor. Utilizing two different working frequencies, it is
shown that the proposed architecture reaches better results compared to the previous works,
making it suitable for extremely-constrained, secure environments.
3. Property Analysis of XOR-Based Visual Cryptography
A (k,n) visual cryptographic scheme (VCS) encodes a secret image into n shadow images
(printed on transparencies) distributed among n participants. When any k participants
superimpose their transparencies on an overhead projector (OR operation), the secret image can
be visually revealed by a human visual system without computation. However, the monotone
property of OR operation degrades the visual quality of reconstructed image for OR-based VCS
(OVCS). Accordingly, XOR-based VCS (XVCS), which uses XOR operation for decoding, was
proposed to enhance the contrast. In this paper, we investigate the relation between OVCS and
XVCS. Our main contribution is to theoretically prove that the basis matrices of (k,n)-OVCS can
be used in (k,n)-XVCS. Meanwhile, the contrast is enhanced 2^(k-1) times.
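The contrast claim can be checked by hand for the smallest case. The Python sketch below is a toy check and not the paper's proof: it takes the classic (2,2) basis matrices, stacks the two shares once with OR and once with XOR, and compares the resulting contrast. Contrast is taken here as the difference in black subpixels between a black and a white secret pixel, divided by the pixel expansion m. The variable and function names are assumptions made for this example.

```python
# Basis matrices of the classic (2,2) scheme: rows are shares, columns are subpixels
# (1 = black subpixel, 0 = transparent subpixel).
S_WHITE = [[0, 1], [0, 1]]   # both shares carry the same pattern
S_BLACK = [[0, 1], [1, 0]]   # the shares carry complementary patterns


def stack(rows, op):
    """Combine share rows column by column with the given binary operation."""
    out = rows[0][:]
    for row in rows[1:]:
        out = [op(a, b) for a, b in zip(out, row)]
    return out


def contrast(op):
    """(black subpixels for a black pixel - black subpixels for a white pixel) / m."""
    m = len(S_WHITE[0])
    return (sum(stack(S_BLACK, op)) - sum(stack(S_WHITE, op))) / m


or_contrast = contrast(lambda a, b: a | b)    # physical stacking of transparencies
xor_contrast = contrast(lambda a, b: a ^ b)   # XOR-based decoding
```

For k = 2 the ratio `xor_contrast / or_contrast` comes out as 2, which agrees with the factor 2^(k-1) stated in the abstract.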
4. Multifunction Residue Architectures for Cryptography
A design methodology for incorporating Residue Number System (RNS) and Polynomial
Residue Number System (PRNS) in Montgomery modular multiplication in GF(p) or GF(2^n)
respectively, as well as a VLSI architecture of a dual-field residue arithmetic Montgomery
multiplier are presented in this paper. An analysis of input/output conversions to/from residue
representation, along with the proposed residue Montgomery multiplication algorithm, reveals
common multiply-accumulate data paths both between the converters and between the two
residue representations. A versatile architecture is derived that supports all operations of
Montgomery multiplication in GF(p) and GF(2^n), input/output conversions, Mixed Radix
Conversion (MRC) for integers and polynomials, dual-field modular exponentiation and
inversion in the same hardware. Detailed comparisons with state-of-the-art implementations
prove the potential of residue arithmetic exploitation in dual-field modular multiplication.
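For readers unfamiliar with Montgomery multiplication, the following Python sketch shows the plain integer version in GF(p). It is only background: it does not use residue number systems and is not the paper's architecture. The function names, the choice R = 2^k, and the conversion in and out of the Montgomery domain on every call are assumptions made to keep the example short. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_setup(p, k):
    """Precompute constants for R = 2**k, where p is odd and p < R."""
    r = 1 << k
    p_neg_inv = (-pow(p, -1, r)) % r   # p' such that p * p' = -1 (mod R)
    return r, p_neg_inv


def redc(t, p, k, p_neg_inv):
    """Montgomery reduction: return t * R^-1 mod p for 0 <= t < p * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * p_neg_inv) & mask   # makes t + m*p divisible by R
    u = (t + m * p) >> k                  # exact division by R, done as a shift
    return u - p if u >= p else u


def mont_mul(a, b, p, k):
    """Compute a * b mod p by working in the Montgomery domain."""
    r, p_neg_inv = montgomery_setup(p, k)
    a_bar = (a * r) % p                               # enter the domain
    b_bar = (b * r) % p
    prod_bar = redc(a_bar * b_bar, p, k, p_neg_inv)   # equals a*b*R mod p
    return redc(prod_bar, p, k, p_neg_inv)            # leave the domain
```

In practice the operands stay in the Montgomery domain across many multiplications, so the conversions are paid only once per exponentiation.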
5. Error Detection and Recovery for ECC: A New Approach Against Side-Channel
Attacks
Side channel attacks allow an attacker to retrieve secret keys with far less effort than other
attacks. Countermeasures against these attacks should be considered during cryptosystem design.
This paper presents a novel low-cost error detection and recovery scheme (LOEDAR) to counter
fault attacks. The proposed architecture retains the efficiency of the Montgomery ladder
algorithm and shows strong resistance to both environmentally induced faults and attacker-
introduced faults. Moreover, the proposed LOEDAR scheme is compatible with most existing
countermeasures against various power analysis attacks including differential power analysis and
its variants, which makes it extendable to a comprehensive countermeasure against both fault
attacks and power analysis attacks.
6. Effectiveness of Leakage Power Analysis Attacks on DPA-Resistant Logic Styles
Under Process Variations
This paper extends the analysis of the effectiveness of Leakage Power Analysis (LPA) attacks to
cryptographic VLSI circuits on which circuit level countermeasures against Differential Power
Analysis (DPA) are adopted. Security metrics used for assessing the DPA-resistance of crypto
core implementations, such as the minimum number to disclosure (MTD) and the asymptotic
correlation coefficient, have been extended to the case of LPA. The LPA-resistance has been
evaluated in terms of MTD as a function of the on chip noise. Noise variances up to 10000 times
greater than the signal variance have been taken into account and LPA attacks have been
successfully executed for all the logic styles under analysis using fewer than 100000
measurements. Moreover, the role of process variations has been investigated through extensive
Monte Carlo simulations in order to evaluate their impact on the leakage model for the logic
styles under analysis. Results show that LPA attacks can be successfully carried out on the
different anti-DPA logic styles even in the presence of process variations. To the best of our
knowledge, this work proves for the first time the effectiveness of LPA attacks in a real scenario
where on chip noise and process variations are taken into account.
7. New and Improved Methods to Analyze and Compute Double-Scalar
Multiplications
We address several algorithms to perform a double-scalar multiplication on an elliptic curve. All
the methods investigated are related to the double-base number system (DBNS) and extend
previous work of Doche et al. [25]. We refine and rigorously prove the complexity analysis of
the joint binary-ternary (JBT) algorithm. Experiments are in line with the theory and show that
the JBT requires approximately 6 percent fewer field multiplications than the standard joint sparse
form (JSF) method to compute [n]P + [m]Q. We also introduce a randomized version of the JBT,
called JBT-Rand, that gives total control of the number of triplings in the expansion that is
produced. So it becomes possible with the JBT-Rand to adapt and tune the number of triplings to
the coordinate system and bit length that are used, to further decrease the cost of a double-scalar
multiplication. Then, we focus on Koblitz curves. For extension degrees enjoying an optimal
normal basis of type II, we discuss a Joint τ-DBNS approach that reduces the number of field
multiplications by at least 35 percent over the traditional τ-JSF. For other extension degrees
represented in polynomial basis, the Joint τ-DBNS is still relevant provided that appropriate
bases conversion methods are used. In this situation, tests show that the speedup over the τ-JSF
is then larger than 20 percent. Finally, when the use of the τ-DBNS becomes unrealistic, for
instance because of the lack of an efficient normal basis or the lack of memory to allow an
efficient conversion, we adapt the joint binary-ternary algorithm to Koblitz curves giving rise to
the Joint τ-τ method whose complexity is analyzed and proved. The Joint τ-τ induces a speedup
of about 10 percent over the τ-JSF.
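As background on the double-base number system itself, the sketch below writes a single integer as a sum of terms 2^a · 3^b using a simple greedy rule. This is not the JBT, JBT-Rand or Joint τ-DBNS method from the paper, which handle two scalars jointly. It is a minimal single-scalar illustration, and the function names are assumptions.

```python
def largest_2_3_term(n):
    """Largest integer of the form 2**a * 3**b that does not exceed n (n >= 1)."""
    best = (1, 0, 0)
    pow3, b = 1, 0
    while pow3 <= n:
        # For this power of 3, multiply by 2 as long as the term still fits.
        term, a = pow3, 0
        while term * 2 <= n:
            term *= 2
            a += 1
        if term > best[0]:
            best = (term, a, b)
        pow3 *= 3
        b += 1
    return best


def greedy_dbns(n):
    """Greedy unsigned double-base expansion: n equals the sum of 2**a * 3**b terms."""
    terms = []
    while n > 0:
        term, a, b = largest_2_3_term(n)
        terms.append((a, b))
        n -= term
    return terms
```

For example, `greedy_dbns(127)` yields the exponent pairs (2, 3), (1, 2), (0, 0), that is 108 + 18 + 1. Expansions of this kind tend to be short, which is what makes doublings and triplings attractive for scalar multiplication.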
8. A Hybrid Scheme for Authenticating Scalable Video Codestreams
A scalable video coding (SVC) codestream consists of one base layer and possibly several
enhancement layers. The base layer, which contains the lowest quality and resolution images, is
the foundation of the SVC codestream and must be delivered to recipients, whereas enhancement
layers contain richer contour/texture of images in order to supplement the base layer in
resolution, quality, and temporal scalabilities. This paper presents a novel hybrid authentication
(HAU) scheme. The HAU employs both cryptographic authentication and content-based
authentication techniques to ensure integrity and authenticity of the SVC codestreams. Our
analysis and experimental results indicate that the HAU is able to detect malicious manipulations
and locate the tampered image regions while remaining robust to content-preserving manipulations for
enhancement layers. Although our focus in this paper is on authenticating H.264/SVC
codestreams, the proposed technique is also applicable to authenticate other scalable multimedia
contents such as MPEG-4 fine grain scalability and JPEG2000 codestreams.
9. Authenticated Encryption: Toward Next-Generation Algorithms
Wondering whether researchers have a cryptographic tool able to provide both confidentiality
(privacy) and integrity (authenticity) of a message? They do: authenticated encryption (AE), a
symmetric-key mechanism that transforms a message into a ciphertext. This article discusses
standard AE algorithms, classic security models' shortcomings for AE algorithms, and related
attacks. Motivated by these attacks, the crypto community started CAESAR (Competition for
Authenticated Encryption: Security, Applicability, and Robustness) to promote the development
of next-generation AE algorithms.
10. E-MACs: Toward More Secure and More Efficient Constructions of Secure
Channels
In cryptography, secure channels enable the confidential and authenticated message exchange
between authorized users. A generic approach of constructing such channels is by combining an
encryption primitive with an authentication primitive (MAC). In this work, we introduce the
design of a new cryptographic primitive to be used in the construction of secure channels.
Instead of using general purpose MACs, we propose the deployment of special purpose MACs,
named ε-MACs. The main motivation behind this work is the observation that, since the message
must be both encrypted and authenticated, there might be some redundancy in the computations
performed by the two primitives. Therefore, removing such redundancy can improve the
efficiency of the overall composition. Moreover, computations performed by the encryption
algorithm can be further utilized to improve the security of the authentication algorithm. In
particular, we will show how ε-MACs can be designed to reduce the amount of computation
required by standard MACs based on universal hash functions, and show how ε-MACs can be
secured against key-recovery attacks.
11. Robust lightweight fingerprint encryption using random block feedback
Fingerprint encryption in embedded environments should be both lightweight and
secure. Normally, the encryption scheme divides the 8-bit pixel images into bit planes and
then performs full encryption for one bit plane, e.g. least significant bit plane, and simple
operations for the remaining bit planes. Thus, the scheme performs better compared with the 8-
bit full encryption, while the security is decreased since only one bit plane is fully encrypted. An
innovative fingerprint encryption scheme is proposed which supports better security while
maintaining the overall performance. The proposed scheme uses a bit plane encryption and a
random block feedback. The encryption schemes are implemented and tested with 320 sample
fingerprint images. The result shows that the scheme has superior aspects compared with the
existing bit plane encryption and even with the naive full encryption.
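The baseline bit-plane approach described at the start of the abstract can be sketched as follows. This Python sketch is a toy illustration and not the proposed scheme: it omits the random block feedback and the simple operations on the remaining planes, and the keystream built from SHA-256 in counter mode is a placeholder assumption rather than a vetted cipher. All function names are hypothetical.

```python
import hashlib


def split_bit_planes(pixels):
    """Split 8-bit pixel values into 8 bit planes; plane 0 is the least significant."""
    return [[(p >> k) & 1 for p in pixels] for k in range(8)]


def merge_bit_planes(planes):
    """Inverse of split_bit_planes."""
    return [sum(planes[k][i] << k for k in range(8)) for i in range(len(planes[0]))]


def keystream_bits(key, n):
    """Toy keystream from SHA-256 in counter mode (placeholder, not a vetted cipher)."""
    bits, counter = [], 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        bits.extend((byte >> j) & 1 for byte in block for j in range(8))
        counter += 1
    return bits[:n]


def encrypt_one_plane(pixels, key, plane=0):
    """XOR a single bit plane with the keystream; the other planes are left untouched."""
    planes = split_bit_planes(pixels)
    ks = keystream_bits(key, len(pixels))
    planes[plane] = [b ^ k for b, k in zip(planes[plane], ks)]
    return merge_bit_planes(planes)
```

Since XOR is its own inverse, calling `encrypt_one_plane` again with the same key restores the original pixels. Only one plane is protected, which is the security weakness the abstract points out.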
12. Optimising the SHA-512 cryptographic hash function on FPGAs
In this study, novel pipelined architectures, optimised in terms of throughput and throughput/area
factors, for the SHA-512 cryptographic hash function, are proposed. To achieve this,
algorithmic- and circuit-level optimisation techniques such as loop unrolling, re-timing, temporal
pre-computation, resource re-ordering and pipelining are applied. All the techniques except
pipelining are applied in the function's transformation round. Pipelining was applied through the
development of all the alternative pipelined architectures, which were implemented in several Xilinx
FPGA families and evaluated in terms of frequency, area, throughput and
throughput/area factors. Compared to the initial un-optimised implementation of the SHA-512
function, the introduced five-stage pipelined architecture improves both the throughput and
throughput/area factors by 123 and 61.5%, respectively. Furthermore, the proposed five-stage
pipelined architecture outperforms the existing ones both in throughput (3.4× up to 16.9×) and
throughput/area (19.5% up to 6.9×) factors.
13. Constructions of Resilient S-Boxes With Strictly Almost Optimal Nonlinearity
Through Disjoint Linear Codes
In this paper, a novel approach of finding disjoint linear codes is presented. The cardinality of a
set of [u, m, t+1] disjoint linear codes largely exceeds all the previous best known methods used
for the same purpose. Using such sets of disjoint linear codes, not necessarily of the same length,
we have been able to provide a construction technique of t-resilient S-boxes F: F_2^n → F_2^m (n even)
with strictly almost optimal nonlinearity. This is the first time that the bound 2^(n-1) - 2^(n/2) has been
exceeded by multiple output resilient functions. Actually, the nonlinearity of our functions is in
many cases equal to the best known nonlinearity of balanced Boolean functions. A large class of
previously unknown cryptographic resilient S-boxes is obtained, and several improvements of
the original approach are proposed. Some other relevant cryptographic properties are also briefly
discussed. It is shown that these functions may reach Siegenthaler's bound n-t-1, and can be
either of optimal algebraic immunity or of slightly suboptimal algebraic immunity, which was
confirmed by simulations.
14. Data Hiding in Encrypted H.264/AVC Video Streams by Codeword Substitution
Digital video sometimes needs to be stored and processed in an encrypted format to maintain
security and privacy. For the purpose of content notation and/or tampering detection, it is
necessary to perform data hiding in these encrypted videos. In this way, data hiding in encrypted
domain without decryption preserves the confidentiality of the content. In addition, it is more
efficient, since it avoids decryption followed by data hiding and re-encryption. In this paper, a novel
scheme of data hiding directly in the encrypted version of H.264/AVC video stream is proposed,
which includes the following three parts, i.e., H.264/AVC video encryption, data embedding, and
data extraction. By analyzing the property of H.264/AVC codec, the codewords of
intraprediction modes, the codewords of motion vector differences, and the codewords of
residual coefficients are encrypted with stream ciphers. Then, a data hider may embed additional
data in the encrypted domain by using codeword substitution technique, without knowing the
original video content. In order to adapt to different application scenarios, data extraction can be
done either in the encrypted domain or in the decrypted domain. Furthermore, video file size is
strictly preserved even after encryption and data embedding. Experimental results have
demonstrated the feasibility and efficiency of the proposed scheme.
15. A Novel Joint Data-Hiding and Compression Scheme Based on SMVQ and Image
Inpainting
In this paper, we propose a novel joint data-hiding and compression scheme for digital images
using side match vector quantization (SMVQ) and image inpainting. The two functions
of data hiding and image compression can be integrated into one single module seamlessly. On
the sender side, except for the blocks in the leftmost and topmost of the image, each of the other
residual blocks in raster-scanning order can be embedded with secret data and compressed
simultaneously by SMVQ or image inpainting adaptively according to the current embedding bit.
Vector quantization is also utilized for some complex blocks to control the visual distortion and
error diffusion caused by the progressive compression. After segmenting the image compressed
codes into a series of sections by the indicator bits, the receiver can achieve the extraction of
secret bits and image decompression successfully according to the index values in the segmented
sections. Experimental results demonstrate the effectiveness of the proposed scheme.
16. A New Secure Image Transmission Technique via Secret-Fragment-Visible Mosaic
Images by Nearly Reversible Color Transformations
A new secure image transmission technique is proposed, which transforms automatically a given
large-volume secret image into a so-called secret-fragment-visible mosaic image of the same
size. The mosaic image, which looks similar to an arbitrarily selected target image and may be
used as a camouflage of the secret image, is yielded by dividing the secret image into fragments
and transforming their color characteristics to be those of the corresponding blocks of the target
image. Skillful techniques are designed to conduct the color transformation process so that the
secret image may be recovered nearly losslessly. A scheme of handling the
overflows/underflows in the converted pixels' color values by recording the color differences in
the untransformed color space is also proposed. The information required for recovering the
secret image is embedded into the created mosaic image by a lossless data hiding scheme using a
key. Good experimental results show the feasibility of the proposed method.
17. An Overview of Information Hiding in H.264/AVC Compressed Video
Information hiding refers to the process of inserting information into a host to serve specific
purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video
domain are surveyed. First, the general framework of information hiding is conceptualized by
relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by
using various data representation schemes such as bit plane replacement, spread spectrum,
histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which
information hiding takes place are then identified, including prediction process, transformation,
quantization, and entropy coding. Related information hiding methods at each venue are briefly
reviewed, along with the presentation of the targeted applications, appropriate diagrams, and
references. A timeline diagram is constructed to chronologically summarize the invention of
information hiding methods in the compressed still image and video domains since 1992. A
comparison among the considered information hiding methods is also conducted in terms of
venue, payload, bitstream size overhead, video quality, computational complexity, and video
criteria. Further perspectives and recommendations are presented to provide a better
understanding of the current trend of information hiding and to identify new opportunities for
information hiding in compressed video.
18. Optimal Transport for Secure Spread-Spectrum Watermarking of Still Images
This paper studies the impact of secure watermark embedding in digital images by proposing a
practical implementation of secure spread-spectrum watermarking using distortion optimization.
Because strong security properties (key-security and subspace-security) can be achieved using
natural watermarking (NW) since this particular embedding leaves the distribution of the host and
watermarked signals unchanged, we use elements of transportation theory to minimize the global
distortion. Next, we apply this new modulation, called transportation NW (TNW), to design a
secure watermarking scheme for grayscale images. The TNW uses a multiresolution image
decomposition combined with a multiplicative embedding which is taken into account at the
distribution level. We show that the distortion solely relies on the variance of the wavelet
subbands used during the embedding. In order to maximize a target robustness after JPEG
compression, we select different combinations of subbands offering the lowest Bit Error Rates
for a target PSNR ranging from 35 to 55 dB and we propose an algorithm to select them. The use
of transportation theory also provides an average PSNR gain of 3.6 dB with respect to
the previous embedding for a set of 2000 images.
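For readers unfamiliar with the PSNR figures quoted above, the sketch below computes PSNR from the mean squared error using the standard definition for a given peak value. The function name `psnr` and the flat pixel lists are assumptions for illustration; this is not the evaluation code of the paper.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE drops for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, a PSNR gain of 3.6 dB corresponds to an MSE that is lower by a factor of roughly 2.3.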
19. A Phase-Based Audio Watermarking System Robust to Acoustic Path Propagation
Today, comparing audio watermarking systems remains a challenge due to the lack of publicly
available reference algorithms. In addition, robustness against acoustic path transmission is only
occasionally evaluated. This jeopardizes the chances of digital watermarking to be adopted in the
context of applications where such a feature is vital, e.g., second screen, audience measurement,
and so on. In this paper, we introduce a rather simple audio watermarking algorithm, whose
source code has been publicized for potential reuse by the watermarking community. We then
complement this baseline system with three additional components, namely a psychoacoustic
model, a resynchronization framework, and an improved correlation-based detector. Reported
experimental results clearly demonstrate that the resulting high-fidelity
audio watermarking system manages to survive the acoustic path.
20. Tolerance Evaluation for Defocused Images to Optical Watermarking Technique
In this paper, we describe a new aspect of evaluating the robustness of the
optical watermarking technique, which is a unique technology that can add watermarked
information to object image data taken with digital cameras without any specific extra hardware
architecture. However, since this technology uses light with embedded watermarked information,
which is irradiated onto object images, the condition of taking a picture with digital cameras may
affect the accuracy with which embedded watermarked data can be detected. Images taken with
digital cameras are usually defocused, which occurs under non-optimal conditions. We evaluated
the defocusing in images against the accuracy with which optical watermarking could be
detected. Defocusing in images can be expressed with convolution with a line-spread function
(LSF). We used the full-width at half-maximum (FWHM) of a Gaussian function, which can
approximate the LSF, as the degree to which images were defocused. We carried out
experiments where the accuracies of detection were evaluated as we varied the degree to which
images were defocused. The results from the experiments revealed that
optical watermarking technology was extremely robust against defocusing in images.
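The defocus model described above, convolution with a Gaussian line-spread function parameterized by its FWHM, can be sketched as follows. The relation FWHM = 2·sqrt(2·ln 2)·sigma is the standard one for a Gaussian; the function names, the kernel radius, and the edge-replication rule are assumptions made for this sketch and are not taken from the paper.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert the FWHM of a Gaussian to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_lsf(fwhm, radius):
    """Sampled Gaussian line-spread function, normalized to unit sum."""
    sigma = fwhm_to_sigma(fwhm)
    taps = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def blur_row(row, kernel):
    """Convolve one image row with a symmetric kernel, replicating edge pixels."""
    r = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp index at the borders
            acc += w * row[j]
        out.append(acc)
    return out


kernel = gaussian_lsf(fwhm=4.0, radius=6)           # hypothetical defocus level
blurred = blur_row([0, 0, 0, 255, 255, 255], kernel)
```

A larger FWHM spreads each pixel over more neighbours, which is the sense in which it measures the degree of defocus.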
21. Adaptive Watermarking and Tree Structure Based Image Quality Estimation
Image quality evaluation is very important. In applications involving signal transmission, the
Reduced- or No-Reference quality metrics are generally more practical than the Full-Reference
metrics. In this study, we propose a quality estimation method based on a novel semi-fragile and
adaptive watermarking scheme. The proposed scheme uses the embedded watermark to estimate
the degradation of cover image under different distortions. The watermarking process is
implemented in DWT domain of the cover image. The correlated DWT coefficients across the
DWT subbands are categorized into Set Partitioning in Hierarchical Trees (SPIHT). Those
SPIHT trees are further decomposed into a set of bitplanes. The watermark is embedded into the
selected bitplanes of the selected DWT coefficients of the selected tree without causing
significant fidelity loss to the cover image. The accuracy of the quality estimation is made to
approach that of Full-Reference metrics by referring to an "Ideal Mapping Curve" computed a
priori. The experimental results show that the proposed scheme can estimate image quality in
terms of PSNR, wPSNR, JND and SSIM with high accuracy under JPEG compression,
JPEG2000 compression, Gaussian low-pass filtering and Gaussian noise distortion. The results
also show that the proposed scheme has good computational efficiency for practical applications.
22. A Fragile Watermarking Algorithm for Hologram Authentication
A fragile watermarking algorithm for hologram authentication is presented in this paper. In the
proposed algorithm, the watermark is embedded in the discrete cosine transform (DCT) domain
of a hologram. The watermarked hologram is stored in spatial domain with finite precision level.
By enhancing the precision for storing the watermarked hologram pixels, the distortion produced
by the proposed watermarking scheme can be lowered. While providing high perceptual
transparency, the proposed algorithm also attains high detection performance for delivery errors
and malicious tampering. Experimental results reveal that the proposed algorithm can be used as
an effective filter for blocking polluted or tampered holograms from 3D magnitude and/or phase
reconstruction.
Wireless Communication & 4G Technology
1. Time Domain Channel Estimation for OQAM-OFDM Systems:
Algorithms and Performance Bounds
In this paper, we first present a general time domain model for
the channel estimation in the orthogonal frequency division multiplexing system
with offset quadrature amplitude modulation (OQAM-OFDM), and utilize the
frequency domain pilots to estimate the time domain channel impulse responses.
Different from the conventional methods, there is no specific requirement for the
length of the symbol interval compared to the maximum channel delay spread
in the proposed scheme. Furthermore, with the proposed time domain model,
the channel statistical information could be utilized to improve the performance of
the channel estimation. Then, we propose two channel estimation schemes, i.e.,
linear minimum mean square error (LMMSE) and weighted least square (WLS),
and we also derive their corresponding Bayesian Cramer-Rao Bound (BCRB) and
Cramer-Rao Bound (CRB), respectively. Simulation results demonstrate
that the BCRB and CRB could be achieved by the proposed LMMSE and
WLS methods, respectively. Moreover, simulation results show that the proposed
methods are much more robust to the time synchronization error compared to the
conventional frequency domain methods, and imply that the pulse shaping filter
with waveforms concentrated in the time domain could be employed in OQAM-
OFDM systems to improve the channel estimation performance and spectral
efficiency.
2. Robust Training Sequence Design for Correlated MIMO Channel
Estimation
We study how to design a worst-case robust training sequence for multiple-input
multiple-output (MIMO) channel estimation. We consider mean-squared error
of channel estimates as the figure of merit which is a function of second-order
statistics of the MIMO channel, i.e., channel covariance matrix, in order to
optimize training sequences under a total power constraint. In practical
applications, the channel covariance matrix is not known perfectly. Thus the main
aspect of our design is to improve robustness of the training sequences against
possible uncertainties in the available channel covariance matrix. Using a
deterministic uncertainty model, we formulate a robust training sequence design as
a minimax optimization problem where we take such imperfections into account.
We investigate the robust design problem assuming the general case of an
arbitrarily correlated MIMO channel and a non-empty compact convex uncertainty
set. We prove that such a problem admits a globally optimal solution by exploiting
the convex-concave structure of the objective function, and propose numerical
algorithms to address the robust training design problem. We proceed with the analysis
by considering multiple-input single-output (MISO) channels and Kronecker
structured MIMO channels along with unitarily-invariant uncertainty sets. For
these scenarios, we show that the problem is diagonalized by the eigenvectors of
the nominal covariance matrices so that the robust design is significantly simplified
from a complex matrix-variable problem to a real vector-variable power allocation
problem. For the MISO channel, we provide closed-form solutions for the robust
training sequences with the uncertainty sets defined by the spectral norm and
nuclear norm.
3. On Forward Channel Estimation for MIMO Precoding in Cooperative
Relay Wireless Transmission Systems
Linear precoding for wireless multi-input multi-output (MIMO) transceivers has
demonstrated substantial strength in cooperative relay networks for achieving high
system throughput and performance. However, traditional precoder optimization
critically assumes knowledge of channel state information (CSI) at source nodes.
For linear MIMO source precoding design, we propose a novel method to estimate
the quadratic product of forward-link channel information between source and
relay nodes. To conserve bandwidth, our source estimates the forward-link MIMO
CSI by utilizing inherent signals transmitted by amplify-and-forward (AF) relays
without requiring the cumbersome default method of coordinated
relay channel estimation and relay feedback of its estimated CSI. From the
overheard AF relay signal, the source node simply extracts the
quadratic channel information of its forward-link for designing its precoder. In
addition to presenting a low overhead method for forward channel estimation, we
also analyze the channel estimation performance by investigating its bias and its
Cramer-Rao lower bound. Finally, we present analytical results in comparison with
simulations.
4. Impact of Channel Estimation Errors on SC-FDE Systems
Single carrier transmissions with frequency domain equalization (SC-FDE) have
gained widespread use in emergent broadband wireless systems becoming an
attractive alternative to popular Orthogonal Frequency Division Multiplexing
(OFDM) schemes, particularly at the uplink. Since coherent receivers are usually
employed with SC-FDE, accurate channel estimates are required so as to avoid
substantial performance degradation. Several channel estimation strategies have
been proposed for SC-FDE, but a thorough evaluation of the degradation caused
by channel estimation errors and a comparison against OFDM is still lacking. In
this paper we study the impact of imperfect channel knowledge on SC transmission
with focus on the linear frequency domain equalizer (FDE) and on the Iterative
Block Decision Feedback Equalizer (IB-DFE). We propose a modified IB-DFE
which incorporates knowledge of the channel estimation error model and show that
its performance becomes more robust against the presence of strong error
components in the channel estimates. We also evaluate, analytically and through
simulations, the degradation caused by imperfect channel estimation in SC-FDE
and compare it against OFDM schemes. It is shown that the channel estimation
requirements for SC-FDE are
higher than for OFDM unless a channel estimation error aware receiver is
employed.
5. Pilot Design for Sparse Channel Estimation in OFDM-Based Cognitive
Radio Systems
In this correspondence, sparse channel estimation is first introduced in orthogonal
frequency-division multiplexing (OFDM)-based cognitive radio systems. Based on
the results of spectrum sensing, the pilot design is studied by minimizing the
coherence of the dictionary matrix used for sparse recovery. Then, it is formulated
as an optimal column selection problem where a table is generated and the indexes
of the selected columns of the table form a pilot pattern. A novel scheme using
constrained cross-entropy optimization is proposed to obtain an optimized pilot
pattern, where it is modeled as an independent Bernoulli random process. The
updating rule for the probability of each active subcarrier selected as a pilot
subcarrier is derived. A projection method is proposed so that the number of pilots
during the optimization is fixed. Simulation results verify the effectiveness of the
proposed scheme and show that it can achieve 11.5% improvement in spectrum
efficiency with the same channel estimation performance compared with the least
squares (LS) channel estimation.
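The design criterion mentioned above, the coherence of the dictionary matrix, can be illustrated for a candidate pilot pattern as follows. The sketch assumes the dictionary is the partial DFT matrix whose rows are the pilot subcarriers and whose columns are the channel taps, which is a common choice but is not stated in the abstract; the function name and the pilot sets are hypothetical, and the cross-entropy search itself is not shown.

```python
import cmath


def pilot_coherence(pilots, n_subcarriers, n_taps):
    """Mutual coherence of a partial DFT dictionary restricted to pilot rows.

    Column l has entries exp(-2j*pi*p*l/N) for each pilot index p, so the
    normalized inner product of two columns depends only on their tap
    distance d, and it is enough to scan d = 1 .. n_taps - 1.
    """
    worst = 0.0
    for d in range(1, n_taps):
        s = sum(cmath.exp(-2j * cmath.pi * p * d / n_subcarriers) for p in pilots)
        worst = max(worst, abs(s) / len(pilots))
    return worst


spread = pilot_coherence([0, 4, 8, 12], n_subcarriers=16, n_taps=4)
clustered = pilot_coherence([0, 1, 2, 3], n_subcarriers=16, n_taps=4)
```

In this toy case the evenly spread pattern gives a much lower coherence than the clustered one. In a cognitive radio system some subcarriers are unavailable, so an even spread may not be possible and a search over patterns is needed.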
6. Low complexity minimum mean square error channel estimation for
adaptive coding and modulation systems
Performance of the Adaptive Coding and Modulation (ACM) strongly depends on
the retrieved Channel State Information (CSI), which can be obtained using
the channel estimation techniques relying on pilot symbol transmission. Earlier
analyses of pilot-aided channel estimation methods for ACM systems have been
relatively few. In this paper, we investigate the performance of CSI prediction
using the Minimum Mean Square Error (MMSE) channel estimator for an ACM
system. To solve the two problems of MMSE, namely high computational complexity
and oversimplified assumptions, we then propose the Low-Complexity schemes
(LC-MMSE and Recursion LC-MMSE (R-LC-MMSE)). Computational complexity
and Mean Square Error (MSE) are presented to evaluate the efficiency of the
proposed algorithm. Both analysis and numerical results show that LC-MMSE
performs close to the well-known MMSE estimator with much lower complexity
and R-LC-MMSE improves the application of MMSE estimation to specific
circumstances.
7. Embedded Iterative Semi-Blind Channel Estimation for Three-Stage-
Concatenated MIMO-Aided QAM Turbo Transceivers
The lack of accurate and efficient channel estimation (CE) for multiple-input-
multiple-output (MIMO) channel state information (CSI) has long been the
stumbling block of near-MIMO-capacity operation. We propose a semi-blind joint
CE and three-stage iterative detection/decoding scheme for near-capacity MIMO
systems. The main novelty is that our decision-directed (DD) CE exploits the a
posteriori information produced by the MIMO soft demapper within the inner
turbo loop to select a “just sufficient number” of high-quality detected soft bit
blocks or symbols for DDCE, which significantly improves the accuracy and
efficiency of DDCE. Moreover, our DDCE is naturally embedded into the iterative
three-stage detection/decoding process, without imposing an additional external
iterative loop between the DDCE and the three-stage turbo detector/decoder.
Hence, the computational complexity of our joint CE and three-stage turbo
detector/decoder remains similar to that of the three-stage turbo detection/decoding
scheme associated with the perfect CSI. Most significantly, the mean square error
(MSE) of our DD channel estimator approaches the Cramer-Rao lower bound
(CRLB) associated with the optimal-training-based CE, whereas the bit error rate
(BER) of our semi-blind scheme is capable of achieving the optimal maximum-
likelihood (ML) performance bound associated with the perfect CSI.
8. Channel estimation relying on the minimum bit-error-ratio criterion for
BPSK and QPSK signals
The authors consider the channel estimation problem in the context of a linear
equaliser designed for a frequency selective channel, which relies on the minimum
bit-error-ratio (MBER) optimisation framework. Previous literature has shown that
the MBER-based signal detection may outperform its minimum-mean-square-error
(MMSE) counterpart in the bit-error-ratio performance sense. In this study, they
develop a framework for channel estimation by first discretising the parameter
space and then posing it as a detection problem. Explicitly, the MBER cost
function (CF) is derived and its performance studied, when transmitting binary
phase shift keying (BPSK) and quadrature phase shift keying (QPSK) signals. It is
demonstrated that the MBER based CF aided scheme is capable of outperforming
existing MMSE and least-squares-based solutions.
9. Low-Complexity DFT-Based Channel Estimation with Leakage Nulling
for OFDM Systems
In this letter, a low-complexity but near-optimal DFT-based channel estimator with
leakage nulling is proposed for OFDM systems using virtual subcarriers. The
proposed estimator is composed of a time-domain (TD) index
set estimation considering the leakage effect followed by a low-complexity TD
post-processing to suppress the leakage. The performance and complexity of the
proposed channel estimator are analyzed and verified by computer simulation.
Simulation results show that the proposed estimator outperforms conventional
estimators and provides near-optimal performance while keeping the low
complexity comparable to the simple DFT-based channel estimator.
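For context, the simple DFT-based channel estimator used as the complexity reference above can be sketched as follows: take a per-subcarrier least-squares estimate, transform it to the delay domain, keep only the first taps, and transform back. This is a sketch of that conventional baseline only, assuming pilots on every subcarrier and a known channel length; it does not include the leakage nulling proposed in the letter, and all names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Plain O(n^2) DFT; the inverse transform carries the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def dft_channel_estimate(rx, pilots, n_taps):
    """Conventional DFT-based estimate of the channel frequency response."""
    h_ls = [y / p for y, p in zip(rx, pilots)]        # per-subcarrier LS estimate
    h_time = dft(h_ls, inverse=True)                  # move to the delay domain
    h_time = [h if i < n_taps else 0 for i, h in enumerate(h_time)]  # drop noise-only taps
    return dft(h_time)                                # back to the frequency domain
```

When virtual (unused) subcarriers are present, the inverse transform no longer concentrates the channel in the first taps, and energy leaks into the discarded region. That leakage is the effect the proposed estimator is designed to suppress.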
10. Improved Matching-Pursuit Implementation for LTE Channel
Estimation
An implementation of a reduced complexity matching pursuit channel estimator
for LTE is presented. The design contains an FFT/IFFT module with non-radix-2
units and a core estimator. The module is flexible enough to perform FFT and
IFFT at different resolutions needed, using the same hardware. Based on prior
work the needed internal word lengths are found. Internal shifts are employed to
maximize the use of available resources. The design is implemented in a 65 nm
low power process from STMicroelectronics. The total area of the implementation
is 1 mm², including input pads and extra control logic. The algorithmic
improvements reduce the complexity by up to 56% compared to prior art. At the
same time, the estimator shows great improvement in speed, allowing over 6 times the
number of estimations in the same time. Power consumption of the estimator is
simulated to be about 20 mW, running at 70 MHz.
ANALOG VLSI
1. Time-Based All-Digital Technique for Analog Built-in Self-Test
A scheme for built-in self-test of analog signals with minimal area overhead for measuring on-
chip voltages in an all-digital manner is presented. The method is well suited for a distributed
architecture, where the routing of analog signals over long paths is minimized. A clock is routed
serially to the sampling heads placed at the nodes of analog test voltages. This sampling head
present at each test node, which consists of a pair of delay cells and a pair of flip-flops, locally
converts the test voltage to a skew between a pair of subsampled signals, thus giving rise to as
many subsampled signal pairs as the number of nodes. To measure a certain analog voltage, the
corresponding subsampled signal pair is fed to a delay measurement unit to measure the skew
between this pair. The concept is validated by designing a test chip in a UMC 130-nm CMOS
process. Sub-millivolt accuracy for static signals is demonstrated for a measurement time of a
few seconds, and an effective number of bits of 5.29 is demonstrated for low-bandwidth signals
in the absence of sample-and-hold circuitry.
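To put the quoted effective number of bits in perspective, the sketch below applies the standard relation between ENOB and SINAD for a full-scale sine input, ENOB = (SINAD − 1.76 dB) / 6.02 dB. The abstract does not say how its ENOB figure was measured, so the use of this relation here is an assumption, and the function names are hypothetical.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits from SINAD in dB (full-scale sine assumed)."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_enob(enob):
    """SINAD in dB that corresponds to a given effective number of bits."""
    return 6.02 * enob + 1.76


equivalent_sinad_db = sinad_from_enob(5.29)
```

Under this assumption, an ENOB of 5.29 corresponds to a SINAD of about 33.6 dB.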
2. Speech Processing on a Reconfigurable Analog Platform
We describe architectures for audio classification front ends on a reconfigurable analog platform.
Real-time implementations of audio processing algorithms involving discrete-time signals tend to
be power-intensive. We present an alternate continuous-time system implementation of a noise-
suppression algorithm on our reconfigurable chip, while detailing the design considerations. We
also describe a framework that enables future implementations of other speech processing
algorithms, classifier front ends, and hearing aids.
3. Analysis and Design of a Low-Voltage Low-Power Double-Tail Comparator
The need for ultra low-power, area efficient, and high speed analog-to-digital converters is
pushing toward the use of dynamic regenerative comparators to maximize speed and power
efficiency. In this paper, an analysis of the delay of dynamic comparators is presented
and analytical expressions are derived. From the analytical expressions, designers can obtain an
intuition about the main contributors to the comparator delay and fully explore the tradeoffs in
dynamic comparator design. Based on the presented analysis, a new dynamic comparator is
proposed, where the circuit of a conventional double-tail comparator is modified for low-power
and fast operation even at small supply voltages. Without complicating the design and by adding
a few transistors, the positive feedback during the regeneration is strengthened, which results in
remarkably reduced delay time. Post-layout simulation results in a 0.18-μm CMOS technology
confirm the analysis results. It is shown that in the proposed dynamic comparator both the power
consumption and delay time are significantly reduced. The maximum clock frequency of the
proposed comparator can be increased to 2.5 and 1.1 GHz at supply voltages of 1.2 and 0.6 V,
while consuming 1.4 mW and 153 μW, respectively. The standard deviation of the input-referred
offset is 7.8 mV at 1.2 V supply.
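The two operating points quoted above can be compared on an energy-per-comparison basis by dividing power by clock frequency. The sketch assumes that each power figure was obtained at the corresponding maximum clock frequency and that one comparison is made per clock cycle; the abstract implies but does not state this, and the function name is hypothetical.

```python
def energy_per_cycle_pj(power_w, clock_hz):
    """Energy per clock cycle in picojoules, given power in W and clock in Hz."""
    return power_w / clock_hz * 1e12


e_at_1v2 = energy_per_cycle_pj(1.4e-3, 2.5e9)    # 1.2 V supply: 1.4 mW at 2.5 GHz
e_at_0v6 = energy_per_cycle_pj(153e-6, 1.1e9)    # 0.6 V supply: 153 uW at 1.1 GHz
```

Under these assumptions the comparator uses about 0.56 pJ per comparison at 1.2 V and about 0.14 pJ at 0.6 V, so halving the supply cuts the energy per comparison by roughly a factor of four.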
4. Accurate and Efficient On-Chip Spectral Analysis for Built-In Testing and
Calibration Approaches
The fast Fourier transform (FFT) algorithm is widely used as a standard tool to carry out spectral
analysis because of its computational efficiency. However, the presence of multiple tones
frequently requires a fine frequency resolution to achieve sufficient accuracy, which imposes the
use of a large number of FFT points that results in large area and power overheads. In this paper,
an FFT method is proposed for on-chip spectral analysis of multi-tone signals with particular
harmonic and intermodulation components. This accurate FFT analysis approach is based on
coherent sampling, but it requires a significantly smaller number of points to make the FFT
realization more suitable for on-chip built-in testing and calibration applications that require area
and power efficiency. The technique was assessed by comparing the simulation results from the
proposed method of single and multiple tones with the simulation results obtained from the FFT
of coherently sampled tones. The results indicate that the proper selection of test tone
frequencies can avoid spectral leakage even with multiple narrowly spaced tones. When low-
frequency signals are captured with an analog-to-digital converter (ADC) for on-chip analysis,
the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth
limitations. Post-layout simulations of a 16-point FFT showed that third-order intermodulation
(IM3) testing with two tones can be performed with 1.5-dB accuracy for IM3 levels of up to 50
dB below the fundamental tones that are quantized with a 10-bit resolution. In a 45-nm CMOS
technology, the layout area of the 16-point FFT for on-chip built-in testing is 0.073 mm², and its
estimated power consumption is 6.47 mW.
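The role of coherent sampling mentioned above can be seen in a small 16-point example: a tone that completes an integer number of cycles within the record lands in a single pair of bins, while a tone with a fractional cycle count leaks into every bin. This sketch shows only that textbook effect, using a plain DFT and hypothetical helper names; it is not the reduced-point method proposed in the paper.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitude of each bin of a plain O(n^2) DFT."""
    n = len(x)
    return [
        abs(sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)))
        for k in range(n)
    ]


def sampled_tone(cycles, n_points):
    """Unit-amplitude cosine with `cycles` periods across the record."""
    return [math.cos(2 * math.pi * cycles * m / n_points) for m in range(n_points)]


N = 16
coherent = dft_magnitudes(sampled_tone(3, N))     # integer cycles: no leakage
leaky = dft_magnitudes(sampled_tone(3.5, N))      # fractional cycles: leakage
```

With 3 cycles in 16 points, all the energy sits in bins 3 and 13. With 3.5 cycles, even the DC bin is nonzero, which would mask low-level intermodulation products.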
1. On-Chip Measurement of Rise/Fall Gate Delay Using Reconfigurable Ring
Oscillator
In this brief, a new technique to measure the on-chip rise/fall delay of an individual gate is
presented. In the proposed technique, the rise/fall gate delay is measured using the duty cycle of
a reconfigurable ring oscillator (RRO). A set of linear equations is formed with the different
configuration settings of the RRO, relating the rise/fall delay of all the gates in the path of the
RRO to the positive/negative duty cycle of the undivided RRO. The high-frequency undivided
RRO signal is needed for this type of measurement as it preserves the rise/fall delay of an
individual gate. However, it is difficult to bring the high-frequency undivided RRO signal
outside the chip due to the frequency limitation of the output pad. The high-frequency RRO
signal is subsampled by a clock that is generated from an on-chip phase-locked loop to make it
low frequency. The rise and fall delays of an individual gate can be calculated from the
difference of the duty cycle of the subsampled RRO signal at two different configurations of the
RRO. The proposed concept is validated in a test chip that is fabricated in an industrial 65-nm
technology node.
2. Smart: Single-Cycle Multihop Traversals over a Shared Network on Chip
As the number of on-chip cores increases, scalable on-chip topologies such as meshes inevitably
add multiple hops to each network traversal. The best practice today is to design one-cycle
routers, such that the low-load network latency between a source and destination is equal to the
number of routers and links (that is, twice the hops) between them. Designers of operating
systems, compilers, and cache coherence protocols often try to limit communication to within a
few hops because on-chip latency is critical for their scalability. In this article, the authors
propose an on-chip network called Smart (Single-cycle Multihop Asynchronous Repeated
Traversal) that aims to present a single-cycle datapath all the way from the source to the
destination. They do not add any additional fast physical express links in the datapath; instead,
they drive the shared crossbars and links asynchronously up to multiple hops within a single
cycle. They designed a router and link microarchitecture to achieve such a traversal, and a flow-
control technique to arbitrate and set up multihop paths within a cycle. A place-and-route design
at 45 nm achieves 11 hops within a 1-GHz cycle for paths without turns (9 hops for paths with
turns). The authors observe 5 to 8 times reduction in low-load latencies across synthetic traffic
patterns on an 8×8 chip multiprocessor, compared to a baseline one-cycle router
network. Full-system simulations with Splash-2 and Parsec benchmarks demonstrate 27 and 52
percent reduction in runtime for private and shared level-2 designs, respectively.
3. Accurate and Efficient On-Chip Spectral Analysis for Built-In Testing and
Calibration Approaches
The fast Fourier transform (FFT) algorithm is widely used as a standard tool to carry out spectral
analysis because of its computational efficiency. However, the presence of multiple tones
frequently requires a fine frequency resolution to achieve sufficient accuracy, which imposes the
use of a large number of FFT points that results in large area and power overheads. In this paper,
an FFT method is proposed for on-chip spectral analysis of multi-tone signals with particular
harmonic and intermodulation components. This accurate FFT analysis approach is based on
coherent sampling, but it requires a significantly smaller number of points to make the FFT
realization more suitable for on-chip built-in testing and calibration applications that require area
and power efficiency. The technique was assessed by comparing the simulation results from the
proposed method of single and multiple tones with the simulation results obtained from the FFT
of coherently sampled tones. The results indicate that the proper selection of test tone
frequencies can avoid spectral leakage even with multiple narrowly spaced tones. When low-
frequency signals are captured with an analog-to-digital converter (ADC) for on-chip analysis,
the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth
limitations. Post-layout simulations of a 16-point FFT showed that third-order intermodulation
(IM3) testing with two tones can be performed with 1.5-dB accuracy for IM3 levels of up to 50
dB below the fundamental tones that are quantized with a 10-bit resolution. In a 45-nm CMOS
technology, the layout area of the 16-point FFT for on-chip built-in testing is 0.073 mm², and its
estimated power consumption is 6.47 mW.
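The effect of coherent sampling on a short FFT can be illustrated with a minimal sketch. It is not the method proposed in the paper: it uses a direct DFT in pure Python, and the helper names (`dft_magnitudes`, `tone`, `leakage_fraction`, `im3_bins`) and the chosen bins are assumptions made for illustration. A tone that completes an integer number of cycles in the 16-sample window lands in a single bin, while a tone with 3.5 cycles leaks into neighbouring bins.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| / N for a direct (non-fast) DFT of the samples."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))) / n
        for k in range(n)
    ]


def tone(cycles, n):
    """n samples of a cosine that completes `cycles` periods in the window."""
    return [math.cos(2 * math.pi * cycles * i / n) for i in range(n)]


def leakage_fraction(mags, tone_bin):
    """Share of spectral energy outside the tone bin and its mirror image."""
    n = len(mags)
    total = sum(m * m for m in mags)
    wanted = mags[tone_bin] ** 2 + mags[n - tone_bin] ** 2
    return (total - wanted) / total


def im3_bins(k1, k2):
    """Bins where the third-order intermodulation products of two tones fall."""
    return (2 * k1 - k2, 2 * k2 - k1)


N = 16  # illustrative FFT length
coherent_leak = leakage_fraction(dft_magnitudes(tone(3, N)), 3)
noncoherent_leak = leakage_fraction(dft_magnitudes(tone(3.5, N)), 3)
print(coherent_leak, noncoherent_leak)
print(im3_bins(3, 4))  # IM3 products of tones in bins 3 and 4
```

With tones placed in bins 3 and 4, the IM3 products fall in bins 2 and 5, so they do not overlap the fundamentals in a 16-point spectrum.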
4. Methodology for Adapting On-Chip Interconnect Architectures
Network-on-chip (NoC) has been proposed to solve the scalability problem experienced in bus-
based system-on-chip. The main challenge is the ability to predict the quality of service that the
network infrastructure provides while meeting other system constraints, namely power and area.
Although these architectures are regular with predictable electrical parameters, they may suffer
from higher latency and lower throughput. To tackle this issue, the network structure needs to be
adaptable in response to the needs of the application. This paper presents a methodology for
augmenting an NoC with a programmable infrastructure that allows application-specific
adaptation. Based on the developed infrastructure, an algorithm is also presented for static
adaptation based on application traffic patterns. To evaluate the proposed methodology of the
adaptable NoC, the WK-recursive on-chip interconnect is used as a case study. Simulations are
conducted and reported results demonstrate the usefulness of the proposed approach.
5. Energy Efficiency Optimization Through Codesign of the Transmitter and Receiver
in High-Speed On-Chip Interconnects
A novel equalized global link architecture and driver-receiver codesign flow are proposed for
high-speed and low-energy on-chip communication by utilizing a continuous-time linear
equalizer (CTLE). The proposed global link is analyzed using a linear system method, and the
formula of CTLE eye opening is derived to provide high-level design guidelines and insights.
Compared with the separate driver-receiver design flow, over 50% energy reduction is observed.
The final optimal solution achieves 20-Gb/s signaling over a 10-mm, 2.6-μm-pitch on-chip
transmission line with 15.5-ps/mm latency and 0.196-pJ/b energy using 45-nm technology.
Monte Carlo simulation also shows that 3σ/μ for power and delay variation in the proposed
global link are 13.1% and 4.6%, respectively.
6. Fault-Tolerant Network Interfaces for Networks-on-Chip
As the complexity of designs increases and technology scales down into the deep-submicron
domain, the probability of malfunctions and failures in the networks-on-chip (NoCs) components
increases. In this work, we focus on the study and evaluation of techniques for increasing
reliability and resilience of network interfaces (NIs) within NoC-based multiprocessor system-
on-chip architectures. NIs act as interfaces between intellectual property cores and the
communication infrastructure; the faulty behavior of one of them could affect, therefore, the
overall system. In this work, we propose a functional fault model for the NI components by
evaluating their susceptibility to faults. We present a two-level fault-tolerant solution that can be
employed for mitigating the effects of both permanent and temporary faults in the NI.
Experimental simulations show that with a limited overhead, we can obtain an NI reliability
comparable to the one obtainable by implementing the system by using standard triple modular
redundancy techniques, while saving up to 48 percent in area, as well as obtaining a significant
energy reduction.
7. On Deadlock Problem of On-Chip Buses Supporting Out-of-Order Transactions
Modern on-chip communication protocols such as Advanced eXtensible Interface (AXI) and Open Core
Protocol (OCP) support advanced transactions to improve communication efficiency. Out-of-order
transactions that allow responses to be returned in an order different from their request order play
an important role in this improvement. However, a deadlock situation may occur if these
transactions are not handled properly. In this paper, we address the deadlock problem in an
on-chip bus system supporting out-of-order transactions. We present a graphic model that can
well represent the status of a bus system and show that a cycle exists in the graph if and only if
the bus system is in an unsafe state that may lead to a bus deadlock. Based on this model, we
propose a novel bus design technique that can efficiently resolve the bus deadlock problem.
Experimental results show that buses with the proposed technique can be up to 3.3 times faster
than those with the currently available techniques.
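The cycle condition mentioned above can be checked with a standard depth-first search. The following is a minimal sketch only: the abstract does not define the graph, so the node names and the meaning of an edge ("waits for") are assumptions, and the paper's actual bus design technique is not reproduced.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> successors) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Hypothetical "waits-for" relations between masters (M*) and slaves (S*).
safe = {"M0": ["S0"], "S0": ["M1"], "M1": []}
unsafe = {"M0": ["S0"], "S0": ["M1"], "M1": ["M0"]}
print(has_cycle(safe), has_cycle(unsafe))
```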
8. Built-In Binary Code Inversion Technique for On-Chip Flash Memory Sense
Amplifier With Reduced Read Current Consumption
The bit-line sense amplifier (S/A) for on-chip flash memory compares cell current with reference
current to identify data that are programmed. The S/A for 0 (erased) cell data consumes a large
sink current, which is greater than off-current for 1 (programmed) cell data. This brief proposes a
built-in write/read path based on binary inversion methods to reduce the sensing current of S/A.
An original binary code is programmed into flash memory with an inverted binary code based on
the proposed bit inversion techniques. The de-inversion hardware, which is implemented with
small logic gates to restore original binary data, only consumes logic current instead of analog
sink current in the S/A. The proposed techniques are evaluated for the DSPStone benchmark and
are applied to the modified S/A for ARM Cortex-M3-based microcontroller with 128-kB on-
chip flash memory based on a 0.18-μm EEPROM technology. The circuit-level simulation result
for the DSPStone benchmark shows that a newly implemented chip with the S/A based on the
proposed technique consumes approximately less than 22% of the operating power that
conventional S/A uses.
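The general idea of code inversion can be sketched as follows, assuming a simple per-word rule: a word with more 0 bits than 1 bits is stored inverted together with a one-bit flag, and the read path restores it with an XOR. The word width, the flag, and the function names are assumptions for illustration. The paper's own inversion methods may differ, and the cost of storing the flag is ignored here.

```python
def count_zeros(word, width):
    """Number of 0 bits in the low `width` bits of word."""
    return width - bin(word & ((1 << width) - 1)).count("1")


def encode(word, width=8):
    """Return (stored_word, flag); flag is 1 when the word was inverted."""
    mask = (1 << width) - 1
    if count_zeros(word, width) > width // 2:
        return (~word) & mask, 1
    return word & mask, 0


def decode(stored, flag, width=8):
    """De-inversion: XOR with all ones when the flag is set."""
    mask = (1 << width) - 1
    return stored ^ (mask if flag else 0)


stored, flag = encode(0b00000110)
print(bin(stored), flag, bin(decode(stored, flag)))
```

Under this rule no stored word has more than half of its bits at 0, which is the property that limits the number of high-current reads.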
9. Data Encoding Techniques for Reducing Energy Consumption in Network-on-Chip
As technology shrinks, the power dissipated by the links of a network-on-chip (NoC) starts to
compete with the power dissipated by the other elements of the communication subsystem,
namely, the routers and the network interfaces (NIs). In this paper, we present a set of data
encoding schemes aimed at reducing the power dissipated by the links of an NoC. The proposed
schemes are general and transparent with respect to the underlying NoC fabric (i.e., their
application does not require any modification of the routers and link architecture). Experiments
carried out on both synthetic and real traffic scenarios show the effectiveness of the proposed
schemes, which save up to 51% of power dissipation and 14% of energy consumption
without any significant performance degradation and with less than 15% area overhead in the NI.
10. Path-Congestion-Aware Adaptive Routing With a Contention Prediction Scheme
for Network-on-Chip Systems
Network-on-chip systems can achieve higher performance than bus systems
for chip multiprocessor systems. However, as the complexity of the network increases, the
channel and switch congestion problems become major performance bottlenecks. An effective
adaptive routing algorithm can help minimize path congestion through load balancing. However,
conventional adaptive routing schemes only use channel-based information to detect the
congestion status. Without switch-based information, channel-based information alone can hardly
reveal the real congestion status along the routing path. Therefore, in this paper, we
remodel the path congestion information to show hidden spatial congestion information and
improve the effectiveness of routing path selection. We propose a path-congestion-aware
adaptive routing (PCAR) scheme based on the following techniques: 1) a path-congestion-aware
selection strategy that simultaneously considers switch congestion and channel congestion, and
2) a contention prediction technique that uses the rate of change in the buffer level to predict
possible switch contention. The experimental results show that the proposed PCAR scheme can
achieve a high saturation throughput with an improvement of 15.4%-48.7% compared to existing
routing schemes. The proposed PCAR method also includes a VLSI architecture, which has
higher area efficiency with an improvement of 16%-35.7% compared with the other router
designs.
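A minimal sketch of the two ideas named above, under stated assumptions: the next buffer level is extrapolated linearly from its rate of change, and the output port with the lowest sum of channel congestion and predicted switch contention is selected. The cost function, the buffer depth, and the port names are hypothetical and are not taken from the PCAR design.

```python
def predict_buffer_level(previous, current, depth):
    """One-step linear extrapolation of the buffer level, clamped to [0, depth]."""
    rate = current - previous
    return max(0, min(depth, current + rate))


def select_port(candidates, depth=8):
    """candidates maps port -> (channel_congestion, previous_level, current_level)."""
    def cost(item):
        _, (channel, previous, current) = item
        return channel + predict_buffer_level(previous, current, depth)

    return min(candidates.items(), key=cost)[0]


# Both ports look equally loaded now (2 + 4 and 3 + 3), but the east buffer is
# filling while the north buffer is draining.
ports = {"east": (2, 1, 4), "north": (3, 4, 3)}
print(select_port(ports))
```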
11. On-Chip Codeword Generation to Cope With Crosstalk
Capacitive and inductive coupling between bus lines results in crosstalk-induced delays. Many
bus encoding techniques have been proposed to improve the performance. Existing
implementation techniques and mapping algorithms in the literature only apply the specific
encoding. This paper presents the first generalized framework for a stall-free on-chip codeword
generation strategy that is scalable and easy to automate. It is applicable to the coupling-aware
encoding techniques that allow recursive codeword generation. The proposed implementation
strategy iteratively generates codewords without explicitly enumerating them. Codeword
mapping relies on graph-based representation that is unique to the given encoding technique. The
codewords are calculated on-chip using basic function blocks, such as adders and multiplexers.
Three encoding techniques were implemented using the proposed strategy. Experimental results
show significant reduction in the area overhead and power dissipation over the existing method
that uses random logic to implement the codec.
12. Low-Overhead Network-on-Chip Support for Location-Oblivious Task Placement
Many-core processors will have many processing cores with a network-on-chip (NoC) that
provides access to shared resources such as main memory and on-chip caches. However, locally-
fair arbitration in multi-stage NoC can lead to globally unfair access to shared resources and
impact system-level performance depending on where each task is physically placed. In this
work, we propose an arbitration to provide equality-of-service (EoS) in the network and provide
support for location-oblivious task placement. We propose using probabilistic arbitration
combined with distance-based weights to achieve EoS and overcome the limitation of the round-
robin arbiter. However, the complexity of probabilistic arbitration results in high area and long
latency, which negatively impacts performance. In order to reduce the hardware complexity, we
propose a hybrid arbiter that switches between a simple arbiter at low load and a complex
arbiter at high load. The hybrid arbiter is enabled by the observation that arbitration only impacts
the overall performance and global fairness at a high load. We evaluate our arbitration scheme
with synthetic traffic patterns and GPGPU benchmarks. Our results show that a hybrid arbiter that
combines a round-robin arbiter with probabilistic distance-based arbitration reduces performance
variation as task placement is varied and also improves average IPC.
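The hybrid arbitration idea can be sketched in software as follows. This is an illustration under assumptions, not the hardware design: the load threshold, the use of hop distance as the weight, and all function and requester names are hypothetical.

```python
import random


def probabilistic_arbiter(requests, rng):
    """requests maps requester -> hop distance; win probability is proportional to it."""
    names = list(requests)
    weights = [requests[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def round_robin_arbiter(requests, last_winner, order):
    """Grant the first requester after last_winner in the fixed order."""
    start = (order.index(last_winner) + 1) % len(order)
    for offset in range(len(order)):
        candidate = order[(start + offset) % len(order)]
        if candidate in requests:
            return candidate
    return None


def hybrid_arbiter(requests, last_winner, order, load, rng, threshold=0.5):
    """Simple arbiter at low load, distance-weighted arbiter at high load."""
    if load < threshold:
        return round_robin_arbiter(requests, last_winner, order)
    return probabilistic_arbiter(requests, rng)


rng = random.Random(0)
requests = {"near": 1, "far": 3}  # hop counts, assumed values
wins = sum(hybrid_arbiter(requests, "near", ["near", "far"], 0.9, rng) == "far"
           for _ in range(10000))
print(wins / 10000)  # share of grants given to the distant requester
```

With weights 1 and 3, the distant requester is expected to win roughly three quarters of the grants at high load, which compensates for the arbitration stages it has already passed through.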
13. DPPC: Dynamic Power Partitioning and Control for Improved Chip
Multiprocessor Performance
A key challenge in chip multiprocessor (CMP) design is to optimize the performance within a
power budget limited by the CMP’s cooling, packaging, and power supply capacities. Most
existing solutions rely solely on dynamic voltage and frequency scaling (DVFS) to adapt the
power consumption of CPU cores, without coordinating with the last-level on-chip (e.g., L2)
cache. This paper proposes DPPC, a chip-level power partitioning and control strategy that can
dynamically and explicitly partition the chip-level power budget among different CPU cores and
the shared last-level cache in a CMP based on the workload characteristics measured online.
DPPC features a novel performance-power model and an online model estimator to
quantitatively estimate the performance contributed by each core and the cache with their
respective local power budgets. DPPC then re-partitions the chip-level power budget among
them for optimized CMP performance. The partitioned local power budgets for the CPU cores
and cache are precisely enforced by power control algorithms designed rigorously based on
feedback control theory. Our extensive experimental results demonstrate that DPPC achieves
better CMP performance, within a given power budget, than several state-of-the-art power
control solutions for both SPEC CPU2006 benchmarks and multi-threaded SPLASH-2
workloads.
14. Crosstalk-Aware Multiple Error Detection Scheme Based on Two-Dimensional
Parities for Energy Efficient Network on Chip
Achieving reliable operation under the influence of deep-submicrometer noise sources including
crosstalk noise at low voltage operation is a major challenge for network on chip links. In this
paper, we propose a coding scheme that simultaneously addresses crosstalk effects on signal
delay and detects up to seven random errors through wire duplication and simple parity checks
calculated over the rows and columns of the two-dimensional data. This high error detection
capability enables the reduction of operating voltage on the wire leading to energy saving. The
results show that the proposed scheme reduces the energy consumption by up to 53% compared
with other schemes at iso-reliability performance, despite the increase in the number of overhead
wires. In addition, it has a small penalty on network performance, represented by the average
latency, and a codec area overhead comparable to that of other schemes.
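Row and column parity over a two-dimensional block can be sketched as below. The 4x4 block size and the function names are assumptions, and the wire duplication used by the proposed scheme is omitted, so this sketch does not reach the seven-error detection capability stated above.

```python
def parities(block):
    """Return (row_parities, column_parities) for a list of equal-length bit rows."""
    rows = [sum(row) % 2 for row in block]
    cols = [sum(col) % 2 for col in zip(*block)]
    return rows, cols


def check(block, rows, cols):
    """True when the received block matches the transmitted parities."""
    return parities(block) == (rows, cols)


def flip(block, positions):
    """Return a copy of the block with the bits at (row, col) positions inverted."""
    copy = [list(row) for row in block]
    for r, c in positions:
        copy[r][c] ^= 1
    return copy


data = [[1, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 1]]
rows, cols = parities(data)
print(check(data, rows, cols))                    # error-free block
print(check(flip(data, [(1, 2)]), rows, cols))    # single error
print(check(flip(data, [(0, 0), (0, 3), (2, 0), (2, 3)]), rows, cols))
```

Plain two-dimensional parity detects the single error but misses the four flips on the corners of a rectangle, which is one reason a scheme would combine it with a further mechanism such as duplication.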