distributed signal processing algorithms for wireless networksdelamare.cetuc.puc-rio.br/songcen xu...

Distributed Signal Processing Algorithms forWireless Networks

This thesis is submitted in partial fulfilment of the requirements for

Doctor of Philosophy (Ph.D.)

Songcen Xu

Communications and Signal Processing Research Group

Electronics

University of York

May 2015

Abstract

Distributed signal processing algorithms have become a key approach for statistical in-

ference in wireless networks and applications such as wireless sensor networks and smart

grids. It is well known that distributed processing techniques deal with the extraction of

information from data collected at nodes that are distributed over a geographic area. In

this context, for each specific node, a set of neighbor nodes collect their local information

and transmit the estimates to a specific node. Then, each specific node combines the col-

lected information together with its local estimate to generate an improved estimate. In

this thesis, novel distributed cooperative algorithms for inference in ad hoc, wireless sen-

sor networks and smart grids are investigated. Low-complexity and effective algorithms

to perform statistical inference in a distributed way are devised. A number of innovative

approaches for dealing with node failures, compression of data and exchange of informa-

tion are proposed.

Firstly, distributed adaptive algorithms based on the conjugate gradient (CG) method

for distributed networks are presented. Both incremental and diffusion adaptive solutions

are considered. The distributed conventional CG (CCG) and modified CG (MCG) algo-

rithms provide an improved performance in terms of mean square error (MSE) as com-

pared with least–mean–square (LMS)–based algorithms and a performance that is close

to recursive least–squares (RLS) algorithms. The resulting algorithms are distributed, co-

operative and able to respond in real time to changes in the environment. Applications to

parameter and spectrum estimation illustrate the performance of the proposed algorithms.

Secondly, adaptive link selection algorithms for distributed estimation and their appli-

cation to wireless sensor networks and smart grids are proposed. In particular, exhaustive

search-based LMS and RLS link selection algorithms and sparsity-inspired LMS/RLS

link selection algorithms that can exploit the topology of networks with poor-quality links

are considered. The proposed link selection algorithms are then analyzed in terms of their

stability, steady-state and tracking performance, and computational complexity. In com-

parison with existing centralized or distributed estimation strategies, key features of the

proposed algorithms are: 1) more accurate estimates and faster convergence speed can

be obtained; and 2) the network is equipped with the ability of link selection that can

circumvent link failures and improve the estimation performance. The performance of

the proposed algorithms for distributed estimation is illustrated via simulations in appli-

cations of parameter estimation with wireless sensor networks and smart grids.

Thirdly, a novel distributed compressed estimation scheme is introduced for sparse

signals and systems based on compressive sensing techniques. The proposed scheme

consists of compression and decompression modules inspired by compressive sensing to

perform distributed compressed estimation. A design procedure is also presented and

an algorithm is developed to optimize measurement matrices, which can further improve

the performance of the proposed distributed compressed estimation scheme. Simulations

for parameter estimation with a wireless sensor network illustrate the advantages of the

proposed scheme and algorithm in terms of convergence rate and mean square error per-

formance.

Lastly, a novel distributed reduced-rank scheme and adaptive algorithms are proposed

for distributed estimation in wireless sensor networks and smart grids. The proposed

distributed scheme is based on a transformation that performs dimensionality reduction at

each agent of the network followed by a reduced–dimension parameter vector. Distributed

reduced–rank joint iterative estimation algorithms are developed, which have the ability to

achieve significantly reduced communication overhead and improved performance when

compared with existing techniques. Simulation results illustrate the advantages of the

proposed strategy in terms of convergence rate and mean square error performance.

Contents

List of Figures viii

List of Tables xii

Acknowledgements xiv

Declaration xv

Glossary xvi

1 Introduction 1

1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.5 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.6 Publication List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

S. Xu, Ph.D. Thesis, Department of Electronics, University of York

i

2015

2 Literature Review 9

2.1 Distributed Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . 9

2.1.1 Distributed Wireless Networks . . . . . . . . . . . . . . . . . . . 10

2.1.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1.2.1 Distributed Estimation . . . . . . . . . . . . . . . . . . 11

2.1.2.2 Spectrum Estimation . . . . . . . . . . . . . . . . . . 12

2.1.2.3 Smart Grids . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Protocols for Cooperation and Exchange of Information . . . . . . . . . . 16

2.2.1 Incremental Strategy . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2.2 Diffusion Strategy . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2.3 Consensus Strategy . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.3 Adaptive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.3.1 The Least Mean Square (LMS) Algorithm . . . . . . . . . . . . . 22

2.3.2 The Recursive Least Squares (RLS) Algorithm . . . . . . . . . . 23

2.3.3 The Conjugate Gradient (CG) Algorithm . . . . . . . . . . . . . 25

2.4 Compressive Sensing Techniques . . . . . . . . . . . . . . . . . . . . . . 26

2.4.1 Goal of Compressive Sensing . . . . . . . . . . . . . . . . . . . 27

2.4.2 Measurement Matrix . . . . . . . . . . . . . . . . . . . . . . . . 28

2.4.3 Reconstruction Algorithms . . . . . . . . . . . . . . . . . . . . . 29

2.5 Sparsity–Aware Techniques . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.5.1 The Zero–Attracting Strategy . . . . . . . . . . . . . . . . . . . 32

2.5.2 The Reweighted Zero–Attracting Strategy . . . . . . . . . . . . . 33

2.5.3 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . 33

3 Distributed Conjugate Gradient Strategies for Distributed Estimation Over

Sensor Networks 37

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.2 System Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.2.1 Distributed Parameter Estimation . . . . . . . . . . . . . . . . . 40

3.2.2 Distributed Spectrum Estimation . . . . . . . . . . . . . . . . . . 41

3.3 Proposed Incremental Distributed CG–Based Algorithms . . . . . . . . . 42

3.3.1 Incremental Distributed CG–Based Solutions . . . . . . . . . . . 42

3.3.1.1 Proposed IDCCG Algorithm . . . . . . . . . . . . . . 43

3.3.1.2 Proposed IDMCG Algorithm . . . . . . . . . . . . . . 45

3.3.2 Computational Complexity . . . . . . . . . . . . . . . . . . . . . 46

3.4 Proposed Diffusion Distributed CG–Based Algorithms . . . . . . . . . . 47

3.4.1 Diffusion Distributed CG–Based Algorithms . . . . . . . . . . . 47

3.4.1.1 Proposed DDCCG Algorithm . . . . . . . . . . . . . . 48

3.4.1.2 Proposed DDMCG Algorithm . . . . . . . . . . . . . . 48


3.5 Preconditioner Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.6 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.6.1 Distributed Estimation in Wireless Sensor Networks . . . . . . . 53

3.6.1.1 Performance of Proposed Incremental Distributed CG–

Based Algorithms . . . . . . . . . . . . . . . . . . . . 53

3.6.1.2 Performance of Proposed Diffusion Distributed CG–

Based Algorithms . . . . . . . . . . . . . . . . . . . . 55

3.6.2 Distributed Spectrum Estimation . . . . . . . . . . . . . . . . . . 55

3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4 Adaptive Link Selection Algorithms for Distributed Diffusion Estimation 60

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.1.1 Prior and Related Work . . . . . . . . . . . . . . . . . . . . . . . 61

4.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.2 System Model and Problem Statement . . . . . . . . . . . . . . . . . . . 63

4.3 Proposed Adaptive Link Selection Algorithms . . . . . . . . . . . . . . . 65

4.3.1 Exhaustive Search–Based LMS/RLS Link Selection . . . . . . . 66

4.3.2 Sparsity–Inspired LMS/RLS Link Selection . . . . . . . . . . . 68

4.4 Analysis of the proposed algorithms . . . . . . . . . . . . . . . . . . . . 73

4.4.1 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.4.2 MSE Steady–State Analysis . . . . . . . . . . . . . . . . . . . . 78

4.4.2.1 ES–LMS . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.4.2.2 SI–LMS . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.4.2.3 ES–RLS . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.4.2.4 SI–RLS . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.4.3 Tracking Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.4.3.1 ES–LMS . . . . . . . . . . . . . . . . . . . . . . . . . 89

4.4.3.2 SI–LMS . . . . . . . . . . . . . . . . . . . . . . . . . 90

4.4.3.3 ES–RLS . . . . . . . . . . . . . . . . . . . . . . . . . 90

4.4.3.4 SI–RLS . . . . . . . . . . . . . . . . . . . . . . . . . 90


4.5 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4.5.1 Diffusion Wireless Sensor Networks . . . . . . . . . . . . . . . . 92

4.5.2 MSE Analytical Results . . . . . . . . . . . . . . . . . . . . . . 96

4.5.3 Smart Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5 Distributed Compressed Estimation Based on Compressive Sensing 102

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.2 System Model and Problem Statement . . . . . . . . . . . . . . . . . . . 104

5.3 Proposed Distributed Compressed Estimation Scheme . . . . . . . . . . . 105

5.4 Measurement Matrix Optimization . . . . . . . . . . . . . . . . . . . . . 109

5.5 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6 Distributed Reduced–Rank Estimation Based on Joint Iterative Optimization

in Sensor Networks 116

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

6.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

6.3 Distributed Dimensionality Reduction and Adaptive Processing . . . . . . 120

6.4 Proposed Distributed Reduced-Rank Algorithms . . . . . . . . . . . . . 121

6.4.1 Proposed DRJIO–NLMS algorithm . . . . . . . . . . . . . . . . 122

6.4.2 Proposed DRJIO–RLS algorithm . . . . . . . . . . . . . . . . . 124

6.4.3 Computational Complexity Analysis . . . . . . . . . . . . . . . . 126

6.4.4 Sufficient Conditions for Convergence . . . . . . . . . . . . . . . 129

6.4.5 Convergence to the Optimal Reduced-Rank Estimator . . . . . . 131

6.5 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6.5.1 Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . 134

6.5.2 Smart Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

7 Conclusions and Future Work 142

7.1 Summary of the Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

List of Figures

1.1 Distributed wireless network sample . . . . . . . . . . . . . . . . . . . . . . . 2

2.1 Network topology with N nodes . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 IEEE 14–bus system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Incremental Distributed Estimation . . . . . . . . . . . . . . . . . . . . . . . . 17

2.4 Diffusion Distributed Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.5 Consensus Distributed Estimation . . . . . . . . . . . . . . . . . . . . . . . . 21

2.6 Adaptive Filter Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.7 Searching Direction of the CG Algorithm . . . . . . . . . . . . . . . . . . . . . 26

2.8 System identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.9 MSD comparison for scenario 1 . . . . . . . . . . . . . . . . . . . . . . . . . 35



3.1 Incremental distributed CG–based network processing . . . . . . . . . . . . . . . 43


viii

2015

3.2 Diffusion Distributed CG–Based Network Processing . . . . . . . . . . . . . . . . 48

3.3 MSD performance comparison for the incremental distributed strategies . . . . . . . 54

3.4 MSE performance comparison for the incremental distributed strategies . . . . . . . . 54

3.5 Network structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3.6 MSD performance comparison for the diffusion distributed strategies . . . . . . . . . 56

3.7 MSE performance comparison for the diffusion distributed strategies . . . . . . . . . 57

3.8 Performance comparison for the distributed spectrum estimation . . . . . . . . . . . 58

3.9 Example of distributed spectrum estimation . . . . . . . . . . . . . . . . . . . . 58

4.1 Network topology with N nodes . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.2 Sparsity aware signal processing strategies . . . . . . . . . . . . . . . . . . . . 70

4.3 Covariance matricesA3 for different sets of Ω3 . . . . . . . . . . . . . . . . . . 81

4.4 Covariance matrixA3 upon convergence . . . . . . . . . . . . . . . . . . . . . 84

4.5 Diffusion wireless sensor networks topology with 20 nodes . . . . . . . . . . . . . 92

4.6 Network MSE curves in a static scenario . . . . . . . . . . . . . . . . . . . . . 94

4.7 Network MSE curves in a time-varying scenario . . . . . . . . . . . . . . . . . . 94

4.8 Set of selected links for node 16 with ES–LMS in a static scenario . . . . . . . . . . 95

4.9 Link selection state for node 16 with ES–LMS in a time-varying scenario . . . . . . . 95

4.10 MSE performance against SNR for ES–LMS and SI–LMS . . . . . . . . . . . . . 96

4.11 MSE performance against SNR for ES–RLS and SI–RLS . . . . . . . . . . . . . . 97

4.12 MSE performance against SNR for ES–LMS and SI–LMS in a time-varying scenario . . 98

4.13 MSE performance against SNR for ES–RLS and SI–RLS in a time-varying scenario . . 98

4.14 IEEE 14–bus system for simulation . . . . . . . . . . . . . . . . . . . . . . . . 100

4.15 MSE performance curves for smart grids. . . . . . . . . . . . . . . . . . . . . . 100

5.1 Proposed Compressive Sensing Modules . . . . . . . . . . . . . . . . . . . . . 106

5.2 Proposed DCE Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

5.3 Proposed Compressive Sensing Module with Measurement Matrix Optimization . . . . 111

5.4 Diffusion wireless sensor network with 20 Nodes . . . . . . . . . . . . . . . . . 112

5.5 MSE performance against time . . . . . . . . . . . . . . . . . . . . . . . . . . 113

5.6 MSE performance against time with measurement matrix optimization . . . . . . . . 113

5.7 MSE performance against reduced dimension D for different levels of resolution in bits

per coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

6.1 Network topology with 20 nodes . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.2 Proposed dimensionality reduction scheme at each node or agent. . . . . . . . . . . 120

6.3 Complexity in terms of multiplications . . . . . . . . . . . . . . . . . . . . . . 128

6.4 Full–rank system with M=20 . . . . . . . . . . . . . . . . . . . . . . . . . . 135

6.5 Sparse system with M=20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.6 Full–rank system with M=60 . . . . . . . . . . . . . . . . . . . . . . . . . . 137

6.7 DRJIO–NLMS vs DCE scheme with sparsity level S=3 . . . . . . . . . . . . . . . 138

6.8 DRJIO–NLMS vs DCE scheme with sparsity level S=10 . . . . . . . . . . . . . . 138

6.9 Hierarchical IEEE 14–bus system . . . . . . . . . . . . . . . . . . . . . . . . 140

6.10 MSE performance for smart grids . . . . . . . . . . . . . . . . . . . . . . . . 141

List of Tables

2.1 Comparison of Main Steps for ATC and CTA strategies . . . . . . . . . . 19

2.2 Main Steps for CG algorithm . . . . . . . . . . . . . . . . . . . . . . . . 26

3.1 IDCCG Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.2 IDMCG Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.3 Computational Complexity of Different Incremental Algorithms . . . . . 47

3.4 DDCCG Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.5 DDMCG Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.6 Computational Complexity Of Different Diffusion Algorithms . . . . . . 51

4.1 The ES-LMS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.2 The ES-RLS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.3 The SI-LMS and SI-RLS Algorithms . . . . . . . . . . . . . . . . . . . . 74

4.4 Coefficients αkl for different sets of Ω3 . . . . . . . . . . . . . . . . . . . 82

4.5 Computational complexity for the adaptation step per node per time instant 91


xii

2015

4.6 Computational complexity for combination step per node per time instant 91

4.7 Computational complexity per node per time instant . . . . . . . . . . . . 91

5.1 The Proposed DCE Scheme . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.1 The DRJIO–NLMS Algorithm . . . . . . . . . . . . . . . . . . . . . . . 124

6.2 The DRJIO-RLS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 127

6.3 Computational Complexity of Different Algorithms . . . . . . . . . . . . 128

Acknowledgements

I would like to show my sincere gratitude to my supervisor, Prof. Rodrigo C. de Lamare,

for his support and encouragement during my PhD study.

I am very grateful to my thesis advisor, Dr. Yuriy Zakharov, whose insightful discus-

sions and suggestions have benefited me.

Special thanks are also extended to Prof. Vincent Poor from Princeton University for

the collaboration which resulted in several papers.

I would also like to thank Dr. Peng Tong, Dr. Yi Wang, and Dr. Dong Fang and other

colleagues in the Communications and Signal Processing Research Group.

This thesis is dedicated to my parents and my love Jiaqi Gu.


xiv

2015

Declaration

Some of the research presented in this thesis has resulted in some publications. These

publications are listed at the end of Chapter 1.

All work presented in this thesis as original is so, to the best knowledge of the author.

References and acknowledgements to other researchers have been given as appropriate.


xv

2015

Glossary

AP Affine Projection

ATC Adapte Then Combin

CCG Conventional Conjugate Gradient

CG Conjugate Ggradient

CoSaMP Compressive Sampling Matching Pursuit

CS Compressive Sensing

CTA Combine Then Adapt

DCE Distributed Compressed Estimation

DCT Discrete Cosine Transform

DDCCG Diffusion Distributed Conventional Conjugate Ggradient

DDMCG Diffusion Distributed Modified Conjugate Ggradient

DFT Discrete Fourier Transform

dNLMS distributed Normalized Least Mean Square

DRJIO–NLMS Distributed Reduced-rank Joint Iterative Optimization–Normalized Least

Mean Square

DRJIO–RLS Distributed Reduced-rank Joint Iterative Optimization–Recursive Least

Square

ES–LMS Exhaustive Search–based Least Mean Square

EMSE Excess Mean Square Error

ES–RLS Exhaustive Search–based Recursive Least Square

FIR Finite Impulse Response

IDCCG Incremental Distributed Conventional Conjugate Ggradient

IDMCG Incremental Distributed Modified Conjugate Ggradient

iid independent and identically distributed

KLT Kahunen–Loeve Transform

LMS Least Mean Square

MCG Modified Conjugate Ggradient

MIMO Multi Input Multi Output

MMSE Minimum Mean Square Error

MSE Mean Square Error

MSD Mean Square Deviation

OMP Orthogonal Matching Pursuit

PSD Power Spectral Density

RIP Restricted Isometry Property

RLS Recursive Least Square

RZA–LMS Reweighted Zero Attracting–Least Mean Square

SI–LMS Sparsity–Inspired Least Mean Square

SI–RLS Sparsity–Inspired Recursive Least Square

StOMP Stagewise Orthogonal Matching Pursuit

WSNs Wireless Sensor Networks

ZA–LMS Zero Attracting–Least Mean Square

Chapter 1

Introduction

Contents1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.5 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.6 Publication List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1 Overview

Distributed signal processing algorithms have become of paramount importance for sta-

tistical inference in wireless networks and applications such as wireless sensor networks,

spectrum estimation and smart grids [1–4]. It is well known that distributed processing

techniques deal with the extraction of information from data collected at nodes that are

distributed over a geographic area [1]. In this context, for each specific node, a set of

neighbor nodes collect their local information and transmit their estimates to a specific

node. Then, each specific node combines the collected information together with its local

estimate to generate an improved estimate. When compared with the centralized solu-

tion, the distributed solution has significant advantages. The centralized solution needs a


1

2015

CHAPTER 1. INTRODUCTION 2

central processor, where each node sends its information to the central processor and gets

the information back after the processor completes the task. This type of communication

needs the central processor to be powerful and reliable enough. With distributed solutions,

each node only requires the local information and its neighbors to process the informa-

tion. This approach for processing information can significantly reduce the amount of

processing and the communications bandwidth requirements. In Fig. 1.1, the idea of

distributed signal processing is illustrated, where each bullet stands for a sensor and the

rectangle corresponds to the target to estimate.

Figure 1.1: Distributed wireless network sample

There are three main protocols for cooperation and exchange of information for dis-

tributed processing, incremental, diffusion and consensus strategies, and recent studies

indicate that the diffusion strategy is the most effective one [5]. Details of each strategy

will be introduced and discussed in Chapter 2. For distributed diffusion processing, many

challenges still exist. Firstly, the neighbors for each node are fixed and the combining

coefficients are calculated after the network topology is deployed and starts its operation.

One disadvantage of this approach is that the estimation procedure may be affected by

poorly performing links. Moreover, when the number of neighbor nodes is large, each

node requires a large bandwidth and transmit power. Secondly, in many scenarios, when

the unknown parameter vector to be estimated has a large dimension, the network re-

quires a large communication bandwidth between neighbor nodes to transmit their local

estimate. This problem limits the application of existing algorithms in applications with

large data sets as the convergence speed is dependent on the length of the parameter vec-

S. Xu, Ph.D. Thesis, Department of Electronics, University of York 2015


tor. Thirdly, in some scenarios, the unknown parameter vector to be estimated can be

sparse and contain only a few nonzero coefficients. However, when the full dimension

of the observed data is taken into account, the challenge lies in the increased computa-

tional cost, the slowed down convergence rate and the degraded mean square error (MSE)

performance.

1.2 Motivation

In this thesis, a number of innovative distributed cooperative strategies for dealing with

exchange of information, node failures and compression of data are considered for wire-

less sensor networks, spectrum estimation and smart grids, which require low complexity

and are highly effective to perform statistical inference about the environment in a dis-

tributed way. Firstly, the conjugate gradient (CG) method [6–8] is considered to design

distributed adaptive algorithms for distributed networks. In particular, both incremental

and diffusion adaptive solutions are proposed. The distributed conventional CG (CCG)

and modified CG (MCG) algorithms provide an improved performance in terms of MSE

as compared with least–mean–square (LMS)–based algorithms and a performance that

is close to recursive least-squares (RLS) algorithms. The proposed distributed CG algo-

rithms are examined for parameter estimation in wireless sensor networks and spectrum

estimation.

Secondly, adaptive link selection algorithms for distributed estimation and their appli-

cation to wireless sensor networks and smart grids are investigated. Specifically, based

on the LMS/RLS strategies [9, 10], exhaustive search-based LMS/RLS link selection al-

gorithms and sparsity-inspired LMS/RLS link selection algorithms that can exploit the

topology of networks with poor-quality links are considered. The proposed link selection

algorithms are then analyzed in terms of their stability, steady-state and tracking perfor-

mance, and computational complexity.

Thirdly, a distributed compressed estimation scheme is proposed for sparse signals

and systems based on compressive sensing techniques [11, 12]. Inspired by compressive

sensing, the proposed scheme consists of compression and decompression modules to



perform distributed compressed estimation. To further improve the performance of the

proposed distributed compressed estimation scheme, an algorithm is developed to opti-

mize measurement matrices. The proposed distributed compressed estimation scheme and

algorithms are assessed for parameter estimation problems in wireless sensor networks.

Fourthly, the challenge that estimating large dimension unknown parameter vectors in

wireless sensor networks requires large communication bandwidth is addressed. In par-

ticular, we develop a novel distributed reduced-rank scheme and adaptive algorithms for

performing distributed dimensionality reduction and computing low–rank approximations

of unknown parameter vectors. The proposed distributed scheme is based on a transfor-

mation that performs dimensionality reduction at each agent of the network followed by

a reduced-dimension parameter vector [13]. Distributed reduced–rank joint iterative esti-

mation algorithms based on the NLMS and RLS techniques are also developed to achieve

significantly reduced communication overhead and improved performance.

1.3 Contributions

The contributions presented in this thesis are summarised as follows:

• Based on the fact that the CG algorithm has a faster convergence rate [6] than the

LMS-type algorithms and a lower computational complexity than RLS-type tech-

niques, two CG-based incremental distributed solution and two diffusion distributed

CG-based strategy are proposed. In detail, the incremental distributed CCG solution

(IDCCG), incremental distributed MCG solution (IDMCG), diffusion distributed

CCG solution (DDCCG) and diffusion distributed MCG solution (DDMCG) are de-

veloped and analysed in terms of their computational complexity. These algorithms

can be used in defence applications, such as battlefield information identification,

movement estimation and detection and distributed spectrum estimation.

• The adaptive link selection algorithms for distributed estimation problems are pro-

posed and studied. Specifically, we develop adaptive link selection algorithms that

can exploit the knowledge of poor links by selecting a subset of data from neighbor



nodes. The first approach consists of exhaustive search–based LMS/RLS link se-

lection (ES–LMS/ES–RLS) algorithms, whereas the second technique is based on

sparsity–inspired LMS/RLS link selection (SI–LMS/SI–RLS) algorithms. The pro-

posed algorithms result in improved estimation performance in terms of the MSE

associated with the estimates. In contrast to previously reported techniques, a key

feature of the proposed algorithms is that they involve only a subset of the data

associated with the best performing links.

• The design of an approach that exploits lower dimensions and the sparsity present

in the signals, reduces the required bandwidth, and improves the convergence rate

and the MSE performance is proposed. Inspired by CS, a scheme that incorporates

compression and decompression modules into the distributed estimation procedure

is introduced and namely distributed compressed estimation (DCE) scheme. We

also present a design procedure and develop an algorithm to optimize the measure-

ment matrices, which can further improve the performance of the proposed scheme.

Specifically, we derive an adaptive stochastic gradient recursion to update the mea-

surement matrix.

• A scheme for distributed signal processing along with distributed reduced–rank al-

gorithms for parameter estimation is presented. In particular, the proposed algo-

rithms are based on an alternating optimization strategy and are called distributed

reduced-rank joint iterative optimization normalized least mean squares (DRJIO–

NLMS) algorithm and distributed reduced–rank joint iterative optimization recur-

sive least square (DRJIO–RLS) algorithm. In contrast to prior work on reduced–

rank techniques and distributed methods, the proposed reduced–rank strategies are

distributed and performs dimensionality reduction without costly decompositions

at each agent. The proposed DRJIO–NLMS and DRJIO–RLS algorithms are flexi-

ble with regards to the amount of information that is exchanged, have low cost and

high performance.

1.4 Thesis Outline

The structure of the thesis is as follows:



• Chapter 2 presents an overview of the theory relevant to this thesis and introduces

the system models that are used to present the work in this thesis. The topics of

distributed signal processing, incremental and diffusion strategies, adaptive signal

processing, compressive sensing and sparsity–aware techniques are covered with

an outline of previous work in these fields and important applications .

• Chapter 3 presents the design of distributed adaptive algorithms for distributed net-

works based on CG strategies. Both incremental and diffusion adaptive solutions

are proposed, alongside a computational complexity analysis and the application to

distributed estimation and spectrum estimation.

• In Chapter 4, adaptive link selection algorithms for distributed estimation and their

application to wireless sensor networks and smart grids are proposed. The analysis

of the proposed algorithms are presented in terms of their stability, steady-state and

tracking performance, and computational complexity.

• Chapter 5 presents a novel distributed compressed estimation scheme for sparse sig-

nals and systems based on compressive sensing techniques. The compression and

decompression modules inspired by compressive sensing are introduced to perform

distributed compressed estimation. A design procedure to optimize measurement

matrices is presented as well.

• Chapter 6 presents a novel distributed reduced-rank scheme and adaptive algorithms

for distributed estimation in wireless sensor networks and smart grids. A distributed

reduced–rank joint iterative estimation algorithm is developed and compared with

existing techniques.

• Chapter 7 presents the conclusions of this thesis, and suggests directions in which

further research could be carried out.

1.5 Notation

a the vector (boldface lower case letters)

A the matrix (boldface upper case letters)

ℜ real part



IN N ×N identity matrix

(·)∗ complex conjugate

(·)T matrix transpose

(·)H Hermitian transpose

E expectation operator

⊗ Kronecker product

tr(·) trace of a matrix

⟨·, ·⟩ inner product

1.6 Publication List

Journal Papers

1. S. Xu, R. C. de Lamare and H. V. Poor, “Distributed Reduced-Rank Estimation

Based on Joint Iterative Optimization”, IEEE Transactions on Signal Processing.,

under preparation, 2015.

2. S. Xu, R. C. de Lamare and H. V. Poor, “Distributed Estimation Over Sensor Net-

works Based on Distributed Conjugate Gradient Strategies”, IET Signal Process-

ing., submitted, 2015.

3. S. Xu, R. C. de Lamare and H. V. Poor, “Distributed Compressed Estimation Based

on Compressive Sensing”, IEEE Signal Processing Letters., vol. 22, no. 9, pp.

1311-1315, September. 2015.

4. S. Xu, R. C. de Lamare and H. V. Poor, “Adaptive Link Selection Algorithms for

Distributed Estimation”, IEEE Transactions on Signal Processing., submitted in the

third round of review, 2014.

Conference Papers

1. T. G. Miller, S. Xu and R. C. de Lamare, “Sparsity-Aware Distributed Conjugate

Gradient Algorithm for Spectrum Sensing“, in preparation for IEEE International

Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 2016.



2. S. Xu, R. C. de Lamare and H. V. Poor, “Distributed Reduced-Rank Estimation

Based on Joint Iterative Optimization in Sensor Networks”, European Signal. Pro-

cessing Conference, pp. 2360-2364, Lisbon, Portugal, Sept. 2014.

3. S. Xu, R. C. de Lamare and H. V. Poor, “Dynamic Topology Adaptation for Dis-

tributed Estimation in Smart Grids”, Proc. IEEE workshop on Computational Ad-

vances in Multi-Sensor Adaptive Processing, pp. 420-423, Saint Martin, Dec. 2013.

4. S. Xu, R. C. de Lamare and H. V. Poor, “Adaptive Link Selection Strategies for

Distributed Estimation in Diffusion Wireless Networks”, Proc. IEEE International

Conference on Acoustics, Speech and Signal Processing, pp. 5402-5405, Vancou-

ver, Canada, May. 2013.

5. S. Xu and R. C. de Lamare, “Distributed Conjugate Gradient Strategies for Dis-

tributed Estimation over Sensor Networks”, Proc. Sensor Signal Processing for

Defence pp. 1-5, London, UK, Sept. 2012.


Chapter 2

Literature Review

Contents2.1 Distributed Signal Processing . . . . . . . . . . . . . . . . . . . . . . 9

2.2 Protocols for Cooperation and Exchange of Information . . . . . . . 16

2.3 Adaptive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.4 Compressive Sensing Techniques . . . . . . . . . . . . . . . . . . . . 26

2.5 Sparsity–Aware Techniques . . . . . . . . . . . . . . . . . . . . . . . 31

In this chapter, an introduction to fundamental techniques related to the research car-

ried out during the preparation of this thesis such as distributed signal processing, pro-

tocols for cooperation and exchange of information, adaptive algorithms, compressive

sensing and sparsity–aware techniques are presented.

2.1 Distributed Signal Processing

Distributed signal processing deals with information processing over graphs and the

method of collaboration among nodes in a network, it aims to provide a superior adap-

tation performance [14]. Distributed signal processing covers strategies and results that

relate to the design and analysis of networks that are able to solve optimization and adap-

tation problems in an efficient and distributed manner. One main research aspect for


9

2015

CHAPTER 2. LITERATURE REVIEW 10

distributed processing is how to perform distributed adaptation and optimization over net-

works. Under this topic, the advantages and limitations of centralized and distributed

solutions are examined and compared.

In the centralized mode of operation, each node transmits their data to a fusion center,

which has the function of processing the data centrally. The fusion center then transmits

the results of the analysis back to the nodes. Although centralized solutions sometimes

can be powerful, they still suffer from some limitations. In real–time applications where

nodes collect data continuously, the reiterant exchange of information between the nodes

and the fusion center can be costly especially when these exchanges occur over wireless

links or limited bandwidth scenarios. In addition, centralized solutions may face a critical

failure. When a central processor fails, this solution method collapses altogether [14].

On the other hand, in the distributed mode of operation, nodes are connected by a

topology and share information only with their immediate neighbors. The continuous d-

iffusion of information across the network enables nodes to adapt their performance in

relation to data and network conditions. It also results in improved adaptation perfor-

mance relative to non–cooperative nodes [15]. Besides, recently, the distributed solution

has already been introduced for parameter estimation in wireless networks and power

networks [1–4].

2.1.1 Distributed Wireless Networks

Distributed wireless network is one context of distributed signal processing. Distributed

wireless networks linking PCs, laptops, cell phones, sensors and actuators will form the

backbone of future data communication and control networks [5]. Distributed networks

have their own characteristics, any node in the network will be connected with at least

two other nodes directly, which means the network has increased reliability. Compared

with centrally controlled networks, the advantages of distributed networks are significant.

Since they do not contain a central controller, the whole network will not crash when

the center is under attack. Moreover, in a distributed network the nodes are connected

to each other. This enables the network with the ability to choose multiple routes for

data transmission. Furthermore, it can help to reduce the negative effect of fading, noise,



interference and so on.

Distributed wireless networks deal with the extraction of information from data col-

lected at nodes that are distributed over a geographic area [1]. In distributed wireless net-

works, each node will collect information about the target in the network and exchange

this information with other nodes, which influenced by the network topology, will finally

produce an estimate of the parameters of interest. Several algorithms have already been

developed and reported in the literature for distributed wireless networks, i.e, steepest–

descent, LMS [1], affine projection (AP) [16] and RLS [17, 18].

2.1.2 Applications

In this subsection, the applications of distributed signal processing are presented. We

focus on three main applications, which are distributed estimation, spectrum estimation

and smart grids.

2.1.2.1 Distributed Estimation

In general, for distributed estimation, a set of nodes is required to collectively estimate

some parameter of interest from noisy measurements. Thus, consider a set of N nodes,

which have limited processing capabilities, distributed over a given geographical area as

depicted in Fig. 2.1. The nodes are connected and form a network, which is assumed

to be partially connected because nodes can exchange information only with neighbors

determined by the connectivity topology. We call a network with this property a partially

connected network whereas a fully connected network means that data broadcast by a

node can be captured by all other nodes in the network in one hop [19].

The aim of distributed estimation is to estimate an unknown parameter vector ω0,

which has length M . At every time instant i, each node k takes a scalar measurement

dk(i) according to

dk(i) = ωH0 xk(i) + nk(i), i = 1, 2, . . . , I, (2.1)



k

Nk

Figure 2.1: Network topology with N nodes

where xk(i) is the M × 1 random regression input signal vector and nk(i) denotes the

Gaussian noise at each node with zero mean and variance σ2n,k. This linear model is able

to capture or approximate well many input–output relations for estimation purposes [9]

and we assume I > M . To compute an estimate of ω0 in a distributed fashion, we need

each node to minimize the MSE cost function [2]

Jωk(i)

(ωk(i)

)= E

∣∣dk(i)− ωHk (i)xk(i)

∣∣2, (2.2)

where E denotes expectation and ωk(i) is the estimated vector generated by node k at

time instant i. Then, the global network cost function could be described as

Jω(ω)=

N∑k=1

E∣∣dk(i)− ωHxk(i)

∣∣2. (2.3)

Many distributed estimation algorithms have been proposed to minimize the cost func-

tion (2.3), in the context of distributed adaptive filtering. These include incremental LM-

S [1, 5], incremental RLS [5], diffusion LMS [2, 5] and diffusion RLS [20]. Kalman

filtering and smoothing algorithms were also proposed in [21].

2.1.2.2 Spectrum Estimation

In this subsection, the application of distributed signal processing technique for spectrum

estimation is introduced. We aim to estimate the spectrum of a transmitted signal s with



N nodes using a wireless sensor network. Let Φs(f) denote the power spectral density

(PSD) of the signal s. The PSD can be represented as a linear combination of some Bbasis functions, as described by

Φs(f) =B∑

m=1

bm(f)ω0m = bT0 (f)ω0, (2.4)

where b0(f) = [b1(f), ..., bB(f)]T is the vector of basis functions evaluated at frequency

f , ω0 = [ω01, ..., ω0B] is a vector of weighting coefficients representing the power that

transmits the signal s over each basis, and B is the number of basis functions. For B suffi-

ciently large, the basis expansion in (2.4) can well approximate the transmitted spectrum.

Possible choices for the set of basis bm(f)Bm=1 include [22–24]:

• rectangular functions

• raised cosines

• Gaussian bells

• Splines

Let Hk(f, i) be the channel transfer function between a transmit node conveying the signal

s and receive node k at time instant i, the PSD of the received signal observed by node k

can be expressed as

Ik(f, i) = |Hk(f, i)|2Φs(f) + v2n,k

=B∑

m=1

|Hk(f, i)|2bm(f)ω0m + v2n,k

= bTk,i(f)ω0 + v2n,k (2.5)

where bTk,i(f) = [|Hk(f, i)|2bm(f)]Bm=1 and v2n,k is the receiver noise power at node k. For

simplification, let us assume that the link between receive node k and the transmit node is

perfect and there is no receiver noise at node k.

At every time instant i, every node k observes measurements of the noisy version of

the true PSD Ik(f, i) described by (2.5) over Nc frequency samples fj = fmin : (fmax −fmin)/Nc : fmax, for j = 1, ..., Nc, according to the model:

djk(i) = bTk,i(fj)ω0 + v2n,k + nj

k(i). (2.6)



The term njk(i) denotes observation noise and have zero mean and variance σ2

n,j . Collect-

ing measurements over Nc contiguous channels, we obtain a linear model given by

dk(i) = Bk(i)ω0 + nk(i), (2.7)

where Bk(i) = [bTk,i(fj)]Ncj=1 ∈ RNc×B, with Nc > B, and nk(i) is a zero mean random

vector with covariance matrix Rn,i. At this point, we can generate the cost function for

node k as:

Jωk(i)(ωk(i)) = E∣∣dk(i)−Bk(i)ωk(i)

∣∣2 (2.8)

and the global network cost function could be described as

Jω(ω)=

N∑k=1

E∣∣dk(i)−Bk(i)ω

∣∣2. (2.9)

In the literature, some distributed estimation algorithms have been proposed to min-

imize the cost function in (2.9), including cooperative sparse PSD estimation [22] and

sparse distributed spectrum estimation [25].

2.1.2.3 Smart Grids

The electric power industry is likely to involve many more fast information gathering and

processing devices (e.g., phasor measurement units) in the future, enabled by advanced

control, communication, and computation technologies [26]. As a result, the need for

more decentralized estimation and control in smart grid systems will experience a high

priority. State estimation is one of the key functions in control centers involving energy

management systems. A state estimator converts redundant meter readings and other

available information obtained from a supervisory control and data acquisition system into

an estimate of the state of an interconnected power system and distribution system [4].

In recent years, distributed signal processing has become a powerful tool to perform

distributed state estimation in smart grids. To discuss the application in smart grids, we

consider the IEEE 14–bus system [26], where 14 is the number of substations. At every

time instant i, each bus k, k = 1, 2, . . . , 14, takes a scalar measurement dk(i) according to

dk(i) = Xk

(ω0(i)

)+ nk(i), k = 1, 2, . . . , 14, (2.10)



where ω0(i) is the state vector of the entire interconnected system, Xk(ω0(i)) is a non-

linear measurement function of bus k. The quantity nk(i) is the measurement error with

mean equal to zero and which corresponds to bus k. Fig. 2.2 shows a standard IEEE–14

bus system with four non–overlapping control areas.

1

2

5

12

6

13

11

7

8

4

3

910

14

Figure 2.2: IEEE 14–bus system

Initially, we focus on the linearized DC state estimation problem. The system is built

with 1.0 per unit (p.u) voltage magnitudes at all buses and j1.0 p.u. branch impedance.

Then, the state vector ω0(i) is taken as the voltage phase angle vector ω0 for all buses.

Therefore, the nonlinear measurement model for state estimation (2.10) is approximated

by

dk(i) = xHk (i)ω0 + nk(i), k = 1, 2, . . . , 14, (2.11)

wherexk(i) is the measurement Jacobian vector for bus k. Then, the aim of the distributed

estimation algorithm is to compute an estimate of ω0 at each node, which can minimize

the cost function given by

Jωk(i)(ωk(i)) = E|dk(i)− xHk (i)ωk(i)|2. (2.12)

Many distributed state estimation algorithms have been proposed in the literature to

minimize the cost function (2.12), including M–CSE algorithm [4], the single link strat-

egy [27] and DESTA and DSITA algorithms [28].



2.2 Protocols for Cooperation and Exchange of Informa-

tion

As previously mentioned, there are three main cooperation strategies, incremental, dif-

fusion and consensus [5]. In this section, we introduce and analyze each protocol for

cooperation and exchange of information in detail.

2.2.1 Incremental Strategy

The incremental strategy is the simplest cooperation strategy. It works following a Hamil-

tonian cycle, the information goes through these nodes in one direction, which means each

node passes the information to its adjacent node in a uniform direction. For the incremen-

tal strategy, starting from a given network topology, at each time instant i, every node k

in the network will take a scalar measurement dk(i) according to

dk(i) = ωH0 xk(i) + nk(i), i = 1, 2, . . . , I, (2.13)

where xk(i) is the M × 1 input signal vector, nk(i) is the noise sample at each node with

zero mean and variance σ2v,k. Node k will use the scalar measurement dk(i) and the input

signal vector xk(i), together with the local estimateψk−1(i) from its neighbor, to generate

a local estimate ψk(i) of the network through a distributed estimation strategy [1]. Then,

the local estimate ψk(i) will be pushed to the next neighbor k + 1 in one direction. The

final estimate of the network will be equal to the final node’s estimate. To illustrate the

incremental processing, based on the traditional LMS algorithm, the incremental LMS

algorithm updates the estimate at node k as [1]

ψk(i) = ψk−1(i) + µkxk(i)[dk(i)− xHk (i)ψk−1(i)]

∗. (2.14)

The resulting incremental implementation is briefly illustrated in Fig. 2.3.



Node 1d1(i),x1(i)

Node k-1

dk−1(i),xk−1(i)

Node k

dk(i),xk(i)

Node k+1dk+1(i),xk+1(i)

Node NdN(i),xN(i)

ψ1(i)

ψk−1(i)

ψk(i)

ψk+1(i)

Figure 2.3: Incremental Distributed Estimation

2.2.2 Diffusion Strategy

For the diffusion strategy, the situation is different. Instead of getting information from

one neighbor node, each node in the diffusion network will have some linked neighbors.

There are two kinds of basic distributed diffusion estimation strategies, the Adapt–then–

Combine (ATC) strategy and the Combine–then–Adapt (CTA) Strategy [17]. In the ATC

strategy, each node will use adaptive algorithms such as LMS, RLS and CG to obtain a

local estimate ψk(i). Then, each node will collect this estimate through all its neighbor

nodes and combine them together through

ωk(i+ 1) =∑l∈Nk

cklψl(i), (2.15)

where ckl is the combination coefficients and calculated through the Metropolis, the

Laplacian or the nearest neighbor rules [2], l indicates the neighbour node l linked to

node k and Nk denotes the set of neighbors of node k. The Metropolis rule can be imple-

mented as [29]

ckl =

1

max|Nk|,|Nl|, if k = l are linked

1−∑

l∈Nk/k

ckl, for k = l,(2.16)

where |Nk| denotes the cardinality of Nk. The Laplacian rule is given by [30]

C = IN − kL, (2.17)

where C is the N ×N combining coefficient matrix with entries [ckl], L = D−Ad, with

D = diag|N1|, |N2|, . . . , |NN |, k = 1/|Nmax| and Ad is the N × N network adjacent



matrix formed as

[Ad]kl =

1, if k and l are linked

0, otherwise.(2.18)

For the nearest neighbor rule, the combining coefficient ckl is defined as [31]

ckl =

1|Nk|

, if k and l are linked

0, otherwise.(2.19)

The combining coefficients ckl should satisfy∑l∈Nk∀k

ckl = 1. (2.20)

Fig. 2.4 (a) describes the ATC diffusion strategy. Based on the traditional LMS algorithm,

the diffusion LMS ATC strategy is illustrated as follows:

for each time instant i

each node k performs the update:

ψk(i) = ωk(i) + µkxk(i)[dk(i)− xHk (i)ωk(i)]

∗. (2.21)

then


cklψl(i). (2.22)

end

The CTA strategy just operates in a reverse way. Each node starts with collecting

the estimates of their neighbors in the previous time instant and combines them together

through

ψk(i) =∑l∈Nk

cklωl(i). (2.23)

After the ψk(i) is generated, each node k will employ the ψk(i) together with its dk(i)

and xk(i) to generate ωk(i + 1). Based on the traditional LMS algorithm, the diffusion

LMS CTA strategy is illustrated as follows:





ψk(i) =∑l∈Nk

cklωl(i). (2.24)

ωk(i+ 1) = ψk(i) + µkxk(i)[dk(i)− xHk (i)ψk(i)]

∗. (2.25)

end

Fig. 2.4 (b) illustrates the CTA process and the comparison of the main steps for these

two strategies are summarised in Table 2.1.

Node 1

Node k-1

Node k

Node k+1Node N

ψ1(i)

ψk−1(i)

ψk+1(i)

ψk−1(i)

ψ1(i)

ψk+1(i)

Combine and get ωk(i+ 1)

ψk(i)

Node 1

Node k-1

Node k

Node k+1

ω1(i)

ωk−1(i)

ωk+1(i)

ωk−1(i)

ω1(i)

ωk+1(i)

Combine and get ψk(i)

ωk(i)

Node N

Adapt with ψk(i) and get ωk(i+ 1)

(a) ATC Strategy (b) CTA Strategy

Figure 2.4: Diffusion Distributed Estimation

Table 2.1: Comparison of Main Steps for ATC and CTA strategies

Step ATC Strategy CTA Strategy

1 Adapt Exchange

ωl(i)

2 Exchange Combine

ψl(i)

3 Combine Adapt



2.2.3 Consensus Strategy

In the consensus strategy, each node will first collect all the previous estimate from all

its neighbors and combine them together through the Metropolis, the Laplacian or the

nearest neighbor rules to generate ψk(i). Then, each node will update its local estimate

ωk(i+1) though adaptive algorithms (LMS, RLS, etc) with the combined estimate ψk(i)

and its local estimate ωk(i).

Based on the traditional LMS algorithm, the LMS consensus strategy [14] is illustrated

as follows:



ψk(i) =∑l∈Nk

cklωl(i). (2.26)

ωk(i+ 1) = ψk(i) + µkxk(i)[dk(i)− xHk (i)ωk(i)]

∗. (2.27)

end

The resulting consensus implementation is briefly illustrated in Fig. 2.5. If we compare

the diffusion LMS CTA (2.25) and LMS consensus (2.27), the only difference is the

diffusion LMS CTA updates the estimate ωk(i + 1) only through the combined estimate

ψk(i), while the LMS consensus strategy employs ψk(i) and ωk(i) together, to update

the estimate ωk(i+ 1).

2.3 Adaptive Algorithms

Adaptive signal processing is mainly based on algorithms that have the ability to learn

by observing the environment and are guided by reference signals. An adaptive filter is

a kind of digital filter which is able to automatically adjust the performance according to

the input signal for various tasks such as estimation, prediction, smoothing and filtering.

In most applications, designers employ finite impulse response (FIR) filters due to their



Node 1

Node k-1

Node k

Node k+1

ω1(i)

ωk−1(i)

ωk+1(i)

ωk−1(i)

ω1(i)

ωk+1(i)

Combine and get ψk(i)

ωk(i)

Node N

Adapt ψk(i) with ωk(i) and get ωk(i+ 1)

Figure 2.5: Consensus Distributed Estimation

inherent stability. In contrast, a non–adaptive filter has static coefficients and these static

coefficients form the transfer function. According to the changes of the environment, the

adaptive filter will use an adaptive algorithm to adjust the parameters and its structure.

Under general circumstances, an adaptive filter does not change its structure. The most

important characteristic of the adaptive filter is that, it can work effectively in an unknown

environment and be able to track the time-varying features of the input signal [9]. With

the performance enhancements of digital signal processors, adaptive filtering applications

are becoming more common. Nowadays, they have been widely used in mobile phones

and other communication devices, digital video recorders and digital cameras, as well as

medical monitoring equipment. The basic structure of an adaptive filter is introduced in

Fig. 2.6, where x(i) is the input signal vector, y(i) and d(i) are the output signal and

desired signal, respectively, and e(i) is the error signal which is calculated by d(i)− y(i).

There are two widely used adaptive algorithms for adaptive signal processing, which

are the least mean square (LMS) algorithm and the recursive least squares (RLS) algo-

rithm [10]. The LMS algorithm is the simplest and the most basic algorithm for adaptive

filters. It has the lowest computational complexity, however, its performance is not always

satisfactory. The RLS has a better performance, but requires a high complexity. Addi-



Adaptive Filter

Adaptive Algorithm

x(i) y(i)

e(i)

d(i)

Figure 2.6: Adaptive Filter Structure

tionally, the conjugate gradient (CG) algorithm is also a well known strategy for adaptive

signal processing, as the CG algorithm has a faster convergence rate than the LMS–type

algorithms and a lower computational complexity than RLS–type techniques [6–8]. In

this section, the LMS, RLS and the CG algorithms are introduced.

2.3.1 The Least Mean Square (LMS) Algorithm

The LMS algorithm can be developed from the MSE cost function [9, 32]:

J(i) = E|d(i)− ωH(i)x(i)|2, (2.28)

where E denotes expectation, d(i) is the desired signal, x(i) is the input signal and ω(i)

is the tap weight vector. Then, the gradient vector of the cost function can be described as

∂J (i)

∂ω∗ (i)= Rxω (i)− bx, (2.29)

where Rx is the input signal’s correlation matrix and bx stands for the crosscorrelation

vector between the desired signal and the input signal. The optimum solution for the cost

function (2.28) is the Wiener solution which is given by

ω0 (i) = R−1x bx. (2.30)

Since Rx and bx are statistics of the received signal and are not given for the adaptive

algorithms, these quantities must be estimated. LMS algorithms adopt the simplest es-

timator that use instantaneous estimates for Rx and bx [9, 32], which can be expressed

mathematically as

Rx = x(i)xH(i), (2.31)



bx = d∗(i)x(i). (2.32)

By plugging (2.31) and (2.32) into (2.29), we obtain

∂J (i)

∂ω∗ (i)= −d∗(i)x(i) + x(i)xH(i)ω(i). (2.33)

As a result, the corresponding filter coefficient vector is updated by

ω(i+ 1) = ω(i)− µ∂J(i)

∂ω∗(i)

= ω(i) + µx(i)[d∗(i)− xH(i)ω(i)],

(2.34)

where µ is the step size to control the speed of the convergence.

2.3.2 The Recursive Least Squares (RLS) Algorithm

For the RLS algorithm, the cost function is redefined by the least squares error as [9, 32]

J(i) =i∑

n=0

λi−nϵ (n)2

=i∑

n=0

λi−n[d(n)− ωH(i)x(n)]2,

(2.35)

where λ is the forgetting factor and ϵ(n) is the posteriori error at time instant n. After

taking the gradient of J(i) and letting it equal to zero, we obtain

∂J (i)

∂ω∗ (i)= −2

i∑n=0

λi−nx (n)[d∗ (n)− xH (n)ω (i)

]= 0, (2.36)

and

ω(i) = [i∑

n=0

λi−nx (n)H x (n)]−1

i∑n=0

λi−nx (n) d∗ (n) . (2.37)

To simplify (2.37), we define the following quantities:

Φ(i) =i∑

n=0

λi−nx (n)xH (n) (2.38)

θ(i) =i∑

n=0

λi−nx (n) d∗ (n) . (2.39)

and (2.37) becomes

ω(i) = Φ(i)−1θ(i). (2.40)



Then, (2.38) can be rewritten as

Φ(i) = λ[i−1∑n=0

λi−1−nx (n)H x (n)] + x(i)xH(i)

= λΦ(i− 1) + x(i)xH(i).

(2.41)

For (2.39), the same derivation process can be done and we get

θ(i) = λθ(i− 1) + x(i)d∗(i). (2.42)

Then, by using the matrix inversion lemma, we have

Φ−1(i) = λ−1Φ−1(i− 1)− λ−2Φ−1(i− 1)x(i)xH(i)Φ−1(i− 1)

1 + λ−1xH(i)Φ−1(i− 1)x(i). (2.43)

Let

P (i) = Φ−1(i) (2.44)

and

k(i) =λ−1P (i− 1)x(i)

1 + λ−1xH(i)P (i− 1)x(i). (2.45)

Using (2.44) and (2.45), (2.40) and (2.43) can be respectively turned into

ω(i) = P (i)θ(i) (2.46)

and

P (i) = λ−1P (i− 1)− λ−1k(i)xH(i)P (i− 1). (2.47)

In fact, (2.45) can be further rewritten as

k(i) = [λ−1P (i− 1)− λ−1k(i)xH(i)P (i− 1)]x(i)

= P (i)x(i).(2.48)

Finally, when combining (2.42), (2.46) and (2.47) together

ω(i+ 1) = ω(i)− k(i)xH(i)ω(i) + P (i)x(i)d∗(i)

= ω(i) + k(i)[d(i)− ωH(i)x(i)]∗.(2.49)

The derivation of the RLS algorithm is now complete.



2.3.3 The Conjugate Gradient (CG) Algorithm

The conjugate gradient (CG) algorithm is well known for its faster convergence rate than

the LMS algorithm and lower computational complexity than the RLS algorithm [6–8].

In adaptive filtering techniques, the CG algorithm applied to the system Rω = b, starts

with an initial guess of the solution ω(0), with an initial residual g(0) = b, and with

an initial search direction that is equal to the initial residual: p(0) = g(0), where R is

the correlation or the covariance matrix of the input signal and b is the cross–correlation

vector between the desired signal and the input signal.

The strategy for the conjugate gradient method is that at step j, the residual g(j) =

b − Rω(j) is orthogonal to the Krylov subspace generated by b, and therefore each

residual is perpendicular to all the previous residuals. The residual is computed at each

step.

The solution at the next step is achieved using a search direction that is only a lin-

ear combination of the previous search directions, which for ω(1) is just a combination

between the previous and the current residual.

Then, the solution at step j,ω(j), could be obtained throughω(j−1) from the previous

iteration plus a step size α(j) times the last search direction. The immediate benefit of the

search directions is that there is no need to store the previous search directions. Using the

orthogonality of the residuals to these previous search directions, the search is linearly

independent of the previous directions. For the solution in the next step, a new search

direction is computed, as well as a new residual and new step size. To provide an optimal

approximate solution of ω, the step size α(j) is calculated according to [6–8].

To illustrate the CG algorithm, Fig. 2.7 shows how the CG algorithm finds the approx-

imate solution to the exact solution. The iterative formulas of the CG algorithm [6–8] are

summarized in Table 2.2.



ω(0)

p(0)

p(1)

ω(1)g(1)

ω(2)

g(2)

p(2)

p(1) = g(1) + β(1)p(0)

p(0)HRp(1) = 0

p(1)HRp(2) = 0

Figure 2.7: Searching Direction of the CG Algorithm

Table 2.2: Main Steps for CG algorithm

– Step size:α(j) = g(j−1)Hg(j−1)p(j−1)HRp(j−1)

– Approximate solution: ω(j) = ω(j − 1) + α(j)p(j − 1)

– Residual: g(j) = g(j − 1)− α(j)Rp(j − 1)

– Improvement at step i: β(j) = g(j)Hg(j)g(j−1)Hg(j−1)

– Search direction: p(j) = g(j) + β(j)p(j − 1)

2.4 Compressive Sensing Techniques

In recent years, reconstructing sparse signals from a small number of incoherent linear

measurements has attracted a growing interest. In this context, a novel theory has be

proposed in the literature as compressive sensing (CS) or compressed sampling [11, 12,

33]. CS techniques can provide a superior performance when compared with traditional

signal reconstruction techniques under suitable conditions. The rationale behinds CS is

that certain classes of sparse or compressible signals in some basis, where most of their

coefficients are zero or small and only a few are large, can be exactly or sufficiently

accurately reconstructed with high probability [33]. The measurement process projects



the signals onto a small set of vectors, which is incoherent with the sparsity basis.

For the reconstruction, one of the original breakthroughs in CS [11,34,35] was to show

that linear programming methods can be used to efficiently reconstruct the data signal with

high accuracy. Since then, many alternative methods have been proposed as a faster or

superior (terms of reconstruction rate) alternative to these linear programming algorithm-

s. One approach is to use matching pursuit techniques, which was originally proposed

in [36]. Based on the original matching pursuit algorithm, a variety of algorithms have

been proposed in the literature, such as orthogonal matching pursuit (OMP) [37], stage-

wise orthogonal matching pursuit (StOMP) [38], compressive sampling matching pursuit

(CoSaMP) [39] and gradient pursuit algorithms [40].

2.4.1 Goal of Compressive Sensing

Consider a real–valued, finite–length, one–dimensional, discrete–time signal x, which

can be viewed as an M × 1 column vector in RM with elements x[m],m = 1, 2, ...,M .

Any signal in RM can be represented in terms of a basis of M × 1 vectors uiMi=1. Let

U be an M × M orthonormal matrix where the i–th column is the i–th basis vector ui.

Then the signal x ∈ RM can be expressed as a linear combination of these basis vectors

by

x =M∑i=1

ziui (2.50)

or

x = Uz, (2.51)

where z is the M × 1 column vector of weighting coefficients zi = ⟨x,ui⟩ = uHi x and

⟨·⟩ denotes inner product. It is clear that x and z are equivalent representations of the

signal, with x in the time or space domain and z in the U domain.

The signal x is S–sparse if it is a linear combination of only S basis vectors. In other

words, only S of the zi coefficients in (2.50) or (2.51) are nonzero and M − S are zero.

When S ≪ M , the signal x is compressible if the representation (2.50) or (2.51) has just

a few large coefficients and many small or zero coefficients.



CS techniques deal with sparse signals by directly acquiring a compressed signal rep-

resentation [11]. Under this situation, we consider a general linear measurement process

that computes D < M inner products between x and a collection of vectors γjDj=1

as in yj = ⟨x, γj⟩ [33]. Then, the measurements yj are arranged in an D × 1 vector y

and the measurement vectors γHj as rows in a D × M matrix Γ. The matrix Γ is called

measurement matrix. By substituting U from (2.51), y can be written as

y = Γx = ΓUz = Θz, (2.52)

where Θ = ΓU is an D ×M matrix.

In conclusion, the goal of CS is that from D measurements where D ≪ M and D ≥ S,

the original signal x can be perfectly reconstructed, where the measurements are not

chosen in an adaptive manner. To achieve this goal, the next step for compressive sensing

is to design the measurement matrix Γ and the reconstruction algorithms.

2.4.2 Measurement Matrix

In this subsection, the design of a stable measurement matrix is discussed. The ultimate

goal is to design the matrix Γ which does not destroy any information contained in the

original signal x. However since Γ ∈ RD×M and D < M , it is not possible in general

as Equation (2.52) is under–determined, making the problem of solving for x or z ill–

conditioned. If, however, x is S–sparse and the S locations of the nonzero coefficients in

z are known, then the problem can be solved provided D ≥ S. A necessary and sufficient

condition for this simplified problem to be well conditioned is that, for any vector v

sharing the same S nonzero entries as z and for some ϵ > 0 [11, 33]

1− ϵ ≤ ||Θv||2||v||2

≤ 1 + ϵ, (2.53)

where || · ||2 denotes the ℓ2 norm. That is the matrix Θ must preserve the lengths of these

S–sparse vectors. However, it is unlikely that the positions of the non–zero elements are

known in priori, but one can show that a sufficient condition for a stable solution for both

S–sparse and compressible signals is that Θ satisfies (2.53) for an arbitrary 3S–sparse

vector v. This condition is referred to as the restricted isometry property (RIP) [34]. Apart

from the RIP condition, a related condition, referred to as incoherence, requires that the



rows γj of Γ cannot sparsely represent the columns ui of U [33]. In other words,

the design of the measurement matrix requires a high degree of incoherence between the

measurement matrix Γ and the basis matrix U .

According to [11, 34, 41], both the RIP and incoherence conditions can be achieved

with high probability simply by selecting Γ as a random matrix. The Gaussian measure-

ment matrix Γ has two useful properties:

• The measurement matrix Γ is incoherent with the basis U = I of delta spikes

with high probability. More specifically, a D × M independent and identically

distributed (iid) Gaussian matrix Θ = ΓI = Γ can be shown to have the RIP

with high probability if D ≥ cS log(M/S), with c a small constant [11, 34, 41].

Therefore, S–sparse and compressible signals of length M can be recovered from

only D ≥ cS log(M/S) ≪ M random Gaussian measurements.

• The measurement matrix Γ is universal in the sense that Θ = ΓU will be iid Gaus-

sian matrix and thus have the RIP with high probability regardless of the choice of

the orthonormal basis U .

2.4.3 Reconstruction Algorithms

In this subsection, we present an overview of existing algorithms that can be used to

reconstruct the signal x from the measured signal y. In particular, we will focus on

the Orthogonal Matching Pursuit (OMP) algorithm. In general, the signal reconstruction

algorithm must take the D measurements in the vector y, the random measurement matrix

Γ and then reconstruct the length–M signal x.

Matching pursuit is a class of iterative algorithms that decomposes a signal into a linear

expansion of functions that form a dictionary. Matching pursuit was first introduced by

Mallat in [36]. At each iteration of the algorithm, matching pursuit chooses dictionary

elements in a greedy fashion that best approximates the signal.

Orthogonal matching pursuit is an improved reconstruction algorithm based on match-

ing pursuit. The principle behind OMP is similar to matching pursuit, at every iteration



an element is chosen from the dictionary that best approximates the residual. However,

instead of simply taking the scalar product of the residual and the new dictionary element,

to calculate the coefficient weight, the original function will be fitted to all the already s-

elected dictionary elements via least squares or projecting the function orthogonally onto

all selected dictionary atoms [36, 42].

In the following, we describe the OMP algorithm in detail.

Input:

• A measurement matrix Γ ∈ RD×M .

• Observation vector y ∈ RD.

• The sparsity level S of the ideal signal x ∈ RM .

Output:

• An estimate x ∈ RM of the ideal signal x.

• A set ΛS containing the positions of the non–zero elements of x.

• An approximation aS of the measurements y.

• The residual r = y − aS .

Procedure:

1) Initialize the residual r0 = y, the index set ΛS = ∅, and the iteration counter i = 1.

2) Find the index λi that solves the optimization problem

λi = arg maxj=1,...,M

|⟨ri−1,γj⟩|, (2.54)

where γj is the column of Γ. If the maximum occurs for multiple indices, break the

tie deterministically.

3) Augment the index set and the matrix of chosen atoms: Λi = Λi−1 ∪ λi and

Γi = [Γi−1 γλi]. We define Γ0 as an empty matrix.



4) Solve a least squares problem to obtain a new signal estimate:

xi = argminx

||y − Γix||2. (2.55)

5) Calculate the new approximation of the data and the new residual

ai = Γixi (2.56)

ri = y − ai. (2.57)

6) Increment i, and return to Step 2 if i < S.

7) The estimate x for the ideal signal has nonzero indices at the components listed in

ΛS . The value of the estimate x in component λi equals the ith component of xi.

2.5 Sparsity–Aware Techniques

Sparsity–aware strategy is another technique that deals with the sparse signals. In many

situations, the signal of interest is sparse, containing only a few relatively large coef-

ficients among many negligible ones. Any prior information about the sparsity of the

signal of interest can be exploited to help improve the estimation performance, as demon-

strated in many recent efforts in the area of CS [43]. However, the performance of most

CS heavily relies on the recovery strategies, where the estimation of the desired vector is

achieved from a collection of a fixed number of measurements.

Motivated by LASSO [44] and recent progress in compressive sensing [11, 12, 33],

sparsity–aware strategies have been proposed in [3]. The basic idea of sparsity–aware

strategies is to introduce a penalty which favors sparsity in the cost function. In this sec-

tion, two kinds of sparsity–aware strategies are introduced, which are the Zero–Attracting

strategy and the Reweighted Zero–Attracting strategy. First, an ℓ1–norm penalty on the

coefficients is incorporated into the cost function. This results in a modified update step

with a zero attractor for all the coefficients, which is the reason why this is called the

Zero–Attracting strategy. Then, the Reweighted Zero–Attracting strategy has been pro-

posed to further improve the performance. It employs reweighted step sizes of the zero



attractor for different coefficients, inducing the attractor to selectively promote zero coef-

ficients rather than uniformly promote zeros on all the coefficients. Details of these two

strategies will be discussed in the following subsections.

2.5.1 The Zero–Attracting Strategy

In this subsection, the Zero–Attracting strategy is introduced in detail and derived based

on the LMS algorithm. Starting from the cost function (2.28), a convex regularization

term f(ω(i)) weighted by the parameter ρ is incorporated into the cost function, which

results in:

J(i) = E|d(i)− ω(i)Hx(i)|2 + ρf(ω(i)), (2.58)

where f(ω(i)) is used to enforce sparsity. Based on (2.34) and using the gradient descent

updating, the corresponding filter coefficient vector is updated by

ω(i+ 1) = ω(i) + µx(i)[d(i)− xH(i)ω(i)]− µρ∂f(ω(i)). (2.59)

This results in a modified LMS update with a zero attractor for all the coefficients, nam-

ing the Zero–Attracting LMS (ZA–LMS) algorithm. According to [3], for the ZA–LMS

algorithm, the ℓ1–norm is proposed as penalty function, i.e.,

f1(ω(i)) = ∥ω(i)∥1, (2.60)

in the new cost function (2.58). This choice leads to an algorithm update in (2.59) where

the subgradient vector is given by ∂f1(ω(i)) = sign(ω(i)), where sign(a) is a component–

wise function defined as

sign(a) =

a/|a| a = 0

0 a = 0.(2.61)

The Zero–Attracting update uniformly shrinks all coefficients of the vector, and does

not distinguish between zero and non–zero elements. Intuitively, the zero attractor will

speed–up convergence when the majority of coefficients of ω0 are zero, i.e., the system is

sparse. However, since all the coefficients are forced toward zero uniformly, the perfor-

mance would deteriorate for systems that are not sufficiently sparse.



2.5.2 The Reweighted Zero–Attracting Strategy

Apart from the Zero–Attracting strategy, the Reweighted Zero–Attracting strategy is an-

other kind of sparsity–aware technique. Motivated by the idea of reweighting in compres-

sive sampling [45], the term f(ω(i)) is now defined as

f2(ω(i)) =M∑

m=1

log(1 + ε|ωm(i)|). (2.62)

The log–sum penalty∑M

m=1 log(1 + ε|ωm(i)|) has been introduced as it behaves more

similarly to the ℓ0–norm than the ℓ1–norm [45]. Thus, it enhances the sparsity recovery

of the algorithm. The algorithm in (2.59) is then updated by using

∂f2(ω(i)) = εsign(ω(i))1 + ε|ω(i)|

, (2.63)

leading to what we shall refer to as the reweighted zero–attracting LMS (RZA–LMS)

algorithm. The RZA–LMS selectively shrinks coefficients with large magnitudes and the

ones with small magnitudes. The reweighted zero attractor takes effect only on those

coefficients whose magnitudes are comparable to 1/ε and there is little shrinkage exerted

on the coefficients whose |ωm(i) ≫ 1/ε|.

2.5.3 Simulation results

In this subsection, the performance of the ZA–LMS and the RZA–LMS algorithms are

compared with the standard LMS algorithm in terms of their MSE performance, based on

the system identification scenario.

System identification involves the following steps: experimental planning, the selec-

tion of a model structure, parameter estimation, and model validation [9]. Suppose we

have an unknown plant ω0 which is linear and time varying. x(i) is the available input

signal at time instant i with size M × 1 and is applied simultaneously to the plant and the

model. Then, d(i) and y(i) are employed to stand for the desired signal output and the

output of the adaptive filter respectively. The output y(i) is obtained by

y(i) = ωH(i)x(i), (2.64)



where ω(i) is estimate of the unknow plant generated by adaptive filter. The error e(i) =

d(i)− y(i) is used to adjust the adaptive filter. When the plant is time–varying, the plant

output is nonstationary, and so is the desired response presented to the adaptive filtering

algorithm. In such a situation, the adaptive filtering algorithm has the task of not only

keeping the modeling error small but also continually tracking the time variations in the

dynamics of the plant [32]. The structure for the system identification is shown in Fig.

2.8.

Adaptive Filter

Unknown PlantSystem input

y(n)

e(n)

d(n) System output

Figure 2.8: System identification

The unknown plant ω0 contains 20 coefficients and three scenarios are considered.

In the first scenario, we set the 5th coefficient with value 1 and others to zero, making

the system has a sparsity of 1/20. In the second scenario, all the odd coefficients are set

to 1 while all the even coefficients remains to be zero, i.e., a sparsity of 10/20. In the

third scenario, all the even coefficients are set with value -1 while all the odd coefficients

are maintained to be 1, leaving a completely non–sparse system. The input signal and

the observed noise are white Gaussian random sequences with variance of 1 and 10−3,

respectively. The parameters are set as µ = 0.05, ρ = 5 × 10−4 and ε = 10. Note that

the same µ and ρ are used for LMS, ZA–LMS and RZA–LMS algorithms. The average

estimate of mean square deviation (MSD) is shown in Fig. 2.9, 2.10 and 2.11.

It is clear that, from Fig. 2.9, when the system is very sparse, both ZA–LMS and

RZA–LMS yield faster convergence rate and better steady–state performances than the

LMS algorithm. The RZA–LMS algorithm achieves lower MSD than ZA–LMS. In the

second scenario, as the number of non–zero coefficients increases to 10, the performance

of ZA–LMS deteriorates while RZA–LMS maintains the best performance among the

three algorithm. In the third scenario, RZA–LMS still performs comparably with the



standard LMS algorithm even though the system is now completely non–sparse.

0 50 100 150 200 250 300 350 400 450 500−35

−30

−25

−20

−15

−10

−5

0

5

Time i

MS

D (

dB)

LMSZA−LMSRZA−LMS

Figure 2.9: MSD comparison for scenario 1

0 50 100 150 200 250 300 350 400 450 500−35

−30

−25

−20

−15

−10

−5

0

5

10

Time i

MS

D (

dB)





0 50 100 150 200 250 300 350 400 450 500−30

−25

−20

−15

−10

−5

0

5

10

15

Time i

MS

D (

dB)




Chapter 3

Distributed Conjugate Gradient

Strategies for Distributed Estimation

Over Sensor Networks

Contents3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.2 System Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.3 Proposed Incremental Distributed CG–Based Algorithms . . . . . . 42

3.4 Proposed Diffusion Distributed CG–Based Algorithms . . . . . . . . 47

3.5 Preconditioner Design . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.6 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.1 Introduction

In recent years, distributed processing has become popular in wireless networks. Dis-

tributed processing of information consists of collecting data at each node of a network of

sensing devices spread over a geographical area, conveying information to the whole net-

work and performing statistical inference in a distributed way [1, 46]. These techniques


37

2015

CHAPTER 3. DISTRIBUTED CONJUGATE GRADIENT STRATEGIES FOR DISTRIBUTED

ESTIMATION OVER SENSOR NETWORKS 38

exhibit several distinctive advantages such as flexibility, robustness to sensor failures and

improved performance. In this context, for each specific node, a set of neighbor nodes

collect their local information and transmit their estimates to a specific node. Then, each

specific node combines the collected information together with its local estimate to gen-

erate an improved estimate. There are three main protocols for cooperation and exchange

of information for distributed processing, incremental, diffusion and consensus strategies,

and recent studies indicate that the diffusion strategy is the most effective one [5].

In the last few years, several algorithms have been developed and reported in the liter-

ature for distributed networks. Steepest-descent, least mean square (LMS) [1], recursive

least squares (RLS) [47] and affine projection (AP) [16] solutions have been considered

with incremental adaptive strategies over distributed networks [1], while LMS, AP and re-

cursive least squares (RLS) algorithms have been reported using diffusion adaptive strate-

gies [2, 18, 20, 48, 49]. Although the LMS–based algorithms have their own advantages,

when compared with conjugate gradient (CG) algorithms, there are several disadvantages.

First, for the LMS–based algorithms, the adaptation speed is often slow, especially for

the conventional LMS algorithm. Second, with the increase of the adaptation speed, the

system stability may decrease significantly [13]. Furthermore, the RLS–based algorithm-

s usually have a high computational complexity and are prone to numerical instability

when implemented in hardware [9]. In order to develop distributed solutions with a more

attractive tradeoff between performance and complexity, we focus on the development of

distributed CG algorithms. To the best of our knowledge, CG–based algorithms have not

been developed so far for distributed processing. The existing standard CG algorithm has

a faster convergence rate than the LMS-type algorithms and a lower computational com-

plexity than RLS-type techniques [6] even though its performance is often comparable to

RLS algorithms. We consider variants of CG algorithms, including the conventional CG

(CCG) and modified CG (MCG) algorithms [8, 50].

In this chapter, we propose distributed CG algorithms for both incremental and diffu-

sion adaptive strategies. In particular, we develop distributed versions of the CCG algo-

rithm and of the MCG algorithm for distributed estimation and spectrum estimation using

wireless sensor networks. The design of preconditioners for CG algorithms, which have

the ability to improve the performance of the proposed CG algorithms is also present-

ed in this chapter. These algorithms can be applied to civilian and defence applications,




such as parameter estimation in wireless sensor networks, biomedical engineering, cel-

lular networks, battlefield information identification, movement estimation and detection

and distributed spectrum estimation.

In summary, the main contributions of this chapter are:

• We devise distributed CCG and MCG algorithms with incremental and diffusion

adaptive algorithms to perform distributed estimation tasks.

• The design of preconditioners for CG algorithms, which have the ability to improve

the performance of the proposed CG algorithms is presented.

• A simulation study of the proposed and existing distributed estimation algorithms

is conducted with applications to distributed parameter estimation and spectrum

estimation.

3.2 System Models

In this section, we describe the system models of two applications of distributed signal

processing, namely, parameter estimation and spectrum estimation. In these applications,

we consider a wireless sensor network which employs distributed signal processing tech-

niques to perform the desired tasks. We consider a set of N nodes, which have limited

processing capabilities, distributed over a given geographical area. The nodes are con-

nected and form a network, which is assumed to be partially connected because nodes

can exchange information only with neighbors determined by the connectivity topology.

We call a network with this property a partially connected network whereas a fully con-

nected network means that data broadcast by a node can be captured by all other nodes in

the network in one hop [19].




3.2.1 Distributed Parameter Estimation

For distributed parameter estimation, we focus on the processing of an adaptive algorithm

for estimating an unknown vector ωo with size M × 1. The desired signal of each node

at time instant i is

dk(i) = ωH0 xk(i) + nk(i), i = 1, 2, . . . , I, (3.1)

where xk(i) is the M × 1 input signal vector, nk(i) is the Gaussian noise at each node

with zero mean and variance σ2n,k. At the same time, the output of the adaptive algorithm

for each node is given by

yk(i) = ωHk (i)xk(i), i = 1, 2, . . . , I, (3.2)

where ωk(i) is the local estimate of ω0 for each node at time instant i.

To compute the optimum solution of the unknown vector, we need to solve a problem

expressed in the form of a minimization of the cost function in the distributed form for

each node k:

Jωk(i)

(ωk(i)

)= E


∣∣2 (3.3)


Jω(ω)=

N∑k=1


∣∣2. (3.4)

The optimum solution for the cost function (3.3) is the Wiener solution which is given

by

ωk(i) = R−1k (i)bk(i). (3.5)

where the M × M autocorrelation matrix is given by Rk(i) = E[xk(i)xHk (i)] and

bk(i) = E[xk(i)d∗k(i)] is an M × 1 cross–correlation matrix. In this chapter, we focus

on incremental and diffusion CG–based algorithms to solve the equation and perform

parameter estimation and spectrum estimation in a distributed fashion.




3.2.2 Distributed Spectrum Estimation

In distrusted spectrum estimation, we aim to estimate the spectrum of a transmitted signal

s with N nodes using a wireless sensor network. Let Φs(f) denote the power spectral

density (PSD) of the signal s. The PSD can be represented as a linear combination of

some B basis functions, as described by

Φs(f) =B∑

m=1

bm(f)ω0m = bT0 (f)ω0, (3.6)

where b0(f) = [b1(f), ..., bB(f)]T is the vector of basis functions evaluated at frequency

f , ω0 = [ω01, ..., ω0B] is a vector of weighting coefficients representing the power that

transmits the signal s over each basis, and B is the number of basis functions. For B suffi-

ciently large, the basis expansion in (3.6) can well approximate the transmitted spectrum.

Possible choices for the set of basis bm(f)Bm=1 include [22–24]: rectangular functions,

raised cosines, Gaussian bells and Splines.

Let Hk(f, i) be the channel transfer function between a transmit node conveying the

signal s and receive node k at time instant i, the PSD of the received signal observed by

node k can be expressed as

Ik(f, i) = |Hk(f, i)|2Φs(f) + v2n,k

=B∑

m=1

|Hk(f, i)|2bm(f)ω0m + v2n,k

= bTk,i(f)ω0 + v2n,k (3.7)

where bTk,i(f) = [|Hk(f, i)|2bm(f)]Bm=1 and v2n,k is the receiver noise power at node k.

At every time instant i, every node k observes measurements of the noisy version of

the true PSD Ik(f, i) described by (3.7) over Nc frequency samples fj = fmin : (fmax −fmin)/Nc : fmax, for j = 1, ..., Nc, according to the model:

djk(i) = bTk,i(fj)ω0 + v2n,k + nj

k(i). (3.8)

The term njk(i) denotes observation noise and have zero mean and variance σ2

n,j . The

receiver noise power v2n,k can be estimated with high accuracy in a preliminary step using,

e.g., an energy estimator over an idle band, and then subtracted from (3.8) [25,51]. Thus,




collecting measurements over Nc contiguous channels, we obtain a linear model given by

dk(i) = Bk(i)ω0 + nk(i), (3.9)

where Bk(i) = [bTk,i(fj)]Ncj=1 ∈ RNc×B, with Nc > B, and nk(i) is a zero mean random

vector with covariance matrix Rn,i. At this point, we can generate the cost function for

node k as:

Jωk(i)(ωk(i)) = E∣∣dk(i)−Bk(i)ωk(i)

∣∣2 (3.10)


Jω(ω)=

N∑k=1

E∣∣dk(i)−Bk(i)ω

∣∣2. (3.11)

3.3 Proposed Incremental Distributed CG–Based Algo-

rithms

In this section, we propose two CG–based algorithms which are are based on the CCG [8]

and MCG [52] algorithms with incremental distributed solution for distributed parameter

estimation and spectrum estimation over wireless sensor networks.

3.3.1 Incremental Distributed CG–Based Solutions

In the incremental distributed strategy, each node is only allowed to communicate with

its direct neighbor at each time instant. To describe the whole process, we define a cycle

where each node in this network could only access its immediate neighbor in this cycle

[1]. The quantity ψk(i) is defined as a local estimate of the unknown vector ω0 at time

instant i. As a result, we assume that node k has access to an estimate of ω0 at its

immediate neighbor node k − 1 which is ψk−1(i) in the defined cycle. Fig.3.1 illustrates

this processing. In the following, we introduce two kinds of incremental distributed CG–

based algorithms, which are the incremental distributed CCG (IDCCG) algorithm and the

incremental distributed MCG (IDMCG) algorithm.




Node 1

Node k-1

Node k

Node k+1Node N

ψ1(i)

ψk−1(i)

ψk(i)

ψk+1(i)

Incremental Distributed

CG–Based Algorithm

ψk−1(i) dk(i),xk(i)

ψk(i), pass to node k+1

Figure 3.1: Incremental distributed CG–based network processing

3.3.1.1 Proposed IDCCG Algorithm

Based on the main steps of CG algorithm which are described in Table. 2.2, we introduce

the main steps of the proposed IDCCG algorithm. In the IDCCG algorithm, the iteration

procedure is introduced. At the jth iteration of time instant i, the step size αjk(i) for

updating the local estimate at node k is defined as:

αjk(i) =

(gj−1k (i)

)Hgj−1k (i)(

pj−1k (i)

)HRk(i)p

j−1k (i)

, (3.12)

where pk(i) is the search direction and defined as

pjk(i) = gjk(i) + βj

k(i)pj−1k (i). (3.13)

In (3.13), the coefficient βjk(i) is calculated by the Gram–Schmidt orthogonalization pro-

cedure [7] for the conjugacy:

βjk(i) =

(gjk(i)

)Hgjk(i)(

gj−1k (i)

)Hgj−1k (i)

. (3.14)

gjk(i) is the residual, which is obtained as

gjk(i) = gj−1k (i)− αj

k(i)Rk(i)pj−1k (i). (3.15)

The initial search direction is equal to the initial residual, which is given by p0k(i) =

g0k(i) = bk(i)−Rk(i)ψ0k(i). Then, the local estimate is updated as

ψjk(i) = ψ

j−1k (i) + αj

k(i)pj−1k (i). (3.16)




Table 3.1: IDCCG Algorithm

Initialization:

ω(0) = 0

For each time instant i=1,2, . . . , I

ψ01(i) = ω(i− 1)

For each node k=1,2, . . . , N

Rk(i) = λfRk(i− 1) + xk(i)xHk (i)

bk(i) = λfbk(i− 1) + d∗k(i)xk(i)

p0k(i) = g0k(i) = bk(i)−Rk(i)ψ

0k(i)

For iterations j=1,2, . . . , J

αjk(i) =

(gj−1k (i)

)H

gj−1k (i)(

pj−1k (i)

)H

Rk(i)pj−1k (i)

ψjk(i) = ψ

j−1k (i) + αj

k(i)pj−1k (i)


k(i)Rk(i)pj−1k (i)

βjk(i) =

(gjk(i)

)H

gjk(i)(gj−1k (i)

)H

gj−1k (i)


k(i)pj−1k (i)

End

When k < N

ψ0k+1(i) = ψ

Jk (i)

End

ω(i) = ψJN(i)

End

There are two ways to compute the correlation and cross–correlation matrices which

are the ’finite sliding data window’ and the ’exponentially decaying data window’ [8]. In

this chapter, we mainly focus on the ’exponentially decaying data window’. The recur-

sions are given by:

Rk(i) = λfRk(i− 1) + xk(i)xHk (i) (3.17)

and

bk(i) = λfbk(i− 1) + d∗k(i)xk(i) (3.18)

where λf is the forgetting factor. The IDCCD algorithm is summarized in Table 3.1




3.3.1.2 Proposed IDMCG Algorithm

The idea of the IDMCG algorithm comes from the existing CCG algorithm. For the

IDMCG solution, a recursive formulation for the residual vector is employed, which can

be found by using (3.12), (3.17) and (3.18) [8, 50], resulting in

gk(i) = bk(i)−Rk(i)ψk(i)

= λfgk(i− 1)− αk(i)Rk(i)pk(i− 1) + xk(i)[dk(i)−ψHk−1(i)xk(i)].

(3.19)

Premultiplying (3.19) by pHk (i− 1) gives

pHk (i− 1)gk(i) = λfpHk (i− 1)gk(i− 1)− αk(i)p

Hk (i− 1)Rk(i)pk(i− 1)

+ pHk (i− 1)xk(i)[dk(i)−ψHk−1(i)xk(i)].

(3.20)

Taking the expectation of both sides and considering pk(i − 1) uncorrelated with xk(i),

dk(i) and ψk−1(i) yields

E[pHk (i− 1)gk(i)] ≈ λfE[pk(i− 1)Hgk−1(i)]− E[αk(i)]E[pHk (i− 1)Rk(i)pk(i− 1)]

+ E[pHk (i− 1)]E[xk(i)[dk(i)− ωH

k−1(i)xk(i)]].

(3.21)

Assuming that the algorithm converges, the last term of (3.21) could be neglected and we

obtain:

E[αk(i)] =E[pHk (i− 1)gk(i)]− λfE[pHk (i− 1)gk(i− 1)]

E[pHk (i− 1)Rk(i)pk(i− 1)](3.22)

and

(λf − 0.5)E[pHk (i− 1)gk(i− 1)]

E[pHk (i− 1)Rk(i)pk(i− 1)]≤ E[αk(i)] ≤

E[pHk (i− 1)gk(i− 1)]

E[pHk (i− 1)Rk(i)pk(i− 1)]

(3.23)

The inequalities in (3.23) are satisfied if we define [8]:

αk(i) = ηpHk (i− 1)gk(i− 1)

pHk (i− 1)Rk(i)pk(i− 1), (3.24)

where (λf − 0.5) ≤ η ≤ λf . The direction vector pk(i) for the IDMCG algorithm is

defined by

pk(i) = gk(i) + βk(i)pk(i− 1). (3.25)

For the IDMCG algorithm, for the computation of βk(i), the Polak–Ribiere method [8],

which is given by

βk(i) =

(gk(i)− gHk (i− 1)

)gk(i)

gHk (i− 1)gk(i− 1)(3.26)




Table 3.2: IDMCG Algorithm

Initialization:

ω(0) = 0


bk(1) = d∗k(1)xk(1)

pk(0) = gk(0) = bk(1)

End


ψ0(i) = ω(i− 1)



αk(i) = ηpHk (i−1)gk(i−1)

pHk (i−1)Rk(i)pk(i−1)

where (λf − 0.5) ≤ η ≤ λf

ψk(i) = ψk−1(i) + αk(i)pk(i− 1)

gk(i) = λfgk(i− 1)− αk(i)Rk(i)pk(i− 1) + xk(i)[dk(i)−ψHk−1(i)xk(i)]

βk(i) =

(gk(i)−gHk (i−1)

)gk(i)

gHk (i−1)gk(i−1)

pk(i) = gk(i) + βk(i)pk(i− 1)

End

ω(i) = ψN(i)

End

should be used for improved performance, according to [53, 54].

In the comparison of the IDCCG algorithm with the IDMCG algorithm, the difference

between these two strategies is that IDCCG needs to run J iterations while IDMCG only

needs one iteration. The details of the IDMCG solution is shown in Table 3.2.

3.3.2 Computational Complexity

To analyze the proposed incremental distributed CG algorithms, we detail the computa-

tional complexity in terms of arithmetic operations. Additions and multiplications are

used to measure the complexity and are listed in Table 3.3. The parameter M is the length




Table 3.3: Computational Complexity of Different Incremental Algorithms

Algorithm Additions Multiplications

IDCCG M2 +M 2M2 + 2M

+J(M2 + 6M − 4) J(M2 + 7M + 3)

IDMCG 2M2 + 10M − 4 3M2 + 12M + 3

Incremental LMS [1] 4M − 1 3M + 1

Incremental RLS [1] 4M2 + 12M + 1 4M2 + 12M − 1

of the unknown vector ω0 that needs to be estimated. It is obvious that the complexity

of the IDCCG solution depends on the number of iterations J and an advantage of the

IDMCG algorithm is that it only requires one iteration per time instant.

3.4 Proposed Diffusion Distributed CG–Based Algo-

rithms

In this section, we detail the proposed diffusion distributed CCG (DDCCG) and diffusion

distributed MCG (DDMCG) algorithms for distributed parameter estimation and spec-

trum estimation using wireless sensor networks.

3.4.1 Diffusion Distributed CG–Based Algorithms

In the derivation of diffusion distributed CG–based strategy, we consider a network struc-

ture where each node from the same neighborhood could exchange information with each

other at every time instant. For each node in the network, the CTA scheme [17] is em-

ployed. Each node can collect information from all its neighbors and itself, and then

convey all the information to its local adaptive algorithm and update the estimate of the

weight vector through our algorithms. Specifically, at any time instant i, we define that

node k has access to a set of estimates ωl(i − 1)l∈Nkfrom its neighbors, where Nk

denotes the set of neighbor nodes of node k including node k itself. Then, these local




estimates are combined at node k as

ψk(i) =∑l∈Nk

cklωl(i− 1) (3.27)

where ckl is calculated through the Metropolis rule in (2.16) due to its simplicity and good

performance [55]. For the proposed diffusion distributed CG–based algorithms, the whole

processing is shown in Fig. 3.2.

Node 1

Node k-1

Node k

Node k+1

Node N

ω1(i− 1)

ωk−1(i− 1)

ωk+1(i− 1)

Diffusion Distributed

CG–Based Algorithm

ψk(i) dk(i),xk(i)

ωk(i + 1)

Figure 3.2: Diffusion Distributed CG–Based Network Processing

3.4.1.1 Proposed DDCCG Algorithm

For the DDCCG algorithm, (3.27) is employed to combine the estimates ωl(i− 1), l ∈Nk from node k’s neighbor nodes and then the estimate at node k is updated as follows:

ωjk(i) = ω

j−1k (i) + αj

k(i)pj−1k (i), (3.28)

where ω0k(i) = ψk(i). The rest of the derivation is similar to the IDCCG solution and the

pseudo–code is detailed in Table 3.4.

3.4.1.2 Proposed DDMCG Algorithm

For the DDMCG algorithm, the iteration j is removed and the estimate at node k is up-

dated as:

ωk(i) = ψk(i) + αk(i)pk(i), (3.29)

The complete DDMCG solution is described in Table 3.5.




Table 3.4: DDCCG Algorithm

Initialization:

ωk(0) = 0, k=1,2, . . . , N


For each node k=1,2, . . . , N (Combination Step)

ψk(i) =∑

l∈Nkcklωl(i− 1)

End

For each node k=1,2, . . . , N (Adaptation Step)



ω0k(i) = ψk(i)

p0k(i) = g0k(i) = bk(i)−Rk(i)ω

0k(i)

For iterations j=1,2, . . . , J

αjk(i) =

(gj−1k (i)

)H

gj−1k (i)(

pj−1k (i)

)H

Rk(i)pj−1k (i)

ωjk(i) = ω

j−1k (i) + αj

k(i)pj−1k (i)


k(i)Rk(i)pj−1k (i)

βjk(i) =

(gjk(i)

)H

gjk(i)(gj−1k (i)

)H

gj−1k (i)


k(i)pj−1k (i)

End

ωk(i) = ωJk (i)

End

End




Table 3.5: DDMCG Algorithm

Initialization:

ωk(0) = 0, k=1,2, . . . , N


bk(1) = d∗k(1)xk(1)

pk(0) = gk(0) = bk(1)

End


For each node k=1,2, . . . , N (Combination Step)

ψk(i) =∑

l∈Nkcklωl(i− 1)

End

For each node k=1,2, . . . , N (Adaptation Step)



αk(i) = ηpHk (i−1)gk(i−1)

pHk (i−1)Rk(i)pk(i−1)

where (λf − 0.5) ≤ η ≤ λf

ωk(i) = ψk(i) + αk(i)pk(i− 1)

gk(i) = λfgk(i− 1)− αk(i)Rk(i)pk(i− 1) + xk(i)[dk(i)−ψHk−1(i)xk(i)]

βk(i) =

(gk(i)−gHk (i−1)

)gk(i)

gHk (i−1)gk(i−1)

pk(i) = gk(i) + βk(i)pk(i− 1)

End

End




Table 3.6: Computational Complexity Of Different Diffusion Algorithms

Algorithm Additions Multiplications

DDCCG M2 +M 2M2 + 2M

+J(M2 + 6M +J(M2 + 7M

+|Nk|M − 4) +|Nk|M + 3)

DDMCG 2M2 + 10M − 4 3M2 + 12M + 3

+|Nk|M +|Nk|M

Diffusion LMS [17] 4M − 1 + |Nk|M 3M + 1 + |Nk|M

Diffusion RLS [20] 4M2 + 16M + 1 + |Nk|M 4M2 + 12M − 1 + |Nk|M


The computational complexity is used to analyse the proposed diffusion distributed CG–

based algorithms where additions and multiplications are measured. The details are listed

in Table 3.6. Similarly to the incremental distributed CG–based algorithms, it is clear

that the complexity of the DDCCG solution depends on the iteration number J and both

DDCCG and DDMCG solutions depend on the number of neighbor nodes |Nk| of node

k. The parameter M is the length of the unknown vector ω0 that needs to be estimated.

3.5 Preconditioner Design

Preconditioning is an important technique which can be used to improve the performance

of CG algorithms [56–59]. The idea behind preconditioning is to employ the CG al-

gorithms on an equivalent system or in a transform–domain. Thus, instead of solving

Rω = b we solve a related problem Rω = b, which is modified with the aim of ob-

taining better convergence and steady state performance. The relationships between these

two equations are given by

R = TRTH , (3.30)

ω = Tω (3.31)

and

b = Tb, (3.32)




where the M × M matrix T is called a preconditioner. We design the matrix T as an

arbitrary unitary matrix of size M ×M and has the following property [10]

TTH = THT = I. (3.33)

Two kinds of unitary transformations are considered to build the preconditioner T ,

which are discrete Fourier transform (DFT) and discrete cosine transform (DCT) [10].

The motivation behind employing these two matrix is they have useful de–correlation

properties and often reduce the eigenvalue spread of the auto–correlation matrix of the

input signal [10].

For the DFT scheme, we employ the following expression

[TDFT ]vm , 1√M

e−j2πmv

M , v,m = 0, 1, 2, . . . ,M − 1, (3.34)

where v indicates the row index and m the column index. M is the length of the unknown

parameter ω0. The matrix form of TDFT is illustrated as

TDFT =1√M

1 1 1 · · · 1

1 e−j2πM e−

j4πM · · · e−

j2(M−1)πM

1 e−j4πM e−

j8πM · · · e−

j4(M−1)πM

......

... . . . ...

1 e−j2(M−1)π

M e−j4(M−1)π

M · · · e−j2(M−1)2π

M

(3.35)

For the DCT scheme, the preconditioner T is defined as

[TDCT ]vm , δ(v) cos

(v(2m+ 1)π

2M

), v,m = 0, 1, 2, . . . ,M − 1, (3.36)

where

δ(0) =1√M

and δ(v) =

√2

Mfor v = 0 (3.37)

and the matrix form of TDCT is illustrated as

TDCT =1√M

1 1 1 · · · 1

1√2 cos( 3π

2M)

√2 cos( 5π

2M) · · ·

√2 cos( (2M−1)π

2M)

1√2 cos( 6π

2M)

√2 cos(10π

2M) · · ·

√2 cos(2(2M−1)π

2M)

......

... . . . ...

1√2 cos(3(M−1)π

2M)

√2 cos(5(M−1)π

2M) · · ·

√2 cos( (2M−1)(M−1)π

2M)

(3.38)




Then, for the DCT scheme, we choose T = THDCT . It should be noticed that the scal-

ing factor 1√M

is added in the expression for the TDFT in order to result in a unitary

transformation since then TDFT satisfies TDFTTHDFT = TH

DFTTDFT = I [10].

The optimal selection of the preconditioner is the Kahunen–Loeve transform (KLT)

[10]. However, using KLT is not practical since it requires knowledge of the auto–

correlation matrix R of the input signal and this information is generally lacking in im-

plementations.

3.6 Simulation Results

In this section, we investigate the performance of the proposed incremental and diffusion

distributed CG–based algorithms in two scenarios: distributed estimation and distributed

spectrum estimation in wireless sensor networks.

3.6.1 Distributed Estimation in Wireless Sensor Networks

In this subsection, we compare the proposed incremental and diffusion distributed CG–

based algorithms with LMS [1, 17] and RLS [1, 20] algorithms, based on the MSE and

MSD performance metrics. For each comparison, the number of time instants is set to

1000, and we assume there are 20 nodes in the network. The length of the unknown

parameter ω0 is 10, the variance for the input signal and the noise are 1 and 0.001, respec-

tively. In addition, the noise samples are modeled as circular Gaussian noise with zero

mean.

3.6.1.1 Performance of Proposed Incremental Distributed CG–Based Algorithms

First, we define the parameters of the performance test for each algorithm and the network.

The step size µ for the LMS algorithm [1] is set to 0.2, the forgetting factor λ for the

RLS [1] algorithm is set to 0.998. The λf for IDCCG and IDMCG are both set to 0.998.




For IDMCG, the ηf is equal to 0.55. The iteration number J for IDCCG is set to 5. We

choose the DCT matrix as the preconditioner.

0 100 200 300 400 500 600 700 800 900 1000−45

−40

−35

−30

−25

−20

−15

−10

−5

0

Time instant

MS

D(d

B)

Incremental LMS[1]Incremental RLS[1]IDCCGIDMCGPreconditioned IDCCGPreconditioned IDCCG

Figure 3.3: MSD performance comparison for the incremental distributed strategies

0 100 200 300 400 500 600 700 800 900 1000−30

−25

−20

−15

−10

−5

0

5

Time instant

MS

E(d

B)

Incremental LMS[1]Incremental RLS[1]IDCCGIDMCGPreconditioned IDCCGPreconditioned IDCCG

Figure 3.4: MSE performance comparison for the incremental distributed strategies




The MSD and MSE performances of each algorithm have been shown in Fig. 3.3

and 3.4 respectively. We can verify that, the IDMCG and IDCCG algorithm performs

better than incremental LMS, while IDMCG is close to the RLS algorithm. With the

preconditioning strategy, the performance of the IDCCG and IDMCG is further improved.

The reason why the proposed IDMCG algorithm has a better performance than IDCCG

is because IDMCG employs the negative gradient vector gk with a recursive expression

and the βk is computed using the Polak–Ribiere approach, which results in more accurate

estimates. Comparing with the IDCCG algorithm, the IDMCG is a non–reset and low

complexity algorithm with one iteration per time instant. Since the frequency which the

algorithm resets influences the performance, the IDMCG algorithm introduces the non–

reset method together with the Polak– Ribiere approach which are used to improve the

performance [8].

3.6.1.2 Performance of Proposed Diffusion Distributed CG–Based Algorithms

The parameters of the performance test for each algorithm and the network are defined

as follows: the step size µ for the LMS [17] algorithm is set to 0.2, the forgetting factor

λ for the RLS [20] algorithm is set to 0.998. The λf for DDCCG and DDMCG are both

0.998. The ηf is equal to 0.45 for DDMCG. The iteration number J for DDCCG is set to

5. We choose the DCT matrix as the preconditioner.

For the diffusion strategy, the combine coefficients ckl are calculated following the

Metropolis rule. Fig. 3.5 shows the network structure. The results are illustrated in Fig.

3.6 and 3.7. We can see that, the proposed DDMCG and DDCCG still have a better

performance than the LMS algorithm and DDMCG is closer to the RLS’s performance.

The performance of the DDCCG and DDMCG can still benefit from the preconditioning

strategy.

3.6.2 Distributed Spectrum Estimation

In this simulation, we consider a network composed of N = 20 nodes estimating the

unknown spectrum ω0, as illustrated in Fig. 3.5. The nodes scan Nc = 100 frequencies




0 2 4 6 8 10 12 14 16 18 200

2

4

6

8

10

12

14

16

18

20

12

3

4

5

6

7

8

9

10

1112

13

14

15

16

17

18

19

20

Figure 3.5: Network structure

0 100 200 300 400 500 600 700 800 900 1000−50

−40

−30

−20

−10

0

10

Time instant

MS

D(d

B)

Diffusion LMS[17]Diffusion RLS[20]DDCCGDDMCGPreconditioned DDCCGPreconditioned DDMCG

Figure 3.6: MSD performance comparison for the diffusion distributed strategies




0 100 200 300 400 500 600 700 800 900 1000−30

−25

−20

−15

−10

−5

0

5

Time instant

MS

E(d

B)

Diffusion LMS[17]Diffusion RLS[20]DDCCGDDMCGPreconditioned DDCCGPreconditioned DDMCG

Figure 3.7: MSE performance comparison for the diffusion distributed strategies

over the frequency axis, which is normalized between 0 and 1, and use B = 50 non–

overlapping rectangular basis functions to model the expansion of the spectrum [25]. The

basis functions have amplitude equal to one. We assume that the unknown spectrum ω0 is

transmitted over 8 basis functions, thus leading to a sparsity ratio equal to 8/50. The power

transmitted over each basis function is set equal to 1. The variance for the observation

noise is 0.01.

For distributed estimation, we employ the DDMCG and the DDCCG algorithms, to-

gether with the preconditioned DDMCG algorithm to solve the cost function (3.11) re-

spectively. The λf for DDCCG and DDMCG are both 0.99. The ηf is equal to 0.3 for

DDMCG. The iteration number J for DDCCG is set to 5. The DCT matrix is employed

as the preconditioner. We compare the proposed DDCCG and DDMCG algorithms with

the sparse ATC diffusion algorithm [25], diffusion LMS algorithm [17] and diffusion RLS

algorithm [20]. The step–sizes for the sparse ATC diffusion algorithm and diffusion LMS

algorithm are set equal to 0.2, while for the sparse ATC diffusion algorithm, γ is set to

2.2 × 10−3 and β is set to 50. The forgetting factor λ for the diffusion RLS algorithm is

set to 0.998.




0 20 40 60 80 100 120 140 160 180−25

−20

−15

−10

−5

0

5

10

Time instant

MS

D (

dB)

DDMCGPreconditioned DDMCGDiffusion LMS[17]Sparse ATC diffusion[25]DDCCGDiffusion RLS[20]

Figure 3.8: Performance comparison for the distributed spectrum estimation

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−30

−25

−20

−15

−10

−5

0

Frequency Axis

PS

D (

dB

)

Diffusion LMS[17]Sparse ATCDiffusion[25]DDMCGTrue spectrum

Figure 3.9: Example of distributed spectrum estimation




We illustrate the result of distributed spectrum estimation carried out by different al-

gorithms in the term of the MSD comparison in Fig. 3.8. We also select the sparse ATC

diffusion algorithm [25], diffusion LMS algorithm [17] and DDMCG to compare their

performance in term of PSD in Fig. 3.9. The true transmitted spectrum is also reported in

Fig. 3.9.

From Fig. 3.8, the DDMCG still performs better than other algorithms and is close to

the diffusion RLS algorithm. From Fig. 3.9, we can notice that all the algorithms are able

to identify the spectrum, but it is also clear that the DDMCG algorithm is able to strongly

reduce the effect of the spurious terms.

3.7 Summary

In this chapter, we have proposed distributed CG algorithms for both incremental and dif-

fusion adaptive strategies. We have investigated the proposed algorithms in distributed es-

timation for wireless sensor networks and distributed spectrum estimation. The CG–based

strategies has low computational complexity when compared with the RLS algorithm and

have a faster convergence than the LMS algorithm. The preconditioning strategy is also

introduced to further improve the performance of the proposed algorithms. Simulation re-

sults have proved the advantages of the proposed IDCCG/IDMCG and DDCCG/DDMCG

algorithms in different applications.


Chapter 4

Adaptive Link Selection Algorithms for

Distributed Diffusion Estimation


4.2 System Model and Problem Statement . . . . . . . . . . . . . . . . . 63

4.3 Proposed Adaptive Link Selection Algorithms . . . . . . . . . . . . 65

4.4 Analysis of the proposed algorithms . . . . . . . . . . . . . . . . . . 73

4.5 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4.1 Introduction

Distributed signal processing algorithms have become a key approach for statistical in-

ference in wireless networks and applications such as wireless sensor networks and smart

grids [1–4]. It is well known that distributed processing techniques deal with the extrac-

tion of information from data collected at nodes that are distributed over a geographic

area [1]. In this context, for each specific node, a set of neighbor nodes collect their local

information and transmit the estimates to a specific node. Then, each specific node com-

bines the collected information together with its local estimate to generate an improved


60

2015

CHAPTER 4. ADAPTIVE LINK SELECTION ALGORITHMS FOR DISTRIBUTED DIFFUSION

ESTIMATION 61

estimate.

4.1.1 Prior and Related Work

Several works in the literature have proposed strategies for distributed processing which

include incremental [1, 60–62], diffusion [2, 17], sparsity–aware [3, 63] and consensus–

based strategies [4]. With the incremental strategy, the processing follows a Hamiltonian

cycle, i.e., the information flows through these nodes in one direction, which means each

node passes the information to its adjacent node in a uniform direction. However, in order

to determine a cyclic path that covers all nodes, this method needs to solve an NP–hard

problem. In addition, when any of the nodes fails, data communication through the cycle

is interrupted and the distributed processing breaks down [1].

In distributed diffusion strategies [2,63], the neighbors for each node are fixed and the

combining coefficients are calculated after the network topology is deployed and starts

its operation. One disadvantage of this approach is that the estimation procedure may be

affected by poorly performing links. More specifically, the fixed neighbors and the pre–

calculated combining coefficients may not provide an optimized estimation performance

for each specified node because there are links that are more severely affected by noise or

fading. Moreover, when the number of neighbor nodes is large, each node requires a large

bandwidth and transmit power. Prior work on topology design and adjustment techniques

includes the studies in [64, 65] and [66], which are not dynamic in the sense that they

cannot track changes in the network and mitigate the effects of poor links.

4.1.2 Contributions

The adaptive link selection algorithms for distributed estimation problems are proposed

and studied in this chapter. Specifically, we develop adaptive link selection algorithms

that can exploit the knowledge of poor links by selecting a subset of data from neigh-

bor nodes. The first approach consists of exhaustive search–based LMS/RLS link selec-

tion (ES–LMS/ES–RLS) algorithms, whereas the second technique is based on sparsity–

inspired LMS/RLS link selection (SI–LMS/SI–RLS) algorithms. With both approaches,



ESTIMATION 62

distributed processing can be divided into two steps. The first step is called the adapta-

tion step, in which each node employs LMS or RLS to perform the adaptation through its

local information. Following the adaptation step, each node will combine its collected es-

timates from its neighbors and local estimate, through the proposed adaptive link selection

algorithms. The proposed algorithms result in improved estimation performance in terms

of the mean–square error (MSE) associated with the estimates. In contrast to previously

reported techniques, a key feature of the proposed algorithms is that the combination step

involves only a subset of the data associated with the best performing links.

In the ES–LMS and ES–RLS algorithms, we consider all possible combinations for

each node with its neighbors and choose the combination associated with the smallest

MSE value. In the SI–LMS and SI–RLS algorithms, we incorporate a reweighted zero

attraction (RZA) strategy into the adaptive link selection algorithms. The RZA approach

is often employed in applications dealing with sparse systems in such a way that it shrinks

the small values in the parameter vector to zero, which results in better convergence and

steady–state performance. Unlike prior work with sparsity–aware algorithms [3, 13, 67,

68], the proposed SI–LMS and SI–RLS algorithms exploit the possible sparsity of the

MSE values associated with each of the links in a different way. In contrast to existing

methods that shrink the signal samples to zero, SI–LMS and SI–RLS shrink to zero the

links that have poor performance or high MSE values. By using the SI–LMS and SI–

RLS algorithms, data associated with unsatisfactory performance will be discarded, which

means the effective network topology used in the estimation procedure will change as

well. Although the physical topology is not changed by the proposed algorithms, the

choice of the data coming from the neighbor nodes for each node is dynamic, leads to

the change of combination weights and results in improved performance. We also remark

that the topology could be altered with the aid of the proposed algorithms and a feedback

channel which could inform the nodes whether they should be switched off or not. The

proposed algorithms are considered for wireless sensor networks and also as a tool for

distributed state estimation that could be used in smart grids.

In summary, the main contributions of this chapter are:

• We present adaptive link selection algorithms for distributed estimation that are able

to achieve significantly better performance than existing algorithms.



ESTIMATION 63

• We devise distributed LMS and RLS algorithms with link selection capabilities to

perform distributed estimation.

• We analyze the MSE convergence and tracking performance of the proposed algo-

rithms and their computational complexities and we derive analytical formulas to

predict their MSE performance.

• A simulation study of the proposed and existing distributed estimation algorithms

is conducted along with applications in wireless sensor networks and smart grids.

4.2 System Model and Problem Statement

k

Nk

Figure 4.1: Network topology with N nodes

We consider a set of N nodes, which have limited processing capabilities, distributed

over a given geographical area as depicted in Fig. 4.1. The nodes are connected and

form a network, which is assumed to be partially connected because nodes can exchange

information only with neighbors determined by the connectivity topology. We call a net-

work with this property a partially connected network whereas a fully connected network

means that data broadcast by a node can be captured by all other nodes in the network

in one hop [19]. We can think of this network as a wireless network, but our analysis

also applies to wired networks such as power grids. In our work, in order to perform link

selection strategies, we assume that each node has at least two neighbors.



ESTIMATION 64

The aim of the network is to estimate an unknown parameter vector ω0, which has

length M . At every time instant i, each node k takes a scalar measurement dk(i) according

to

dk(i) = ωH0 xk(i) + nk(i), i = 1, 2, . . . , I, (4.1)

where xk(i) is the M × 1 random regression input signal vector and nk(i) denotes the

Gaussian noise at each node with zero mean and variance σ2n,k. This linear model is able

to capture or approximate well many input-output relations for estimation purposes [9]

and we assume I > M . To compute an estimate of ω0 in a distributed fashion, we need

each node to minimize the MSE cost function [2]

Jωk(i)

(ωk(i)

)= E


∣∣2, (4.2)

where E denotes expectation and ωk(i) is the estimated vector generated by node k at

time instant i. Equation (4.3) is also the definition of the MSE and the global network

cost function could be described as

Jω(ω) =N∑k=1

E|dk(i)− ωHxk(i)|2. (4.3)

To solve this problem, diffusion strategies have been proposed in [2, 17] and [16].

In these strategies, the estimate for each node is generated through a fixed combination

strategy given by

ωk(i) =∑l∈Nk

cklψl(i), (4.4)

where Nk denotes the set of neighbors of node k including node k itself, ckl ≥ 0 is the

combining coefficient and ψl(i) is the local estimate generated by node l through its local

information.

There are many ways to calculate the combining coefficient ckl which include the

Hastings [55], the Metropolis [29], the Laplacian [30] and the nearest neighbor [31] rules.

In this work, due to its simplicity and good performance we adopt the Metropolis rule [29]

given by (2.16).

The set of coefficients ckl should satisfy [2]∑l∈Nk ∀k

ckl = 1. (4.5)



ESTIMATION 65

For the combination strategy mentioned in (4.4), the choice of neighbors for each node

is fixed, which results in some problems and limitations, namely:

• Some nodes may face high levels of noise or interference, which may lead to inac-

curate estimates.

• When the number of neighbors for each node is high, large communication band-

width and high transmit power are required.

• Some nodes may shut down or collapse due to network problems. As a result, local

estimates to their neighbors may be affected.

Under such circumstances, a performance degradation is likely to occur when the network

cannot discard the contribution of poorly performing links and their associated data in the

estimation procedure. In the next section, the proposed adaptive link selection algorithms

are presented, which equip a network with the ability to improve the estimation procedure.

In the proposed scheme, each node is able to dynamically select the data coming from its

neighbors in order to optimize the performance of distributed estimation techniques.

4.3 Proposed Adaptive Link Selection Algorithms

In this section, we present the proposed adaptive link selection algorithms. The goal of

the proposed algorithms is to optimize the distributed estimation and improve the perfor-

mance of the network by dynamically changing the topology. These algorithmic strategies

give the nodes the ability to choose their neighbors based on their MSE performance. We

develop two categories of adaptive link selection algorithms; the first one is based on an

exhaustive search, while the second is based on a sparsity–inspired relaxation. The details

will be illustrated in the following subsections.



ESTIMATION 66

4.3.1 Exhaustive Search–Based LMS/RLS Link Selection

The proposed ES–LMS and ES–RLS algorithms employ an exhaustive search to select

the links that yield the best performance in terms of MSE. First, we describe how we

define the adaptation step for these two strategies. In the ES–LMS algorithm, we employ

the adaptation strategy given by

ψk(i) = ωk(i− 1) + µkxk(i)[dk(i)− ωH

k (i− 1)xk(i)]∗, (4.6)

where µk is the step size for each node. In the ES–RLS algorithm, we employ the follow-

ing steps for the adaptation:

Φ−1(i) = λ−1Φ−1(i− 1)

−λ−2Φ−1(i− 1)x(i)xH(i)Φ−1(i− 1)

1 + λ−1xH(i)Φ−1(i− 1)x(i), (4.7)

where λ is the forgetting factor. Then, we let

P (i) = Φ−1(i) (4.8)

and

k(i) =λ−1P (i)x(i)

1 + λ−1xH(i)P (i)x(i). (4.9)

ψk(i) = ωk(i− 1) + k(i)[dk(i)− ωH

k (i− 1)xk(i)]∗, (4.10)

P (i+ 1) = λ−1P (i)− λ−1k(i)xH(i)P (i). (4.11)

Following the adaptation step, we introduce the combination step for both ES–LMS and

ES–RLS algorithms, based on an exhaustive search strategy. At first, we introduce a

tentative set Ωk using a combinatorial approach described by

Ωk ∈ 2Nk\∅, (4.12)

where the set Ωk is a nonempty set with 2Nk elements. After the tentative set Ωk is defined,

we write the cost function (4.3) for each node as

Jψ(i)

(ψ(i)

), E

∣∣dk(i)−ψH(i)xk(i)∣∣2, (4.13)

where

ψ(i) ,∑l∈Ωk

ckl(i)ψl(i) (4.14)



ESTIMATION 67

is the local estimator and ψl(i) is calculated through (4.6) or (4.10), depending on the

algorithm, i.e., ES–LMS or ES–RLS. With different choices of the set Ωk, the combining

coefficients ckl will be re–calculated through (2.16), to ensure condition (4.5) is satisfied.

Then, we introduce the error pattern for each node, which is defined as

eΩk(i) , dk(i)−

[∑l∈Ωk

ckl(i)ψl(i)

]Hxk(i). (4.15)

For each node k, the strategy that finds the best set Ωk(i) must solve the following opti-

mization problem:

Ωk(i) = arg minΩk∈2Nk\∅

|eΩk(i)|. (4.16)

After all steps have been completed, the combination step in (4.4) is performed as de-

scribed by

ωk(i) =∑

l∈Ωk(i)

ckl(i)ψl(i). (4.17)

At this stage, the main steps of the ES–LMS and ES–RLS algorithms have been complet-

ed. The proposed ES–LMS and ES–RLS algorithms find the set Ωk(i) that minimizes the

error pattern in (4.15) and (4.16) and then use this set of nodes to obtain ωk(i) through

(4.17). The ES–LMS/ES–RLS algorithms are briefly summarized as follows:

Step 1 Each node performs the adaptation through its local information based on the

LMS or RLS algorithm.

Step 2 Each node finds the best set Ωk(i), which satisfies (4.16).

Step 3 Each node combines the information obtained from its best set of neighbors

through (4.17).

The details of the proposed ES–LMS and ES–RLS algorithms are shown in Tables 4.1

and 4.2. When the ES–LMS and ES–RLS algorithms are implemented in networks with

a large number of small and low–power sensors, the computational complexity cost may

become high, as the algorithm in (4.16) requires an exhaustive search and needs more

computations to examine all the possible sets Ωk(i) at each time instant.



ESTIMATION 68

Table 4.1: The ES-LMS Algorithm

Initialize: ωk(0)=0



ψk(i) = ωk(i− 1) + µkxk(i)[dk(i)− ωHk (i− 1)xk(i)]

∗

end


find all possible sets of Ωk

eΩk(i) = dk(i)− [

∑l∈Ωk

ckl(i)ψl(i)]Hxk(i)

Ωk(i) = argminΩk

|eΩk(i)|

ωk(i) =∑

l∈Ωk(i)

ckl(i)ψl(i)

end

end

4.3.2 Sparsity–Inspired LMS/RLS Link Selection

The ES–LMS/ES–RLS algorithms previously outlined need to examine all possible sets to

find a solution at each time instant, which might result in high computational complexity

for large networks operating in time–varying scenarios. To solve the combinatorial prob-

lem with reduced complexity, we propose sparsity-inspired based SI–LMS and SI–RLS

algorithms, which are as simple as standard diffusion LMS or RLS algorithms and are

suitable for adaptive implementations and scenarios where the parameters to be estimated

are slowly time–varying. The zero–attracting strategy (ZA), reweighted zero–attracting

strategy (RZA) and zero–forcing (ZF) are reported in [3] and [69] as for sparsity aware

techniques. These approaches are usually employed in applications dealing with sparse

systems in scenarios where they shrink the small values in the parameter vector to zero,

which results in better convergence rate and steady–state performance. Unlike existing

methods that shrink the signal samples to zero, the proposed SI–LMS and SI–RLS al-

gorithms shrink to zero the links that have poor performance or high MSE values. To

detail the novelty of the proposed sparsity–inspired LMS/RLS link selection algorithms,

we illustrate the processing in Fig.4.2.



ESTIMATION 69

Table 4.2: The ES-RLS Algorithm


Φ−1(0) = δ−1I, δ = small positive constant



Φ−1(i) = λ−1Φ−1(i− 1)

−λ−2Φ−1(i− 1)xk(i)xHk (i)Φ

−1(i− 1)

1 + λ−1xHk (i)Φ

−1(i− 1)xk(i)

P (i) = Φ−1(i)

k(i) =λ−1P (i)xk(i)

1 + λ−1xHk (i)P (i)xk(i)

ψk(i) = ωk(i− 1) + k(i)[dk(i)− ωHk (i− 1)xk(i)]

∗

P (i+ 1) = λ−1P (i)− λ−1k(i)xHk (i)P (i)

end


find all possible sets of Ωk

eΩk(i) = dk(i)− [

∑l∈Ωk

ckl(i)ψl(i)]Hxk(i)

Ωk(i) = argminΩk

|eΩk(i)|

ωk(i) =∑

l∈Ωk(i)

ckl(i)ψl(i)

end

end



ESTIMATION 70

MSE Value

Sparsity Aware

Technique

MSE Value

Nodesa ) Sparsity Aware Technique

MSE Value

Nodes

SILMS/SIRLS

Algorithms

MSE Value

Nodesb ) SILLS and SIRLS Algorithms

Nodes

Figure 4.2: Sparsity aware signal processing strategies

Fig. 4.2 (a) shows a standard type of sparsity–aware processing. We can see that, after

being processed by a sparsity–aware algorithm, the nodes with small MSE values will

be shrunk to zero. In contrast, the proposed SI–LMS and SI–RLS algorithms will keep

the nodes with lower MSE values and shrink the nodes with large MSE values to zero as

illustrated in Fig. 4.2 (b). In the following, we will show how the proposed SI–LMS/SI–

RLS algorithms are employed to realize the link selection strategy automatically.

In the adaptation step, we follow the same procedure in (4.6)–(4.10) as that of the

ES–LMS and ES–RLS algorithms for the SI–LMS and SI–RLS algorithms, respective-

ly. Then we reformulate the combination step. First, we introduce the log–sum penalty

into the combination step in (4.4). Different penalty terms have been considered for this

task. We have adopted a heuristic approach [3, 70] known as reweighted zero–attracting

strategy into the combination step in (4.4) because this strategy has shown an excellent

performance and is simple to implement. The regularization function with the log–sum

penalty is defined as:

f1(ek(i)) =∑l∈Nk

log(1 + ε|ekl(i)|

), (4.18)

where the error pattern ekl(i)(l ∈ Nk), which stands for the neighbor node l of node k

including node k itself, is defined as

ekl(i) , dk(i)−ψHl (i)xk(i) (4.19)

and ε is the shrinkage magnitude. Then, we introduce the vector and matrix quantities

required to describe the combination step. We first define a vector ck that contains the

combining coefficients for each neighbor of node k including node k itself as described



ESTIMATION 71

by

ck , [ckl], l ∈ Nk. (4.20)

Then, we define a matrix Ψk that includes all the estimated vectors, which are generated

after the adaptation step of SI–LMS and of SI–RLS for each neighbor of node k including

node k itself as given by

Ψk , [ψl(i)], l ∈ Nk. (4.21)

Note that the adaptation steps of SI–LMS and SI–RLS are identical to (4.6) and (4.10),

respectively. An error vector ek that contains all error values calculated through (4.19)

for each neighbor of node k including node k itself is expressed by

ek , [ekl(i)], l ∈ Nk. (4.22)

Here, we use a hat to distinguish ek defined above from the original error ek. To devise

the sparsity–inspired approach, we have modified the vector ek in the following way:

1. The element with largest absolute value |ekl(i)| in ek will be employed as |ekl(i)|.

2. The element with smallest absolute value will be set to −|ekl(i)|. This process will

ensure the node with smallest error pattern has a reward on its combining coeffi-

cient.

3. The remaining entries will be set to zero.

At this point, the combination step can be defined as [70]

ωk(i) =

|Nk|∑j=1

[ck[j]− ρ

∂f1(ek[j])

∂ek[j]

]Ψk[j], (4.23)

where ck[j], ek[j] and Ψk[j] stand for the jth element in the ck, ek and Ψk. The parameter

ρ is used to control the algorithm’s shrinkage intensity. We then calculate the partial

derivative of ek[j]:

∂f1(ek[j])

∂ek[j]=

∂(log(1 + ε|ekl(i)|)

)∂(ekl(i)

)= ε

sign(ekl(i))

1 + ε|ekl(i)|l ∈ Nk

= εsign(ek[j])

1 + ε|ek[j]|. (4.24)



ESTIMATION 72

To ensure that|Nk|∑j=1

(ck[j] − ρ∂f1(ek[j])

∂ek[j]

)= 1, we replace ek[j] with ξmin in the denom-

inator, where the parameter ξmin stands for the minimum absolute value of ekl(i) in ek.

Then, (4.24) can be rewritten as

∂f1(ek[j])

∂ek[j]= ε

sign(ek[j])

1 + ε|ξmin|. (4.25)

At this stage, the MSE cost function governs the adaptation step, while the combination

step employs the combining coefficients with the derivative of the log-sum penalty which

performs shrinkage and selects the set of estimates from the neighbor nodes with the best

performance. The function sign(a) is defined as

sign(a) =

a/|a| a = 0

0 a = 0.(4.26)

Then, by inserting (4.25) into (4.23), the proposed combination step is given by

ωk(i) =

|Nk|∑j=1

[ck[j]− ρε

sign(ek[j])

1 + ε|ξmin|

]Ψk[j]. (4.27)

Note that the condition ck[j] − ρε sign(ek[j])1+ε|ξmin| ≥ 0 is enforced in (4.27). When ck[j] −

ρε sign(ek[j])1+ε|ξmin| = 0, it means that the corresponding node has been discarded from the com-

bination step. In the following time instant, if this node still has the largest MSE, there

will be no changes in the combining coefficients for this set of nodes.

To guarantee the stability, the parameter ρ is assumed to be sufficiently small and the

penalty takes effect only on the element in ek for which the magnitude is comparable to

1/ε [3]. Moreover, there is little shrinkage exerted on the element in ek whose |ek[j]| ≪1/ε. The SI–LMS and SI–RLS algorithms perform link selection by the adjustment of the

combining coefficients through (4.27). At this point, it should be emphasized that:

• The process in (4.27) satisfies condition (4.5), as the penalty and reward amounts

are the same for the nodes with maximum and minimum error pattern, respectively.

• When computing (4.27), there are no matrix–vector multiplications. Therefore, no

additional complexity is introduced. As described in (4.23), only the jth element in

ck, ek and Ψk are used for calculation.



ESTIMATION 73

For the neighbor node with the largest MSE value, after the modifications of ek, its ekl(i)

value in ek will be a positive number which will lead to the term ρε sign(ek[j])1+ε|ξmin| in (4.27)

being positive too. This means that the combining coefficient for this node will be shrunk

and the weight for this node to build ωk(i) will be shrunk too. In other words, when a

node encounters high noise or interference levels, the corresponding MSE value might be

large. As a result, we need to reduce the contribution of this group of nodes.

In contrast, for the neighbor node with the smallest MSE, as its ekl(i) value in ek will

be a negative number, the term ρε sign(ek[j])1+ε|ξmin| in (4.27) will be negative too. As a result, the

weight for this node associated with the smallest MSE to build ωk(i) will be increased.

For the remaining neighbor nodes, the entry ekl(i) in ek is zero, which means the term

ρε sign(ek[j])1+ε|ξmin| in (4.27) is zero and there is no change for the weights to build ωk(i). The

main steps for the proposed SI–LMS and SI–RLS algorithms are listed as follows:

Step 1 Each node carries out the adaptation through its local information based on the

LMS or RLS algorithm.

Step 2 Each node calculates the error pattern through (4.19).

Step 3 Each node modifies the error vector ek.

Step 4 Each node combines the information obtained from its selected neighbors through

(4.27).

The SI–LMS and SI–RLS algorithms are detailed in Table 4.3. For the ES–LMS/ES–

RLS and SI–LMS/SI–RLS algorithms, we design different combination steps and employ

the same adaptation procedure, which means the proposed algorithms have the ability to

equip any diffusion–type wireless networks operating with other than the LMS and RLS

algorithms. This includes, for example, the diffusion conjugate gradient strategy [71].

4.4 Analysis of the proposed algorithms

In this section, a statistical analysis of the proposed algorithms is developed, including

a stability analysis and an MSE analysis of the steady–state and tracking performance.



ESTIMATION 74

Table 4.3: The SI-LMS and SI-RLS Algorithms

Initialize: ωk(−1)=0

P (0) = δ−1I, δ = small positive constant



The adaptation step for computing ψk(i)

is exactly the same as the ES-LMS and ES-RLS

for the SI-LMS and SI-RLS algorithms respectively

end


ekl(i) = dk(i)−ψHl (i)xk(i) l ∈ Nk

ck = [ckl] l ∈ Nk

Ψk = [ψl(i)] l ∈ Nk

ek = [ekl(i)] l ∈ Nk

Find the maximum and minimum absolute terms in ek

Modified ek as ek=[0· · ·0,|ekl(i)|︸︷︷︸max

,0· · ·0,−|ekl(i)|︸︷︷︸min

,0· · ·0]

ξmin = min(|ekl(i)|

)ωk(i) =

|Nk|∑j=1

[ck[j]− ρε sign(ek[j])

1+ε|ξmin|

]Ψk[j]

end

end



ESTIMATION 75

In addition, the computational complexity of the proposed algorithms is also detailed.

Before we start the analysis, we make some assumptions that are common in the literature

[9].

Assumption I: The weight-error vector εk(i) and the input signal vector xk(i) are sta-

tistically independent, and the weight–error vector for node k is defined as

εk(i) , ωk(i)− ω0, (4.28)

where ω0 denotes the optimum Wiener solution of the actual parameter vector to be es-

timated, and ωk(i) is the estimate produced by the proposed algorithms at time instant

i.

Assumption II: The input signal vector xl(i) is drawn from a stochastic process, which

is ergodic in the autocorrelation function [9].

Assumption III: The M×1 vector q(i) represents a stationary sequence of independent

zero–mean vectors and positive definite autocorrelation matrixQ = E[q(i)qH(i)], which

is independent of xk(i), nk(i) and εl(i).

Assumption IV (Independence): All regressor input signals xk(i) are spatially and

temporally independent.

4.4.1 Stability Analysis

In general, to ensure that a partially-connected network can converge to the global net-

work performance, information should be propagated across the network [21]. The work

in [64] shows that it is central to the performance that each node should be able to reach

the other nodes through one or multiple hops [21]. In this section, we discuss the stability

analysis of the proposed ES–LMS and SI–LMS algorithms.



ESTIMATION 76

First, we substitute (4.6) into (4.17) and obtain

ωk(i+ 1) =∑

l∈Ωk(i)

ckl(i)ψl(i+ 1)

=∑

l∈Ωk(i)

[ωl(i) + µlxl(i+ 1)e∗l (i+ 1)]ckl(i)

=∑

l∈Ωk(i)

[ω0 + εl(i) + µlxl(i+ 1)e∗l (i+ 1)]ckl(i)

=∑

l∈Ωk(i)

ω0ckl +∑

l∈Ωk(i)

[εl(i) + µlxl(i+ 1)e∗l (i+ 1)]ckl(i)

subject to∑l

ckl(i) = 1

= ω0 +∑

l∈Ωk(i)

[εl(i) + µlxl(i+ 1)e∗l (i+ 1)]ckl(i). (4.29)

Then, we have

εk(i+ 1) = ω0 − ω0 +∑

l∈Ωk(i)

[εl(i) + µlxl(i+ 1)e∗l (i+ 1)]ckl(i)

=∑

l∈Ωk(i)

[εl(i) + µlxl(i+ 1)e∗l (i+ 1)]ckl(i). (4.30)

By employing Assumption IV, we start with (4.30) for the ES–LMS algorithm and define

the global vectors and matrices:

ε(i+ 1) , [ε1(i+ 1), · · · , εN(i+ 1)]T (4.31)

M , diagµ1IM , ..., µNIM (4.32)

D(i+ 1) , diagx1(i+ 1)xH1 (i+ 1), ...,xN(i+ 1)xH

N(i+ 1) (4.33)

and the NM × 1 vector

g(i+ 1) = [x1(i+ 1)n1(i+ 1), · · · ,xN(i+ 1)nN(i+ 1)]T . (4.34)

We also define an N × N matrix C where the combining coefficients ckl correspond

to the l, k entries of the matrix C and the NM × NM matrix CG with a Kronecker

structure:

CG = C ⊗ IM (4.35)

where ⊗ denotes the Kronecker product.



ESTIMATION 77

By inserting el(i+1) = e0−l − εHl (i)xl(i+1) into (4.30), the global version of (4.30)

can then be written as

ε(i+ 1) = CTG

[I −MD(i+ 1)

]ε(i) +CT

GMg(i+ 1), (4.36)

where e0−l is the estimation error produced by the Wiener filter for node l as described by

e0−l = dl(i)− ωH0 xl(i). (4.37)

If we define

D , E[D(i+ 1)]

= diagR1, ...,RN(4.38)

and take the expectation of (4.36), we arrive at

Eε(i+ 1) = CTG

[I −MD

]Eε(i). (4.39)

Before we proceed, let us defineX = I −MD and introduce Lemma 1:

Lemma 1: Let CG and X denote arbitrary NM ×NM matrices, where CG has real,

non-negative entries, with columns adding up to one. Then, the matrix Y = CTGX is

stable for any choice of CG if and only ifX is stable.

Proof : Assume that X is stable, it is true that for every square matrix X and every

α > 0, there exists a submultiplicative matrix norm ||·||τ that satisfies ||X||τ ≤ τ(X)+α,

where the submultiplicative matrix norm ||AB|| ≤ ||A|| · ||B|| holds and τ(X) is the

spectral radius of X [10, 72]. Since X is stable, τ(X) < 1, and we can choose α > 0

such that τ(X) + α = v < 1 and ||X||τ ≤ v < 1. Then we obtain [17]

||Y i||τ = ||(CTGX)i||τ

≤ ||(CTG)

i||τ · ||X i||τ

≤ vi||(CTG)

i||τ .

(4.40)

Since CTG has non–negative entries with columns that add up to one, it is element–wise

bounded by unity. This means its Frobenius norm is bounded as well and by the equiva-

lence of norms, so is any norm, in particular ||(CTG)

i||τ . As a result, we have

limi→∞

||Y i||τ = 0, (4.41)



ESTIMATION 78

where Y i converges to the zero matrix for large i. This establishes the stability of

(CTGX)i.

A square matrixX is stable if it satisfiesX i → 0 as i → ∞. In view of Lemma 1 and

(82), we need the matrix I −MD to be stable. As a result, it requires I − µkRk to be

stable for all k, which holds if the following condition is satisfied:

0 < µk <2

λmax

(Rk

) (4.42)

where λmax

(Rk

)is the largest eigenvalue of the correlation matrix Rk. The difference

between the ES–LMS and SI–LMS algorithms is the strategy to calculate the matrix C.

Lemma 1 indicates that for any choice of C, only X needs to be stable. As a result, SI–

LMS has the same convergence condition as in (4.42). Given the convergence conditions,

the proposed ES–LMS/ES–RLS and SI–LMS/SI–RLS algorithms will adapt according

to the network connectivity by choosing the group of nodes with the best available per-

formance to construct their estimates. Comparing the results in (4.42) with the existing

algorithms, it can be seen that the proposed link selection techniques change the set of

combining coefficients, which are indicated in CG, as the matrix C employs the chosen

set Ωk(i).

4.4.2 MSE Steady–State Analysis

In this part of the analysis, we devise formulas to predict the excess MSE (EMSE) of the

proposed algorithms. The error signal at node k can be expressed as

ek(i) = dk(i)− ωHk (i)xk(i)

= dk(i)− [ω0 − εk(i)]Hxk(i)

= dk(i)− ωH0 xk(i) + ε

Hk (i)xk(i)

= e0−k + εHk (i)xk(i).

(4.43)



ESTIMATION 79

With Assumption I, the MSE expression can be derived as

Jmse−k(i) = E[|ek(i)|2]

= E[(e0−k + ε

Hk (i)xk(i)

)(e∗0 + x

Hk (i)εk(i)

)]= Jmin−k + E[εHk (i)xk(i)x

Hk (i)εk(i)]

= Jmin−k + trE[εk(i)εHk (i)xk(i)xHk (i)]

= Jmin−k + trE[xk(i)xHk (i)]E[εk(i)εHk (i)]

= Jmin−k + trRk(i)Kk(i), (4.44)

where tr(·) denotes the trace of a matrix and Jmin−k is the minimum mean–square error

(MMSE) for node k [9]:

Jmin−k = σ2d,k − pHk (i)R−1

k (i)pk(i), (4.45)

Rk(i) = E[xk(i)xHk (i)] is the correlation matrix of the inputs for node k, pk(i) =

E[xk(i)d∗k(i)] is the cross–correlation vector between the inputs and the measurement

dk(i), and Kk(i) = E[εk(i)εHk (i)] is the weight–error correlation matrix. From [9], the

EMSE is defined as the difference between the mean–square error at time instant i and

the minimum mean–square error. Then, we can write

Jex−k(i) = Jmse−k(i)− Jmin−k

= trRk(i)Kk(i).(4.46)

For the proposed adaptive link selection algorithms, we will derive the EMSE formulas

separately based on (4.46) and we assume that the input signal is modeled as a stationary

process.

4.4.2.1 ES–LMS

To update the estimate ωk(i), we employ

ωk(i+ 1) =∑

l∈Ωk(i)

ckl(i)ψl(i+ 1)

=∑

l∈Ωk(i)

ckl(i)[ωl(i) + µlxl(i+ 1)e∗l (i+ 1)]

=∑

l∈Ωk(i)

ckl(i)[ωl(i) + µlxl(i+ 1)(dl(i+ 1)− xHl (i+ 1)ωl(i))]. (4.47)



ESTIMATION 80

Then, subtracting ω0 from both sides of (4.47), we arrive at

εk(i+ 1) =∑

l∈Ωk(i)

ckl(i)[ωl(i) + µlxl(i+ 1)(dl(i+ 1)− xHl (i+ 1)ωl(i))]−

∑l∈Ωk(i)

ckl(i)ω0

=∑

l∈Ωk(i)

ckl(i)

[εl(i) + µlxl(i+ 1)

(dl(i+ 1)− xH

l (i+ 1)(εl(i) + ω0))]

=∑

l∈Ωk(i)

ckl(i)


(dl(i+ 1)− xH

l (i+ 1)εl(i)− xHl (i+ 1)ω0

)]

=∑

l∈Ωk(i)

ckl(i)

[εl(i)− µlxl(i+ 1)xH

l (i+ 1)εl(i) + µlxl(i+ 1)e∗0−l(i+ 1)

]

=∑

l∈Ωk(i)

ckl(i)

[(I − µlxl(i+ 1)xH

l (i+ 1))εl(i) + µlxl(i+ 1)e∗0−l(i+ 1)

].

(4.48)

Let us introduce the random variables αkl(i):

αkl(i) =

1, if l ∈ Ωk(i)

0, otherwise.(4.49)

At each time instant, each node will generate data associated with network covariance

matrices Ak with size N × N which reflect the network topology, according to the ex-

haustive search strategy. In the network covariance matricesAk, a value equal to 1 means

nodes k and l are linked and a value 0 means nodes k and l are not linked.

For example, suppose a network has 5 nodes. For node 3, there are two neighbor

nodes, namely, nodes 2 and 5. Through Eq. (4.12), the possible configurations of set

Ω3 are 3, 2, 3, 5 and 3, 2, 5. Evaluating all the possible sets for Ω3, the relevant

covariance matricesA3 with size 5× 5 at time instant i are described in Fig. 4.3.

Then, the coefficients αkl are obtained according to the covariance matrices Ak. In

this example, the three sets of αkl are respectively shown in Table 4.4.

The parameters ckl will then be calculated through Eq. (2.16) for different choices of

matrices Ak and coefficients αkl. After αkl and ckl are calculated, the error pattern for

each possible Ωk will be calculated through (4.15) and the set with the smallest error will

be selected according to (4.16).



ESTIMATION 81

1 2 3 4 5

1

2

3

4

5

1 1 100

0

1

1

0

1 2 3 4 5

1

2

3

4

5

1 1 000

0

0

1

0

1 2 3 4 5

1

2

3

4

5

0 1 100

0

1

0

0

(a)3,2 (a)3,5

(c)3,2,5

Figure 4.3: Covariance matricesA3 for different sets of Ω3

With the newly defined αkl, (4.48) can be rewritten as

εk(i+ 1) =∑l∈Nk

αkl(i)ckl(i)


l (i+ 1))εl(i) + µlxl(i+ 1)e∗0−l(i+ 1)

].

(4.50)

Starting from (4.46), we then focus onKk(i+ 1).

Kk(i+ 1) = E[εk(i+ 1)εHk (i+ 1)]. (4.51)

In (4.50), the term αkl(i) is determined through the network topology for each subset,

while the term ckl(i) is calculated through the Metropolis rule. We assume that αkl(i) and

ckl(i) are statistically independent from the other terms in (4.50). Upon convergence, the

parameters αkl(i) and ckl(i) do not vary because at steady state the choice of the subset

Ωk(i) for each node k will be fixed. Then, under these assumptions, substituting (4.50)



ESTIMATION 82

Table 4.4: Coefficients αkl for different sets of Ω3

2, 3

α31 = 0

α32 = 1

α33 = 1

α34 = 0

α35 = 0

3, 5

α31 = 0

α32 = 0

α33 = 1

α34 = 0

α35 = 1

2, 3, 5

α31 = 0

α32 = 1

α33 = 1

α34 = 0

α35 = 1

into (4.51) we arrive at:

Kk(i+ 1) =∑l∈Nk

E[α2kl(i)c

2kl(i)

]((I − µlRl(i+ 1)

)K l(i)×

(I − µlRl(i+ 1)

)+ µ2

l e0−l(i+ 1)e∗0−l(i+ 1)×Rl(i+ 1)

)+

∑l,q∈Nkl =q

E[αkl(i)αkq(i)ckl(i)ckq(i)

]

×((I − µlRl(i+ 1)

)K l,q(i)

(I − µqRl(i+ 1)

)+ µlµqe0−l(i+ 1)e∗0−q(i+ 1)Rl,q(i+ 1)

)+

∑l,q∈Nkl =q


]

×((I − µqRq(i+ 1)

)KH

l,q(i)(I − µlRl(i+ 1)

)+ µlµqe0−q(i+ 1)e∗0−l(i+ 1)RH

l,q(i+ 1)

)(4.52)

whereRl,q(i+1) = E[xl(i+1)xHq (i+1)] andK l,q(i) = E[εl(i)εHq (i)]. To further simpli-

fy the analysis, we assume that the samples of the input signal xk(i) are uncorrelated, i.e.,

Rk = σ2x,kI with σ2

x,k being the variance. Using the diagonal matrices Rk = Λk = σ2x,kI

andRl,q = Λl,q = σx,lσx,qI we can write

Kk(i+ 1) =∑l∈Nk

E[α2kl(i)c

2kl(i)

]((I − µlΛl

)K l(i)

(I − µlΛl

)+ µ2

l e0−l(i+ 1)e∗0−l(i+ 1)Λl

)+

∑l,q∈Nkl =q


]×((I − µlΛl

)K l,q(i)

(I − µqΛq

)

+ µlµqe0−l(i+ 1)e∗0−q(i+ 1)Λl,q

)+

∑l,q∈Nkl =q


]

×((I − µqΛq

)KH

l,q(i)(I − µlΛl

)+ µlµqe0−q(i+ 1)e∗0−l(i+ 1)ΛH

l,q

).

(4.53)



ESTIMATION 83

Due to the structure of the above equations, the approximations and the quantities in-

volved, we can decouple (4.53) into

Knk (i+ 1) =

∑l∈Nk

E[α2kl(i)c

2kl(i)

]((1− µlλ

nl

)Kn

l (i)(1− µlλ

nl

)+ µ2

l e0−l(i+ 1)e∗0−l(i+ 1)λnl

)+

∑l,q∈Nkl =q


]((1− µlλ

nl

)Kn

l,q(i)(1− µqλ

nq

)

+ µlµqe0−l(i+ 1)e∗0−q(i+ 1)λnl,q

)+

∑l,q∈Nkl =q


]

×((

1− µqλnq

)(Kn

l,q(i))H(1− µlλ

nl

)+ µlµqe0−q(i+ 1)e∗0−l(i+ 1)λn

l,q

),

(4.54)

where Knk (i+1) is the nth element of the main diagonal ofKk(i+1). With the assumption

that αkl(i) and ckl(i) are statistically independent from the other terms in (4.50), we can

rewrite (4.54) as

Knk (i+ 1) =

∑l∈Nk

E[α2kl(i)

]E[c2kl(i)

]((1− µlλ

nl

)2Kn

l (i) + µ2l e0−l(i+ 1)e∗0−l(i+ 1)λn

l

)+ 2×

∑l,q∈Nkl =q

E[αkl(i)αkq(i)

]E[ckl(i)ckq(i)

]((1− µlλ

nl

)(1− µqλ

nq

)Kn

l,q(i)

+ µlµqe0−l(i+ 1)e∗0−q(i+ 1)λnl,q

). (4.55)

By taking i → ∞, we can obtain (4.56). We assume that the choice of covariance matrix

Knk (ES-LMS) =

∑l∈Nk

α2klc

2klµ

2lJmin−lλ

nl + 2

∑l,q∈Nkl =q

αklαkqcklckqµlµqe0−le∗0−qλ

nl,q

1−∑l∈Nk

α2klc

2kl(1− µlλn

l )2 − 2

∑l,q∈Nkl =q

αklαkqcklckq(1− µlλnl )(1− µqλn

q ).

(4.56)

Ak for node k is fixed upon the proposed algorithms convergence, as a result, the covari-

ance matrix Ak is deterministic and does not vary. In the above example, we assume the

choice ofA3 is fixed as show in Fig. 4.4.



ESTIMATION 84

1 2 3 4 5

1

2

3

4

5

1 1 100

0

1

1

0

Figure 4.4: Covariance matrixA3 upon convergence

Then the coefficients αkl will also be fixed and given by

α31 = 0

α32 = 1

α33 = 1

α34 = 0

α35 = 1

as well as the parameters ckl that are computed using the Metropolis combining rule. As

a result, the coefficients αkl and the coefficients ckl are deterministic and can be taken out

from the expectation.

The MSE is then given by

Jmse−k = Jmin−k +Mσ2x,k

M∑n=1

Knk (ES-LMS). (4.57)

4.4.2.2 SI–LMS

For the SI–LMS algorithm, we do not need to consider all possible combinations. This

algorithm simply adjusts the combining coefficients for each node with its neighbors in

order to select the neighbor nodes that yield the smallest MSE values. Thus, we redefine

the combining coefficients through (4.27)

ckl−new = ckl − ρεsign(|ekl|)1 + ε|ξmin|

(l ∈ Nk). (4.58)

For each node k, at time instant i, after it receives the estimates from all its neighbors, it

calculates the error pattern ekl(i) for every estimate received through Eq. (4.19) and finds



ESTIMATION 85

the nodes with the largest and smallest errors. An error vector ek is then defined through

(4.22), which contains all error patterns ekl(i) for node k.

Then a procedure which is detailed after Eq. (4.22) is carried out and modifies the

error vector ek. For example, suppose node 5 has three neighbor nodes, which are nodes

3, 6 and 8. The error vector e5 has the form described by e5 = [e53, e55, e56, e58] =

[0.023, 0.052,−0.0004,−0.012]. After the modification, the error vector e5 will be edited

as e5 = [0, 0.052,−0.0004, 0]. The quantity hkl is then defined as

hkl = ρεsign(|ekl|)1 + ε|ξmin|

(l ∈ Nk), (4.59)

and the term ’error pattern’ ekl in (4.59) is from the modified error vector ek.

From [70], we employ the relation E[sign(ekl)] ≈ sign(e0−k). According to Eqs. (4.1)

and (4.37), when the proposed algorithm converges at node k or the time instant i goes

to infinity, we assume that the error e0−k will be equal to the noise variance at node k.

Then, the asymptotic value hkl can be divided into three situations due to the rule of the

SI–LMS algorithm:

hkl =

ρε sign(|e0−k|)

1+ε|e0−k|for the node with the largest MSE

ρε sign(−|e0−k|)1+ε|e0−k|

for the node with the smallest MSE

0 for all the remaining nodes.

(4.60)

Under this situation, after the time instant i goes to infinity, the parameters hkl for each

neighbor node of node k can be obtained through (4.60) and the quantity hkl will be

deterministic and can be taken out from the expectation.

Finally, removing the random variables αkl(i) and inserting (4.58), (4.59) into (4.56),

the asymptotic values Knk for the SI-LMS algorithm are obtained as in (4.61). At this

Knk (SI-LMS) =

∑l∈Nk

(ckl − hkl)2µ2

lJmin−lλnl + 2

∑l,q∈Nkl =q

(ckl − hkl)(ckq − hkq)µlµqe0−le∗0−qλ

nl,q

1−∑

l∈Nk

(ckl − hkl)2(1− µlλnl )

2 − 2∑

l,q∈Nkl =q

(ckl − hkl)(ckq − hkq)(1− µlλnl )(1− µqλn

q ).

(4.61)

point, the theoretical results are deterministic.



ESTIMATION 86

Then, the MSE for SI–LMS algorithm is given by


M∑n=1

Knk (SI-LMS). (4.62)

4.4.2.3 ES–RLS

For the proposed ES–RLS algorithm, we start from (4.10), after inserting (4.10) into

(4.17), we have

ωk(i+ 1) =∑

l∈Ωk(i)

ckl(i)ψl(i+ 1)

=∑

l∈Ωk(i)

ckl(i)[ωl(i) + kl(i+ 1)e∗l (i+ 1)]

=∑

l∈Ωk(i)

ckl(i)[ωl(i) + kl(i+ 1)(dl(i+ 1)− xHl (i+ 1)ωl(i))]. (4.63)

Then, subtracting the ω0 from both sides of (4.47), we arrive at

εk(i+ 1) =∑

l∈Ωk(i)

ckl(i)[ωl(i) + kl(i+ 1)(dl(i+ 1)− xHl (i+ 1)ωl(i))]−

∑l∈Ωk(i)

ckl(i)ω0

=∑

l∈Ωk(i)

ckl(i)

[εl(i) + kl(i+ 1)

(dl(i+ 1)− xH

l (i+ 1)(εl(i) + ω0))]

=∑

l∈Ωk(i)

ckl(i)

[(I − kl(i+ 1)xH

l (i+ 1))εl(i) + kl(i+ 1)e∗0−l(i+ 1)

].

(4.64)

Then, with the random variables αkl(i), (4.64) can be rewritten as


αkl(i)ckl(i)

[(I − kl(i+ 1)xH

l (i+ 1))εl(i) + kl(i+ 1)e∗0−l(i+ 1)

].

(4.65)

Since kl(i+ 1) = Φ−1l (i+ 1)xl(i+ 1) [9], we can modify the (4.65) as


αkl(i)ckl(i)

[(I −Φ−1

l (i+ 1)xl(i+ 1)xHl (i+ 1)

)εl(i)

+Φ−1l (i+ 1)xl(i+ 1)e∗0−l(i+ 1)

]. (4.66)



ESTIMATION 87

At this point, if we compare (4.66) with (4.50), we can find that the difference between

(4.66) and (4.50) is, the Φ−1l (i + 1) in (4.66) has replaced the µl in (4.50). From [9], we

also have

E[Φ−1l (i+ 1)] =

1

i−MR−1

l (i+ 1) for i > M + 1. (4.67)

As a result, we can arrive

Kk(i+ 1) =∑l∈Nk

E[α2kl(i)c

2kl(i)

]((I − Λ−1

l Λl

i−M

)K l(i)

(I − ΛlΛ

−1l

i−M

)(4.68)

+Λ−1

l ΛlΛ−1l

(i−M)2e0−l(i+ 1)e∗0−l(i+ 1)

)+

∑l,q∈Nkl =q


]

×((I − Λ−1

l Λl

i−M

)K l,q(i)

(I −

ΛqΛ−1q

i−M

)+

Λ−1l Λl,qΛ

−1q

(i−M)2e0−l(i+ 1)

× e∗0−q(i+ 1)

)+

∑l,q∈Nkl =q


]

×((I −

ΛqΛ−1q

i−M

)KH

l,q(i)(I − Λ−1

l Λl

i−M

)+

Λ−1q ΛH

l,qΛ−1l

(i−M)2e0−q(i+ 1)e∗0−l(i+ 1)

).

(4.69)

Due to the structure of the above equations, the approximations and the quantities in-

volved, we can decouple (4.69) into

Knk (i+ 1) =

∑l∈Nk

E[α2kl(i)c

2kl(i)

]((1− 1

i−M

)2Kn

l (i) +e0−l(i+ 1)e∗0−l(i+ 1)

λnl (i−M)2

)+

∑l,q∈Nkl =q


]((1− 1

i−M

)2Kn

l,q(i)

+λnl,qe0−l(i+ 1)e∗0−q(i+ 1)

(i−M)2λnl λ

nq

)+

∑l,q∈Nkl =q


]

×((

1− 1

i−M

)2(Kn

l,q(i))H +

λnl,qe0−q(i+ 1)e∗0−l(i+ 1)

(i−M)2λnqλ

nl

)(4.70)

where Knk (i + 1) is the nth elements of the main diagonals of Kk(i + 1). With the

assumption that, upon convergence, αkl and ckl do not vary, because at steady state, the

choice of subset Ωk(i) for each node k will be fixed, we can rewrite (4.70) as (4.71). Then,

the MSE is given by

Jmse−k(i+ 1) = Jmin−k +Mσ2x,k

M∑n=1

Knk (i+ 1)(ES-RLS). (4.72)

On the basis of (4.71), we have that when i tends to infinity, the MSE approaches the

MMSE in theory [9].



ESTIMATION 88

Knk (i+1)(ES-RLS) =

∑l∈Nk

α2klc

2kl

Jmin−l

λnl (i−M)2

+ 2∑

l,q∈Nkl =q

αklαkqcklckqλnl,qe0−le

∗0−q

(i−M)2λnl λ

nq

1−∑l∈Nk

α2klc

2kl

(1− 1

i−M

)2

− 2∑

l,q∈Nkl =q

αklαkqcklckq

(1− 1

i−M

)2 .

(4.71)

4.4.2.4 SI–RLS

For the proposed SI–RLS algorithm, we insert (4.58) into (4.71), remove the random

variables αkl(i), and following the same procedure as for the SI–LMS algorithm, we can

obtain (4.73), where hkl and hkq satisfy the rule in (4.60). Then, the MSE is given by

Knk (i+ 1)(SI-RLS) =

∑l∈Nk

(ckl − hkl)2 Jmin−l

λnl (i−M)2

+ 2∑

l,q∈Nkl =q

(ckl − hkl)(ckq − hkq)λnl,qe0−le

∗0−q

(i−M)2λnl λ

nq

1−∑

l∈Nk

(ckl − hkl)2(1− 1

i−M

)2 − 2∑

l,q∈Nkl =q

(ckl − hkl)(ckq − hkq)(1− 1

i−M

)2 .

(4.73)


M∑n=1

Knk (i+ 1)(SI-RLS). (4.74)

In conclusion, according to (4.61) and (4.73), with the help of modified combining coef-

ficients, for the proposed SI–type algorithms, the neighbor node with lowest MSE con-

tributes the most to the combination, while the neighbor node with the highest MSE con-

tributes the least. Therefore, the proposed SI–type algorithms perform better than the

standard diffusion algorithms with fixed combining coefficients.

4.4.3 Tracking Analysis

In this subsection, we assess the proposed ES–LMS/RLS and SI–LMS/RLS algorithms in

a non–stationary environment, in which the algorithms have to track the minimum point

of the error–performance surface [73, 74]. In the time–varying scenarios of interest, the

optimum estimate is assumed to vary according to the model ω0(i + 1) = βω0(i) +

q(i), where q(i) denotes a random perturbation [10] and β = 1 in order to facilitate



ESTIMATION 89

the analysis. This is typical in the context of tracking analysis of adaptive algorithms

[9, 10, 75, 76].

4.4.3.1 ES–LMS

For the tracking analysis of the ES–LMS algorithm, we employ Assumption III and start

from (4.47). After subtracting the ω0(i+ 1) from both sides of (4.47), we obtain

εk(i+ 1) =∑

l∈Ωk(i)

ckl(i)[ωl(i) + µlxl(i+ 1)(dl(i+ 1)− xHl (i+ 1)ωl(i))]−

∑l∈Ωk(i)

ckl(i)ω0(i+ 1)

=∑

l∈Ωk(i)

ckl(i)[ωl(i) + µlxl(i+ 1)(dl(i+ 1)

− xHl (i+ 1)ωl(i))]−

∑l∈Ωk(i)

ckl(i)

(ω0(i) + q(i)

)

=∑

l∈Ωk(i)

ckl(i)


(dl(i+ 1)− xH

l (i+ 1)(εl(i) + ω0))]

− q(i)

=∑

l∈Ωk(i)

ckl(i)


l (i+ 1))εl(i) + µlxl(i+ 1)e∗0−l(i+ 1)

]− q(i).

(4.75)

Using Assumption III, we can arrive at

Jex−k(i+ 1) = trRk(i+ 1)Kk(i+ 1)+ trRk(i+ 1)Q. (4.76)

The first part on the right side of (4.76), has already been obtained in the MSE steady–state

analysis part in Section IV B. The second part can be decomposed as

trRk(i+ 1)Q = trE[xk(i+ 1)xH

k (i+ 1)]E[q(i)qH(i)

]= Mσ2

x,ktrQ. (4.77)

The MSE is then obtained as


M∑n=1

Knk (ES-LMS) +Mσ2

x,ktrQ. (4.78)



ESTIMATION 90

4.4.3.2 SI–LMS

For the SI–LMS recursions, we follow the same procedure as for the ES-LMS algorithm,

and obtain


M∑n=1

Knk (SI-LMS) +Mσ2

x,ktrQ. (4.79)

4.4.3.3 ES–RLS

For the SI–RLS algorithm, we follow the same procedure as for the ES–LMS algorithm

and arrive at


M∑n=1

Knk (i+ 1)(ES-RLS) +Mσ2

x,ktrQ. (4.80)

4.4.3.4 SI–RLS

We start from (4.74), and after a similar procedure to that of the SI–LMS algorithm, we

have


M∑n=1

Knk (i+ 1)(SI-RLS) +Mσ2

x,ktrQ. (4.81)

In conclusion, for time-varying scenarios there is only one additional term Mσ2x,ktrQ in

the MSE expression for all algorithms, and this additional term has the same value for all

algorithms. As a result, the proposed SI–type algorithms still perform better than the stan-

dard diffusion algorithms with fixed combining coefficients, according to the conclusion

obtained in the previous subsection.


In the analysis of the computational cost of the algorithms studied, we assume complex-

valued data and first analyze the adaptation step. For both ES–LMS/RLS and SI–

LMS/RLS algorithms, the adaptation cost depends on the type of recursions (LMS or



ESTIMATION 91

Table 4.5: Computational complexity for the adaptation step per node per time instant

Adaptation Method Multiplications Additions Divisions

LMS 8M + 2 8M

RLS 4M2 + 16M + 1 4M2 + 12M − 1 1

Table 4.6: Computational complexity for combination step per node per time instant

Algorithms Multiplications Additions Divisions

ES–LMS/RLS M(t+ 1)T !

t!(T − t)!Mt

T !

t!(T − t)!

SI–LMS/RLS (2M + 3)|Nk| (M + 2)|Nk| |Nk|

RLS) that each strategy employs. The details are shown in Table 4.5. For the combination

step, we analyze the computational complexity in Table 4.6. The overall complexity for

each algorithm is summarized in Table 4.7. In the above three tables, T is the total number

of nodes linked to node k including node k itself and t is the number of nodes chosen from

T . M is the length of the unknown vectorω0. The proposed algorithms require extra com-

putations as compared to the existing distributed LMS and RLS algorithms. This extra

cost ranges from a small additional number of operations for the SI-LMS/RLS algorithms

to a more significant extra cost that depends on T for the ES-LMS/RLS algorithms.

Table 4.7: Computational complexity per node per time instant

Algorithm Multiplications Additions Divisions

ES–LMS[(t+ 1)T !

t!(T − t)!+ 8

]M + 2

[T !

(t− 1)!(T − t)!+ 8

]M

ES–RLS 4M2 +

[(t+ 1)T !

t!(T − t)!+ 16

]M + 1 4M2 +

[T !

(t− 1)!(T − t)!+ 12

]M − 1 1

SI–LMS (8 + 2Nk)M + 3|Nk|+ 2 (8 + |Nk|)M + 2|Nk| |Nk|

SI–RLS 4M2 + (16 + 2|Nk|)M + 3|Nk|+ 1 4M2 + (12 + |Nk|)M + 2|Nk| − 1 |Nk|+ 1

4.5 Simulations

In this section, we investigate the performance of the proposed link selection strategies

for distributed estimation in two scenarios: wireless sensor networks and smart grids. In

these applications, we simulate the proposed link selection strategies in both static and



ESTIMATION 92

0 2 4 6 8 10 12 14 16 18 200

2

4

6

8

10

12

14

16

18

20

12

3

4

5

6

7

8

9

10

1112

13

14

15

16

17

18

19

20

Figure 4.5: Diffusion wireless sensor networks topology with 20 nodes

time–varying scenarios. We also show the analytical results for the MSE steady–state and

tracking performances that we obtained in Section IV.

4.5.1 Diffusion Wireless Sensor Networks

In this subsection, we compare the proposed ES–LMS/ES–RLS and SI–LMS/SI–RLS

algorithms with the diffusion LMS algorithm [2], the diffusion RLS algorithm [20] and

the single–link strategy [27] in terms of their MSE performance. The network topology is

illustrated in Fig. 4.5 and we employ N = 20 nodes in the simulations. The length of the

unknown parameter vectorω0 is M = 10 and it is generated randomly. The input signal is

generated as xk(i) = [xk(i) xk(i−1) ... xk(i−M+1)] and xk(i) = uk(i)+αkxk(i−1),

where αk is a correlation coefficient and uk(i) is a white noise process with variance

σ2u,k = 1 − |αk|2, to ensure the variance of xk(i) is σ2

x,k = 1. The noise samples are



ESTIMATION 93

modeled as circular Gaussian noise with zero mean and variance σ2n,k ∈ [0.001, 0.01].

The step size for the diffusion LMS ES–LMS and SI–LMS algorithms is µ = 0.2. For the

diffusion RLS algorithm, both ES–RLS and SI–RLS, the forgetting factor λ is set to 0.97

and δ is equal to 0.81. In the static scenario, the sparsity parameters of the SI–LMS/SI–

RLS algorithms are set to ρ = 4 × 10−3 and ε = 10. The Metropolis rule is used to

calculate the combining coefficients ckl. The MSE and MMSE are defined as in (4.3) and

(4.45), respectively. The results are averaged over 100 independent runs.

In Fig. 4.6, we can see that ES–RLS has the best performance for both steady-state

MSE and convergence rate, and obtains a gain of about 8 dB over the standard diffusion

RLS algorithm. SI–RLS is a bit worse than the ES–RLS, but is still significantly better

than the standard diffusion RLS algorithm by about 5 dB. Regarding the complexity and

processing time, SI–RLS is as simple as the standard diffusion RLS algorithm, while ES–

RLS is more complex. The proposed ES–LMS and SI–LMS algorithms are superior to the

standard diffusion LMS algorithm. In the time–varying scenario, the sparsity parameters

of the SI–LMS and SI–RLS algorithms are set to ρ = 6×10−3 and ε = 10. The unknown

parameter vector ω0 varies according to the first–order Markov vector process:

ω0(i+ 1) = βω0(i) + z(i), (4.82)

where z(i) is an independent zero–mean Gaussian vector process with variance σ2z = 0.01

and β = 0.98.

Fig. 4.7 shows that, similarly to the static scenario, ES–RLS has the best performance,

and obtains a 5 dB gain over the standard diffusion RLS algorithm. SI–RLS is slightly

worse than the ES–RLS, but is still better than the standard diffusion RLS algorithm

by about 3 dB. The proposed ES–LMS and SI–LMS algorithms have the same advantage

over the standard diffusion LMS algorithm in the time-varying scenario. Notice that in the

scenario with large |Nk|, the proposed SI-type algorithms still have a better performance

when compared with the standard techniques. To illustrate the link selection for the ES–

type algorithms, we provide Figs. 4.8 and 4.9. From these two figures, we can see that

upon convergence the proposed algorithms converge to a fixed selected set of links Ωk.



ESTIMATION 94

0 50 100 150 200 250 300 350 400 450−30

−25

−20

−15

−10

−5

−0

5

10

Time instant, i

MS

E (

dB)

Diffusion LMS[2]Single−Link Strategy[27]ES−LMSSI−LMSDiffusion RLS[20]ES−RLSSI−RLSMMSE Bound

Figure 4.6: Network MSE curves in a static scenario

0 50 100 150 200 250 300 350 400 450−30

−25

−20

−15

−10

−5

0

5

Time instant, i

MS

E (

dB)

Diffusion LMS[2]Single−Link Strategy[27]ES−LMSSI−LMSDiffusion RLS[20]ES−RLSSI−RLSMMSE Bound

Figure 4.7: Network MSE curves in a time-varying scenario



ESTIMATION 95

0 80 160 240 320 400 480 560 640 720−25

−20

−15

−10

−5

0

5

10

Time, i

MS

E (

dB)

Link selection state for Node 16 with ES−−LMS algorithmstatic scenarios

11,12,14,15,18,20

11,12,14,15,18,20

12,15,17,18

11, 15, 18, 20

Figure 4.8: Set of selected links for node 16 with ES–LMS in a static scenario

0 80 160 240 320 400 480 560 640 720−20

−15

−10

−5

0

5

10

Time, i

MS

E (

dB)

Link selection state for Node 16 with ES−LMS algorithmtime−−varying scenarios

12,14,17

11,12,17,18

11,12,14,17,18,20

11,12,14,17,18,20

Figure 4.9: Link selection state for node 16 with ES–LMS in a time-varying scenario



ESTIMATION 96

4.5.2 MSE Analytical Results

The aim of this section is to validate the analytical results obtained in Section IV. First, we

verify the MSE steady–state performance. Specifically, we compare the analytical results

in (4.57), (4.62), (4.72) and (4.74) to the results obtained by simulations under different

SNR values. The SNR indicates the signal variance to noise variance ratio. We assess

the MSE against the SNR, as show in Figs. 4.10 and 4.11. For ES–RLS and SI–RLS

algorithms, we use (4.72) and (4.74) to compute the MSE after convergence. The results

show that the analytical curves coincide with those obtained by simulations and are within

0.01 dB of each other, which indicates the validity of the analysis. We have assessed the

proposed algorithms with SNR equal to 0dB, 10dB, 20dB and 30dB, respectively, with

20 nodes in the network. For the other parameters, we follow the same definitions used to

obtain the network MSE curves in a static scenario. The details have been shown on the

top of each sub figure in Figs. 4.10 and 4.11. The theoretical curves for ES–LMS/RLS

and SI–LMS/RLS are all close to the simulation curves and remain within 0.01 dB of one

another.

0 10 20 30−30

−25

−20

−15

−10

−5

0

5

SNR (dB)

MS

E (

dB

)

a) ES−LMS, N=20 nodes, µ=0.045

SimulationTheory

0 10 20 30−30

−25

−20

−15

−10

−5

0

5

10

SNR (dB)

MS

E (

dB

)

b) SI−LMS, N=20 nodes, µ=0.045, ρ=4*10e−3, ε=10

SimulationTheory

Figure 4.10: MSE performance against SNR for ES–LMS and SI–LMS



ESTIMATION 97

0 10 20 30−30

−25

−20

−15

−10

−5

0

5

10

SNR (dB)

MS

E (

dB

)

a) ES−RLS, N=20 nodes, λ=0.97, δ=0.811

SimulationTheory

0 10 20 30−30

−25

−20

−15

−10

−5

0

5

SNR (dB)

MS

E (

dB

)

b) SI−RLS, N=20 nodes, λ=0.97, δ=0.811,ρ=4*10e−3, ε=10

SimulationTheory

Figure 4.11: MSE performance against SNR for ES–RLS and SI–RLS

The tracking analysis of the proposed algorithms in a time–varying scenario is dis-

cussed as follows. Here, we verify that the results (4.78), (4.79), (4.80) and (4.81) of the

subsection on the tracking analysis can provide a means of estimating the MSE. We con-

sider the same model as in (4.82), but with β is set to 1. In the next examples, we employ

N = 20 nodes in the network and the same parameters used to obtain the network MSE

curves in a time–varying scenario. A comparison of the curves obtained by simulations

and by the analytical formulas is shown in Figs. 4.12 and 4.13. From these curves, we can

verify that the gap between the simulation and analysis results are within 0.02dB under

different SNR values. The details of the parameters are shown on the top of each sub

figure in Figs. 4.12 and 4.13.

4.5.3 Smart Grids

The proposed algorithms provide a cost–effective tool that could be used for distributed

state estimation in smart grid applications. In order to test the proposed algorithms in a



ESTIMATION 98

0 10 20 30−25

−20

−15

−10

−5

0

5

SNR (dB)

MS

E (

dB

)

a) ES−LMS, N=20 nodes, µ=0.045

SimulationTheory

0 10 20 30−25

−20

−15

−10

−5

0

5

SNR (dB)

MS

E (

dB

)

b) SI−LMS, N=20 nodes, µ=0.045, ρ=6*10e−3, ε=10

SimulationTheory

Figure 4.12: MSE performance against SNR for ES–LMS and SI–LMS in a time-varying scenario

0 10 20 30−30

−25

−20

−15

−10

−5

0

SNR (dB)

MS

E (

dB

)

a) ES−RLS, N=20 nodes, λ=0.97, δ=0.811

SimulationTheory

0 10 20 30−25

−20

−15

−10

−5

0

5

SNR (dB)

MS

E (

dB

)

b) SI−RLS, N=20 nodes, λ=0.97, δ=0.811, ρ=6*10e−3, ε=10

SimulationTheory

Figure 4.13: MSE performance against SNR for ES–RLS and SI–RLS in a time-varying scenario



ESTIMATION 99

possible smart grid scenario, we consider the IEEE 14–bus system [26], where 14 is the

number of substations. At every time instant i, each bus k, k = 1, 2, . . . , 14, takes a scalar

measurement dk(i) according to

dk(i) = Xk

(ω0(i)

)+ nk(i), k = 1, 2, . . . , 14, (4.83)



mean equal to zero and which corresponds to bus k.

Initially, we focus on the linearized DC state estimation problem. The system is built

with 1.0 per unit (p.u) voltage magnitudes at all buses and j1.0 p.u. branch impedance.

Then, the state vector ω0(i) is taken as the voltage phase angle vector ω0 for all buses.

Therefore, the nonlinear measurement model for state estimation (4.83) is approximated

by

dk(i) = xHk (i)ω0 + nk(i), k = 1, 2, . . . , 14, (4.84)

where xk(i) is the measurement Jacobian vector for bus k. Then, the aim of the distribut-

ed estimation algorithm is to compute an estimate of ω0, which can minimize the cost

function given by



Jω(ω) =N∑k=1

E|dk(i)− xHk (i)ω|2. (4.86)

We compare the proposed algorithms with the M–CSE algorithm [4], the single link

strategy [27], the standard diffusion RLS algorithm [20] and the standard diffusion LMS

algorithm [2] in terms of MSE performance. The MSE comparison is used to determine

the accuracy of the algorithms, and compare the rate of convergence. We define the

IEEE–14 bus system as in Fig. 4.14.

All buses are corrupted by additive white Gaussian noise with variance σ2n,k ∈

[0.001, 0.01]. The step size for the standard diffusion LMS [2], the proposed ES–LMS

and SI–LMS algorithms is 0.15. The parameter vector ω0 is set to an all–one vector. For

the diffusion RLS, ES–RLS and SI–RLS algorithms the forgetting factor λ is set to 0.945



ESTIMATION 100

1

5

2

9

14

10

7

4

8

3

12

13

6

11

Figure 4.14: IEEE 14–bus system for simulation

and δ is equal to 0.001. The sparsity parameters of the SI–LMS/RLS algorithms are set to

ρ = 0.07 and ε = 10. The results are averaged over 100 independent runs. We simulate

the proposed algorithms for smart grids under a static scenario.

0 50 100 150 200 250 300 350 400 450−30

−25

−20

−15

−10

−5

0

5

10

Time instant, i

MS

E (

dB)

Diffusion LMS[2]Single−Link Strategy[27]M−CSE[4]ES−LMSSI−LMSDiffusion RLS[20]ES−RLSSI−RLSMMSE Bound

Figure 4.15: MSE performance curves for smart grids.

From Fig. 4.15, it can be seen that ES–RLS has the best performance, and significantly



ESTIMATION 101

outperforms the standard diffusion LMS [2] and the M–CSE [4] algorithms. The ES–

LMS is slightly worse than ES–RLS, which outperforms the remaining techniques. SI–

RLS is worse than ES–LMS but is still better than SI–LMS, while SI–LMS remains better

than the diffusion RLS, LMS, M–CSE algorithms and the single link strategy.

4.6 Summary

In this chapter, we have proposed ES–LMS/RLS and SI–LMS/RLS algorithms for dis-

tributed estimation in applications such as wireless sensor networks and smart grids. We

have compared the proposed algorithms with existing methods. We have also devised an-

alytical expressions to predict their MSE steady–state performance and tracking behavior.

Simulation experiments have been conducted to verify the analytical results and illustrate

that the proposed algorithms significantly outperform the existing strategies, in both static

and time–varying scenarios, in examples of wireless sensor networks and smart grids.


Chapter 5

Distributed Compressed Estimation

Based on Compressive Sensing


5.2 System Model and Problem Statement . . . . . . . . . . . . . . . . . 104

5.3 Proposed Distributed Compressed Estimation Scheme . . . . . . . . 105

5.4 Measurement Matrix Optimization . . . . . . . . . . . . . . . . . . 109

5.5 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

5.1 Introduction

Distributed signal processing algorithms are of great importance for statistical inference

in wireless networks and applications such as wireless sensor networks (WSNs) [2,17,46,

71]. Distributed processing techniques deal with the extraction of information from data

collected at nodes that are distributed over a geographic area [2]. In this context, for each

node a set of neighbor nodes collect and process their local information, and transmit their

estimates to a specific node. Then, each specific node combines the collected information

together with its local estimate to generate improved estimates.


102

2015

CHAPTER 5. DISTRIBUTED COMPRESSED ESTIMATION BASED ON COMPRESSIVE

SENSING 103

In many scenarios, the unknown parameter vector to be estimated can be sparse and

contain only a few nonzero coefficients. Many algorithms have been developed in the

literature for sparse signal estimation [3, 25, 43, 77–81]. However, these techniques are

designed to take into account the full dimension of the observed data, which increases

the computational cost, slows down the convergence rate and degrades mean square error

(MSE) performance.

Compressive sensing (CS) [11, 12] has recently received considerable attention and

been successfully applied to diverse fields, e.g., image processing [82], wireless commu-

nications [83] and MIMO radar [84]. The theory of CS states that an S–sparse signal

ω0 of length M can be recovered exactly with high probability from O(S logM) mea-

surements. Mathematically, the vector ω0 with dimension D × 1 that carries sufficient

information about ω0 (D ≪ M ) can be obtained via a linear model [12]

ω0 = Γω0 (5.1)

where Γ ∈ RD×M is the measurement matrix.

The application of CS to WSNs has been recently investigated in [83,85] and [86,87].

A compressive wireless sensing scheme was developed in [83] to save energy and band-

width, where CS is only employed in the transmit layer. In [85], a greedy algorithm called

precognition matching pursuit was developed for CS and used at sensors and the fusion

center to achieve fast reconstruction. However, the sensors are assumed to capture the tar-

get signal perfectly with only measurement noise. The work of [86] introduced a theory

for distributed CS based on jointly sparse signal recovery. However, in [86] CS techniques

are only applied to the transmit layer, whereas distributed CS in the estimation layer has

not been widely investigated. A sparse model that allows the use of CS for the online

recovery of large data sets in WSNs was proposed in [87], but it assumes that the sensor

measurements could be gathered directly, without an estimation procedure. In summary,

prior work has focused on signal reconstruction algorithms in a distributed manner but

has not considered both compressed transmit strategies and estimation techniques.

In this chapter, we focus on the design of an approach that exploits lower dimensions,

reduces the required bandwidth, and improves the convergence rate and the MSE perfor-

mance. Inspired by CS, we introduce a scheme that incorporates compression and decom-

pression modules into the distributed estimation procedure. In the compression module,



SENSING 104

we compress the unknown parameter ω0 into a lower dimension. As a result, the estima-

tion procedure is performed in a compressed dimension. After the estimation procedure is

completed, the decompression module recovers the compressed estimator into its original

dimension using an orthogonal matching pursuit (OMP) algorithm [37, 42, 88]. We also

present a design procedure and develop an algorithm to optimize the measurement matri-

ces, which can further improve the performance of the proposed scheme. Specifically, we

derive an adaptive stochastic gradient recursion to update the measurement matrix. Sim-

ulation results illustrate the performance of the proposed scheme and algorithm against

existing techniques.

5.2 System Model and Problem Statement

A wireless sensor network (WSN) with N nodes, which have limited processing capabil-

ities, is considered with a partially connected topology. A diffusion protocol is employed

although other strategies, such as incremental [1] and consensus [4] could also be used. A

partially connected network means that nodes can exchange information only with their

neighbors as determined by the connectivity topology. In contrast, a fully connected net-

work means that, data broadcast by a node can be captured by all other nodes in the net-

work [19]. At every time instant i, the sensor at each node k takes a scalar measurement

dk(i) according to

dk(i) = ωH0 xk(i) + nk(i), i = 1, 2, . . . , I, (5.2)

where xk(i) is the M × 1 input signal vector with zero mean and variance σ2x,k, nk(i) is

the noise at each node with zero mean and variance σ2n,k. From (5.2), we can see that

the measurements for all nodes are related to an unknown parameter vector ω0 with size

M × 1 that should be estimated by the network. We assume that ω0 is a sparse vector

with S ≪ M non-zero coefficients. The aim of such a network is to compute an estimate

of ω0 in a distributed fashion, which minimizes the cost function

Jω(ω) =N∑k=1


∣∣2, (5.3)

where E denotes expectation. Distributed estimation of ω0 is appealing because it pro-

vides robustness against noisy measurements and improved performance as reported



SENSING 105

in [1, 2, 4]. To solve this problem, a cost-effective technique is the adapt–then–combine

(ATC) diffusion strategy [2]ψk(i) = ωk(i) + µkxk(i)

[dk(i)− ωH

k (i)xk(i)]∗,


cklψl(i),

(5.4)

where Nk indicates the set of neighbors for node k, ψk(i) is the local estimator of node

k, |Nk| denotes the cardinality of Nk and ckl is the combination coefficient, which is

calculated with respect to the Metropolis ruleckl =

1max(|Nk|,|Nl|)

, if k = l are linked

ckl = 0, for k and l not linked

ckk = 1−∑

l∈Nk/k

ckl, for k = l

(5.5)

and should satisfy ∑l

ckl = 1, l ∈ Nk∀k. (5.6)

Existing distributed sparsity-aware estimation strategies, e.g., [3, 63, 77], are designed

using the full dimension signal space, which reduces the convergence rate and degrades

the MSE performance. In order to improve performance, reduce the required bandwidth

and optimize the distributed processing, we incorporate at each node of the WSN the

proposed distributed compressed estimation scheme based on CS techniques, together

with a measurement matrix optimization algorithm.

5.3 Proposed Distributed Compressed Estimation

Scheme

In this section, we detail the proposed distributed compressed estimation (DCE) scheme

based on CS. The proposed scheme, depicted in Fig. 5.1, employs compression and

decompression modules inspired by CS techniques to perform distributed compressed

estimation. In the proposed scheme, at each node, the sensor first observes the M × 1

vector xk(i), then with the help of the D×M measurement matrix obtains the compressed

version xk(i), and performs the estimation of ω0 in the compressed domain. In other



SENSING 106

Measurement

D ×M

Compressedestimator ωk(i)

input signalω0

Unknown ek(i)

D × 1 − +

xk(i)matrix Γk

parameter ω0

M × 1

nk(i)

dk(i)

D × 1

yk(i)D < M

Compression module

Estimatorreconstruction

strategyMeasurement

matrix Γk

Decompressedestimator ωk(I)

M × 1Decompression module

After the final time instant I

Figure 5.1: Proposed Compressive Sensing Modules

words, the proposed scheme estimates the D × 1 vector ω0 instead of the M × 1 vector

ω0, where D ≪ M and the D–dimensional quantities are designated with an overbar.

At each node, a decompression module employs a D × M measurement matrix Γk and

a reconstruction algorithm to compute an estimate of ω0. One advantage for the DCE

scheme is that fewer parameters need to be transmitted between neighbour nodes.

ψl(i)l∈Nk

ck,l

ωk(i+ 1)

Combination

Combine

Compressive sensing

Module

ωk(i)

xk(i) ψk(i)

Adaptation

Exchange

Node k

Node k

k

ψ1(i)

ψ3(i)

ψ6(i)

ψk(i)Information

Γk

Figure 5.2: Proposed DCE Scheme

We start the description of the proposed DCE scheme with the scalar measurement



SENSING 107

dk(i) given by

dk(i) = ωH0 xk(i) + nk(i), i = 1, 2, . . . , I, (5.7)

where ω0 = Γkω0 and xk(i) is the D × 1 input signal vector. This operation is depicted

in Fig. 5.1 as the compression module.

Fig. 5.2 illustrates the proposed DCE scheme. The scheme can be divided into three

steps:

1) Adaptation

In the adaptation step, at each time instant i=1,2, . . . , I, each node k=1,2, . . . , N,

generates a local compressed estimator ψk(i) through

ψk(i) = ωk(i) + µk(i)e∗k(i)xk(i), (5.8)

where ek(i) = dk(i)− ωHk (i)xk(i) and µk(i) =

µ0

xHk (i)xk(i)

.

2) Information exchange

Given the network topology structure, only the local compressed estimator ψk(i) will be

transmitted between node k and all its neighbor nodes. The measurement matrix Γk will

be kept locally.

3) Combination

At each time instant i=1,2, . . . , I, the combination step starts after the information

exchange is finished. Each node will combine the local compressed estimators from its

neighbor nodes and itself through


cklψl(i), (5.9)

to compute the updated compressed estimator ωk(i+ 1).

After the final iteration I , each node will employ the OMP reconstruction strategy to

generate the decompressed estimator ωk(I). Other reconstruction algorithms can also be



SENSING 108

used. The decompression module described in Fig. 5.1 illustrates the details. In summary,

during the DCE procedure, only the local compressed estimator ψk(i) will be transmitted

over the network resulting in a reduction of the number of parameters to be transmitted

from M to D. The proposed DCE scheme is given in Table 5.1.

The computational complexity of the proposed DCE scheme is O(NDI + ND3),

where N is the number of nodes in the WSN and I is the number of time instants. The

distributed NLMS algorithm has a complexity O(NMI), while the complexity of the

sparse diffusion NLMS algorithm [63] is O(3NMI). For the distributed compressive

sensing algorithm of [85], the computational complexity is O(NMI + ND3I). In the

proposed DCE scheme, only the local compressed estimator ψk(i) with D parameters will

be transmitted through the network, which means the transmission requirement is greatly

reduced as compared with the standard schemes that transmit ψk(i) with M parameters.

Table 5.1: The Proposed DCE Scheme


For each time instant i=1,2, . . . , I-1


ψk(i) = ωk(i) + µ(i)e∗k(i)xk(i)

where ek(i) = dk(i)− ωHk (i)xk(i),

dk(i) = ωH0 xk(i) + nk(i) = (Γkω0)

Hxk(i) + nk(i)

and Γk is the D ×M random measurement matrix

end



cklψl(i)

end

end

After the final iteration I


ωk(I) = fOMPωk(I)where ωk(I) is the final decompressed estimator.

end



SENSING 109

5.4 Measurement Matrix Optimization

To further improve the performance of the proposed DCE scheme, an optimization al-

gorithm for the design of the measurement matrix Γk(i), which is now time–variant, is

developed here. Unlike prior work [84, 89], this optimization is distributed and adaptive.

Let us consider the cost function

J = E|ek(i)|2

= E|dk(i)− yk(i)|2

= E|dk(i)|2 − Ed∗k(i)yk(i) − Edk(i)y∗k(i)+ E|yk(i)|2,

(5.10)

where yk(i) = ωHk (i)xk(i). To minimize the cost function, we need to compute the

gradient of J with respect to Γ∗k(i) and equate it to a null vector, i.e., ∇JΓ∗

k(i)= 0. As

a result, only the first three terms in (5.10) need to be considered. Taking the first three

terms of (5.10) we arrive at

E|dk(i)|2 − Ed∗k(i)yk(i) − Edk(i)y∗k(i)

= E|ωH0 Γ

Hk (i)xk(i) + nk(i)|2 − E(ωH

0 ΓHk (i)xk(i) + nk(i))

∗yk(i)

− E(ωH0 Γ

Hk (i)xk(i) + nk(i))y

∗k(i)

= E|ωH0 Γ

Hk (i)xk(i)|2+ E(ωH

0 ΓHk (i)xk(i))

∗nk(i)+ E(ωH0 Γ

Hk (i)xk(i))n

∗k(i)

+ E|nk(i)|2 − E(ωH0 Γ

Hk (i)xk(i))

∗yk(i) − En∗k(i)yk(i)

− E(ωH0 Γ

Hk (i)xk(i))y

∗k(i) − Enk(i)y

∗k(i). (5.11)

Because the random variable nk(i) is statistically independent from the other parameters

and has zero mean, (5.11) can be further simplified as

E|dk(i)|2 − Ed∗k(i)yk(i) − Edk(i)y∗k(i)

= E|ωH0 Γ

Hk (i)xk(i)|2+ σ2

n,k − E(ωH0 Γ

Hk (i)xk(i))

∗yk(i) − E(ωH0 Γ

Hk (i)xk(i))y

∗k(i).

(5.12)

Then, we have

∇JΓ∗k(i)

= Rk(i)Γk(i)Rω0 − P k(i), (5.13)

where Rk(i) = Exk(i)xHk (i), Rω0 = Eω0ω

H0 and P k(i) = Ey∗k(i)xk(i)ω

H0 .

Equating (5.13) to a null vector, we obtain

Rk(i)Γk(i)Rω0 − P k(i) = 0, (5.14)



SENSING 110

Γk(i) = R−1k (i)P k(i)R

−1ω0. (5.15)

The expression in (5.15) cannot be solved in closed–form because ω0 is an unknown

parameter. As a result, we employ the previous estimate ωk(i) to replace ω0. However,

ωk(i) and Γk(i) depend on each other, thus, it is necessary to iterate (5.15) with an initial

guess to obtain a solution. In particular, we replace the expected values with instantaneous

values. Starting from (5.13), we use instantaneous estimates to compute

Rk(i) = xk(i)xHk (i), (5.16)

Rω0 = ω0ωH0 (5.17)

and

P k(i) = y∗k(i)xk(i)ωH0 . (5.18)

According to the method of steepest descent [32], the updated parameters of the measure-

ment matrix Γk(i) at time i+ 1 are computed by using the simple recursive relation

Γk(i+ 1) = Γk(i) + η[−∇JΓ∗k(i)

]

= Γk(i) + η[P k(i)− Rk(i)Γk(i)Rω0 ] (5.19)

= Γk(i) + η[y∗k(i)xk(i)ωH0 − xk(i)x

Hk (i)Γk(i)ω0ω

H0 ].

where η is the step size and ω0 is the M × 1 unknown parameter vector that must be

estimated by the network. Then, the parameter vector ωk(i) is used to reconstruct the

estimate of ω0 as follows

ωrek(i) = fOMPωk(i), (5.20)

where the operator fOMP· denotes the OMP reconstruction algorithm described in Chap-

ter 2. Note that other reconstruction algorithms could also be employed. Replacing ω0 by

ωrek(i), we arrive at the expression for updating the measurement matrix described by

Γk(i+ 1) = Γk(i) + η[y∗k(i)xk(i)ω

Hrek

(i)− xk(i)xHk (i)Γk(i)ωrek(i)ω

Hrek

(i)]. (5.21)

The computational complexity of the proposed scheme with measurement matrix opti-

mization is O(NDI + ND3I). The structure of the proposed measurement matrix opti-

mization is described in Fig. 5.3.



SENSING 111

Measurement

D ×M

Compressedestimator ωk(i)

input signalω0

Unknown ek(i)

D × 1

− +

xk(i)matrix Γk

parameter ω0

M × 1

nk(i)

dk(i)

D × 1

yk(i)D < M

Compression module

Estimatorreconstruction

strategyMeasurement

matrix Γk

Decompressedestimator ωk(I)

M × 1

Decompression module

yk(i), ωk(i)

Final

After the final time instant I

Figure 5.3: Proposed Compressive Sensing Module with Measurement Matrix Optimization

5.5 Simulations

We assess the proposed DCE scheme and the measurement matrix optimization algo-

rithm in a WSN application, where a partially connected network with N = 20 nodes is

considered and illustrated in Fig. 5.4. We compare the proposed DCE scheme with un-

compressed schemes, including the distributed NLMS (dNLMS) algorithm (normalized

version of [2]), sparse diffusion NLMS algorithm [63], sparsity-promoting adaptive al-

gorithm [78], and the distributed compressive sensing algorithm [85], in terms of MSE

performance. Note that other metrics such as mean-square deviation (MSD) could be

used but result in the same performance hierarchy between the analyzed algorithms.

The input signal is generated as xk(i) = [xk(i) xk(i− 1) ... xk(i−M + 1)]T and

xk(i) = uk(i) + αkxk(i − 1), where αk is a correlation coefficient and uk(i) is a white

noise process with variance σ2u,k = 1− |αk|2, to ensure the variance of xk(i) is σ2

x,k = 1.

The compressed input signal is obtained by xk(i) = Γkxk(i). The measurement matrix

Γk is an i.i.d. Gaussian random matrix that is kept constant. The noise samples are

modeled as complex Gaussian noise with variance σ2n,k = 0.001. The unknown M × 1



SENSING 112

0 2 4 6 8 10 12 14 16 18 200

2

4

6

8

10

12

14

16

18

20

12

3

4

5

6

7

8

9

10

1112

13

14

15

16

17

18

19

20

Figure 5.4: Diffusion wireless sensor network with 20 Nodes

parameter vector ω0 has sparsity S, where M=50, D=10 and S=3. The step size µ0 for

the distributed NLMS, distributed compressive sensing, sparse diffusion LMS and the

proposed DCE algorithms is 0.15. The parameter that controls the shrinkage in [63] is set

to 0.001. For [78], the number of hyperslabs equals 55 and the width of the hyperslabs is

0.01.

Fig. 5.5 illustrates the comparison between the DCE scheme with other existing al-

gorithms, without the measurement matrix optimization. It is clear that, when compared

with the existing algorithms, the DCE scheme has a significantly faster convergence rate

and a better MSE performance. These advantages consist in two features: the compressed

dimension brought by the proposed scheme and CS being implemented in the estimation

layer. As a result, the number of parameters for transmission in the network is significant-

ly reduced.

In the second scenario, we employ the measurement matrix optimization algorithm to

in the DCE scheme. The parameter η for the measurement matrix optimization algorithm



SENSING 113

0 100 200 300 400 500 600 700 800 900 1000−35

−30

−25

−20

−15

−10

−5

0

5

10

15

Time instant, i

MS

E (

dB)

Proposed DCE SchemeDistributed CS [85]Sparse Diffusion NLMS [63]dNLMS [2]Sparsity promotingadaptive algorithm [78]

Figure 5.5: MSE performance against time

0 100 200 300 400 500 600 700 800 900 1000−35

−30

−25

−20

−15

−10

−5

0

5

10

15

Time instant, i

MS

E (

dB)

Proposed DCE SchemeDistributed CS[85]Sparse Diffusion NLMS[63]dNLMS[2]Proposed DCE Scheme withmeasurement matrix optimizationSparsity promotingadaptive algorithm[78]

Figure 5.6: MSE performance against time with measurement matrix optimization



SENSING 114

is set to 0.08 and all other parameters remain the same as in the previous scenario. In Fig.

5.6, we observe that with the help of the measurement matrix optimization algorithm, D-

CE can achieve a faster convergence when compared with DCE without the measurement

matrix optimization.

In the third scenario, we compare the DCE scheme with the distributed NLMS algo-

rithm with different levels of resolution in bits per coefficient, reduced dimension D and

sparsity level S. The x-axis stands for the reduced dimension D and their corresponding

sparsity level S can be found in Fig. 5.7. In Fig. 5.7, it is clear that with the increase of

the sparsity level S the MSE performance degrades. In addition, the MSE performance

will increase when the transmission has more bits per coefficient. For the DCE scheme,

the total number of bits required for transmission is D times the number of bits per coef-

ficient, whereas for the distributed NLMS algorithm it is M times the number of bits per

coefficient. A certain level of redundancy is required between the sparsity level and the

reduced dimension due to the error introduced by the estimation procedure.

4 8 12 16 20−30

−28

−26

−24

−22

−20

−18

−16

−14

−12

−10

Reduced Dimension D

MS

E (

dB)

DCE with 6 bitsDEC with 8 bitsDCE with 10 bitsdNLMS with 10 bitsdNLMS with 8 bitsdNLMS with 6 bits

Sparse Level (S) Reduced Dimention (D)1 42 83 124 165 20

Figure 5.7: MSE performance against reduced dimension D for different levels of resolution in bits per

coefficient



SENSING 115

5.6 Summary

In this chapter, we have proposed a novel DCE scheme and algorithms for sparse signals

and systems based on CS techniques and a measurement matrix optimization. In the

DCE scheme, the estimation procedure is performed in a compressed dimension. The

simulation results for a WSN application show that the DCE scheme outperforms existing

strategies in terms of convergence rate, reduced bandwidth and MSE performance.


Chapter 6

Distributed Reduced–Rank Estimation

Based on Joint Iterative Optimization in

Sensor Networks


6.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

6.3 Distributed Dimensionality Reduction and Adaptive Processing . . 120

6.4 Proposed Distributed Reduced-Rank Algorithms . . . . . . . . . . . 121

6.5 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

6.1 Introduction

Distributed strategies have become fundamental for parameter estimation in wireless net-

works and applications such as sensor networks [1,2] and smart grids [4,28]. Distributed

processing techniques deal with the extraction of information from data collected at nodes

that are distributed over a geographic area [1]. In this context, a specific node or agent in

the network collects data from its neighbors and combines them with its local information


116

2015

CHAPTER 6. DISTRIBUTED REDUCED–RANK ESTIMATION BASED ON JOINT ITERATIVE

OPTIMIZATION IN SENSOR NETWORKS 117

to generate an improved estimate. However, when the unknown parameter vector to be

estimated has a large dimension, the network requires a large communication bandwidth

between neighbor nodes to transmit their local estimate. This problem limits the applica-

tion of existing algorithms in applications with large data sets as the convergence speed

is dependent on the length of the parameter vector [2, 9, 10]. Hence, distributed dimen-

sionality reduction has become an important tool for distributed inference problems with

large data sets.

In order to perform dimensionality reduction, many algorithms have been proposed

in the literature, in the context of distributed quantized Kalman Filtering [90, 91], quan-

tized consensus algorithms [92], distributed principal subspace estimation [93], single bit

strategy [94] and Krylov subspace optimization techniques [95]. However, existing algo-

rithms are either too costly or have unsatisfactory performance when processing a large

number of parameters. As a result, trade-offs between the amount of cooperation, com-

munication and system performance naturally exist. In this context, reduced–rank tech-

niques are powerful tools to perform dimensionality reduction, which have been applied

to DS–CDMA system [96–98], multi–input–multi–output (MIMO) equalization applica-

tion [99], spread–spectrum systems [100], space–time interference suppression [13] and

beamforming [101, 102]. However, limited research has been carried out on distributed

reduced-rank estimation. Related approaches to reduced–rank techniques include com-

pressive sensing-based strategies [84], which exploit sparsity to reduce the number of

parameters for estimation, and attribute-distributed learning [103], which employs agents

and a fusion center to meet the communication constraints.

In this chapter, we propose a scheme for distributed signal processing along with dis-

tributed reduced–rank algorithms for parameter estimation. In particular, the proposed

algorithms are based on an alternating optimization strategy [96, 99] and are called dis-

tributed reduced–rank joint iterative optimization normalized least mean squares (DRJIO–

NLMS) algorithm and distributed reduced–rank joint iterative optimization recursive least

square (DRJIO–RLS) algorithm. In contrast to prior work on reduced–rank techniques

and distributed methods, this is to the best of our knowledge the first time this alternating

optimization strategy is done in a distributed way. The proposed reduced–rank strategies

are distributed and perform dimensionality reduction without costly decompositions at

each agent. The proposed DRJIO–NLMS and DRJIO–RLS algorithms are flexible with




regards to the amount of information that is exchanged, have low cost and high perfor-

mance. The DRJIO–NLMS and DRJIO–RLS algorithms can also outperform competing

techniques. Analysis of complexity and convergence are also provided in this chapter,

together with applications to parameter estimation in wireless sensor networks and smart

grids.

6.2 System Model

A distributed network with N nodes, which have limited processing capabilities, is consid-

ered with a partially connected topology. A diffusion protocol is employed although other

strategies, such as incremental [1] and consensus-based [4] could also be used. A partially

connected network means that nodes can exchange information only with their neighbors

determined by the connectivity topology. In contrast, a fully connected network means

that, data broadcast by a node can be captured by all other nodes in the network [19]. At

every time instant i, each node k takes a scalar measurement dk(i) according to

dk(i) = ωH0 xk(i) + nk(i), i = 1, 2, . . . , I, (6.1)

where xk(i) is the M × 1 input signal vector with zero mean and variance σ2x,k, nk(i) is

the noise sample at each node with zero mean and variance σ2n,k. Through (6.1), we can

see that the measurements for all nodes are related to an unknown parameter vector ω0

with size M × 1, that would be estimated by the network. Fig. 6.1 shows an example

for a diffusion–type network with 20 nodes. The aim of such a network is to compute an

estimate of ω0 in a distributed fashion, which can minimize the global cost function

Jω(ω) =N∑k=1


∣∣2, (6.2)

where E denotes the expectation operator. To solve this problem, one possible technique

is the adapt–then–combine (ATC) diffusion strategy [2]ψk(i) = ωk(i− 1) + µkxk(i)

[dk(i)− ωH

k (i− 1)xk(i)]∗,

ωk(i) =∑l∈Nk

cklψl(i),

(6.3)




where Nk indicates the set of neighbors for node k, |Nk| denotes the cardinality of Nk

and ckl is the combination coefficient, which is calculated under the Metropolis ruleckl =

1max(|Nk|,|Nl|)

, if k = l are linked

ckl = 0, for k and l not linked

ckk = 1−∑

l∈Nk/k

ckl, for k = l

(6.4)

and should satisfy ∑l

ckl = 1, l ∈ Nk∀k. (6.5)

With this strategy, when the dimension of the unknown parameter vector ω0 is large, it

could lead to a high communication overhead between each neighbor node and the con-

vergence speed is reduced. In order to solve this problem and optimize the distributed

processing, we incorporate at the kth node of the network distributed reduced–rank s-

trategies based on alternating optimization techniques.

0 2 4 6 8 10 12 14 16 18 200

2

4

6

8

10

12

14

16

18

20

12

3

4

5

6

7

8

9

10

1112

13

14

15

16

17

18

19

20

Figure 6.1: Network topology with 20 nodes




6.3 Distributed Dimensionality Reduction and Adaptive

Processing

The proposed distributed dimensionality reduction scheme, depicted in Fig.6.2, employs

a transformation matrix SDk(i) to process the input signal xk(i) with dimensions M × 1

and projects it onto a lower D×1 dimensional subspace xk(i), where D ≪ M . Following

this procedure, a reduced–rank estimator ωk(i) is computed, and the ωk(i) is transmitted

by each node. In particular, the transformation matrix SDk(i) and reduced–rank estimator

ωk(i) will be jointly optimized in the proposed scheme according to the minimum mean–

squared error (MMSE) criterion.

Dimensionality reductionmatrix SDk

(i)Reduced-Rankestimator ωk(i)

dk(i)xk(i) xk(i)

Design Algorithm

ek(i)M × 1

D × 1

− +

Figure 6.2: Proposed dimensionality reduction scheme at each node or agent.

Specifically, we start the description of the method with an M × D matrix SDk(i),

which carries out a dimensionality reduction on the input signal as given by

xk(i) = SHDk

(i)xk(i), (6.6)

where, in what follows, all D–dimensional quantities are denoted with a ’bar’. The design

of SDk(i) and ωk(i) corresponds to the optimization problem given by

SoptDk

, ωoptk

= minSDk

(i),ωk(i)E[|dk(i)− ωk

H(i)SHDk

(i)xk(i)|2] (6.7)

where ωk(i) is the reduced–rank estimator. By fixing SDk(i) and minimizing (6.7) with

respect to ωk(i), we have

ωk(i) = R−1k (i)pk(i), (6.8)




where Rk(i) = E[SHDk

(i)xk(i)xHk (i)SDk

(i)] = E[xk(i)xHk (i)] and pk(i) =

E[d∗k(i)SHDk

(i)xk(i)] = E[d∗k(i)xk(i)]. We then fix ωk(i) and minimize (6.7) with re-

spect to SDk(i), and arrive at the following expression:

SDk(i) = R−1

k (i)PDk(i)R

−1ωk(i), (6.9)

where Rk(i) = E[xk(i)xHk (i)], PDk

(i) = E[d∗k(i)xk(i)ωHk (i)] and Rωk

(i) =

E[ωk(i)ωHk (i)]. The associated reduced–rank MSE is obtained by substituting the ex-

pressions obtained in (6.8) and (6.9) into the cost function and is described as [96]

MSE = σ2dk

− pHk (i)R−1k (i)pk(i) (6.10)

where σ2dk

= E[|dk(i)|2]. Because there is no closed-form expression for SDk(i) and

ωk(i) as they depend on each, we need a strategy to compute the parameters. The pro-

posed strategy is based on an alternating optimization of SDk(i) and ωk(i). In the next

section, we develop a distributed reduced-rank algorithm to compute the parameters of

interest.

6.4 Proposed Distributed Reduced-Rank Algorithms

In this section, we present the two proposed distributed reduced–rank algorithms for dis-

tributed estimation, namely DRJIO–NLMS and DRJIO–RLS. Unlike prior work [93–95],

the proposed algorithms do NOT require

• an M × M auto–correlation matrix of the input signal and an M × 1 cross–

correlation vector between the input signal and the desired signal used to build

the Krylov subspace [95]

• Additional cost to perform eigen–decompositions [93]

• Extra adaptive processing at the local node [94]

• Costly convex optimization at the local node, which introduces extra complexity

[95].

The DRJIO–NLMS and DRJIO–RLS algorithms are flexible, have low cost and very fast

convergence speed.




6.4.1 Proposed DRJIO–NLMS algorithm

For the DRJIO–NLMS algorithm, the parameters in (6.7) are optimized by an alternating

procedure that adjusts one of the parameters while keeping the other parameter fixed.

Therefore, it solves the following optimization problem in an alternating fashion:[Sopt

Dk, ωopt

k

]= minSDk

(i),ωHk (i)

||ωk(i)− ωk(i− 1)||2 + ||SDk(i)− SDk

(i− 1)||2

subject to ωHk (i)S

HDk

(i)xk(i) = dk(i).

(6.11)

Using the method of Lagrange multipliers and considering SDk, ωk jointly, we arrive

at the following Lagrangian:

Lk = ||ωk(i)− ωk(i− 1)||2 + ||SDk(i)− SDk

(i− 1)||2

+R[λ∗1(dk(i)− ωH

k (i)SHDk

(i− 1)xk(i))]

+R[λ∗2(dk(i)− ωH

k (i− 1)SHDk

(i)xk(i))],

(6.12)

where , λ1, λ2 are scalar Lagrange multipliers, || · || denotes the Frobenius norm, and the

operator R[·] retains the real part of the argument. By computing the gradient terms of

(6.12) with respect to ωk(i), SDk(i), λ1 and λ2, respectively, we obtain

∇ωk(i)L = 2(ωk(i)− ωk(i− 1)

)+ SH

Dk(i− 1)xk(i)λ1 (6.13)

∇SDk(i)L = 2

(SDk

(i)− SDk(i− 1)

)+ xk(i)ωk(i− 1)λ2 (6.14)

∇λ1L = dk(i)− ωHk (i)S

HDk

(i− 1)xk(i) (6.15)

∇λ2L = dk(i)− ωHk (i− 1)SH

Dk(i)xk(i). (6.16)

By setting (6.13)–(6.16) to zero and solving the remaining equations, we obtain the re-

cursions of the proposed DRJIO–NLMS algorithm described by

ωk(i) = ωk(i− 1) + µ(i)e∗k(i)xk(i) (6.17)

SDk(i) = SDk

(i− 1) + η(i)e∗k(i)xk(i)ωHk (i− 1) (6.18)

where ek(i) = dk(i) − ωHk (i − 1)SH

Dk(i − 1)xk(i), µ(i) = µ0

xHk (i)xk(i)

and η(i) =

η0ωH

k (i−1)ωk(i−1)xHk (i)xk(i)

are the time–varying step sizes, while µ0 and η0 are the conver-

gence factors. The normalization makes it easier the setting of the convergence factors,

improves the convergence speed and facilitates the comparison with other distributed




LMS–type algorithms. The recursions are computed in an alternating way with one it-

eration per time instant at each node.

The proposed DRJIO–NLMS algorithm includes two steps, namely, adaptation step

and combination step, which are performed using an alternating procedure which is de-

tailed next.

• Adaptation step

For the adaptation step, at each time instant i=1,2, . . . , I, each node k=1,2, . . . , N, starts

from generating a local reduced–rank estimator through

ψk(i) = ωk(i− 1) + µ(i)e∗k(i)xk(i), (6.19)

where ek(i) = dk(i) − ωHk (i − 1)SH

Dk(i)xk(i). This local reduced–rank estimator ψk(i)

will be transmitted to all its neighbor nodes under the network topology structure.

Then, each node k=1,2, . . . , N, will update its dimensionality reduction matrix accord-

ing to

SDk(i) = SDk

(i− 1) + η(i)e∗k(i)xk(i)ωk(i− 1), (6.20)

and keep it locally.

• Combination step

At each time instant i=1,2, . . . , I, the combination step starts after the adaptation step

finishes. Each node will combine the local reduced–rank estimators from its neighbor

nodes and itself through

ωk(i) =∑l∈Nk

cklψl(i), (6.21)

to compute the reduced–rank estimator ωk(i).

After the final iteration I , each node will generate the full–rank estimator ωk(I) from

ωk(I) = SDk(I)ωk(I). (6.22)




In conclusion, during the distributed processing steps, only the local reduced–rank es-

timator ψk(i) will be transmitted through the network. The proposed DRJIO–NLMS

algorithm is detailed in Table 6.1.

Table 6.1: The DRJIO–NLMS Algorithm




ψk(i) = ωk(i− 1) + µ(i)e∗k(i)xk(i)

where ek(i) = dk(i)− ωHk (i− 1)SH

Dk(i)xk(i)

% ψk(i) is the local reduced–rank estimator and will be

% sent to all neighbor nodes of node k under the network

% topology structure.

SDk(i) = SDk

(i− 1) + η(i)e∗k(i)xk(i)ωk(i− 1)

% The dimensionality reduction matrix SDk(i)

% will be updated and kept locally.

end


ωk(i) =∑l∈Nk

cklψl(i)

% The reduced–rank estimator ωk(i)

% will be updated and kept locally.

end

end

After the final iteration I


ωk(I) = SDk(I)ωk(I)

where ωk(I) is the final full–rank estimator.

end

6.4.2 Proposed DRJIO–RLS algorithm

In this subsection, we present a recursive approach for the dimensionality reduction dis-

tributed estimation. Specifically, we develop the DRJIO–RLS algorithm for computing




SDk(i) and ωk(i). In order to derive the proposed algorithm, we first define

P k(i) , R−1k (i) (6.23)

PDk(i) , λPDk

(i− 1) + d∗k(i)xk(i)ωHk (i) (6.24)

Qωk(i) , R−1

ωk(i− 1) (6.25)

and rewrite the expression in (6.9) as follows

SDk(i) = R−1

k (i)PDk(i)R

−1ωk(i− 1)

= P k(i)PDk(i)Qωk

(i)

= λP k(i)PDk(i− 1)Qωk

(i) + d∗k(i)P k(i)xk(i)ωHk (i)Qωk

(i)

= SDk(i− 1) + kk(i)

[d∗k(i)t

Hk (i)− xH

k (i)SDk(i− 1)

],

(6.26)

where the D × 1 vector tk(i) = Qωk(i)ωk(i) and the M × 1 Kalman gain vector is

kk(i) =λ−1P k(i− 1)xk(i)

1 + λ−1xHk (i)P k(i− 1)xk(i)

. (6.27)

In addition, the update for the M ×M matrix P k(i) employs the matrix inversion lemma

[9] as follows:

P k(i) = λ−1P k(i− 1)− λ−1kk(i)xHk (i)P k(i− 1) (6.28)

and the D × 1 vector tk(i) is updated as

tk(i) =λ−1Qωk

(i− 1)ωk(i− 1)

1 + λ−1ωHk (i− 1)Qωk

(i− 1)ωk(i− 1). (6.29)

The matrix inversion lemma [9] is then used to update the D × D matrix Qωk(i) as

described by

Qωk(i) = λ−1Qωk

(i− 1)− λ−1tk(i)ωHk (i− 1)Qωk

(i− 1). (6.30)

Equations (6.23)–(6.30) constitute the key part steps of the proposed DRJIO-RLS algo-

rithm for computing SDk(i).

To derive the expression for updating ωk(i), the following associated quantities are

defined

Φk(i) , R−1k (i) (6.31)

pk(i) = λpk(i− 1) + d∗k(i)xk(i). (6.32)




Then, equation (6.8) will be rewritten as

ωk(i) = R−1k (i)pk(i)

= Φk(i)pk(i)

= λΦk(i)pk(i− 1) + d∗k(i)Φk(i)xk(i)

= ωk(i− 1) + kk(i)

[d∗k(i)− xH

k (i)ωk(i− 1)

],

(6.33)

where the D × 1 Kalman gain vector is given by

kk(i) =λ−1Φk(i− 1)xk(i)

1 + λ−1xHk (i)Φk(i− 1)xk(i)

. (6.34)

and the update for the matrix inverse Φk(i) employs the matrix inversion lemma [9]

Φk(i) = λ−1Φk(i− 1)− λ−1kk(i)xHk (i)Φk(i− 1). (6.35)

Equations (6.31)–(6.35) constitute the key part of the proposed DRJIO-RLS algorithm for

computing ωk(i). The proposed DRJIO–RLS algorithm is detailed in Table 6.2.

6.4.3 Computational Complexity Analysis

Here, we evaluate the computational complexity of the proposed DRJIO–NLMS and

DRJIO–RLS algorithms. The computational complexity of the proposed DRJIO–NLMS

algorithm is O(DM), while the proposed DRJIO–RLS algorithm has a complexity

O(M2 + D2). The distributed NLMS algorithm [2] has a complexity O(M), while the

complexity of the distributed RLS algorithm [20] is O(M2). For the Krylov Subspace

NLMS [95] the complexity reaches O(DM2), while for the distributed principal subspace

estimation algorithms [93], the complexity is O(M3). Thus, the proposed DRJIO–NLMS

algorithm has a much lower computational complexity, and because D ≪ M , it is as

simple as the distributed NLMS algorithm. An additional and very important aspect of

distributed reduced–rank algorithms is that the dimensionality reduction results in a de-

crease in the number of transmitted parameters from M to D which corresponds to a less

stringent bandwidth requirement. The details of computational complexity of the the pro-

posed and the existing algorithms, are shown in Table 6.3, where M is the length of the

unknown parameter that needs to be estimated, D is the reduced dimension and |Nk| is

the cardinality of Nk.




Table 6.2: The DRJIO-RLS Algorithm


P k(0) = δ−1IM×M ,Qωk(0) = δ−1ID×D,

Φk(0) = δ−1ID×D and δ = small positive constant



kk(i) =λ−1P k(i−1)xk(i)

1+λ−1xHk (i)P k(i−1)xk(i)

tk(i) =λ−1Qωk

(i−1)ωk(i−1)

1+λ−1ωHk (i−1)Qωk

(i−1)ωk(i−1)

SDk(i) = SDk

(i− 1) + kk(i)

[d∗k(i)t

Hk (i)− xH

k (i)SDk(i− 1)

]P k(i) = λ−1P k(i− 1)− λ−1kk(i)x

Hk (i)P k(i− 1)

Qωk(i) = λ−1Qωk

(i− 1)− λ−1tk(i)ωHk (i− 1)Qωk

(i− 1)

kk(i) =λ−1Φk(i−1)xk(i)

1+λ−1xHk (i)Φk(i−1)xk(i)

ψk(i) = ωk(i− 1) + kk(i)

[d∗k(i)− xH

k (i)ωk(i− 1)

]Φk(i) = λ−1Φk(i− 1)− λ−1kk(i)x

Hk (i)Φk(i− 1)

end


ωk(i) =∑l∈Nk

cklψl(i)

end

end


ωk(I) = SDk(I)ωk(I)

where ωk(I) is the final full–rank estimator.

end

To further illustrate the computational complexity for different algorithms, we present

the main trends in terms of the number of multiplications for the proposed and existing

algorithms in Fig. 6.3. For the parameters, we take node 14 as an example and the D = 5,

|Nk| = 5.




20 30 40 50 60 70 80 90 10010

2

103

104

105

106

M

Num

ber

of M

ultip

licat

ions

DRJIO−NLMSDRJIO−RLSDistributed NLMS[2]Distributed RLS[20]Krylov Subspace NLMS[95]Distributed PrincipalSubspace Estimation[93]

Figure 6.3: Complexity in terms of multiplications

Table 6.3: Computational Complexity of Different Algorithms

Algorithm Multiplications Additions

DRJIO–NLMS 2(D + 1)M + (3 + |Nk|)D + 5 (2D + 1)M + (2 + |Nk|)D − 2

DRJIO–RLS 2M2 + (3 + 2D)M + 4D2 2M2 + 2DM + 4D2

+(9 + |Nk|)D +(2 + |Nk|)D

Distributed NLMS [2] (4 + |Nk|)M + 1 (5 + |Nk|)M − 1

Distributed RLS [20] 4M2 + (12 + |Nk|)M − 1 4M2 + (16 + |Nk|)M + 1

Krylov Subspace 6DM2 + 4M + (5 + |Nk|)D 6DM2 + 2M + (2 + |Nk|)DNLMS [95]

Distributed principal M3 + 2(D + 2)M M3 + (D + 1)M

subspace estimation [93] +(3 + |Nk|)D + 4 +(2 + |Nk|)D − 1




6.4.4 Sufficient Conditions for Convergence

To start the analysis, we assume that the compression and decompression with SDk(i)

do not introduce loss of information or performance. Then, to develop the analysis and

proofs, we need to define a metric space and the Hausdorff distance that will extensively

be used. A metric space is an ordered pair (M, r), where M is a nonempty set, and r is

a metric on M, i.e., a function r : M× M → R such that, for any x, y, z, and M, the

following conditions hold.

1) r(x, y) ≥ 0.

2) r(x, y) = 0 if x = y.

3) r(x, y) = r(y, x).

4) r(x, y) ≤ r(x, y) + r(y, z(triangle inequality)).

The Hausdorff distance measures how far two subsets of a metric space are from each

other and is defined by

rH(X, Y ) = max

supx∈X

infy∈Y

r(x, y), supy∈Y

infx∈X

r(x, y)

. (6.36)

The design of SDk(i) and ωk(i) corresponds to the optimization problem given by

Sopt

Dk, ωopt

k

= minSDk

(i),ωk(i)

N∑k=1

E[|dk(i)− ωkH(i)SH

Dk(i)xk(i)|2] (6.37)

where ωk(i) is the reduced–rank estimator. By fixing SDk(i) and minimizing (6.37) with

respect to ωk(i), we have

ωk(i) = R−1k (i)pk(i), (6.38)

where Rk(i) = E[SHDk

(i)xk(i)xHk (i)SDk

(i)] = E[xk(i)xHk (i)] and pk(i) =

E[d∗k(i)SHDk

(i)xk(i)] = E[d∗k(i)xk(i)]. We then fix ωk(i) and minimize (6.37) with re-

spect to SDk(i), and arrive at the following expression

SDk(i) = R−1

k (i)PDk(i)R

−1ωk(i), (6.39)




where Rk(i) = E[xk(i)xHk (i)], PDk

(i) = E[d∗k(i)xk(i)ωHk (i)] and Rωk

(i) =

E[ωk(i)ωHk (i)].

The equation with the associated MSE for node k is obtained by substituting the ex-

pressions in (6.38) and (6.39) into the cost function (6.37) and is given by

MSEk = σ2dk

− pHk (i)R−1k (i)pk(i) (6.40)

where σ2dk

= E[|dk(i)|2]. The proposed DRJIO–NLMS and DRJIO–RLS algorithms can

be stated as an alternating minimization strategy performed in a distributed fasion based

on the MSE and expressed as

SDk(i) ∈ arg min

SoptDk

∈SDk(i)MSEk

(Sopt

Dk, ωk(i)

)(6.41)

and

ωk(i) ∈ arg minωopt

k ∈ωk(i)MSEk

(SDk

(i), ωoptk

)(6.42)

where SoptDk

and ωoptk correspond to the optimal values of SDk

(i) and ωk(i), respective-

ly, and the sequences of compact sets SDk(i)i≥0 and ωk(i)i≥0 converge to the sets

SDk,optand ωk,opt, respectively.

The sets SDk,optand ωk,opt are not directly given, but we observe the sequence of

compact sets SDk(i)i≥0 and ωk(i)i≥0. The goal of the proposed algorithms is to find

a sequence of SDk(i) and ωk(i) in a distributed way such that

limi→∞

MSEk

(SDk

(i), ωk(i)

)= MSEk

(Sopt

Dk, ωopt

k

). (6.43)

To present a set of sufficient conditions under which the proposed algorithms converge,

we employ the so–called three– and four–point properties [104, 105]. Let us assume that

there is a function f : M×M → R such that the following conditions are satisfied.

1) Three–point property (SoptDk

, SDk, ωopt

k ).

For all i ≥ 1, SoptDk

∈ SDk(i), ωopt

k ∈ ωk(i) and SDk∈

argminωoptk ∈ωk(i)

MSEk

(Sopt

Dk, ωopt

k

), we have

f

(Sopt

Dk, SDk

)+MSEk

(SDk

, ωoptk

)≤ MSEk

(Sopt

Dk, ωopt

k

). (6.44)




2) Four–point property (SoptDk

, ωoptk , SDk

, ˜ωk).

For all i ≥ 1, SoptDk

, SDk∈ SDk

(i), ωoptk ∈ ωk(i) and ˜ωk ∈

argminωoptk ∈ωk(i)

MSEk

(SDk

, ωoptk

), we have

MSEk

(Sopt

Dk, ˜ωk

)≤ MSEk

(Sopt

Dk, ωopt

k

)+ f

(Sopt

Dk, SDk

). (6.45)

Theorem: Let SDk(i)i≥0, ωk(i)i≥0, Sopt

Dk, ωopt

k be compact subsets of the compact

metric space (M, r) such that

SDk(i)

rh→ SoptDk

ωk(i)rh→ ωopt

k (6.46)

and let MSEk : M×M → R be a continuous function.

Now, let conditions 1) and 2) hold. Then, for the proposed algorithms, we have

limi→∞

MSEk

(SDk

(i), ωk(i)

)= MSEk

(Sopt

Dk, ωopt

k

). (6.47)

A general proof of this theorem is detailed in [104, 105].

6.4.5 Convergence to the Optimal Reduced-Rank Estimator

In this section, we show that the proposed reduced–rank algorithm globally and exponen-

tially converges to the optimal reduced-rank estimator [106, 107]. To proceed with our

proof, let us rewrite the expressions in (6.38) and (6.39) for time instant 0 as follows:

Rk(0)SDk(0)Rωk

(0) = PDk(0) = pk(0)ω

Hk (0) (6.48)

and

Rk(0)ωk(1) = SHDk

(0)Rk(0)SDk(0)ωk(1) = pk(0). (6.49)

Using (6.48), we can obtain the following relation:

Rωk(0) =

(SH

Dk(0)R2

k(0)SDk(0)

)−1

SDk(0)Rk(0)pk(0)ω

Hk (0). (6.50)

Substituting the aforementioned result for Rωk(0) into the expression in (6.48), we get a

recursive expression for SDk(0) as

SDk(0) = R−1

k (0)pk(0)ωHk (0)

(SH

Dk(0)Rk(0)pk(0)ω

Hk (0)

)−1(SH

Dk(0)R2

k(0)SDk(0)

)−1

.

(6.51)




Using (6.48), we can express ωk(1) as

ωk(1) =

(SH

Dk(0)Rk(0)SDk

(0)

)−1

SHDk

(0)pk(0). (6.52)

For the proposed DRJIO–NLMS and DRJIO–RLS, the relation is

ωk(1) = SDk(1)

∑l∈Nk

cklωl(1). (6.53)

As a result, we obtain

ωk(1) = R−1k (1)pk(1)ω

Hk (1)

(SH

Dk(1)Rk(1)pk(1)ω

Hk (1)

)−1(SH

Dk(1)R2

k(1)SDk(1)

)−1

×∑l∈Nk

ckl

(SH

Dl(0)Rl(0)SDl

(0)

)−1

SHDl(0)pl(0). (6.54)

More generally, we can express the proposed algorithms by the following recursion:

ωk(i) = SDk(i)

∑l∈Nk

cklωl(i)

= R−1k (i)pk(i)ω

Hk (i)

(SH

Dk(i)Rk(i)pk(i)ω

Hk (i)

)−1(SH

Dk(i)R2

k(i)SDk(i)

)−1

×∑l∈Nk

ckl

(SH

Dl(i− 1)Rl(i− 1)SDl

(i− 1)

)−1

SHDl(i− 1)pl(i− 1). (6.55)

At this point, we assume that when the algorithms convergence, the SDk(i) for each

node k are the same. Because the optimal reduced–rank filter can be described by the

EVD of R−1/2k (i)pk(i) [20], [21], where R−1/2

k (i) is the square root of the matrix Rk(i),

and pk(i) is the crosscorrelation vector, we have

R−1/2k (i)pk(i) = ΦkΛkΦ

Hk pk(i), (6.56)

where Λk is a M ×M diagonal matrix with the eigenvalues of Rk, and Φk is a M ×M

unitary matrix with the eigenvectors of Rk. Let us assume that there exists some ωk(0)

such that the randomly selected SDk(0) can be written as [21]

SDk(0) = R

−1/2k (i)Φkωk(0), (6.57)

Using (6.56) and (6.57) in (6.55) together with the assumption and manipulating the al-

gebraic expressions, we can express (6.55) in a more compact way that is suitable for

analysis, as given by

ωk(i) =∑l∈Nk

cklΛ2lωl(i− 1)

(ωH

l (i− 1)Λ2lωl(i− 1)

)−1ωH

l (i− 1)ωl(i− 1). (6.58)




The aforementioned expression can be decomposed as follows:

ωk(i) =∑l∈Nk

cklQl(i)Ql(i− 1) . . .Ql(1)ωl(0), (6.59)

where

Ql(i) = Λ2il ωl(0)

(ωH

l (0)Λ4i−2l ωl(0)

)−1ωH

l (0)Λ2i−2l . (6.60)

At this point, we need to establish that the norm of SDk(i), for all i, is both lower and up-

per bounded, i.e., 0 <∥ SDk(i) ∥< ∞, for all i, and that ωk(i) = SDk

(i)∑

l∈Nkcklωl(i)

exponentially approaches ωk,opt(i) as i increases. Due to the linear mapping, the bound-

edness of SDk(i) is equivalent to the boundedness of ωk(i). Therefore, we have, upon

convergence, ωHk (i)ωk(i− 1) = ωH

k (i− 1)ωk(i− 1).

Because ∥ ωHk (i)ωk(i − 1) ∥≤∥ ωH

k (i − 1) ∥∥ ωk(i) ∥ and ∥ ωHk (i − 1)ωk(i −

1) ∥=∥ ωk(i − 1) ∥2, the relation ωHk (i)ωk(i − 1) = ωH

k (i − 1)ωk(i − 1) implies that

∥ ωk(i) ∥≥∥ ωk(i− 1) ∥, and hence

∥ ωk(∞) ∥≥∥ ωk(i) ∥≥∥ ωk(0) ∥ . (6.61)

To show that the upper bound ∥ ωk(∞) ∥ is finite, let us express the M × M matrix

Qk(i) as a function of the M × 1 vector ωl(i) =

ωl,1(i)

ωl,2(i)

and the M × M matrix

Λ =

Λl,1

Λl,2

. Substituting the previous expressions of ωl(i) and Λkl into Ql(i)

as given in (6.60), we obtain

Ql(i) =

Λ2il,1ωl,1(0)

Λ2il,2ωl,2(0)

(ωH

l,1(0)Λ4i−2l,1 ωl,1(0)+ω

Hl,2(0)Λ

4i−2l,2 ωl,2(0)

)−1

ωHl,1(0)Λ

2i−2l,1

ωHl,2(0)Λ

2i−2l,2

(6.62)

Using the matrix identity (A+B)−1 = A−1 −A−1B(I +A−1B)−1A−1 in the decom-

posedQl(i) in (6.62) and making i large, we get

Ql(i) = diag(1 . . . 1︸︷︷︸D

0 . . . 0︸︷︷︸M−D

) +Ol

(ϵl(i)

), (6.63)

where ϵl(i) = (λr+1/λr)2i, in which λr+1 and λr are the (r + 1)th and the rth largest

singular values ofR−1/2l (i)pl(i), respectively, and O(·) denotes the order of the argument.

Based on (6.63), it follows that, for some positive constant g, we have ∥ ωl(i) ∥≤ 1 +




gϵl(i). Based on (6.59), we obtain

∥ ωk(∞) ∥ ≤∑l∈Nk

ckl(∥ Ql(∞) ∥ . . . ∥ Ql(2) ∥∥ Ql(1) ∥∥ Ql(0) ∥)

≤∑l∈Nk

ckl

(∥ ωl(0) ∥

∞∏i=0

(1 + gϵl(i)

))

=∑l∈Nk

ckl

(∥ ωl(0) ∥ exp

( ∞∑i=1

log(1 + gϵl(i)

)))

≤∑l∈Nk

ckl

(∥ ωl(0) ∥ exp

( ∞∑i=1

gϵl(i)

))=∥

∑l∈Nk

ckl

(∥ ωl(0) ∥ exp

(g

1− (λr+1/λr)2

)). (6.64)

With the previous development, the norm of ωk(i) is proven to be both lower and upper

bounded. Once this case has been established, the expression in (6.55) converges for a

sufficiently large i to the reduced-rank Wiener filter. This condition is verified by equating

the terms of (6.58), which yields

ωk(i) =∑l∈Nk

ckl

(R

−1/2l (i)Φl,1Λl,1Φ

Hl,1pl(i) +Ol

(ϵl(i)

))(6.65)

where Φl,1 is a M × D matrix with the D largest eigenvectors of Rl(i), and Λl,1 is a

D ×D matrix with the largest eigenvalues ofRl(i).

6.5 Simulation results

In this section, we investigate the performance of the proposed DRJIO–NLMS algorithm

and DRJIO–RLS algorithm for distributed estimation in two scenarios: wireless sensor

networks and smart grids.

6.5.1 Wireless Sensor Networks

In this subsection, we compare our proposed DRJIO–NLMS algorithm and DRJIO–RLS

algorithm with the distributed NLMS algorithm (normalized version of [2]), distributed




0 100 200 300 400 500 600 700 800 900−30

−25

−20

−15

−10

−5

0

5

10

Time instant, i

MS

E(d

B)

DRJIO−RLS(D=5)Distributed PrincipalSubspace Estimation(D=5)[93]Krylov Subspace NLMS(D=5)[95]Distributed RLS(M=20)[20]Distributed NLMS(M=20)[2]DRJIO−NLMS(D=5)

Figure 6.4: Full–rank system with M=20

RLS algorithm [20], Krylov subspace NLMS [95] and distributed principal subspace es-

timation [93], based on their MSE performance. With the network topology structure

outlined in Fig. 6.1 with N = 20 nodes, we consider numerical simulations under three

scenarios

• Full–rank system with M=20

• Sparse system with M=20 (D valid coefficients and M −D zeros coefficients)

• Full–rank system with M=60

The input signal is generated as xk(i) = [xk(i) xk(i − 1) ... xk(i − M + 1)] and

xk(i) = uk(i) + αkxk(i − 1), where αk is a correlation coefficient and uk(i) is a white

noise process with variance σ2u,k = 1− |αk|2, to ensure the variance of xk(i) is σ2

x,k = 1.

The noise samples are modeled as complex Gaussian noise with variance of σ2n,k = 0.001.

We assume that the network has perfect transmission between linked nodes.




0 100 200 300 400 500 600 700 800 900−30

−25

−20

−15

−10

−5

0

5

Time instant, i

MS

E(d

B)


Figure 6.5: Sparse system with M=20

The step size µ0 for the distributed NLMS algorithm, Krylov subspace NLMS, dis-

tributed principal subspace estimation and DRJIO–NLMS is 0.15 and the η0 is set to 0.5.

For the distributed RLS algorithm and DRJIO–RLS algorithm, the forgetting factor λ is

equal to 0.99 and δ is set to 0.11. In Fig. 6.4, we compare the proposed DRJIO–NLMS,

DRJIO–RLS with the existing strategies using the full–rank system with M=20 and D=5.

The dimensionality reduction matrix SDk(0) is initialized as [ID 0D,M−D]

T . We observe

that the proposed DRJIO–RLS algorithm has the best performance when compared with

other algorithms, while the proposed DRJIO–NLMS algorithm also has a better perfor-

mance, which is very close to the distributed RLS algorithm. However, its complexity is

an order of magnitude lower than the distributed RLS algorithm and DRJIO–RLS algo-

rithm.

In a sparse system with M=20 scenario, the proposed DRJIO–RLS and DRJIO–NLMS

algorithms still have excellent performances as shown in Fig. 6.5. Specifically, the pro-

posed DRJIO–NLMS algorithm performs very close to the distributed RLS algorithm

and outperforms the other analyzed algorithms. When the full–rank system M increas-

es to 60, Fig. 6.6 illustrates that, the proposed DRJIO–RLS algorithm still has the best




0 100 200 300 400 500 600 700 800 900−30

−25

−20

−15

−10

−5

0

5

10

15

Time instant, i

MS

E(d

B)


Figure 6.6: Full–rank system with M=60

performance, while DRJIO–NLMS algorithm also shows a fast convergence rate, which

is comparable to the distributed RLS algorithm. For the distributed NLMS algorithm,

Krylov subspace NLMS and distributed principal subspace estimation, their convergence

speed is much lower.

At last, we compare the performance between the proposed DRJIO–NLMS and DCE

scheme in Chapter 5, under different sparsity level scenarios. The step size for both

algorithms are set to 0.3 and the η0 for DRJIO–NLMS is set to 0.5. The length of the

unknown parameter ω0 is 20 and D = 10. For the first scenario, the number of non–zero

coefficients in the unknown parameter is 3 and for the second scenario, the number of

non–zero coefficients is set to 10. The comparison results are shown in Fig. 6.7 and 6.8.

It is clear that in a very sparse system, the proposed DCE scheme performs better than

the DRJIO–NLMS algorithm. With the decrease of system sparse level, the proposed

DRJIO–NLMS algorithm performs better than the DCE scheme.




0 50 100 150 200 250 300 350 400−30

−25

−20

−15

−10

−5

0

5

10

Time instant, i

MS

E(d

B)

Sparsity Level S=3

DRJIO−NLMSDCE Scheme

Figure 6.7: DRJIO–NLMS vs DCE scheme with sparsity level S=3

0 50 100 150 200 250 300 350 400−30

−25

−20

−15

−10

−5

0

5

10

Time instant, i

MS

E(d

B)

Sparsity Level S=10

DRJIO−NLMSDCE Scheme

Figure 6.8: DRJIO–NLMS vs DCE scheme with sparsity level S=10




6.5.2 Smart Grids

In order to test the proposed algorithms in a possible smart grid scenario, we consider

the Hierarchical IEEE 14–bus system which has been proposed in [108], where 14 is the

number of substations. At every time instant i, each bus k, k = 1, 2, . . . , 14, takes a scalar

measurement dk(i) according to

dk(i) = Xk

(ω0(i)

)+ nk(i), k = 1, 2, . . . , 14, (6.66)



mean equal to zero and which corresponds to bus k.

We focus on the linearized DC state estimation problem. We assume that each bus

connects and measures three users’ state, as a result, for the IEEE–14 bus system, there

will be 42 users in the system. The system is built with 1.0 per unit (p.u) voltage magni-

tudes at all users and j1.0 p.u. branch impedance. Then, the state vector ω0(i) is taken as

the voltage phase angle vector ω0 for all users. Initially, each bus only knows the voltage

phase angle of the three users connected to it. With the help of distributed estimation

algorithms, each bus will estimate the state of the voltage phase angles for all users in

the system. Therefore, the nonlinear measurement model for state estimation (6.66) is

approximated by

dk(i) = xHk (i)ω0 + nk(i), k = 1, 2, . . . , 14, (6.67)

where xk(i) is the measurement Jacobian vector for bus k. Then, the aim of the distribut-

ed estimation algorithm is to compute an estimate of ω0, which can minimize the cost

function given by


and the global network cost function is described by

Jω(ω) =N∑k=1

E|dk(i)− xHk (i)ω|2. (6.69)

We compare the proposed algorithms with the M–CSE algorithm [4], the distributed RLS

algorithm [20], the distributed NLMS algorithm (normalized version of [2]) and distribut-

ed principal subspace estimation [93] in terms of MSE performance. The MSE compari-




1

5

2

9

14

10

7

4

8

3

12

13

6

11

Figure 6.9: Hierarchical IEEE 14–bus system

son is used to determine the accuracy of the algorithms and the rate of convergence. We

define the Hierarchical IEEE–14 bus system as in Fig. 6.9.

All buses are corrupted by additive white Gaussian noise with variance σ2n,k = 0.001.

The step size for the distributed NLMS [2], the proposed DRJIO–NLMS algorithms is

0.15 and the η0 is set to 0.5. The parameter vector ω0 is set to an all–one vector with size

42 × 1. For the distributed RLS, DRJIO–RLS algorithms the forgetting factor λ is set to

0.99 and δ is equal to 0.11. The reduced dimension D is set to 10 for both DRJIO–RLS

and DRJIO–NLMS algorithm. The results are averaged over 100 independent runs. We

simulate the proposed algorithms for smart grids under a static scenario.

From Fig. 6.10, it can be seen that the proposed DRJIO–RLS algorithm has the best

performance, and significantly outperforms the distributed NLMS [2] and the M–CSE [4]

algorithms. The DRJIO–NLMS is slightly worse than distributed RLS algorithm [20], but

better than the distributed NLMS and M–CSE algorithms.




0 100 200 300 400 500 600 700 800 900 −30

−25

−20

−15

−10

−5

0

5

10

Time instant, i

MS

E(d

B)

DRJIO−RLS(M=10)M−CSE[4]Distributed RLS[20]Distributed NLMS[2]DRJIO−NLMS(M=10)Distributed PrincipalSubspace Estimation(D=10)[93]

Figure 6.10: MSE performance for smart grids

6.6 Summary

In this chapter, we have proposed a novel distributed reduced–rank scheme along with

efficient algorithms for distributed estimation in wireless sensor networks and smart grids.

Simulation results have shown that the proposed DRJIO–RLS has the best performance,

while DRJIO–NLMS algorithm has a better performance and lower cost than existing

algorithms in all the three scenarios considered. We have also compared the proposed

algorithms with the DCE scheme, which was presented in chapter 5, for systems with

different levels of sparsity. Furthermore, the proposed scheme requires the transmission

of only D parameters instead of M , resulting in higher bandwidth efficiency than standard

schemes.


Chapter 7

Conclusions and Future Work

Contents7.1 Summary of the Work . . . . . . . . . . . . . . . . . . . . . . . . . . 142

7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

7.1 Summary of the Work

In this thesis, a number of innovative distributed cooperative strategies for dealing with

exchange of information, node failures and compression of data have been considered

for wireless sensor networks, spectrum estimation and smart grids, which require low

complexity and high performance algorithms. The proposed distributed algorithms have

been studied in statistical inference problems and found to be highly effective and to

outperform previously reported algorithms in the literature in several applications..

In Chapter 3, distributed adaptive algorithms based on the conjugate CG method for

distributed networks have been presented. Both incremental and diffusion adaptive so-

lutions are considered. In particular, the IDCCG/IDMCG and DDCCG/DDMCG algo-

rithms have been proposed for applications to parameter and spectrum estimation. The

distributed CCG and MCG algorithms provide an improved performance in terms of MSE

as compared with LMS–based algorithms and a performance that is close to RLS algo-

rithms. The design of preconditioners for CG algorithms, which have the ability to im-


142

2015

CHAPTER 7. CONCLUSIONS AND FUTURE WORK 143

prove the performance of the proposed CG algorithms is also presented in this chapter.

The resulting algorithms are distributed, cooperative and able to respond in real time to

changes in the environment. Simulation results have proved the advantages of the pro-

posed IDCCG/IDMCG and DDCCG/DDMCG algorithms in different applications. The

distributed CG algorithms outperform LMS algorithms and have a close performance to

RLS algorithms without the numerical problems of the latter. Designers could implement

the distributed CG algorithms in hardware and deploy sensor networks based on these

tools.

In Chapter 4, adaptive link selection algorithms for distributed estimation and their

application to wireless sensor networks and smart grids have been investigated. Specifi-

cally, based on the LMS/RLS strategies, exhaustive search-based LMS/RLS link selection

algorithms and sparsity-inspired LMS/RLS link selection algorithms that can exploit the

topology of networks with poor-quality links have been considered. The proposed link se-

lection algorithms were then analyzed in terms of their stability, steady-state and tracking

performance, and computational complexity. We have compared the proposed algorithms

with existing methods. In comparison with existing centralized or distributed estimation

strategies, more accurate estimates and faster convergence speed can be obtained for the

proposed algorithms and the network is equipped with the ability of link selection that

can circumvent link failures and improve the estimation performance. We have also de-

vised analytical expressions to predict their MSE steady–state performance and tracking

behavior. Simulation experiments have been conducted to verify the analytical results and

illustrate that the proposed algorithms significantly outperform the existing strategies, in

both static and time–varying scenarios, in examples of wireless sensor networks and s-

mart grids. Furthermore, the proposed link selection algorithms could be used in any

distributed scenario where the topology is not optimized in order to further improve the

performance.

In Chapter 5, a novel distributed compressed estimation scheme, namely DCE scheme,

has been introduced for sparse signals and systems based on compressive sensing tech-

niques. The proposed scheme consists of compression and decompression modules in-

spired by compressive sensing to perform distributed compressed estimation. In the DCE

scheme, the estimation procedure has been performed in a compressed dimension. A

design procedure has also been presented and an algorithm developed to optimize mea-



surement matrices, which can further improve the performance of the proposed distributed

compressed estimation scheme. The simulation results for a WSN application have show

that the DCE scheme outperforms existing strategies in terms of convergence rate, re-

duced bandwidth and MSE performance. Designers could implement the DCE scheme in

hardware for scenarios with sparse signals and deploy sensor networks for detection and

estimation based on this scheme.

In Chapter 6, the challenge of estimating large dimension unknown parameter vectors

in wireless sensor networks and smart grids that requires large communication bandwidth

is addressed. A novel distributed reduced-rank scheme and adaptive algorithms have

been proposed for distributed estimation in wireless sensor networks and smart grids. In

particular, we have developed a novel dimensionality reduction scheme and adaptive al-

gorithms for performing distributed dimensionality reduction and computing low–rank

approximations of unknown parameter vectors. The proposed scheme is based on a trans-

formation that performs dimensionality reduction at each agent of the network followed

by a reduced–dimension parameter vector. The DRJIO–NLMS and DRJIO–RLS have

been developed to achieve significantly reduced communication overhead and improved

performance. Simulation results illustrate the advantages of the proposed strategy in terms

of convergence rate and MSE performance. In addition, the proposed DRJIO–NLMS and

DRJIO–RLS algorithms could be used in any distributed scenario for signals that exhibit

some level of sparsity or redundancy in order to further improve the performance.

7.2 Future Work

Many of the proposed schemes and algorithms detailed in this thesis have potential to be

applied to scenarios, systems and techniques outside the scope of this thesis, and there

is further work and analysis that could be considered to extend the work that has been

covered.

The proposed distributed CG based algorithms in chapter 3 can be extended to a

sparsity–aware version and a cooperation with CETUC, PUC–Rio will be carried out

on the topic of ”Sparsity–Aware Distributed Conjugate Gradient Algorithm for Spectrum



Sensing”. This could be particularly useful in spectrum sensing problems where the spec-

trum content can be modelled as a sparse parameter vector and exploited via sparsity–

aware distributed CG algorithms.

New distributed network cooperative protocols can also be investigated and compared

with current protocols proposed in the thesis. Muti–task distributed processing, where

the task is to estimate multiple parameter vectors, can also be considered to deal with

parameter estimation in big data problems and heterogeneous networks.

In chapter 4, the proposed link selection strategies softly change the network topology

in wireless sensor networks and smart grids by only employing a selected subset of links,

which correspond to the effective topology used for a desired task. Further research on

physical dynamic topology adaptation strategies can be carried out as future work.

The algorithms and systems described in this thesis have not considered the channel

state between linked nodes. In order to broaden the application of the algorithms in more

realistic scenarios, the flat fading or fast fading channel design may be considered in the

future.


Bibliography

[1] C. G. Lopes and A. H. Sayed, “Incremental adaptive strategies over distributed

networks,” IEEE Trans. Signal Process., vol. 48, no. 8, pp. 223–229, Aug 2007.

[2] ——, “Diffusion least–mean squares over adaptive networks: Formulation and per-

formance analysis,” IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3122–3136,

July 2008.

[3] Y. Chen, Y. Gu, and A. O. Hero, “Sparse LMS for system identification,” in

Proc. IEEE International Conference on Acoustics, Speech and Signal Processing,

Taipei, Taiwan, April 2009, pp. 3125–3128.

[4] L. Xie, D. H. Choi, S. Kar, and H. V. Poor, “Fully distributed state estimation for

wide-area monitoring systems,” IEEE Trans. Smart Grid, vol. 3, no. 3, pp. 1154–

1169, September 2012.

[5] A. H. Sayed and C. G. Lopes, “Adaptive processing over distributed networks,”

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., vol. E90-A, no. 8, pp.

1504–1510, August 2007.

[6] O. Axelsson, Iterative Solution Methods. New York, NY, USA: Cambridge Uni-

versity Press, 1995.

[7] G. H. Golub and C. F. V. Loan, Matrix Computations, 3rd ed. Baltimore, MD,

USA: Johns Hopkins University Press, 1996.

[8] P. S. Chang and J. A. N. Willson, “Analysis of conjugate gradient algorithms for

adaptive filtering,” IEEE Trans. Signal Process., vol. 48, no. 2, pp. 409–418, Febru-

ary 2000.


146

2015

BIBLIOGRAPHY 147

[9] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ, USA: Prentice

Hall, 2002.

[10] A. H. Sayed, Fundamentals of Adaptive Filtering. Hoboken, NJ, USA: John

Wiley&Sons, 2003.

[11] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theor., vol. 52, no. 4, pp.

1289–1306, April 2006.

[12] E. J. Candes and M. B. Wakin, “An introduction to compressive sampling,” IEEE

Signal. Proc. Mag., vol. 25, no. 2, pp. 21–30, March 2008.

[13] R. C. de Lamare and R. Sampaio-Neto, “Adaptive reduced-rank processing based

on joint and iterative interpolation, decimation and filtering,” IEEE Trans. Signal

Process., vol. 57, no. 7, pp. 2503–2514, July 2009.

[14] A. H. Sayed, “Adaptation, learning, and optimization over networks,” Foundations

and Trends in Machine Learning., vol. 1, no. 4-5, pp. 311–801, July 2014.

[15] ——, “Diffusion adaptation over networks,” E-Reference Signal Processing, R.

Chellapa and S. Theodoridis, editors, Elsevier, 2013.

[16] L. Li and J. A. Chambers, “Distributed adaptive estimation based on the APA al-

gorithm over diffusion networks with changing topology,” in Proc. IEEE/SP 15th

Workshop on Statistical Signal Processing, Cardiff, Wales, September 2009, pp.

757–760.

[17] F. S. Cattivelli and A. H. Sayed, “Diffusion LMS strategies for distributed estima-

tion,” IEEE Trans. Signal Process., vol. 58, p. 10351048, March 2010.

[18] G. Mateos, I. D. Schizas, and G. B. Giannakis, “Distributed Recursive Least-

Squares for Consensus-Based In-Network Adaptive Estimation,” IEEE Trans. Sig-

nal Process., vol. 57, no. 11, pp. 4583–4588, November 2009.

[19] A. Bertrand and M. Moonen, “Distributed adaptive node–specific signal estimation

in fully connected sensor networks–part II: Simultaneous and asynchronous node

updating,” IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5292–5306, 2010.


BIBLIOGRAPHY 148

[20] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion recursive least-squares

for distributed estimation over adaptive networks,” IEEE Trans. Signal Process.,

vol. 56, no. 5, pp. 1865–1877, May 2008.

[21] F. Cattivelli and A. H. Sayed, “Diffusion strategies for distributed kalman filter-

ing and smoothing,” IEEE Trans. Autom. Control, vol. 55, no. 9, pp. 2069–2084,

September 2010.

[22] J. A. Bazerque and G. B. Giannakis, “Distributed spectrum sensing for cognitive

radio networks by exploiting sparsity,” IEEE Trans. Signal Process., vol. 58, no. 3,

pp. 1847–1862, March 2010.

[23] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis

pursuit,” SIAM JOURNAL ON SCIENTIFIC COMPUTING, vol. 20, pp. 33–61,

1998.

[24] Y. V. Zakharov, T. C. Tozer, and J. F. Adlard, “Polynomial spline–approximation of

clarke’s model,” IEEE Trans. Signal Process., vol. 52, no. 5, pp. 1198–1208, May

2004.

[25] P. D. Lorenzo, S. Barbarossa, and A. H. Sayed, “Distributed spectrum estimation

for small cell networks based on sparse diffusion adaptation,” IEEE Signal Pro-

cessing Letters, vol. 20, no. 12, pp. 1261–1265, December 2013.

[26] A. Bose, “Smart transmission grid applications and their supporting infrastructure,”

IEEE Trans. Smart Grid, vol. 1, no. 1, pp. 11–19, June 2010.

[27] X. Zhao and A. H. Sayed, “Single-link diffusion strategies over adaptive networks,”

in Proc. IEEE International Conference on Acoustics, Speech and Signal Process-

ing, Kyoto, Japan, March 2012, pp. 3749–3752.

[28] S. Xu, R. C. de Lamare, and H. V. Poor, “Dynamic topology adaptation for dis-

tributed estimation in smart grids,” in Proc. IEEE 5th International Workshop on

Computational Advances in Multi-Sensor Adaptive Processing, December 2013,

pp. 420–423.

[29] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Syst. Control

Lett., vol. 53, no. 1, pp. 65–78, September 2004.


BIBLIOGRAPHY 149

[30] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents

with switching topology and time-delays,” IEEE Trans. Autom. Control., vol. 49,

pp. 1520–1533, September 2004.

[31] A. Jadbabaie, J. Lin, and A. S. Morse, “Coordination of groups of mobile au-

tonomous agents using nearest neighbor rules,” IEEE Trans. Autom. Control,

vol. 48, no. 6, pp. 988–1001, June 2003.

[32] S. Haykin, Adaptive Filter Theory, 3rd ed. Upper Saddle River, NJ, USA: Prentice

Hall, 1996.

[33] R. G. Baraniuk, “Compressive sensing,” IEEE Signal Process. Mag., vol. 24, no. 4,

pp. 118–121, July 2007.

[34] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal

reconstruction from highly incomplete frequency information,” IEEE Trans. Inf.

Theor., vol. 52, no. 2, pp. 489–509, February 2006.

[35] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf.

Theor., vol. 51, no. 12, pp. 4203–4215, December 2005.

[36] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,”

IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, December 1993.

[37] J. A. Tropp and A. Gilbert, “Signal recovery from random measurements via or-

thogonal matching pursuit,” IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655–

4666, December 2007.

[38] D. L. Donoho, Y. Tsaig, I. Drori, and J. L. Starck, “Sparse solution of underde-

termined systems of linear equations by stagewise orthogonal matching pursuit,”

IEEE Trans. Inf. Theor., vol. 58, no. 2, pp. 1094–1121, February 2012.

[39] D. Needell and J. A. Tropp, “Cosamp: Iterative signal recovery from incomplete

and inaccurate samples,” Appl. Comp. Harmonic Anal., vol. 26, no. 3, p. 301321,

May 2009.

[40] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sens-

ing,” Appl. Comp. Harmonic Anal., vol. 27, no. 3, pp. 265–274, November 2009.


BIBLIOGRAPHY 150

[41] R. G. Baraniuk, M. Davenport, R. DeVore, and M. B. Wakin, “A simple proof of the

restricted isometry principle for random matrices (aka the johnson-lindenstrauss

lemma meets compressed sensing),” Constructive Approximation, vol. 28, no. 3,

pp. 253–263, Springer New York, 2007.

[42] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit:

recursive function approximation with applications to wavelet decomposition,” in

Proc. Conference Record of The Twenty-Seventh Asilomar Conference on Signals,

Systems and Computers, Pacic Grove, CA, November 1993, pp. 40–44 vol.1.

[43] P. D. Lorenzo and A. H. Sayed, “Sparse distributed learning based on diffusion

adaptation,” IEEE Trans. Signal Process., vol. 61, no. 6, pp. 1419–1433, March

2013.

[44] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the

Royal Statistical Society, Series B, vol. 58, pp. 267–288, 1994.

[45] E. J. Candes, M. Wakin, and S. Boyd, “Enhancing sparsity by reweighted ℓ1 mini-

mization,” J. Fourier Anal. Appl., vol. 14, pp. 877–905, 2007.

[46] S. Xu, R. C. de Lamare, and H. V. Poor, “Adaptive link selection strategies for

distributed estimation in diffusion wireless networks,” in Proc. IEEE International

Conference on Acoustics, Speech, and Signal Processing, May 2013, pp. 5402–

5405.

[47] A. H. Sayed and C. G. Lopes, “Distributed recursive least-squares strategies over

adaptive networks,” in Fortieth Asilomar Conference on Signals, Systems and Com-

puters, October 2006, pp. 233–237.

[48] S. Xu, R. C. de Lamare, and H. V. Poor, “Distributed compressed estimation based

on compressive sensing,” IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1311–

1315, September 2015.

[49] M. S. E. Abadi and Z. Saffari, “Distributed estimation over an adaptive diffusion

network based on the family of affine projection algorithms,” in Sixth International

Symposium on Telecommunications, November 2012, pp. 607–611.


BIBLIOGRAPHY 151

[50] L. Wang and R. C. de Lamare, “Constrained adaptive filtering algorithms based

on conjugate gradient techniques for beamforming,” IET Signal Processing, vol. 4,

no. 6, pp. 686–697, December 2010.

[51] I. D. Schizas, G. Mateos, and G. B. Giannakis, “Distributed LMS for consensus–

based in-network adaptive processing,” IEEE Trans. Signal Process., vol. 57, no. 6,

pp. 2365–2382, June 2009.

[52] P. S. Chang and J. A. N. Willson, “Adaptive filtering using modified conjugate gra-

dient,” in Proceedings., Proceedings of the 38th Midwest Symposium on Circuits

and Systems, vol. 1, August 1995, pp. 243–246.

[53] R. Fletcher, Practical Methods of Optimization, 2nd ed. Chichester, U.K.: Wiley,

1987.

[54] D. F. Shanno, “Conjugate gradient methods with inexact searches,” Mathematics

of Operations Research, vol. 3, no. 3, pp. 244–256, 1978.

[55] X. Zhao and A. H. Sayed, “Performance limits for distributed estimation over lms

adaptive networks,” IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5107–5124,

October 2012.

[56] M. Benzi, “Preconditioning techniques for large linear systems: A survey,” Journal

of Computational Physics, vol. 182, no. 2, pp. 418–477, 2002.

[57] S. C. Eisenstat, “Efficient implementation of a class of preconditioned conjugate

gradient methods,” SIAM Journal on Scientific and Statistical Computing, vol. 2,

no. 1, pp. 1–4, 1981.

[58] A. V. Knyazev, “Toward the optimal preconditioned eigensolver: Locally opti-

mal block preconditioned conjugate gradient method,” SIAM Journal on Scientific

Computing, vol. 23, no. 2, pp. 517–541, 2001.

[59] O. Axelsson and G. Lindskog, “On the rate of convergence of the preconditioned

conjugate gradient method,” Numerische Mathematik, vol. 48, no. 5, pp. 499–523,

1986.

[60] D. Bertsekas, “A new class of incremental gradient methods for least squares prob-

lems,” SIAM J. Optim, vol. 7, no. 4, p. 913926, 1997.


BIBLIOGRAPHY 152

[61] A. Nedic and D. Bertsekas, “Incremental subgradient methods for nondifferen-

tiable optimization,” SIAM J. Optim, vol. 12, no. 1, p. 109138, 2001.

[62] M. G. Rabbat and R. D. Nowak, “Quantized incremental algorithms for distributed

optimization,” IEEE J. Sel. Areas Commun, vol. 23, no. 4, p. 798808, April 2005.

[63] P. D. Lorenzo, S. Barbarossa, and A. H. Sayed, “Sparse diffusion LMS for dis-

tributed adaptive estimation,” in Proc. IEEE International Conference on Acous-

tics, Speech, and Signal Processing, Kyoto, Japan, March 2012, pp. 3281–3284.

[64] C. G. Lopes and A. H. Sayed, “Diffusion adaptive networks with changing topolo-

gies,” in Proc. IEEE International Conference on Acoustics, Speech and Signal

Processing, Las Vegas, 2008, pp. 3285–3288.

[65] B. Fadlallah and J. Principe, “Diffusion least-mean squares over adaptive networks

with dynamic topologies,” in Proc. IEEE International Joint Conference on Neural

Networks, Dallas, TX, USA 2013.

[66] T. Wimalajeewa and S. K. Jayaweera, “Distributed node selection for sequential

estimation over noisy communication channels,” IEEE Trans. Wirel. Commun.,

vol. 9, no. 7, pp. 2290–2301, July 2010.

[67] R. C. de Lamare and P. S. R. Diniz, “Set-membership adaptive algorithms based

on time-varying error bounds for CDMA interference suppression,” IEEE Trans.

Vehi. Techn., vol. 58, no. 2, pp. 644–654, February 2009.

[68] L. Guo and Y. F. Huang, “Frequency-domain set-membership filtering and its ap-

plications,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1326–1338, April 2007.

[69] R. Meng, R. C. de Lamare, and V. H. Nascimento, “Sparsity-aware affine projection

adaptive algorithms for system identification,” in Proc. Sensor Signal Processing

for Defence Conference, London, UK, 2011.

[70] Y. Chen, Y. Gu, and A. Hero, “Regularized least-mean-square algorithms,” Techni-

cal Report for AFOSR, December 2010.

[71] S. Xu and R. C. de Lamare, “Distributed conjugate gradient strategies for distribut-

ed estimation over sensor networks,” in Proc. Sensor Signal Processing for Defence

2012, London, UK, 2012.


BIBLIOGRAPHY 153

[72] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Englewood Cliffs,

NJ, USA: Prentice-Hall, 2000.

[73] R. C. de Lamare and P. S. R. Diniz, “Blind adaptive interference suppression based

on set-membership constrained constant-modulus algorithms with dynamic bound-

s,” IEEE Trans. Signal Process., vol. 61, no. 5, pp. 1288–1301, March 2013.

[74] Y. Cai and R. C. de Lamare, “Low–complexity variable step-size mechanism for

code-constrained constant modulus stochastic gradient algorithms applied to cdma

interference suppression,” IEEE Trans. Signal Process., vol. 57, no. 1, pp. 313–323,

January 2009.

[75] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ,

USA: Prentice-Hall, 1985.

[76] E. Eweda, “Comparison of RLS, LMS, and sign algorithms for tracking random-

ly time-varying channels,” IEEE Trans. Signal Process., vol. 42, p. 29372944,

November 1994.

[77] D. Angelosante, J. A. Bazerque, and G. B. Giannakis, “Online adaptive estimation

of sparse signals: Where rls meets the ℓ1–norm,” IEEE Trans. Signal Process.,

vol. 58, no. 7, pp. 3436–3447, July 2010.

[78] S. Chouvardas, K. Slavakis, Y. Kopsinis, and S. Theodoridis, “A sparsity promoting

adaptive algorithm for distributed learning,” IEEE Transactions on Signal Process-

ing, vol. 60, no. 10, pp. 5412–5425, October 2012.

[79] G. Mateos, J. A. Bazerque, and G. B. Giannakis, “Distributed sparse linear regres-

sion,” IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5262–5276,

October 2010.

[80] Z. Liu, Y. Liu, and C. Li, “Distributed sparse recursive least-squares over network-

s,” IEEE Transactions on Signal Processing, vol. 62, no. 6, pp. 1386–1395, March

2014.

[81] M. O. Sayin and S. S. Kozat, “Compressive diffusion strategies over distributed

networks for reduced communication load,” IEEE Transactions on Signal Process-

ing, vol. 62, no. 20, pp. 5308–5323, October 2014.


BIBLIOGRAPHY 154

[82] J. Romberg, “Imaging via compressive sampling,” IEEE Signal. Proc. Mag.,

vol. 25, no. 2, pp. 14–20, March 2008.

[83] W. Bajwa, J. Haupt, A. Sayeed, and R. Nowak, “Compressive wireless sensing,”

in Proc. The Fifth International Conference on Information Processing in Sensor

Networks, Nashville, TN, April 2006, pp. 134–142.

[84] Y. Yao, A. P. Petropulu, and H. V. Poor, “MIMO radar using compressive sam-

pling,” IEEE J. Sel. Top. Sign. Proces., vol. 4, no. 1, pp. 146–163, February 2010.

[85] C. Wei, M. Rodrigues, and I. Wassell, “Distributed compressive sensing recon-

struction via common support discovery,” in IEEE International Conference on

Communications, Kyoto, Japan, June 2011.

[86] D. Baron, M. F. Duarte, M. B. Wakin, S. Sarvotham, and R. G. Baraniuk, “Dis-

tributed compressive sensing,” in ECE Department Tech. Report TREE-0612, Rice

University, Houston, TX, November 2006.

[87] G. Quer, R. Masiero, G. Pillonetto, M. Rossi, and M. Zorzi, “Sensing, compression,

and recovery for WSNs: Sparse signal modeling and monitoring framework,” IEEE

Trans. Wireless Commun., vol. 11, no. 10, pp. 3447–3461, October 2012.

[88] G. Davis, S. Mallat, and M. Avellaneda, “Adaptive greedy approximations,” Con-

structive Approximation, vol. 13, no. 1, pp. 57–98, 1997.

[89] Y. Yao, A. Petropulu, and H. V. Poor, “Measurement matrix design for compressive

sensing–based MIMO radar,” IEEE Trans. Signal Process., vol. 59, no. 11, pp.

5338–5352, November 2011.

[90] E. Msechu, S. Roumeliotis, A. Ribeiro, and G. Giannakis, “Decentralized quan-

tized kalman filtering with scalable communication cost,” IEEE Trans. Signal Pro-

cess., vol. 56, no. 8, pp. 3727–3741, October 2008.

[91] J. J. Xiao, A. Ribeiro, Z. Q. Luo, and G. Giannakis, “Decentralized quantized

kalman filtering with scalable communication cost,” IEEE Signal Process. Mag.,

vol. 23, no. 4, pp. 27–41, July 2006.

[92] S. Pereira and A. Pages-Zamora, “Distributed consensus in wireless sensor net-

works with quantized information exchange,” in Proc. IEEE 9th Workshop on Sig-

nal Processing Advances in Wireless Communications, July 2008, pp. 241–245.


BIBLIOGRAPHY 155

[93] L. Li, A. Scaglione, and J. H. Manton, “Distributed principal subspace estimation

in wireless sensor networks,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 4, pp.

725–738, 2011.

[94] M. O. Sayin and S. Kozat, “Single bit and reduced dimension diffusion strategies

over distributed networks,” IEEE Signal Process. Lett., vol. 20, no. 10, pp. 976–

979, 2013.

[95] S. Chouvardas, K. Slavakis, and S. Theodoridis, “Trading off complexity with com-

munication costs in distributed adaptive learning via krylov subspaces for dimen-

sionality reduction,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 2, pp. 257–273,

2013.

[96] R. C. de Lamare and R. Sampaio-Neto, “Reduced–rank adaptive filtering based on

joint iterative optimization of adaptive filters,” IEEE Signal Process. Lett., vol. 14,

no. 12, pp. 980–983, 2007.

[97] Y. Cai and R. C. de Lamare, “Adaptive linear minimum BER reduced–rank inter-

ference suppression algorithms based on joint and iterative optimization of filters,”

IEEE Commun. Lett., vol. 17, no. 4, pp. 633–636, April 2013.

[98] P. Clarke and R. C. de Lamare, “Low–complexity reduced-rank linear interference

suppression based on set-membership joint iterative optimization for ds–cdma sys-

tems,” IEEE Trans. Veh. Technol., vol. 60, no. 9, pp. 4324–4337, November 2011.

[99] R. C. de Lamare and R. Sampaio-Neto, “Adaptive reduced-rank equalization al-

gorithms based on alternating optimization design techniques for mimo systems,”

IEEE Trans. Vehi. Techn., vol. 60, no. 6, pp. 2482–2494, 2011.

[100] ——, “Reduced-rank space–time adaptive interference suppression with joint it-

erative least squares algorithms for spread-spectrum systems,” IEEE Trans. Vehi.

Techn., vol. 59, no. 3, pp. 1217–1228, 2010.

[101] S. Nuan, W. U. Alokozai, R. C. de Lamare, and M. Haardt, “Adaptive widely lin-

ear reduced–rank beamforming based on joint iterative optimization,” IEEE Signal

Process. Lett., vol. 21, no. 3, pp. 265–269, March 2014.


BIBLIOGRAPHY 156

[102] L. Wang and R. C. de Lamare, “Low–complexity constrained adaptive reduced-

rank beamforming algorithms,” IEEE Trans. Aerosp. Electron. Syst., vol. 49, no. 4,

pp. 2114–2128, October 2013.

[103] H. Zheng, S. R. Kulkarni, and H. V. Poor, “Attribute-distributed learning: Models,

limits, and algorithms,” IEEE Trans. Signal Process., vol. 59, no. 1, pp. 386–398,

January 2011.

[104] I. Csiszr and G. Tusndy, “Information geometry and alternating minimization pro-

cedures,” Statist. Decis.Supplement Issue., no. 1, pp. 205–237, 1984.

[105] U. Niesen, D. Shah, and G. W. Wornell, “Adaptive alternating minimization algo-

rithms,” IEEE Trans. Inf. Theory., vol. 55, no. 3, pp. 1423–1429, March 2009.

[106] L. L. Scharf, “The svd and reduced-rank signal processing,” Signal Process.,

vol. 25, no. 2, pp. 113–133, November 1991.

[107] Y. Hua and M. Nikpour, “Computing the reduced-rank wiener filter by iqmd,” IEEE

Signal Process. Lett., vol. 6, no. 9, pp. 240–242, September 1999.

[108] H. Ma, Y. Yang, Y. Chen, and K. J. R. Liu, “Distributed state estimation in smart

grid with communication constraints,” in Asia–Pacific Signal Information Process-

ing Association Annual Summit and Conference, December 2012, pp. 1–4.


distributed signal processing algorithms for wireless networksdelamare.cetuc.puc-rio.br/songcen xu...

Documents