
Master of Science Thesis in Electrical Engineering
Department of Electrical Engineering, Linköping University, 2020

Channel Equalization using Machine Learning for Underwater Acoustic Communications

Martin Allander


LiTH-ISY-EX--20/5301--SE

Supervisor: Dr. Özlem Tugfe Demir, ISY, Linköping University

Systems Engineer Oskar Axelsson, Saab Dynamics

Examiner: Associate Professor Emil Björnson, ISY, Linköping University

Division of Communication Systems
Department of Electrical Engineering

Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2020 Martin Allander

Summary

Wireless underwater acoustic communication is a developing field with a number of applications. The underwater acoustic channel is very special, and its behavior depends strongly on the environment in which the communication takes place. Compared to wireless radio communication, the bandwidth used is much smaller and the Doppler effect is much more pronounced, due to the slower propagation speed of sound. Literature published in recent years highlights machine-learning-assisted channel estimation and channel equalization over traditional signal processing methods. Machine learning can be advantageous because it can be difficult to design algorithms for underwater communication, as general channel models have proven hard to find. This study aims to explore whether machine-learning-assisted channel estimation and channel equalization can offer increased performance compared to traditional methods. The study examines supervised machine learning with a deep neural network and a recurrent neural network, to see whether the neural networks can improve performance in terms of the number of bit errors. A channel simulator with environment-specific input is used to study a number of different scenarios. The simulation results are intended to identify interesting environments in which to test the neural networks. The results of the study indicate that in highly time-varying channels, machine learning can lower the bit error rate if the networks are trained with prior information about the channel. Utilizing machine learning without prior information about the channel resulted in no improvement in performance.


Abstract

Wireless underwater acoustic (UWA) communications is a developing field with various applications. The underwater acoustic communication channel is very special and its behavior is environment-dependent. Compared to wireless radio communication, the UWA channel is characterized by low available bandwidth and a severe motion-induced Doppler effect. Recent literature suggests that machine learning (ML)-based channel estimation and equalization offer benefits over traditional techniques (a decision feedback equalizer) in UWA communications. ML can be advantageous due to the difficulty of designing algorithms for UWA communication, as finding general channel models has proven to be difficult. This study aims to explore whether ML-based channel estimation and equalization, as a part of a sophisticated physical layer structure, can offer improved performance. In the study, supervised ML using a deep neural network and a recurrent neural network is utilized to improve the bit error rate. A channel simulator with environment-specific input is used to study a wide range of channels. The simulations are utilized to determine in which environments ML should be tested. It is shown that in highly time-varying channels, ML outperforms traditional techniques if trained with prior information about the channel. However, utilizing ML without prior information about the channel yielded no improvement in performance.


Acknowledgments

A big thank you to Oskar Axelsson for hosting me at Saab Dynamics, for giving me the freedom to shape this thesis around my interests, and for all the support and feedback. At Saab Dynamics, I would also like to thank Per Abramhamsson and Simon Keisala: Per for helping me understand all the complications in modeling sonar propagation, and Simon for all the help with tweaking the neural networks.

Of course, a final thank you to Emil Björnson and Özlem Tugfe Demir. I appreciate all the feedback on the report and the support.

Linköping, June 2020
Martin Allander


Contents

List of Figures
List of Tables
Notation

1 Introduction
    1.1 Motivation
    1.2 Purpose
    1.3 Problem Formulation
    1.4 Limitations
    1.5 Background

2 Theoretical Background
    2.1 Underwater Channel Characteristics
        2.1.1 Attenuation
        2.1.2 Noise
        2.1.3 Multipath Propagation
        2.1.4 Doppler Effect
        2.1.5 Scattering
    2.2 Channel Models
    2.3 Baseline Physical Layer
        2.3.1 Transmitter
        2.3.2 Receiver
    2.4 Artificial Neural Networks
        2.4.1 Input and Output
        2.4.2 The Neuron
        2.4.3 The Network
        2.4.4 Training
        2.4.5 Designs Considerations
        2.4.6 Recurrent Neural Networks
        2.4.7 Long Short-Term Memory Architecture
    2.5 Previous Work
        2.5.1 Machine Learning in Wireless Radio Communication
        2.5.2 Machine Learning in Underwater Acoustic Communication
        2.5.3 Key Takeaways

3 Method
    3.1 System Model
        3.1.1 Baseline Receiver
        3.1.2 Machine Learning Receiver
        3.1.3 Choice of Parameters
        3.1.4 Bit Error Rate Definition
    3.2 Software Simulation Environment
        3.2.1 Machine Learning Software
        3.2.2 Channel Model
    3.3 Channel Simulation Configuration
        3.3.1 Bathymetry
        3.3.2 Sound Speed Profiles
        3.3.3 Bottom Sediment Types
        3.3.4 Channel Geometry and General Parameters
        3.3.5 Channel Variations
        3.3.6 Noise
    3.4 Channel Simulations
        3.4.1 Time-Variant Filter
        3.4.2 Simulation Loop
    3.5 Artificial Neural Network Structure
        3.5.1 Deep Neural Network
        3.5.2 Long Short-Term Memory Network
    3.6 Artificial Neural Network Experiments
        3.6.1 Training Data Generation
    3.7 Miscellaneous Studies

4 Results
    4.1 Artificial Neural Network Experiments
        4.1.1 High Bit Error Rate Channels
        4.1.2 Low Bit Error Rate Channels
    4.2 Deployment Strategies
        4.2.1 Low to Moderate Time-Variance
        4.2.2 High Time-Variance
    4.3 Miscellaneous Studies

5 Discussion
    5.1 The Results
        5.1.1 Artificial Neural Network Structure
        5.1.2 Deployment Strategies
        5.1.3 Miscellaneous Studies
    5.2 Error Sources
        5.2.1 Channel Model
        5.2.2 Machine Learning Software
    5.3 Relation to Other Work
    5.4 Sources
    5.5 The Thesis in a Larger Perspective

6 Conclusions
    6.1 Evaluation
    6.2 Future Work
    6.3 Final Words

A Bathymetry Profiles
    A.1 Shallow Scenario
    A.2 Deep Scenario

B Sound Speed Profiles
    B.1 Shallow Scenario
    B.2 Deep Scenario

C Channel Simulations

D Time-varying Channels

Bibliography

List of Figures

1.1 Picture of an underwater sensor node from Saab Dynamics.

2.1 Transmission loss as a function of frequency, at 1 km range and with k = 1.7.
2.2 An illustration of the FRSS modulation format, compared to traditional DSSS, from [32].
2.3 Illustration of an LSTM block and its connections, from [6].

4.1 Performance comparison of the LSTM, DNN and DFE-PLL in high BER channel.
4.2 Performance comparison of the LSTM, DNN and DFE-PLL in low BER channel.
4.3 BER as a function of SNR, using three different equalizers. Shallow profile, clay bottom SSP 2019-03-16.
4.6 BER as a function of SNR, using two different equalizers. Shallow profile, clay bottom SSP 2019-01-07.
4.7 BER as a function of SNR, for online and offline-trained LSTM in CGN and WGN. Shallow obstacle profile, clay bottom SSP 2019-03-16.

A.1 Illustration of the shallow flat bottom profile.
A.2 Illustration of the shallow slope bottom profile.
A.3 Illustration of the shallow obstacle bottom profile.
A.4 Illustration of the deep flat bottom profile.
A.5 Illustration of the deep slope bottom profile.
A.6 Illustration of the deep obstacle bottom profile.

B.1 Sound speed as a function of depth, data from 2019-03-16 04:45 REF M1V1.
B.5 Sound speed as a function of depth, data from 2019-01-07 13:50 Släggö.

C.1 Channel simulation, shallow scenario with sandy bottom type.
C.2 Channel simulation, shallow scenario with clay bottom type.
C.3 Channel simulation, deep scenario with sandy bottom type.
C.4 Channel simulation, deep scenario with clay bottom type.

D.1 Channel impulse response for deep scenario, slope profile, sand bottom SSP 2019-03-05.
D.2 Channel impulse response for deep scenario, slope profile, sand bottom SSP 2019-05-06.
D.3 Channel impulse response for deep scenario, obstacle profile, clay bottom SSP 2019-01-07.
D.4 Channel impulse response for shallow scenario, obstacle profile, clay bottom SSP 2019-03-16.

List of Tables

1.1 Comparison of fundamental physical properties for radio communication and UWA communication.

3.1 Configurable equalizer parameters.
3.2 Bottom sediment properties for two different kinds of bottom.
3.3 Channel geometry and general channel/simulation properties.
3.4 Configurable small-scale settings.
3.5 Configurable large-scale (L-S) settings.
3.6 Configurable Doppler effect parameters.
3.7 DNN layer structure.
3.8 LSTM layer structure.
3.9 Training options.

Notation

Abbreviations

Abbreviation    Meaning

Adam            Adaptive moment estimation
ANN             Artificial neural network
AUV             Autonomous underwater vehicle
BER             Bit error rate
CGN             Colored Gaussian noise
DFE             Decision feedback equalizer
DFE-PLL         Decision feedback equalizer with a phase-locked loop
DNN             Deep neural network
DSSS            Direct sequence spread spectrum
FRSS            Frequency repetition spread spectrum
FSK             Frequency-shift keying
GPU             Graphical processing unit
ISI             Intersymbol interference
LSTM            Long short-term memory
MCSS            Multi-carrier spread spectrum
ML              Machine learning
OFDM            Orthogonal frequency division multiplexing
PLL             Phase-locked loop
PSD             Power spectral density
QPSK            Quadrature phase-shift keying
ReLU            Rectified linear unit
RLS             Recursive least squares
RNN             Recurrent neural network
SISO            Single input single output
SNR             Signal-to-noise ratio
SSP             Sound speed profile
UWA             Underwater acoustic
WGN             White Gaussian noise
WSSUS           Wide-sense stationary uncorrelated scattering

1 Introduction

1.1 Motivation

Wireless terrestrial communications is a game-changing technology and a defining technology of the 20th and 21st centuries. Huge efforts have been made by the industry and the research community to improve and optimize all aspects of modern wireless communications. Modern wireless communications are based on electromagnetic waves, which travel at the speed of light. Underwater communication is not as well explored as its terrestrial radio counterpart, since the applications are not necessarily as broad and mainstream. However, applications do exist, both civilian and military. Examples of civilian applications are oceanographic studies, in terms of undersea exploration, environmental monitoring, and disaster prevention. Examples of military applications are communications between submarines, surveillance systems, and mine reconnaissance. In other words, underwater networks have interesting and varied applications. A picture of an underwater sensor node designed by Saab Dynamics is shown in Figure 1.1.

Electromagnetic and optical waves are not feasible in underwater communications as they are quickly absorbed by the water. To communicate under water at ranges longer than a few meters, acoustic waves are therefore utilized, although they come with some undesired properties. Table 1.1 presents some metrics comparing an example of the underwater acoustic (UWA) channel with an example of the common terrestrial radio channel, to give the reader some understanding of the implications of using acoustic waves instead of electromagnetic waves. Data for the radio channel is from the Long Term Evolution (LTE) standard, more commonly known as 4G [1]. As can be seen in the table, the utilizable bandwidth and propagation speed in UWA communications are on vastly different scales compared to radio communications.

Figure 1.1: Picture of an underwater sensor node from Saab Dynamics.

Table 1.1: Comparison of fundamental physical properties for radio communication and UWA communication.

Property                  Radio channel          UWA channel
Wave propagation speed    $3 \cdot 10^{8}$ m/s   1500 m/s
Bandwidth                 20 MHz                 5-10 kHz
Center frequency          700 MHz - 2.7 GHz      5-30 kHz

Communication networks are often modeled by dividing the stack into different layers; one example is the OSI model [2]. Each layer has different tasks, and the lowest level is the physical layer. The physical layer has the task of transmitting the raw bit stream over the physical channel [2]. The purpose of the transmitter is to map binary data onto a waveform that can be transmitted over the medium. The medium can, for example, be a radio link or an acoustic link. The physical medium disturbs the transmitted waveform by, among other things, introducing noise and attenuating the signal. The receiver has the task of picking up the disturbed waveform and decoding the information correctly. The physical layer transmitter and receiver should, therefore, be designed according to the communication channel they operate in.


The nodes in an underwater network can be static or moving. A static node can be a sensor gathering data, while a moving node can be an autonomous underwater vehicle (AUV). Underwater networks are often built ad hoc, as the application and the environment between nodes vary a lot. For example, a node that collects data from the ocean needs to establish a communication link with a node on the surface. The distance between the nodes can be several kilometers, which poses a significant challenge due to the severe signal attenuation at long distances. Another very different scenario would be two AUVs wanting to establish a communication link in the Baltic Sea (around 80 meters deep). Here the difficulties do not arise from the distance but from echoes caused by reflections of sound on the surrounding surfaces. The environment in the underwater channel between the nodes is also subject to constant change due to factors such as marine wildlife, ocean vessels, and tides.

With the very different propagation environments described above, designing a physical layer that can handle such different and difficult conditions is a challenge. The literature contains many suggestions for transmitter/receiver structures for UWA communications, with many variations. In modern wireless radio communication standards such as 4G and 5G, orthogonal frequency division multiplexing (OFDM) is a commonly used method. In OFDM the available frequency band is divided into many sub-bands, each behaving similarly to an additive white Gaussian noise (WGN) channel. OFDM reduces modem complexity and enables high data rates. OFDM is applicable in UWA communication and can be beneficial. However, OFDM does not always provide the optimal solution in UWA communication, as shown in [31]. The preferred physical layer design depends on whether the application is long- or short-range, high or low signal-to-noise ratio (SNR), and shallow or deep. Therefore, more complicated physical layer designs for UWA exist that take the specifics of the UWA channel into consideration and can provide benefits over OFDM. Developing physical layers for UWA communication is non-trivial, and with the different possible environments, uniformly good performance is not easily guaranteed. The difficulty is compounded by the fact that testing is expensive and time-consuming.

The UWA channel has some leading characteristics, explained in Section 2.1, but modeling it is non-trivial, as detailed in Section 2.2. Finding general models has proven to be difficult, which leads to a model deficit. Physical layer designs are therefore often sub-optimal and unable to perform well in all conditions. In this case, machine learning (ML) is a good candidate to combat the model deficit. Recent literature highlights ML as an alternative or complementary approach to classical signal processing, to improve and generalize the performance of physical layer algorithms. ML has also gained popularity in other fields such as speech recognition and image processing. It is therefore of great interest to study whether ML algorithms can be utilized to improve the performance of a communication system when the underwater channel introduces difficulties. The desirable property of an ML-based system is that it can generalize, performing well in multiple circumstances, if trained correctly.


1.2 Purpose

Researchers in cooperation with Saab Dynamics have suggested a physical layer (transmitter and receiver) protocol based on frequency repetition spread spectrum (FRSS). The protocol is motivated by reliability and performance at low SNR, where it outperforms OFDM [31]. The existing FRSS transmitter and receiver will be used as a baseline for the project. The purpose is then to investigate whether the channel estimation and equalization utilized by FRSS, which are based on a decision feedback equalizer with a phase-locked loop (DFE-PLL), can be improved by utilizing ML. Channel estimation and equalization are considered among the most difficult tasks in UWA communication, due to the sparse time-varying multipath propagation. The purpose of the study is to explore whether a receiver based on ML methods like deep neural networks (DNNs) or recurrent neural networks (RNNs) can offer improved performance. The performance will be studied in terms of coded bit error rate (BER) as a function of the SNR, comparing the baseline DFE-PLL with the developed ML-based receiver.

1.3 Problem Formulation

The thesis aims to study

1. in which environments DNN- or RNN-based channel estimation and channel equalization can improve performance compared to DFE-PLL;

2. why it can offer improved performance and how much the performance can be improved;

3. and possibilities to develop a solution where the RNN or DNN has no prior knowledge of the deployment scenario, using either an online-training or a data-driven approach.

Various channel simulations will be performed to identify environments where the performance can be improved, which will be a sizeable part of the work.

1.4 Limitations

To limit the scope of the thesis, several limitations are set. Some of the initial assumptions are described here, and further limitations are set throughout the thesis as theory and models are introduced.

The distance between transmitter and receiver is assumed to be 1000 meters. It was also decided that the underwater environments should resemble the conditions in Swedish waters. Swedish waters, mainly the Baltic Sea and the waters in Skagerrak and Kattegatt, are shallow, so the intention is to study shallow-water communications. Two depths, a shallow case (18 meters) and a deep case (72 meters), are studied to represent different environments. Note that a depth of 72 meters is still considered shallow-water communications in the general field. It is assumed that the communication nodes utilize hydrophones in a single input single output (SISO) setup, i.e., the transmitter and receiver each use only one hydrophone. The beam pattern is assumed to be omnidirectional. The hardware used at Saab Dynamics does not provide a truly omnidirectional pattern, but this assumption reduces the complexity and is common in the literature. Transmitter and receiver locations are assumed to be static, only drifting slightly in the environment. Further limitations introduced in the thesis are intended to keep the simulations realistic.

The aspects of computational complexity and hardware limitations of UWA communication modems will be disregarded.

1.5 Background

This thesis is written in cooperation with Saab Dynamics in Linköping. Saab Dynamics is a subsidiary of Saab AB. Saab AB provides high-technology solutions within military defense, civil defense, and aerospace. Saab Dynamics, in turn, supplies a wide range of products, such as torpedoes, missiles, ground combat equipment, and various naval solutions. The naval solutions include AUVs, remotely operated vehicles, underwater networks, and torpedoes (a mix of civil and military products). For example, remotely operated vehicles can perform repair missions on underwater pipelines or oceanographic studies.

Thus, underwater wireless communications is an area of great interest for Saab Dynamics, as it can allow these products to become wireless and autonomous. Saab Dynamics works with many partners to promote and participate in UWA wireless communications research. As an active participant in the research community and involved in product development, Saab Dynamics has an interest in UWA channel modeling and signal processing.

2 Theoretical Background

This chapter describes the fundamental theory important to the study. First, an introduction to the underwater channel is given in Section 2.1 and the channel models are discussed in Section 2.2. Based on theoretical knowledge about UWA communications, the baseline FRSS physical layer is described in Section 2.3. Section 2.4 describes the basics of artificial neural networks (ANNs) and the two types studied in this thesis, DNNs and RNNs. The chapter ends with a review of related work in communications in Section 2.5.

2.1 Underwater Channel Characteristics

Due to the usage of acoustic signals, the signals transmitted in UWA communications are inherently wideband [25], i.e., the bandwidth of the signal $B$ is large relative to the carrier frequency $f_c$. As highlighted in Table 1.1, the carrier frequency is substantially lower than for electromagnetic waves, making the available bandwidth lower. The low bandwidth also means that the supported data rates are quite low, as the capacity of the channel increases with the available bandwidth [29]. The intuition behind the result is that more available bandwidth means that more information can be loaded into one transmission. In radio communications, assumptions are often based on the fact that $f_c \gg B$; this is not viable in the UWA channel [25]. To get a further understanding of the difficulties of the acoustic channel, some crucial aspects are discussed below in separate sections.
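The wideband argument can be illustrated with a minimal numerical sketch based on the values in Table 1.1. The function names are hypothetical, the UWA carrier of 7.5 kHz is an assumed midpoint of a 5-10 kHz band, and the capacity expression is the standard Shannon formula for an additive white Gaussian noise channel, which is an assumption here since the text only states that capacity grows with bandwidth.

```python
import math


def fractional_bandwidth(bandwidth_hz, carrier_hz):
    """Ratio B / f_c; values close to 1 indicate a wideband signal."""
    return bandwidth_hz / carrier_hz


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon capacity C = B * log2(1 + SNR) of an AWGN channel (assumed model)."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)


# Radio example from Table 1.1: 20 MHz bandwidth at a 2.7 GHz carrier.
radio_ratio = fractional_bandwidth(20e6, 2.7e9)

# UWA example: 5 kHz of bandwidth (a 5-10 kHz band), assumed 7.5 kHz carrier.
uwa_ratio = fractional_bandwidth(5e3, 7.5e3)

if __name__ == "__main__":
    print(f"radio B/f_c = {radio_ratio:.4f}, UWA B/f_c = {uwa_ratio:.2f}")
    for name, bw in (("radio", 20e6), ("UWA", 5e3)):
        print(f"{name}: {shannon_capacity_bps(bw, 10.0):.0f} bit/s at 10 dB SNR")
```

Under these assumptions the radio ratio is well below 0.01, so $f_c \gg B$ holds, while the UWA ratio is of order one. At equal SNR the capacity scales linearly with bandwidth, which is why the supported UWA data rates are several orders of magnitude lower.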

2.1.1 Attenuation

The amount of power captured at the receiver determines whether we can extract any data from the signal or just receive noise. Thus, understanding the attenuation of a signal through a communication medium is crucial in all kinds of communication systems. In the UWA channel, the signal attenuation is frequency-dependent. The attenuation, i.e., the path loss $A(l, f)$, can be described according to [25] as:

$$A(l, f) = \left(\frac{l}{l_r}\right)^{k} \alpha(f)^{\,l - l_r}, \tag{2.1}$$

where $l$ is the propagation distance, $l_r$ is a reference distance, $f$ is the frequency of the signal, and $\alpha(f)$ is the absorption coefficient, which increases with increasing frequency. The exponent $k$, i.e., the path loss exponent, models the spreading in the water and takes a value between 1 and 2. The frequency dependence of $\alpha(f)$ implies that low-frequency components of the signal transmitted through the water will be received with higher power than the high-frequency components. The frequency-dependent absorption coefficient can be described by Thorp's empirical formula [16] in dB/km:

$$10 \log \alpha(f) = \frac{0.11 f^{2}}{1 + f^{2}} + \frac{44 f^{2}}{4100 + f^{2}} + 2.75 \cdot 10^{-4} f^{2} + 0.003, \tag{2.2}$$

where $f$ is in kHz. The frequency-dependent transmission loss is visualized in Figure 2.1 for a typical frequency band of 5-10 kHz.

Figure 2.1: Transmission loss as a function of frequency, at 1 km range and with $k = 1.7$.
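The kind of curve shown in Figure 2.1 can be approximated with a minimal sketch that evaluates (2.1) in decibels using Thorp's formula (2.2). The function names are hypothetical. The reference distance $l_r = 1$ m is an assumption, as is the convention of measuring the spreading term in meters and the absorption term in kilometers (Thorp's formula is given in dB/km), so the numbers are indicative rather than a reproduction of the figure.

```python
import math


def thorp_absorption_db_per_km(f_khz):
    """Thorp's empirical absorption coefficient, Eq. (2.2); f in kHz, result in dB/km."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003


def path_loss_db(l_m, f_khz, k=1.7, l_ref_m=1.0):
    """Path loss of Eq. (2.1) in dB: spreading term plus absorption term.

    l_ref_m = 1 m is an assumed reference distance. The absorption distance is
    converted to km because Thorp's coefficient is expressed in dB/km.
    """
    spreading_db = k * 10 * math.log10(l_m / l_ref_m)
    absorption_db = (l_m - l_ref_m) / 1000.0 * thorp_absorption_db_per_km(f_khz)
    return spreading_db + absorption_db


if __name__ == "__main__":
    # Sweep the 5-10 kHz band at the 1 km range used in Figure 2.1.
    for f_khz in (5, 6, 7, 8, 9, 10):
        print(f"{f_khz:2d} kHz: {path_loss_db(1000.0, f_khz):.2f} dB")
```

With these assumptions the spreading term dominates at 1 km, and the absorption term adds roughly one decibel at the top of the band, which is the source of the frequency dependence.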

The choice of the spreading factor is determined by the physical properties of the channel. A spreading factor of $k = 2$ corresponds to spherical spreading, where the power is spread over the surface of a sphere and the transmission loss increases with the square of the range [30, p. 101]. The choice of $k \approx 2$ is comparable to deep ocean communications, where the sound waves propagate through the ocean with few obstacles. A spreading factor of $k = 1$ corresponds to cylindrical spreading [30, p. 102]. Cylindrical spreading occurs when the sound does not propagate freely horizontally or vertically, for example, when the vertical propagation is limited by the seafloor and surface. Cylindrical spreading can occur both at moderate and long ranges [30, p. 102], where the sound is trapped between the seafloor and surface.

The reality in most scenarios is somewhere in between cylindrical and spherical spreading. With our interest in shallow-water communication, a choice of $k = 2$ is not realistic as the sound does not propagate freely in the medium. A choice of $k = 1$ is not realistic either, as the trapped sound at a depth of 18 or 72 meters still suffers from attenuation when reflected on the surfaces, and the bending due to sound speed variations yields an inhomogeneous propagation.
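To make the difference between the spreading models concrete, the spreading term of (2.1) can be evaluated in decibels at the 1000 m range assumed in this thesis. The reference distance $l_r = 1$ m is an assumption made here for illustration and is not stated in the text.

```latex
10 \log_{10}\!\left(\frac{l}{l_r}\right)^{k}
  = k \cdot 10 \log_{10}\!\left(\frac{1000\,\text{m}}{1\,\text{m}}\right)
  = 30k \ \text{dB}
\quad\Longrightarrow\quad
\begin{cases}
  k = 1   & 30 \ \text{dB (cylindrical)},\\
  k = 1.7 & 51 \ \text{dB (value used in Figure 2.1)},\\
  k = 2   & 60 \ \text{dB (spherical)}.
\end{cases}
```

Under this assumption the two idealized models differ by 30 dB at 1 km, so the choice of $k$ has a large effect on the predicted received power.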

2.1.2 Noise

All communication channels are subject to disturbing noise. A common assumption is to study the performance of a communication system in the presence of ambient WGN, where white means that the power spectral density (PSD) of the noise is constant in the frequency range of interest and Gaussian describes the probability density function of the noise. In the UWA channel, the ambient noise may be modeled as Gaussian [25], although no specific motivation is provided. Some of the most important articles cited in this thesis assume the noise is Gaussian, but without further references or motivation [23, 26, 32]. The noise PSD in UWA communication is in fact colored (frequency-dependent), similar to the path loss [30, p. 206]. Both [25] and [30, p. 210] suggest that the noise PSD decreases at approximately 18 dB/decade with increasing frequency.
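The 18 dB/decade slope can be turned into a simple log-linear PSD model, sketched below. The function name, the reference level of 50 dB, and the reference frequency of 1 kHz are hypothetical placeholders chosen for illustration; only the slope comes from the text.

```python
import math


def noise_psd_db(f_khz, ref_level_db=50.0, ref_f_khz=1.0, slope_db_per_decade=18.0):
    """Log-linear colored-noise PSD: the level drops by the slope for every decade.

    ref_level_db and ref_f_khz are assumed placeholder values, not thesis data.
    """
    return ref_level_db - slope_db_per_decade * math.log10(f_khz / ref_f_khz)


# Difference in noise level between the edges of a 5-10 kHz band.
band_tilt_db = noise_psd_db(5.0) - noise_psd_db(10.0)

if __name__ == "__main__":
    print(f"noise PSD tilt across 5-10 kHz: {band_tilt_db:.2f} dB")
```

Because the band covers one octave, the tilt is $18 \log_{10} 2 \approx 5.4$ dB regardless of the assumed reference level, which indicates how far the noise deviates from the white assumption within the band.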

In the water, there are many interference sources, such as ocean wildlife and ship noise. These sources can differ a lot depending on the environment, for example, a harbor versus the middle of the sea. Attempts have been made to model specific interference sources, such as shipping lanes [3] and shrimp clicking [5]. Due to the unpredictability of some of these interference sources, they can potentially disrupt even the most optimal receivers, designed under assumptions of colored or white Gaussian noise.

2.1.3 Multipath Propagation

The environment between the receiver and the transmitter is not obstacle-free; therefore, the acoustic signal is reflected while propagating in the environment. Thus the receiver can pick up multiple delayed instances (from multiple paths) of the originally transmitted signal, as reflected components are received. Multiple delayed instances of the signal at the receiver cause intersymbol interference (ISI). ISI must be dealt with by the receiver; otherwise, it can disrupt any attempt at communication. Multipath propagation is commonly modeled as a tapped delay line impulse response [24], where the tap gains are modeled by stochastic processes. In standard terrestrial wireless communications, the number of multipath components can be large. In the UWA channel, the number of propagation paths is not necessarily as large. The issue is rather that the ISI can last several symbol intervals [26], as sound travels at a much slower speed in water than electromagnetic waves do.
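The effect of the propagation speed on the ISI span can be sketched with a small calculation. The propagation speeds are taken from Table 1.1, while the extra path length of 15 m and the symbol rate of 5000 baud are hypothetical values chosen only for illustration, as are the function names.

```python
SPEED_OF_SOUND_M_PER_S = 1500.0  # UWA value from Table 1.1
SPEED_OF_LIGHT_M_PER_S = 3.0e8   # radio value from Table 1.1

EXTRA_PATH_M = 15.0         # hypothetical extra length of a reflected path
SYMBOL_RATE_BAUD = 5000.0   # hypothetical symbol rate


def excess_delay_s(extra_path_m, speed_m_per_s):
    """Delay of a reflected path relative to the direct path."""
    return extra_path_m / speed_m_per_s


def isi_span_symbols(delay_s, symbol_rate_baud):
    """Number of symbol intervals covered by the excess delay."""
    return delay_s * symbol_rate_baud


acoustic_delay = excess_delay_s(EXTRA_PATH_M, SPEED_OF_SOUND_M_PER_S)
radio_delay = excess_delay_s(EXTRA_PATH_M, SPEED_OF_LIGHT_M_PER_S)

if __name__ == "__main__":
    print(f"acoustic delay: {acoustic_delay * 1e3:.1f} ms")
    print(f"radio delay:    {radio_delay * 1e9:.1f} ns")
    print(f"ISI span: {isi_span_symbols(acoustic_delay, SYMBOL_RATE_BAUD):.0f} symbols")
```

For the same geometry, the acoustic delay is larger than the radio delay by the ratio of the propagation speeds, a factor of 200 000. With the assumed symbol rate, a single reflection arriving 10 ms late overlaps about 50 later symbols.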

Time-Varying Multipath Propagation

It is important to notice that the multipath propagation properties shift slowly over time. The time-varying multipath channel is described by a time-variant filter h(τ ; t), as described in [2, p. 132]. The variable τ corresponds to the delays in the impulse response, i.e., filter taps. The variable t describes how h(τ ; t) varies with time. When the impulse response varies faster with respect to τ in comparison to the variable t, the filter can be considered a sequence of time-invariant filters [2, p. 132]. The output of the filter is given by the convolution:

$$ y(t) = \int_{-\infty}^{\infty} h(\tau; t)\, x(t - \tau)\, d\tau, \tag{2.3} $$

where x(t) is the input signal to the system.
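A discrete-time counterpart of equation (2.3) can be sketched as below, where each output sample uses its own set of filter taps. The function name and the array layout `taps[n, l]` (tap l valid at time n) are assumptions for this illustration; the sketch is not the channel simulator used in this thesis.

```python
import numpy as np

def time_varying_filter(x, taps):
    """Compute y[n] = sum_l taps[n, l] * x[n - l] for a time-varying FIR channel."""
    x = np.asarray(x, dtype=complex)
    taps = np.asarray(taps, dtype=complex)
    n_samples, n_taps = taps.shape
    if len(x) != n_samples:
        raise ValueError("taps must provide one row of coefficients per input sample")
    y = np.zeros(n_samples, dtype=complex)
    for n in range(n_samples):
        # Only delays that point at already-transmitted samples contribute.
        for l in range(min(n_taps, n + 1)):
            y[n] += taps[n, l] * x[n - l]
    return y

# Two-path channel whose second path slowly fades in over time.
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
taps = np.column_stack([np.ones(6), np.linspace(0.0, 0.5, 6)])
y = time_varying_filter(x, taps)
```

If every row of `taps` is identical, the sketch reduces to an ordinary time-invariant convolution, which matches the remark that a slowly varying channel can be treated as a sequence of time-invariant filters.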

Environmental Variations

The behavior of multipath propagation is determined by the appearance of the physical channel [24]. In shallow waters, the channel impulse response is determined by reflections on the surface and bottom, as well as other objects and the direct path [24]. The appearance of the bottom is called bathymetry and can be compared to topography on land. The bathymetry between two communication nodes depends on where the nodes are deployed. Hence, there is no single bathymetry valid for all communications. The importance of having an understanding of the bathymetry for communications will be highlighted in Section 3.3.1. The surface properties between the communication nodes are not static either. Due to tides, waves, and other phenomena, the surface is subject to variations over time, whereas the bathymetry can be considered rather static once known. Attempts at modeling the behavior of the surface will be rather limited in this thesis, but the impact of surface variations should be mentioned. Entire studies have been devoted to modeling the impact of waves on communications and concluded that it is significant [15]. Even details such as air bubbles due to crashing waves affect communications.

Sound Speed Variations

The speed of sound in water varies with depth [30, p. 111], due to varying levels of temperature, pressure, salinity, etc. This is important to consider, as it means that the sound waves do not propagate homogeneously in the water. Sound waves are bent, which causes further unpredictability. The sound speed profile (ssp) can vary largely with the seasons. In the spring, cold water from rivers cools down the surface water, while the deeper ocean water keeps a stable temperature, causing a "knee" in the sound speed profile, as Figure B.5 illustrates. Similar behaviors can be observed in the summer when water close to the surface is heated up. This affects the geometry of the multipath propagation [24].

2.1.4 Doppler Effect

The Doppler effect is the frequency shift in the observed signal due to the movement of the transmitter or receiver relative to the path traveled by the signal. The Doppler effect is present in all non-stationary communication channels, but as with the effects in the previous sections, it is more severe in the underwater case. The magnitude of the Doppler effect is proportional to a = v/c, where c is the propagation speed of the signal and v is the velocity of the moving transmitter/receiver. To give some insight, c ≈ 1500 m/s for underwater sound, while radio waves travel at the speed of light, c ≈ 3 · 10^8 m/s. Hence, for a given object velocity v, the Doppler effect is higher by a factor of 200000 for the uwa channel. As [25] points out, there are few comparable scenarios in radio communications; similar Doppler effects arise only in low-orbit satellite communications. The Doppler effect is introduced when we consider a moving transmitter or receiver such as an auv, but even without intentional motion, underwater nodes are subject to drifting with waves, tides, and currents [25]. So the Doppler effect cannot be disregarded even for a non-moving setup.
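The comparison above can be checked with a few lines of arithmetic. The platform speed and carrier frequency below are hypothetical values chosen only for illustration, and the frequency shift is approximated as a times the carrier frequency.

```python
C_SOUND = 1500.0   # approximate speed of sound in water, m/s
C_LIGHT = 3.0e8    # approximate speed of light, m/s

def doppler_factor(v, c):
    """Relative Doppler factor a = v / c."""
    return v / c

v = 1.5            # hypothetical platform speed, m/s
fc = 10_000.0      # hypothetical acoustic carrier frequency, Hz

a_uwa = doppler_factor(v, C_SOUND)
a_radio = doppler_factor(v, C_LIGHT)
ratio = a_uwa / a_radio       # independent of v, equals C_LIGHT / C_SOUND
shift_hz = a_uwa * fc         # approximate carrier shift in the uwa channel
```

With these numbers, a is 10^-3 in the uwa channel, corresponding to a shift of about 10 Hz on a 10 kHz carrier, while the same motion gives a = 5 · 10^-9 for a radio link.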

2.1.5 Scattering

The sea contains inhomogeneities which intercept and reradiate portions of the acoustic signal [30, p. 237]. The reradiation is called scattering. The sum of the total scattering is called reverberation, and [30, p. 237] names three types of reverberation: sea-surface reverberation, bottom reverberation, and volume reverberation. The first two are self-explanatory, and volume reverberation occurs due to marine wildlife, other objects, and the inhomogeneous structure of the sea itself.

2.2 Channel Models

Modeling of the uwa channel is a major obstacle to achieving reliable communications in the uwa channel [12]. Good channel models that can be implemented in software are essential to simulate physical layers, as sea tests are expensive and troubleshooting at sea is more difficult. Creating a channel model that takes into consideration all the aspects mentioned in Section 2.1 is non-trivial. In wireless communications, a common approach to deal with multipath propagation and fading is to create stochastic channel models. A common assumption is the wide-sense stationary uncorrelated scattering (wssus) channel [2]. Due to its analytical tractability, a similar model would be desired in uwa communications, but studies conclude that wssus assumptions are violated by non-stationary behavior in the uwa channel [33]. Also, as summarized in [18], the signal envelope has been reported to follow Rayleigh, Rice, and Lognormal distributions in different studies. According to another article [24], claims of shallow-water medium-range channels following a Rayleigh fading model have been challenged. Models such as Rayleigh fading might be feasible but are constantly challenged, and the uwa channel behavior is difficult to generalize. It is concluded in [18] that the contradicting results show that a realistic uwa channel simulator is required. A promising approach is ray theory; both [24] and [12] highlight the usage of ray theory to model multipath propagation in the uwa channel.

2.3 Baseline Physical Layer

The dfe-pll has been considered a suitable and popular receiver structure for uwa communications [12]. The structure was initially suggested in a series of two articles [23] and [26]. The design is motivated by the unique characteristics of the uwa channel, namely large Doppler fluctuations and long time-varying multipath. The structure has been combined with the multi-carrier spread spectrum (mcss) modulation technique in [34]. The authors suggest a physical layer structure based on mcss, where the receiver utilizes the dfe-pll [34]. mcss was later renamed to frss and was motivated specifically by outperforming the other candidates, namely direct sequence spread spectrum (dsss) and frequency-shift keying (fsk), in low snr scenarios [32]. It did not give the highest data-rates at high snrs compared to dsss and fsk, but the low snr performance is very desirable in a tough underwater channel. The frss transmitter and receiver are vital components in this thesis.

2.3.1 Transmitter

The effective data-rate (bit/s) and stability of the reception are determined by the spreading factor, which in turn is set by a rate parameter R. A total of six configurations of R are suggested in [32], R ∈ {1, 2, 3, 4, 5, 6}, but only four are available in the provided implementation. The spreading factor is of length K = 2^R − 1. A choice of R = 1 yields a single sub-band, while R = 4 yields fifteen sub-bands. A choice of higher R yields a more stable transmission, but the increased spreading factor reduces the effective data-rate (bit/s). Therefore, the choice should be made based on the transmission conditions.
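The relation between the rate parameter and the number of sub-bands can be tabulated with a short sketch. It reads the relation as K = 2^R − 1, which reproduces the sub-band counts quoted above (one sub-band for R = 1 and fifteen for R = 4); the function name is an assumption for this example.

```python
def num_subbands(rate):
    """Number of sub-bands K = 2**R - 1 for rate parameter R in 1..6."""
    if rate not in range(1, 7):
        raise ValueError("R must be an integer between 1 and 6")
    return 2 ** rate - 1

subbands_per_rate = {rate: num_subbands(rate) for rate in range(1, 7)}
# {1: 1, 2: 3, 3: 7, 4: 15, 5: 31, 6: 63}
```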

When R is chosen, the data symbols are mapped onto the K = 2^R − 1 sub-bands. Initial training symbols are prepended and information bits are continuously multiplexed with periodic training symbols. The waveform is prepended with a preamble, see [32], which is utilized for detection, synchronization, and Doppler estimation. Figure 2.2 from [32] illustrates the frss modulation format for R = 2 compared to dsss.

Figure 2.2: An illustration of the frss modulation format, compared to traditional dsss, from [32].

Preamble

All rates use an m-sequence preamble as described in [32]. The m-sequence is mapped onto a raised cosine carrier function. The raised cosine has a roll-off factor of β = 1/3. The preamble is then moved to the passband with carrier frequency fc.

Channel Coding

Before the bits are channel coded they are scrambled. After scrambling, the bits are channel coded using a rate-1/2 convolutional encoder [32]. The bits are then interleaved, i.e., rearranged to protect the data from burst errors.

Modulation and Training Symbols

The symbol sequence created from the process of channel coding is modulated using Gray-coded quadrature phase-shift keying (qpsk). The training symbols mentioned in Section 2.3.1 are a set of known qpsk symbols, inserted at a rate of one training symbol for every two data symbols. A sequence of random qpsk symbols is prepended to the signal, to enable initial equalizer training. The resulting sequence, which is ready to be modulated onto the sub-bands, is denoted z(n).
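The mapping and multiplexing described above can be sketched as follows. The particular Gray labeling of the constellation and the placement of each training symbol in front of its two data symbols are assumptions for this illustration; the exact conventions of the frss implementation are given in [32] and are not reproduced here.

```python
import numpy as np

# Gray-coded QPSK: neighbouring constellation points differ in exactly one bit.
# This specific labeling is an assumption made for the example.
GRAY_QPSK = {
    (0, 0): (1 + 1j) / np.sqrt(2),
    (0, 1): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
    (1, 0): (1 - 1j) / np.sqrt(2),
}

def qpsk_modulate(bits):
    """Map an even-length bit sequence to unit-energy QPSK symbols."""
    pairs = np.asarray(bits, dtype=int).reshape(-1, 2)
    return np.array([GRAY_QPSK[(int(p[0]), int(p[1]))] for p in pairs])

def insert_training(data_symbols, training_symbols, data_per_training=2):
    """Interleave one known training symbol per `data_per_training` data symbols."""
    framed = []
    t = 0
    for i, symbol in enumerate(data_symbols):
        if i % data_per_training == 0:
            framed.append(training_symbols[t % len(training_symbols)])
            t += 1
        framed.append(symbol)
    return np.array(framed)

data = qpsk_modulate([0, 0, 0, 1, 1, 1, 1, 0])
training = qpsk_modulate([0, 0])
framed = insert_training(data, training)   # pattern: T D D T D D
```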

FRSS generation

The sequence z(n) is mapped onto the K = 2^R − 1 sub-bands utilizing a raised cosine pulse with a roll-off factor of β = 1/3. After modulation onto the sub-band waveforms, the signal is concatenated with the preamble with a short pause in between of duration ∆t. The final output from the frss transmitter is a time-continuous signal x(t).

2.3.2 Receiver

The receiver can be divided into several parts as well. The processes are, in order, acquisition, equalization, adaptive filtering, phase tracking, log-likelihood computation, and soft Viterbi decoding. The input to the frss receiver is a time-continuous signal u(t) which has passed through the uwa channel.

Acquisition and Pre-processing

The received signal u(t) is demodulated to the baseband from the carrier frequency fc. The baseband signal is then passed to a Doppler filter bank, i.e., the signal is correlated with a bank of Doppler-shifted replicas of the preamble. A brick wall filter is applied to remove out-of-band noise [34]. A filtered signal y_i(t) per sub-band i is obtained from the pre-processing.

Equalization

The purpose of the equalizer is to eliminate the isi introduced by the channel. The input to the equalizer is a time segment of length Teq of the pre-processed signal. Teq should be larger than the channel delay spread, and it determines the parameter L. The equalizer is trained using the training sequences described in Section 2.3.1. Training is performed by feeding the samples to the filter and updating the filter coefficients based on the error between the known training symbols and the filter output. The output of the equalizer is denoted as z(n).

The equalization is performed by a fractionally spaced dfe-pll as described in [23] and [26]. A symbol estimate is obtained by multiplying the input signal y_{k,n}^{(m)} with the filter coefficients c_{k,n}^{(m)}, see equation (9) in [34]:

$$ z^{(m)}(n) = \sum_{k=1}^{K} \left(\mathbf{y}^{(m)}_{k,n}\right)^{T} \mathbf{c}^{(m)}_{k,n \pm 1}, \tag{2.4} $$

where y_{k,n}^{(m)} is given by equation (10) in [34]:

$$ \mathbf{y}^{(m)}_{k,n} = \begin{bmatrix} y_k\left(nT - (L-1)bT/(2a)\right) \\ y_k\left(nT - (L-3)bT/(2a)\right) \\ \vdots \\ y_k\left(nT + (L-3)bT/(2a)\right) \\ y_k\left(nT + (L-1)bT/(2a)\right) \end{bmatrix} \exp\left[-i\theta^{(m)}_{k}(n \pm 1)\right]. \tag{2.5} $$

The signal y_k(t) is down-sampled to a samples per symbol. The dfe has a tap spacing of bT/a, consistent with equation (2.5), where a and b are design parameters [34] and T is the symbol period. θ_k^{(m)} represents the approximated phase shift of the carrier frequency for sub-band k.

Adaptive filtering and phase tracking

The parameters utilized in the equalization process are constantly tracked. The known training symbols are used to update c_{k,n}^{(m)} using an adaptive filter; in [34] a recursive least squares (rls) filter is suggested. For more details on adaptive filtering and the rls filter see [8]. The phase shift θ_k^{(m)} is approximated by a pll by utilizing the known training symbols; for details see [34].

Log-likelihood Ratio

The symbol estimates z(n) from the equalization are then used to calculate the log-likelihood ratio. The periodic training symbols are used in this calculation [34], which compensates for the bias of the equalizer. The output of the log-likelihood ratio computation is fed to the Viterbi decoder.

Viterbi Decoding

The scaled symbols are deinterleaved using the interleaving sequence known from the transmitter. The symbols are then put through a soft-decision Viterbi decoder [32]. The final step is to unscramble the symbols, which yields the desired bit sequence.

2.4 Artificial Neural Networks

In this section, the basics of anns are described, as dnns and rnns are specific types of anns that have been utilized in this thesis.

anns are categorized as ml algorithms. anns have attracted great attention in recent years and ml is often considered synonymous with anns, even though there are other kinds of algorithms in ml. anns have become very popular due to their ability to solve very complex and non-linear problems and have proven to be applicable in a wide array of subjects. Applications include computer vision, signal processing, medical diagnosis, etc. Fundamental research and theory in the area date back to the 1970s, but the field has gained popularity in the past 10-20 years due to the increased accessibility of computing power that enables the training of the parameters in larger anns. In this section, the fundamental theory will be presented to introduce the reader to the basics of anns. This section only studies supervised learning with anns, where the network is trained by feeding it input/output combinations of the correct behavior. The goal of the training is to learn a function that takes previously unseen data and performs the correct function mapping, i.e., generalization. The basics of anns in this section are based on [19].


2.4.1 Input and Output

Before explaining how anns work, the input and output have to be explained. As mentioned before, anns have been applied to a variety of problems and the input and output of an ann vary depending on the application. The input data to an ann is often referred to as features, where selected features of the data are used. If the user wants to identify cats in an image, the feature can be 256 × 256 pixel values and the desired output is 0 or 1: 0 if there is no cat, 1 if there is a cat. In a signal processing example, the feature can be an entire received signal in the time domain, its Fourier transform, or other properties of the signal. Feature selection is of utmost importance, and selecting the correct feature and feature size is crucial. It is desirable to pre-process information, to reduce the input dimension and to relieve the neural network of learning already known properties. However, this might come at the cost of lost information, which can degrade the performance. The important part is that there is a good data set with correctly annotated input and output pairs x_n and y_n, n ∈ {1, ..., N}. The purpose of the network is then to build an approximation of the desired function f, so that y_n = f(x_n) for all n.

2.4.2 The Neuron

The neuron is the fundamental computational block of an ann. The neuron has a set of input features x_n, and each feature x_n is multiplied with a weight w_n, n ∈ {1, ..., N}. The weighted inputs are summed up according to:

$$ z = \sum_{n=1}^{N} w_n x_n + b, \tag{2.6} $$

where b corresponds to an introduced bias weight. The purpose of the bias weight is to be able to represent the output of the neuron in a possibly wider range compared to the input domain. The output of the neuron y is finally given by y = g(z), where g(z) corresponds to the activation function. The purpose of the activation is to perform a non-linear mapping of the weighted summation to the output so that it can be decided if the neuron was activated or not. Activation functions are non-linear functions (linear functions may only be utilized in the output layer), as this enables anns to capture non-linear behaviors, a necessity to solve complex problems. Some examples of popular activation functions are given in Section 2.4.5.
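Equation (2.6) followed by the activation can be written in a few lines. The following is a minimal sketch; the function name and the default choice of tanh as activation are assumptions for the example.

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """Single neuron: weighted sum of the inputs plus bias, then activation."""
    z = np.dot(w, x) + b      # equation (2.6)
    return activation(z)      # y = g(z)

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
y = neuron(x, w, b=1.0)       # tanh(0.5 * 1 - 0.25 * 2 + 1) = tanh(1)
```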

2.4.3 The Network

The artificial neural network is built of multiple layers of neurons. In the simplest case, the input features (input layer) are connected to a set of neurons, which then produce the output (output layer). To solve complex problems, layers of neurons are added in between the input and output layers; these are called hidden layers.

In multiclass classification tasks, a softmax layer is commonly used as the output layer. Each class is then given a probability between zero and one, with the probabilities summing to one, based on which a classification decision is made.
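A minimal sketch of the softmax mapping is given below. Subtracting the maximum before exponentiation is a common numerical safeguard and does not change the result; the example scores are arbitrary.

```python
import numpy as np

def softmax(z):
    """Map a vector of scores to probabilities that sum to one."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.1])
decision = int(np.argmax(probs))   # class with the highest probability
```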

2.4.4 Training

Once the network architecture is built, the final step is training the network. The training of an ann is the process of updating the weights (including bias weights) w in all the neurons in the network. First, we have to design a problem which can be optimized. Therefore, a loss function is introduced. In this case the mean square error loss function is selected, but note that there are other variants available. The mean square error loss function is given as:

$$ \varepsilon(\mathbf{w}) = \sum_{m} \sum_{k} \left(y_{k,m} - p_{k,m}(\mathbf{w})\right)^{2}, \tag{2.7} $$

where k runs over all training samples and m over all output nodes. Note that the training target y_{k,m} is known and the network predicts the output p_{k,m}(w). It is desired to minimize ε(w), which is done by gradient descent:

$$ w_{i,j}^{t+1} = w_{i,j}^{t} - \eta \frac{\partial \varepsilon(\mathbf{w})}{\partial w_{i,j}}, \tag{2.8} $$

where the parameter η is called the learning rate and the derivative ∂ε(w)/∂w_{i,j} is found using partial derivatives and the chain rule, which requires the activation functions to be differentiable or piecewise differentiable. The learning rate determines how fast the weights in the network are updated. A high learning rate leads to faster training, but the updates may overshoot the minimum so that the training oscillates or diverges. A low learning rate leads to slower training, and the training may not converge within the available number of iterations.

The process of feeding training samples through the network is called forward propagation. The process of updating the weights based on the loss function is called backward propagation. This can be done for one training sample at a time or utilizing multiple samples, which is called batch learning. In batch learning, training samples are put into batches, and then forward and backward propagation are performed per batch. The number of training samples in one batch is called the batch size or mini-batch size, as common sizes are relatively small numbers such as 16 or 32.
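The update rule and batch learning can be illustrated on the simplest possible network, a single linear neuron. The sketch below is written in Python for illustration and is not the MATLAB implementation used in this thesis; the synthetic data, learning rate, and batch size are assumptions, and the loss is averaged over each batch rather than summed.

```python
import numpy as np

def train_linear(x, y, lr=0.05, batch_size=16, epochs=200, seed=0):
    """Mini-batch gradient descent on the squared error of a linear neuron."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(x))            # reshuffle the samples every epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            pred = x[idx] @ w + b                  # forward propagation
            err = pred - y[idx]
            grad_w = 2.0 * x[idx].T @ err / len(idx)   # backward propagation
            grad_b = 2.0 * err.mean()
            w -= lr * grad_w                       # gradient descent step
            b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
x_train = rng.standard_normal((128, 2))
y_train = x_train @ np.array([2.0, -1.0]) + 0.5    # noise-free synthetic targets
w_hat, b_hat = train_linear(x_train, y_train)
```

In a multi-layer network the gradients with respect to earlier layers are obtained in the same way, by applying the chain rule backwards through the layers.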

2.4.5 Design Considerations

Some important design considerations will be discussed in this section, to give some insight into the design of an ann.

Activation Function

There exist many possible activation functions; here three classical activation functions are presented. The first one is the sigmoid function:

$$ \sigma(x) = \frac{1}{1 + e^{-x}}. \tag{2.9} $$


The sigmoid function has a range between 0 and 1 and is easy to apply. However, it suffers from some undesirable properties:

• Vanishing gradient problem, i.e., the partial derivatives become very small when x is a large negative or large positive number, which leads to slow updates for early layers in the network.

• The range of the derivative of the sigmoid function is very narrow, which leads to indistinct gradient values.

• It is not zero-centered. Hence, negative-valued outputs cannot be represented by a sigmoid function.

The hyperbolic tangent function:

f (x) = tanh(x) (2.10)

is a popular choice. It is zero-centered and the range of its derivative is not as narrow as that of the sigmoid function. It does, however, suffer from the vanishing gradient problem.

The rectified linear unit (relu):

f (x) = max(0, x) (2.11)

is a function that does not suffer from the vanishing gradient problem for positive inputs, but it is not zero-centered. It is worth noting that the relu only produces non-negative, unbounded outputs. Equation (2.11) is therefore not suitable for the output layer unless the desired output range matches this, and is otherwise used only in the hidden layers. A variant of the relu is the leaky relu, which behaves similarly to the relu, with the exception that it lets small negative values through. It can be described as:

$$ f(x) = \begin{cases} x, & x > 0 \\ 0.01x, & \text{otherwise.} \end{cases} \tag{2.12} $$
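The activation functions in equations (2.9) to (2.12) can be sketched as below, together with the derivative of the sigmoid, which illustrates the vanishing gradient problem numerically. The function names are chosen for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                 # equation (2.9)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)                            # never exceeds 0.25

def relu(x):
    return np.maximum(0.0, x)                       # equation (2.11)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)            # equation (2.12)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
grads = sigmoid_grad(x)       # close to zero at both ends, 0.25 at x = 0
tanh_out = np.tanh(x)         # equation (2.10), zero-centered in (-1, 1)
```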

Number of Layers

The number of layers and neurons is a design consideration. The number of layers can be increased to improve the ability to capture complexities in data, i.e., a more shallow network might be well suited for simpler tasks and vice versa. The number of neurons in each layer can also be increased to capture more complex structures, but should also be of suitable dimensions compared to the dimensions of the input.

There are some drawbacks to increasing the number of layers. One danger is that the network is more prone to over-fitting on the training data. Increasing the number of layers also increases training and run times. When working with a larger number of layers it is also important to make sure that earlier layers in the network are updated properly, to avoid the vanishing gradient problem. A dnn is an ann where there is more than one hidden layer.


Optimization method

Equation (2.8) presents the simple gradient descent method. There are many variants of the basic gradient descent available. Different methods can reduce training time or modify the learning rate while training. The learning rate should also be selected carefully. An example of a popular optimization method is adaptive moment estimation (adam), which adapts the effective learning rate of each weight while training [13].

Loss Function

The mean squared error function (2.7) is a popular choice of loss function; however, there are other options available. The choice of loss function depends on how the problem is designed.

Data Set

The performance of anns is only as good as the provided data set. If the data set is not general enough, the ann might only become applicable to specific situations. The training data has to be selected such that the ann can generalize to the variety of data that will appear when it is used. Any network, regardless of how many neurons it has or how deep it is, is only as good as the given training data.

2.4.6 Recurrent Neural Networks

rnns are a class of anns in which the connections between neurons form a directed graph that contains feedback connections. Compared to feedforward anns (where information only moves in one direction, from the input layer to the output layer), rnns can use their internal state/memory to aid in the solving of tasks [10]. Essentially, rnns can utilize information from previous inputs, a property that the basic feedforward network lacks. To learn time-varying underwater channels, this property could be of use. There are several rnn architectures. The only rnn structure provided by MATLAB's Deep Learning Toolbox (version 2019b) is the long short-term memory (lstm) structure.

2.4.7 Long Short-Term Memory Architecture

An lstm layer is built from a set of recurrently connected blocks, called memory blocks. Each block contains a set of cells, where each cell is connected to three multiplicative gates: the input gate, the output gate, and the forget gate. The gates regulate the flow of information into and out of the cell by being open or closed, thus controlling its behavior [6]. An illustration of an lstm is presented in Figure 2.3, from [6]. The gates open or close based on the signals they receive, so information is let through or blocked based on the signal strength and importance. The signals that lead into the gates are weighted as shown in Figure 2.3. This means that the network learns how to value the signal strength, i.e., learns how to handle the data. The input gate thus controls when data is allowed to enter the cell, the output gate controls when data is allowed to leave the cell, and the forget gate controls when data is allowed to be deleted. The concept of gates allows the neural network to capture remote dependencies, i.e., long-term memory as the name implies.

Figure 2.3: Illustration of an lstm block and its connections, from [6].
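The gating mechanism can be made concrete with one time step of a commonly used lstm formulation. This is a sketch under assumptions: the gate ordering, the weight layout, and the omission of peephole connections are choices made for the example and do not necessarily match the block in [6] or the MATLAB implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One lstm time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the order input gate, forget gate, cell candidate, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new data enters the cell
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state is kept
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell is exposed
    c = f * c_prev + i * g         # updated cell state (the memory)
    h = o * np.tanh(c)             # updated hidden state (the output)
    return h, c

D, H = 3, 2
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4 * H, D))
U = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, U, b)
```

A forget gate close to one lets the cell state pass almost unchanged from step to step, which is what allows dependencies over many time steps to be retained.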

Bidirectional LSTMs

A bidirectional lstm runs the input in both directions, yielding backward and forward information of the sequence. This allows the network to combine information from the past and the future. This property has proven to be useful as the network learns the context of the provided information better compared to a basic lstm.

2.5 Previous Work

Before explaining the outlined method in this thesis, recent work and progress within the field will be explained to give some insight into the new approach this thesis takes.

2.5.1 Machine Learning in Wireless Radio Communication

Terrestrial wireless radio communications have a plethora of literature on utilizing ml for performing channel estimation and equalization. One can learn a lot from this field, but its assumptions are not consistent with the assumptions for the uwa channel. In one article, a dnn is trained online and offline, exploiting knowledge from training symbols to learn a time-varying channel [17]. An interesting approach to this problem is to develop a network that can learn time-varying properties. Recent literature in radio communications highlights the fact that rnns, which unlike dnns can learn temporal dynamic behavior, can be suitable for channel estimation/equalization. rnns are exploited in [4] and [7] to perform online pilot-assisted channel estimation/equalization. rnns utilize the previous outputs of the network to create an internal state, which they can use to process new inputs. The number of works on lstms in communications is rather limited; they have, however, been proven successful in speech recognition [14].

2.5.2 Machine Learning in Underwater Acoustic Communication

Recent literature on uwa communication suggests that different parts of the receiver chain or the entire chain can be replaced by dnns with promising results. In one study, a version of the dfe-pll receiver is replaced by a dnn with improved system performance, namely lower bers compared to the normal receiver at the same snr [37]. The authors considered a single sub-band system, while in this thesis a multi-sub-band system is adopted. A related study proposes a system where the entire ofdm receiver chain is replaced by a dnn [36], suggesting that dnns are promising for improving the ber in multi-sub-band systems. However, ofdm is very different compared to frss in a lot of aspects; most importantly, the isi is removed by the cyclic prefix in ofdm, which simplifies the rest of the reception. Another work replaces the channel estimation in an ofdm system with a dnn [11]. In [36] and [11] the authors observed lower bers at all snrs compared to the traditional receiver structures which the authors suggested as benchmarks.

One common factor in [11, 36, 37] is that the dnns only consist of three to four hidden layers and have a common topology. The topology follows the pattern that the number of neurons in each hidden layer is half that of the previous layer. If the input layer is of size 1024 as in [36], the first hidden layer should be of size 512, then 256, and so on. Building a network from a similar structure should be a good start.
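The halving pattern can be expressed as a small helper. The function name is hypothetical, and rounding down for odd layer sizes is an assumption, since the cited works only use input sizes that are powers of two.

```python
def halving_layer_sizes(input_size, n_hidden):
    """Hidden-layer sizes where each layer has half the neurons of the previous one."""
    sizes = []
    size = input_size
    for _ in range(n_hidden):
        size = size // 2          # rounding down for odd sizes is an assumption
        sizes.append(size)
    return sizes

sizes_1024 = halving_layer_sizes(1024, 3)   # [512, 256, 128]
```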

2.5.3 Key Takeaways

The strengths of rnns are highlighted for wireless radio communication [4, 7], but similar studies in uwa communication were not found. The studies related to uwa communication highlight the usage of dnns [11, 36, 37]. In both areas, training symbol-aided online training has been proven effective for rnns and dnns. Training symbol-aided training is very interesting as the frss training symbols could be utilized in such an approach.

This study will take a new approach by comparing the two different structures in uwa communications and a rather unique multi-sub-band system. The cited literature highlights the possibilities of this approach.

3 Method

In this chapter, the method to answer the problems posed in Section 1.3 is outlined. First, the theory from previous chapters is used to describe the system model. Then the software used to simulate the system is described in Section 3.2. Details regarding the intricate channel simulation are described in Section 3.3, outlining the possible channels. Section 3.4 outlines how the different configurations are simulated. The content in the sections until Section 3.5 serves the purpose of identifying interesting deployment scenarios for the anns, i.e., answering in which environments performance can be improved. The final questions of how much performance can be increased and of deployment strategies (generalizable performance) are answered in the final sections. Section 3.5 proposes ann structures based on the related work and Section 3.6 describes how anns are deployed to answer the outlined problem formulation. Finally, some miscellaneous studies are described.

3.1 System Model

The intention of this section is to describe how the ml receiver and the baseline receiver were implemented on a schematic level. Both systems utilized the same functions and shared most properties, besides the equalization.

The frss transmitter described in Section 2.3.1 yields a time-continuous signal x(t), which was sent through the underwater multipath channel according to equation (2.3), yielding y(t). The received signal at the hydrophone is u(t), where n(t) is additive noise:

u(t) = y(t) + n(t). (3.1)

The noise n(t) can either be colored or white, see Section 3.3.6. What is important is that the received signal is inevitably embedded in noise. The received signal u(t) was fed to the different receivers.

3.1.1 Baseline Receiver

The baseline receiver utilizes all the steps outlined in Section 2.3.2. In the end, a sequence of estimated information bits is generated. By comparing the estimated bit sequence and the original bit sequence, the ber could be calculated.

3.1.2 Machine Learning Receiver

For the performance of the ann receiver to be comparable to that of the baseline receiver, only the equalization process described in Section 2.3.2 is replaced. First, the frss receiver performs the pre-processing described in Section 2.3.2. The next step was to perform the equalization, which was performed by an ann. Similar to equation (2.5) the signals were stacked, but the phase offset was disregarded in this implementation, yielding the following expression:

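$$ \mathbf{y}_{k,n} = \begin{bmatrix} y_k\left(nT - (L-1)bT/(2a)\right) \\ y_k\left(nT - (L-3)bT/(2a)\right) \\ \vdots \\ y_k\left(nT + (L-3)bT/(2a)\right) \\ y_k\left(nT + (L-1)bT/(2a)\right) \end{bmatrix}. \tag{3.2} $$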

The vector y_{k,n} for each symbol n was considered as the input to the ann. The parameters a and b are design parameters which had to be considered, see Section 3.1.3; they essentially determined how many samples the signal was represented by, i.e., the length of the vector y_{k,n}. The values selected for a and b are presented in Section 3.1.3, and yielded a vector of length 90. The input was then fed to the ann, which yielded a symbol estimate z(n) for each n. The estimate was fed to the log-likelihood ratio computation and the Viterbi decoder as described in Section 2.3.2. In the end, a sequence of estimated information bits was generated. By comparing the estimated bit sequence and the original bit sequence, the ber could be calculated.
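The stacking in equation (3.2) can be sketched for a sampled sub-band signal as below. The sketch assumes that the signal is available at a samples per symbol and that (L − 1)b is even, so that all tap positions fall on the sample grid; other combinations would need a finer grid or interpolation. The function name and the zero-padding at the signal edges are assumptions and do not describe the reference implementation.

```python
import numpy as np

def stack_taps(y, n, a, b, L):
    """Collect L samples spaced b samples apart, centered on symbol n.

    y is one sub-band signal sampled at `a` samples per symbol (assumption).
    """
    if ((L - 1) * b) % 2 != 0:
        raise ValueError("(L - 1) * b must be even for taps to fall on the sample grid")
    center = n * a                       # sample index of symbol n
    half = (L - 1) * b // 2              # distance from the center to the outermost tap
    idx = center + np.arange(-half, half + 1, b)
    out = np.zeros(L, dtype=complex)
    valid = (idx >= 0) & (idx < len(y))  # zero-pad outside the signal (assumption)
    out[valid] = np.asarray(y)[idx[valid]]
    return out

y_k = np.arange(40, dtype=float)         # placeholder samples, not real data
vec = stack_taps(y_k, n=3, a=4, b=1, L=5)   # samples 10, 11, 12, 13, 14
```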

3.1.3 Choice of Parameters

Each rate has a different number of sub-bands, hence the signal dimensions vary between rates. Thus, in this thesis an ann is configured for one rate configuration. Out of the four available rates, the rate R = 2, which utilizes three sub-bands, was studied in this thesis. The choice of R = 2 was motivated by the fact that fewer sub-bands reduce the input dimension of the ann. R = 1 was not considered, as part of what makes the thesis unique compared to [37] is to study equalization in a multi-sub-band configuration.

The choice of equalizer parameters a, b and Teq (Teq determines the size of L) for the ann are presented in Table 3.1.

Table 3.1: Configurable equalizer parameters.

Property    baseline       ann
a           undisclosed    4
b           undisclosed    1
Teq         undisclosed    24 ms

The choices of a, b, and Teq utilized by the baseline frss receiver are undisclosed but are comparable to the ones chosen for the ann. Teq was chosen by studying the simulated channel impulse responses, which had a maximum delay spread in the range of 20 ms, see Appendix D. Thus 24 ms was chosen to have some margin. A spacing of b = 1 was chosen as a high signal fidelity was considered beneficial, i.e., the spacing between samples was minimal. The parameter a was chosen to be comparable to the baseline equalizer.

3.1.4 Bit Error Rate Definition

The BER refers to the channel-coded BER, i.e., the fraction of erroneous information bits in a received packet. As mentioned, the BER could be calculated by comparing the estimated bit sequence with the original bit sequence. The BER curve as a function of SNR is the performance metric utilized in this report. At least 10 errors per point on the curve were required to estimate the BER. In some channels, and at high SNR, this could require an extensive number of simulations.
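The BER estimate with a minimum error count can be illustrated by a minimal Python sketch. It is not the thesis implementation (which was written in MATLAB); the function names and the stopping rule of accumulating packets until the error target is reached are assumptions made for illustration.

```python
def count_bit_errors(sent_bits, decoded_bits):
    """Number of positions where the decoded bit differs from the sent bit."""
    if len(sent_bits) != len(decoded_bits):
        raise ValueError("bit sequences must have equal length")
    return sum(s != d for s, d in zip(sent_bits, decoded_bits))


def estimate_ber(packets, min_errors=10):
    """Accumulate packets until at least min_errors bit errors are observed.

    packets is an iterable of (sent_bits, decoded_bits) pairs.
    Returns (ber, reliable); reliable is False if the packets ran out
    before min_errors errors had been collected.
    """
    errors = 0
    total = 0
    for sent_bits, decoded_bits in packets:
        errors += count_bit_errors(sent_bits, decoded_bits)
        total += len(sent_bits)
        if errors >= min_errors:
            break
    ber = errors / total if total else float("nan")
    return ber, errors >= min_errors
```

The second return value makes it visible when a point on the curve was based on too few errors, which is the situation that arises at high SNR.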

The BER can also be affected by the reception, namely by false alarms or undetected packets. The receiver has a probability of giving a false alarm, i.e., a detection occurs in the absence of a signal. There is also a probability that a received packet is not detected. False alarms and undetected packets affect the modem performance, and can thus be represented in the BER curve, but in this study perfect reception is assumed, i.e., false alarms and undetected packets are disregarded. False alarms are disregarded in simulations because the probability that one occurs grows with the number of simulations, which can skew the BER calculation when a higher number of simulations is needed to estimate the BER. By the same reasoning, undetected packets are also disregarded.

3.2 Software Simulation Environment

The FRSS transmitter and receiver described in Section 2.3 came with a MATLAB reference implementation. In this section, the software and setup used to simulate the underwater channel and the ANN framework are described.

3.2.1 Machine Learning Software

MATLAB has a Deep Learning Toolbox [28], which was utilized to implement the DNN and RNN. The Deep Learning Toolbox contains many advanced features and provided the tools necessary to design and build the ANNs within this thesis. The toolbox also allows the user to speed up training with a graphics processing unit (GPU), which was available in the setup. An advantage of utilizing the toolbox was that the thesis could focus more on designing network structures and testing different models, rather than spending time on implementing basic functions. An alternative would have been to use a popular toolbox such as PyTorch or TensorFlow, which are available for Python, but the FRSS code was given in MATLAB, so MATLAB’s toolbox was chosen.

3.2.2 Channel Model

As simple channel models based on assumptions regarding the UWA channel have proven to be unrealistic, as mentioned in Section 2.2, one must resort to more complex channel models. The aspects mentioned in Section 2.1 needed to be taken into consideration one by one. A common choice for modeling multipath propagation and attenuation in underwater conditions is Bellhop [20]. Bellhop is a model based on ray tracing and can utilize information about SSPs and bathymetry, along with other environmental properties, to generate a channel impulse response. Bellhop is relatively complicated as it requires a lot of input, but it has become popular as it is considered somewhat realistic. Several models [21, 35, 38] utilize Bellhop as a baseline for generating the channel impulse response. The authors of [35] propose a channel model and study the impact of noise and the Doppler effect to generate a complete channel model. Another study [21] proposes a statistical model based on large-scale and small-scale effects of the movements and environmental conditions. Doppler effects are also studied in detail. A third article [38] utilizes Bellhop to simulate the physical layer in a NET-layer simulator.

Out of the possible channel models, [21] was chosen to be used in this thesis. All the important aspects mentioned in Section 2.1 are modeled in [21], which makes the model very complete, and it is referenced in recent UWA literature [36, 37]. An additional benefit is that the model had an openly available implementation in MATLAB [22]. The model described in [21] contains a lot of hyperparameters, and the next section is devoted to motivating the input to the simulation.

3.3 Channel Simulation Configuration

The channel simulator, based on [21], yielded an estimated time-variant channel impulse response h(τ; t) based on the configured parameters. The channel simulator took the environmental parameters and gave them to Bellhop. The Bellhop simulation was deterministic: a specific configuration always yielded the same signal arrivals. Based on the signal arrivals, the time-variant channel impulse response was generated from the small-scale variations (Table 3.4) and Doppler variations (Table 3.6).


In this section, the configuration utilized to simulate h(τ; t) is described. The bathymetry, SSPs and bottom sediment described in Section 3.3.1 to Section 3.3.3 were input to the Bellhop simulator, while the general parameters and channel variations were input to the statistical model from [21].

3.3.1 Bathymetry

As mentioned in Section 1.4, two depths were considered. For each depth, three general bathymetry profiles were considered, yielding six possible bathymetries. The purpose was to capture general scenarios. The three general scenarios selected were a flat ocean bottom, a slope, and an obstacle. Motivation and plots of the bathymetries can be found in Appendix A.

3.3.2 Sound Speed Profiles

Nine different SSPs were utilized in simulations, four for the shallow channel and five for the deeper channel. The SSPs were chosen to correspond to the varying seasons. The SSPs were taken from the Swedish Meteorological and Hydrological Institute’s weather buoys in Skagerrak and the Baltic Sea [27]. Data from the buoy located at Släggö was chosen due to its proximity to the coast, where the effects of freshwater rivers can be noticed. Data from the buoy REF M1V1, located between Öland and Småland, was chosen for the shallow scenario, as it was one of the few buoys with sound speed data in shallower waters. For each buoy, a set of the available dates with data was selected; the chosen dates are found in Appendix B. Data points were written down on paper and then manually added into MATLAB, creating an approximation of the plots provided by the Swedish Meteorological and Hydrological Institute. SSPs were picked at specific times, attempting to capture the varying seasons and their effects on the sea. Plots of the SSPs and descriptions of their behavior are found in Appendix B.

3.3.3 Bottom Sediment Types

The Bellhop simulator allowed the user to specify properties of the sea bottom sediment. It was noted that altering this parameter yielded significant changes in the simulation (thus, two setups were considered). The properties of the bottom sediment determined how much reflection and reverberation the bottom generates. Two kinds of bottom sediment with very different properties were considered. The sediment types and properties are presented below in Table 3.2; the data was obtained from seafloor measurements [9].

Table 3.2: Bottom sediment properties for two different kinds of bottom.

Sediment type    Sediment sound speed    Wet bulk density
Silty sand       1657.49 m/s             1.91 g/cm^3
Clayey silt      1465.52 m/s             1.63 g/cm^3


The intention was to study how important this property is to the reception, since the receiver and transmitter described in Appendix A are located very close to the bottom. The center frequency (carrier frequency) and bandwidth are undisclosed, as they were specific to the given simulation code, which may be confidential.

3.3.4 Channel Geometry and General Parameters

The properties of the basic channel geometry are listed below in Table 3.3.

Table 3.3: Channel geometry and general channel/simulation properties.

Property                            Value              Unit
Surface height                      18 or 72           m
TX height                           2 (from bottom)    m
RX height                           2 (from bottom)    m
Channel distance                    1000               m
Spreading factor                    1.7                -
Center frequency                    undisclosed        Hz
Bandwidth                           undisclosed        Hz
Frequency resolution                20                 Hz
Time resolution dt                  120                ms
Small-scale variation duration T    20                 s

Most parameters were based on the limitations set in the introduction. The spreading factor was set to k = 1.7 by default. This was also verified with sonar experts at Saab Dynamics, who stated that the choice of k = 1.7 was suitable for shallow water communications. The time resolution corresponds to the resolution of the time variable t, and the frequency resolution can be interpreted as the resolution of the delays τ. The simulator returned τ as a frequency vector, which could be converted to delays simply by dividing the returned frequency vector by the bandwidth.

3.3.5 Channel Variations

Small-scale fading describes movements/displacements within the range of a few wavelengths. Small-scale variations were described by [21] as the variations of the surface and bottom, and the statistical properties of the scattering. The configurable small-scale fading settings are described in Table 3.4. In [21], each path estimated by the Bellhop simulator was assumed to undergo scattering, so that each path would be split into several micro-paths. The statistical properties of the micro-paths and their delays were described by the PSD, the mean of the intra-path amplitudes, and the variance. The values were the same as the defaults from [22], except for the variance of bottom variations, which was set to zero as the bottom was considered rather static.


Table 3.4: Configurable small-scale settings.

Property                         Value    Unit
Variance of surface variations   1.125    m^2
Variance of bottom variations    0        m^2
3-dB width of PSD                0.5      ms
Number of intra-paths            20       -
Mean of intra-path amplitudes    0.025    -
Variance of intra-path           10^−6    -

All large-scale variations in the model [21] were assumed to be zero. The reasoning for this was twofold:

• Reducing the number of parameters simplifies the analysis of the results.

• In reality, the change of height and distance is negligible over time.

The configurable large-scale settings are listed below in Table 3.5.

Table 3.5: Configurable large-scale (L-S) settings.

Property                                                    Value    Unit
Range of surface height variation                           0        m
Range of TX height variation                                0        m
Range of RX height variation                                0        m
Range of channel distance variation                         0        m
Standard deviation of L-S variations of surface height      0        m
Standard deviation of L-S variations of TX height           0        m
Standard deviation of L-S variations of RX height           0        m
Standard deviation of L-S variations of channel distance    0        m

Doppler Parameters

It was assumed that the transmitter and receiver were in a constant small horizontal drift along the bottom; all other movements were assumed to be zero. Surface variations were the defaults from [22]. The configurable Doppler effect parameters are described in Table 3.6.


Table 3.6: Configurable Doppler effect parameters.

Property                       Value             Unit
TX drifting speed              [-0.005 0.005]    m/s
TX drifting angle              0                 rad
RX drifting speed              [-0.005 0.005]    m/s
RX drifting angle              0                 rad
TX vehicular speed             0                 m/s
TX vehicular angle             0                 rad
RX vehicular speed             0                 m/s
RX vehicular angle             0                 rad
Surface variation amplitude    0.05              -
Surface variation frequency    0.01              -

3.3.6 Noise

Two kinds of additive noise were considered: complex white Gaussian noise (WGN) and complex colored Gaussian noise (CGN). For channel simulations and training ANNs, WGN was utilized. CGN was studied in the final step, to identify how it affected the studied ANNs. The additive noise in Section 3.1 was described by its SNR.

A red-colored noise PSD was used to describe the colored additive noise. As the noise described in Section 2.1.2 decays at 18 dB per decade, using red noise, which decays at 20 dB per decade, could be considered a reasonable assumption. The advantage of using red noise was that additive red noise implementations existed for MATLAB, which reduced uncertainty, as implementing a custom noise profile takes time and increases the risk of unnecessary errors. Thus, from here on CGN refers to additive red noise, as it was considered an approximation of the ocean ambient noise PSD.

The FRSS receiver utilized a brick-wall filter to remove out-of-band noise, hence the CGN and WGN energies in the frequency band of interest had to be similar to be comparable. To ensure that the SNRs of the CGN and WGN were comparable, the noise profiles were band-pass filtered to the bandwidth utilized for transmission. A Butterworth filter of order twenty was used, yielding roughly the same energies with a maximum difference of one to two percent.
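As a rough illustration of how white and red noise can be brought to the same SNR, a minimal Python sketch is given below. It is not the thesis code: red noise is generated here as the running sum of white noise (which has a PSD falling at 20 dB per decade), the noise is scaled by its total power, and the order-twenty Butterworth band-pass step is deliberately left out. All function names are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average power of a real or complex sequence."""
    return sum(abs(v) ** 2 for v in samples) / len(samples)


def white_noise(n, rng):
    """Complex white Gaussian noise with unit average power (in expectation)."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]


def red_noise(n, rng):
    """Complex red (Brownian) noise: the running sum of white noise."""
    out, acc = [], 0j
    for w in white_noise(n, rng):
        acc += w
        out.append(acc)
    mean = sum(out) / n
    return [v - mean for v in out]  # remove the DC offset


def scale_to_snr(signal, noise, snr_db):
    """Scale noise so that signal power / noise power equals snr_db."""
    target = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target / mean_power(noise))
    return [gain * v for v in noise]
```

Scaling by total power only gives comparable in-band energies if both noise types are first limited to the transmission band, which is the role the band-pass filter plays in the text above.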

3.4 Channel Simulations

To identify how the different channels behaved, several simulations were conducted. With the two depth scenarios, the three bathymetries, and the sound speed profiles (SSPs), a total of 27 possible combinations existed (3 · 4 for the shallow scenario and 3 · 5 for the deep scenario). Considering two different bottom types, a total of 54 possible combinations existed. The channel simulation was crucial to identify the environments in which ANNs were to be deployed.
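The combination count can be checked by enumerating the configurations. The following Python sketch uses the numbers from Sections 3.3.1 to 3.3.3; the labels and the function name are chosen for illustration only.

```python
from itertools import product

BATHYMETRIES = ["flat", "slope", "obstacle"]
SSPS_PER_DEPTH = {"shallow": 4, "deep": 5}   # number of SSPs per depth scenario
BOTTOMS = ["silty sand", "clayey silt"]


def channel_configurations(include_bottom=True):
    """List every (depth, bathymetry, ssp index, bottom) combination."""
    bottoms = BOTTOMS if include_bottom else [None]
    configs = []
    for depth, n_ssps in SSPS_PER_DEPTH.items():
        for bathymetry, ssp, bottom in product(
            BATHYMETRIES, range(1, n_ssps + 1), bottoms
        ):
            configs.append((depth, bathymetry, ssp, bottom))
    return configs


print(len(channel_configurations(include_bottom=False)))  # 27
print(len(channel_configurations(include_bottom=True)))   # 54
```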

3.4.1 Time-Variant Filter

The time-variant filter described by (2.3) had to be implemented by considering the channel impulse response as a sequence of time-invariant filters. The channel simulator yields an approximated discrete-time version of h(τ; t), which is denoted by h[i, j]. The indexes i and j are discrete samples: i corresponds to discrete instances of time delays 0, dτ, ..., Td and j corresponds to discrete instances of time 0, dt, ..., T. The time resolution dt and T are directly available from Table 3.3, while dτ and Td depend on the selected value of frequency resolution from the same table. With the available FRSS signal x(t), the implemented filter is described below in Algorithm 1. Before the algorithm is run, h[i, j] is up-sampled to the same sampling frequency as x(t). The sampling frequency fs is undisclosed, as it belonged to the given simulation code. The notation ∗ denotes the convolution operator.

Algorithm 1 Time-variant filter algorithm.

procedure Timevariantfilter(h, x, t, τ)    ▷ Performs time-variant filtering, h is a function of t and τ
    for i = 1 to length(t) do
        yp[:, i] = x ∗ h[:, i]
    end for
    j = 1
    for i = 1 to length(yp[:, 1]) do
        if (i/fs) > (dt · j) then
            j = j + 1
        end if
        y[i] = yp[i, j]
    end for
end procedure
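Algorithm 1 can be sketched in Python as follows. This is an illustrative adaptation with zero-based indexing, not the MATLAB code used in the thesis. The clamp that keeps j on the last snapshot when the signal outlasts the simulated duration is an added assumption, and the helper names are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def time_variant_filter(h_snapshots, x, dt, fs):
    """Filter x with a channel that changes every dt seconds.

    h_snapshots[j] holds the impulse response taps of snapshot j; all
    snapshots are assumed to have the same length and to be sampled at fs.
    """
    # Step 1: filter the whole signal with every time-invariant snapshot.
    partial = [convolve(x, h) for h in h_snapshots]

    # Step 2: for each output sample, pick the snapshot valid at that time.
    y = []
    j = 0
    for i in range(len(partial[0])):
        if (i + 1) / fs > dt * (j + 1) and j < len(h_snapshots) - 1:
            j += 1
        y.append(partial[j][i])
    return y


# Two one-tap snapshots (gain 1, then gain 2), switching after 2 s at fs = 1 Hz.
print(time_variant_filter([[1.0], [2.0]], [1.0] * 4, dt=2.0, fs=1.0))
# [1.0, 1.0, 2.0, 2.0]
```

As in Algorithm 1, every snapshot filters the complete signal, so the cost grows with the number of snapshots; this mirrors the structure of the algorithm rather than optimizing it.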

3.4.2 Simulation Loop

To simulate the different configurations a simulation script was set up. The intention was to study the channel impact on the BER. The algorithm takes the number of available SSPs and bathymetries as S and B, x corresponds to a set of FRSS signals, n is WGN and fs is the sampling frequency. The channel type, labeled type, corresponds to the deep or shallow simulation configuration.


Algorithm 2 Simulation loop

procedure Simulation loop(S, B, x, n, type)
    for b = 1 to B do
        for s = 1 to S do
            [h, t, τ] = createEnvironment(b, s, type)
            y = timeVaryingFilter(h, x, t, τ)
            ber = frssReceiver(y + n)
        end for
    end for
end procedure

Algorithm 2 was executed for all types of setups to study the channel impact. The simulation loop was run multiple times over a range of SNRs to ensure that statistical anomalies in the channel simulator were mitigated.
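How the repeated runs over a range of SNRs can be organized around Algorithm 2 is sketched below in Python. The callable simulate_packet is a hypothetical stand-in for createEnvironment, the time-variant filter and the FRSS receiver, none of which are reproduced here; pooling the bit errors over all runs is likewise an assumption.

```python
def run_simulation_loop(n_bathymetries, n_ssps, snrs_db, n_runs, simulate_packet):
    """Pooled BER for every (bathymetry, SSP, SNR) combination.

    simulate_packet(b, s, snr_db) must return (bit_errors, bits_sent)
    for one transmitted packet in one random channel realization.
    """
    results = {}
    for b in range(1, n_bathymetries + 1):
        for s in range(1, n_ssps + 1):
            for snr_db in snrs_db:
                errors = bits = 0
                for _ in range(n_runs):
                    e, n = simulate_packet(b, s, snr_db)
                    errors += e
                    bits += n
                results[(b, s, snr_db)] = errors / bits
    return results
```

Pooling errors and bits before dividing weights every transmitted bit equally, which avoids giving short packets the same influence as long ones.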

3.5 Artificial Neural Network Structure

As the literature had highlighted DNNs and RNNs to be viable solutions, one variation of each network was considered. The ANN layers are designed according to MATLAB’s syntax. Both ANNs were designed as classifiers. The ANNs’ task was to take the input described in the system model in Section 3.1.2, with the selected parameters from Section 3.1.3, and classify which symbol each time segment corresponds to. As three different sub-bands were utilized, the input from each sub-band had to be treated as a separate input stream. ANNs do not handle complex numbers, so the real and imaginary samples in each sub-band had to be separated, yielding a total of six input streams. The choices of a, b and Teq from Table 3.1 yield a total of 90 samples per sub-band. Thus, the feature dimension was x ∈ R^{6×90}. The output was a class from the set q ∈ {1, 2, 3, 4}, corresponding to the four possible QPSK symbols, and the classification was based on a Softmax layer. However, in the implementation, the probability of each QPSK symbol from the Softmax layer was used to make a soft decision, instead of the hard decision classification. This proved to be beneficial over using hard decisions from the classification layer, probably because the soft decision carries information about the uncertainty of the symbol decision, which could be utilized by the Viterbi decoding in the receiver.
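The input arrangement and the soft decision can be illustrated with a short Python sketch. The ordering of the six streams, the QPSK constellation, the class-to-symbol mapping and the use of the probability-weighted mean as the soft symbol are all assumptions made for illustration; the thesis does not specify them.

```python
import math


def to_real_streams(subbands):
    """Split K complex sub-band vectors into 2K real streams (re, im per band)."""
    streams = []
    for samples in subbands:
        streams.append([v.real for v in samples])
        streams.append([v.imag for v in samples])
    return streams


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed unit-energy QPSK constellation; the class order is hypothetical.
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def soft_symbol(probabilities):
    """Probability-weighted mean of the constellation points."""
    return sum(p * s for p, s in zip(probabilities, QPSK))


def hard_symbol(probabilities):
    """Constellation point of the most likely class."""
    return QPSK[probabilities.index(max(probabilities))]
```

With equal class probabilities the soft symbol is close to the origin, signaling to the decoder that the decision is unreliable, whereas the hard decision would still return a full-amplitude symbol.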

Offline-training

The Adam optimizer was utilized. The initial learning rate was set to the default 0.001. In offline-training, over-fitting was combated by using early stopping. A random 10% of the data set was extracted and used as a validation set. The training was then stopped if the accuracy on the validation set did not increase after a predetermined number of passes (validation patience). The validation patience was set to six. Training was performed in mini-batches of size 651. The specific size was related to the number of symbols in one FRSS packet. The standard mean square error loss function was used, as the correct equalizer symbol output was known.
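The validation split and the stopping rule described above can be sketched in Python as follows. The sketch follows the description in the text (validation accuracy with a patience of six); it does not reproduce the internals of the MATLAB training routine, and the function names are hypothetical.

```python
import random


def split_validation(samples, fraction=0.1, seed=0):
    """Randomly move a fraction of the samples into a validation set."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_val = round(fraction * len(samples))
    val = [samples[i] for i in idx[:n_val]]
    train = [samples[i] for i in idx[n_val:]]
    return train, val


def early_stopping_pass(val_accuracies, patience=6):
    """Index of the validation pass at which training stops, or None.

    Training stops once the validation accuracy has failed to exceed its
    best value for `patience` consecutive validation passes.
    """
    best = float("-inf")
    stale = 0
    for i, acc in enumerate(val_accuracies):
        if acc > best:
            best = acc
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None
```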

Online-training

In online-training, only the training symbols which were used to train the DFE-PLL were used. Training was performed once for each received FRSS waveform, utilizing all the known training symbols. The Adam optimizer was utilized once again with the default learning rate of 0.001 and with mini-batches of size 31, once again related to the number of symbols in one FRSS packet. Training was always performed for fifteen epochs without any early stopping or other method to prevent over-fitting. The intention was rather to over-fit on the particular set of training symbols. The standard mean square error loss function was used, as the correct equalizer symbol output was known.

3.5.1 Deep Neural Network

In Section 2.5.2, a common structure for DNNs in UWA communication was highlighted. The topology followed a constant halving of the number of neurons. The topology suggested in Table 3.7 almost follows the concept of halving the number of neurons in each layer. The intention was to utilize a similar topology; the network design is something which can be studied in more detail. The activation function was chosen by simply trying out different layers, and the leaky ReLU provided the best training results.

Table 3.7: DNN layer structure.

Layer                    Dimension
Image input layer        [90 1 6]
Fully connected layer    220
Leaky ReLU layer         -
Fully connected layer    100

Softmax layer            -
Classification layer     -

MATLAB’s Deep Learning Toolbox supported two kinds of input layers: image input layers and sequence input layers. Even though the input is not an image, its dimensions were adjusted in MATLAB to adhere to the syntax.
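The building blocks of the DNN, a fully connected layer followed by a leaky ReLU, can be written out in a few lines of Python. This is a generic sketch rather than the toolbox implementation; the negative-side scale of 0.01 is assumed, since the thesis does not state the value used.

```python
def flatten_input(streams):
    """Concatenate the real-valued input streams into one feature vector."""
    return [v for stream in streams for v in stream]


def fully_connected(inputs, weights, biases):
    """One dense layer: weights holds one row of coefficients per output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def leaky_relu(values, scale=0.01):
    """Pass positive values through; multiply negative values by scale."""
    return [v if v >= 0 else scale * v for v in values]


def dense_parameters(n_inputs, n_outputs):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# The [90 1 6] input flattens to 540 features, so the first fully
# connected layer (220 neurons) alone holds 540 * 220 + 220 parameters.
print(dense_parameters(90 * 6, 220))  # 119020
```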


3.5.2 Long Short-Term Memory Network

In MATLAB’s 2019b version, the only available RNN structure was the LSTM structure. It was therefore chosen as the recurrent layer. The bidirectional LSTM structure was chosen, as early tests showed that the bidirectional structure yielded much higher accuracy on the same data set.

Table 3.8: LSTM layer structure.

Layer                    Dimension
Sequence input layer     6
Bi-LSTM layer            100
Fully connected layer    4
Softmax layer            -
Classification layer     -
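The size of the network in Table 3.8 can be estimated with the standard LSTM parameter count. The Python sketch below assumes that the dimension 100 is the number of hidden units per direction and that the two directions are concatenated into 200 features before the fully connected layer; neither detail is stated in the table.

```python
def lstm_parameters(input_size, hidden_units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (hidden_units * (input_size + hidden_units) + hidden_units)


def bilstm_parameters(input_size, hidden_units):
    """A bidirectional LSTM holds one forward and one backward LSTM."""
    return 2 * lstm_parameters(input_size, hidden_units)


def dense_parameters(n_inputs, n_outputs):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


recurrent = bilstm_parameters(input_size=6, hidden_units=100)
output = dense_parameters(2 * 100, 4)
print(recurrent, output, recurrent + output)  # 85600 804 86404
```

Under these assumptions almost all learnable parameters sit in the recurrent weights, so the hidden size is the main lever on model size.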

3.6 Artificial Neural Network Experiments

This section outlines the experiments which were performed to answer the problem formulation, see Section 1.3. The simulations outlined in Section 3.4.2 gave a clear understanding of which channels were suitable and unsuitable for the baseline receiver. This information was utilized to build a test strategy for the ANNs. Both the LSTM and DNN networks were to be tested in:

• Channels with low ber,

• Channels with high ber.

Based on the network structures and the result, the second problem from Section 1.3 would be answered, regarding how much and why performance improvements could be offered. To simplify the analysis, one network structure was to be decided upon, based on the performance in the high and low BER channels. If the performance was even, parameters such as training time and the number of weights could be considered to make a decision. The selected network structure was then to be studied further to answer the third problem from Section 1.3 by investigating deployment strategies, i.e., how one would desire to deploy the network. The aspects which would be considered were:

• Study online learning (by using training symbols),

• Train with multiple channels (data-driven approach).

Most studies mentioned in Section 2.5.2 trained ANNs in one specific channel or in restricted scenarios. It was therefore interesting to study if performance could be generalized.


3.6.1 Training Data Generation

Training data was used for offline-training and generated similarly to the channel simulations, i.e., by sending sets of FRSS signals over the simulated channel, according to Algorithm 2. However, instead of saving the BER, the known equalizer symbols were stored together with the corresponding received baseband signal, which had undergone pre-processing. This data was generated for a specific channel in a selected range of SNRs and stored as .mat files. Besides the mentioned properties such as depth, bottom type, and noise PSD, the parameters presented in Table 3.9 could be selected when configuring the training data.

Table 3.9: Training options.

Property                Option
SNR in dB               Single number or vector
Multiple simulations    True or false

As mentioned, the Bellhop simulation was deterministic, thus a specific configuration always yielded the same impulse response. The time variation of the impulse response was generated by the simulated small-scale variations and Doppler effects. As these were produced by a statistical model with randomized output, each simulation yielded a unique outcome. When generating the training data it was therefore possible to use a single simulation or multiple simulations, i.e., one or multiple unique outcomes from the channel model, as presented in Table 3.9. This option is important to understand, because setting it to True resulted in a much more diverse data set. All training data was generated in the heuristically selected SNR range of −5 dB to 5 dB.

3.7 Miscellaneous Studies

As mentioned in Section 1.3, a study of the impact of noise color was made. Offline and online-trained networks (trained in WGN) were tested in CGN to study how the assumptions about the noise color affected performance.

4 Results

The results from the ANN experiments are presented in this chapter. The ANNs were tested in different channels based on their performance in simulations in Section 4.1. Results from attempting to generalize performance are presented in Section 4.2. The mentioned simulations give an understanding of the situations in which the DFE-PLL equalization performs well. However, since these simulations are not results that answer the problems stated in Section 1.3, the results that correspond to those simulations can be found in Appendix C. Plots of the channel impulse responses for the channels which were utilized are found in Appendix D. The plotted SNR ranges vary because the BER is very small for some channels. In those channels, estimating the true BER requires too many simulations.

4.1 Artificial Neural Network Experiments

The ANNs were deployed in both low BER and high BER channels. The low BER channel is characterized by its performance being similar to the additive noise channel, and by the BER dropping below 10^−1 in the studied SNR range. An example of the low BER channel is the deep obstacle profile with SSP from 2019-01-07 with a sandy bottom, see Figure C.3. Most studied channels behave as a low BER channel. A high BER channel is a channel where the BER is larger than 10^−1 in the considered SNR range. A notable example is the channel with a deep slope profile with SSP from 2019-03-05 with a sandy bottom, see Figure C.3.
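The distinction between low and high BER channels can be stated as a simple rule on the baseline BER curve. The Python sketch below encodes one reading of the definition above; the function name and the dictionary representation of the curve are chosen for illustration.

```python
def classify_channel(ber_by_snr, threshold=1e-1):
    """Label a channel from its baseline BER curve.

    ber_by_snr maps SNR in dB to the measured BER. A channel counts as
    'low' if the BER drops below the threshold anywhere in the studied
    SNR range, and as 'high' if it stays above it throughout.
    """
    return "low" if min(ber_by_snr.values()) < threshold else "high"


print(classify_channel({-5: 0.45, -3: 0.2, -2: 0.01}))  # low
print(classify_channel({-5: 0.5, 0: 0.4, 5: 0.3}))      # high
```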

The figure legends are explained below.

• ’eq’ - corresponds to the baseline DFE-PLL equalizer

• ’lstmClassSoft’ - corresponds to the offline-trained LSTM equalizer

• ’dnnClassSoft’ - corresponds to the offline-trained DNN equalizer

• ’lstmClassOnline’ - corresponds to the online-trained LSTM equalizer

Note that the BER sometimes seems to decrease at low SNR, such as in Figure 4.1 and Figure 4.6, where the BER is close to 0.5 at −5 dB and all receivers approach a similar BER. The combination of a highly time-varying channel and a high noise level is a common factor. Why this happens requires further investigation. A BER of 0.5 corresponds to random guessing, i.e., the network outputs random guesses.
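That a BER of 0.5 is what random guessing produces can be checked with a small Python experiment, which assumes equiprobable and independent information bits.

```python
import random


def random_guess_ber(n_bits, seed=0):
    """BER obtained when every bit is guessed independently at random."""
    rng = random.Random(seed)
    sent = [rng.getrandbits(1) for _ in range(n_bits)]
    guessed = [rng.getrandbits(1) for _ in range(n_bits)]
    return sum(s != g for s, g in zip(sent, guessed)) / n_bits
```

For a large number of bits the returned value concentrates around 0.5, so a receiver whose curve stays at that level conveys no information about the transmitted bits.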

4.1.1 High Bit Error Rate Channels

The most interesting case for ANNs is the deployment in tougher channel conditions. The deep slope profile with SSP from 2019-03-05 and a sandy bottom was chosen. A plot of a simulated channel impulse response is found in Figure D.2. With multiple simulation instances (see Table 3.9), neither the DNN nor the LSTM converged, reaching an accuracy below 50%. By using a single simulation instance, the training of the networks could converge. After the training, the trained networks were used to perform equalization of a transmission in the specified channel using the same small-scale simulation, and the resulting BER was calculated. The result is presented in Figure 4.1.

Figure 4.1: Performance comparison of the LSTM, DNN and DFE-PLL in high BER channel. (Plot of BER versus SNR [dB]; curves: eq, lstmClassSoft, dnnClassSoft.)


By training the LSTM and DNN networks on the slope profile with a sandy bottom and the SSP from 2019-03-05, the LSTM could outperform the baseline receiver in this specific channel, but the DNN does not seem to perform the task satisfactorily, despite being trained to 80% accuracy.

4.1.2 Low Bit Error Rate Channels

In low BER channels, it was quickly noticed that both the DNN and LSTM could be trained with relatively high accuracy. Convergence could be obtained with multiple simulation instances but varied a lot among low BER channels. An example of a low BER channel is the obstacle profile with the SSP from 2019-01-07, in the deep scenario with a sandy bottom type. A plot of the simulated channel can be found in Figure D.3. The DNN and LSTM were trained on a single simulation instance of the channel and the networks were then utilized for equalization; the results are shown below in Figure 4.2.

Figure 4.2: Performance comparison of the LSTM, DNN and DFE-PLL in low BER channel. (Plot of BER versus SNR [dB]; curves: eq, lstmClassSoft, dnnClassSoft.)

Even in this channel, the LSTM network proved to perform better, with a lower BER. It can be seen that the LSTM BER decreases faster than that of the DNN, as the BER at −2 dB is 0.1. Based on the low BER and high BER channel simulation results, it was concluded that the LSTM network offered better performance than the DNN. Therefore it was decided that the LSTM network should be the ANN to be studied more closely.

4.2 Deployment Strategies

To study if a low BER can be obtained with limited or no previous knowledge, online-training and a data-driven approach were studied. It was quickly noticed that training in multiple channels was unsuitable with the suggested experiment setup, and each added channel came with a loss of accuracy. Thus, the focus in this section is on online-training. First, a set of comparable low or moderately time-varying channels was selected to study how well online-training performs. Lastly, a highly time-varying channel was selected for the same purpose. In both scenarios an offline-trained network and the baseline DFE-PLL were used as benchmarks to study the online-training performance.

4.2.1 Low to Moderate Time-Variance

An interesting set of comparable channels was selected, namely the shallow obstacle profile with clay bottom and with three different SSPs (2019-03-16, 2019-06-15, 2019-08-21), all considered low BER channels, with the SSP from 2019-08-21 having a somewhat higher BER in Figure C.2. The purpose of studying these three particular channels was related to the channel impulse response, which had three very distinct behaviors.

• 2019-03-16, two receptions, one slow time-varying and one fast time-varying.

• 2019-06-15, one reception, slow time-varying.

• 2019-08-21, one reception, fast time-varying.

For plots of the channel impulse responses, see Appendix D. It was quickly noticed that the performance of the ML algorithms was related to how fast the time variations occurred. The 2019-08-21 SSP was supposed to represent a moderately time-varying channel, 2019-06-15 a slow time-varying channel and 2019-03-16 something in between. Offline-training was performed with data generated with multiple simulation instances.

The offline-training generated the following classification accuracy:

• 75.36% for 2019-03-16,

• 86.83% for 2019-06-15,

• 53.53% for 2019-08-21.

The trained networks were compared with online-training and the baseline DFE-PLL. The results are shown in the figures below.


Figure 4.3: BER as a function of SNR, using three different equalizers. Shallow profile, clay bottom SSP 2019-03-16. (Curves: eq, lstmClassSoft, lstmClassOnline.)

[Figure: BER versus SNR [dB]; curves: eq, lstmClassSoft, lstmClassOnline.]


[Figure: BER versus SNR [dB]; curves: eq, lstmClassSoft, lstmClassOnline.]

In all three scenarios online learning provided better BER performance than offline-training for all studied SNRs. It performs well, but not better than the baseline receiver. It is also noted that in the more time-varying channels the offline-trained network did not perform well at all, with a BER close to 0.5.

4.2.2 High Time-Variance

To study a highly time-varying channel with online-training, the deep slope profile with sand bottom and with the 2019-01-07 SSP was chosen. The channel was considered a high BER channel, see Figure D.1. It behaves similarly to the previously studied high BER channel, see Figure D.2.


Figure 4.6: BER as a function of SNR, using two different equalizers. Shallow profile, clay bottom SSP 2019-01-07. (Curves: eq, lstmClassOnline.)

It is noticed that in this highly time-varying channel the online-training plateaus unsatisfactorily at a constant BER of 0.5, regardless of the SNR. As mentioned, this corresponds to random guessing, meaning that the algorithm is not working at all.

4.3 Miscellaneous Studies

Both offline and online-trained LSTMs were compared in the presence of WGN and CGN. The results are presented in the figure below, where the offline-trained network and the online-trained network are compared. It can be noted that the offline-trained network (which was trained in WGN) performed worse in CGN compared to WGN. The assumption of CGN seems to affect performance positively, in terms of a lower BER, for the online-trained LSTM.


Figure 4.7: BER as a function of SNR, for online and offline-trained LSTM in CGN and WGN. Shallow obstacle profile, clay bottom SSP 2019-03-16. (Curves: lstmClassSoft WGN, lstmClassOnline WGN, lstmClassSoft CGN, lstmClassOnline CGN.)

5 Discussion

This chapter discusses the results and the method used in the study. First, the results are discussed in relation to the selected method and error sources. Design choices are also discussed. The results are also discussed in relation to the results of related work. The chapter ends with a discussion on the utilized references and ethical implications of the work.

5.1 The Results

The presented results highlight the weaknesses and strengths of ANN equalization.

5.1.1 Artificial Neural Network Structure

It was found that the LSTM network was more suitable for the desired task than the DNN in most aspects. The DNN did, however, require less time per training epoch and less run time, which is advantageous. The LSTM network was chosen as the go-to structure, as performance was the main interest in this study. It was noted that in channels with moderate or high time-variance, multiple simulation instances could not be learned using this particular experimental setup. This implies that the studied ANNs do not learn to perform the general task of equalization; in fact, they only learn to decode the information from the wave in a specific channel. It is difficult to understand what is going on "under the hood" of the ANNs, and to understand why the LSTM performs better than the DNN.

In low BER channels, it was noticed that neither of the considered structures could outperform the existing receiver. This was noticed often, because many networks were trained in simpler environments, where the BER at a specific SNR is mainly determined by the noise level rather than by the difficulties introduced by the time-varying multipath. The baseline DFE-PLL in the FRSS is mathematically motivated for low SNR with Gaussian statistics, so the ANNs' inability to perform better could be considered understandable.

5.1.2 Deployment Strategies

Two approaches to deploying an ANN were tested: an online-training approach aided by training symbols, and a purely data-driven approach.

Data-driven Approach

As mentioned previously, training on multiple simulation instances of high BER channels resulted in no convergence. By training in multiple low BER channels, convergence proved to be possible, but for each added channel there was an accuracy loss. The overall results hint that adding channels, and especially adding complex channels, reduced accuracy. Based on the theory presented in Section 2.4, there are some possible explanations for the non-converging results. Either the amount of data is not large enough, or the network structure is not sufficient to solve the problem. The convergence problem can probably be attributed to both causes, but some aspects should be considered. First, increasing the size of the data sets increases training time, and since the data is generated in software, the generation of large data sets becomes a bottleneck. It was noticed that high BER channels required a large amount of data to enable convergence. Sometimes, this resulted in GPU memory overflows and long load times. For a data-driven approach to be possible, a lot of data needs to be generated and processed, which would be very time-consuming.

However, the main issue, which was quickly identified, was that the ANN structure was insufficient for a data-driven approach. The ANN structure could be made deeper and more complex; however, this makes the online-training approaches more time-consuming. Time for testing was also limited, and because of the long training times it was not possible to try out multiple structures in an organized way, so the structures needed to be decided upon to limit the scope of the project. Ways forward regarding network structure are mentioned in Section 6.2. A data-driven approach for the suggested network structures was not considered viable.

Online-training

The concept of utilizing the known training symbols to train the ANN was considered desirable and was brought up in a lot of recent work. It generally performs better than the offline-trained networks at low SNR. The most interesting result is that it performs a lot better than the offline-trained network in Figure 4.5. The results imply that for more time-varying channels, online-training can perform better than offline-training; see Appendix D, which visualizes the difference in time-variance. However, online-training never beats the baseline performance. In the toughest channels, online-training could not perform well at all, see Figure 4.6. Here, only the very specifically trained network could converge and outperform the baseline. Online-training seems to behave very similarly to the DFE-PLL, but never better, and it does not seem to give the benefits highlighted in [37], although the authors used a DNN instead of an LSTM.

The results imply that online-training can be useful for tracking moderately time-varying channels, but for highly time-varying channels online-training is unable to track the changes. One possible reason is that the channel behavior changes faster than the spacing of the training symbols. As shown in Figure 2.2, a larger portion of the training symbols is placed in the earlier time slots, after which training symbols are injected only sparsely. Another possible reason is that this spacing of the training symbols is sub-optimal and not suited for this kind of online-training. The hyperparameters, such as learning rate, mini-batch size, and the number of epochs, can of course be studied further for better performance.
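The suspected mechanism, a channel that changes faster than the training symbols arrive, can be illustrated with a deliberately simplified Python sketch. It does not reproduce the thesis setup: the single rotating channel tap, the perfect estimate at every training symbol, and the names `hold_tracking_error`, `doppler_norm`, and `pilot_spacing` are all assumptions made for illustration. The estimate is frozen between training symbols, and the mean squared error against the true channel is measured.

```python
import numpy as np

def hold_tracking_error(doppler_norm, pilot_spacing, n_symbols=2000):
    """Mean squared error of a channel estimate refreshed only at pilots.

    doppler_norm  : channel rotation in cycles per symbol (hypothetical model)
    pilot_spacing : number of symbols between consecutive training symbols
    """
    n = np.arange(n_symbols)
    h_true = np.exp(2j * np.pi * doppler_norm * n)    # single time-varying tap
    last_pilot = (n // pilot_spacing) * pilot_spacing  # index of latest pilot
    h_held = h_true[last_pilot]                        # estimate frozen between pilots
    return float(np.mean(np.abs(h_true - h_held) ** 2))

spacing = 50
slow = hold_tracking_error(doppler_norm=1e-4, pilot_spacing=spacing)
fast = hold_tracking_error(doppler_norm=1e-2, pilot_spacing=spacing)
print(f"slow channel MSE: {slow:.4f}, fast channel MSE: {fast:.4f}")
```

With the same spacing, the slowly rotating tap is tracked almost perfectly, while the fast tap rotates by roughly half a cycle between updates and the held estimate becomes nearly useless. In this toy model, denser training symbols or a slower channel are the only ways to reduce the error.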

5.1.3 Miscellaneous Studies

It was noticed that CGN yielded the same BER for all SNRs for the online-trained LSTM. However, when using a network trained offline in WGN, the performance worsened when tested in CGN. If the network were trained in CGN, the performance in CGN might be identical. This highlights that, whenever a pre-trained network is to be deployed, training in a realistic noise profile affects performance. It can also be concluded from the online-trained network that the conventional WGN assumption is valid and behaves similarly to CGN, which is supposed to be more realistic.

5.2 Error Sources

In this section, the impact of the method and the assumptions is discussed in the context of how they affect the confidence in the results. It is also interesting to discuss how close the results are to reality.

5.2.1 Channel Model

A large part of the work is based on the choice of the channel model, the corresponding parameters, and the designed time-varying filter. The model is a time-discrete approximation of the time-continuous real world. This approximation can become a concern in the highly time-varying channels (high BER channels), where values between samples can differ greatly. These simulations, which are some of the most interesting in this study, might therefore be less realistic. Increasing the frequency resolution and time resolution in Table 3.3 could yield a finer simulation (closer to reality), at the cost of simulation time.


Parameter Selection

Most selected parameters in the model were either set to zero or motivated by knowledge of underwater conditions, with help from Saab Dynamics employees. The parameters related to scattering, found in Table 3.4, were model specific, and the simulation defaults were used. The articles utilizing the model, such as [36, 37], did not specify whether these parameters were altered. It is difficult to say whether the introduced scattering was too large or too small. When building the channel simulations, it was noticed that increasing the 3 dB width of the PSD resulted in increased time-variance in the channel and higher BER. The parameters related to scattering might therefore have been unrealistic, and considering their impact on the results, an unrealistic choice might yield misleading results.

Time-variant Filter

The time-variant filter described in Algorithm 1 was developed for the study. It was difficult to find existing implementations or information on how to implement time-variant filtering. The developed algorithm, which is a discrete approximation, is prone to errors. If time had been available, this algorithm could have been improved or replaced to obtain more realistic results, since the discrete approximation probably yields some uneven boundaries (between samples) and odd behaviors.
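For orientation, a generic textbook form of discrete time-variant FIR filtering can be sketched in Python as follows. This is not Algorithm 1 from the thesis; the function name `time_variant_filter`, the index convention h[n, k] (row n holds the taps valid at output time n), and the test signal are assumptions made for illustration.

```python
import numpy as np

def time_variant_filter(x, h):
    """Apply a time-varying FIR channel: y[n] = sum_k h[n, k] * x[n - k].

    x : input samples, shape (N,)
    h : impulse response per output sample, shape (N, K); row n holds the
        taps (delays 0..K-1) that are valid at time n
    """
    x = np.asarray(x)
    h = np.asarray(h)
    n_samples, n_taps = h.shape
    if x.shape[0] != n_samples:
        raise ValueError("h needs one row of taps per input sample")
    y = np.zeros(n_samples, dtype=np.result_type(x, h))
    for n in range(n_samples):
        for k in range(min(n_taps, n + 1)):   # skip samples before the start
            y[n] += h[n, k] * x[n - k]
    return y

# Sanity check: constant taps must reproduce ordinary convolution.
x = np.arange(1.0, 7.0)
taps = np.array([1.0, 0.5, 0.25])
h_const = np.tile(taps, (x.size, 1))
y = time_variant_filter(x, h_const)
```

Comparing the output for constant taps against `np.convolve` is one inexpensive way to check such a filter before using it with a time-varying impulse response, where the taps simply change from row to row.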

5.2.2 Machine Learning Software

The MATLAB Deep Learning Toolbox allowed for simple implementations of ANNs, and it abstracted some of the processes compared to other popular frameworks. The image input layer in the DNN performed some kind of normalization of the data before training; how this affects the network when it is deployed is unknown. It would be preferable to utilize the same kind of input layer in both networks; however, utilizing a classification output layer together with the sequence input layer was not possible with the DNN structure, due to some issues with the toolbox.

5.3 Relation to Other Work

The approach chosen in [37] was considered very promising, as the problem formulation was very similar to that of this thesis. In this study, these tests were somewhat reproducible, i.e., for some time-varying channels online-training was possible. It was, however, not beneficial, while in [37] it was concluded that online-training was beneficial. The results were not reproducible for high BER channels (see Figure D.1 and Figure D.2), where only the offline-training could outperform the DFE-PLL. A pure online-training approach for high BER channels was not possible, as training did not converge. Looking deeper into [37], one can notice that the channel there behaves more similarly to Figures D.4, D.5, and D.6.

Most articles utilized quite simple receiver and transmitter structures, while this study evaluates a more intricate receiver and transmitter structure with a lot of steps. For example, related work often looks at uncoded BER, which is not the case in this work. A lot of these published articles highlight the advantage of ML by outperforming these simple transmitter and receiver structures. What can be noticed from the results here is that the DFE-PLL performs better than the DNN and LSTM in most scenarios. Some articles base their arguments on performance in one or a select few channels [10, 11, 36], so performance in general conditions or for several channels is not necessarily proven. In this study, some select channels were identified where the LSTM network could solve the desired problem better, but most simulations showed that the baseline DFE-PLL provided satisfactory performance. Comparing the results from this thesis with related work can therefore be difficult, due to these two factors:

• The different levels of complexity in the baseline physical layer,

• The studied simulated channels.

The results in this thesis imply that the baseline receiver performs better in most scenarios, in contrast to most other studies, which highlight the gains of using ANNs.

5.4 Sources

This thesis contains plenty of references, and most of them are research articles. Most articles were obtained from the IEEE Xplore database. A lot of the older articles referenced in this work were also referenced in some of the more contemporary work. One explanation could be that a handful of articles were written by the same authors. Another possible explanation is that the research field is quite small, with fundamental research driven by a handful of individuals. A lot of the more recent work cited in the thesis also cites articles cited in this work, which legitimizes the choice of articles related to UWA communication. Articles which did not relate to UWA communication were related to radio communication and ML. The articles could be considered trustworthy and were written in well-established research fields.

Other references cited in this thesis were some classical textbooks, which were used mostly to motivate fundamental theory of communications and underwater sound. A few websites were used as references. The websites were either from well-established institutions or hosted open source code or software libraries. The websites related to source code and software libraries were often referenced in related work.

5.5 The Thesis in a Larger Perspective

The thesis attempts to improve equalization performance in time-varying UWA communication using ML. The research is on a fundamental level, so a discussion of the implications should be based on possible applications. The applications of UWA communication are many, and they can be civilian or military.

The thesis is written in cooperation with Saab Dynamics, who can benefit from the knowledge in this thesis. An introduction to Saab Dynamics is presented early on in the thesis, in Section 1.5. An example of a product which could utilize wireless UWA communication is a torpedo. What appear to be harmless algorithms presented in the thesis could be used in warfare. Technology and warfare is a typical ethical dilemma which engineers have to consider. What one thinks about warfare, weapon manufacturing, and related issues is up to each individual. Considering the torpedo example, one can discuss whether an improved wireless connection to the torpedo increases or decreases its potential harm. Maybe collateral damage can be decreased with an improved wireless connection?

This discussion can be expanded upon from the perspective of Saab Dynamics. However, considering that the research results in this thesis are available to the public, the presented results can be utilized by any stakeholder and not by Saab Dynamics exclusively. Many applications described in Section 1.5 are relatively harmless and can be utilized for the benefit of civilian society. For example, underwater wireless technology can enable underwater vessels to operate in hazardous environments, which would otherwise require a diver.

6 Conclusions

This chapter concludes the study. First, the results are tied together with the problem formulation in Section 1.3. Finally, suggestions for how the work can be carried forward are discussed.

6.1 Evaluation

Each of the problems highlighted in Section 1.3 is answered in this section.

Study: in which environments DNN or RNN based channel estimation and channel equalization can improve performance compared to DFE-PLL.

It was shown that RNN-based (more specifically, LSTM-based) channel estimation and equalization can improve performance in highly time-varying channels. The RNN thus proved to be an interesting solution in conditions which are very tough for the DFE-PLL. In simpler, less time-varying conditions, which represented a majority of the studied scenarios, neither RNNs nor DNNs offered improved performance. The DNN did not offer improved performance in any studied conditions.

Study: why it can offer improved performance and how much the performance can be improved.

The performance improvement was, as mentioned, found in highly time-varying channels, where there were only highly time-varying receptions; probably, the only received components were scattered reflections. The very specifically trained LSTM could offer improved performance compared to the baseline. A BER gain of 0.2 could be obtained at high SNRs, while for low SNRs the performance was identical. The performance could probably be improved because the specifically trained network could learn the specific, fast-fluctuating channel better than the DFE-PLL. The failed online-training experiments highlight that the injected training symbols might be too few when the channel behaves in this way. The improved performance is thus probably due to the network having learned a specific channel very well, and the LSTM's property of learning complex time patterns is useful in this scenario.

Study: possibilities to enable general performance, online-training or a data-driven approach.

It was shown that online-training was feasible in channels where the existing DFE-PLL performed well; however, highly time-varying channels could not utilize pure online-training. A data-driven approach was quickly considered non-viable for the suggested network structures. Deeper structures might leverage a data-driven approach better.

The results are summed up below.

• In very tough conditions, ML can improve performance, by training in the specific conditions.

• In simple conditions, where performance is noise limited, ML could not improve performance.

• Online-training proved to be a viable deployment strategy; however, in the most difficult channels it seems unable to solve the problem. In situations where it could operate, the baseline DFE-PLL outperforms the LSTM.

6.2 Future Work

There are a lot of possible ways to move forward with work directly related to the study.

Baseline Physical Layer

A large part of this work was based on the FRSS physical layer. The FRSS physical layer is sophisticated and built for reliable UWA communication. Future work could try to deviate from the FRSS physical layer and use more standard models. This might yield results that are more similar, or at least comparable, to related research, which would be very interesting.

Artificial Neural Networks

Related to the ANNs, there are a lot of hyperparameters that can be studied. What happens if the number of neurons in the layers is increased or decreased? Can the training options be optimized further? Maybe some of the presented results could be improved if these aspects are studied more closely. Another interesting angle would be to try more sophisticated ANN structures. Combinations of recurrent structures, feedforward networks, or convolutional ANNs would be interesting to try out. This thesis has not looked into any pre-processing of the data. Could a principal component analysis be performed to reduce dimensions? Can the data fed to the ANNs be pre-processed in another way to aid the ANN training? With enough time and craftsmanship, ANN results could be improved.
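As a starting point for the principal component analysis idea, the following Python sketch reduces the dimension of a feature matrix with a singular value decomposition. It was not evaluated in this study; the function name `pca_reduce` and the synthetic data standing in for received-signal features are assumptions, and whether such a reduction helps ANN training on real receptions remains an open question.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project rows of `features` onto their first principal components."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ components.T, components, explained[:n_components]

# Synthetic stand-in for received-signal features:
# 8 dimensions, of which 2 latent directions carry almost all variance.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 8))
data = latent @ mixing + 0.01 * rng.normal(size=(500, 8))

reduced, components, explained = pca_reduce(data, n_components=2)
print(reduced.shape, explained.sum())
```

The returned `explained` fractions indicate how much variance the kept components retain, which gives a simple criterion for choosing the number of components before feeding the reduced data to a network.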

A combination of offline- and online-training was highlighted in Section 2.5. Initial studies of this approach were done, and those early results highlighted some gains, but this kind of strategy requires previous knowledge of the deployment scenario, as an untrained network is preferable to deploy in online-training rather than a network that has been trained on an irrelevant scenario.

Sea trials

To verify how close the channel simulations were to reality, underwater tests should be carried out. A replica of the underwater scenario can be built to compare what the received signals look like and how this affects the BER. There were no underwater test runs available for this study, so that kind of verification was not possible. It is of course always desirable to test performance in true environments, and it seems very common, at least based on the articles referenced in this project, that articles in UWA communication perform an actual sea trial to verify their algorithms. This might be due to wireless radio communications having more established simulation models, whereas UWA communications do not have the same established standard models, as highlighted in Section 2.2. An actual underwater deployment would also be preferable for verifying performance because the simulation yields discrete-time approximations of the time-continuous reality. This would identify how solid the channel model is. It would also be interesting to try out the ANNs in an underwater test and see how the results hold up against the simulations.

6.3 Final Words

This study highlights how ANNs can solve complex equalization issues and in some environments outperform the DFE-PLL structure. However, the main results obtained from this study are that the baseline physical layer provides stable performance in most environments and that the gains from using ANNs are limited or none. ANN performance can be generalized, but performance is somewhat unreliable.

The results suggest that researchers who deploy ML should consider more complex physical layer structures as benchmarks. Maybe mathematically derived methods, such as the DFE-PLL structure, and simple techniques, such as channel coding, are more reliable and simpler than ANNs. Factors such as computational complexity and power consumption when running online-training have not even been mentioned. These factors cannot be disregarded when deploying any communication network, especially underwater, where available computing power can be very limited. ANNs have become very popular in recent years. This study, however, hints that existing and proven methods have the upper hand on ANNs for channel equalization in UWA communications.

Appendix

A Bathymetry Profiles

A shallow and a deeper scenario are simulated. This is important as both conditions are present in Swedish waters, and it also gives some insight into which scenario offers more difficulties.

A.1 Shallow Scenario

The shallow scenario considers a maximum depth of 18 meters and a channel distance of 1 km.

Flat Bottom

Communication in the presence of a flat bottom is a somewhat common scenario. In this setup the receiver and transmitter are in line of sight. We can expect reflections from the surface and bottom. Figure A.1 illustrates this scenario.

[Figure A.1 plot: bathymetry profile, flat; distance (m) from 0 to 1000 versus depth (m); labels: Sea, Tx, Rx.]

Figure A.1: Illustration of the shallow flat bottom profile.

Slope

Communication in the presence of a slope is common in coastal scenarios where the depth increases with the distance. In this scenario the transmitter and receiver are not in line of sight. Hence, the communication relies on reflections in the medium for the acoustic waves to be picked up at the receiver. Figure A.2 illustrates the scenario.

[Figure A.2 plot: bathymetry profile, slope; distance (m) from 0 to 1000 versus depth (m); labels: Sea, Tx, Rx.]

Figure A.2: Illustration of the shallow slope bottom profile.

Obstacle

Communication in the presence of an obstacle is a generally common scenario. In this scenario the transmitter and receiver are not in line of sight. Hence, the communication relies on reflections in the medium for the acoustic waves to be picked up at the receiver. Figure A.3 illustrates the scenario. The obstacle presented in Figure A.3 is not necessarily realistic.


[Figure A.3 plot: bathymetry profile, obstacle; distance (m) from 0 to 1000 versus depth (m); labels: Sea, Tx, Rx.]

Figure A.3: Illustration of the shallow obstacle bottom profile.

A.2 Deep Scenario

The deep scenario considers a maximum depth of 72 meters and a channel distance of 1 km. The same bathymetry profiles as in the shallow scenario are considered. Figures A.4, A.5, and A.6 illustrate the deep bathymetry profiles. The reasoning behind the choice of profiles is the same as for the shallow scenario.

[Figure A.4 plot: bathymetry profile, flat; distance (m) from 0 to 1000 versus depth (m); labels: Sea, Tx, Rx.]

Figure A.4: Illustration of the deep flat bottom profile.

[Figure A.5 plot: bathymetry profile, slope; distance (m) from 0 to 1000 versus depth (m); labels: Sea, Tx, Rx.]

Figure A.5: Illustration of the deep slope bottom profile.


[Figure A.6 plot: bathymetry profile, obstacle; distance (m) from 0 to 1000 versus depth (m); labels: Sea, Tx, Rx.]

Figure A.6: Illustration of the deep obstacle bottom profile.

B Sound Speed Profiles

All the utilized sound speed profiles are presented and discussed in this chapter. Because there are two different scenarios, namely deep and shallow, two different sets of SSPs are created. Note that the sound speed is plotted on the x-axis, contrary to mathematical intuition (as sound speed is a function of depth). This is, however, the standard approach in the field.

As mentioned in Section 3.3.2, the figures are manually created replicas of the Swedish Meteorological and Hydrological Institute's plots. The number of data points in each SSP can therefore vary, since the data points used by the Swedish Meteorological and Hydrological Institute were not available.

B.1 Shallow Scenario

The first profile from 2019-03-16, see Figure B.1, is used to capture the behavior of early spring/late winter. The sound speed is almost constant in the medium, probably due to a homogeneous temperature in the shallow water.

Figure B.1: Sound speed as a function of depth, data from 2019-03-16 04:45 REF M1V1.

The second profile from 2019-06-05, see Figure B.2, is used to capture the behavior of early summer. The sound speed varies a lot and decreases for larger depths. This is probably due to high surface temperatures during the early summer (with a lot of sunlight), when the water below has not been heated up to the same temperature.

Figure B.2: Sound speed as a function of depth, data from 2019-06-15 14:40 REF M1V1.


The third profile from 2019-08-21, see Figure B.3, is used to capture the behavior of late summer. The sound speed only changes in the bottom layers of the water. In late summer the water has been heated over a longer period, resulting in a more even temperature distribution, and only the waters below ten meters are cooler, yielding the following profile.


Figure B.3: Sound speed as a function of depth, data from 2019-08-21 21:12 REF M1V1.


The final profile from 2019-09-14, see Figure B.4, behaves similarly to the profile from late August; however, the maximum sound speed has decreased and the curve behaves a bit differently, as water at lower depths has cooled down.

Figure B.4: Sound speed as a function of depth, data from 2019-09-14 05:18 REF M1V1.


No SSPs from mid-winter or autumn were chosen, as the winter profiles behaved similarly to the 2019-03-16 profile with constant sound speed.

B.2 Deep Scenario

The first profile from 2019-01-07, see Figure B.5, is used to capture the behavior of the sound speed during hard winter.


Figure B.5: Sound speed as a function of depth, data from 2019-01-07 13:50 Släggö.

The second profile from March, see Figure B.6, is used to capture the behavior of early spring. In early spring a lot of fresh water from the rivers flows into the ocean, due to the melting of snow. The fresh water has low salinity compared to ocean water, and a low temperature. This results in a varying sound speed profile.
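The dependence of sound speed on temperature and salinity can be illustrated with a short Python sketch based on the simplified empirical formula commonly attributed to Medwin. The formula is an outside approximation and was not used to create the profiles in this appendix, which were replicated from measured plots; the function name `sound_speed_medwin` and the example water masses are hypothetical.

```python
def sound_speed_medwin(temp_c, salinity_ppt, depth_m):
    """Approximate sound speed in water (m/s) using Medwin's simple formula."""
    t, s, z = temp_c, salinity_ppt, depth_m
    return (1449.2 + 4.6 * t - 0.055 * t**2 + 0.00029 * t**3
            + (1.34 - 0.010 * t) * (s - 35.0) + 0.016 * z)

# Hypothetical water masses, chosen only to show the trend.
cold_fresh = sound_speed_medwin(temp_c=2.0, salinity_ppt=7.0, depth_m=5.0)
cold_salty = sound_speed_medwin(temp_c=2.0, salinity_ppt=20.0, depth_m=5.0)
warm_fresh = sound_speed_medwin(temp_c=18.0, salinity_ppt=7.0, depth_m=5.0)
print(round(cold_fresh, 1), round(cold_salty, 1), round(warm_fresh, 1))
```

Under this approximation, colder or fresher water gives a lower sound speed, which is consistent with layers of cold, low-salinity meltwater producing a sound speed that varies with depth.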


Figure B.6: Sound speed as a function of depth, data from 2019-03-05 07:23 Släggö.


The third profile, from the early morning in early May, see Figure B.7, shows quite a flat sound speed profile. Deep sea temperatures are now more in line with surface temperatures compared to March.


Figure B.7: Sound speed as a function of depth, data from 2019-05-06 07:27 Släggö.


In July one can assume that the surface temperature has risen a lot due to exposure to the sun, which yields a new sound speed profile, see Figure B.8, that is opposite to the behavior in March.


Figure B.8: Sound speed as a function of depth, data from 2019-07-02 08:24 Släggö.


In November, surface temperatures are cooler than the bottom temperature, as the temperatures in deeper water are still warm after the summer. This yields a unique sound speed curve, see Figure B.9.


Figure B.9: Sound speed as a function of depth, data from 2019-11-04 08:45 Släggö.


C Channel Simulations

The simulated channels are presented in this chapter. The simulations are performed according to Algorithm 2, where the baseline receiver is utilized. The legend presents three kinds of bathymetries, namely flat, slope, and obstacle, explained in Appendix A. The dates in the legend are associated with an SSP, specified in Appendix B.

The first simulated channel was the shallow scenario with a sandy bottom type, presented in Figure C.1. The flat profile with SSP 2019-08-21 is associated with a BER equal to 1; this is due to the receiver not picking up any information upon arrival. The obstacle profile with SSP 2019-09-14 yielded constant false alarms in all receptions, so the BER was set to 1.


Figure C.1: Channel simulation, shallow scenario with sandy bottom type.

The second simulated channel was the shallow scenario with a clay bottom type, presented in Figure C.2. Once again, the flat profile with SSP 2019-08-21 has a BER of 1, as the receiver does not pick up any meaningful signal. The obstacle profile with SSP 2019-09-14 yielded constant false alarms in all receptions, so the BER was set to 1.


Figure C.2: Channel simulation, shallow scenario with clay bottom type.

The third simulated channel was the deep scenario with a sandy bottom type, presented in Figure C.3. The obstacle profile with SSP 2019-07-02 yielded constant false alarms in all receptions, so the BER was set to 1.


Figure C.3: Channel simulation, deep scenario with sandy bottom type.

The fourth simulated channel was the deep scenario with a clay bottom type, presented in Figure C.4. The slope profile with SSP 2019-03-05 does not pick up any signal. The obstacle profile with SSP 2019-07-02 yielded constant false alarms in all receptions, so the BER was set to 1.


Figure C.4: Channel simulation, deep scenario with clay bottom type.

The channels which exhibited the constant false alarms were all from the obstacle profile with the fourth SSP. The behavior was very odd, and since it had nothing to do with the DFE-PLL, these results were disregarded.

D Time-varying Channels

The channel impulse responses of the channels used in the results section are presented in this chapter. The intention is to give the reader an understanding of how the channels behave and why some are more or less difficult for the equalizers. Figures D.1 to D.6 plot how the channel impulse response varies over time. They represent the simulated h[i, j], which is utilized in Section 3.4.1. The values on the x-axis correspond to the time delays 0, dτ, ..., Td and the values on the y-axis to the times 0, dt, ..., T.

Figure D.1: Channel impulse response for deep scenario, slope profile, sand bottom SSP 2019-03-05.


Figure D.2: Channel impulse response for deep scenario, slope profile, sand bottom SSP 2019-05-06.

Figure D.3: Channel impulse response for deep scenario, obstacle profile, clay bottom SSP 2019-01-07.


Figure D.4: Channel impulse response for shallow scenario, obstacle profile, clay bottom SSP 2019-03-16.




Bibliography

[1] 3GPP. LTE. URL https://www.3gpp.org/technologies/keywords-acronyms/98-lte.

[2] Lars Ahlin, Jens Zander, and Ben Slimane. Principles of Wireless Commu-nications. Studentlitteratur, 1:2 edition, 2006. ISBN 978-91-44-03080-7.

[3] Lazar Atanackovic, Ruoyu Zhang, Lutz Lampe, and Roee Diamant. Statisti-cal Shipping Noise Characterization and Mitigation for Underwater Acous-tic Communications. OCEANS 2019 - Marseille, pages 1–7, 2019. doi:10.1109/oceanse.2019.8867520.

[4] Qinbo Bai, Jintao Wang, Yue Zhang, and Jian Song. Deep Learning basedChannel Estimation Algorithm over Time Selective Fading Channels. IEEETransactions on Cognitive Communications and Networking, pages 1–1,2019. doi: 10.1109/tccn.2019.2943455.

[5] Mandar A. Chitre, John R. Potter, and Sim Heng Ong. Optimal and near-optimal signal detection in snapping shrimp dominated ambient noise.IEEE Journal of Oceanic Engineering, 31(2):497–503, 2006. ISSN 03649059.doi: 10.1109/JOE.2006.875272.

[6] Klaus Greff, Rupesh K. Srivastava, Jan Koutnik, Bas R. Steunebrink, and Jurgen Schmidhuber. LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2017. ISSN 21622388. doi: 10.1109/TNNLS.2016.2582924.

[7] Jiaqi Gu, Chuanqiang Shan, Xiaohui Chen, Huarui Yin, and Weidong Wang. A Novel Pilot-Aided Channel Estimation Scheme Based on RNN for FDD-LTE systems. 2018 10th International Conference on Wireless Communications and Signal Processing, WCSP 2018, pages 1–5, 2018. doi: 10.1109/WCSP.2018.8555634.

[8] Fredrik Gustafsson, Lennart Ljung, and Mille Millnert. Signal Processing. Studentlitteratur, 1:3 edition, 2010. ISBN 978-91-44-05835.


[9] Zhengyu Hou, Zhong Chen, Jingqiang Wang, Xufeng Zheng, Wen Yan, Yuhang Tian, and Yun Luo. Acoustic impedance properties of seafloor sediments off the coast of Southeastern Hainan, South China Sea. Journal of Asian Earth Sciences, 154(December 2017):1–7, 2018. ISSN 13679120. doi: 10.1016/j.jseaes.2017.12.003. URL https://doi.org/10.1016/j.jseaes.2017.12.003.

[10] Yang Hu, Ling Zhao, and Yue Hu. Joint Channel Equalization and Decoding with One Recurrent Neural Network. pages 1–4, 2020. doi: 10.1109/bmsb47279.2019.8971938.

[11] Rongkun Jiang, Xuetian Wang, Shan Cao, Jiafei Zhao, and Xiaoran Li. Deep Neural Networks for Channel Estimation in Underwater Acoustic OFDM Systems. IEEE Access, 7:23579–23594, 2019. ISSN 21693536. doi: 10.1109/ACCESS.2019.2899990.

[12] Daniel B. Kilfoyle and Arthur B. Baggeroer. State of the art in underwater acoustic telemetry. IEEE Journal of Oceanic Engineering, 25(1):4–27, 2000. ISSN 03649059. doi: 10.1109/48.820733.

[13] Diederik P. Kingma and Jimmy Lei Ba. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pages 1–15, 2015.

[14] Xiangang Li and Xihong Wu. Long short-term memory based convolutional recurrent neural networks for large vocabulary speech recognition. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2015-Janua:3219–3223, 2015. ISSN 19909772.

[15] Shengxing Liu and Chienchung Shen. Impact of sea waves on performance of shallow water acoustic communications. 2016 IEEE/OES China Ocean Acoustics Symposium, COA 2016, pages 1–5, 2016. doi: 10.1109/COA.2016.7535761.

[16] Daniel E. Lucani, Milica Stojanovic, and Muriel Médard. On the relationship between transmission power and capacity of an underwater acoustic communication channel. OCEANS’08 MTS/IEEE Kobe-Techno-Ocean’08 - Voyage toward the Future, OTO’08, 2(2):3–8, 2008. doi: 10.1109/OCEANSKOBE.2008.4531073.

[17] Xiaoli Ma, Hao Ye, and Ye Li. Learning Assisted Estimation for Time-Varying Channels. Proceedings of the International Symposium on Wireless Communication Systems, 2018-Augus, 2018. ISSN 21540225. doi: 10.1109/ISWCS.2018.8491068.

[18] Meisam Naderi, Do Viet Ha, Van Duc Nguyen, and Matthias Patzold. Modelling the Doppler power spectrum of non-stationary underwater acoustic channels based on Doppler measurements. OCEANS 2017 - Aberdeen, 2017-Octob:1–6, 2017. doi: 10.1109/OCEANSE.2017.8084993.


[19] Michael A. Nielsen. Neural Networks and Deep Learning. Determination Press, 2015.

[20] Michael B. Porter. The BELLHOP manual and user’s guide. HLS Research, 2010, pages 1–57, 2011. URL http://oalib.hlsresearch.com/Rays/HLS-2010-1.pdf.

[21] Parastoo Qarabaqi. Statistical modeling of a shallow water acoustic communication channel. Proc. Underwater Acoustic, 38(4):701–717, 2009. URL http://promitheas.iacm.forth.gr/uam2009/lectures/pdf/33-9.pdf.

[22] Parastoo Qarabaqi and Milica Stojanovic. Acoustic Channel Simulator. pages 2–3, 2014. URL http://oalib.hlsresearch.com/Rays/acoustic_channel_simulator_code/.

[23] M. Stojanovic, J. Catipovic, and J. G. Proakis. Adaptive multichannel combining and equalization for underwater acoustic communications. Journal of the Acoustical Society of America, 94(3):1609–1620, 1993. ISSN 0001-4966. doi: 10.1121/1.408135.

[24] Milica Stojanovic. Recent advances in high-speed underwater acoustic communications. IEEE Journal of Oceanic Engineering, 21(2):125–136, 1996. ISSN 03649059. doi: 10.1109/48.486787.

[25] Milica Stojanovic. Underwater Acoustic Communications: Design Considerations on the Physical Layer. 1(2), 2008.

[26] Milica Stojanovic, John G. Proakis, and Josko A. Catipovic. Phase-Coherent Digital Communications for Underwater Acoustic Channels. IEEE Journal of Oceanic Engineering, 19(1):100–111, 1994. ISSN 15581691. doi: 10.1109/48.289455.

[27] Swedish Meteorological and Hydrological Institute. Havsprofilerfysik. URL https://www.smhi.se/vadret/hav-och-kust/havsobservationer/havsprofil_ctd.htm.

[28] The MathWorks, Inc. Deep Learning Toolbox. URL https://se.mathworks.com/products/deep-learning.html.

[29] David Tse and Pramod Viswanath. Fundamentals of Wireless Communication. Cambridge University Press, 2005. URL https://web.stanford.edu/~dntse/wireless_book.html.

[30] Robert J. Urick. Principles of underwater sound. Peninsula Publishing, 3rd edition, 2013.

[31] Paul Van Walree, Erland Sangfelt, and Geert Leus. Multicarrier spread spectrum for covert acoustic communications. Oceans 2008, 2008. doi: 10.1109/OCEANS.2008.5151841.


[32] Paul Van Walree, Helge Buen, and Roald Otnes. A performance comparison between DSSS, M-FSK, and frequency-division multiplexing in underwater acoustic channels. 2014 Underwater Communications and Networking, UComms 2014, (3191):1–5, 2014. doi: 10.1109/UComms.2014.7017133.

[33] Paul A. Van Walree, Trond Jenserud, and Morten Smedsrud. A discrete-time channel simulator driven by measured scattering functions. IEEE Journal on Selected Areas in Communications, 26(9):1628–1637, 2008. ISSN 07338716. doi: 10.1109/JSAC.2008.081203.

[34] Paul A. Van Walree and Geert Leus. Robust Underwater Telemetry With Adaptive Turbo Multiband Equalization. IEEE Journal of Oceanic Engineering, 34(4):645–655, 2009.

[35] Lars Michael Wolff, Erik Szczepanski, and Sabah Badri-Hoeher. Acoustic underwater channel and network simulator. Program Book - OCEANS 2012 MTS/IEEE Yeosu: The Living Ocean and Coast - Diversity of Resources and Sustainable Activities, pages 1–6, 2012. doi: 10.1109/OCEANS-Yeosu.2012.6263608.

[36] Jing Zhang, Yu Cao, Guangyao Han, and Xiaomei Fu. Deep neural network-based underwater OFDM receiver. IET Communications, 13(13):1998–2002, 2019. ISSN 17518628. doi: 10.1049/iet-com.2019.0243.

[37] Youwen Zhang, Junxuan Li, Yuriy V. Zakharov, Jianghui Li, Yingsong Li, Chuan Lin, and Xiang Li. Deep Learning Based Single Carrier Communications over Time-Varying Underwater Acoustic Channel. IEEE Access, 7(Dl):38420–38430, 2019. ISSN 21693536. doi: 10.1109/ACCESS.2019.2906424.

[38] Ruiqin Zhao, Miao Li, and Weigang Bai. Underwater acoustic networks environment simulation with combination of BELLHOP and OPNET modeler. In OCEANS 2017 - Aberdeen, volume 2017-Octob, pages 1–4. Institute of Electrical and Electronics Engineers Inc., oct 2017. ISBN 9781509052783. doi: 10.1109/OCEANSE.2017.8085016.
