
The information contained in this document is the property of the EUROCONTROL Agency and no part should be reproduced in any form without the Agency’s permission.

The views expressed herein do not necessarily reflect the official views or policy of the Agency.

EUROPEAN ORGANISATION FOR THE SAFETY OF AIR NAVIGATION

EUROCONTROL

EUROCONTROL EXPERIMENTAL CENTRE

SPEECH WATERMARKING FOR AIR TRAFFIC CONTROL

EEC Note No. 05/05

Project INO-2-AT-AIT1

Issued: February 2005

REPORT DOCUMENTATION PAGE

Reference: EEC Note No. 05/05

Security Classification: Unclassified

Originator: EEC – INO (Innovative Research)

Originator (Corporate Author) Name/Location: Graz University of Technology Signal Processing and Speech Communication Laboratory Inffeldgasse 12 A-8010 Graz Telephone : +43 (0)316 / 873-7441

Sponsor: EUROCONTROL Experimental Centre

Sponsor (Contract Authority) Name/Location: EUROCONTROL Experimental Centre Centre de Bois des Bordes B.P.15 F – 91222 Brétigny-sur-Orge CEDEX FRANCE Telephone: +33 (0)1 69 88 75 00 Internet: www.eurocontrol.int

TITLE:

SPEECH WATERMARKING FOR AIR TRAFFIC CONTROL

Authors: Martin Hagmüller (TU Graz), Gernot Kubin (TU Graz)

Date: 02/2005

Pages: xii + 51

Figures: 37

Tables: 3

Annexes: 2

References: 30

Project: INO-2-AT-AIT1

Task No. Sponsor:

Period: 2003

Distribution Statement: (a) Controlled by: Head of INO (b) Special Limitations: None

Descriptors (keywords): Voice Communication, VHF, UHF, HF, Air Ground Communication, Aircraft Identification, Call Sign Confusion, Safety, Security, Digital Signature, Watermark Embedding, Analogue Voice, Virtual PTT Switch, Fake Communication, Attacker, Speech Watermarking, Spread-Spectrum, Perceptual Hiding, Data Embedding, Linear Prediction, Error Control Coding, Air Traffic Control

Abstract:

In air traffic control voice communication, it is desirable to transmit additional aircraft identification data over the analogue VHF voice channel. This means that the additional data has to be embedded into the speech signal. Such watermarking systems are used, e.g., for CD-audio copyright protection, where copyright data is embedded into the music without any audible distortion. Current approaches such as spread spectrum watermarking, quantization index modulation, and echo hiding are presented. A system for speech watermarking in an air traffic control environment is developed. The system uses spread spectrum technology with linear prediction for spectral shaping of the watermark to achieve perceptual hiding. Error control coding is done using a BCH code, which can both detect and correct errors. Results show that, for 12 bits/s, the watermark can be transmitted at a level which is hardly audible with a very low error rate (≤ 0.1%). For the transmission of 24 and 36 bits/s, the watermark level has to be increased to an audible but not annoying level in order to stay in a low error rate region.

Speech Watermarking for Air Traffic Control

Preface

Since its beginnings, air traffic control has relied on voice communication between aircraft pilots and air traffic control operators. The avionic radio is the controller's main tool for giving flight instructions and clearances to the pilot. Only the radar system and the flight plan data give the controller further information about the current position of the aircraft and its intended route and destination.

Therefore, it is crucial for secure and safe operation to have a reliable and failsafe radio communication network that guarantees the possibility of communication at any given time. From a technical point of view, great effort is put into the systems in order to provide permanent availability through robust and redundant design.

Once this "technical" link between ground and aircraft is established (which we assume from here on), the verbal communication between pilot and controller can start.

In order to avoid misunderstandings and to guarantee a common terminology, the two parties use a restricted, very simple, common language. Although most of the words are borrowed from the English language, the terms and structure of this language are clearly defined in the corresponding ICAO standards. Every voice communication on the aeronautical channel is supposed to take place according to this ICAO terminology.

In the classical ATC environment, several persons listen to and talk via the same radio channel. This is usually called a "party-line", used by the air traffic controller and the aircraft in the corresponding flight sector.

In order to establish meaningful communication in this environment, it has to be clear who is talking (the addresser, sender, originator) and to whom the current message is addressed (the addressee, recipient, acceptor). For ATC air-ground communication, certain rules are in place to make this identification clear. In standard situations, the air traffic controller is the only person on the channel who does not identify himself as addresser at the beginning of the message. Instead, he starts the message with the call-sign of the aircraft, the addressee of the message.

Unless explicitly specified otherwise, every voice message of the aircraft pilot is inherently addressed to the air traffic controller. Therefore, the identification of the addressee (the controller in this case) is omitted. The pilot starts the message with his call-sign to identify himself to the controller as addresser of the message.

The correct identification of addresser and addressee is crucial for safe communication. If the identification is not understood by the addressee, the entire message is declared void and has to be repeated.

Even more important in terms of safety is the case of a wrongly understood identification. In that case, a controller would assign information given by the pilot to the wrong aircraft. Any future action is then no longer based on correct assumptions. This highly compromises safety. This "mis-identification" is most likely to happen when aircraft with similar call-signs are present on the same channel, and either the addresser or the addressee mistakes the two call-signs. This potential risk is usually referred to as call-sign confusion.

In order to address this safety-critical problem, the idea was born to develop a system which automatically identifies and displays the addresser, the originator of the message.

The advantages would be two-fold. In terms of safety, the controller would get a confirmation of the call-sign of the sender (the aircraft), even in cases when the call-sign was said incompletely or unintelligibly. In terms of security, the aircraft pilot could be assured that a given instruction was indeed issued by the authorised air traffic controller and not by any spurious third party.

Several existing applications gave inspiration for the system. The most obvious is the Radio Data System (RDS) for FM broadcast radio in the consumer market, which displays the name of the radio station once the user has selected the channel.

Although it serves the same purpose, the technology is not directly transferable to avionic air-ground communication, as several restrictions have to be considered.

The identification system should work without human interaction. For a rapid deployment within a considerable amount of time, major modifications to the aircraft equipment (such as additional transceivers, etc.) should be avoided. Taking into account the current lack of bandwidth and frequencies for avionic radio, additional bandwidth-requiring digital transmission channels are not feasible in the near future. Additionally, as the implementation of new technologies in the avionic world usually takes place within a long transition period, backward-compatibility and undisturbed co-existence with existing radio communication systems have to be guaranteed.

Digital watermarking was identified as a possible solution which fulfils the intended purpose and respects the above restrictions. The technique embeds the addresser's identification as a small digital tag into the voice signal. It is therefore inherently transmitted with the voice signal and can be read by the addressee.

The study presented in this note evaluates the feasibility of the watermarking technology for the above automatic addresser identification in VHF air/ground voice communications. On behalf of Eurocontrol, the study was carried out by the Signal Processing and Speech Communication Laboratory of Graz University of Technology (Austria). It provides promising results and is the basis for further research and development in this direction.

Horst Hering (Eurocontrol Experimental Centre)
Konrad Hofbauer (Graz University of Technology)
February 2005



Contents

1. Introduction
1.1. Motivation
1.2. Outline of the Desired System

2. State-of-the-Art
2.1. Security Aspects
2.2. Spread Spectrum
2.3. Quantization Index Modulation (QIM)
2.4. Echo Embedding
2.5. Phase Modulation
2.6. Frequency Masking
2.7. Speech Watermarking
2.8. Performance Comparisons

3. System Description
3.1. Chosen Approach
3.2. Error Control Coding
3.3. Data Spreading
3.4. Data Embedding
3.4.1. Energy Shaping
3.4.2. Spectral Shaping
3.5. AM-Channel
3.6. Whitening
3.7. Synchronization
3.8. Data De-Spreading
3.9. Decoding
3.10. Overview of the System

4. Performance Analysis
4.1. Channel data rate = 80 bits/sec
4.1.1. Payload data word = 12 bits
4.1.2. Payload data word = 24 bits
4.1.3. Payload data word = 36 bits
4.2. Channel data rate = 100 bits/sec
4.2.1. Payload data word = 12 bits
4.2.2. Payload data word = 24 bits & 36 bits

5. Conclusion

A. Error Control Coding

B. Levinson-Durbin Algorithm

Bibliography



List of Figures

1.1. Simplified block diagram of speech watermarking for air traffic control.

2.1. Spread spectrum.
2.2. Spread spectrum: bit structure.
2.3. Spread spectrum system for bandpass channel.
2.4. Example of QIM embedding.
2.5. Dither modulation, an implementation of QIM.
2.6. Echo embedding.
2.7. Frequency Masking.

3.1. Schematic diagram of data embedding using spread spectrum.
3.2. Schematic diagram of watermark extraction.
3.3. BER over different low-pass cut-off frequencies.
3.4. Bandwidth expansion of the LP-filter.
3.5. Bandwidth expansion by moving the poles of the LP-filter toward the center of the unit circle.
3.6. Group delay of the LP-filter of a speech segment, with and without bandwidth expansion.
3.7. Channel transfer function.
3.8. Bit structure of the watermark.
3.9. Output of matched filter.
3.10. Schematic diagram of data embedding using spread spectrum.
3.11. Schematic diagram of reconstruction of data from speech and spread spectrum signal.

4.1. BER before error correction.
4.2. BER after error correction, 12 bit message.
4.3. MER over all received blocks, 12 bit message.
4.4. MER of the confident results and the percentage of occurrence, 12 bit message.
4.5. BER after error correction, 24 bit message.
4.6. MER over all received blocks, 24 bit message.
4.7. MER of the confident results and the percentage of occurrence, 24 bit message.
4.8. BER after error correction, 36 bit message.
4.9. MER over all received blocks, 36 bit message.
4.10. MER of the confident results and the percentage of occurrence, 36 bit message.
4.11. BER before error correction.
4.12. BER after error correction.
4.13. MER over all received blocks, 12 bit message.
4.14. MER of the confident results and the percentage of occurrence, 12 bit message.
4.15. MER of the confident results and the percentage of occurrence, 24 bit message.
4.16. MER of the confident results and the percentage of occurrence, 36 bit message.

A.1. Linear block code structure.

B.1. Block diagram of a lattice filter.



List of Tables

2.1. Differences between CD-audio and speech watermarking.
2.2. Comparison of different published watermarking schemes.

3.1. BCH code for 12 / 24 / 36 bits data words.



Chapter 1.

Introduction

1.1. Motivation

Since voice communication between pilot and air traffic controller takes place over an analog VHF channel, which is often subject to distortion, ambiguities in the spoken message can arise. One crucial piece of information is the correct identification of the communication partner. Incorrect identification can lead to wrong orders, which can have fatal consequences. Therefore, the reliable transmission of identification data of the airplane can improve air traffic security. Speech watermarking makes it possible to use the existing voice communication channel to transmit this additional information while keeping the interference with the speech signal at a very low level. A very simplified block diagram of the system is presented in fig. 1.1.

Figure 1.1.: Simplified block diagram of speech watermarking for air traffic control. (Diagram labels: speech, data bits, watermarking, decoder.)

1.2. Outline of the Desired System

For air traffic control communication between pilot and controller, the additional data should be embedded into the voice frequency band. The data should not disturb the communication and should still be transmitted reliably.

Data Rate: The minimal data rate is 12 bits in less than 1 second. The optimum is 36 bits in less than 1 second.



Reliability: Possibly wrong data blocks should be detected and discarded, with an error rate below 10^-4.

Audibility: The data stream should not disturb the voice communication.

Allowed Noise Level: The minimal allowed SNR for the voice channel is 14 dB in the USA. For Europe, no standardization exists. The noise created by the watermark signal should therefore be 17 dB below the speech signal, to be on the safe side.

Delay: The delay should be kept < 20ms.
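The noise-level requirement above can be turned into a scaling factor for the watermark. The following is a minimal sketch, assuming NumPy is available and using synthetic stand-in signals; the function name `watermark_gain` and the test signals are illustrative assumptions and not part of the system described in this note.

```python
import numpy as np

def watermark_gain(speech, watermark, ratio_db=17.0):
    """Return the factor that scales `watermark` so that its power lies
    `ratio_db` decibels below the power of `speech`."""
    p_speech = np.mean(np.square(speech))
    p_mark = np.mean(np.square(watermark))
    return np.sqrt(p_speech / p_mark) * 10.0 ** (-ratio_db / 20.0)

rng = np.random.default_rng(0)
speech = rng.normal(scale=0.1, size=8000)        # stand-in for 1 s of speech at 8 kHz
watermark = rng.choice([-1.0, 1.0], size=8000)   # stand-in for the watermark signal

gain = watermark_gain(speech, watermark)
# Speech-to-watermark power ratio actually obtained after scaling, in dB.
achieved_db = 10.0 * np.log10(np.mean(speech**2) / np.mean((gain * watermark) ** 2))
```

The ratio is computed here over the whole signal. A practical system would presumably adapt the gain over short frames, since speech power varies strongly over time.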



Chapter 2.

Review of the State-of-the-Art of Watermarking

For over a decade, starting with image processing, some effort has been put into the embedding of additional data into a host signal, mainly for copyright protection purposes. At the beginning, watermarking techniques were primarily developed for digital images and video; interest in audio started slightly later [4]. In recent years, several techniques for audio watermarking have been developed or adopted from image processing approaches. This section gives an overview of some current techniques which are of interest for speech watermarking.

• Direct Sequence Spread Spectrum [21]

• Echo Hiding [13]

• Phase Embedding [4]

• Quantization Index Modulation [5, 6]

• Frequency Masking [28]

The influence of the watermark on the host signal can be classified into different categories. Depending on the application, a different level of audibility is allowed.

No perceivable difference. This would be required for high quality CD-audio. No perceivable degradation can be permitted in this case. Even in an AB-comparison, the listener should not be able to distinguish the original from the watermarked signal.

Perceivable, but no noticeable difference. For some applications it is sufficient that the listener does not notice that a modification of the original signal has been made, although he could hear the distortion when it is brought to his attention.

Noticeable difference, but no degradation. The lowest requirement is that the watermark signal can be clearly noticed, but the host signal should not be degraded, so that, e.g., speech intelligibility does not suffer from the modification.



Depending on the availability of the watermark at the receiver side, the recognition task has a different degree of difficulty.

Original signal is available: For some applications, the original signal is available at the decoder. This simplifies the detection of a watermark. However, in most cases the original signal is not available for comparison.

Detection of a known watermark: Sometimes the decoder only has to check the existence of a known watermark. In this case the decision is a yes/no answer.

Recognition of an arbitrary watermark: In this case the watermark can be any data word of a given dimension. Not only the existence of the watermark, but also the watermark content has to be retrieved.

2.1. Security Aspects

Since copyright protection is the main application for watermarking, security is an issue of high priority. The removal or modification of the watermark has to be prevented, or made possible only at the cost of severe degradation of the host signal. At this stage of the project, such security aspects are of reduced priority, but they can be incorporated in the future.

2.2. Direct Sequence–Spread Spectrum Watermarking (DS-SS)

Initially, spread-spectrum technology was developed for military communication systems to achieve highly jamming-resistant communication. The technology is called spread spectrum because the transmission bandwidth employed is much greater than the minimum bandwidth required to transmit the information (fig. 2.1, [26]). Spreading is accomplished by means of a spreading signal, which is independent of the data. Despreading at the receiver is done by correlating the received signal with a synchronized replica of the spreading signal used to spread the information. Due to the jamming resistance of spread spectrum, this technology is a popular approach for watermarking.

Principle of Spread Spectrum Modulation.

In figure 2.3, a block diagram of spread spectrum modulation is presented. Usually, Binary Phase Shift Keying (BPSK) is used for bandpass modulation. In the case of baseband modulation, this step is not necessary.

$$ w_{PSK}[n] = d[n] \cos(\omega_0 n), \tag{2.1} $$

where d[n] is a binary data sequence, which stays constant over a symbol interval.



Figure 2.1.: Spread spectrum. a) Original signal. b) Spread signal. (Horizontal axis: frequency f.)

Figure 2.2.: Spread spectrum: bit structure. (Each data symbol d_i spans one symbol interval; each chip of the spreading sequence c_n spans one chip interval.)

wPSK[n] is then spread with a pseudo-random spreading sequence c[n] ∈ {±1}. Each element of c[n] is usually called a chip (see fig. 2.3a).

$$ w[n] = w_{PSK}[n]\, c[n] = d[n]\, c[n] \cos(\omega_0 n). \tag{2.2} $$

The watermarked signal y is created by

$$ y[n] = x[n] + \lambda w[n], \tag{2.3} $$

where λ is the amplitude of the watermark and x[n] is the host signal.

On the decoder side, the incoming signal r[n] is correlated with the spreading sequence to despread the signal (fig. 2.3 b). The spreading sequence is required to be perfectly synchronized.

$$ r[n]\, c[n - \hat{T}_d] = d[n - T_d]\, c[n - T_d] \cos[\omega_0 (n - T_d) + \varphi]\, c[n - \hat{T}_d], \tag{2.4} $$

where T̂d denotes the receiver's estimate of the delay Td. Synchronization is achieved if T̂d = Td. Since c[n]c[n] = 1, the original watermark signal is then reconstructed (except for a random phase φ and a delay Td).



Figure 2.3.: Spread spectrum system for bandpass channel: a) modulator. b) demodulator. (a: the BPSK data modulator forms wPSK[n] = d[n] cos(ω0n) from the data d[n] and the carrier cos(ω0n); multiplication with c[n] gives w[n] = d[n]c[n] cos(ω0n). b: the received signal r[n] = A d[n − Td] c[n − Td] cos[ω0(n − Td) + Φ] is multiplied with the local replica of the spreading sequence and passed to the BPSK data demodulator, which outputs A′d[n − Td].)

By the despreading operation, the speech signal is spread over the whole frequency band, thus reducing the energy density of the host signal, which is the basis for the jamming resistance of spread spectrum techniques.

BPSK demodulation returns the reconstructed input data sequence.
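The spreading, embedding, and despreading steps can be illustrated with a small simulation. This is a minimal sketch under several assumptions that are not taken from the text: baseband modulation (no carrier), perfect synchronization, a white Gaussian noise sequence standing in for the host signal, and illustrative values for the number of chips per bit and for the watermark amplitude.

```python
import numpy as np

rng = np.random.default_rng(1)

chips_per_bit = 80                              # assumed: 8 kHz sampling, 100 bit/s
bits = rng.integers(0, 2, size=40)              # payload bits to embed
d = np.repeat(2 * bits - 1, chips_per_bit)      # d[n] in {-1, +1}, constant per symbol
c = rng.choice([-1, 1], size=d.size)            # pseudo-random spreading sequence c[n]
w = d * c                                       # baseband form of the spread watermark

x = rng.normal(scale=1.0, size=d.size)          # white-noise stand-in for the host signal
lam = 0.7                                       # watermark amplitude (assumed value)
y = x + lam * w                                 # watermarked signal: host plus scaled watermark

# Despreading: multiply with the synchronized replica of c[n] and sum over
# each symbol interval. Since c[n] * c[n] = 1, the data term adds up
# coherently while the host signal averages out.
corr = (y * c).reshape(-1, chips_per_bit).sum(axis=1)
decoded = (corr > 0).astype(int)
bit_errors = int(np.sum(decoded != bits))
```

In this sketch the watermark power lies below the host power (λ < 1 with a unit-variance host), which corresponds to the negative SNR situation that spread spectrum is meant to tolerate.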

Watermarking Application

For watermarking, the spread spectrum approach has some interesting properties. The host signal is treated as interference, which is added on the channel. The pseudo-random noise sequence is perceived as white noise which, to a certain extent, is present in every speech signal. Therefore, the perceptual influence is much less than, e.g., for a harmonic modulation distortion. Furthermore, the jamming resistance is of importance, since the watermark can then have a negative SNR (where the data signal would be the watermark and the noise would be the host signal).

2.3. Quantization Index Modulation (QIM)

Chen and Wornell [5, 6] have proposed another watermarking method based on Low Bit Modulation (LBM). For LBM, the least significant bit of each sample in a signal is replaced with the embedded information [4]. Ideally, one bit can be embedded per sample, so for a sequence sampled at fs = 8 kHz, 8 kb/s can theoretically be embedded in the signal. An obvious, major disadvantage of this method is its poor robustness to channel noise.

For QIM, the embedding is done by using different quantizers for the different bit values (Fig. 2.4).
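Low bit modulation as described above can be sketched in a few lines. The sample values and the 16-bit integer format are illustrative assumptions; the text does not specify a sample format.

```python
import numpy as np

samples = np.array([1000, -2001, 37, 0, -1, 522], dtype=np.int16)  # assumed PCM samples
bits = np.array([1, 0, 1, 1, 0, 1], dtype=np.int16)                # one bit per sample

# Clear the least significant bit of each sample, then write the data bit into it.
marked = (samples & np.int16(-2)) | bits

# The receiver simply reads the least significant bit back.
recovered = marked & np.int16(1)
max_change = int(np.max(np.abs(marked.astype(np.int32) - samples.astype(np.int32))))
```

Each sample changes by at most one quantization step, which is why the method is hard to hear but also why any channel noise larger than one step destroys the embedded bits.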

Figure 2.4.: Example of QIM embedding. o ... quantizer used for the "0" bit, x ... quantizer used for the "1" bit. The reconstruction points of the two quantizers s(., 1) and s(., 2) alternate along the signal axis. dmin measures the robustness to perturbations and the distortion.

In contrast to spread spectrum watermarking, where the host signal is an interference to the embedded signal, the host signal is no interference for QIM. As long as the channel distortion stays below the distance between the quantizers dmin, QIM is an optimal approach for watermarking. Figure 2.5 shows an implementation of QIM [6], which uses a dither signal to switch between the quantizer levels.

Figure 2.5.: Dither modulation, an implementation of QIM. (Diagram labels: host signal x[n], bit bi, dither selection d[n; bi], quantizer q(·), output s[n].)
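A scalar version of QIM can be sketched with two uniform quantizers whose reconstruction points are shifted by half a step against each other. This is a simplified illustration, not the dither modulation scheme of [6]; the step size `delta`, the function names, and the test signals are assumptions made for the example.

```python
import numpy as np

def qim_embed(x, bits, delta):
    """Quantize each sample with the quantizer selected by its bit.
    Quantizer b has reconstruction points at k * delta + b * delta / 2."""
    offset = bits * (delta / 2.0)
    return np.round((x - offset) / delta) * delta + offset

def qim_decode(s, delta):
    """Decide for the bit whose quantizer has the closest reconstruction point."""
    distances = []
    for b in (0, 1):
        offset = b * delta / 2.0
        nearest = np.round((s - offset) / delta) * delta + offset
        distances.append(np.abs(s - nearest))
    return np.argmin(np.stack(distances), axis=0)

rng = np.random.default_rng(4)
delta = 1.0
x = rng.normal(scale=3.0, size=200)              # stand-in host samples
bits = rng.integers(0, 2, size=200)
s = qim_embed(x, bits, delta)

noisy = s + rng.uniform(-0.2, 0.2, size=200)     # perturbation smaller than delta / 4
decoded = qim_decode(noisy, delta)
```

In this sketch the spacing between the two quantizers is delta / 2, so perturbations smaller than delta / 4 are decoded without error, while the embedding distortion is at most delta / 2 per sample.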

2.4. Echo Embedding

Echo embedding encodes the watermark data in an echo of the signal, where different delay times correspond to different bits [13]. At the receiver, the corresponding delay time can be detected using autocorrelation. Echo embedding has the advantage that echo is a very common phenomenon in voice communication, since the acoustic room impulse response (reverberation) very often has an audible impact on the perceived voice signal.



Figure 2.6.: Echo embedding. Delay time corresponds to embedded bits. (Panels: (A) "one" kernel with echo delay δ1, (B) "zero" kernel with echo delay δ0; horizontal axis: time t.)
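The principle of echo embedding and autocorrelation-based detection can be sketched as follows. The delays, the echo gain, the segment length, and the use of white noise as a stand-in for the host signal are all assumptions made for this illustration; real speech is strongly correlated, which makes detection harder than in this idealized setting.

```python
import numpy as np

def add_echo(segment, delay, gain=0.5):
    """Add a single echo of `segment`, delayed by `delay` samples."""
    out = segment.copy()
    out[delay:] += gain * segment[:-delay]
    return out

def detect_bit(segment, delay_zero, delay_one):
    """Compare the autocorrelation at the two candidate echo delays."""
    def autocorr(lag):
        return np.dot(segment[lag:], segment[:-lag])
    return int(autocorr(delay_one) > autocorr(delay_zero))

rng = np.random.default_rng(2)
delay_zero, delay_one = 40, 56        # hypothetical delays in samples
bits = [1, 0, 0, 1, 1, 0, 1, 0]

# One host segment per bit; white noise is used as a stand-in for speech.
segments = [rng.normal(size=2000) for _ in bits]
marked = [add_echo(s, delay_one if b else delay_zero) for s, b in zip(segments, bits)]
decoded = [detect_bit(m, delay_zero, delay_one) for m in marked]
```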

2.5. Phase Modulation

To a certain extent, modifications of the phase of a signal cannot be perceived by the human auditory system. It is therefore natural to try to embed information into the phase of a signal. This can be done by modifying the phase of the discrete Fourier transform (DFT) of segments of a signal [4] or, alternatively, by using all-pass filters [30] to modify the phase without changing the amplitude of the host signal in the frequency domain. According to [12], phase coding does not perform very well compared to spread spectrum watermarking on an AWGN channel with low SNR.

2.6. Frequency Masking

Frequency masking is a psycho-acoustical phenomenon which is widely used for audio compression [22]. Subjective tests showed that a loud frequency component masks neighboring components below a certain masking threshold (fig. 2.7). In audio coding, this information is used for signal compression, since signal components which cannot be heard need not be transmitted. For watermarking, this property can be used to add additional signals which should not be perceived.

2.7. Speech Watermarking

Watermarking for speech signals differs from the usual audio watermarking due to the much narrower signal bandwidth. Compared to the 44.1 kHz sampling rate for CD-audio, telephony speech is usually sampled at 8 kHz. Therefore, less information can be embedded in the signal. For perceptual hiding, the masking levels usually have to be calculated. The common algorithms are optimized for CD-audio bandwidth and are computationally very expensive. Therefore, Cheng et al. [7] use Linear Prediction Coefficients (LPC) for the calculation of the masking levels (see section 3.4.2).



Figure 2.7.: Frequency Masking. Strong frequency components (red) mask weaker signal components (blue) below a masking threshold. (Axes: amplitude A over frequency f in Hz.)

Another difference is the expected channel noise. For CD-audio, the channel noise is usually rather low. The audio signal loses its commercial value if the channel noise goes beyond a certain threshold. On the other hand, the distortion allowed for a watermark signal is very low as well. Neither artist nor consumer would accept a reduced recording quality in exchange for better copyright protection. Speech, on the other hand, is very often transmitted over noisy channels, which is particularly true for air traffic control voice communication. On the one hand, the channel noise is a disadvantage; on the other hand, it allows much more power for the watermark signal, since the channel noise will cover it anyway. The listener expects a certain amount of noise in the signal. A summary of the differences can be seen in table 2.1.

                      CD-Audio              Speech
                      Watermarking          Watermarking

channel noise         very low              can be high
bandwidth             wideband (20 kHz)     narrowband (< 4 kHz)
allowed distortion    not perceivable       low

Table 2.1.: Differences between CD-audio and speech watermarking.

Due to those differences, it is hard to compare results from CD-audio watermarking with speech watermarking. However, the next section is dedicated to comparing some of the results of published audio watermark algorithms.



2.8. Performance Comparisons of Different Algorithms

It is difficult to compare the different algorithms since no general evaluation criterion exists. Gordy and Bruton [12] attempted a comparison of five different algorithms (echo coding, phase coding, direct sequence spread spectrum (DS-SS), frequency hopping spread spectrum (FH-SS), and frequency masking) with respect to bit error rate, signal-to-watermark ratio, perceptual quality, computational complexity, and robustness to signal processing. The frequency masking algorithm was more robust than the other algorithms, at the cost of high computational complexity, since the perceptual masking analysis has to be performed for every frame. DS-SS was computationally very efficient, though it is not very robust to signal processing attacks.

A direct comparison of performance results is a rather difficult endeavor. In table 2.2, different published results of audio watermarking algorithms are compared. Since usually only very little information about test parameters is given, those results are only of limited use and should be treated with care.

Message Rate   Error Rate   Bandwidth   Method - Application         Reference
[bits/s]       [%]          [kHz]

47.2           5.9          20          Spread spectrum - speech     [21]
150            70           6           Phase modulation - speech    [11]
28             1.7          20          SS derivation - music        [27]
4              ?            20          DS-SS                        [4]
1              10^-4        20                                       [19]
800            26           4           DS-SS                        [7]

Table 2.2.: Comparison of different published watermarking schemes.

Comparison of QIM and DS-SS.

Perez-Gonzalez et al. [23] compared spread spectrum and quantization index modulation and attempted an objective performance analysis. One result was that, at a certain noise level, QIM suffers from a rapid decrease in performance, whereas for the spread spectrum approach the error probability increases as a rather smooth function of the noise level ("graceful degradation").



Chapter 3.

System Description

3.1. Chosen Approach

After a study of the literature, an approach was chosen which combines the spread spectrum approach with a simplified frequency masking. Spread spectrum was chosen since it is well studied and seems rather robust against channel noise. The simplified block diagram of the encoder can be seen in figure 3.1. The system can be split into three main parts. First comes the error control coding, which performs channel coding to increase the reliability of the results. Next is the spreading of the watermark signal over the available frequency band. Finally, the watermark is embedded into the speech signal using perceptual considerations.

Figure 3.1.: Schematic diagram of data embedding using spread spectrum. (Diagram labels: Data, Error Coding, Watermark Generation, Watermark Embedding, Speech, Watermarked Speech.)

At the decoder side, the watermark data has to be extracted from the speech signal (fig. 3.2). First, a whitening filter is employed to undo the spectral shaping of the frequency masking. Then the signal has to be synchronized to perform the watermark extraction. The error correction algorithm uses the added redundancy to correct errors made on the channel.

Figure 3.2.: Schematic diagram of watermark extraction. (Diagram labels: Watermarked Speech, Whitening Filter, Synchronization, Watermark Extraction, Error Correction, Speech, Data.)



3.2. Error Control Coding

For error control coding, a BCH code was used. BCH codes are cyclic linear block codes and allow a large selection of block lengths, code rates, alphabet sizes, and error correction capabilities (for more information on error control coding, see appendix A and [26]).

The BCH decoder provides information about the number of bit errors in a received block. If this number is within the correction capability of the code, the errors will be corrected. If more errors are detected than the correction capability of the code allows, the code only offers the information about the number of errors, without a reliable error correction. In this case, the received data cannot be trusted and has to be discarded. Table 3.1 lists, for three different data word sizes, the number of code bits and parity bits and the correction capability of the BCH code. Note that the BCH code does not allow 12-bit data words, so the 12-bit word has to be split into two 6-bit data words.

| data bits k | code bits n | parity bits (n−k) | correction capability t |
|---|---|---|---|
| 12 bits = 2 × 6 bits | 62 bits = 2 × 31 bits | 50 bits = 2 × 25 bits | 14 bits = 2 × 7 bits |
| 24 bits | 63 bits | 39 bits | 7 bits |
| 36 bits | 63 bits | 27 bits | 5 bits |

Table 3.1.: BCH code for 12 / 24 / 36 bits data words.

3.3. Data Spreading

The chosen approach uses spread spectrum technology to spread the data over the available bandwidth. The signal is spread by modulation with a white pseudo-noise (PN) binary sequence. The length of the sequence equals the number of samples used for one data word. With a sampling rate of 8 kHz, a symbol rate of 100 bit/s and a total data word size of 80 bits (62 (63) code bits plus extra synchronization bits), this gives a spreading sequence length of 6400 samples. In [17] it is claimed that the spectrum of the watermark signal should be close to the host signal spectrum to achieve robust synchronization. To match the channel bandwidth, the PN sequence is low-pass filtered with a minimum-phase FIR filter and then re-quantized to obtain a binary sequence again.
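The arithmetic and the spreading step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names `make_pn` and `spread`, the seed values and the use of Python's `random` module as PN generator are assumptions, and the low-pass filtering and re-quantization of the PN sequence are left out.

```python
import random

def make_pn(length, seed=1):
    """Pseudo-noise sequence of +/-1 chips (seeded, so the receiver can repeat it)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def spread(bits, pn, chips_per_bit):
    """Multiply every data bit (+/-1) with its own slice of the PN sequence."""
    assert len(pn) == len(bits) * chips_per_bit
    return [bits[n // chips_per_bit] * pn[n] for n in range(len(pn))]

fs = 8000                                # sampling rate in Hz
bit_rate = 100                           # watermark channel rate in bit/s
block_bits = 80                          # data word incl. code and sync bits
chips_per_bit = fs // bit_rate           # 80 samples per data bit
pn_length = block_bits * chips_per_bit   # 6400 samples per data word

pn = make_pn(pn_length)
bit_rng = random.Random(2)               # hypothetical payload for the demo
bits = [bit_rng.choice((-1, 1)) for _ in range(block_bits)]
watermark = spread(bits, pn, chips_per_bit)
```

With these numbers one data word occupies 6400 samples, i.e. 0.8 s at 8 kHz.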

Filtering of binary sequences

When filtering binary sequences with FIR filters, an interesting effect was observed. With an even filter order, the sequence was white again after requantizing the filtered amplitudes to {−1, 1}. Only with an odd filter order did the signal still have a low-pass characteristic after quantization:

$$ y[n] = \operatorname{sign}(x[n] * h[n]), \tag{3.1} $$

where $x[n] \in \{-1, 1\}$. However, if continuous, normally distributed random number sequences are used as input x[n], the result does not depend on whether the filter order is even or odd. The filter characteristic remains similar, though the stop-band is not attenuated as much as in the continuous result.
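Equation (3.1) can be tried out with a few lines of code. The sketch below is an assumption-laden stand-in: it uses a Hamming-windowed sinc low-pass (linear phase) instead of the minimum-phase filter mentioned in the text, the names `lowpass_taps` and `filter_and_requantize` are hypothetical, and sign(0) is mapped to +1. The filter order is the number of taps minus one, so both the even and the odd case can be compared.

```python
import math
import random

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def filter_and_requantize(x, h):
    """y[n] = sign(x[n] * h[n]) with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        y.append(1 if acc >= 0 else -1)
    return y

rng = random.Random(0)
x = [rng.choice((-1, 1)) for _ in range(256)]
# 3.4 kHz cut-off at 8 kHz sampling rate -> 0.425; 16 taps = filter order 15.
y = filter_and_requantize(x, lowpass_taps(16, 3400.0 / 8000.0))
```

Estimating the spectrum of `y` for different numbers of taps would be the natural next step to reproduce the observation; that analysis is not included here.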

Two possibilities were tested. First, a simple FIR low-pass filter was used to shape the watermark. This improved the BER significantly. Various other approaches, such as a filter that shapes the watermark like an average speech signal, were less successful. To find the lowest BER, a series of simulations was run for different low-pass cut-off frequencies. The channel was modeled with a passband from 0.3 to 3.1 kHz and a stopband from 1.89 to 2.19 kHz. The measured channel BER is shown in figure 3.3.

[Plot: BER [%] over low-pass cut-off frequency fc [kHz], for fc between 3 and 4 kHz]

Figure 3.3.: BER over different low-pass cut-off frequencies (fc = 4 kHz means no filter).

The cut-off frequency of the low-pass filter was set to 3.4 kHz, which achieved the lowest bit error rate compared with unfiltered spreading sequences and with other filters.



3.4. Data Embedding

The spread spectrum watermark signal is next embedded into the speech signal. Simply adding the watermark signal would result in high interference from the host signal, depending on the widely varying speech power (equ. 2.3). This distortion can be controlled better by using masking techniques, both in terms of temporal energy and spectral shape. The goal of data embedding is to put the watermark into the speech signal with maximum energy but the least possible perceptual distortion. Many watermarking algorithms use psychoacoustic knowledge about the human auditory system to minimize the distortion caused by the watermark [10, 8, 14, 16].

3.4.1. Energy Shaping

The host interference can be compensated by modulating the energy of the inserted watermark; in [19] this is called improved spread spectrum. In [7] the energy of the watermark is modulated depending on the energy of the host signal. The watermark amplitude is no longer constant, but changes with the speech energy:

$$ y[n] = x[n] + \mu(x, \lambda)\, w[n] \tag{3.2} $$

Since users of speech communication over an AM channel expect a certain amount of noise anyway, a minimum level of watermark energy is always maintained at the usual noise level. If speech is present in the signal, the watermark gain is adjusted to the actual signal level. This prevents watermark levels that are too low during silence and improves the performance considerably. When the pilot presses the 'push to talk' button without speaking, the message can still be transmitted with very low error probability, since the watermark energy is high compared to the background signal energy. This leads to

$$ y[n] = x[n] + \max(\mu(x, \lambda), \mu_{\min})\, w[n], \tag{3.3} $$

where $\mu_{\min}$ is the minimum watermark level that is maintained.
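A frame-wise version of equation (3.3) might look like the sketch below. The exact form of the gain function µ(x, λ) is not given in this section, so the code assumes, purely for illustration, that the gain follows the frame RMS attenuated by a signal-to-watermark ratio in dB, and that the floor is specified in dB relative to a full scale of 1.0. The names `embed_frame`, `swr_db` and `floor_db` are hypothetical.

```python
import math

def db_to_lin(db):
    """Convert an amplitude ratio from dB to a linear factor."""
    return 10.0 ** (db / 20.0)

def embed_frame(x, w, swr_db, floor_db):
    """Return (y, gain) with y[n] = x[n] + max(mu, mu_min) * w[n] for one frame.

    Assumption: mu is the frame RMS of the speech, attenuated by swr_db.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    mu = rms * db_to_lin(-swr_db)     # gain that follows the speech level
    mu_min = db_to_lin(floor_db)      # fixed watermark floor
    gain = max(mu, mu_min)
    y = [xv + gain * wv for xv, wv in zip(x, w)]
    return y, gain

w = [1, -1] * 40                      # 80 watermark chips (+/-1)
silence = [0.0] * 80
speech = [0.5] * 80                   # toy "loud" frame with RMS 0.5
_, gain_silence = embed_frame(silence, w, swr_db=20.0, floor_db=-30.0)
_, gain_speech = embed_frame(speech, w, swr_db=20.0, floor_db=-30.0)
```

For the silent frame the floor determines the gain, for the loud frame the speech level does, which is the behaviour the text describes for the push-to-talk case.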

3.4.2. Spectral Shaping

In addition, the watermark can be spectrally shaped to be similar to the speech signal. This decreases the perceptual distortion of the watermarked signal. It is usually done by calculating frequency masking thresholds for the signal. However, this is computationally very expensive and therefore not usable in real time with limited computing power. Additionally, those models were developed for compression of wideband audio, which is quite different from a narrow-band speech channel. In wideband audio the frequency spectrum is very often sparsely populated, whereas the speech spectrum is densely filled.

An alternative is linear prediction analysis, an effective production model for speech. It is a well studied method to model the characteristics of human speech production. It was also used in [7] to achieve a simplified frequency masking.

Linear Prediction

Linear prediction is widely used to estimate the spectrum of speech signals. It is an autoregressive (AR) model, i.e. an all-pole model: the only significant singularities are poles, and all zeros lie at the origin of the z-plane. The AR model is mainly used because the parameters $a_k$ can readily be determined by solving a linear set of equations [18]. The method is called linear prediction because the model assumes that future samples can be predicted by a linear combination of past samples:

$$ x[n] = \sum_{k=1}^{p} a_k x[n-k] + e[n], \tag{3.4} $$

where e[n] is the innovation input to the filter. For analysis purposes the input is usually not available, so the equation reduces to

$$ \hat{x}[n] = \sum_{k=1}^{p} a_k x[n-k], \tag{3.5} $$

where $\hat{x}[n]$ is the predicted sample. Consequently, the prediction error is

$$ e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k x[n-k] \tag{3.6} $$

In the frequency domain, the transfer function corresponding to (equ. 3.4) is:

$$ H_{AR}(e^{j\omega}) = \frac{1}{1 - \sum_{k=1}^{p} a_k e^{-j\omega k}} = \frac{1}{A(e^{j\omega})} \tag{3.7} $$

$$ A(e^{j\omega}) = 1 - \sum_{k=1}^{p} a_k e^{-j\omega k} \tag{3.8} $$

Because a speech signal is only quasi-stationary (i.e. stationary over intervals of about 10−20 ms), the LP analysis is done for overlapping frames of the signal. The signal is windowed using a Hamming window:

$$ x_m[n] = x[m+n]\, w[n], $$

where

$$ w[n] = \begin{cases} 0.54 - 0.46 \cos(2\pi n / N) & 0 \le n \le N \\ 0 & \text{otherwise} \end{cases} \tag{3.9} $$

and the index m denotes the number of the frame.

The prediction coefficients are determined by minimizing the mean squared error between the predicted and the current samples for a frame of the speech signal.

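$$ E_m = \sum_n e_m^2[n] = \sum_n \left( x_m[n] - \hat{x}_m[n] \right)^2 = \sum_n \Big( x_m[n] - \sum_{k=1}^{p} a_k x_m[n-k] \Big)^2 \tag{3.10} $$

The error $E_m$ is minimized by setting

$$ \frac{\partial E_m}{\partial a_i} = 0, \quad 1 \le i \le p \tag{3.11} $$

This gives

$$ \sum_{k=1}^{p} a_k \sum_n x_m[n-k]\, x_m[n-i] = \sum_n x_m[n]\, x_m[n-i], \quad 1 \le i \le p \tag{3.12} $$

This is a set of p equations, known as the Yule-Walker equations, which can be solved for $a_k$ [15]: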

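$$ \begin{bmatrix} r(0) & r(1) & \ldots & r(p-1) \\ r(1) & r(0) & \ldots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \ldots & r(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(p) \end{bmatrix} \tag{3.13} $$

In compact form:

$$ Ra = r \tag{3.14} $$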

where the autocorrelation sequence r(k) is defined over the support $0 \le n \le N$ of the windowed frame as:

$$ r(k) = \sum_{n=0}^{N-k} x_m[n]\, x_m[n+k] \tag{3.15} $$

The solution for $a_k$ (assuming an invertible correlation matrix R) can be written as:

$$ a = R^{-1} r \tag{3.16} $$

The Levinson-Durbin algorithm is a very efficient method to calculate the LP coefficients without a direct matrix inversion. For a detailed description of the algorithm refer to appendix B.


Bandwidth expansion

The linear prediction coefficients of the speech signal are calculated frame by frame. The spectral peaks of the vocal tract (formants) can be quite narrow. The part that is perceptually important for the intelligibility of the speech utterance lies in the energy-rich regions of the spectral peaks. Distortions in these regions would deteriorate the speech quality more than distortions in low-energy regions. Due to frequency masking the bandwidth of the watermark signal can be broadened [29, p. 283] (see fig. 3.4), so that some of the energy of the watermark signal is transferred from the formants to the regions in between.

This is achieved by moving the poles of the all-pole filter toward the center of the unit circle (see fig. 3.5).

[Plot: magnitude (dB) over normalized frequency (×π rad/sample)]

Figure 3.4.: Bandwidth expansion of the LP-filter (solid line: original transfer function, dashed line: transfer function after bandwidth expansion).

A parameter γ adjusts the LP coefficients:

$$ a'_k = a_k \gamma^k, \quad 0 \le k \le \text{order}, $$

where γ may be chosen between 0 and 1 to ensure stability. Our watermarking algorithm uses γ = 0.9 for a slight broadening of the spectral peaks.

After the modification of the coefficients, the watermark signal is all-pole (IIR) filtered with these coefficients on a frame-by-frame basis.
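The effect of the factor γ can be checked on a small example. The sketch below assumes the sign convention of equation (3.8), i.e. A(z) = 1 − Σ a_k z^{-k}, and uses hypothetical helper names (`bandwidth_expand`, `all_pole_filter`); frame switching and filter state handling between frames are omitted. The example filter has a double pole at radius 0.9, which moves to radius 0.9 · γ = 0.81 after expansion.

```python
def bandwidth_expand(a, gamma=0.9):
    """a = [a_1, ..., a_p]; returns a'_k = a_k * gamma**k."""
    return [ak * gamma ** k for k, ak in enumerate(a, start=1)]

def all_pole_filter(w, a):
    """s[n] = w[n] + sum_k a_k s[n-k], i.e. filtering with 1/A(z), zero initial state."""
    s = []
    for n, wn in enumerate(w):
        acc = float(wn)
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        s.append(acc)
    return s

# A(z) = (1 - 0.9 z^-1)^2 = 1 - 1.8 z^-1 + 0.81 z^-2  ->  a = [1.8, -0.81]
a = [1.8, -0.81]
a_exp = bandwidth_expand(a, gamma=0.9)        # coefficients of (1 - 0.81 z^-1)^2
impulse_response = all_pole_filter([1, 0, 0, 0], a_exp)
```

Because every pole radius is multiplied by γ, a stable filter stays stable and its resonances become broader and flatter.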

Consideration of group delay

After spectral shaping of the watermark, the minimum watermark signal and the spectrally shaped signal are added, so that the power spectral density of the watermark signal never falls below a minimum level. The LPC filtering introduces some group delay, which is frequency dependent (the phase is not linear), whereas the unmodified minimum watermark signal has no delay. The two signals therefore have to be added carefully.

[Plot: pole positions of the LP-filter in the z-plane (real part over imaginary part)]

Figure 3.5.: Bandwidth expansion by moving the poles of the LP-filter toward the center of the unit circle.


Simulations showed that, because of the bandwidth expansion, the group delay was reduced to around one sample (fig. 3.6). Other approaches, such as approximating the group delay of the LPC filter with an allpass filter, were not successful, since no fast method exists to design a simple approximation filter for a given group delay. Consequently the minimum watermark signal is added with a delay of one sample:

$$ y[n] = x[n] + (\mu(x, \lambda)\, w[n]) * h[n, x] + \mu_{\min}\, w[n-1]. \tag{3.17} $$

[Plot: group delay (samples) over normalized frequency (×π rad/sample); curves: no modification, bandwidth expansion]

Figure 3.6.: Group delay of the LP-filter of a speech segment, with and without bandwidth expansion.

3.5. AM-Channel

The AM channel was simulated as an additive white Gaussian noise (AWGN) channel with slow amplitude fading, which is a very simple model. The bandwidth was limited to 0.3 - 3.1 kHz, with a stopband between 1.89 and 2.19 kHz (see fig. 3.7).

More complex models exist [9], but were not available for this research. Implementing such a channel model, including a realistic selection of its numerous parameters, would have been far beyond the scope of this study.


[Plots: magnitude (dB) and phase (degrees) over frequency (0 to 4000 Hz)]

Figure 3.7.: Channel transfer function. top: magnitude. bottom: phase.

3.6. Whitening

At the receiver the incoming signal is first whitened to undo the spectral shaping applied at the transmitter. The signal is again analyzed using linear prediction (see 3.4.2). This time the inverse of the LP-filter is used to calculate the error signal; it is therefore an FIR filter:

$$ y[n] = x[n] - \sum_{i=1}^{p} a_i x[n-i] \tag{3.18} $$

Bandwidth expansion is of course performed for the whitening filter, too. This time it is the filter zeros that are moved toward the center of the unit circle.
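That the FIR filter of equation (3.18) undoes the all-pole shaping can be verified numerically. The sketch below makes a simplifying assumption that does not hold in the real system: the receiver uses exactly the same coefficients as the transmitter, whereas in practice they are re-estimated from the received signal. The function names are hypothetical.

```python
def all_pole_filter(w, a):
    """Transmitter side: s[n] = w[n] + sum_k a_k s[n-k] (zero initial state)."""
    s = []
    for n, wn in enumerate(w):
        acc = float(wn)
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        s.append(acc)
    return s

def whiten(x, a):
    """Receiver side, equation (3.18): y[n] = x[n] - sum_i a_i x[n-i]."""
    y = []
    for n, xn in enumerate(x):
        acc = float(xn)
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * x[n - i]
        y.append(acc)
    return y

a = [1.62, -0.6561]                    # bandwidth-expanded example coefficients
w = [1, -1, 1, 1, -1, -1, 1, -1]       # toy watermark chips
shaped = all_pole_filter(w, a)
recovered = whiten(shaped, a)          # equals w up to rounding
```

With mismatched coefficients the recovered sequence is only approximately white, which is one reason why the error control coding is needed.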

3.7. Synchronization

At the receiver side, a major issue is the synchronization of the watermark signal. Since it is not known when the watermark signal starts, it is crucial that the decoder locks onto the received watermark signal. If this cannot be accomplished, the whole block cannot be decoded. The subject of synchronization is well studied in the communications engineering literature. On the transmitter side a sequence which is known to the receiver is added to the payload data (fig. 3.8).

Synchronization can then be implemented with a matched filter. Its impulse response is the time-reversed spread synchronization sequence (i.e. the first 1152 samples of one data block in case of a 100 bits/s channel rate). The output of the filter reaches a maximum when the synchronization sequence has been fully fed into the filter. Figure 3.9 shows the output of the matched filter for about 20 data blocks.

[Diagram: 1 sec = 8000 samples = 100 data bits. One data block: 80 data bits = 6400 samples = 0.8 sec, spread with a spreading sequence of 6400 samples. The block consists of 18 synchronization bits (1152 samples) followed by the data bits (5248 samples), where data bits = 2 * 6 payload bits + 2 * 25 parity bits = 62 bits. 1 data bit = 80 spreading samples.]

Figure 3.8.: Bit structure of the watermark.

[Plot: output of the matched filter over time/samples]

Figure 3.9.: Output of matched filter.
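The matched-filter search described above can be sketched in a few lines. For a short, deterministic demonstration the code uses a 13-chip Barker code as a stand-in for the spread synchronization sequence; this substitution, the function names and the toy background signal are assumptions made for the example only.

```python
def matched_filter(received, template):
    """FIR filter whose impulse response is the time-reversed template.

    out[n] is the correlation of the template with the samples ending at n.
    """
    m = len(template)
    out = []
    for n in range(len(received)):
        acc = 0.0
        for k in range(m):
            idx = n - (m - 1) + k
            if idx >= 0:
                acc += template[k] * received[idx]
        out.append(acc)
    return out

def find_sync_end(received, template):
    """Index of the last sample of the best template match."""
    out = matched_filter(received, template)
    return max(range(len(out)), key=out.__getitem__)

sync = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]   # Barker-13 stand-in
background = [0.05 * (-1) ** n for n in range(20)]   # weak interference
received = background + sync + background
end = find_sync_end(received, sync)                  # 20 + 13 - 1 = 32
block_start = end - len(sync) + 1                    # 20
```

Once the end of the synchronization sequence is known, the start of the data bits follows directly from the block structure of figure 3.8.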

3.8. Data De-Spreading

When the exact location of the watermark signal is known, it can be de-spread by multiplying the signal with the sequence used for spreading at the transmitter. This works only when the watermark signal and the spreading sequence are exactly synchronized; multiplication with an offset of only one chip does not despread the signal.

3.9. Decoding

Decoding is done using a simple integrator. Since the decoder knows the positions and the length of the data bits, integrating over the period of one data bit and quantizing to {1, −1} gives the result:

$$ y[i] = \operatorname{sign}\Big( \sum_{n=0}^{N_b - 1} x_i[n] \Big) \tag{3.19} $$

where i is the current bit interval, $x_i[n]$ is the de-spread signal within that interval and $N_b$ is the bit length in samples. At this stage the downsampling from the signal rate to the binary symbol rate is performed as well.
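De-spreading and the integrate-and-decide step of equation (3.19) can be combined in one loop. The following is a minimal sketch that assumes perfect synchronization and a whitened input; the function name `despread_and_decide` and the toy interference are hypothetical, and sign(0) is mapped to +1.

```python
import random

def despread_and_decide(received, pn, chips_per_bit):
    """Multiply with the PN sequence, integrate over each bit, take the sign."""
    bits = []
    for start in range(0, len(pn), chips_per_bit):
        acc = sum(received[n] * pn[n] for n in range(start, start + chips_per_bit))
        bits.append(1 if acc >= 0 else -1)
    return bits

rng = random.Random(3)
chips_per_bit = 10                                    # toy value; 80 in the text
sent = [1, -1, -1, 1]
pn = [rng.choice((-1, 1)) for _ in range(len(sent) * chips_per_bit)]
tx = [sent[n // chips_per_bit] * pn[n] for n in range(len(pn))]
rx = [v + rng.uniform(-0.9, 0.9) for v in tx]         # bounded interference
decoded = despread_and_decide(rx, pn, chips_per_bit)
```

Each chip contributes the bit value plus an interference term, so the sum over one bit grows with the number of chips per bit while uncorrelated interference tends to average out; this is the spreading gain the system relies on.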

3.10. Overview of the System

Finally two block diagrams are given, one for the transmitter (fig. 3.10) and one for the receiver side (fig. 3.11).

[Block diagram with the blocks Channel coding (BCH-Code), sync sequence, MUX, Interpolate (fs/fd), spreading sequence, Lowpass (0 - 3600 Hz), signum, Gain, LPC-Analysis, Bandwidth expansion, Time variant filter (IIR), Delay and Gain for desired SNR; inputs: data @ fd, speech @ fs; output: watermarked signal]

Figure 3.10.: Schematic diagram of data embedding using spread spectrum (fd ... data rate, fs ... sampling rate).

[Block diagram with the blocks Bandpass (100-3600 Hz), LPC-Analysis, Time variant whitening filter, Matched filter (spread sync sequence), spreading sequence, Integrator (fs/fd), Threshold and Channel decoding (BCH-Code); input: received signal; outputs: speech, data, confidence]

Figure 3.11.: Schematic diagram of reconstruction of data from speech and spread spectrum signal (fd ... data rate, fs ... sampling rate).


Chapter 4.

Performance Analysis

To evaluate the performance of the proposed watermarking algorithm a series of simulations was carried out. To characterize the algorithm, several error rates, reflecting different levels of system performance, and the perceptual quality were evaluated:

Channel BER: The watermark channel BER, before error correction.

Message BER: The message BER after error correction, concerning only the payload data.

Message Error Rate (MER): The error rate concerning whole message blocks (i.e. 12, 24, 36 bits).

MER of confident results: The MER taking only confident results into account (those for which the BCH decoder assumes that the number of bit errors is within the correction capability of the code).

Occurrence of confident results: The percentage of occurrence of confident results is also evaluated. This is important, since even error-free confident results are not usable in practice if they occur for only 10% of the transmitted data blocks.

Perceptual quality: This was evaluated only informally, not systematically. A selection of watermarked signals with different watermark power levels is available. Extensive Mean Opinion Score (MOS) tests would have been necessary to classify the perceptual results, but those tests are beyond the scope of this study and should be carried out with the intended user group.
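The error measures listed above can be computed from simulation logs as sketched below. The data layout (lists of bit tuples plus one confidence flag per block) and the function names are assumptions made for this illustration and do not describe the simulation code of the study.

```python
def bit_error_rate(sent_bits, received_bits):
    """Fraction of differing bits (channel BER or message BER)."""
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)

def message_stats(sent_msgs, decoded_msgs, confident):
    """Return (MER, MER of confident results, occurrence of confident results)."""
    total = len(sent_msgs)
    wrong = [s != d for s, d in zip(sent_msgs, decoded_msgs)]
    n_conf = sum(confident)
    mer = sum(wrong) / total
    if n_conf:
        mer_conf = sum(w for w, c in zip(wrong, confident) if c) / n_conf
    else:
        mer_conf = None                  # undefined without confident blocks
    occurrence = n_conf / total
    return mer, mer_conf, occurrence

sent = [(0, 1), (1, 1), (0, 0), (1, 0)]
decoded = [(0, 1), (1, 0), (0, 0), (1, 0)]
confident = [True, False, True, True]    # decoder flagged block 2 as unreliable
mer, mer_conf, occurrence = message_stats(sent, decoded, confident)
```

In this toy log one of four blocks is wrong (MER 25%), but the wrong block was flagged, so the MER of the confident results is zero at an occurrence of 75%.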

The following parameters, which are of interest for an implemented system, were varied:

Data rate: The watermark channel data rate (80, 100, 160 bits/sec).

Message size: The aircraft identification tag or payload size (12, 24, 36 bits).

Watermark floor: The effective minimum watermark energy in the signal (−16 to −32 dB).

Signal to Watermark Ratio (SWR): The ratio by which the watermark is attenuated relative to the speech signal (12 dB to 28 dB).

4.1. Channel data rate = 80bits/sec

The first test series was carried out with a watermark channel rate of 80 bits/sec, which means that a complete data block is transmitted in one second. The watermark floor and the SWR were varied. The simulations were carried out for message sizes of 12, 24 and 36 bits. A data block consisted of 80 bits, including 18 synchronization bits and 50 / 38 / 26 parity bits for 12 / 24 / 36 bits. The simulations used a speech signal of 1000 sec duration, so 1000 messages were transmitted. The channel was simulated with additive white Gaussian noise at an SNR of 25 dB.

[Plot: brutto BER [%] over SWR [dB]; watermark floors −22 dB to −32 dB]

Figure 4.1.: BER before error correction.

The channel BER clearly depends on both the watermark floor and the SWR. The error rate was below 10 % (see fig. 4.1).

4.1.1. Payload data word = 12 bits

For a message size of 12 bits, the message BER could be reduced to clearly below 1 % for almost all chosen SWR and watermark floor values (see fig. 4.2). When the BER is translated into an MER, the error rate rises (see fig. 4.3), so confidence information is very important. The BCH code provides this confidence information. Using only the confident results, the transmission is error free for all watermark levels and SWRs, with the exception of the watermark floors of −30 and −32 dB at an SWR of 24 dB (see fig. 4.4).

[Plot: netto BER [%] over SWR [dB], k = 12 bits]

Figure 4.2.: BER after error correction, 12 bit message.

[Plot: error of received blocks (MER) [%] over SWR [dB], k = 12 bits; watermark floors −22 dB to −32 dB]

Figure 4.3.: MER over all received blocks, 12 bit message.

[Plots: MER [%] of the confident results and occurrence [%] over SWR [dB], k = 12 bits; watermark floors −22 dB to −32 dB]

Figure 4.4.: MER of the confident results and the percentage of occurrence, 12 bit message.



For 24 bits the error rate after BCH error correction was roughly twice the 12 bit error rate (see fig. 4.5). The confident results still show almost no errors, though at a slightly reduced occurrence probability. That means more message blocks are suppressed because they are probably decoded incorrectly (see fig. 4.4).

[Plot: netto BER [%] over SWR [dB], k = 24 bits; watermark floors −22 dB to −32 dB]

[Plot: error of received blocks (MER) [%] over SWR [dB], k = 24 bits]

[Plots: MER [%] and occurrence [%] over SWR [dB]; watermark floors −22 dB to −32 dB]



For a message size of 36 bits, the error rates rise considerably. The confident results are still at a low error level, but the occurrence probability is now below 80 %. That means fewer than 4 out of 5 received messages will be usable, which is not desirable.

[Plot: netto BER [%] over SWR [dB], k = 36 bits]

[Plot: error of received blocks (MER) [%] over SWR [dB], k = 36 bits]

[Plots: MER [%] and occurrence [%] over SWR [dB]; watermark floors −22 dB to −32 dB]


4.2. Channel data rate = 100 bits/sec

Further tests were carried out with a watermark channel rate of 100 bits/sec. The watermark floor and the SWR were varied. The simulations were carried out for message sizes of 12, 24 and 36 bits. A data block consisted of 80 bits, including 18 synchronization bits and 50 / 38 / 26 parity bits for 12 / 24 / 36 bits. The entire data block is transmitted in 0.8 sec. The simulations used a speech signal of 1000 sec duration, so 1250 messages were transmitted. The channel was simulated with additive white Gaussian noise at an SNR of 20 dB.

[Plot: brutto BER [%] over SWR [dB]; watermark floors −16 dB to −30 dB]

Figure 4.11.: BER before error correction.

The channel BER again clearly depends on both the watermark floor and the SWR. For most watermark levels, the error rate was below 10 % (see fig. 4.11).


For a message size of 12 bits, the netto BER could be reduced to clearly less than 1 % for almost all chosen SWR and watermark floor values (see fig. 4.12). When the MER is evaluated, the error rate rises (see fig. 4.13). Using only the confident results, the transmission is almost error-free for all watermark levels except the watermark floors of −30 and −32 dB (see fig. 4.14). This means that for confident results the error probability is $\le 1 \cdot 10^{-4}$. The occurrence probability of confident results is greater than 80 % for most watermark levels.

[Plot: netto BER [%] over SWR [dB], k = 12 bits]

Figure 4.12.: BER after error correction.

[Plot: error of received blocks (MER) [%] over SWR [dB], k = 12 bits]

Figure 4.13.: MER over all received blocks, 12 bit message.

[Plots: MER [%] and occurrence [%] over SWR [dB]; watermark floors −16 dB to −30 dB]


4.2.2. Payload data word = 24 bits & 36 bits

For longer data words the occurrence of confident results decreases as the watermark floor decreases (fig. 4.15, 4.16). In this case reliable data transmission can be achieved at the cost of higher distortion of the signal. For example, the 36 bit transmission at a watermark floor of −20 dB yields confident results for only 70−80 % of the transmitted data blocks.

[Plots: MER [%] and occurrence [%] over SWR [dB]; watermark floors −16 dB to −26 dB]

[Plots: MER [%] of the confident results and occurrence [%] over SWR [dB], k = 36 bits; watermark floors −16 dB to −26 dB]

Figure 4.16.: MER of the confident results and the percentage of occurrence, 36 bit message.


Chapter 5.

Conclusion

An overview of current methods for audio watermarking has been given. Watermarking is a very young research topic, but has been very active in the past few years. This has resulted in many publications, and some special journal issues and Lecture Notes in Computer Science volumes have been published which give a good overview of the field [1, 2, 3, 25, 20, 24].

Based on the study of the literature and considerations of the application to the air traffic control voice communication channel, a specific approach has been developed. It combines a spread spectrum method with frequency masking, using linear prediction speech analysis to perceptually shape the watermark signal. To reduce the message error probability, error control coding (BCH code) was included in the system.

Extensive performance tests showed that the task is feasible within the given specification for the system. For the transmission of 12 bits in less than 1 second, error rates were very low (≤ 0.1%) at a distortion level which is lower than the channel noise. Therefore, the desired aircraft identification tag could be implemented as a simple add-on to the existing voice communication system without interfering with speech transmission quality.

Outlook

With some more research effort, higher data rates with lower error probability and lower perceptual distortion are clearly within reach, given the rapid development in the field.

Additional research effort can be put into further improvements of the watermarking algorithm.

Careful channel equalization at the receiver will also improve the data rates, since inter-symbol interference can be reduced. This was not done in this research, since no dynamic aeronautical channel model was available.

Since usual watermarking applications cover many security aspects (copyright enforcement), there is also considerable potential for making data hiding algorithms resistant to misuse by a third party.


A. Error Control Coding

On a binary symmetric channel, where the alphabet consists of binary elements (0 and 1), the conditional probabilities $P(y|x)$ of receiving a symbol y given that symbol x was transmitted are symmetric:

$$ P(0|1) = P(1|0) = p $$

and

$$ P(1|1) = P(0|0) = 1 - p, $$

where p is the bit error rate.

Since zero bit error rates are not achievable, one wants to improve the performance of the transmission [26]. Error control coding decreases the error rate at the cost of bandwidth, that is, by adding redundancy to the data (parity bits). The simplest form of error control coding can only indicate whether a sequence was received correctly. In case of an error the receiver can either discard the result or request the information a second time.

For more sophisticated error codes the parity bits are designed in a way that allows both error detection and error correction. Of course not all errors can be corrected: above a certain error rate the code can only detect the error and is no longer able to correct it. This property is used to classify error control codes according to their error correction capabilities. Error control codes can be characterized by their number of code bits (n), data bits (k) and parity bits (n−k).

Linear Block Codes.

Linear block codes are an important class of binary error codes. Messages of k bits form $2^k$ distinct message sequences, called k-tuples, and sequences of n bits form $2^n$ distinct sequences, called n-tuples. The encoding procedure maps the $2^k$ message k-tuples uniquely onto a new set of $2^k$ code-word n-tuples, and since it is a linear block code the mapping is linear. Usually the mapping is achieved by using a look-up table. The vector space $V_n$ consists of all $2^n$ n-tuples. If, and only if, the set of $2^k$ code-word n-tuples is a linear subspace $V_k$ of the vector space $V_n$ of all n-tuples, the code is called a linear block code.

Two main requirements for designing an efficient code are:

1. One goal is, of course, efficiency. The vector space $V_n$ should be packed with as many code words as possible. That is, redundancy should be kept as small as possible.

2. The second goal is that the code words should be as far apart from each other as possible, so that code-word vectors that are slightly corrupted during transmission can still be decoded correctly.

[Diagram: the n-tuples of the vector space drawn as points marked n, with the code words marked k among them]

Figure A.1.: Linear block code structure. The $2^n$ n-tuples represent the entire vector space $V_n$. The $2^k$ n-tuples represent the subspace of codewords.

Since the table look-up mentioned above is not practicable for large code-word sizes, it is of interest to reduce the complexity of generating the required code words. $V_k$ is a k-dimensional subspace of the n-dimensional vector space $V_n$. Therefore it is possible to find a set of n-tuples that can generate all the $2^k$ code words of the subspace. This set is called the basis of the subspace. Any basis set of k linearly independent n-tuples $V_1, V_2, \ldots, V_k$ can be used to generate the required linear block code vectors, since each code vector is a linear combination of $V_1, V_2, \ldots, V_k$. So each code word U of the set of $2^k$ code words {U} can be described by

$$ U = m_1 V_1 + m_2 V_2 + \ldots + m_k V_k $$

where $m_i$ are the message digits and $i = 1, \ldots, k$. A $(k \times n)$ generator matrix is generally defined as

$$ G = \begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_k \end{bmatrix} \tag{A.1} $$

If we let

$$ m = m_1, m_2, \ldots, m_k $$

the generation of the code word can be written in matrix notation as:

$$ U = mG $$


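For systematic linear block codes, the code word is made up of the parity bits and the message bits:

$$ U = p_1, p_2, \ldots, p_{n-k}, m_1, m_2, \ldots, m_k $$

The generator matrix then has the form

$$ G = \begin{bmatrix} P & I_k \end{bmatrix} \tag{A.2} $$

where P is the $(k \times (n-k))$ parity array portion and $I_k$ is a $k \times k$ identity matrix. Now a parity check matrix H can be defined, which enables us to decode the received vector. The matrix H is orthogonal to the generator matrix, so that $GH^T = 0$, where the matrix H is

$$ H = \begin{bmatrix} I_{n-k} & P^T \end{bmatrix} \tag{A.3} $$

A vector U was generated by the matrix G if and only if $UH^T = 0$. Now let $r = r_1, r_2, \ldots, r_n$ be a received vector resulting from the transmission of U and some error $e = e_1, e_2, \ldots, e_n$; then

$$ r = U + e $$

Now we can calculate the syndrome of r, which is defined as

$$ S = rH^T \tag{A.4} $$

$$ \phantom{S} = UH^T + eH^T \tag{A.5} $$

Since $UH^T = 0$,

$$ S = eH^T \tag{A.6} $$

The combination of every possible code word with every possible correctable error pattern is called the standard array ($e_1$ means no error; $U_1$ is the all-zero code word, so the first column contains the error patterns themselves). All vectors in one row of the array share the syndrome of their error pattern, and different correctable error patterns have different syndromes, so the error pattern can be identified from the syndrome and corrected.

$$ \begin{array}{cccccc} U_1 & U_2 & \ldots & U_i & \ldots & U_{2^k} \\ e_2 & U_2 + e_2 & \ldots & U_i + e_2 & \ldots & U_{2^k} + e_2 \\ \vdots & \vdots & & \vdots & & \vdots \\ e_i & U_2 + e_i & \ldots & U_i + e_i & \ldots & U_{2^k} + e_i \\ \vdots & \vdots & & \vdots & & \vdots \\ e_{2^{n-k}} & U_2 + e_{2^{n-k}} & \ldots & U_i + e_{2^{n-k}} & \ldots & U_{2^k} + e_{2^{n-k}} \end{array} \tag{A.7} $$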

The procedure for error correction then is:

• Calculate the syndrome of r ($S = rH^T$).

• Locate the error pattern $e_j$ whose syndrome equals $rH^T$.

• The corrected code word is then $\hat{U} = r + e_j$.
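The three steps above can be tried out on a small code. The sketch below uses a (7,4) Hamming code as an illustrative stand-in (it is not the BCH code used in the watermarking system), with an arbitrarily chosen parity array P and the matrix layout of (A.2) and (A.3). All names are hypothetical.

```python
# Parity array P (k x (n-k)); rows are distinct and have weight >= 2,
# which gives a single-error-correcting (7,4) Hamming code.
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1],
     [1, 0, 1]]
K, R = len(P), len(P[0])          # k = 4 message bits, n - k = 3 parity bits
N = K + R

def encode(m):
    """U = m G with G = [P I_k]: parity bits first, then the message bits."""
    parity = [sum(m[i] * P[i][j] for i in range(K)) % 2 for j in range(R)]
    return parity + list(m)

def syndrome(r):
    """S = r H^T with H = [I_{n-k} P^T], computed over GF(2)."""
    return tuple((r[j] + sum(r[R + i] * P[i][j] for i in range(K))) % 2
                 for j in range(R))

# Syndrome table for the correctable patterns (here: all single-bit errors).
TABLE = {}
for pos in range(N):
    e = [0] * N
    e[pos] = 1
    TABLE[syndrome(e)] = e

def correct(r):
    """Look up the error pattern for the syndrome and add it to r."""
    s = syndrome(r)
    if s == (0,) * R:
        return list(r)            # zero syndrome: accept r as a code word
    e = TABLE[s]
    return [(ri + ei) % 2 for ri, ei in zip(r, e)]

U = encode([1, 0, 1, 1])          # -> [1, 0, 0, 1, 0, 1, 1]
r = list(U)
r[5] ^= 1                         # single bit error on the channel
U_hat = correct(r)
```

For codes with larger n − k the table grows as $2^{n-k}$, which is why algebraic decoders are used for BCH codes in practice.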



Cyclic Codes.

Cyclic codes are a subclass of linear block codes in which a circular shift of any valid code word results in another valid code word. They are implemented using feedback shift registers, and syndrome calculation is done with similar feedback shift registers. The components of a code word U are the coefficients of a polynomial U(X):

$$ U(X) = u_0 + u_1 X + u_2 X^2 + \ldots + u_{n-1} X^{n-1} \tag{A.8} $$

The cyclic property of the code word can be expressed as follows. If U(X) is a code-word polynomial of degree (n − 1), then the code word shifted cyclically by i positions is

$$ U^{(i)}(X) = X^i U(X) \bmod (X^n + 1) \tag{A.9} $$

The code word is calculated using a generator polynomial g(X) of the form

$$ g(X) = g_0 + g_1 X + g_2 X^2 + \ldots + g_p X^p \tag{A.10} $$

and the message polynomial is of the form

$$ m(X) = m_0 + m_1 X + m_2 X^2 + \ldots + m_{k-1} X^{k-1} \tag{A.11} $$

where k = n − p. The code-word polynomial is then

$$ U(X) = m(X)\, g(X) \tag{A.12} $$

At the receiver end of the channel the corrupted version of the code word U(X) is

$$ r(X) = U(X) + e(X) \tag{A.13} $$

The syndrome S(X) is the remainder of the division of r(X) by g(X), so that

$$ r(X) = q(X)\, g(X) + S(X) \tag{A.14} $$

Since the syndrome depends only on the error vector, a table look-up can be used to determine and correct the errors.
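Equations (A.12) to (A.14) amount to polynomial multiplication and division over GF(2). The sketch below illustrates them with the (7,4) cyclic code generated by g(X) = 1 + X + X³, chosen here only as a small example; polynomials are stored as bit lists with the lowest degree first, and the function names are hypothetical.

```python
def poly_mul(a, b):
    """Product of two GF(2) polynomials (bit lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

def poly_mod(r, g):
    """Remainder of r(X) / g(X) over GF(2); g must end with a 1 coefficient."""
    r = list(r)
    dg = len(g) - 1
    for i in range(len(r) - 1, dg - 1, -1):
        if r[i]:                          # cancel the term X^i with X^(i-dg) g(X)
            for j, gj in enumerate(g):
                r[i - dg + j] ^= gj
    return r[:dg]

g = [1, 1, 0, 1]                          # g(X) = 1 + X + X^3, n = 7, p = 3
m = [1, 0, 1, 1]                          # m(X) = 1 + X^2 + X^3, k = 4
U = poly_mul(m, g)                        # code word U(X) = m(X) g(X)
r = list(U)
r[2] ^= 1                                 # channel error e(X) = X^2
S = poly_mod(r, g)                        # syndrome = e(X) mod g(X)
```

A valid code word leaves a zero remainder, so a non-zero syndrome signals an error, and its value depends only on e(X).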

BCH-Code

The BCH code is a cyclic code and allows multiple error correction (Reed-Solomon codes are a closely related class of codes for non-binary alphabets). BCH codes allow a large selection of block lengths, code rates, alphabet sizes and error correction capabilities [26]. If the number of bit errors is beyond the error correction capability, the code can still perform error detection.


B. Levinson-Durbin Algorithm

The Levinson-Durbin algorithm is a recursive method which avoids the inversion of the correlation matrix. It calculates the solution $a^{(p)}$ of a set of equations of order p, based on the known solution $a^{(p-1)}$ of the set of equations $Ra = r$ (equ. 3.14) of order p − 1. Starting with the trivial solution for p = 0, the desired predictor of order n can be found with low computational cost [29].

We start by looking at the transition from p − 1 = 2 to p = 3. For simplification the prediction coefficients are substituted by

$$\alpha_i^{(p)} \doteq -a_i^{(p)}, \quad i \in \{1, 2, \dots, p\}$$

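Using equation (3.13) we can say that

$$\begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix} + \begin{pmatrix} r_0 & r_1 & r_2 \\ r_1 & r_0 & r_1 \\ r_2 & r_1 & r_0 \end{pmatrix} \begin{pmatrix} \alpha_1^{(3)} \\ \alpha_2^{(3)} \\ \alpha_3^{(3)} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{B.1}$$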

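or

$$\begin{pmatrix} r_1 & r_0 & r_1 & r_2 \\ r_2 & r_1 & r_0 & r_1 \\ r_3 & r_2 & r_1 & r_0 \end{pmatrix} \begin{pmatrix} 1 \\ \alpha_1^{(3)} \\ \alpha_2^{(3)} \\ \alpha_3^{(3)} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{B.2}$$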

The short-time energy of the prediction error is

$$E\{d^2(k)\} = r_0 + \sum_{i=1}^{3} \alpha_i^{(3)} r_i := E^{(3)}.$$

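We can extend (B.2) to

$$\begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_1 & r_0 & r_1 & r_2 \\ r_2 & r_1 & r_0 & r_1 \\ r_3 & r_2 & r_1 & r_0 \end{pmatrix} \begin{pmatrix} 1 \\ \alpha_1^{(3)} \\ \alpha_2^{(3)} \\ \alpha_3^{(3)} \end{pmatrix} = \begin{pmatrix} E^{(3)} \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{B.3}$$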

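Due to the symmetry of the correlation matrix we can write:

$$\begin{pmatrix} r_0 & r_1 & r_2 \\ r_1 & r_0 & r_1 \\ r_2 & r_1 & r_0 \end{pmatrix} \begin{pmatrix} 1 \\ \alpha_1^{(2)} \\ \alpha_2^{(2)} \end{pmatrix} \doteq \mathbf{R}^{(3)} \boldsymbol{\alpha}^{(2)} = \begin{pmatrix} E^{(2)} \\ 0 \\ 0 \end{pmatrix} \doteq \mathbf{e}^{(2)} \tag{B.4}$$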



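and

$$\begin{pmatrix} r_0 & r_1 & r_2 \\ r_1 & r_0 & r_1 \\ r_2 & r_1 & r_0 \end{pmatrix} \begin{pmatrix} \alpha_2^{(2)} \\ \alpha_1^{(2)} \\ 1 \end{pmatrix} \doteq \mathbf{R}^{(3)} \tilde{\boldsymbol{\alpha}}^{(2)} = \begin{pmatrix} 0 \\ 0 \\ E^{(2)} \end{pmatrix} \doteq \tilde{\mathbf{e}}^{(2)}, \tag{B.5}$$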

where the tilde denotes the vertically mirrored version (elements in reversed order) of the vectors $\boldsymbol{\alpha}^{(2)}$ and $\mathbf{e}^{(2)}$. Now we want the solution for p = 3, which has the structure of equation (B.3):

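$$\boldsymbol{\alpha}^{(3)} \doteq \begin{pmatrix} 1 \\ \alpha_1^{(3)} \\ \alpha_2^{(3)} \\ \alpha_3^{(3)} \end{pmatrix} = \begin{pmatrix} 1 \\ \alpha_1^{(2)} \\ \alpha_2^{(2)} \\ 0 \end{pmatrix} + k_3 \begin{pmatrix} 0 \\ \alpha_2^{(2)} \\ \alpha_1^{(2)} \\ 1 \end{pmatrix} \tag{B.6}$$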

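The constant $k_3$ can be found by extension of (B.4),

$$\mathbf{R}^{(4)} \boldsymbol{\alpha}^{(3)} = \mathbf{e}^{(3)},$$

and (B.6):

$$\begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_1 & r_0 & r_1 & r_2 \\ r_2 & r_1 & r_0 & r_1 \\ r_3 & r_2 & r_1 & r_0 \end{pmatrix} \cdot \left[ \begin{pmatrix} 1 \\ \alpha_1^{(2)} \\ \alpha_2^{(2)} \\ 0 \end{pmatrix} + k_3 \begin{pmatrix} 0 \\ \alpha_2^{(2)} \\ \alpha_1^{(2)} \\ 1 \end{pmatrix} \right] = \begin{pmatrix} E^{(2)} \\ 0 \\ 0 \\ q \end{pmatrix} + k_3 \begin{pmatrix} q \\ 0 \\ 0 \\ E^{(2)} \end{pmatrix} \overset{!}{=} \begin{pmatrix} E^{(3)} \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{B.7}$$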

with $q = r_3 + r_2 \alpha_1^{(2)} + r_1 \alpha_2^{(2)}$. To determine $k_3$ and $E^{(3)}$ one can find from (B.7) the conditions

$$E^{(2)} + k_3 q = E^{(3)}$$

$$q + k_3 E^{(2)} = 0,$$

which yield $k_3$ and $E^{(3)}$:

$$k_3 = -\frac{q}{E^{(2)}} \tag{B.8}$$

$$E^{(3)} = E^{(2)} (1 - k_3^2). \tag{B.9}$$

With $k_3$, $\boldsymbol{\alpha}^{(3)}$ can be determined. The next step is the solution for p = 4. The parameters $k_p$ are also called reflection coefficients. They are used as coefficients for lattice filters and are robust against quantization and smoothing.
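The step from p = 2 to p = 3 can be verified numerically. The following sketch uses made-up autocorrelation values r_0, ..., r_3 (an assumption for illustration only), solves the order-2 equations directly, applies (B.6), (B.8) and (B.9), and then evaluates the left-hand side of (B.3).

```python
# Hypothetical autocorrelation values r_0..r_3, chosen only for illustration.
r = [1.0, 0.5, 0.2, 0.1]

# Order-2 solution from the 2x2 normal equations (Cramer's rule):
#   r1 + r0*a1 + r1*a2 = 0
#   r2 + r1*a1 + r0*a2 = 0
det = r[0] ** 2 - r[1] ** 2
a1 = (-r[1] * r[0] + r[1] * r[2]) / det
a2 = (-r[0] * r[2] + r[1] ** 2) / det
alpha2 = [1.0, a1, a2]
E2 = r[0] + a1 * r[1] + a2 * r[2]

# Quantities defined below (B.7), then (B.8) and (B.9).
q = r[3] + r[2] * a1 + r[1] * a2
k3 = -q / E2
E3 = E2 * (1.0 - k3 ** 2)

# (B.6): zero-extended order-2 vector plus k3 times its mirrored version.
extended = alpha2 + [0.0]            # (1, a1, a2, 0)
mirrored = [0.0] + alpha2[::-1]      # (0, a2, a1, 1)
alpha3 = [x + k3 * y for x, y in zip(extended, mirrored)]

# Left-hand side of (B.3): R^(4) * alpha^(3), expected to equal (E3, 0, 0, 0).
R4 = [[r[abs(i - j)] for j in range(4)] for i in range(4)]
lhs = [sum(R4[i][j] * alpha3[j] for j in range(4)) for i in range(4)]
```

Up to rounding, `lhs` equals (E3, 0, 0, 0), which is the structure required by (B.3).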

Now we summarize the algorithm. The recursion starts at p = 0, i.e., no prediction, and finds the solution for p = n in n recursions.



[Figure B.1: lattice structure with stages p, p − 1, ..., 1; each stage contains a delay z^{-1} and multipliers k_i and −k_i acting on the signals e_i[n] and b_i[n]; input e[n] = e_p[n], output x[n]; k_0 = −1.]

Figure B.1.: Block diagram of a lattice filter. e[n] is the excitation signal, x[n] the output signal

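1. Calculation of n + 1 values $r_i$ of the short-time autocorrelation

2. p = 0 (no prediction)

$$d(k) = x(k), \qquad E^{(0)} = r_0, \qquad \alpha_0^{(0)} \doteq 1$$

3. for p ≥ 1 (starting with p = 1) calculation of

$$q = \sum_{i=0}^{p-1} \alpha_i^{(p-1)} r_{p-i} \tag{B.10}$$

$$k_p = -\frac{q}{E^{(p-1)}} \tag{B.11}$$

$$\alpha_p^{(p-1)} = 0 \tag{B.12}$$

$$\alpha_i^{(p)} = \alpha_i^{(p-1)} + k_p \alpha_{p-i}^{(p-1)} \quad \forall\, 0 \le i \le p \tag{B.13}$$

$$E^{(p)} = E^{(p-1)} (1 - k_p^2) \tag{B.14}$$

$$p = p + 1 \tag{B.15}$$

4. Repeat step 3, while p ≤ n

5. $a_i = -\alpha_i^{(n)}, \quad 1 \le i \le n$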
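The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the experiments in this report; the function name `levinson_durbin` and the example autocorrelation values are assumptions. The zero-extension of the coefficient vector corresponds to the zero in the last position of the extended vector in (B.6).

```python
def levinson_durbin(r, n):
    """Return (a, k, E) for predictor order n from autocorrelation r[0..n].

    a: prediction coefficients a_1..a_n (step 5, a_i = -alpha_i)
    k: reflection coefficients k_1..k_n
    E: final prediction error energy E^(n)
    """
    alpha = [1.0]          # step 2: alpha_0^(0) = 1
    E = r[0]               # step 2: E^(0) = r_0
    k = []
    for p in range(1, n + 1):                             # steps 3 and 4
        q = sum(alpha[i] * r[p - i] for i in range(p))    # (B.10)
        kp = -q / E                                       # (B.11)
        ext = alpha + [0.0]                               # alpha_p^(p-1) = 0
        alpha = [ext[i] + kp * ext[p - i] for i in range(p + 1)]  # (B.13)
        E *= 1.0 - kp * kp                                # (B.14)
        k.append(kp)
    a = [-x for x in alpha[1:]]                           # step 5
    return a, k, E


# Example: r_i = 0.5**i is the autocorrelation shape of a first-order
# autoregressive process, so an order-2 predictor needs only one nonzero tap.
a, k, E = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this example the recursion yields a_1 = 0.5, a_2 = 0, k_1 = −0.5 and E = 0.75, i.e. the second stage does not reduce the prediction error any further.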

Since the algorithm provides both the prediction coefficients $a_i$ and the reflection coefficients $k_p$, the predictor can be implemented using either the direct form or the lattice structure (Fig. B.1).
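The equivalence of the two structures can be illustrated with a small all-pole synthesis example. The sketch below is written under assumptions: the sign conventions follow (B.13), the stage equations are the common textbook form of a lattice synthesis filter and may differ in detail from Fig. B.1, and all function names and coefficient values are hypothetical.

```python
def step_up(k):
    """Convert reflection coefficients k_1..k_n to alpha_0..alpha_n via (B.13)."""
    alpha = [1.0]
    for p, kp in enumerate(k, start=1):
        ext = alpha + [0.0]
        alpha = [ext[i] + kp * ext[p - i] for i in range(p + 1)]
    return alpha


def direct_form_synthesis(alpha, e):
    """Direct form: x[n] = e[n] - sum_{i>=1} alpha_i * x[n-i]."""
    x = []
    for n, sample in enumerate(e):
        acc = sample
        for i in range(1, len(alpha)):
            if n - i >= 0:
                acc -= alpha[i] * x[n - i]
        x.append(acc)
    return x


def lattice_synthesis(k, e):
    """Lattice form driven by the excitation e[n], using only k_1..k_n."""
    order = len(k)
    b_prev = [0.0] * order           # b_prev[p] holds b_p[n-1], p = 0..order-1
    x = []
    for sample in e:
        f = sample                   # forward signal at the input stage
        b_new = [0.0] * order
        for p in range(order, 0, -1):
            f -= k[p - 1] * b_prev[p - 1]                 # forward signal of stage p-1
            if p < order:
                b_new[p] = b_prev[p - 1] + k[p - 1] * f   # backward signal b_p[n]
        if order:
            b_new[0] = f             # b_0[n] = x[n]
        b_prev = b_new
        x.append(f)
    return x


k = [-0.5, 0.3]                            # hypothetical reflection coefficients
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x_direct = direct_form_synthesis(step_up(k), impulse)
x_lattice = lattice_synthesis(k, impulse)
```

Both structures produce the same impulse response here. The lattice form needs only the reflection coefficients, each of which has magnitude below one for a stable filter, which is one reason for their robustness against quantization.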



Bibliography

[1] Special Section: Information Theoretic Aspects of Digital Watermarking, volume 81. June 2001.

[2] Special issue on Signal Processing for Data Hiding in Digital Media and secure Content Delivery, volume 51. IEEE, April 2003.

[3] D. Aucsmith, editor. Information Hiding: Second International Workshop, IH’98, Portland, Oregon, USA, April 1998. Proceedings. Springer-Verlag Berlin Heidelberg, 1999.

[4] Walter Bender, Daniel Gruhl, Norishige Morimoto, and Anthony Lu. Techniques for data hiding. I.B.M. Systems Journal, 35(3 & 4):313–336, 1996.

[5] B. Chen and G.W. Wornell. Quantization index modulation: a class of provably good methods for digital watermarking and information embedding. IEEE Transactions on Information Theory, 47(4):1423–1443, May 2001.

[6] Brian Chen and Gregory W. Wornell. Quantization index modulation methods for digital watermarking and information embedding of multimedia. The Journal of VLSI Signal Processing, 27(1-2):7–33, February 2001.

[7] Qiang Cheng and Jeffrey Sorensen. Spread spectrum signaling for speech watermarking. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1337–1340, Salt Lake City, UT, USA, May 2001.

[8] N. Cvejic, A. Keskinarkaus, and T. Seppanen. Audio watermarking using m-sequences and temporal masking. In IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pages 227–230, New Paltz, NY, USA, October 2001.

[9] Stefan Derhaschnig. Echtzeitsimulation eines VHF-Flugfunkkanals auf einem digitalen Signalprozessor TMS320C67, 2001.

[10] Ricardo A. Garcia. Digital watermarking of audio signals using a psychoacoustic auditory model and spread spectrum theory. In Preprints of AES 107th Convention, New York, US, September 1999.

[11] K.G. Gopalan, D.S. Benincasa, and S.J. Wenndt. Data embedding in audio signals. IEEE Proceedings of Aerospace Conference, 6:2713–2720, March 2001.



[12] J.D. Gordy and L.T. Bruton. Performance evaluation of digital audio watermarking algorithms. In Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems, volume 1, pages 456–459, Lansing, MI, USA, August 2000.

[13] Daniel Gruhl, Walter Bender, and Anthony Lu. Echo hiding. In R.J. Anderson, editor, 1. International Workshop on Information Hiding, Lecture Notes in Computer Science, volume 1174, pages 295–315, Cambridge, England, May 1996. Springer, Berlin.

[14] Petar Horvatic, Jian Zhao, and Niels J. Thorwirth. Robust audio watermarking: based on secure spread spectrum and auditory perception model, pages 181–190. Kluwer Academic Publishers, Norwell, MA, USA, Beijing, China, August 2000.

[15] Steven M. Kay. Modern Spectral Estimation: Theory and Application. PTR Prentice Hall, Englewood Cliffs, NJ, 1988.

[16] D. Kirovski and H. Malvar. Spread-spectrum audio watermarking: requirements, applications, and limitations. In Proc. IEEE Fourth Workshop on Multimedia Signal Processing, pages 219–224, Cannes, France, October 2001.

[17] Litao Gang, A.N. Akansu, and M. Ramkumar. Security and synchronization in watermark sequence. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, pages 3736–3739, Orlando, Florida, May 2002.

[18] John Makhoul. Linear prediction: A tutorial review. Proceedings of the IEEE, 63:561–580, April 1975.

[19] H.S. Malvar and D.A.F. Florencio. Improved spread spectrum: a new modulation technique for robust watermarking. IEEE Trans. Signal Processing, 51(4):898–905, April 2003.

[20] I.S. Moskowitz, editor. Information Hiding: 4th International Workshop, IHW 2001, Pittsburgh, PA, USA, April 25-27, 2001. Proceedings. Springer-Verlag Berlin Heidelberg, 2001.

[21] Chr. Neubauer, J. Herre, and K. Brandenburg. Continuous Steganographic Data Transmission Using Uncompressed Audio, volume 1525 of Lecture Notes in Computer Science, pages 208–217. Springer, Berlin, 1998.

[22] Ted Painter and Andreas Spanias. Perceptual coding of digital audio. Proceedings of the IEEE, 88(4):451–515, April 2000.

[23] F. Perez-Gonzalez, F. Balado, and J.R. Hernandez Martin. Performance analysis of existing and new methods for data hiding with known-host information in additive channels. IEEE Trans. Signal Processing, 51(4):960–980, April 2003.



[24] F.A.P. Petitcolas, editor. Information Hiding: 5th International Workshop, IH 2002, Noordwijkerhout, The Netherlands, October 7-9, 2002. Revised Papers. Springer-Verlag Berlin Heidelberg, 2003.

[25] A. Pfitzmann, editor. Information Hiding: Third International Workshop, IH’99, Dresden, Germany, September 29 - October 1, 1999. Proceedings. Springer-Verlag Berlin Heidelberg, 2000.

[26] Bernard Sklar. Digital Communications. Prentice Hall PTR, Upper Saddle River, New Jersey, 2001.

[27] Mitchell D. Swanson, Bin Zhu, and Ahmed H. Tewfik. Current state of the art, challenges and future directions for audio watermarking. In IEEE International Conference on Multimedia Computing and Systems, volume 1, pages 19–24, Florence, Italy, July 1999.

[28] Mitchell D. Swanson, Bin Zhu, Ahmed H. Tewfik, and Laurence Boney. Robust audio watermarking using perceptual masking. Signal Processing, 66:337–355, 1998.

[29] Peter Vary, Ulrich Heute, and Wolfgang Hess. Digitale Sprachsignalverarbeitung. B.G. Teubner Stuttgart, 1998.

[30] Y. Yardimci, A. E. Cetin, and R. Ansari. Data hiding in speech using phase coding. In Proceedings of Eurospeech Conference, Rhodes, Greece, September 1997.
