psychoacoustic approaches to audio steganography report

ECES 434 REPORT

Psychoacoustic Approaches to Audio Steganography

Cody A. Ray

Drexel University

Fall 2009

Introduction

Steganography is the art and science of writing hidden messages in such a way that no one apart from the sender and intended recipient suspects the existence of the message; it is a form of security through obscurity. The word steganography is of Greek origin and means "concealed writing". Apart from the obvious applications of transporting hidden information between entities, the methods of steganography are also used in copyright protection, the detection of content manipulation, fingerprinting, and watermarking.

The objective of this project was to explore methods of audio steganography with emphasis on psychoacoustic approaches. Specifically, the project required hiding a text-based message inside an audio signal with minimal or no distortion of the signal as perceived by the human ear. In all approaches, we assume that the length of the message to be hidden is much smaller than the number of samples in the original sound track. We did not consider the resilience of the embedded message to attacks or to otherwise “friendly” transformations of the host signal.

Approaches

We will compare and contrast three different approaches to audio steganography.

1. The first and simplest method we implemented is known as the Least-Significant Bit (LSB) method. In the LSB method, the least-significant bit of each sampling point of the original signal is substituted with a bit of the binary message.

2. The second method we demonstrated was an amplitude modulation (AM) algorithm in the time domain. We slice the time signal into “blocks” and scale each block according to the bits of the message.

3. The last method we explored uses the MPEG-1 Layer I psychoacoustic model (Psychoacoustic Model 1) to compute signal-to-mask ratios (SMR), which identify the perceptually unnecessary bits. We then replace those bits with bits of the message.

Least-Significant Bit Method

The method of least-significant bit (LSB) coding is the simplest technique for embedding information in a digital audio file. The least-significant bit of each sample in the signal is substituted with a bit from the secret message. One bit is embedded per sample; thus, the LSB method allows a large amount of data to be encoded.
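As a rough illustration of the substitution described above, the following Python sketch embeds and recovers message bits in the least-significant bits of 16-bit PCM samples. It is not the implementation used in this project; the function names, the `start` offset, and the assumption of `int16` samples held in a NumPy array are all illustrative choices.

```python
import numpy as np

def text_to_bits(text: str) -> np.ndarray:
    """Return the UTF-8 bytes of `text` as an array of 0/1 values, MSB first."""
    data = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
    return np.unpackbits(data)

def bits_to_text(bits: np.ndarray) -> str:
    """Inverse of text_to_bits (the bit count must be a multiple of 8)."""
    return np.packbits(bits.astype(np.uint8)).tobytes().decode("utf-8")

def lsb_embed(samples: np.ndarray, bits: np.ndarray, start: int = 0) -> np.ndarray:
    """Write one message bit into the LSB of each sample from `start` onward.

    Assumes `samples` is an int16 array (hypothetical host format).
    """
    if start + len(bits) > len(samples):
        raise ValueError("message does not fit in the host signal")
    out = samples.copy()
    region = out[start:start + len(bits)]
    # Clear the least-significant bit, then set it to the message bit.
    out[start:start + len(bits)] = (region & ~np.int16(1)) | bits.astype(np.int16)
    return out

def lsb_extract(samples: np.ndarray, n_bits: int, start: int = 0) -> np.ndarray:
    """Read back the LSBs of `n_bits` consecutive samples from `start`."""
    return (samples[start:start + n_bits] & 1).astype(np.uint8)
```

For example, `bits_to_text(lsb_extract(lsb_embed(host, text_to_bits("Hi"), start=3), 16, start=3))` returns `"Hi"` for any `int16` array `host` with at least 19 samples, and no sample changes by more than one quantization level.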

To recover the message hidden inside an LSB-encoded audio track, the receiver needs to know the sequence of indices corresponding to each embedded sample. There are a number of methods for choosing the subset of samples in which to embed bits of the message; whatever the method, the receiver must also know the algorithm used for selecting the samples. One trivial method starts at a constant distance from the beginning of the audio track and performs LSB coding until the message has been completely embedded within the signal, without changing any of the remaining samples. However, this approach creates an easy-to-detect statistical anomaly, because the probability that a sample has been modified is non-uniform across the sample set: all changes are concentrated in one contiguous region.

One way to avoid this issue is to pad the message with random bits so that the message length equals the number of samples. However, we’re now embedding far more information than required to convey the given message. By modifying more of the file than necessary, we’re increasing the amount of noise in the signal, which in turn increases the probability of detection of the hidden message.

A more sophisticated approach involves the use of a random number generator to spread the secret message out over the audio track in a random manner. One popular approach uses a shared secret as a seed for the random number generator, allowing the sender and receiver to independently construct the same pseudorandom sequence of sample indices. One drawback is the necessity to avoid collisions created by using the same sample index twice; a bookkeeping system can be used to track previous indices. Alternatively, a pseudorandom permutation of the entire set can be constructed through the use of a secure hash function.
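The seeded-permutation idea can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy's general-purpose generator rather than a secure hash function, so it is not cryptographically secure, and the function names and the integer `seed` standing in for the shared secret are hypothetical. Taking the first few entries of a permutation yields distinct indices, so no separate bookkeeping for collisions is needed.

```python
import numpy as np

def embedding_indices(seed: int, n_samples: int, n_bits: int) -> np.ndarray:
    """Return `n_bits` distinct sample indices derived from a shared seed."""
    if n_bits > n_samples:
        raise ValueError("message does not fit in the host signal")
    rng = np.random.default_rng(seed)  # NOT a cryptographic generator
    # A prefix of a permutation never repeats an index.
    return rng.permutation(n_samples)[:n_bits]

def scatter_embed(samples: np.ndarray, bits: np.ndarray, seed: int) -> np.ndarray:
    """Embed bits in the LSBs of pseudorandomly chosen int16 samples."""
    idx = embedding_indices(seed, len(samples), len(bits))
    out = samples.copy()
    out[idx] = (out[idx] & ~np.int16(1)) | bits.astype(np.int16)
    return out

def scatter_extract(samples: np.ndarray, n_bits: int, seed: int) -> np.ndarray:
    """Rebuild the same index sequence from the seed and read the LSBs."""
    idx = embedding_indices(seed, len(samples), n_bits)
    return (samples[idx] & 1).astype(np.uint8)
```

Both sides must use the same seed, the same host length, and the same generator implementation for the index sequences to agree.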

None of the above variants requires the original audio track to recover the message.

Since we did not consider resilience to attacks in this study, we implemented the trivial method outlined above. As a matter of practical concern, we also prefixed the message with an identifier string to mark the file as containing a secret message, and included the size of the secret message to guide the receiver as to where to stop decoding the signal.
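One possible layout for such a prefix is sketched below. The report does not specify the identifier string or how the size was stored, so the `MAGIC` value, the 4-byte big-endian length field, and the function names are assumptions made purely for illustration.

```python
import struct
from typing import Optional

MAGIC = b"STEG"  # hypothetical identifier string, not the one used in the project

def frame_message(text: str) -> bytes:
    """Prefix the payload with the identifier and its length in bytes."""
    payload = text.encode("utf-8")
    # ">I" is a 4-byte unsigned big-endian integer (an assumed field width).
    return MAGIC + struct.pack(">I", len(payload)) + payload

def parse_frame(data: bytes) -> Optional[str]:
    """Return the hidden text, or None if no valid frame is present."""
    header_len = len(MAGIC) + 4
    if len(data) < header_len or data[:len(MAGIC)] != MAGIC:
        return None  # identifier missing: treat the file as carrying no message
    (size,) = struct.unpack(">I", data[len(MAGIC):header_len])
    if len(data) < header_len + size:
        return None  # truncated payload
    # The length field tells the receiver where to stop decoding.
    return data[header_len:header_len + size].decode("utf-8")
```

The receiver would first decode enough samples to read the header, check the identifier, and then decode only as many further samples as the length field requires.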

Time-domain Amplitude Modulation Method

Time-domain amplitude modulation (TDAM) capitalizes on the difficulty of perceiving subtle changes in loudness. The signal is sliced in the time domain, and the message is encoded as a scale factor applied to each time slice. One bit is encoded per block, where the block size is the ratio of the number of samples to the number of message bits. Consequently, this technique can carry only a smaller message than LSB coding.
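The block-scaling step might look like the sketch below. The scale factors 0.98 and 0.99 are assumptions inferred from the 1-2% amplitude reduction mentioned in the discussion, and the mapping of bit 0 to the lower factor, the handling of leftover samples, and the function name are likewise illustrative rather than taken from the project code.

```python
import numpy as np

def tdam_embed(samples, bits, low: float = 0.98, high: float = 0.99) -> np.ndarray:
    """Scale each block by `low` (bit 0) or `high` (bit 1).

    `low` and `high` are assumed values; the host is treated as floating point.
    """
    samples = np.asarray(samples, dtype=np.float64)
    block = len(samples) // len(bits)  # samples per message bit
    if block == 0:
        raise ValueError("message has more bits than the host has samples")
    # Start from the lower factor everywhere so that leftover samples after
    # the last block are also scaled (an assumption, to keep the change uniform).
    out = samples * low
    for i, bit in enumerate(bits):
        if bit:
            sl = slice(i * block, (i + 1) * block)
            out[sl] = samples[sl] * high
    return out
```

With a 1000-sample host and a 5-bit message, each bit controls a block of 200 samples.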

To recover the message hidden inside a TDAM-encoded audio track, the receiver needs access to the original audio file and must know the scale factors used in coding the message. Extraction is done by scaling the original file by the lowest scale factor and checking, for each block, whether the “dirty” signal is greater than the scaled original; a block that exceeds the scaled original was encoded with the higher scale factor.
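A sketch of this comparison is given below, with two deliberate deviations that should be read as assumptions. First, it compares summed magnitudes per block rather than raw sample values, because scaling a negative sample by a larger factor makes it smaller, not greater. Second, it compares against the midpoint of the two scale factors instead of the lowest one, which tolerates small rounding errors. The factors 0.98 and 0.99, the `n_bits` argument, and the function name are hypothetical.

```python
import numpy as np

def tdam_extract(stego, original, n_bits: int,
                 low: float = 0.98, high: float = 0.99) -> np.ndarray:
    """Recover one bit per block by comparing stego and original magnitudes."""
    stego = np.asarray(stego, dtype=np.float64)
    original = np.asarray(original, dtype=np.float64)
    block = len(original) // n_bits
    # Midpoint threshold (a swapped-in choice; the text compares against `low`).
    threshold = 0.5 * (low + high)
    bits = np.zeros(n_bits, dtype=np.uint8)
    for i in range(n_bits):
        sl = slice(i * block, (i + 1) * block)
        stego_mag = np.sum(np.abs(stego[sl]))
        reference = threshold * np.sum(np.abs(original[sl]))
        bits[i] = 1 if stego_mag > reference else 0
    return bits
```

A block of pure silence carries no recoverable bit under this scheme, since both sides of the comparison are zero.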

Many of the issues addressed in the previous section on LSB coding apply to TDAM as well. These issues will not be covered again here. Note, however, that this method doesn’t require any additional data to be embedded, and the signal is modified uniformly.

Amplitude Modulation via Psychoacoustic Models

The most sophisticated approach is amplitude modulation in the frequency domain based upon the MPEG-1 Layer I psychoacoustic model (Psychoacoustic Model 1). The basic algorithm is as follows:

1. Calculate the power spectrum.

2. Identify the tonal and non-tonal components.

3. Decimate the maskers to eliminate all irrelevant maskers.

4. Compute the individual masking thresholds.

5. Compute the global masking threshold.

6. Determine the minimum masking threshold in each subband.

7. Shape the power of the message below the masking threshold.
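The first two steps can be sketched roughly as follows. This is a simplified illustration loosely modelled on the local-maximum test used for tonal components, not the MPEG reference procedure: it omits the level normalization, uses a fixed neighbourhood of two bins at every frequency, and the 7 dB prominence, the 512-sample frame in the usage note, and the function names are assumptions.

```python
import numpy as np

def power_spectrum_db(frame) -> np.ndarray:
    """Step 1: Hann-windowed power spectrum of one frame, in (unnormalized) dB."""
    frame = np.asarray(frame, dtype=np.float64)
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    power = np.abs(spectrum) ** 2
    return 10.0 * np.log10(power + 1e-12)  # small floor avoids log(0)

def tonal_candidates(psd_db, prominence_db: float = 7.0, reach: int = 2) -> list:
    """Step 2 (simplified): bins that are local maxima and stand out
    by at least `prominence_db` from the bins `reach` positions away."""
    peaks = []
    for k in range(reach, len(psd_db) - reach):
        is_local_max = psd_db[k] > psd_db[k - 1] and psd_db[k] >= psd_db[k + 1]
        stands_out = all(psd_db[k] - psd_db[k + j] >= prominence_db
                         for j in (-reach, reach))
        if is_local_max and stands_out:
            peaks.append(k)
    return peaks
```

For a 512-sample frame containing a pure sinusoid centred on bin 40, bin 40 is reported as a tonal candidate. The remaining steps (decimation, individual and global masking thresholds) build on these component lists and are not sketched here.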

The psychoacoustic model identifies components of the signal that do not affect perception. The masking threshold defines the frequency response of the loudness threshold minimum filter, which is used to shape the message. The filtered message is scaled to shift the message noise below the masking threshold and added to the delayed original signal in order to produce the “dirty” track.
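Since this part of the system was not completed, the following is only a sketch of what the shaping step could look like under simplifying assumptions. It takes a per-bin masking threshold as given (in the same unnormalized dB units as the frame's FFT power), attenuates any message bin that exceeds the threshold minus a safety margin, and leaves quieter bins alone. This per-bin clamp stands in for the filter described above; the 3 dB margin and the function name are hypothetical.

```python
import numpy as np

def shape_below_mask(message_frame, mask_db, margin_db: float = 3.0) -> np.ndarray:
    """Attenuate message bins so none exceeds `mask_db - margin_db`.

    `mask_db` must hold one value per rfft bin (len(frame) // 2 + 1 values)
    and is assumed to come from a psychoacoustic model computed elsewhere.
    """
    message_frame = np.asarray(message_frame, dtype=np.float64)
    mask_db = np.asarray(mask_db, dtype=np.float64)
    spectrum = np.fft.rfft(message_frame)
    if mask_db.shape != spectrum.shape:
        raise ValueError("mask must have one entry per rfft bin")
    power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
    target_db = mask_db - margin_db
    # Gain is never positive: bins already below the target are left as-is.
    gain_db = np.minimum(0.0, target_db - power_db)
    shaped = spectrum * 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(shaped, n=len(message_frame))
```

The shaped frame would then be added to the correspondingly delayed host frame, for example `dirty = host_frame + shape_below_mask(message_frame, mask_db)`.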


Results

[Figure: LSB Coding for Mono Wav]

[Figure: LSB Coding for Stereo Wav]

[Figure: Time Domain Amplitude Modulation for Mono Wav]

[Figure: Time Domain Amplitude Modulation for Stereo Wav]

Discussion and Conclusion

We tested on both mono and stereo wave files; the results are depicted in the Results section above. Note that the magnitude axis for the mono files is always twice as large as that for the stereo files. This is an artifact of the preprocessing stage, in which we converted the original stereo WAV file to a mono WAV file. Also note that our time-domain amplitude modulation approach currently outputs a mono WAVE file regardless of the number of channels in the input.

In LSB coding, modifying the least-significant bit directly modifies the “bin” into which the quantized sample falls. Since we change the quantization level by at most one, we change the time-domain signal by at most one quantization step, a small value that depends on the number of bits used for quantization. Effectively, we’re increasing the quantization noise and hiding the message in this noise. This incurs a small penalty, since the added noise is in principle an audible modification.
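To put a number on "a small value", the short sketch below expresses one quantization step relative to full scale in decibels for signed PCM. The function name is illustrative, and the calculation assumes signed samples whose full-scale magnitude is 2 raised to (bits - 1).

```python
import math

def lsb_level_dbfs(bits_per_sample: int) -> float:
    """Level of a one-step change relative to full scale, in dB."""
    full_scale = 2 ** (bits_per_sample - 1)  # e.g. 32768 for 16-bit audio
    return 20.0 * math.log10(1.0 / full_scale)
```

For 16-bit audio this gives about -90.3 dB, while for 8-bit audio it is only about -42.1 dB, so the same LSB change is far more likely to be audible at low bit depths.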

In TDAM coding, we’re decreasing the amplitude of the time-domain signal by 1-2%. This will primarily affect the perceived loudness of the sound. Additionally, this coding system could slightly affect the perception of pitch, since perceived pitch depends to some degree on intensity. However, because the scale factor is close to one, the magnitude of the “dirty” audio signal differs negligibly from that of the original signal.
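The size of this change can be made concrete by converting the scale factors to decibels. The helper below is a minimal sketch with an illustrative name; the factors 0.98 and 0.99 are assumed from the 1-2% figure above.

```python
import math

def scale_to_db(scale: float) -> float:
    """Level change in dB caused by multiplying the amplitude by `scale`."""
    return 20.0 * math.log10(scale)
```

A 2% reduction (`scale_to_db(0.98)`) is roughly -0.18 dB and a 1% reduction (`scale_to_db(0.99)`) is roughly -0.09 dB, so adjacent blocks carrying different bits differ in level by less than 0.1 dB.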

Unfortunately, time did not allow us to complete the implementation of the MPEG-based steganography system before final reporting. However, the hypothesis that each approach is successively better than the previous one held for the two methods we completed, which suggests that, once completed, this technique will be superior to the others.

Bibliography

Arnold, Michael. “Audio Watermarking.” Published: November 1, 2001. Access: December 2, 2009. http://www.ddj.com/security/184404839

Cvejic, Nedeljko. “Algorithms for Audio Watermarking and Steganography.” University of Oulu. 2004.

Garcia, R.A. “Digital Watermarking of Audio Signals using Psychoacoustic Auditory Model and Spread Spectrum Theory.” Preprints-Audio Engineering Society. Citeseer. 1999.

Petitcolas, Fabien. “MPEG for MATLAB.” Published: August 11, 2003. Access: December 2, 2009. http://www.petitcolas.net/fabien/software/mpeg/

Welsh, Eric. Chen, Alex. Shehad, Nader. Virani, Aamir. “W.A.V.S Compression.” http://is.rice.edu/~welsh/elec431/

Wikipedia. “Steganography.” Access: November 17, 2009. http://en.wikipedia.org/wiki/Steganography

Wilson, Scott. “Microsoft WAVE soundfile format.” Published: January 20, 2003. Access: November 19, 2009. https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
