Elementary Acoustics, after http://www.ling.mq.edu.au/units/sph301/main/schedule.html (no longer online)

Upload: posy-morrison

Post on 30-Dec-2015


What is Sound?

Sound is a wave-like distortion of a physical medium.

There are two classes of wave that can distort a physical medium: transverse waves and longitudinal waves.

In transverse waves, the elements of the medium move orthogonally (at 90°) to the direction in which the wave travels.

A typical example of a transverse wave is a wave pattern on the surface of a body of water.

In such a wave the molecules of water move up and down whilst the wave front moves along the surface of the water.

An example of a transverse wave: a wave induced in a piece of string.

Longitudinal waves

In longitudinal waves the elements of the medium move back and forth in line with the direction of propagation of the wave fronts.

In a spring, a hand can induce a longitudinal wave by periodically moving back and forth in line with the direction of the spring.

This causes the regions of high and low spring compression to move along the spring.

This movement propagates through the spring, producing a series of wavefronts which move towards the fixed wall with a velocity v.

Individual parts of the spring only move backwards and forwards short distances in the direction of wave propagation.

This causes the coils to periodically come closer to and further from adjacent coils than would be the case for the spring at rest.

A longitudinal wave is a compression wave in which particles move back and forth in the direction of wavefront movement.

An example of a longitudinal wave: a wave induced in a spring.

Sound is a longitudinal compression wave.

Sound is a longitudinal compression wave which distorts a medium by creating moving fronts of high and low particle compression.

Sound can occur in any medium (solid, liquid and gas). Sound cannot occur in a vacuum as there is no medium to compress.

Individual particles only move short distances backward and forward in the direction of wave propagation whilst the compression wave front can move considerable distances.

Sound in air consists of consecutive regions of higher and lower air pressure relative to ambient air pressure (typically 1 atmosphere at sea level). These fluctuations in air pressure are extremely small relative to ambient air pressure.

Acoustic Units of Measurement

The wavelength (λ) of a wave is the distance between successive wave fronts (i.e. the peak-to-peak distance). Wavelength is measured in metres (m).

The frequency (f) of a wave is the number of complete wave cycles that pass an observer each second. Frequency is measured in hertz (Hz), or s⁻¹ in base units.

The period (T) of a wave is the time it takes for one wave cycle to pass an observer. The period is measured in seconds (s) or milliseconds (ms).

The speed or velocity of sound (c) is the distance in metres that a wave front travels in one second. The speed of sound is measured in metres per second (m·s⁻¹).
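The four quantities above are linked by two standard relations that the slide does not spell out: T = 1/f and c = f·λ. A minimal sketch, assuming a speed of sound in air of about 343 m/s (roughly 20 °C; this value is an assumption, not taken from the slides):

```python
# Relations between frequency, period and wavelength.
# SPEED_OF_SOUND_AIR is an assumed value (air at about 20 °C), not from the slides.
SPEED_OF_SOUND_AIR = 343.0  # m/s


def period(frequency_hz: float) -> float:
    """Period T in seconds: T = 1 / f."""
    return 1.0 / frequency_hz


def wavelength(frequency_hz: float, speed: float = SPEED_OF_SOUND_AIR) -> float:
    """Wavelength in metres: lambda = c / f."""
    return speed / frequency_hz


for f in (100.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz  T = {period(f) * 1000:.2f} ms  wavelength = {wavelength(f):.3f} m")
```

Under these assumptions a 100 Hz tone has a period of 10 ms and a wavelength of several metres, whereas a 10,000 Hz tone has a wavelength of only a few centimetres.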

Sound "Amplitude"

The human ear and the microphone (the main artificial transducer of sound) both measure the tiny changes in pressure that result from the passage of a longitudinal wave through a medium.

The average air pressure at sea level is approximately equivalent to the pressure exerted by a column of mercury 76 cm high (in a barometer) at 0°C under standard gravity. This is equivalent to 1 atm.

1 atm ≈ 1.013 × 10⁵ Pa

The sound pressure that is only just perceivable (i.e. the threshold of hearing for a 1000 Hz tone) is taken to be

2 × 10⁻⁵ Pa (i.e. 20 µPa)

The threshold of pain (ie. the maximum sound pressure that can be perceived without pain) is about 100 Pa or about 1/1000 atm, which is 5,000,000 times the threshold sound pressure.

The intensity of a sound with a sound pressure of 20 µPa is very close to 10⁻¹² W·m⁻².

The sensitivity of the ear to changes in intensity is not related linearly to either intensity or pressure.

The ear's sensitivity to sound intensity or sound pressure is approximately logarithmic, so relative levels are measured in decibels (dB):

dB = 10 × log₁₀(I₁/I₂)
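The decibel formula can be tried out on the reference values quoted earlier. This is a minimal sketch: the intensity form is the one on the slide, while the pressure form (factor 20, because intensity is proportional to pressure squared) is the standard equivalent and is not stated on the slide. The function names are illustrative only.

```python
import math

P_REF = 2e-5   # Pa, threshold of hearing at 1000 Hz (value from the slides)
I_REF = 1e-12  # W/m^2, intensity at that threshold (value from the slides)


def db_from_intensity(i1: float, i2: float = I_REF) -> float:
    """Level difference in dB between two intensities: 10 * log10(I1 / I2)."""
    return 10.0 * math.log10(i1 / i2)


def db_from_pressure(p1: float, p2: float = P_REF) -> float:
    """Same level difference from sound pressures.

    Intensity is proportional to pressure squared, hence the factor 20.
    This form is an addition; the slide only gives the intensity form.
    """
    return 20.0 * math.log10(p1 / p2)


print(f"threshold of pain (100 Pa): {db_from_pressure(100.0):.1f} dB")
print(f"doubling the intensity:     {db_from_intensity(2.0, 1.0):.2f} dB")
```

With the 100 Pa pain threshold given above, the pressure ratio of 5,000,000 corresponds to about 134 dB relative to the threshold of hearing.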

The acoustic intensity, or average rate at which energy is transferred through a unit area (on the surface of the spherical wave front radiating out from the source in all directions), diminishes with distance in accordance with the inverse square law:

I ∝ 1/r²

where: I = the intensity of a sound, r = the distance from the source of the sound
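The inverse square law can be illustrated numerically. The sketch below assumes an idealised point source radiating equally in all directions with no reflections or absorption, so that the source power is spread over a sphere of area 4πr². The source power value is hypothetical.

```python
import math


def intensity_at(distance_m: float, source_power_w: float) -> float:
    """Intensity (W/m^2) on a spherical wave front of radius distance_m."""
    return source_power_w / (4.0 * math.pi * distance_m ** 2)


def level_change_db(r1_m: float, r2_m: float) -> float:
    """Change in level (dB) when the listener moves from distance r1 to r2."""
    return 10.0 * math.log10((r1_m / r2_m) ** 2)


POWER_W = 0.01  # hypothetical source power, for illustration only
for r in (1.0, 2.0, 4.0):
    print(f"r = {r:.0f} m  I = {intensity_at(r, POWER_W):.6f} W/m^2")
print(f"doubling the distance: {level_change_db(1.0, 2.0):.2f} dB")
```

Each doubling of the distance divides the intensity by four, which is a drop of about 6 dB.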

A two dimensional simulation of the inverse square law.

Simple Harmonic Motion

A single cycle of a sine wave can be depicted as if it were a point on a circle moving anti-clockwise (they are mathematically equivalent).

At its starting point (when the sine wave is moving up from the baseline) the point is at zero degrees (or zero radians: the 3 o'clock position on the circle).

At the top of the sine wave's first peak it is equivalent to being at the 90° (or π/2 radians: 12 o'clock) position in the circle.

When the sine wave reaches the baseline on its way down it is equivalent to the 180° (π radians: 9 o'clock) position.

When the sine wave reaches the bottom of the first dip it is at 270° (3 π /2 radians: 6 o'clock).

When it completes its first cycle it is back at the starting point: 360° ≡ 0° (2π ≡ 0 radians).
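The correspondence between the circle and the sine wave can be checked directly: the height of the point on a unit circle is the sine of its angle. A minimal sketch (the function name is illustrative):

```python
import math


def circle_point(angle_deg: float) -> tuple:
    """(x, y) of a point on the unit circle, measured anti-clockwise from 3 o'clock."""
    angle_rad = math.radians(angle_deg)
    return math.cos(angle_rad), math.sin(angle_rad)


# The y coordinate traces out one cycle of a sine wave as the point goes round.
for degrees in (0, 90, 180, 270, 360):
    x, y = circle_point(degrees)
    print(f"{degrees:3d} degrees  sine value = {y:+.3f}")
```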

Continuous Waveforms and Damping

A sine wave is a waveform generated by a system that is characterised by simple harmonic motion.

An ideal sine wave which exhibits simple harmonic motion loses no energy (or has its energy replenished from outside the system).

A sound wave exhibiting these characteristics would be a pure tone.

A continuous waveform - a pure tone

Damping

The loss of energy in an oscillating system is known as damping.

A damped waveform is non-continuous.

A non-continuous or damped waveform.

Damping is a characteristic of systems that produce sounds with very complex spectral patterns.
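The difference between a continuous and a damped waveform can be sketched with a sine wave whose amplitude decays over time. The exponential decay used here is a common textbook model and an assumption of this sketch; the slides do not specify the form of the decay.

```python
import math


def damped_sine(t_s: float, frequency_hz: float, decay_per_s: float, amplitude: float = 1.0) -> float:
    """Sine wave multiplied by an exponentially decaying envelope.

    decay_per_s = 0 gives an undamped (continuous) pure tone.
    The exponential envelope is an assumed model, not taken from the slides.
    """
    envelope = amplitude * math.exp(-decay_per_s * t_s)
    return envelope * math.sin(2.0 * math.pi * frequency_hz * t_s)


# Compare the height of successive peaks of a 100 Hz tone (peaks every 10 ms).
for cycle in range(4):
    t = 0.0025 + cycle * 0.01
    print(f"peak {cycle + 1}: undamped = {damped_sine(t, 100.0, 0.0):.3f}  "
          f"damped = {damped_sine(t, 100.0, 200.0):.3f}")
```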

Waveforms and Phase

Adding together two pure tones of 100 Hz and 500 Hz (and of different amplitudes).

The vast majority of natural sounds are not pure tones but are complex sounds that can be thought of as the combination of two or more pure tones.

In the bottom part of the diagram we can see the two pure tones as dashed lines. A simple addition of the dashed lines results in the unbroken line.

The unbroken line clearly has a more complex pattern than either of the two pure tones.

The diagram shows the effect of adding two pure tones, one of 100 Hz and the other of 500 Hz. The 500 Hz tone has half the sound pressure amplitude of the 100 Hz tone.

The complex pattern repeats with the same period as the 100 Hz tone.

100 Hz is the highest common integer factor of the frequencies of the two tones.

The frequency of a complex wave is always equal to the highest common factor of the frequencies of the sine waves that are added together, and its period is the reciprocal of that frequency.

The repetition frequency of the complex pattern can be called its fundamental frequency (F0 or F₀).
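The highest-common-factor rule can be verified numerically. This is a minimal sketch, assuming component frequencies that are whole numbers of hertz; the helper names are illustrative.

```python
import math
from functools import reduce


def fundamental_frequency(frequencies_hz):
    """Highest common factor of a list of integer frequencies in Hz."""
    return reduce(math.gcd, frequencies_hz)


def complex_wave(t_s, components):
    """Sum of sine waves; components is a list of (frequency_hz, amplitude) pairs."""
    return sum(a * math.sin(2.0 * math.pi * f * t_s) for f, a in components)


components = [(100, 1.0), (500, 0.5)]  # the two tones from the diagram
f0 = fundamental_frequency([f for f, _ in components])
period_s = 1.0 / f0

# The complex wave has the same value one fundamental period later.
t = 0.0013
print(f0, complex_wave(t, components), complex_wave(t + period_s, components))
```

The same rule gives 100 Hz for tones of 200 Hz and 300 Hz, even though no 100 Hz component is present in that case.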

Adding together three pure tones of 100 Hz, 200 Hz and 300 Hz.

The three pure tones at 100, 200 and 300 Hz are of different amplitudes. They all start from the 0° position.

The highest common factor of 100, 200 and 300 is 100 and so the resultant complex wave has a fundamental frequency of 100 Hz.

In sounds with a continuous musical tone the human ear is insensitive to phase differences. What the ear picks up is the frequency and amplitude characteristics.

In tones with non-zero phase relationships the difference in phase results in a totally different complex wave shape.
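The effect of phase can be sketched by building the same three tones twice, once with all components starting at 0° and once with shifted phases. The amplitudes, phase shifts and sample rate below are assumed for illustration. The waveforms differ, but the amplitude measured at each component frequency is unchanged.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed for this illustration
N = 80              # one 10 ms period of a 100 Hz fundamental


def synthesise(components):
    """components: (frequency_hz, amplitude, phase_deg) triples."""
    return [
        sum(a * math.sin(2.0 * math.pi * f * n / SAMPLE_RATE + math.radians(phase))
            for f, a, phase in components)
        for n in range(N)
    ]


def amplitude_at(signal, frequency_hz):
    """Amplitude of one sine component, computed from a single DFT bin."""
    re = sum(x * math.cos(2.0 * math.pi * frequency_hz * n / SAMPLE_RATE)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * frequency_hz * n / SAMPLE_RATE)
             for n, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)


in_phase = synthesise([(100, 1.0, 0), (200, 0.5, 0), (300, 0.25, 0)])
shifted = synthesise([(100, 1.0, 0), (200, 0.5, 90), (300, 0.25, 180)])

largest_difference = max(abs(a - b) for a, b in zip(in_phase, shifted))
print(f"largest sample difference between the two waveforms: {largest_difference:.3f}")
for f in (100, 200, 300):
    print(f, round(amplitude_at(in_phase, f), 6), round(amplitude_at(shifted, f), 6))
```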

Speech Waveforms

There are two types of speech sound source:

periodic vibration of the vocal folds resulting in voiced speech

aperiodic sound produced by turbulence at some constriction in the vocal tract resulting in voiceless speech

These two sound sources are modified by the frequency-selective (filtering) effects of different vocal tract shapes to produce the various sounds of speech.

The voiced source can be filtered ("modulated") by the position of the tongue, lips and velum

Close-up (40 ms) views of the waveforms of one voiceless fricative (/h/) and 3 vowel tokens

The sound /h/ is aperiodic.

The three vowel sounds are periodic. Their patterns are repeated at regular intervals.

The period of these patterns is about 10 ms (1/100 s) and so their frequency is about 100 Hz.

Each repetition or period of these patterns corresponds to one glottal cycle, or one cycle of vocal fold opening and closing in the larynx.

An F0 (fundamental frequency) of 100 Hz is a normal value for an adult male voice.

The more familiar term pitch refers to the way we perceive F0. A voice with a high-sounding pitch has a high F0.
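On the slides the period is read off the waveform by eye. A computer can find the repetition period by looking for the time shift at which the waveform best matches itself. The sketch below uses a plain autocorrelation search on a synthetic 40 ms voiced-like signal. Autocorrelation is one common approach and is not claimed to be the method used for the slides; the search range and the test signal are assumptions.

```python
import math


def estimate_f0(signal, sample_rate, f0_min=50.0, f0_max=400.0):
    """Estimate F0 as sample_rate / lag, for the lag with the highest autocorrelation.

    The 50-400 Hz search range is an assumed range, chosen for illustration.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)

    def autocorrelation(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorrelation)
    return sample_rate / best_lag


SAMPLE_RATE = 8000  # Hz, assumed
# Synthetic 40 ms signal: a 100 Hz fundamental plus two weaker harmonics.
voiced = [
    math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 300 * n / SAMPLE_RATE)
    for n in range(320)
]
f0 = estimate_f0(voiced, SAMPLE_RATE)
print(f"estimated F0 = {f0:.1f} Hz, period = {1000.0 / f0:.1f} ms")
```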

Close-up (40 ms) views of the waveforms of four voiced consonants

Close-ups of the fricative /z/ illustrating varying degrees of source mixing

Three long vowels in an /h_d/ context

Identification of Speech Waveforms

Phones / phonemes, e.g. vowels, contrast in an identical environment.

The differences between the waveforms are mainly due to the differences between the waveforms of the vowels.

Waveforms can tell you that you are looking at a vowel, but they can't reliably identify the vowel.

The intensity of the vowels rises rapidly at the start, reaches a peak by about 1/4 of the way through the vowel and then gradually drops.

Three English voiceless oral stops in CV context

All three stops commence with a burst.

The burst occurs when a build up of air pressure is suddenly released.

The bursts are very short (about 1-5 ms) and are followed by about 100 ms of aspiration (or fricative-like voiceless sound).

Waveforms of two of the English voiceless fricatives in CV (consonant-vowel) context

Voiceless fricatives are aperiodic, which means that they don't consist of periodically repeating patterns as occurs in voiced sounds.

The fricative aspiration in these two examples is very long, 250 to 300 ms, compared to the aspiration of the voiceless stops.

Analog and Digital Sound

Sound has properties (the dimensions of frequency, intensity, time and phase) that exist in the real world as infinite continua of infinitesimal changes.

"Representations" of sound are the result of transformations of sound into other analog or digital forms.

Sound can be recreated from these representations with the appropriate technology.

Until the invention of the digital computer all representations of speech sounds were analog signals.

Transduction

Transduction is the conversion of a signal from one analog form into another.

Sound is transduced into an electrical signal by a microphone.

In this electrical signal, continuously changing voltage is the analog representation of continuously changing sound pressure level.

This electrical signal is transduced back into sound via a loudspeaker.

A device that transforms a signal from one form into another is called a transducer.

Microphones and audio speakers are transducers.

The ear is also a transducer that converts sound into neural signals.

Digitisation: Sampling and Quantisation

Windowing

Acoustic analyses attempt to extract the sine waves that add up to produce the variations evident in the waveform.

To select a series of speech samples for spectral analysis we need to "window" the original waveform.

The simplest window is a "rectangular window".

A rectangular window has a starting point "t1" and an end point "t2" with all values between t1 and t2 multiplied by one and all values before t1 or after t2 multiplied by zero.

A rectangular window has a complex spectrum of its own which contaminates the spectrum of speech.

Rectangular filter

Filter / window

Hanning window

A Hanning window is a member of a family of windows known as raised cosine windows.

A Hanning function is frequently used to reduce spectral leakage (the smearing of energy across neighbouring frequencies that results from cutting the signal off abruptly at the window edges).

This class of windows distorts the shape of the spectrum of the windowed speech far less than a rectangular window does.

These windows are often used during the frequency analysis of speech sounds.
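The difference between the two window types can be sketched numerically. The example below windows a tone that does not fit a whole number of cycles into the window, then compares how much energy appears at a frequency far away from the tone. The window length, tone frequency and measurement bins are assumed values for illustration.

```python
import math


def rectangular_window(n_samples):
    """All values inside the window are multiplied by one."""
    return [1.0] * n_samples


def hann_window(n_samples):
    """Raised-cosine (Hanning) window: 0.5 - 0.5 * cos(2 * pi * n / (N - 1))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def magnitude_at(signal, cycles_per_window):
    """Magnitude of one DFT bin of the signal."""
    n_samples = len(signal)
    re = sum(x * math.cos(2.0 * math.pi * cycles_per_window * n / n_samples)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * cycles_per_window * n / n_samples)
             for n, x in enumerate(signal))
    return math.hypot(re, im)


N = 256
# 10.5 cycles per window: the tone is cut off mid-cycle at the window edge.
tone = [math.sin(2.0 * math.pi * 10.5 * n / N) for n in range(N)]


def leakage_db(window):
    """Level at a bin far from the tone (bin 40) relative to the bin next to it (bin 10)."""
    windowed = [x * w for x, w in zip(tone, window)]
    return 20.0 * math.log10(magnitude_at(windowed, 40) / magnitude_at(windowed, 10))


rect_db = leakage_db(rectangular_window(N))
hann_db = leakage_db(hann_window(N))
print(f"rectangular window: {rect_db:.1f} dB   Hanning window: {hann_db:.1f} dB")
```

The more negative the figure, the less the window has spread the tone's energy into distant parts of the spectrum.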

Hanning filter

Filter / window

Digitisation

The basic digitisation hardware is an analog-to-digital converter.

It takes snapshots of an input analog signal at regular intervals outputting a number which is closest to the magnitude of the snapshot measurement.

Taking a series of snapshots of a signal can only capture an approximation of the original.

The sampling frequency or sampling rate is a measure of the number of snapshots taken from the signal each second.

The absolute minimum number of samples per cycle needed to reproduce a sinusoid is two: one at the peak and one at the trough.

The sampling frequency should therefore be at least twice the highest frequency being digitised. Conversely, the highest frequency that a given sampling frequency can represent is half that sampling frequency: the Nyquist frequency.

In studying speech recorded in quiet conditions we often use a sampling frequency of 20,000 Hz, which gives information up to 10,000 Hz.
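Both steps of digitisation can be sketched in a few lines. The first part shows what goes wrong above the Nyquist frequency: at a 20,000 Hz sampling frequency, a 12,000 Hz tone produces exactly the same sample values (apart from sign) as an 8,000 Hz tone, so the two cannot be told apart afterwards. The second part rounds a measurement to the nearest available output number. The quantiser is a simplified model (uniform steps over the range -1 to 1, no clipping) and is an assumption of this sketch.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, n_samples):
    """Snapshots of a sine wave taken at regular intervals."""
    return [math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def quantise(value, n_bits):
    """Round a value in the range -1..1 to the nearest multiple of the step size.

    Simplified model: uniform steps of 2 / 2**n_bits, clipping at the ends ignored.
    """
    step = 2.0 / (2 ** n_bits)
    return round(value / step) * step


SAMPLE_RATE = 20000                 # Hz, as on the slide
NYQUIST = SAMPLE_RATE / 2           # 10000 Hz

below = sample_sine(8000, SAMPLE_RATE, 40)    # below the Nyquist frequency
above = sample_sine(12000, SAMPLE_RATE, 40)   # above the Nyquist frequency

# The 12000 Hz samples are the 8000 Hz samples with inverted sign.
largest_mismatch = max(abs(a + b) for a, b in zip(above, below))
print(f"largest mismatch: {largest_mismatch:.2e}")
print(f"0.3 quantised with 3 bits: {quantise(0.3, 3)}")
```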

Analysing sound sequences: samples

Spectra

A two-dimensional spectrum is effectively a snapshot of the spectrum of a sound at one point in time.

This "point" in time is always a window of some length.

Most often the amplitude axis will be in decibels (dB).

The frequency axis is usually in hertz (Hz) or kilohertz (kHz).

Line Spectra

A line spectrum is a spectral representation that displays the frequencies and relative intensities of the component sine waves.

Each sine wave is displayed as a single vertical line placed at the appropriate frequency on the x-axis.

The height of the line represents the amplitude of the component sine wave.

The amplitude is usually displayed as a relative sound pressure (i.e. in pascals) or as a decibel value.

Fourier Transforms

Fourier Transforms remain the primary method for carrying out frequency analyses of sounds and other phenomena.

In digital signal processing the Fourier transform is almost always performed using an algorithm called the Fast Fourier Transform or FFT.

The Fourier transform transforms a time domain signal into a frequency domain representation of that signal.

This means that it generates a description of the distribution of the energy in the signal as a function of frequency.

This is normally displayed as a plot of frequency (x-axis) against amplitude (y-axis) called a spectrum.
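The transform from the time domain to the frequency domain can be sketched with a direct (slow) discrete Fourier transform. An FFT computes the same values more efficiently. The test signal is the 100 Hz plus 500 Hz complex wave used earlier; the sample rate and window length are assumed values chosen so that both tones fall exactly on analysis frequencies.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform, O(N^2); an FFT gives the same result faster."""
    n_samples = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples) for n, x in enumerate(signal))
        for k in range(n_samples)
    ]


SAMPLE_RATE = 2000  # Hz, assumed
N = 200             # 100 ms window, giving analysis frequencies every 10 Hz

signal = [
    math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 500 * n / SAMPLE_RATE)
    for n in range(N)
]

spectrum = dft(signal)
# Amplitude of each component up to the Nyquist frequency, keyed by frequency in Hz.
amplitudes = {k * SAMPLE_RATE / N: 2.0 * abs(spectrum[k]) / N for k in range(N // 2)}
peaks = sorted(f for f, a in amplitudes.items() if a > 0.1)
print(peaks, [round(amplitudes[f], 3) for f in peaks])
```

Plotting the values in amplitudes against their frequencies gives the line spectrum of this signal: two lines, at 100 Hz and 500 Hz, the second half as tall as the first.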

Fast Fourier Transform (FFT) of the vowel in the word "heard"

Linear Prediction Coefficient (LPC) analysis

Of specific interest are the major spectral peaks (formants), which correspond to the resonant frequencies of the vocal tract.

Linear Prediction Coefficient (LPC) analysis attempts to predict the major spectral peaks (formants) seen in the Fourier transform.

The resulting LPC spectrum is a smoothed spectrum with the peaks representing the formants (resulting from the vocal tract resonances) of the spectrum of a vowel or vowel-like consonant.
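The idea behind LPC can be sketched on a synthetic signal with a single known resonance. The code below uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard way of computing the prediction coefficients; it is not claimed to be the procedure used for the slides. The sample rate, resonance frequency and damping are assumed values. The resonance is then read off from the prediction coefficients.

```python
import cmath
import math


def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of the signal."""
    return [
        sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Coefficients a[1..order] such that x[n] is predicted by sum(a[k] * x[n - k]).

    Autocorrelation method, solved with the Levinson-Durbin recursion.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:]


SAMPLE_RATE = 8000      # Hz, assumed
RESONANCE_HZ = 500.0    # assumed resonance of the synthetic signal
POLE_RADIUS = 0.95      # assumed damping of the resonance
omega = 2.0 * math.pi * RESONANCE_HZ / SAMPLE_RATE

# Impulse response of a single damped resonator.
signal = [POLE_RADIUS ** n * math.sin(omega * (n + 1)) / math.sin(omega) for n in range(400)]

a1, a2 = lpc_coefficients(signal, 2)
# The resonance is the angle of the root of z^2 - a1*z - a2 = 0.
pole = (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0
estimated_hz = abs(cmath.phase(pole)) * SAMPLE_RATE / (2.0 * math.pi)
print(f"estimated resonance: {estimated_hz:.1f} Hz")
```

Real speech needs a higher prediction order, since each formant requires two coefficients.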

An LPC analysis of the vowel in "heard"

Combined FFT and LPC analysis of the vowel in "heard"

Spectrograms: Time, Frequency and Intensity

A spectrograph is a machine or a computer algorithm that performs a series of spectral analyses at different times and then displays them using a three dimensional display of time, frequency and amplitude.

In most cases time is displayed on the x-axis, frequency is displayed on the y-axis and amplitude is displayed as variations in greyscale darkness or in colour.

The speech spectrograph consists of a series of band pass (BP) filters.

A band pass filter permits frequency components between two cut-off frequencies to pass unattenuated and attenuates frequency components below the lower (HP) cut-off frequency and above the higher (LP) cut-off frequency.
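A software spectrograph can be sketched as a series of windowed spectral analyses taken at successive times. The sketch below uses a short-time Fourier analysis rather than a literal bank of band pass filters; the two views give equivalent results, but the substitution is a choice made for this example. The test signal, window length and hop size are assumed values.

```python
import math


def hann(n_samples):
    """Raised-cosine (Hanning) window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def magnitude(frame, frequency_hz, sample_rate):
    """Magnitude of one frequency component of a frame."""
    re = sum(x * math.cos(2.0 * math.pi * frequency_hz * n / sample_rate)
             for n, x in enumerate(frame))
    im = sum(x * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate)
             for n, x in enumerate(frame))
    return math.hypot(re, im)


def spectrogram(signal, sample_rate, window_len, hop, frequencies_hz):
    """One row per analysis time, one column per analysis frequency."""
    window = hann(window_len)
    rows = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(window_len)]
        rows.append([magnitude(frame, f, sample_rate) for f in frequencies_hz])
    return rows


SAMPLE_RATE = 8000  # Hz, assumed
# Test signal: 50 ms of a 500 Hz tone followed by 50 ms of a 1500 Hz tone.
signal = [
    math.sin(2.0 * math.pi * (500 if n < 400 else 1500) * n / SAMPLE_RATE)
    for n in range(800)
]
frequencies = [100 * k for k in range(41)]  # 0 to 4000 Hz in 100 Hz steps
rows = spectrogram(signal, SAMPLE_RATE, window_len=80, hop=40, frequencies_hz=frequencies)


def strongest_frequency(row):
    """Analysis frequency with the largest magnitude in one spectrogram row."""
    return frequencies[row.index(max(row))]


print(strongest_frequency(rows[0]), strongest_frequency(rows[-1]))
```

The window length sets the trade-off between the two kinds of spectrogram: a short window (as here, 10 ms) gives good time resolution but broad frequency bands, and a long window gives narrow bands but poorer time resolution.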

Broad band spectrogram of the word "heard" spoken by an adult male speaker of Australian English

Narrow band spectrogram of the word "heard"

Speech Production: Source-Filter Theory

The source-filter theory describes speech production as a two stage process involving the generation of a sound source, which is then shaped or filtered by the resonant properties of the vocal tract.

Sound sources

Sound sources can be either periodic or aperiodic.

Glottal sound sources can be periodic (voiced), aperiodic (whisper and /h/) or mixed (e.g. breathy voice).

Supra-glottal sound sources that are used contrastively in speech are almost always aperiodic (i.e. random noise).

Most of the filtering of a source spectrum is carried out by that part of the vocal tract anterior to the sound source.

In the case of a glottal source, the filter is the entire supra-glottal vocal tract.

A voiced glottal source has its own spectrum which includes spectral fine structure (harmonics and some noise) and a characteristic spectral slope.

In voiced speech the fundamental frequency (perceived as vocal pitch) is a characteristic of the glottal source acoustics whilst features such as vowel formants are characteristics of the vocal tract filter (resonances).

Resonance

All physical objects resonate.

Some have simple, uniform resonance patterns and some have complex resonance patterns.

Some resonators are highly damped and some are weakly damped.

Some resonators may generate sound by exciting adjacent air particles in the surrounding medium.

For example, a guitar string vibrates upon being plucked.

The guitar string collides with the surrounding air and generates longitudinal pressure waves (sound) in that medium.

Some resonators (eg. the supra-glottal vocal tract) may act upon sound waves generated elsewhere (eg. at the glottis) and selectively permit some frequencies (the resonant frequencies) to pass unattenuated whilst causing other frequencies to be attenuated (reduced in intensity) to some extent.

Wave reflection and resonance

http://id.mind.net/~zona/mstm/physics/waves/waves.html

On to the mind.net demo:

Our topics: Interference, Wave Reflection, Standing Waves

Standing waves and resonance

When a wave front is reflected at a fixed end (a barrier), it must reflect with inversion so that the resultant wave interference pattern always maintains zero displacement at each barrier.

In all cases where the end of the resonating body is free to move, wave reflection occurs without inversion.

Resonant frequencies have wavelengths that all result in standing waves with nodes at the fixed ends.

For a string fixed at both ends, the resonance frequencies are all integer multiples of the first resonance frequency.

Four wavelengths that would result in nodes at the two fixed ends. In descending order, the wave's wavelength is 2L, L, 2L/3, L/2, where L is the length of the string.

Figure: standing waves with nodes at the two fixed ends (labels: node, antinode).
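The wavelengths listed above follow the pattern λ = 2L/n for n = 1, 2, 3, 4. Since frequency is wave speed divided by wavelength, the resonance frequencies are n times the first one. A minimal sketch; the string length and wave speed are hypothetical values chosen for illustration.

```python
def string_wavelengths(length_m, n_modes):
    """Wavelengths that put a node at each fixed end: 2L/1, 2L/2, 2L/3, ..."""
    return [2.0 * length_m / n for n in range(1, n_modes + 1)]


def string_resonances(length_m, wave_speed_m_s, n_modes):
    """Resonance frequencies f = v / wavelength for a string fixed at both ends."""
    return [wave_speed_m_s / wl for wl in string_wavelengths(length_m, n_modes)]


LENGTH_M = 0.65          # hypothetical string length
WAVE_SPEED_M_S = 143.0   # hypothetical speed of waves along the string

for n, (wl, f) in enumerate(zip(string_wavelengths(LENGTH_M, 4),
                                string_resonances(LENGTH_M, WAVE_SPEED_M_S, 4)), start=1):
    print(f"mode {n}: wavelength = {wl:.3f} m  frequency = {f:.1f} Hz")
```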

Resonance in a tube that is open at one end

Standing wave patterns for the first four resonances in a tube open at one end and closed at the other.

The vocal tract during the production of vowels and vowel-like consonants can be described as a tube open at one end, the mouth, and closed at the other, the glottis.

Resonance in a tube of uniform cross-sectional area is a physical characteristic of that tube.

It is dependent upon the length of that tube and the open or closed state of the two ends.

What actually vibrates, however, is the medium contained in that tube.

When we produce vowel sounds the resonances of the vocal tract selectively enhance sound vibrations close to the resonance frequencies and selectively attenuate sound vibrations remote from the resonance frequencies.

This results in peaks in the acoustic spectrum of the resulting speech sound. These acoustic spectral peaks are called formants, particularly when they occur in vowels and vowel-like consonants.
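For a uniform tube closed at one end and open at the other, the standard result is that the resonances lie at odd multiples of c/(4L). The slides do not give this formula, so it is added here as background. The sketch below applies it to a vocal tract treated as such a tube; the tract length of 17.5 cm and the speed of sound of 350 m/s are assumed round values.

```python
def closed_open_tube_resonances(length_m, speed_of_sound_m_s, n_resonances):
    """Resonances of a uniform tube closed at one end and open at the other.

    f_n = (2n - 1) * c / (4 * L), for n = 1, 2, 3, ...
    """
    return [(2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
            for n in range(1, n_resonances + 1)]


VOCAL_TRACT_LENGTH_M = 0.175  # assumed length, glottis to lips
SPEED_OF_SOUND_M_S = 350.0    # assumed value for warm, moist air

formants = closed_open_tube_resonances(VOCAL_TRACT_LENGTH_M, SPEED_OF_SOUND_M_S, 4)
for number, frequency in enumerate(formants, start=1):
    print(f"resonance {number}: {frequency:.0f} Hz")
```

Unlike the string fixed at both ends, the resonances of this tube are not all multiples of the first one: only the odd multiples occur. A real vocal tract is not uniform in cross-section, so actual formant frequencies depart from these values depending on the vowel.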

Resonance in the vocal tract: formants of sonorants (vowels and voiced consonants)

Praat

We continue with practical experiments in Praat.

First, have a look around Praat:

http://www.germanistik.unibe.ch/siebenhaar/SiebenhaarFolder/subfolder/PraatEinfuehrung/PraatManual/PraatManual_home.html

http://www.fon.hum.uva.nl/praat/