Introduction to Signal Processing
Posted on 18-Nov-2014
TRANSCRIPT
Introduction to Signal Processing and Some Applications in Audio Analysis
Md. Khademul Islam Molla
JSPS Research Fellow, Hirose-Minematsu Laboratory
Email: molla@gavo.t.u-tokyo.ac.jp
Outline of the presentation
- Basics of discrete time signals
- Frequency domain signal analysis
- Basic transformations: Fourier Transform (FT), short-time FT (STFT), Wavelet Transform (WT), Empirical Mode Decomposition (EMD), and Hilbert spectrum (HS)
- Remarkable comparisons among FT, WT, and HS
- Some applications in audio processing
- Some open problems to work on
Discrete time signal
It is not possible to process continuous signals directly; we need to convert them to discrete time signals with a suitable sampling frequency and quantization.
The sampling theorem: Fs >= 2*fc
where fc is the highest expected signal frequency and Fs is the required sampling frequency. Quantization is also required in sampling.
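A minimal sketch of these two steps; the sampling rate, tone frequency, and quantizer resolution below are illustrative choices, not values from the slides:

```python
import numpy as np

fc = 5.0                      # highest frequency in the signal (Hz)
fs = 50.0                     # sampling frequency, chosen so fs >= 2*fc
n = np.arange(int(fs))        # one second of sample indices
x = np.sin(2 * np.pi * fc * n / fs)   # sampled signal

# Uniform quantization to 16 levels over [-1, 1]
levels = 16
step = 2.0 / (levels - 1)
xq = np.round(x / step) * step        # quantized samples
max_err = np.max(np.abs(x - xq))      # bounded by half a quantizer step
```

With Fs at least twice fc there is no aliasing, and the only distortion left is the bounded quantization error.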
Discrete time signal
Signal sampling and signal quantization
Discrete time signal
Effects of undersampling
Discrete time signal
Effects of the required sampling frequency
Discrete time signal
Telephone speech is usually sampled at 8 kHz to capture frequencies up to 4 kHz. 16 kHz is generally regarded as sufficient for speech recognition and synthesis. The audio standard is a sample rate of 44.1 kHz (CD) or 48 kHz (Digital Audio Tape) to represent frequencies up to 20 kHz.
[Figure: example discrete time signals plotted over the interval -10 to 10]
Discrete time signal
A sinusoid is characterized by its amplitude, phase, and frequency:
f(x) = 5 cos(x)
f(x) = 5 cos(x + 3.14)
f(x) = 5 cos(3x + 3.14)
[Figure: plots of the three cosines above over the interval -10 to 10]
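These three parameters can be checked numerically; for instance, the phase shift of pi (the 3.14 on the slide) simply inverts the cosine:

```python
import numpy as np

x = np.linspace(-10, 10, 1000)
f1 = 5 * np.cos(x)               # amplitude 5, frequency 1, phase 0
f2 = 5 * np.cos(x + np.pi)       # same amplitude, phase shifted by pi
f3 = 5 * np.cos(3 * x + np.pi)   # frequency tripled as well
```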
Time-domain signals
The independent variable is time; the dependent variable is the amplitude. Most of the information is hidden in the frequency content.
[Figure: time-domain plots (magnitude vs. time) of 2 Hz, 10 Hz, and 20 Hz sinusoids, and of their sum 2 Hz + 10 Hz + 20 Hz]
Signal transformation
Why: to obtain further information from the signal that is not readily available in the raw signal.
Raw signal: normally the time-domain signal.
Processed signal: a signal that has been "transformed" by any of the available mathematical transformations.
Fourier transformation: the most popular transformation between the time and frequency domains.
Frequency domain analysis
Why frequency information is needed: to be able to see information that is not obvious in the time domain.
Types of frequency transformation: the Fourier Transform, the Hilbert Transform, the Short-time Fourier Transform, the Radon Transform, the Wavelet Transform, ...
Frequency domain analysis
Basic block diagram of signal transformation: the time-domain signal s(t) maps to the frequency domain as S(f) = F[s(t)] (analysis), and the inverse transform recovers s(t) from S(f) (synthesis). s(t) and S(f) form a transform pair.
General transform as a problem-solving tool:
- Powerful and complementary to time domain analysis methods
- The frequency domain representation shows the signal energy and phase with respect to frequency
- A fast and efficient way to view the signal's information
Frequency domain analysis
Complex numbers: 4.2 + 3.7i, 9.4447 - 6.7i, -5.2 (i.e., -5.2 + 0i)
General form: Z = a + bi, with Re(Z) = a and Im(Z) = b
Amplitude: A = |Z| = sqrt(a^2 + b^2)
Phase: phi = angle(Z) = tan^-1(b/a)
Frequency domain analysis
Polar coordinates: Z = a + bi plots as a point with horizontal component a, vertical component b, and distance A from the origin, where
Amplitude: A = sqrt(a^2 + b^2)
Phase: phi = tan^-1(b/a)
Frequency domain analysis
Frequency spectrum: essentially the frequency components (spectral components) of a signal; it shows which frequencies exist in the signal.
Fourier Transform (FT): one way to find the frequency content; it tells how much of each frequency exists in a signal.
Spectrum of a speech signal
Fourier Transform
- The Fourier transform decomposes a function into a spectrum of its frequency components.
- The inverse transform synthesizes a function from its spectrum of frequency components.
- The discrete Fourier transform pair is defined as:
X_k = sum_{n=0}^{N-1} x_n e^{-j 2 pi k n / N}
x_n = (1/N) sum_{k=0}^{N-1} X_k e^{j 2 pi k n / N}
where X_k represents the k-th frequency component and x_n represents the n-th sample in the time domain.
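A quick check of this transform pair with NumPy's FFT (the one-second 5 Hz test tone is my own choice):

```python
import numpy as np

fs = 64                               # samples per second
n = np.arange(fs)                     # one second of samples
x = np.sin(2 * np.pi * 5 * n / fs)    # a 5 Hz tone

X = np.fft.fft(x)                     # X_k = sum_n x_n e^{-j 2 pi k n / N}
x_back = np.fft.ifft(X)               # inverse transform resynthesizes x_n
peak_bin = np.argmax(np.abs(X[:fs // 2]))   # strongest frequency component
```

The inverse transform reproduces the samples, and the magnitude spectrum peaks at the 5 Hz bin.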
Fourier transform of a 1D signal
[Figure: a 1400-sample time-domain signal and its Fourier spectrum, with peaks at 5, 10, and 15 Hz; amplitude only]
Fourier Transform
Fourier analysis uses sinusoids as the basis functions in the decomposition. Fourier transforms give the frequency information but smear time; samples of a function give the temporal information but smear frequency.
[Figure: partial Fourier sums of a square wave, sw_N(t) = sum_{k=1}^{N} b_k sin(kt), plotted for N = 1, 3, 5, 7, 9, 11 over 0 <= t <= 10]
FS synthesis: square wave reconstruction from spectral terms
Convergence may be slow (~1/k); ideally infinitely many terms are needed. Practically, the series is truncated when the remainder falls below the computer tolerance (the error). BUT ... Gibbs' phenomenon remains at the discontinuities.
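The truncation behavior can be reproduced with the partial sums above, taking a unit square wave whose odd-harmonic coefficients are b_k = 4/(pi k):

```python
import numpy as np

t = np.linspace(0, 10, 2000)
square = np.sign(np.sin(t))          # ideal square wave

def sw(t, N):
    """Partial Fourier sum using odd harmonics up to N."""
    s = np.zeros_like(t)
    for k in range(1, N + 1, 2):
        s += (4 / (np.pi * k)) * np.sin(k * t)
    return s

# RMS error shrinks as terms are added (~1/k convergence), but the
# Gibbs overshoot near the discontinuities does not vanish
rms = [np.sqrt(np.mean((sw(t, N) - square) ** 2)) for N in (1, 5, 11)]
overshoot = np.max(sw(t, 11)) - 1.0
```

Adding terms steadily reduces the overall error, yet the reconstruction still overshoots by roughly 9% next to each jump: Gibbs' phenomenon.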
Stationarity of the signal
Stationary signal: the frequency content is unchanged over the entire time; all frequency components exist at all times.
Non-stationary signal: the frequency changes in time. One example is the "chirp signal".
Stationarity of the signal
[Figure, left: the stationary signal 2 Hz + 10 Hz + 20 Hz and its spectrum; all three components occur at all times. Right: a non-stationary signal (2 Hz during 0.0-0.4 s, 10 Hz during 0.4-0.7 s, 20 Hz during 0.7-1.0 s) and its spectrum; the components do not appear at all times.]
Chirp signal
[Figure: two chirps, one sweeping 2 Hz to 20 Hz and one sweeping 20 Hz to 2 Hz; they are different in the time domain but the same in the frequency domain]
At what time do the frequency components occur? The FT cannot tell!
Limitations of the Fourier Transform
The FT only gives which frequency components exist in the signal; the time and frequency information cannot be seen at the same time. A time-frequency representation of the signal is needed, since most signals are non-stationary.
ONE SOLUTION: THE SHORT-TIME FOURIER TRANSFORM (STFT)
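This limitation is easy to demonstrate: a chirp and its time reversal sweep the same frequencies in opposite orders, yet for a real signal their FT magnitudes are identical (a small sketch; the sweep matches the 2-20 Hz chirp on the slides):

```python
import numpy as np

fs = 1000
t = np.arange(fs) / fs
# Linear chirp: instantaneous frequency 2 + 18t Hz, sweeping 2 -> 20 Hz
up = np.sin(2 * np.pi * (2 * t + 9 * t ** 2))
down = up[::-1]                  # time reversal sweeps 20 -> 2 Hz

mag_up = np.abs(np.fft.fft(up))
mag_down = np.abs(np.fft.fft(down))
# The magnitudes agree: the FT cannot say *when* each frequency occurred
```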
Short Time Fourier Transform
Dennis Gabor (1946) used the STFT to analyze only a small section of the signal at a time -- a technique called windowing the signal. Each segment of the signal is assumed stationary. The result is a function of time and frequency (a 3D transform):
STFT_x(tau, f) = integral_t [ x(t) w*(t - tau) e^{-j 2 pi f t} ] dt
where w(t) is the window function.
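A bare-bones discrete version of this definition (Hann window; the window length, hop, and two-tone test signal are all illustrative):

```python
import numpy as np

def stft(x, win_len, hop):
    """Magnitude STFT: window each segment, then FFT it."""
    w = np.hanning(win_len)
    frames = [x[i:i + win_len] * w
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))   # frames x freq bins

fs = 400
t = np.arange(2 * fs) / fs
# 10 Hz during the first second, 50 Hz during the second
x = np.where(t < 1, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 50 * t))

S = stft(x, win_len=100, hop=50)
f_bin = fs / 100                 # Hz per frequency bin
```

Unlike the plain FT, the early frames peak near 10 Hz and the late frames near 50 Hz, so the representation is localized in time as well as frequency.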
Short time Fourier Transform
[Figure: an FT is taken of each windowed segment of the signal]
Speech signal and its STFT, via a narrow window and via a wide window
Drawbacks of STFT
The window is unchanged, leading to a dilemma of resolution: a narrow window gives poor frequency resolution, while a wide window gives poor time resolution. By the Heisenberg uncertainty principle, one cannot know exactly what frequency exists at what time intervals.
Wavelet Transform
To overcome some limitations of the Fourier transform.
Discrete wavelet decomposition: the signal S is split into approximation A1 and detail D1; A1 is split into A2 and D2; A2 into A3 and D3.
Wavelet overview
Wavelet: a small wave.
Wavelet transforms:
- Provide a way of analyzing waveforms bounded in both frequency and duration
- Allow signals to be stored more efficiently than with the Fourier transform
- Are able to better approximate real-world signals
- Are well suited to approximating data with sharp discontinuities
"The forest and the trees": notice gross features with a large "window"; notice small features with a small "window".
Wavelet Transform: an alternative to the short time Fourier transform that overcomes the resolution problem. As in the STFT, the signal is multiplied with a function.
Multi-resolution analysis: analyze the signal at different frequencies with different resolutions -- good time resolution and poor frequency resolution at high frequencies, good frequency resolution and poor time resolution at low frequencies. This is more suitable for short-duration high frequency components and long-duration low frequency components.
Multi-resolution analysis
Advantages of WT over STFT: the width of the window changes as the transform is computed for every spectral component, so altered resolutions are placed where they are needed.
Principles of WT
- Split up the signal into a bunch of signals representing the same signal but corresponding to different frequency bands
- Only provide which frequency bands exist at what time intervals
Wavelet: a small wave; the window function is of finite length.
Mother wavelet: a prototype for generating the other window functions; all the windows used are its dilated or compressed and shifted versions.
Principles of WT
CWT_x(tau, s) = (1 / sqrt(|s|)) integral_t [ x(t) psi*((t - tau) / s) ] dt
where tau is the translation (the location of the window), s is the scale, and psi is the mother wavelet.
Principles of WT
Wavelet basis functions (each has a time domain and a frequency domain form):
- Morlet (frequency omega_0): psi(eta) = pi^{-1/4} e^{i omega_0 eta} e^{-eta^2 / 2}
- Paul (order m): psi(eta) = (2^m i^m m!) / sqrt(pi (2m)!) * (1 - i eta)^{-(m+1)}
- DOG (derivative of a Gaussian, order m): psi(eta) = ((-1)^{m+1} / sqrt(Gamma(m + 1/2))) d^m/d eta^m (e^{-eta^2 / 2}); m = 2 gives the Marr or Mexican hat wavelet
Wavelet bases
Scale of the wavelet: s > 1 dilates the signal; s < 1 compresses the signal.
Low frequency -> high scale -> non-detailed global view of the signal -> spans the entire signal. High frequency -> low scale -> detailed view lasting a short time. Only a limited interval of scales is necessary.
Computation of WT
Step 1: The wavelet is placed at the beginning of the signal, and s is set to 1 (the most compressed wavelet).
Step 2: The wavelet function at scale 1 is multiplied by the signal, integrated over all times, and then multiplied by 1/sqrt(s).
Step 3: Shift the wavelet to t = tau, and get the transform value at t = tau and s = 1.
Step 4: Repeat the procedure until the wavelet reaches the end of the signal.
Step 5: Increase the scale s by a sufficiently small value and repeat the above procedure for all s.
Step 6: Each computation for a given s fills a single row of the time-scale plane.
Step 7: The CWT is obtained once all scales s have been calculated.
CWT_x(tau, s) = (1 / sqrt(|s|)) integral_t [ x(t) psi*((t - tau) / s) ] dt
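The seven steps translate almost line for line into code. This is a deliberately naive direct implementation with a Mexican hat wavelet; the scale list and the 5 Hz test tone are my own choices:

```python
import numpy as np

def mexican_hat(t):
    """DOG wavelet with m = 2 (Marr / Mexican hat), unnormalized."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def cwt(x, scales, dt=1.0):
    """Direct CWT following the step-by-step procedure above."""
    t = np.arange(len(x)) * dt
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):          # Step 5: loop over scales
        for j, tau in enumerate(t):         # Steps 3-4: shift the wavelet
            psi = mexican_hat((t - tau) / s)
            # Step 2: multiply, integrate, scale by 1/sqrt(s)
            out[i, j] = np.sum(x * psi) * dt / np.sqrt(s)
    return out                              # Steps 6-7: time-scale plane

fs = 100
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 5 * t)               # a 5 Hz tone

W = cwt(x, scales=[0.01, 0.045, 0.2], dt=1 / fs)
energy = (W ** 2).sum(axis=1)               # energy per scale row
```

The energy concentrates in the middle row, whose scale roughly matches the 5 Hz oscillation; the too-small and too-large scales respond weakly.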
Time and frequency resolution
[Figure: time-frequency tiling. Boxes that are narrow in time give better time resolution but poor frequency resolution; boxes that are narrow in frequency give better frequency resolution but poor time resolution.]
Each box represents an equal portion of the plane; the resolution in the STFT is selected once for the entire analysis.
Comparison of transformations
Discretization of WT
It is necessary to sample the time-frequency (scale) plane. At a high scale s (lower frequency f), the sampling rate N can be decreased. The scale parameter s is normally discretized on a logarithmic grid; the most common base is 2, so that s_{j+1} = 2 s_j, f_{j+1} = f_j / 2, and N_{j+1} = N_j / 2. For example:
S: 2, 4, 8, ...
N: 32, 16, 8, ...
[Figure: decomposition tree S -> (A1, D1), A1 -> (A2, D2), A2 -> (A3, D3)]
Effective and fast DWT
The discretized WT is not a true discrete transform. The Discrete Wavelet Transform (DWT):
- Provides sufficient information for both analysis and synthesis
- Reduces the computation time significantly
- Is easier to implement
- Analyzes the signal at different frequency bands with different resolutions
- Decomposes the signal into a coarse approximation and detail information
Decomposition with DWT
Each level halves the time resolution (only half the number of samples remains) and doubles the frequency resolution (the spanned frequency band is halved).
[Figure: filter bank for x[n] covering 0-1000 Hz with 512 samples. Filter 1 splits off D1: 500-1000 Hz (256 samples); Filter 2 splits A1 into D2: 250-500 Hz (128 samples); Filter 3 splits A2 into D3: 125-250 Hz and A3: 0-125 Hz (64 samples each).]
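The halving of samples per level can be sketched with the simplest wavelet filter pair, the Haar filters; this is a stand-in for whichever filters the diagram above uses:

```python
import numpy as np

def haar_dwt_level(x):
    """One DWT level: low-pass -> approximation, high-pass -> detail,
    each downsampled by 2 (halving the time resolution)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail
    return a, d

x = np.random.randn(512)
a1, d1 = haar_dwt_level(x)      # 256 samples each
a2, d2 = haar_dwt_level(a1)     # 128 samples
a3, d3 = haar_dwt_level(a2)     # 64 samples

# Orthonormal Haar filters preserve the signal energy across levels
total = sum(np.sum(v ** 2) for v in (a3, d3, d2, d1))
```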
Decomposition of a non-stationary signal
Wavelet: db4; Level: 6
Signal: 20 Hz during 0.0-0.4 s, 10 Hz during 0.4-0.7 s, 2 Hz during 0.7-1.0 s
[Figure: decomposition bands ordered from fH (high frequency) to fL (low frequency)]
Decomposition of a non-stationary signal
Wavelet: db4; Level: 6
Signal: 2 Hz during 0.0-0.4 s, 10 Hz during 0.4-0.7 s, 20 Hz during 0.7-1.0 s
[Figure: decomposition bands ordered from fH to fL]
Reconstruction from WT
What: how can those components be assembled back into the original signal without loss of information?
A process performed after decomposition or analysis, also called synthesis.
How: reconstruct the signal from the wavelet coefficients. Where wavelet analysis involves filtering and downsampling, the wavelet reconstruction process consists of upsampling and filtering. Upsampling lengthens a signal component by inserting zeros between samples. MATLAB commands: idwt and waverec.
Wavelet applications
Typical application fields: astronomy, acoustics, nuclear engineering, sub-band coding, signal and image processing, neurophysiology, music, magnetic resonance imaging, speech discrimination, optics, fractals, turbulence, earthquake prediction, radar, human vision, and pure mathematics.
Sample applications: de-noising signals, breakdown detection, detecting self-similarity, compressing images, identifying pure tones.
Signal de-noising
The highest frequencies appear at the start of the original signal. The approximations appear less and less noisy, but also lose progressively more high-frequency information. In A5, about the first 20% of the signal is truncated.
Breakdown detection
The discontinuous signal consists of a slow sine wave abruptly followed by a medium sine wave. The 1st and 2nd level details (D1 and D2) show the discontinuity most clearly. Things to be detected:
- The site of the change
- The type of change (a rupture of the signal, or an abrupt change in its first or second derivative)
- The amplitude of the change
[Figure: discontinuity points]
Detecting self-similarity
Purpose: show how analysis by wavelets can detect a self-similar, or fractal, signal. The signal here is the Koch curve -- a synthetic signal that is built recursively.
Analysis: if a signal is similar to itself at different scales, then the "resemblance index" or wavelet coefficients will also be similar at different scales. In the coefficients plot, which shows scale on the vertical axis, this self-similarity generates a characteristic pattern.
Image compression
Fingerprints: the FBI maintains a large database of fingerprints -- about 30 million sets of them. The cost of storing all this data runs to hundreds of millions of dollars.
Results: values under the threshold are forced to zero, achieving about 42% zeros while retaining almost all (99.96%) of the energy of the original image. By turning to wavelets, the FBI has achieved a 15:1 compression ratio, better than the more traditional JPEG compression.
Identifying pure tones
Purpose: resolving a signal into constituent sinusoids of different frequencies. The signal is a sum of three pure sine waves.
Analysis: D1 contains the signal components whose period is between 1 and 2. Zooming in on detail D1 reveals that each "belly" is composed of 10 oscillations. D3 and D4 contain the medium sine frequencies. There is a breakdown between approximations A3 and A4: the medium frequency has been subtracted. Approximations A1 to A3 can be used to estimate the medium sine. Zooming in on A1 reveals a period of around 20.
Empirical Mode Decomposition
Principle
Objective -- From one observation of x(t), get an AM-FM type representation:
x(t) = sum_{k=1}^{K} a_k(t) psi_k(t)
with a_k(.) amplitude modulating functions and psi_k(.) oscillating functions.
Idea -- "signal = fast oscillations superimposed on slow oscillations".
Operating mode -- ("EMD", Huang et al., '98): (1) identify, locally in time, the fastest oscillation; (2) subtract it from the original signal; (3) iterate upon the residual.
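A simplified rendering of steps (1)-(3): extract one IMF by sifting with cubic-spline envelopes and a fixed number of iterations (the stopping criteria of Huang et al. are omitted, and the two-tone test signal is my own):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift(x, t, n_iter=8):
    """One IMF by sifting: repeatedly subtract the mean of the
    cubic-spline envelopes through the local maxima and minima."""
    h = x.copy()
    for _ in range(n_iter):
        d = np.diff(h)
        maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
        minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
        if len(maxima) < 4 or len(minima) < 4:
            break                         # too few extrema to sift further
        upper = CubicSpline(t[maxima], h[maxima])(t)
        lower = CubicSpline(t[minima], h[minima])(t)
        h = h - (upper + lower) / 2       # remove the local slow trend
    return h

t = np.linspace(0, 1, 2000)
fast = np.sin(2 * np.pi * 40 * t)         # fast oscillation
slow = 0.5 * np.sin(2 * np.pi * 3 * t)    # slow oscillation
x = fast + slow

imf1 = sift(x, t)                         # (1) the fastest oscillation
residual = x - imf1                       # (2) subtract; (3) iterate on this
core = slice(200, 1800)                   # ignore spline end effects
```

The first IMF tracks the fast component; sifting the residual the same way would recover the slow one.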
[Figure: the example signal is built as a low-frequency sawtooth plus a linear FM component]
Empirical Mode Decomposition: Principle
Empirical Mode Decomposition: Algorithmic definition
[Animation: the sifting process is applied repeatedly, extracting in turn the first, second, and third intrinsic mode functions, and finally the residual. The last frame shows the signal together with the 1st, 2nd, and 3rd intrinsic mode functions and the residual.]
Empirical Mode Decomposition: Intrinsic Mode Functions
- Quasi-monochromatic harmonic oscillations: #{zero crossings} = #{extrema} +/- 1, with envelopes symmetric around the y = 0 axis
- An IMF is not a Fourier mode; in nonlinear situations, one IMF may correspond to several Fourier modes
- Each IMF is the output of a self-adaptive time-varying filter (unlike a standard linear filter)
Example: two FM sinusoids plus a Gaussian wave packet
Empirical Mode Decomposition: Instantaneous frequency (IF)
The analytic version of each IMF C_i(t) is computed using the Hilbert transform as:
z_i(t) = C_i(t) + j H[C_i(t)] = a_i(t) e^{j theta_i(t)}
hence z_i(t) becomes complex with phase theta_i(t) and amplitude a_i(t). The IF can then be computed as:
omega_i(t) = d theta_i(t) / dt
The Hilbert spectrum (HS) is the triplet H(omega, t) = {t, omega_i(t), a_i(t)}.
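With SciPy's `hilbert` (which returns the analytic signal z(t) directly), the IF of a pure tone comes out flat at the tone's frequency away from the edges; the 50 Hz test tone is an arbitrary choice:

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000
t = np.arange(fs) / fs
c = np.sin(2 * np.pi * 50 * t)          # one IMF-like component

z = hilbert(c)                          # z(t) = c(t) + j * H[c(t)]
a = np.abs(z)                           # instantaneous amplitude a(t)
theta = np.unwrap(np.angle(z))          # instantaneous phase theta(t)
inst_f = np.diff(theta) / (2 * np.pi) * fs   # omega/(2*pi), in Hz
```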
Empirical Mode Decomposition: Intrinsic Mode Functions
[Figure: a signal, its spectrum, and its time-frequency representation]
[Figure: time-frequency tracks of the 1st, 2nd, and 3rd IMFs of the signal]
- EMD is a fully data-adaptive decomposition for spectral and time-frequency representation of non-linear and non-stationary time series
- It does not employ any basis function for the decomposition
- It produces perfect localization of the signal components in a high-resolution time-frequency space of the time series
Empirical Mode Decomposition: Comparison between HS and STFT
[Figure: time-frequency representation of two pure tones (100 Hz and 250 Hz) using the Hilbert spectrum (HS) and the STFT]
Empirical Mode Decomposition: Comparison between Wavelet and HS
[Figure: the same signal represented by the Wavelet transform and by the Hilbert spectrum (HS)]
Remarks on FT
- The Fourier Transform has a mathematical foundation
- It can be used in robust analysis, having phase information
- The detail of the signal information is limited by the basis (sinusoid) function
- STFT analysis includes some additional cross-spectral energy that degrades the performance in some applications
Remarks on WT
- WT employs a data-adaptive basis function based on its time and frequency scales
- It can produce more detailed signal information in the T-F representation
- WT also performs well in multi-band decomposition
- The reconstruction error of the multi-band representation is much less than that of the FT
- It cannot preserve the phase information needed for perfect reconstruction from T-F space
Remarks on EMD and HS
- EMD is a fully adaptive multi-band decomposition method
- It produces perfect localization of the signal components in T-F space
- HS can represent the instantaneous spectra of the signal
- The signal can be reconstructed with negligible error terms
- It does not have a mathematical foundation yet
- It is difficult to use EMD-based decomposition in robust analysis
Applications of DSP in audio analysis
- Audio source separation from a mixture using independent subspace analysis (ISA)
- Audio source separation by spatial localization in the underdetermined case
- Robust pitch estimation using EMD
Application 1: audio source separation from a mixture using independent subspace analysis (ISA)
Source separation by independent subspace analysis (ISA)
Pipeline: audio mixture s(t) -> STFT magnitude (STFTM) -> PCA -> basis vector selection -> ICA -> basis vector clustering -> source spectrograms -> inverse STFT (ISTFT) -> individual sources
Short Time Fourier Transform (STFT)
The mixture audio is transformed with a 30 ms window and 20 ms overlap, giving the magnitude spectrogram X and the phase information: the T-F representation of the mixture.
Proposed separation model
Mixture spectrogram: X = sum_i x_i, with x_i = B_i A_i
B_i = [b_1^(i), b_2^(i), ..., b_n^(i)]: invariant frequency n-component basis
A_i = [a_1^(i), a_2^(i), ..., a_n^(i)]^T: corresponding amplitude envelopes
A_i = B_i^T X, B_i = X A_i^T
The goal is to find independent B_i or A_i, yielding the source spectrograms.
Dimension reduction
The number of rows or columns of X far exceeds the number of sources, so the dimension of X must be reduced. Singular value decomposition (SVD) is used:
X_{n x k} = U_{n x n} S_{n x k} V_{k x k}^T
where U and V are (column-wise) orthogonal matrices and S is the diagonal matrix of singular values sigma_1 >= sigma_2 >= ... >= sigma_n >= 0. Then p basis vectors (from U or V) are selected by setting theta = 0.5 to 0.6 in the inequality
( sum_{i=1}^{p} sigma_i ) / ( sum_{i=1}^{n} sigma_i ) >= theta
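A sketch of this selection rule on a toy rank-2 "spectrogram" (theta = 0.6, within the slides' suggested range; the matrix itself is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy mixture spectrogram: rank-2 structure plus small noise
X = (np.outer(rng.random(40), rng.random(100))
     + np.outer(rng.random(40), rng.random(100))
     + 0.01 * rng.random((40, 100)))

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U S V^T
theta = 0.6
ratio = np.cumsum(s) / np.sum(s)                  # cumulative singular-value mass
# Keep the smallest p whose cumulative ratio reaches theta
p = int(np.searchsorted(ratio, theta) + 1)
```

Because only two components carry real structure, p comes out small: the singular-value mass is concentrated in the leading directions.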
Proposed separation model
To derive the basis vectors, singular value decomposition (SVD) is applied as PCA, and some principal components are selected as basis vectors. Independent component analysis (ICA) is then applied to make the bases independent.
[Figure: basis vectors before ICA and after ICA; the bases are independent along the time frames]
Producing source subspaces
[Figure: the time bases and frequency bases of the speech signal]
Source subspaces (cont.)
Mixture spectrogram -> PCA + basis selection + ICA -> basis vectors B, A -> KLd based clustering -> source subspaces B1 A1 and B2 A2 -> source spectrograms
Source re-synthesis
The separated subspaces (spectrograms) are combined with the appended phase information and passed through the inverse STFT:
S_i(omega, k) = x_i . e^{j phi(omega, k)}
Example: a mixture of speech and a bip-bip sound is separated into the speech and the bip-bip sound.
Experimental results
Separated signals with the proposed algorithm (mixtures vs. separated):
- Speech + bip-bip
- Male + female speech
Application 2: audio source separation by spatial localization in the underdetermined case
Localization based separation
To avoid the spectral dependency and signal content in separation, and to increase the number of sources, the spatial location is considered, using binaural mixtures instead of a single mixture.
Consider a multi-source audio situation. Humans can easily localize and separate the sources with the HAS (human auditory system). The binaural cues ITD and ILD are mainly used in source localization; separation is performed by applying beamforming and a binary mask.
Source localization cues
- Interaural time difference (ITD) between the two microphones' signals (like the two ears of a human)
- Interaural level difference (ILD)
Source localization
X_r(omega) and X_l(omega) are the STFTs of x_r(t) and x_l(t). ITD and ILD are calculated as:
ITD(omega) = (1/omega) * [ phi_l(omega) - phi_r(omega) ]
ILD(omega) = 20 log10( |X_l(omega)| / |X_r(omega)| )
where phi_r(omega) and phi_l(omega) are the unwrapped phases of X_r(omega) and X_l(omega), respectively, at frequency omega.
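These two formulas, checked on a synthetic binaural pair; the tone frequency, delay, and attenuation below are invented for the test:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
f0 = 200.0                       # a low frequency, where ITD is unambiguous
delay = 0.0005                   # 0.5 ms interaural delay
xl = np.sin(2 * np.pi * f0 * t)
xr = 0.5 * np.sin(2 * np.pi * f0 * (t - delay))   # delayed and attenuated

Xl, Xr = np.fft.rfft(xl), np.fft.rfft(xr)
k = int(round(f0 * len(t) / fs))     # FFT bin of f0
w = 2 * np.pi * f0

itd = (np.angle(Xl[k]) - np.angle(Xr[k])) / w       # phase difference / omega
ild = 20 * np.log10(np.abs(Xl[k]) / np.abs(Xr[k]))  # level difference in dB
```

The recovered ITD matches the 0.5 ms delay, and halving the right channel's amplitude shows up as an ILD of about 6 dB.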
Source localization (cont.)
ITD becomes ambiguous at higher frequencies (a factor of the microphones' spacing); ILD dominates to resolve the problem.
[Figure: ITD at low frequency vs. ITD at high frequency]
ITD and ILD are quantized into 50 levels. Collecting the T-F points corresponding to each quantized ITD/ILD pair produces peaks.
Separation by beamforming
ITD is derived for each of the localized sources and spatial beamforming is applied: linearly constrained minimum variance beamforming (LCMVB) is used, and the gain is selected based on the spatial locations.
Separation by binary mask with HS
Binary masking avoids the limitations of spatial beamforming
Separation is performed by estimating a binary mask based on ITD/ILD
The sources are assumed disjoint orthogonal in T-F space: not more than one source is active at any T-F point
Computing ITD and ILD
Each mixture is transformed to the T-F domain using its Hilbert spectrum (HL and HR)
ITD and ILD are measured as

    ITD(ω, tf) = (1/ω) ∠[ HL(ω, tf) / HR(ω, tf) ]
    ILD_dB(ω, tf) = 20 log10 ( |HR(ω, tf)| / |HL(ω, tf)| )

where tf is the time frame
ITD-ILD space localization
ITD and ILD are quantized into 50 levels
Collecting the T-F points from the HS corresponding to each quantized ITD/ILD pair produces peaks
Source Separation
Each peak region in the histogram refers to a source of the binaural mixtures
Construct a binary mask Mi(ω, t) (nullifying the T-F points of the interfering sources)
The HS of the ith source is separated as

    Hi(ω, t) = Mi(ω, t) HL(ω, t)

The time domain ith source is given as

    si(t) = Σω Hi(ω, t) cos[θ(ω, t)]
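The mask-and-resynthesize idea can be sketched as follows. This is an illustration under assumptions: the per-point ITD map, the tolerance, and the row-per-frequency array layout are hypothetical inputs, and θ(ω, t) is taken as a given instantaneous-phase array.

```python
import numpy as np

def separate_source(H_L, theta, itd_map, itd_target, tol):
    """Binary-mask separation sketch: keep T-F points whose ITD is close to
    the target source's ITD, null the rest, then resynthesize by summing the
    masked amplitudes with their instantaneous phases over frequency."""
    M = (np.abs(itd_map - itd_target) < tol).astype(float)  # binary mask M_i
    H_i = M * H_L                                           # H_i = M_i * H_L
    s_i = np.sum(H_i * np.cos(theta), axis=0)               # s_i(t) = sum_w H_i cos(theta)
    return M, H_i, s_i
```

Under the disjoint-orthogonality assumption each T-F point belongs to at most one source, so nulling the other sources' points leaves the target's amplitude-phase track intact for resynthesis.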
Source disjoint orthogonality
Disjoint orthogonality (DO) of audio sources assumes that not more than one source is active at any T-F point:

    F1(ω, t) · F2(ω, t) = 0, ∀ω, t

where F1 and F2 are the TFRs of two signals
SIR (signal to interference ratio) is used as the basis to measure DO
Source disjoint orthogonality (cont.)
(Figure: three audio sources s1, s2, s3 recorded by microphones, and their TFRs in frequency and time)
Source disjoint orthogonality (cont.)
The SIR of the jth source is defined as

    SIRj = Σω,t |Xj(ω, t)|² / Σω,t |Yj(ω, t)|²,   Yj(ω, t) = Σ_{i=1, i≠j}^{N} Xi(ω, t)

where Yj is the sum of the interfering sources
Source disjoint orthogonality (cont.)
The dimensions of the HS and STFT of the same signal may differ
DO is defined as a percentage over the entire TFR region
The average DO (ADO) of all sources is

    ADO = (1/N) Σ_{j=1}^{N} SIRj

where N is the number of sources
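The "percentage over the entire TFR region" reading of DO can be sketched directly: count the T-F points where at most one source is significantly active. The activity threshold is an assumed tolerance, since the slides do not state one.

```python
import numpy as np

def disjointness_percentage(tfrs, thresh=1e-6):
    """Percentage of T-F points at which not more than one source is active
    (magnitude above thresh). tfrs is a list of same-shaped TFR arrays,
    one per source; thresh is an assumed activity tolerance."""
    mags = np.stack([np.abs(np.asarray(X)) for X in tfrs])
    active = (mags > thresh).sum(axis=0)   # number of active sources per T-F point
    return 100.0 * np.mean(active <= 1)
```

Perfectly disjoint sources score 100%; every T-F point where two or more sources overlap lowers the percentage, which is what makes the HS-vs-STFT comparison on the following slides meaningful.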
Experimental results
The three mixtures are defined as m1{sp1(-40, 0), sp2(30, 0), ft(0, 0)}, m2{sp1(20, 10), sp2(0, 10), ft(-10,10)}, m3{sp1(40, 20), sp2(30, 20), ft(-20, 20)}
The separation efficiency is measured by OSSR (original to separated signal ratio), defined as

    OSSR = (1/T) Σ_{w=1}^{T} log10 [ Σt (si_original(t))² / Σt (si_separated(t))² ]

where T is the number of frames
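This frame-wise energy-ratio measure can be sketched in a few lines; the frame length parameter and the epsilon guard against silent frames are illustrative assumptions. Values near 0 indicate that the separated signal preserves the original's energy.

```python
import numpy as np

def ossr(original, separated, frame_len):
    """OSSR per the reconstructed formula: mean over frames of
    log10(frame energy of original / frame energy of separated)."""
    T = min(len(original), len(separated)) // frame_len
    vals = []
    for w in range(T):
        seg = slice(w * frame_len, (w + 1) * frame_len)
        e_orig = np.sum(np.asarray(original[seg]) ** 2) + 1e-12
        e_sep = np.sum(np.asarray(separated[seg]) ** 2) + 1e-12
        vals.append(np.log10(e_orig / e_sep))
    return float(np.mean(vals))
```

A separated signal identical to the original gives OSSR = 0; one attenuated by half gives log10(4) ≈ 0.6 per frame, so the near-zero values in the table below indicate good separation.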
Experimental results (cont.)
The comparative separation efficiency (OSSR) using HS and STFT:

    Mixture  TFR    OSSR of sp1   OSSR of sp2   OSSR of ft
    m1       HS       -0.0271       0.0213        0.0264
             STFT      0.0621      -0.0721       -0.0531
    m2       HS        0.0211      -0.0851       -0.0872
             STFT      0.0824       0.1202        0.1182
    m3       HS        0.0941      -0.0832        0.0225
             STFT     -0.1261       0.1092       -0.0821
Experimental results
This experiment also compares the DO using HS and STFT as TFR
The STFT is affected by many factors: the window function and its length, overlapping, and the number of FFT points
HS is independent of such factors
It is slightly affected by the number of frequency bins used in TFR
Experimental results (cont.)
The ADO of HS and STFT as a function of number of frequency bins (N=3):
Experimental results (cont.)
Only the ADO of the STFT is affected by the window overlapping factor (%)
Experimental results (cont.)
STFT includes more cross-spectral energy terms
The TFR of two pure tones using HS and STFT
Experimental results (cont.)
HS always has better DO for audio signals
DO depends on the resolution of the TFR
The STFT has to satisfy the uncertainty inequality

    Δt · Δω ≥ 1/2

The frequency resolution of HS extends up to the Nyquist frequency
Its time resolution is up to the sampling rate, hence HS offers better resolution
Remarks
The separation efficiency is independent of the signal's spectral characteristics
The performance is affected by the angular separation and the disjointness of the sources
HS produces better disjointness in the T-F domain and hence better separation
The binaural mixtures were recorded in an anechoic room at NTT
Application of DSP in audio analysis
Robust pitch estimation using EMD
Why EMD in pitch estimation?
Pitch facilitates speech coding, enhancement, recognition, etc.
The autocorrelation (AC) function is most widely used in pitch estimation algorithms
The AC function recalls the periodic property of the speech
EMD in pitch estimation (cont.)
Pitch is the sample difference between two consecutive peaks in the AC function
Sometimes the pitch peak may be less prominent, especially due to noise
EMD in pitch estimation (cont.)
EMD decomposes any signal into components from higher to lower frequency
It produces the local and global oscillations of the signal
The global oscillation approximately represents the envelope of the signal
The IMF of the global oscillation is used to estimate the pitch
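The decomposition idea can be sketched with a deliberately simplified sifting loop. This is only a rough illustration of EMD, not a faithful implementation: it uses linear-interpolated envelopes instead of the cubic splines of standard EMD, a fixed sift count instead of a convergence criterion, and no end-effect handling.

```python
import numpy as np

def sift_imf(x, n_sifts=10):
    """One IMF via simplified sifting: repeatedly subtract the mean of the
    upper and lower envelopes (linear interpolation through the extrema)."""
    h = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(h))
    for _ in range(n_sifts):
        maxima = np.where((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:]))[0] + 1
        minima = np.where((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:]))[0] + 1
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema: treat h as a residue/trend
        upper = np.interp(idx, maxima, h[maxima])
        lower = np.interp(idx, minima, h[minima])
        h = h - 0.5 * (upper + lower)
    return h

def emd(x, n_imfs=4):
    """Peel off IMFs from highest to lowest frequency; the final residue
    plays the role of the global oscillation / trend."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(n_imfs):
        imf = sift_imf(residue)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue
```

By construction the IMFs plus the residue reconstruct the input exactly, and on a two-tone signal the first IMF tracks the faster oscillation while later components carry the slower, more global ones.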
Pitch estimation with EMD
Pitch estimation with EMD (cont.)
There exists an IMF in the EMD domain representing the global oscillation of the AC function
That IMF represents the sinusoid of the pitch period
Pitch is the frequency of that IMF, rather than being found from the pitch peak
Pitch estimation with EMD (cont.)
In the EMD, IMF-5 is the oscillation of the pitch period
It is a crucial step to determine the target IMF representing the sinusoid with the pitch period
IMFs of lower-frequency oscillation (than the pitch period) can be discarded by energy thresholding
Pitch estimation with EMD (cont.)
A reference pitch is computed by the weighted AC (WAC) method
This pitch information is used to select the IMF with the pitch period
The periodicity of the selected IMF is computed as the pitch period
Pitch estimation with EMD (cont.)
The peak at zero lag is selected
Two cycles are selected from both sides
The average number of samples per cycle is the periodicity
Proposed pitch estimation algorithm
Compute the normalized autocorrelation (AC) of the speech frame
Determine a rough pitch period using the WAC method
Apply EMD to the AC function
Select the IMF of the pitch period on the basis of the WAC estimate
The period of the selected IMF is the estimated pitch
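The autocorrelation front end of these steps can be sketched as follows. This simplification picks the pitch directly from the strongest AC peak in a plausible lag range, standing in for the WAC/EMD-based IMF selection described above, so it illustrates steps one and five only.

```python
import numpy as np

def autocorr_pitch(frame, fs, f_min=50.0, f_max=500.0):
    """Normalized autocorrelation pitch sketch: the pitch is read from the
    strongest AC peak between the lags of f_max and f_min (the 50-500 Hz
    search range matches the band-pass range used in the experiments)."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    ac = ac / ac[0]                  # normalized AC, so ac[0] = 1
    lag_min = int(fs / f_max)        # smallest lag = highest admissible pitch
    lag_max = int(fs / f_min)        # largest lag = lowest admissible pitch
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / lag                  # pitch in Hz
```

The motivation for the EMD step is visible here: under noise the raw AC peak can be less prominent than spurious peaks, whereas the period of the pitch IMF is read from a single smooth sinusoid.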
Experimental results
The Keele pitch database is used here
20 kHz sampling rate
Frame length is 25.6 ms with 10 ms shifting
Each frame is band-pass filtered to the pitch range (50-500 Hz)
Gross pitch error (GPE) is used to measure the performance
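The slides do not state the GPE formula; the sketch below uses the common definition, which is an assumption here: the percentage of reference-voiced frames whose pitch estimate deviates from the reference by more than 20%.

```python
import numpy as np

def gross_pitch_error(est, ref, tol=0.20):
    """%GPE under the common definition (assumed, not taken from the
    slides): fraction of frames with ref pitch > 0 whose estimate is off
    by more than tol (20%) relative to the reference."""
    est, ref = np.asarray(est, float), np.asarray(ref, float)
    voiced = ref > 0                     # unvoiced frames carry ref = 0
    err = np.abs(est[voiced] - ref[voiced]) / ref[voiced]
    return 100.0 * np.mean(err > tol)
```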
Experimental results (cont.)
The %GPE of male and female speech at different SNRs is presented here
The total number of frames is 1823
    SNR      30dB   20dB   10dB    0dB    -5dB   -15dB
    Female   1.90   2.83   3.93   10.12   21.83   64.24
    Male     2.15   3.78   5.22   11.89   23.56   66.76
Remarks
The use of EMD makes the pitch estimation method more robust
EMD of the AC function can extract the fundamental oscillation of the signal
The pitch can be easily estimated from the single sinusoid of the fundamental oscillation
It is not affected by prominent non-pitch peaks
Future works
The open problem is to identify the IMF with the pitch period
In the present algorithm, the error of the rough ACF pitch estimate can propagate to the final estimation
The performance has not yet been compared with other existing algorithms
Open Problem-1: Instantaneous Pitch (IP) estimation using EMD
• Frame based pitch estimation is already done
• Paper is accepted by EUROSPEECH 2007
• We have used the pitch-information-based WAC to compute the exact pitch (IMF) from EMD space
• Problem: compute IP only from EMD space
(Figure: three methods of pitch estimation)
Open Problems-2: Voiced/Unvoiced Detection with EMD
• Useful in speech enhancement and speech/speaker recognition
• Paper with preliminary results is published in ICICT2007
• Problem: derive a better separation region for V/UV and conduct experiments with large speech data
(Figure: V/UV differentiation)
Open Problems-3: Robust Audio Source Localization
• Localization is done by delay-attenuation computed in the T-F space of binaural mixtures; it is NOT noise robust
• The problem is to derive a mathematical model for robust localization in the underdetermined situation
(Figure: localization of three sources using TD-LD computed in T-F space)
Open Problems-4: Speech denoising using image processing
• Noisy speech can be represented as an image with a time-frequency (T-F) representation, e.g. the spectrogram
• Image processing algorithms can be used for denoising
• It seems easy for musical/white noises
• Problem: deal with other noises, even by using binaural mixtures
(Figure: speech with white noise)
Open Problems-5: Auditory segmentation with binaural mixtures
• Auditory segmentation is the first stage of source separation using auditory scene analysis (ASA)
• Problem: use binaural mixtures for improved auditory segmentation as source separation
• T-F representations other than FT can be employed
(Figure: source separation by ASA)
Open Problems-6: Two stage speech enhancement
• Single stage speech enhancement is not efficient in all noisy situations
• For example, musical noise is introduced with binary masking and some thresholding methods
• Noise may not be separated perfectly by using ICA or ISA (independent subspace analysis) based techniques
• Multi-stage enhancement with a suitable order can improve the performance
(Diagram: noisy speech → first stage enhancement → second stage enhancement → clean speech)
Open Problems-7: Informative features extraction
• To use spectral dynamics in speech/speaker recognition
• Special types of speech features are required
• How to parameterize the speech signal to represent speech dynamics
• WT and HS based spectral analysis can be studied
(Figure: mixed signal with its spectrogram)
Open Problems-8: Source based audio indexing
• Useful in multimedia applications and moving audio source separation
• Several new methods could be used for indexing: AdaBoost, Tree-ICA, conditional random fields
(Figure: audio sources s1, s2, s3 at different azimuth angles, 0 to 180 degrees, at 1.5 m; separation of moving sources)
Open Problems-9: Time-series prediction with EMD
• Subject to financial and environmental time series
• Conventional methods use a Kalman filter (for smoothing) and an AR model for prediction
• EMD can be used as a smoothing filter to enhance the prediction accuracy
(Figure: non-stationary time series)
Open Problems-10: Heart-rate analysis with ECG data using EMD
• ECG variability analysis at different frequency regions
• Analysis of instantaneous ECG condition
• Abnormality analysis of heart rate using EMD based spectral modeling
(Figure: different parts of the ECG signal)
The End
Questions/Suggestions please