I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering (LNEE), Vol. 345, pp. 523–528. (EI Journal, ISBN: 978-3-319-17313-9).
Chapter 67
A HHT-Based Music Synthesizer
I-Hao Hsiao, Chun-Tang Chao, and Chi-Jo Wang
Abstract Synthesizing musical sound plays an important role in modern music composition. Composers nowadays can easily take advantage of powerful and user-friendly personal computers to produce the desired musical sound with a good music synthesis method. In this chapter, the Hilbert-Huang Transform (HHT) time-frequency analysis method is employed in an attempt to implement a new, efficient music synthesizer. By applying the HHT technique, the original varying-pitch music signals can be decomposed into several intrinsic mode functions (IMFs) by empirical mode decomposition (EMD). The instantaneous amplitude and frequency of the IMFs can then be obtained by the Hilbert transform. By extracting the main spectral coefficients of the instantaneous amplitude and frequency of the IMFs, the original musical signal can be reconstructed with little error. Experimental results indicate the feasibility of the proposed method.
Keywords Music synthesis • Hilbert-Huang transform (HHT) • Empirical mode
decomposition (EMD) • Intrinsic mode function (IMF)
67.1 Introduction
Among regular music synthesis methods, the two most popular are probably wavetable synthesis [1] and FM synthesis [2]. A good music synthesis method allows music creators to synthesize sound signals accurately and quickly. However, these two methods have been unable to provide satisfactory quality for high-performance applications.
In recent years, musical-tone generators have increasingly been based on physical modeling of sound production mechanisms [3]. The digital waveguide filter [4, 5] can be applied to simulate a wide class of musical instruments. Figure 67.1 shows the nonlinear predictive model of an instrument. The excitation unit (Exciter) is the nonlinear part, responsible for generating an oscillatory source signal, and the resonance unit (Resonator) is the linear filter part, responsible for filtering that excitation to produce the sound signal.
I.-H. Hsiao • C.-T. Chao (*) • C.-J. Wang
Department of Electrical Engineering, Southern Taiwan University of Science and Technology, Tainan, Taiwan
© Springer International Publishing Switzerland 2016
J. Juang (ed.), Proceedings of the 3rd International Conference on Intelligent Technologies and Engineering Systems (ICITES2014), Lecture Notes in Electrical Engineering 345, DOI 10.1007/978-3-319-17314-6_67
Figure 67.2 shows a simple model-based structure implemented by IIR (infinite
impulse response) synthesis, consisting of a prediction filter and a delay line to
synthesize tones produced by instruments [6]. The design of the coefficients for the
IIR synthesizer is accomplished by using a neural network (NN)-based training
algorithm. A recurrent NN (RNN) is applied for the prediction filter design.
However, such design approaches can require a time-consuming training process.
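The exciter/resonator split and the delay-line structure described above can be made concrete with a minimal Python sketch. It uses the classic Karplus-Strong plucked-string loop, a swapped-in textbook technique rather than the NN-trained IIR synthesizer of [6] or the waveguide models of [4, 5]: a noise burst plays the role of the exciter, and a delay line with a fixed two-point averaging filter in its feedback path acts as the linear resonator. The function name `plucked_string` and its parameters (`decay`, `seed`) are hypothetical illustrations, not part of the original work.

```python
import numpy as np


def plucked_string(f0, fs=44100, duration=1.0, decay=0.996, seed=0):
    """Karplus-Strong-style tone: noise exciter + delay-line resonator (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_delay = int(round(fs / f0))              # delay-line length roughly sets the pitch
    buf = rng.uniform(-1.0, 1.0, n_delay)      # exciter: a noise burst fills the delay line
    out = np.empty(int(fs * duration))
    for n in range(len(out)):
        i = n % n_delay
        out[n] = buf[i]                        # read the oldest sample from the delay line
        # resonator: two-point average (lowpass) fed back into the line, scaled by decay < 1
        buf[i] = decay * 0.5 * (buf[i] + buf[(i + 1) % n_delay])
    return out


# Usage: half a second of a tone near 220 Hz at an 8 kHz sampling rate.
tone = plucked_string(220.0, fs=8000, duration=0.5)
```

Because the loop filter here is fixed, no training is needed; the approach in [6] instead learns the filter coefficients, which is where the training cost mentioned above arises.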
In 1998, Huang et al. [7] developed a new method called the Hilbert-Huang Transform (HHT) for analyzing nonlinear and nonstationary data. The HHT is expected to be more powerful and better suited to timbre analysis than the traditional Short-Time Fourier Transform (STFT). Building on the HHT method, this chapter proposes a more efficient HHT-based music synthesizer.
67.2 The HHT and EMD
The HHT was pioneered by Huang et al. for adaptively representing nonstationary signals as sums of zero-mean amplitude-modulated, frequency-modulated (AM-FM) components. The Fourier Transform views the signal as a combination of many fixed-frequency and fixed-amplitude sinusoids, whereas the HHT regards the signal as a combination of many intrinsic mode functions (IMFs), which have time-varying frequency (instantaneous frequency) and time-varying amplitude (instantaneous amplitude) [8]. Thus, the HHT provides a more powerful analysis and synthesis tool for the pitch and timbre of a musical sound. In this section, the HHT and EMD are briefly introduced.
Fig. 67.1 The nonlinear predictive model of an instrument (diagram blocks: Exciter, Resonator, RBF Network, Delay Line u[n], HLP, HAP, Nonlinear Predictor)
Fig. 67.2 The nonlinear predictive model of an instrument (diagram elements: wavetable samples y0 to yN, delay elements z–1, weights w0,1 to w0,N, synthetic output)
There are two steps in the HHT: (1) For a given signal x(t), extract the IMFs
by means of empirical mode decomposition (EMD); and (2) apply the Hilbert
Transform on each IMF to get the corresponding instantaneous frequency and
amplitude. Step 1 is repeated iteratively until the residue becomes a monotonic function or a function with only one cycle, from which no more IMFs can be extracted. Equation (67.1) shows the decomposition of x(t) into N empirical modes, where c_j(t) is the jth IMF and r_N(t) is the final residue.
$$x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t) \tag{67.1}$$
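The sifting procedure behind (67.1) can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the authors' MATLAB code: upper and lower envelopes are natural cubic splines through the local maxima and minima (implemented directly to avoid extra dependencies), the signal end points are added as extra knots (a crude boundary treatment), and sifting stops when the relative energy change between successive iterations falls below a hypothetical threshold `tol` (Huang et al. [7] use a related standard-deviation criterion). All helper names are illustrative.

```python
import numpy as np


def _natural_spline(xk, yk, x):
    """Evaluate a natural cubic spline through knots (xk, yk) at points x."""
    n = len(xk)
    h = np.diff(xk)
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0                  # natural end conditions: S'' = 0 at both ends
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = [h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]]
        b[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, b)                  # second derivatives at the knots
    j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)
    hj, dl, dr = h[j], x - xk[j], xk[j + 1] - x
    return ((m[j] * dr**3 + m[j + 1] * dl**3) / (6.0 * hj)
            + (yk[j] / hj - m[j] * hj / 6.0) * dr
            + (yk[j + 1] / hj - m[j + 1] * hj / 6.0) * dl)


def _extrema(x):
    """Indices of interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope(x, idx):
    # Assumption: end points are used as extra knots (real EMD codes treat boundaries more carefully).
    knots = np.concatenate(([0], idx, [len(x) - 1]))
    return _natural_spline(knots.astype(float), x[knots], np.arange(len(x), dtype=float))


def sift(x, max_iter=50, tol=0.05):
    """Extract one IMF by repeatedly subtracting the mean of the upper and lower envelopes."""
    h = x.copy()
    for _ in range(max_iter):
        maxima, minima = _extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        mean_env = 0.5 * (_envelope(h, maxima) + _envelope(h, minima))
        h_next = h - mean_env
        # Assumed stopping rule: relative energy change between successive sifts.
        sd = np.sum((h - h_next) ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h_next
        if sd < tol:
            break
    return h


def emd(x, max_imfs=10):
    """Return (imfs, residue) such that x == imfs.sum(axis=0) + residue, as in (67.1)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:  # too few extrema to build envelopes: stop
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = residue - imf
    return np.array(imfs), residue


# Usage: a 50 Hz tone plus a weaker 5 Hz component, 1 s at fs = 1000 Hz.
fs = 1000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.8 * np.sin(2 * np.pi * 5 * t)
imfs, res = emd(x)
```

By construction, the returned IMFs plus the residue sum back to the input exactly, which is the identity stated in (67.1); the first IMF captures the fastest oscillation (here the 50 Hz tone).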
In Step 2, the Hilbert Transform is utilized to obtain an analytic complex
representation z(t) for each IMF c(t), as shown in (67.2), where d(t) is the Hilbert
Transform of c(t). The instantaneous amplitude and instantaneous phase are
denoted as a(t) and θ(t), respectively.
$$z(t) = c(t) + i\,d(t) = a(t)\,e^{i\theta(t)} \tag{67.2}$$
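Step 2 can be sketched with an FFT-based analytic signal: zeroing the negative-frequency bins of c(t) and doubling the positive ones yields z(t) = c(t) + i d(t) directly, which mirrors what `scipy.signal.hilbert` computes. The IA is then |z(t)|, and the IF follows from differentiating the unwrapped phase θ(t). The function names and the central-difference derivative (`np.gradient`) are assumptions for illustration; the IF is reported in hertz, i.e. ω(t)/2π.

```python
import numpy as np


def analytic_signal(c):
    """z = c + i*d, where d is the Hilbert transform of c (FFT method)."""
    n = len(c)
    weights = np.zeros(n)
    weights[0] = 1.0                           # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0                  # keep the Nyquist bin
        weights[1:n // 2] = 2.0                # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(c) * weights)  # negative frequencies are zeroed


def ia_if(c, fs):
    """Instantaneous amplitude a(t) and instantaneous frequency (Hz) of one IMF."""
    z = analytic_signal(np.asarray(c, dtype=float))
    a = np.abs(z)                              # a(t) = |z(t)|
    theta = np.unwrap(np.angle(z))             # theta(t), unwrapped phase
    f = np.gradient(theta) * fs / (2.0 * np.pi)  # omega = d(theta)/dt, converted to Hz
    return a, f


# Usage: a pure 100 Hz component with constant amplitude 0.5.
fs = 8000
t = np.arange(fs) / fs
c = 0.5 * np.cos(2 * np.pi * 100 * t)
z = analytic_signal(c)
a, f = ia_if(c, fs)
```

For this test tone the recovered IA is 0.5 and the IF is 100 Hz at every sample; for real IMFs both tracks vary over time, which is exactly the information (67.3) needs.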
Then, neglecting the residue r_N(t), the original signal x(t) can be represented as
$$x(t) = \operatorname{Re}\left\{ \sum_{j=1}^{N} a_j(t)\, e^{\,i \int \omega_j(\tau)\, d\tau} \right\} \tag{67.3}$$
where ω(t) = dθ(t)/dt is the instantaneous frequency and Re denotes the real part.
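Equation (67.3) can also be read as a synthesis recipe: given IA and IF tracks for each IMF, integrate the IF to obtain the phase and sum the resulting AM-FM components. The sketch below assumes the IF is given in hertz, uses a rectangle-rule running integral, and takes a per-IMF initial phase `phase0` (the integration constant that (67.3) leaves implicit, defaulting to zero); these are illustrative choices, not details given in this chapter.

```python
import numpy as np


def reconstruct(ia, if_hz, fs, phase0=None):
    """Synthesize x[n] = Re sum_j a_j[n] exp(i theta_j[n]) following Eq. (67.3).

    ia, if_hz: arrays of shape (num_imfs, num_samples); IF in Hz (omega = 2*pi*f).
    phase0: initial phase of each IMF (integration constant), zeros by default.
    """
    ia = np.atleast_2d(np.asarray(ia, dtype=float))
    if_hz = np.atleast_2d(np.asarray(if_hz, dtype=float))
    num_imfs = ia.shape[0]
    phase0 = np.zeros(num_imfs) if phase0 is None else np.asarray(phase0, dtype=float)
    # Running (rectangle-rule) integral of omega_j, so that theta_j[0] = phase0[j].
    steps = 2.0 * np.pi * if_hz[:, :-1] / fs
    theta = phase0[:, None] + np.concatenate(
        [np.zeros((num_imfs, 1)), np.cumsum(steps, axis=1)], axis=1
    )
    return np.real(np.sum(ia * np.exp(1j * theta), axis=0))


# Usage: two components with constant IA (1.0, 0.3) and IF (50 Hz, 120 Hz).
fs = 1000
n = np.arange(1000)
ia = np.vstack([np.ones(1000), np.full(1000, 0.3)])
if_hz = np.vstack([np.full(1000, 50.0), np.full(1000, 120.0)])
x = reconstruct(ia, if_hz, fs)
expected = np.cos(2 * np.pi * 50 * n / fs) + 0.3 * np.cos(2 * np.pi * 120 * n / fs)
```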
Equation (67.3) shows the difference between the HHT and the Discrete Fourier Transform: in the HHT, each component is treated as a sinusoid with time-varying amplitude and time-varying frequency. For brevity, the instantaneous frequency and the instantaneous amplitude will be referred to as “IF” and “IA” in the following text. For each IMF, the corresponding “IF” and “IA” can be calculated; conversely, the IMF can be reconstructed from its “IF” and “IA.” In this chapter, the FFT is applied to the “IF” and “IA” instead of directly to the IMF. Because the “IF” and “IA” vary much more slowly than the oscillating IMF itself, fewer FFT coefficients are needed with this approach, yielding better synthesis performance.
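One way to realize “applying the FFT to the IF and IA” is sketched below. The chapter does not define how the “main” coefficients are chosen, so this sketch assumes they are the k largest-magnitude coefficients of a real FFT (`np.fft.rfft`), storing their bin indices alongside the complex values; the helper names are hypothetical.

```python
import numpy as np


def compress_track(track, k):
    """Keep the k largest-magnitude rFFT coefficients of an IF or IA track (assumed rule)."""
    spectrum = np.fft.rfft(track)
    keep = np.argsort(-np.abs(spectrum))[:k]   # bin indices of the "main" coefficients
    return keep, spectrum[keep], len(track)


def decompress_track(keep, values, n):
    """Rebuild the track from the stored bins; all other bins are treated as zero."""
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[keep] = values
    return np.fft.irfft(spectrum, n=n)


# Usage: a slowly varying IA-like envelope needs only two coefficients (DC and 3 cycles).
t = np.arange(1000) / 1000
ia_track = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
rec = decompress_track(*compress_track(ia_track, k=2))

# A decaying envelope is not exactly sparse; keeping more coefficients lowers the error.
decay = np.exp(-5 * t)
err_4 = np.linalg.norm(decompress_track(*compress_track(decay, k=4)) - decay)
err_32 = np.linalg.norm(decompress_track(*compress_track(decay, k=32)) - decay)
```

The smooth envelope is recovered exactly from two stored coefficients, which illustrates why slowly varying IF and IA tracks compress better than the rapidly oscillating IMFs themselves.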
67.3 Simulation Results
The simulation was implemented in the MATLAB environment. Different sound signals, including piano, trumpet, violin, and bird chirps, were tested. Figure 67.3 shows the EMD analysis of trumpet music (pitch A4, or A440), including the original signal, IMF1–IMF11, and the final residue.
For each IMF, the corresponding “IF” and “IA” can be obtained. Figure 67.4 shows the “IA” analysis for each IMF. In the proposed method, only the first four IMFs (IMF1–IMF4) are considered, and the latter IMFs (IMF5–IMF11) are omitted. The FFT is applied to “IF” and “IA,” and 128 main coefficients are selected for each of “IF” and “IA.” Thus, for the first four IMFs, a total of 4 × 2 × 128 = 1,024 FFT coefficients are stored. To demonstrate its efficiency and feasibility, the proposed synthesis method is compared with a direct 1,024-point FFT analysis of the original sound. Table 67.1 compares the synthesis errors for different instruments using the same number of coefficients (1,024), where the error is measured by the Euclidean distance defined in (67.4).
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{67.4}$$
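For reference, the error measure (67.4) and the coefficient budget stated above (four IMFs, two tracks per IMF, 128 coefficients per track) can be written out as follows. Whether the signals were normalized before computing the distance is not stated in the chapter, so this sketch applies (67.4) to the raw sample vectors; the function name is illustrative.

```python
import numpy as np


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2), Eq. (67.4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))


# Coefficient budget: 4 IMFs x 2 tracks ("IF" and "IA") x 128 main coefficients each.
n_imfs, tracks_per_imf, coeffs_per_track = 4, 2, 128
budget = n_imfs * tracks_per_imf * coeffs_per_track

# Usage: a 3-4-5 triangle as a sanity check of the distance.
d = euclidean_distance([0.0, 3.0], [4.0, 0.0])
```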
Fig. 67.3 EMD analysis of trumpet music (A4) (panels: signal, imf1 to imf11, res.)
Fig. 67.4 Instantaneous amplitude (IA) analysis for each IMF (panels: imf1 IA to imf8 IA, each plotting Amplitude against Sample, 0 to 10,000)
67.4 Conclusion
This chapter presents a music synthesizer based on the HHT. In some advanced model-based approaches, parameter learning can be tedious and time-consuming. Since most practical music sounds are nonstationary, especially at the onset of a tone, the conventional Fourier Transform cannot be expected to synthesize them realistically. The HHT is an advanced signal-processing technique for analyzing nonlinear and nonstationary time series data. The signal is first separated into narrowband components, the IMFs, by performing EMD. The Hilbert transform is then applied to each mode to obtain its instantaneous frequency and amplitude. By extracting the main FFT coefficients of the instantaneous frequency and amplitude of each IMF, the original signal can be restored with a smaller error than direct FFT synthesis using the same number of coefficients (Table 67.1). Simulation results show the feasibility of the proposed synthesis method; further improvement is needed for practical applications.
References
1. Bristow-Johnson, R.: Wavetable synthesis 101, a fundamental perspective. In: Proceedings of the 101st Convention of the Audio Engineering Society, Los Angeles (1996)
2. Chowning, J.M.: The synthesis of complex audio spectra by means of frequency modulation. Comput. Music J. 1(2), 46–54 (1977)
3. Drioli, C., Rocchesso, D.: A generalized musical-tone generator with application to sound compression and synthesis. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, vol. 1, pp. 431–434 (1997)
4. Smith, J.O.: Physical modeling using digital waveguides. Comput. Music J. 16(4), 74–87 (1992)
5. Smith, J.O.: Efficient synthesis of stringed musical instruments. In: Proceedings of the 1993 International Computer Music Conference, pp. 64–71. Computer Music Association, Tokyo (1993)
6. Su, A.W.Y., Liang, S.F.: A new automatic IIR analysis/synthesis technique for plucked-string instruments. IEEE Trans. Speech Audio Process. 9(7), 747–754 (2001)
7. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A 454(1971), 903–995 (1998)
8. Gloersen, P., Huang, N.E.: Comparison of interannual intrinsic modes in hemispheric sea ice covers and other geophysical parameters. IEEE Trans. Geosci. Remote Sens. 41(5), 1062–1074 (2003)
Table 67.1 Synthesis error comparison (Euclidean distance, Eq. (67.4))
Sound          The proposed method    Direct FFT
Piano          0.3407                 0.8267
Trumpet        0.5202                 0.8229
Violin         0.2152                 0.8219
Bird chirps    0.0596                 0.2952