
I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering (LNEE), Vol. 345, pp. 523-528. (EI Journal, ISBN: 978-3-319-17313-9).

Chapter 67

A HHT-Based Music Synthesizer

I-Hao Hsiao, Chun-Tang Chao, and Chi-Jo Wang

Abstract Synthesizing musical sound plays an important role in modern music composition. Composers nowadays can easily take advantage of powerful and user-friendly personal computers to produce the desired musical sound with a good music synthesis method. In this chapter, the Hilbert-Huang Transform (HHT) time-frequency analysis method is employed in an attempt to implement a new, efficient music synthesizer. By applying the HHT technique, the original varying-pitch music signals can be decomposed into several intrinsic mode functions (IMFs) based on the empirical mode decomposition (EMD). The instantaneous amplitude and frequency of the IMFs can then be obtained by the Hilbert transform. By extracting the main spectrum coefficients of the instantaneous amplitude and frequency of the IMFs, the original musical signal can be reconstructed with little error. Experimental results indicate the feasibility of the proposed method.

Keywords Music synthesis • Hilbert-Huang transform (HHT) • Empirical mode decomposition (EMD) • Intrinsic mode function (IMF)

67.1 Introduction

The two most popular conventional music synthesis methods are probably wavetable synthesis [1] and FM synthesis [2]. A good synthesis method allows music creators to synthesize the sound signal accurately and quickly. However, these two methods have been unable to provide satisfactory quality for high-performance applications.

In recent years, the trend for musical-tone generators has been toward physical modeling of sound production mechanisms [3]. The digital waveguide filter [4, 5] can be applied to simulate a wide class of musical instruments. Figure 67.1 shows the nonlinear predictive model of an instrument. The excitation unit (Exciter) is the nonlinear part, responsible for generating an oscillatory signal source. The resonance unit (Resonator) is the linear filter part, responsible for modulating the sound signal.

I.-H. Hsiao • C.-T. Chao (*) • C.-J. Wang
Department of Electrical Engineering, Southern Taiwan University of Science and Technology, Tainan, Taiwan
e-mail: [email protected]

© Springer International Publishing Switzerland 2016
J. Juang (ed.), Proceedings of the 3rd International Conference on Intelligent Technologies and Engineering Systems (ICITES2014), Lecture Notes in Electrical Engineering 345, DOI 10.1007/978-3-319-17314-6_67

Figure 67.2 shows a simple model-based structure implemented by IIR (infinite impulse response) synthesis, consisting of a prediction filter and a delay line, which synthesizes tones produced by instruments [6]. The coefficients of the IIR synthesizer are designed with a neural network (NN)-based training algorithm, and a recurrent NN (RNN) is applied for the prediction filter design. However, such design approaches can be time-consuming because of the training process.

In 1998, Huang et al. [7] developed a new method, the Hilbert-Huang Transform (HHT), for analyzing nonlinear and nonstationary data. The HHT is expected to be more powerful and better suited to timbre analysis than the traditional Short-Time Fourier Transform (STFT). Building on the HHT, this chapter proposes a more efficient HHT-based music synthesizer.

67.2 The HHT and EMD

The HHT was pioneered by Huang et al. for adaptively representing nonstationary signals as sums of zero-mean amplitude-modulated, frequency-modulated (AM-FM) components. The Fourier Transform views the signal as a combination of many fixed-frequency and fixed-amplitude sinusoids. The HHT regards the signal as a combination of many intrinsic mode functions (IMFs), which have time-varying frequency (instantaneous frequency) and time-varying amplitude (instantaneous amplitude) [8]. Thus, the HHT provides a more powerful analysis and synthesis tool for the pitch and timbre of a music sound. In this section, the HHT and EMD are briefly introduced.

Fig. 67.1 The nonlinear predictive model of an instrument (block labels: Exciter, Resonator, Nonlinear Predictor, RBF Network, Delay Line, HLP, HAP, u[n])

Fig. 67.2 The nonlinear predictive model of an instrument (diagram labels: wavetable, delay elements z–1, weights w0,1 to w0,N, signals y0 to yN, synthetic output)

There are two steps in the HHT: (1) for a given signal x(t), extract the IMFs by means of empirical mode decomposition (EMD); and (2) apply the Hilbert Transform to each IMF to get the corresponding instantaneous frequency and amplitude. Step 1 is repeated until the residue becomes a monotonic function or a function with only one cycle, from which no more IMFs can be extracted. Equation (67.1) shows the decomposition of x(t) into N empirical modes, where c_j(t) is the jth IMF and r_N(t) is the final residue.

$$x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t) \tag{67.1}$$
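The decomposition in (67.1) can be illustrated with a minimal sifting sketch. This is not the authors' MATLAB implementation. It makes two simplifying assumptions that are not in the source: the envelopes are piecewise linear (the original EMD of Huang et al. uses cubic splines), and each IMF is sifted a fixed number of times instead of using a convergence criterion. The names `local_extrema`, `envelope`, `sift`, and `emd` are hypothetical. The sketch mainly shows that the IMFs and the residue always sum back to the input.

```python
import math

def local_extrema(x):
    """Return index lists of interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through x at idx (ASSUMPTION: real EMD uses cubic splines)."""
    # Pin both ends to the nearest extremum value so the envelope spans the whole signal.
    knots = [0] + idx + [len(x) - 1]
    vals = [x[idx[0]]] + [x[i] for i in idx] + [x[idx[-1]]]
    env, k = [], 0
    for n in range(len(x)):
        while k < len(knots) - 2 and n > knots[k + 1]:
            k += 1
        w = (n - knots[k]) / (knots[k + 1] - knots[k])
        env.append(vals[k] + w * (vals[k + 1] - vals[k]))
    return env

def sift(x, n_sift=10):
    """Extract one IMF candidate by repeatedly subtracting the mean envelope.

    ASSUMPTION: a fixed sift count replaces the usual stopping criterion.
    """
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [hi - 0.5 * (u + l) for hi, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=8):
    """Decompose x into IMFs c_j and a residue r_N with x = sum(c_j) + r_N."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes: treat as the final residue
        c = sift(residue)
        imfs.append(c)
        residue = [r - ci for r, ci in zip(residue, c)]
    return imfs, residue

# Synthetic test signal (not from the chapter): two tones plus a slow trend.
fs = 400
t = [n / fs for n in range(fs)]
x = [math.sin(2 * math.pi * 40 * ti) + 0.5 * math.sin(2 * math.pi * 5 * ti) + 0.3 * ti
     for ti in t]
imfs, residue = emd(x)
recon = [sum(c[n] for c in imfs) + residue[n] for n in range(len(x))]
max_err = max(abs(a - b) for a, b in zip(x, recon))
print(len(imfs), max_err)
```

Because each IMF is subtracted from the running residue, the identity in (67.1) holds up to floating-point rounding regardless of how good the envelopes are. The quality of the individual modes, however, does depend on the envelope and stopping choices.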

In Step 2, the Hilbert Transform is utilized to obtain an analytic complex representation z(t) for each IMF c(t), as shown in (67.2), where d(t) is the Hilbert Transform of c(t). The instantaneous amplitude and instantaneous phase are denoted as a(t) and θ(t), respectively.

$$z(t) = c(t) + i\,d(t) = a(t)\,e^{i\theta(t)} \tag{67.2}$$
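For reference, the quantities in (67.2) can be written out explicitly. The source does not spell these out, so the forms below are the standard textbook definitions of the Hilbert transform and of the polar form of a complex number, not the authors' own notation:

$$d(t) = \frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{\infty} \frac{c(\tau)}{t-\tau}\,d\tau, \qquad a(t) = \sqrt{c(t)^2 + d(t)^2}, \qquad \theta(t) = \arctan\frac{d(t)}{c(t)}$$

Here P.V. denotes the Cauchy principal value, and the arctangent is understood as the four-quadrant angle of the point (c(t), d(t)).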

Then the original signal x(t) can be represented as

$$x(t) = \mathrm{Re}\left\{ \sum_{j=1}^{N} a_j(t)\, e^{\,i \int \omega_j(\tau)\, d\tau} \right\} \tag{67.3}$$

where $\omega(t) = \frac{d\theta(t)}{dt}$ is the instantaneous frequency and Re denotes the real part.
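As a rough illustration of (67.2) and of the definition of ω(t), the sketch below builds the analytic signal of one synthetic AM component with a frequency-domain construction, then reads off the instantaneous amplitude, the unwrapped phase, and the instantaneous frequency as a finite difference of the phase. The helper names (`dft`, `idft`, `analytic_signal`, `ia_and_if`) are hypothetical, the naive O(n²) DFT is used only to keep the example dependency-free, and the test signal is an assumption chosen to be periodic in the window so that edge effects do not appear.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short demo signals)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def analytic_signal(c):
    """Return z = c + i*d, where d is the discrete Hilbert transform of c."""
    n = len(c)
    X = dft(c)
    for k in range(1, n):              # bin 0 (DC) is left unchanged
        if n % 2 == 0 and k == n // 2:
            continue                   # keep the Nyquist bin unchanged
        # Double positive-frequency bins, zero negative-frequency bins.
        X[k] = 2 * X[k] if k < (n + 1) // 2 else 0.0
    return idft(X)

def ia_and_if(c, fs):
    """Instantaneous amplitude, unwrapped phase, and instantaneous frequency in Hz."""
    z = analytic_signal(c)
    ia = [abs(v) for v in z]
    theta = [cmath.phase(z[0])]
    for k in range(1, len(z)):
        step = cmath.phase(z[k]) - cmath.phase(z[k - 1])
        step -= 2 * math.pi * round(step / (2 * math.pi))  # unwrap the phase jump
        theta.append(theta[-1] + step)
    # omega = d(theta)/dt, approximated by a forward difference and converted to Hz.
    inst_freq = [(theta[k + 1] - theta[k]) * fs / (2 * math.pi)
                 for k in range(len(theta) - 1)]
    return ia, theta, inst_freq

# Synthetic component (not from the chapter): 20 Hz carrier with a slow 2 Hz tremolo.
fs = 256
t = [n / fs for n in range(fs)]
expected_ia = [1 + 0.3 * math.cos(2 * math.pi * 2 * ti) for ti in t]
c = [a * math.cos(2 * math.pi * 20 * ti) for a, ti in zip(expected_ia, t)]

ia, theta, inst_freq = ia_and_if(c, fs)
rebuilt = [a * math.cos(th) for a, th in zip(ia, theta)]   # Re{a(t) e^{i theta(t)}}

ia_err = max(abs(a - e) for a, e in zip(ia, expected_ia))
if_err = max(abs(f - 20.0) for f in inst_freq)
rebuild_err = max(abs(r - ci) for r, ci in zip(rebuilt, c))
print(ia_err, if_err, rebuild_err)
```

For real recordings the component would be an IMF from the EMD step, and the estimates near the ends of the signal would be less clean than in this periodic toy case.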

Equation (67.3) shows the difference between the HHT and the Discrete Fourier Transform: in the HHT, each component is a sinusoid with time-varying amplitude and time-varying frequency. For brevity, the instantaneous frequency and the instantaneous amplitude will be referred to as "IF" and "IA" in the following text. For each IMF, its corresponding IF and IA can be calculated. Conversely, the IMF can be reconstructed from its IF and IA. In this chapter, the FFT is applied to the IF and IA instead of directly to the IMF. With this approach, fewer FFT coefficients are needed, which yields better synthesis performance.
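The idea of transforming the IF and IA rather than the IMF itself can be sketched on a contrived example. The assumptions here are loud ones: the source does not say how the "main" coefficients are chosen, so this sketch keeps the largest-magnitude DFT bins; the IMF is rebuilt as a(t)·cos(θ(t)) with θ obtained by a running sum of the IF; and the test component is synthetic, chosen so that its IA and IF are spectrally sparse. All function names are hypothetical, and the outcome on real instrument sounds may differ.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short demo signals)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def keep_main(x, k):
    """Zero all but the k largest-magnitude DFT bins of x, then invert.

    ASSUMPTION: "main" means largest magnitude. The real part is returned,
    which also covers the case where a conjugate pair is split by the cut.
    """
    X = dft(x)
    order = sorted(range(len(X)), key=lambda i: abs(X[i]), reverse=True)
    kept = set(order[:k])
    Xk = [X[i] if i in kept else 0.0 for i in range(len(X))]
    return [v.real for v in idft(Xk)]

def rebuild_imf(ia, inst_freq_hz, fs, theta0=0.0):
    """c[n] = a[n] * cos(theta[n]), theta being the running sum of 2*pi*IF/fs."""
    theta, out = theta0, []
    for a, f in zip(ia, inst_freq_hz):
        out.append(a * math.cos(theta))
        theta += 2 * math.pi * f / fs
    return out

def euclid(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Synthetic component: 20 Hz tone with 1 Hz vibrato (+/- 2 Hz) and 2 Hz tremolo.
fs = 256
t = [n / fs for n in range(fs)]
ia = [1 + 0.3 * math.cos(2 * math.pi * 2 * ti) for ti in t]
inst_freq = [20 + 2 * math.cos(2 * math.pi * 1 * ti) for ti in t]
c = rebuild_imf(ia, inst_freq, fs)

# Same budget of 6 coefficients for both approaches.
c_hht = rebuild_imf(keep_main(ia, 3), keep_main(inst_freq, 3), fs)
c_fft = keep_main(c, 6)

err_hht = euclid(c, c_hht)
err_fft = euclid(c, c_fft)
print(err_hht, err_fft)
```

In this toy case the IA and IF each need only three bins, while the frequency-modulated waveform spreads its energy over many sidebands, so the direct truncation leaves a much larger error under the same coefficient budget.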


67.3 Simulation Results

The simulation was implemented in the MATLAB environment. Different sound signals, including piano, trumpet, violin, and bird chirps, were tested. Figure 67.3 shows the EMD analysis of trumpet music (pitch A4, or A440), including the original signal, IMF1–IMF11, and the final residue.

For each IMF, the corresponding IF and IA can be obtained. Figure 67.4 shows the IA analysis for each IMF. In the proposed method, only the first four IMFs (IMF1–IMF4) are considered and the later IMFs (IMF5–IMF11) are omitted. The FFT is applied to the IF and IA of each retained IMF, and the 128 main coefficients are selected for each. Thus, for the first four IMFs, 4 × 2 × 128 = 1,024 FFT coefficients are stored in total. The proposed synthesis method is compared with direct 1,024-point FFT analysis of the original sound to demonstrate its efficiency and feasibility. Table 67.1 shows the synthesis error for different instruments under the same budget of 1,024 coefficients, where the error is measured by the Euclidean distance defined in (67.4). In all four cases, the proposed method yields the smaller error.

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{67.4}$$
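The error measure in (67.4) is straightforward to compute. The following is a small sketch with a hypothetical function name. The source does not state whether the signals were normalized or how long they were before the distance was taken, so no such preprocessing is assumed here.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sample sequences, as in (67.4)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Toy vectors (not data from the chapter): sqrt(0 + 9 + 16) = 5.
d = euclidean_distance([0.0, 3.0, 1.0], [0.0, 0.0, 5.0])
print(d)  # 5.0
```

Here x would be the original sound and y the synthesized one, sample by sample.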

Fig. 67.3 EMD analysis of trumpet music (A4) (panels, top to bottom: signal, imf1 to imf11, res.)


Fig. 67.4 Instantaneous amplitude (IA) analysis for each IMF (panels: imf1 IA to imf8 IA; horizontal axis: Sample, 0 to 10000; vertical axis: Amplitude)

67.4 Conclusion

This chapter presents a music synthesizer based on the HHT. For some advanced model-based approaches, parameter learning can be tedious and time-consuming. Since most practical music sounds are not stationary, especially at the onset of the sound, the conventional Fourier Transform cannot be expected to synthesize them realistically. The HHT is an advanced signal-processing technique for analyzing nonlinear and nonstationary time series data. The signal is first separated into narrow-band components, the IMFs, by performing EMD. The Hilbert transform is then applied to each mode to obtain its instantaneous frequency and amplitude. By extracting the main FFT coefficients of the instantaneous frequency and amplitude of each IMF, the original signal can be reconstructed with small error. Simulation results show the feasibility of the proposed synthesis method. Further improvement is needed for practical applications.

References

1. Bristow-Johnson, R.: Wavetable synthesis 101, a fundamental perspective. In: Proceedings of the 101st Convention of the Audio Engineering Society, Los Angeles (1996)

2. Chowning, J.M.: The synthesis of complex audio spectra by means of frequency modulation. Comput. Music J. 1(2), 46–54 (1977)

3. Drioli, C., Rocchesso, D.: A generalized musical-tone generator with application to sound compression and synthesis. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, vol. 1, pp. 431–434 (1997)

4. Smith, J.O.: Physical modeling using digital waveguides. Comput. Music J. 16(4), 74–87 (1992)

5. Smith, J.O.: Efficient synthesis of stringed musical instruments. In: Proceedings of the 1993 International Computer Music Conference, pp. 64–71, Computer Music Association, Tokyo (1993)

6. Su, A.W.Y., Liang, S.F.: A new automatic IIR analysis/synthesis technique for plucked-string instruments. IEEE Trans. Speech Audio Process. 9(7), 747–754 (2001)

7. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A 454(1971), 903–995 (1998)

8. Gloersen, P., Huang, N.E.: Comparison of interannual intrinsic modes in hemispheric sea ice covers and other geophysical parameters. IEEE Trans. Geosci. Remote Sens. 41(5), 1062–1074 (2003)

Table 67.1 Synthesis error comparison

Sound          The proposed method    Direct FFT
Piano          0.3407                 0.8267
Trumpet        0.5202                 0.8229
Violin         0.2152                 0.8219
Bird chirps    0.0596                 0.2952

