IEICE TRANS. FUNDAMENTALS, VOL.E91–A, NO.1 JANUARY 2008, p. 345

PAPER

Enhanced Vertical Perception through Head-Related Impulse Response Customization Based on Pinna Response Tuning in the Median Plane

Ki Hoon SHIN†a), Nonmember and Youngjin PARK††, Member

SUMMARY The human ability to perceive the elevation of a sound and to distinguish whether a sound is coming from the front or the rear strongly depends on the monaural spectral features of the pinnae. In order to realize an effective virtual auditory display by HRTF (head-related transfer function) customization, the pinna responses were isolated from the median-plane HRIRs (head-related impulse responses) of 45 individual HRIR sets in the CIPIC HRTF database and modeled as linear combinations of 4 or 5 basic temporal shapes (basis functions) at each elevation on the median plane by PCA (principal components analysis) in the time domain. By tuning the weight of each basis function computed for a specific height, replacing the pinna response in the KEMAR HRIR at the same height with the resulting customized pinna response, and listening to the filtered stimuli over headphones, 4 individuals with normal hearing sensitivity were able to create a set of HRIRs that outperformed the KEMAR HRIRs in producing vertical effects with reduced front/back ambiguity in the median plane. Since the monaural spectral features of the pinnae are almost independent of azimuthal variation of the source direction, similar vertical effects could also be generated at different azimuthal directions simply by varying the ITD (interaural time difference) according to the direction as well as the size of each individual's own head.
key words: HRTF customization, HRIR, pinna response tuning, principal components analysis

1. Introduction

The ability of humans to use sonic cues to localize a sound in the surrounding 3-dimensional space is referred to as auditory localization. At its very core lies the head-related transfer function (HRTF), which comprises major cues for spatial hearing such as the ITD (interaural time difference), the ILD (interaural level difference), and the spectral modification induced by the pinna folds. Synthesis of spatial hearing based on HRTFs is of great practical and research importance, and non-individualized HRTFs measured with a dummy head microphone system (the KEMAR for instance) are used for most virtual audio syntheses. However, subjective evaluations of these non-individualized HRTFs involving a group of individuals often report front/back reversal and poor vertical effects.

Both front/back distinction and vertical perception for humans are mainly triggered by the spectral features (peaks and notches) produced by the direction-dependent filtering of the pinna as described by Shaw and Teranishi [1]. In particular, the importance of spectral notches (or nulls) as localization cues in the median plane (0° azimuth) is supported by Blauert [2] and also by Hebrank and Wright [3]. They concluded that elevation in the median plane, where both ITD and ILD are zero, is cued by a spectral notch whose frequency has a dependence on elevation similar to that previously observed by Shaw and Teranishi in the lateral plane. Further results confirmed this conclusion both in the median plane [4] and in the lateral plane [5]. In an attempt to explain such a prominent feature in HRTFs, Lopez-Poveda and Meddis [6] suggested a diffraction/reflection model based on the posterior wall of the human concha and were able to predict the notch frequencies with reasonable accuracy. More recently, Langendijk and Bronkhorst [7] were able to isolate the frequency bands responsible for front/back and up/down cues in human HRTFs via a series of subjective listening tests. They concluded that front/back cues and up/down cues were located mainly in the 8–16-kHz band and in the 6–12-kHz band, respectively. Both bands lie in the spectral region of the pinna response, which generally spans from 2 kHz to above 14 kHz [8]. Individual pinnae vary widely in size and shape, and the artificial set of pinnae mounted on the KEMAR is manufactured based on the average dimensions of human pinna cavities. Therefore, the pinna response of the non-individualized HRTF generally cannot match that of each individual HRTF, resulting in front/back confusion and compromised vertical effects for most listeners.

Manuscript received April 9, 2007.
Manuscript revised June 29, 2007.
†The author is with Samsung Electronics, Suwon-City, 443-742, Republic of Korea.
††The author is with KAIST, Science Town, Daejeon, 305-701, Republic of Korea.
a) E-mail: [email protected]
DOI: 10.1093/ietfec/e91–a.1.345

Based on the hypothesis that the structure of an HRTF is closely related to the dimensions and orientation of each individual body part, i.e. head, torso, shoulders, and pinnae, a variety of HRTF customization techniques that modify other people's HRTFs have been introduced to accomplish perceptual fidelity in virtual audio synthesis. Some studies, such as HRTF clustering and selection of a few most representative ones by Shimada et al. [9], a structural model for composition and decomposition of HRTFs by Algazi et al. [10], HRTF scaling in frequency by Middlebrooks [11], and database matching by Zotkin et al. [12], already suggested that the hypothesis is somewhat valid, although a perfect localization (equivalent to the localization based on the listener's own HRTFs) was never closely achieved. For example, the work of Middlebrooks is based on the idea that the HRTF will be shifted toward the lower frequencies while maintaining its shape when the pinna is scaled up in size. If the listener deduces the source elevation from the positions of peaks and notches in the oncoming sound spectrum, localization with a scaled-up pinna larger than the listener's own pinna will result in a systematic bias in elevation perception, and personalization may be achieved simply by scaling down the HRTF of the scaled-up pinna. However, the pinnae of different individuals differ in many more aspects than just a simple scaling, and a seemingly insignificant change in the shape of the pinna can cause dramatic changes in the HRTF. The database matching technique suggested by Zotkin et al. [12] relies on the HRTF database released by the CIPIC Interface Laboratory at UC Davis, which contains 43 sets of individual HRTFs and 2 sets of KEMAR HRTFs along with some anthropometric information. By taking a picture of the listener's own ear and comparing the anthropometric parameters measured from the image to the ones provided in the database, they selected the best matching set of individual HRTFs for virtual auditory synthesis. Although the localization performance on source elevation was improved by 20–30% for 4 out of 6 subjects, this method requires a sophisticated imaging system that can capture the subject's ear at its real-life size and automatically compute the anthropometric dimensions from the image.

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

In 1984, Morimoto and Aokata [13] introduced the interaural-polar coordinate system and showed that spectral cues similar to those observed in the median plane occur in any sagittal plane. Moreover, Wightman and Kistler [14] conducted a series of experiments in which the produced stimuli contained the ITD signaling one direction and the ILD and pinna cues signaling another direction, through manipulation of the ITD in the measured HRTFs of several individuals. The apparent lateral directions of such stimuli with conflicting cues almost always followed the ITD cue as long as the stimuli included low frequencies. Morimoto et al. [15] proposed a new sound localization method based on [13] that successfully rendered 3-d sound images in a sagittal plane by simulating interaural differences (ITD and ILD) and individual HRTFs measured in the median plane. They further showed that the ITD was dominant in lateral perception by performing localization tests in which either the ITD or the ILD was manipulated separately while the other one was kept at zero.

In this paper, a measurement-free and yet effective HRTF customization method that can be based on any individual HRTF database of substantial size is proposed. The goal of our study is not the retrieval of exact individual HRTFs. Rather, our goal lies in the development of hybrid HRTFs that can deliver the necessary vertical perception better than the non-individual HRTFs while reducing front/back reversal for any particular listener. The basic idea is similar to that suggested in [15]. Vertical perception is controlled by modifying the pinna responses extracted from the median HRIRs in any individual HRTF database that does not contain the HRTF of the target subject, and lateral perception is controlled by introducing the head shadow effect to supply the ILDs and by introducing proper ITDs that are represented as simple linear delays. Justification for approximating the HRTF phase as linear in frequency (that is, as a frequency-independent delay) can be found in the work of Kulkarni et al. [16]. Our method is developed primarily in the time domain because structural decomposition of an HRTF is generally not easy in the frequency domain. An HRIR is a sequence of temporal events of sound waves reaching the ears over multiple paths. Therefore, the pinna response can be easily extracted from an HRIR simply by clipping away the shoulder/torso response and keeping only the early response, since the pinna is located closest to the ear canal. Brown and Duda [17] argued that most pinna activity occurred in the first 0.7 ms after the arrival of the direct pulse by comparing the KEMAR HRIRs measured with pinna to the HRIRs measured without pinna. However, a more detailed comparison of the data presented in their work reveals that the difference is not so prominent after the first 0.2 ms. Examination of the HRIRs from our HRTF database [18] and those from the CIPIC HRTF database [19] also indicates that most pinna activity with the largest intersubject variation is concentrated in the first 0.2 ms, which corresponds to roughly 10 samples at a 44.1-kHz sampling rate (10 samples span about 0.23 ms).
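The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the onset rule (first sample exceeding a fraction of the peak magnitude), the function name `split_hrir`, and the synthetic HRIR are all assumptions made for the example, since the text does not specify how the initial delay is detected.

```python
import numpy as np

FS = 44_100      # sampling rate in Hz, as quoted in the text
PINNA_LEN = 10   # number of samples kept as the "pinna response"

def split_hrir(hrir, onset_ratio=0.1, pinna_len=PINNA_LEN):
    """Split an HRIR into (onset index, pinna response, later response).

    Assumption for this sketch: the direct pulse starts at the first sample
    whose magnitude reaches onset_ratio times the peak magnitude.
    """
    hrir = np.asarray(hrir, dtype=float)
    threshold = onset_ratio * np.max(np.abs(hrir))
    onset = int(np.argmax(np.abs(hrir) >= threshold))  # first index above threshold
    pinna = hrir[onset:onset + pinna_len]              # early response
    tail = hrir[onset + pinna_len:]                    # shoulder/torso and later events
    return onset, pinna, tail

# Synthetic HRIR (made-up data): initial delay, direct pulse, early reflection, late echo.
hrir = np.zeros(200)
hrir[20] = 1.0
hrir[24] = -0.5
hrir[60] = 0.2

onset, pinna, tail = split_hrir(hrir)
duration_ms = 1000.0 * PINNA_LEN / FS  # time span covered by 10 samples
```

With a real database, the threshold would likely need tuning per measurement, and the 10-sample window is taken directly from the text.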

The proposed HRTF customization procedure consists of the following steps (see Fig. 1). First, the temporal pinna responses, each containing exactly 10 samples from the beginning of the direct pulse, are extracted from a group of individual HRIRs measured in the median plane after all initial time delays are removed. Then, principal components analysis (PCA) is performed on the isolated pinna responses at each selected elevation angle to model them as linear combinations of 4 or 5 basis functions (or principal components) by using the covariance method [20]. A graphical user interface (GUI) designed using MATLAB™ allows the subject to tune the pinna response by changing the weight on each basis function and listening, over a set of headphones (Sennheiser HD 250 linear II), to a broadband stimulus (100 Hz–20 kHz) filtered with the resulting pinna response aligned with a shoulder/torso response extracted from the KEMAR HRIR at the same elevation angle. KEMAR's shoulder/torso response at each elevation angle can be obtained simply by clipping away the pinna response and linear delay from the corresponding KEMAR HRIR; this step is indicated by the dashed crosses shown in Fig. 1. Adjustment of the weight on each basis function can continue until a satisfactory elevation perception is achieved. The proposed HRIR customization procedure also includes the steps for introducing the head shadow effect and individualized ITDs to the customized pinna responses, as shown in Fig. 1, for an accurate virtual auditory synthesis in the entire 3-d space around a target listener's head. However, it should be noted that these interaural differences were ignored in this study because we wanted to verify first the effectiveness of the proposed HRIR customization method in rendering enhanced elevation perception and reduced front/back confusion in the median plane only, where all the interaural differences are zero. A total of 4 subjects with normal hearing sensitivity participated in this study. For performance comparison, the individual HRTFs of these 4 participants were measured in the median plane. Subjective listening tests were performed on the customized HRIRs, individual HRIRs, and the KEMAR HRIRs in order to verify the feasibility of the proposed method.

Fig. 1 Outline of procedures for the proposed HRTF customization method.

2. Method

2.1 PCA of Pinna Responses in the Time Domain

A typical HRIR can be decomposed into a series of temporal sound events as shown in Fig. 2. There is first an initial time delay due to the distance of the source with respect to the ears. Then, a direct pulse whose amplitude depends on the source distance and shadow effect arrives, followed by a ridge-trough combination caused by reflection and diffraction due to pinna cavities. The rest of the signal contains reflections from the shoulder, the torso, and measurement devices such as the turntable and the vertical hoop stand that holds the point source at the desired angle. Technically, the direct pulse cannot be part of the pinna response, but the early response that lasts for about 0.2 ms after the arrival of the direct pulse is referred to as the pinna response throughout the rest of this paper for convenience. It should be noted that the individual HRIRs used in our analysis are the ones from the CIPIC HRTF database [19], which contains HRTFs obtained from 43 individual subjects plus the KEMAR with 2 sets of pinnae of different size.

Fig. 2 Structural decomposition of an HRIR measured with a B&K HATS (Head And Torso Simulator) with an acoustic point source located at 0° azimuth and 0° elevation [18].

The procedure of the covariance method [20] used for PCA is as follows. Let X be an M by N data matrix containing the extracted pinna responses at a selected elevation angle, where M is the number of total dimensions (10 in this case) and N is the number of available data sets (45 in this case). The empirical mean of X along each dimension m = 1, . . . , M can be computed from

\[ u[m] = \frac{1}{N} \sum_{n=1}^{N} X[m, n]. \tag{1} \]

The empirical mean of the 45 individual pinna responses measured at 45° elevation is shown for both ears in Fig. 3 as an example. This mean vector u is then subtracted from each column of X to get a mean-subtracted data matrix B:

\[ B = X - u \cdot h \tag{2} \]

where h is a 1 by N row vector of all 1's. The M by M covariance matrix C is obtained from the outer product of B with itself:

\[ C = E[B \otimes B] = \frac{1}{N-1}\, B \cdot B^{*} \tag{3} \]

where * is the conjugate transpose operator.

Fig. 3 Empirical mean of 45 pinna responses per each ear collected from the CIPIC HRIRs measured at 45° elevation.

Next, the eigenvalue matrix D and the orthonormal eigenvector matrix V of the covariance matrix C are computed satisfying the following relationship:

\[ C \cdot V = V \cdot D \tag{4} \]

where D is an M by M diagonal matrix with the eigenvalues of C on the diagonal. Matrices V and D must be rearranged in order of decreasing eigenvalue. The eigenvalues then represent the energy distribution of the data X among the eigenvectors that form a basis for the data. The cumulative energy content g[m] is the sum of the energy content across the eigenvectors from 1 through m:

\[ g[m] = \sum_{q=1}^{m} \lambda_q \tag{5} \]

where λ_q is the qth eigenvalue and m = 1, . . . , M. By choosing a suitable accuracy bound, which was set to be more than 90% of the total energy stored in the original data in our analysis, a subset of the eigenvectors is selected as basis vectors (principal components). The first L columns of V that satisfy the following accuracy bound on the cumulative energy ratio (CER) are chosen as the principal components (PCs):

\[ \mathrm{CER}\ (\%) = \frac{g[L]}{g[M]} \times 100 > 90\%. \tag{6} \]
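Equations (1) through (6) can be followed step by step in NumPy. The sketch below is illustrative only: the data matrix is synthetic (three made-up latent shapes plus noise standing in for the 45 pinna responses), and the function name `pca_covariance` and its `energy_bound` parameter are hypothetical. `numpy.linalg.eigh` returns eigenvalues in ascending order, so the explicit reordering mirrors the rearrangement required in the text.

```python
import numpy as np

def pca_covariance(X, energy_bound=90.0):
    """Covariance-method PCA on an M-by-N data matrix, following Eqs. (1)-(6)."""
    M, N = X.shape
    u = X.mean(axis=1, keepdims=True)        # Eq. (1): empirical mean, shape (M, 1)
    B = X - u @ np.ones((1, N))              # Eq. (2): mean-subtracted data
    C = (B @ B.conj().T) / (N - 1)           # Eq. (3): M-by-M covariance matrix
    eigvals, V = np.linalg.eigh(C)           # Eq. (4): C V = V D (ascending order)
    order = np.argsort(eigvals)[::-1]        # rearrange by decreasing eigenvalue
    eigvals, V = eigvals[order], V[:, order]
    g = np.cumsum(eigvals)                   # Eq. (5): cumulative energy content
    cer = 100.0 * g / g[-1]                  # Eq. (6): cumulative energy ratio in %
    L = int(np.argmax(cer > energy_bound)) + 1  # smallest L with CER > bound
    return u, C, V, eigvals, cer, L

# Synthetic stand-in for the pinna responses (NOT CIPIC data).
rng = np.random.default_rng(0)
M, N = 10, 45
shapes = rng.standard_normal((M, 3))
weights = rng.standard_normal((3, N)) * np.array([[3.0], [2.0], [1.0]])
X = shapes @ weights + 0.05 * rng.standard_normal((M, N))

u, C, V, eigvals, cer, L = pca_covariance(X)
```

Because the synthetic data has only three dominant shapes, the selected L here will be small; on the real pinna responses the text reports L = 4 or 5 depending on elevation.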

The CER computed for the pinna responses at 45° elevation using the above equation with L = 1, . . . , 10 is shown in Fig. 4. It can be seen that at least 5 PCs are required for the modeled data to represent more than 90% of the energy in the original data for both ears. So L = 5 in this case. These 5 PCs obtained for each ear are shown in Fig. 5. Note that the PCs obtained for the left ear pinna responses are almost identical to those obtained for the right ear pinna responses. This was generally the case for other sets of data at different elevation angles. Sometimes the required number of PCs was 4, depending on the elevation angle.

Fig. 4 Cumulative energy ratio (CER in Eq. (6)) plotted with increasing number of PCs for 45° elevation. The number of PCs on the horizontal axis represents L in Eq. (6).

Now let W be an M by L matrix with the L PCs as its column vectors:

\[ W[p, q] = V[p, q] \tag{7} \]

for p = 1, . . . , M and q = 1, . . . , L. A new data matrix Y, which is a transformation of X onto the L principal components, can be obtained simply by

\[ Y = W^{*} \cdot B. \tag{8} \]

This new data matrix Y (an L by N matrix) can then be used to retrieve a truncated version X̃ of the original data by

\[ \tilde{X} = W \cdot Y + u \cdot h. \tag{9} \]
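The projection and reconstruction in Eqs. (7) through (9) can be checked numerically. In this sketch the data are random placeholders rather than measured pinna responses, and the helper name `project_and_reconstruct` is hypothetical. It shows that keeping all M components recovers X to numerical precision, while keeping fewer leaves a residual.

```python
import numpy as np

def project_and_reconstruct(X, L):
    """Return (W, Y, X_tilde) for the first L principal components of X."""
    _, N = X.shape
    u = X.mean(axis=1, keepdims=True)        # Eq. (1)
    h = np.ones((1, N))
    B = X - u @ h                            # Eq. (2)
    C = (B @ B.conj().T) / (N - 1)           # Eq. (3)
    eigvals, V = np.linalg.eigh(C)           # Eq. (4)
    V = V[:, np.argsort(eigvals)[::-1]]      # decreasing-eigenvalue order
    W = V[:, :L]                             # Eq. (7): first L columns of V
    Y = W.conj().T @ B                       # Eq. (8): L-by-N matrix of PCWs
    X_tilde = W @ Y + u @ h                  # Eq. (9): truncated reconstruction
    return W, Y, X_tilde

# Random placeholder data with the same shape as the pinna-response matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 45))

W5, Y5, X5 = project_and_reconstruct(X, L=5)
W10, Y10, X10 = project_and_reconstruct(X, L=10)
err5 = np.linalg.norm(X - X5)     # nonzero: 5 of 10 components were discarded
err10 = np.linalg.norm(X - X10)   # near zero: nothing was discarded
```

Each column of `Y5` holds the 5 weights for one subject, which is the quantity a listener would later adjust by hand.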

In essence, a linear superposition of the L PCs in W with the nth column of Y as a set of L principal component weights (PCWs) approximately recovers the nth column of the original data X. The left and right pinna responses at 45° elevation for subject 50 from the CIPIC HRTF database, along with the approximations computed using Eq. (9), are plotted for comparison in Fig. 6. It can be seen that 5 PCs are enough to recover the original data with close resemblance. The 45 sets of 5 PCWs for the left ear PCs shown in Fig. 5 that are required to model all 45 left ear pinna responses in the CIPIC HRTF database are captured in Fig. 7. Note that the spread of PCWs is the largest for PC 1 and the smallest for PC 5. This is a direct consequence of rearranging V and D (Eq. (4)) in order of decreasing eigenvalue, since a larger eigenvalue implies a larger share of the energy of the original data along the corresponding eigenvector. In other words, the first 2 PCs are more important basis functions than the latter 3 PCs in representing the variation of the original data.

Fig. 5 Five basis functions (principal components: PC1–PC5) of the pinna responses at 45° elevation. The solid lines denote the left ear principal components and the dashed lines denote the right ear principal components.

Fig. 6 Pinna responses at 45° elevation of subject 50 (solid) in the CIPIC HRTF database and their approximations (dashed) computed as a linear combination of the 5 PCs per ear shown in Fig. 5. Left ear responses are plotted in the upper panel and right ear responses in the lower panel.

Fig. 7 Five sets of PCWs required in order to recover the original pinna responses in the CIPIC HRTF database as linear combinations of the five PCs (left ear) depicted in Fig. 5. Note that the distribution of the PCWs becomes smaller as the eigenvalue decreases.

Fig. 8 Left ear pinna responses at 45° elevation of 4 randomly selected subjects from the CIPIC HRTF database.
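The link between eigenvalue order and PCW spread can be made explicit with a short derivation that uses only Eqs. (2), (3), (4), and (8). It is offered as a supporting sketch rather than as part of the original analysis. Each row of B sums to zero by Eq. (2), so each row of Y = W* · B is also zero-mean, and the sample covariance of the PCWs is

```latex
\[
\begin{aligned}
\frac{1}{N-1}\, Y Y^{*}
  &= \frac{1}{N-1}\, W^{*} B B^{*} W
     && \text{by Eq. (8)} \\
  &= W^{*} C\, W
     && \text{by Eq. (3)} \\
  &= \operatorname{diag}(\lambda_1, \ldots, \lambda_L)
     && \text{by Eq. (4), since } V^{*} V = I .
\end{aligned}
\]
```

The sample variance of the qth PCW across subjects is therefore λ_q, and its standard deviation is the square root of λ_q, so the spread necessarily shrinks from PC 1 to PC L once the eigenvalues are sorted in decreasing order.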

The left ear pinna responses of 4 randomly selected individuals at 45° elevation depicted in Fig. 8 show large intersubject variations around 0.08, 0.11, and 0.16 ms. One can easily observe from the left ear PCs in Fig. 5 that the first 3 PCs have ridges at the above temporal positions, indicating that a linear combination of these first 3 PCs with appropriate PCWs can cover most intersubject variation in the shape and amplitude of the ridge-trough pair following the direct pulse. Amplitude variation of the direct pulse can be covered with PC 5 because it has a ridge in the region where the direct pulse is likely to reside. Therefore, by allowing a subject to tune the weight on each PC for customization, one is merely adding a timed ridge-trough pair with adjusted amplitude and an overall level shift to the mean pinna response in Fig. 3.

Fig. 9 Left ear pinna responses of subject 8 (solid), subject 60 (dashed), and subject 153 (dotted) from the CIPIC HRTF database at various elevation angles. The numbers on the right indicate the corresponding angles.

The left ear pinna responses of 3 randomly selected individuals at elevations from −30° through 210° are plotted in Fig. 9 in order to observe the intersubject variation pattern per elevation angle in the median plane. The most common and salient change in the individual pinna responses as the source climbs in elevation lies in the arrival time and level of the first reflection (second ridge) immediately after the direct pulse (first ridge) and also in the shape and duration of the trough that follows. The temporal interval between the arrivals of the direct pulse and the first reflection contracts as the source rises in the frontal hemisphere up to 60°, where the two pulses merge into a single ridge. The two pulses stay merged for all rear source positions. Meanwhile, the width of the following trough decreases as the source rises to 90°, which is directly over the head, and increases again as the source descends in the rear hemisphere. The above phenomenon is similar to that observed by Hiranaka and Yamasaki [21].

After examining many individual pinna responses in the CIPIC HRTF database, we could conclude that most intersubject variation in pinna responses lies in the amplitude and arrival times of either the direct pulse or the ridge-trough pair, depending on the elevation angle of the source. Note that these intersubject variations become quite small as the source moves into the rear hemisphere, especially when the source lies directly behind the listener at 180°. However, it can be shown that even a very small difference in the time domain yields a large difference in the frequency domain.
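The last claim can be illustrated with a deliberately simple model that is an assumption of this sketch and not the paper's model: a direct pulse followed by a single reflection of gain 0.9. For such a two-tap response the lowest spectral notch sits at fs/(2d), where d is the reflection delay in samples, so moving the reflection by one sample (about 23 µs at 44.1 kHz) shifts the notch by well over 1 kHz. The function name `first_notch_hz` and the gain value are hypothetical.

```python
import numpy as np

FS = 44_100  # sampling rate in Hz

def first_notch_hz(delay_samples, gain=0.9, n_fft=4096):
    """Lowest notch frequency of h[n] = delta[n] + gain * delta[n - delay]."""
    h = np.zeros(n_fft)
    h[0] = 1.0                 # direct pulse
    h[delay_samples] = gain    # single reflection (toy model)
    mag = np.abs(np.fft.rfft(h))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / FS)
    # First interior bin that is lower than both neighbours = lowest notch.
    is_min = (mag[1:-1] < mag[:-2]) & (mag[1:-1] < mag[2:])
    k = 1 + int(np.argmax(is_min))
    return freqs[k]

notch_3 = first_notch_hz(3)   # analytic value: 44100 / 6 = 7350 Hz
notch_4 = first_notch_hz(4)   # analytic value: 44100 / 8 = 5512.5 Hz
shift_hz = notch_3 - notch_4  # notch movement caused by a one-sample change
```

The returned frequencies are quantized to the FFT bin spacing (about 10.8 Hz here), so they match the analytic values only to within one bin.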

2.2 PCW Tuning for Customization

As mentioned above, letting a subject tune the weight on each PC brings an actual change in the shape of the pinna response. Four male subjects with normal hearing sensitivity participated in making customized HRTFs by using the GUI (graphical user interface) depicted in Fig. 10. Sectors in the GUI are bounded by boxes and labeled by function in the figure. A subject may choose any elevation angle from −45° to 230° in the median plane since the HRTFs from the CIPIC HRTF database are available in that angular range at intervals of 5.625°. However, customization was only carried out at 9 specific elevation angles from −30° to 210° at 30° intervals in the median plane in order to compare the localization performance of the customized HRTFs to that of the individual HRTFs of the participants measured at those angles. Balance control in the GUI adjusts the gains applied to the left and right channels, since it is necessary to render sound images in the center before the tuning commences and an interaural difference in perceived levels between the left and right ears is quite common even for individuals with normal hearing sensitivity. As mentioned in the previous section, the PCs obtained for the left and right ears turned out to be similar to each other at most elevation angles despite the interaural shape difference in pinna responses for some individuals in the CIPIC HRTF database. As a result, ear symmetry was assumed and customization was performed by tuning the PCWs on only one ear. The slider on each slide-bar on the GUI represents the PCW value for each PC. After an elevation angle at which customization is to be performed is entered, principal components analysis is executed on the isolated pinna responses measured at the specified angle and the corresponding PCs are computed by simply pushing the 'PCA' button. Then, each participant moves the slide-bars to adjust the PCW on each PC and listens to an input stimulus (100 Hz–20 kHz) filtered by the newly created HRIR (marked as 'Custom HRIR' in Fig. 10) by pushing the 'PLAY' button. This 'Custom HRIR' is formed by aligning the pinna response obtained as a linear combination of the tuned PCs to the shoulder/torso response of the KEMAR HRIR measured at the same angle. The 'PLAY KEMAR' button is for listening to the same input stimulus filtered by the KEMAR HRIR. Some listeners may find the vertical perceptions produced by the KEMAR HRIRs 'good' enough, in which case they can tune the PCWs so that the resulting pinna response, shown as a solid line in the top-right panel on the GUI, takes a shape similar to that of the KEMAR's, shown as a dashed line in the same plot, or simply keep the KEMAR HRIR as their customized HRIR at each angle of concern. On the other hand, if the KEMAR HRIR performs poorly in producing the necessary vertical effects, then the tuning can continue until each participant is satisfied with the resulting vertical effect he or she perceives. In our study, all participants reported unsatisfactory vertical perceptions with the KEMAR HRIRs, so the tuning was performed on all target angles. Note that the headphone-pinna coupling effect for each subject was cancelled using the subject's own headphone-to-meatus-entrance transfer function for all the output stimuli produced in the above tuning experiment.

Fig. 10 A MATLAB™ GUI for pinna response customization based on tuning of PCWs (see text for details).
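The formation of the 'Custom HRIR' described above can be sketched as a small function. Everything concrete here is a placeholder: the PCs, the mean pinna response, the KEMAR shoulder/torso tail, and the stimulus are random arrays, and the names `custom_hrir` and `render` are hypothetical. The sketch assumes the pinna response is rebuilt as in Eq. (9) for a single set of weights and then simply concatenated with the tail; headphone equalization, which the text applies to all output stimuli, is left out.

```python
import numpy as np

def custom_hrir(mean_pinna, pcs, pcws, kemar_tail):
    """Build a custom HRIR from tuned PCWs and a KEMAR shoulder/torso tail.

    mean_pinna : (M,) empirical mean pinna response, u in Eq. (1)
    pcs        : (M, L) principal components, W in Eq. (7)
    pcws       : (L,) weights set by the listener (slider positions)
    kemar_tail : KEMAR HRIR with pinna response and initial delay clipped away
    """
    pinna = np.asarray(mean_pinna, dtype=float) + np.asarray(pcs) @ np.asarray(pcws)
    return np.concatenate([pinna, np.asarray(kemar_tail, dtype=float)])

def render(stimulus, hrir):
    """Filter a stimulus with an HRIR by direct convolution."""
    return np.convolve(stimulus, hrir)

# Placeholder data standing in for quantities derived from a real database.
rng = np.random.default_rng(2)
M, L = 10, 5
pcs, _ = np.linalg.qr(rng.standard_normal((M, L)))  # orthonormal columns as stand-in PCs
mean_pinna = rng.standard_normal(M)
kemar_tail = 0.1 * rng.standard_normal(190)
pcws = np.array([0.5, -0.2, 0.1, 0.0, 0.0])         # one hypothetical slider setting

hrir = custom_hrir(mean_pinna, pcs, pcws, kemar_tail)
stimulus = rng.standard_normal(1000)                # stand-in for the broadband stimulus
out = render(stimulus, hrir)
```

Setting all weights to zero reduces the pinna response to the mean response, which gives a natural starting point for tuning. Under the ear-symmetry assumption stated above, the same HRIR would be used for both channels.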

2.3 Individual HRTF Measurement

The individual HRTFs of the four subjects who participated in the above tuning experiment were measured at the elevation angles where the pinna customization took place. Subjects were seated in a chair coupled to a vertical hoop designed to hold an acoustic point source. Details on the measurement apparatus and method can be found in our previous work on modeling the HRTFs for nearby sources [18]. For correct headphone-presented simulation of free-field listening when evaluating the localization capabilities of these individual HRTFs, the headphone-pinna coupling effect was cancelled using the headphone-to-meatus-entrance transfer function measured on each subject according to the method suggested by Wightman and Kistler [22].

While a typical HRTF measurement for an individual is carried out by placing a probe tube in the ear canal at a position very close to the eardrum, this is obviously a very difficult task. Møller, Sorensen, Hammershøi, and Jensen [23] demonstrated that HRTF measurements could also be made by measuring free-field and headphone responses at the entrance of a blocked ear canal. Their technique, however, requires a miniature microphone embedded in an earplug that can be fitted in each subject’s ear canal. Instead of dealing with all the laborious procedures involved in the conventional measurement techniques, we adopted the blocked-meatus measurement technique using a B&K Binaural Microphone Type 4101 mounted inside each subject’s pinna as shown in Fig. 11 for the sake of convenience and efficiency. Although this stethoscope-like microphone set simplifies the overall measurement process considerably, it was difficult to bend the microphone arms so that the microphone tips could be fitted with precision at the ear canal entrance without touching the tragus. Anchoring them in exactly the same positions during measurement was another difficulty we faced. The microphone arms were taped to each subject’s lower cheeks in an effort to anchor the microphone tips, and the subjects were instructed to refrain from making any noticeable movement during the experiment. However, as the evaluation results shown in the subsequent section suggest, we believe that our individual HRTFs contain some errors induced by imprecise positioning of the microphone tips.

Fig. 11 B&K Binaural Microphone Type 4101 (right) for measuring individual HRTFs mounted inside a subject’s pinna at the entrance to the ear canal (left).

3. Subjective Evaluation Results

Subjective listening tests were carried out on all four subjects (ID: SK, HS, KB, and CH) to assess the performance of the three HRIR sets: customized HRIRs, individual HRIRs, and KEMAR HRIRs. To prevent any learning acquired by the subjects during the tuning process from affecting the overall evaluation result, the evaluation experiment was conducted several days after all subjects had completed the tuning. The subjects listened over headphones to broadband stimuli filtered by HRIRs from each of the above three HRIR sets and gave their perceived responses by typing into a GUI designed for the evaluation test. Each of the 9 elevation angles was simulated 10 times in random order, yielding a total of 90 stimuli to evaluate per HRIR set.
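A randomized presentation order of this kind can be generated as in the following minimal sketch. The list of target elevations is an assumption inferred from the angles mentioned in the text (9 angles from −30° to 210° in 30° steps), and the function name and seed are hypothetical.

```python
import random

# Assumed target elevations: 9 angles from -30 to 210 degrees in 30-degree steps.
ELEVATIONS_DEG = list(range(-30, 211, 30))
REPETITIONS = 10

def make_trial_order(elevations, repetitions, seed=None):
    """Return a shuffled list in which every elevation appears `repetitions` times."""
    trials = [angle for angle in elevations for _ in range(repetitions)]
    random.Random(seed).shuffle(trials)
    return trials

trials = make_trial_order(ELEVATIONS_DEG, REPETITIONS, seed=1)
```

One such list would be drawn for each of the three HRIR sets, giving 90 stimuli per set.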

The subjective evaluation results are shown in Figs. 12–15 for all 4 subjects. Evaluations on the KEMAR, individual, and customized HRIRs are displayed in the left, center, and right panels, respectively, in each figure. In each panel, the horizontal axis denotes the actual source positions and the vertical axis denotes the perceived source positions. Note the response frequency scale drawn in a small box in the right panel of Fig. 15. The response frequency is represented by the size of the square, with the largest square indicating 10 identical responses and the smallest square indicating 1 response for each source location. The positive-sloped diagonal line in each panel indicates the perfect hearing condition in which the perceived source position corresponds exactly with the actual source position.

The following observations are based on the evaluation responses presented in Figs. 12–15. All subjects reported difficulties of varying degree in making correct judgments on the source elevation on most trials with the KEMAR HRIRs. Either front/back reversal was frequent (especially for subjects SK and CH), which is evident from the many off-diagonal responses in symmetric positions with respect to the diagonal, or localization performance was low (for all 4 subjects), judging by the large response spread about the diagonal. With individual HRIRs, front/back reversals were reduced for all subjects except for subject HS, who often perceived the frontal sources at −30° and 0° to be in the rear instead. Subject KB made quite a few errors in localizing the rear sources even with his own HRIRs, and the scattered responses produced by subject CH for sources at −30°, 0°, and 30° suggest that he too had difficulty in localizing the frontal sources near the horizontal plane. In general, however, all subjects performed better with their own individual HRIRs than with the KEMAR HRIRs, judging by the tighter distribution of the responses around the diagonal.

Comparison of the response data made with customized HRIRs to those made with the KEMAR HRIRs reveals the following. Front/back reversals were reduced for all subjects with customized HRIRs except for subject HS, who made similar confusion errors for the sources on and below the horizontal plane as he did with his own HRIR set. The localization performance was enhanced for all subjects for most source positions, judging by the smaller spread about the diagonal. Although subject HS’s localization performance with customized HRIRs was poor for sources near the horizontal plane, it was slightly improved for sources positioned at other elevation angles, i.e. from 0° to 150°. Subject KB made poor elevation judgments with customized HRIRs as the source shifted from 90° to 210° into the rear hemisphere, but it should be noted that his localization performance on rear sources was poor with all 3 HRIR sets.

Fig. 12 Subjective evaluation result for subject SK on 3 HRIR sets: KEMAR, individual, and customized HRIRs (Refer to text for detail).

Fig. 13 Subjective evaluation result for subject HS (Refer to text for detail).

Fig. 14 Subjective evaluation result for subject KB (Refer to text for detail).

Fig. 15 Subjective evaluation result for subject CH (Refer to text for detail).

Fig. 16 Illustration for resolving front/back confusions. The confusions are reflected about the vertical plane (horizontal dashed line) onto the correct hemisphere.

Table 1 Localization errors s̄ in Eq. (10) and front/back confusion counts computed by resolution of the responses shown in Figs. 12–15. The letters denote confusion clusters, i.e. C indicates the total confusions, B the backward confusions, and F the forward confusions.

When computing error indices to account for the localization performance associated with a particular set of HRIRs, it has been common practice to treat front/back confusions and localization accuracy separately by resolving the confusions in order to avoid error inflation [24]. On the other hand, resolution of the confusions can be misleading if we assume the responses correctly reflect the subject’s perception. However, since our primary goal was to compare the three HRIR sets in terms of localization performance, we too elected to resolve all apparent confusions and report the incidence of confusions associated with each set of HRIRs. If the angle between the actual source position and the perceived response is made smaller by reflecting the response about the vertical plane passing through the subject’s ears as shown in Fig. 16, the response is entered in reflected form and the confusion count is increased by one. Then, the localization error is computed in the root mean square sense, including both the responses lying in the same hemisphere as the sources and the confusions in reflected form, by the following definition

$$\bar{s} = \left[ \frac{1}{90} \sum_{i=1}^{90} \left( x_i - \phi_{\mathrm{source}}(i) \right)^{2} \right]^{1/2} \tag{10}$$

where $x_i$ is the perceived response for the $i$th stimulus corresponding to the actual source position $\phi_{\mathrm{source}}(i)$ and the number 90 is the total number of presented stimuli per HRIR set. Table 1 lists these RMS errors and the confusion counts organized per subject per HRIR set evaluated. The RMS errors are indicated by the numbers in the top row of each cell and the confusion counts follow in the bottom row in the form: s̄/no. of total confusions (no. of backward confusions + no. of forward confusions).
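The resolution rule and the RMS error of Eq. (10) can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' code: it assumes median-plane angles measured from 0° (front) through 90° (overhead) to 180° (rear), so that reflection about the vertical plane through the ears maps an angle φ to 180° − φ, and it assumes that a "backward" confusion means a frontal source heard in the rear. The function names are hypothetical.

```python
import math

def reflect_front_back(angle_deg):
    """Mirror a median-plane angle about the vertical plane through the ears."""
    return 180.0 - angle_deg

def resolve_and_score(sources_deg, responses_deg):
    """Resolve front/back confusions, then return (rms_error, backward, forward)."""
    backward = forward = 0
    squared_sum = 0.0
    for source, response in zip(sources_deg, responses_deg):
        mirrored = reflect_front_back(response)
        if abs(mirrored - source) < abs(response - source):
            # Assumed labels: "backward" = frontal source heard in the rear.
            if source < 90.0:
                backward += 1
            else:
                forward += 1
            response = mirrored
        squared_sum += (response - source) ** 2
    rms = math.sqrt(squared_sum / len(sources_deg))
    return rms, backward, forward

sources = [0, 30, 150, 60]
responses = [180, 30, 30, 90]
rms, n_backward, n_forward = resolve_and_score(sources, responses)
```

In this toy example the first and third responses are reflected onto the correct hemisphere and counted as one backward and one forward confusion, so only the fourth response (30° off) contributes to the RMS error.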

From the error indices shown in Table 1 we can draw the following conclusions regarding the localization performance associated with each set of HRIRs. Comparison of the localization errors produced with the KEMAR HRIRs to those with the customized HRIRs reveals that the localization accuracy was improved considerably with the customized HRIRs for subjects KB and CH, whereas subjects SK and HS showed slightly better accuracy with the KEMAR HRIRs. This is a direct result of resolving the confusions, because the unresolved responses for subjects SK and HS in Figs. 12 and 13 suggest otherwise. With the customized HRIRs, front/back confusions were reduced for all subjects, and subjects SK and CH in particular showed dramatic improvements, i.e., the confusion counts went from 29 to 9 for SK and from 43 to 6 for CH. In contrast, the localization performance with individual HRIRs was not quite satisfactory for all subjects. Individual HRIRs are generally known to produce good localization results, but past studies such as the one by Wightman and Kistler [24] show that headphone simulation of free-field listening tends to produce more frequent front/back confusions and less well-defined source elevation than the free-field condition. With individual HRIRs, subjects HS and KB produced the best overall localization accuracy, and subject KB’s front/back confusions were the fewest of all three HRIR cases. On the other hand, the localization performance indices for the customized and individual HRIRs indicate that subjects SK and CH showed better localization accuracy and subjects HS and CH produced fewer confusions with the customized HRIRs than with individual HRIRs. In short, with the customized HRIRs most subjects produced fewer confusions, and 2 out of 4 subjects (SK and CH) performed best in terms of both localization accuracy and front/back confusion.

Fig. 17 KEMAR (solid), individual (dashed), and customized (dotted) HRIRs (left) and the corresponding HRTFs (right) for subject SK.

Fig. 18 KEMAR (solid), individual (dashed), and customized (dotted) HRIRs (left) and the corresponding HRTFs (right) for subject CH.


The customized and individual HRIRs for subjects SK and CH, along with the KEMAR HRIRs and the corresponding HRTFs, which are direct Fourier transforms of the temporal responses, are depicted in Figs. 17 and 18 as examples. These plots immediately reveal that most spectral deviations among the HRTFs take place in the high frequency region and that the differences between the KEMAR and customized HRTFs mostly occur in the region above 6 kHz, which is a direct consequence of the pinna response modification by tuning. It is also clear that even a small variation in the time response produces a substantial difference in the frequency response. In our study, we had hoped to find some similarity between the customized and individual responses in both the temporal and spectral shapes, because in theory the two sets of responses are supposed to capture and reflect the individual pinna features better than the KEMAR HRIRs if the tuning worked well, as it did for these two subjects in particular. Unfortunately, as was expected during the measurement phase of our study and also from the analysis of the evaluation results, there was little or no similarity between the customized and individual HRTFs. The spectral notches and roll-offs that are known to be responsible for elevation perception hardly coincide except in a few spectral regions, namely notches at 7 kHz at 210°, roll-offs at 10 kHz at 150°, notches at 16 kHz at 120°, and notches at 11.3 kHz at 90° for subject SK in Fig. 17. Although the localization performances of subjects SK and CH using their own HRIRs were passable considering the results of headphone simulation of the free-field condition achieved by others in the past, we believe that the individual HRIRs measured in this study contain errors, probably induced by imprecise positioning of the microphone tips at the ear canal entrance as mentioned earlier. As a result, we cannot confirm at this point whether the spectral features in the HRTFs obtained by the proposed customization method indeed represent each individual’s pinna characteristics, even though they have been shown to improve the localization performance.
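The sensitivity of the spectrum to small temporal changes can be illustrated with a toy response. The sketch below is not taken from the measured data: it models a response as a direct impulse plus a single reflection (an assumption made purely for illustration) and uses the CIPIC sampling rate of 44.1 kHz and HRIR length of 200 samples, which come from the database description rather than from this section. Moving the reflection by one sample shifts the first spectral notch by several kilohertz.

```python
import numpy as np

FS = 44100      # CIPIC sampling rate in Hz
N = 200         # CIPIC HRIR length in samples

def magnitude_db(hrir, n_fft=4096):
    """Return (frequencies in Hz, magnitude in dB) of a zero-padded HRIR."""
    spectrum = np.fft.rfft(hrir, n=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / FS)
    return freqs, 20.0 * np.log10(np.abs(spectrum) + 1e-12)

def toy_response(delay, gain=0.9):
    """Direct impulse plus one reflection `delay` samples later."""
    h = np.zeros(N)
    h[0] = 1.0
    h[delay] = gain
    return h

def first_notch_hz(hrir):
    """Frequency of the deepest spectral minimum below 12 kHz."""
    freqs, mag = magnitude_db(hrir)
    band = freqs < 12000.0
    return freqs[band][np.argmin(mag[band])]

notch_3 = first_notch_hz(toy_response(3))   # near FS / (2 * 3) = 7350 Hz
notch_2 = first_notch_hz(toy_response(2))   # near FS / (2 * 2) = 11025 Hz
```

A one-sample change in the reflection delay (about 23 µs at this sampling rate) therefore moves the notch from roughly 7 kHz to roughly 11 kHz, which is consistent with the observation that small temporal differences lead to large spectral differences above 6 kHz.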

4. Discussion and Future Work

The proposed HRIR customization method, based on tuning the basis functions obtained by PCA decomposition of the pinna responses in the time domain, was shown to be effective in producing the necessary vertical effects while reducing front/back reversals. We confirmed this by a series of subjective listening tests. With the customized HRIRs in comparison to the KEMAR HRIRs, 2 out of 4 subjects showed explicit improvements with a noticeable decrease in front/back reversals, while the other 2 subjects demonstrated enhanced elevation perception to some degree. All subjects reported that the sources at 60°, 90°, and 120° in elevation angle were among the toughest to discriminate from one another for both individual and customized HRIRs and that they had to guess the source elevation on most trials with the KEMAR HRIRs. We also verified that similar vertical effects could be generated at other azimuthal directions simply by adding proper ITDs to the customized HRIRs developed using the proposed method. The localization performance in other sagittal planes along with detailed analysis will follow in a subsequent paper.
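Adding an ITD to a customized median-plane HRIR can be sketched as below. The text does not state how the ITD was computed, so the use of Woodworth's spherical-head formula, the default head radius of 8.75 cm, the integer-sample delay, and the sign convention for azimuth are all assumptions made for this example; the function names are hypothetical.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds from Woodworth's spherical-head formula (|azimuth| <= 90 deg)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

def apply_itd(hrir, azimuth_deg, fs=44100, head_radius_m=0.0875):
    """Return (left, right) HRIRs; the ear farther from the source is delayed."""
    itd = woodworth_itd(abs(azimuth_deg), head_radius_m)
    delay = round(itd * fs)
    delayed = [0.0] * delay + list(hrir)
    padded = list(hrir) + [0.0] * delay
    # Assumed convention: positive azimuth = source to the right,
    # so the left ear receives the sound later.
    if azimuth_deg >= 0:
        return delayed, padded
    return padded, delayed

left, right = apply_itd([1.0, 0.5, 0.25], azimuth_deg=30)
```

Because ear symmetry was assumed during tuning, the same customized response is used for both ears here and only the relative delay changes with azimuth; the head radius parameter is where an individual's head size would enter.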

Acknowledgments

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the National Research Laboratory Program (M10500000112-05J0000-11210) and the BK 21 Project (2006) of Republic of Korea.

References

[1] E.A.G. Shaw and R. Teranishi, “Sound pressure generated in an external-ear replica and real human ears by a nearby point source,” J. Acoust. Soc. Am., vol.44, pp.240–249, 1968.

[2] J. Blauert, “Sound localization in the median plane,” Acustica, vol.22, pp.205–213, 1969/1970.

[3] J. Hebrank and D. Wright, “Spectral cues used in the localization of sound sources on the median plane,” J. Acoust. Soc. Am., vol.56, pp.1829–1834, 1974.

[4] R.A. Butler and K. Belendiuk, “Spectral cues utilized in the localization of sound in the median sagittal plane,” J. Acoust. Soc. Am., vol.61, pp.1264–1269, 1977.

[5] P.J. Bloom, “Determination of monaural sensitivity changes due to the pinna by use of minimum-audible-field measurements in the lateral vertical plane,” J. Acoust. Soc. Am., vol.61, pp.820–828, 1977.

[6] E.A. Lopez-Poveda and R. Meddis, “A physical model of sound diffraction and reflections in the human concha,” J. Acoust. Soc. Am., vol.100, pp.3248–3259, 1996.

[7] E.H.A. Langendijk and A.W. Bronkhorst, “Contribution of spectral cues to human sound localization,” J. Acoust. Soc. Am., vol.112, pp.1583–1596, 2002.

[8] H.W. Gierlich, “The application of binaural technology,” Applied Acoustics, vol.36, pp.219–243, 1992.

[9] S. Shimada, M. Hayashi, and S. Hayashi, “A clustering method for sound localization transfer functions,” J. Audio Eng. Soc., vol.42, pp.577–584, 1994.

[10] V.R. Algazi, R.O. Duda, R.P. Morrison, and D.M. Thompson, “Structural composition and decomposition of HRTFs,” Proc. WASPAA01, pp.103–106, New Paltz, NY, 2001.

[11] J.C. Middlebrooks, “Virtual localization improved by scaling non-individualized external-ear transfer functions in frequency,” J. Acoust. Soc. Am., vol.106, pp.1493–1510, 1999.

[12] D.N. Zotkin, R. Duraiswami, and L.S. Davis, “Customizable auditory displays,” Proc. Int. Conf. on Auditory Display (ICAD), pp.167–176, Kyoto, Japan, 2002.

[13] M. Morimoto and H. Aokata, “Localization cues of sound sources in the upper hemisphere,” J. Acoust. Soc. Jpn. (E), vol.5, pp.165–173, 1984.

[14] F.L. Wightman and D.J. Kistler, “The dominant role of low-frequency interaural time differences in sound localization,” J. Acoust. Soc. Am., vol.91, pp.1648–1661, 1991.

[15] M. Morimoto, M. Itoh, and K. Iida, “3-D sound image localization by interaural differences and the median plane HRTF,” Proc. 2002 Int. Conf. on Auditory Display (ICAD), Kyoto, Japan, July 2002.

[16] A. Kulkarni, S.K. Isabelle, and H.S. Colburn, “Sensitivity of human subjects to head-related transfer-function phase spectra,” J. Acoust. Soc. Am., vol.105, pp.2821–2840, 1999.

[17] C.P. Brown and R.O. Duda, “A structural model for binaural sound synthesis,” IEEE Trans. Speech Audio Process., vol.6, no.5, pp.476–488, 1998.

[18] K. Shin and Y. Park, “Modeling of non-individualized head-related transfer functions for nearby sources,” Proc. 9th Western Pacific Acoustics Conf. (WESPAC), pp.164–172, Seoul, Korea, June 2006.

[19] CIPIC HRTF Database Files, Release 1.1, August 2001, CIPIC Interface Laboratory, U.C. Davis, available from http://interface.cipic.ucdavis.edu/

[20] J.E. Jackson, A User’s Guide to Principal Components, pp.1–25, John Wiley & Sons, 1991.

[21] Y. Hiranaka and H. Yamasaki, “Envelope representation of pinna impulse responses relating to three-dimensional localization of sound sources,” J. Acoust. Soc. Am., vol.73, pp.291–296, 1983.

[22] F.L. Wightman and D.J. Kistler, “Headphone simulation of free-field listening. I: Stimulus synthesis,” J. Acoust. Soc. Am., vol.85, pp.858–867, 1989.

[23] H. Møller, M.F. Sorensen, D. Hammershøi, and C.B. Jensen, “Head-related transfer functions of human subjects,” J. Audio Eng. Soc., vol.43, pp.300–321, 1995.

[24] F.L. Wightman and D.J. Kistler, “Headphone simulation of free-field listening. II: Psychophysical validation,” J. Acoust. Soc. Am., vol.85, pp.868–878, 1989.

Ki Hoon Shin was born in Seoul, Korea in 1974. He received his B.S. and M.S. degrees in mechanical engineering from University of Rochester, NY, in 1996 and 1998, respectively. From 1998 to 2000, he was enrolled in a Ph.D. program in aerospace engineering at Georgia Tech, GA. From 2001, he was engaged in research on virtual audio synthesis for a Ph.D. in mechanical engineering at Korea Advanced Institute of Science and Technology (KAIST). He is now at the Digital Media R&D Center of Samsung Electronics developing audio algorithms for DTVs and home theatres.

Youngjin Park was born in Seoul, Korea in 1957. He received his B.S. and M.S. degrees in mechanical engineering from Seoul National University in 1980 and 1982, respectively, and the Ph.D. in mechanical engineering from University of Michigan, MI, in 1987. From 1987 to 1988, he worked as a research fellow at University of Michigan. He also worked as an assistant professor at NJIT, NJ, from 1988 to 1990. He joined the faculty of Korea Advanced Institute of Science and Technology (KAIST) in 1990, where he is a Professor of Mechanical Engineering. His research interests include general control theories, virtual audio synthesis, active control of noise and vibration, system identification, etc.