
Prospects for Transaural Recording*

DUANE H. COOPER AND JERALD L. BAUCK

University of Illinois, Urbana, IL 61801, USA

Transaural stereo, generic for binaural stereo processed for cancellation of loudspeaker-to-ear crosstalk, results from the use of minimum-phase filters in shuffler configuration. Simplifying the filters further at short wavelengths makes the listener position noncritical. Full spatial qualities appear in a conventional stereo playback that avoids early reflections. Inverse shufflers provide precise transaural pan functions for multitrack work.

0 INTRODUCTION

Transaural stereo (generic term) is a stereo-system plan that, like binaural stereo, takes the end point of the recording-reproducing chain to be the actual sounds at the ears. It contrasts with the taking of loudspeaker sounds as the end point, which is necessarily the plan of conventional stereo. It differs from binaural in that the sounds for each ear, rather than being supplied by direct signal chains ending at earphones, result indirectly, instead, from the preparation of structured composite signals to be supplied to the loudspeakers.

0.1 Crosstalk Cancellation

The composite-signal structure is subsequently inverted (decomposition) in the intervening loudspeaker-to-ear transmission to produce the intended sounds at the ears. On the way to the ears, in addition to the direct transmission, left to left and right to right, there occur the cross transmissions of left to right and right to left. The latter are traditionally called crosstalk (from telephony), and the composition-decomposition scheme cited is a nonadaptive precancellation of crosstalk. It consists of the "planting" of a crosstalk process, in advance, that is devised to be the inverse of the acoustic crosstalk expected to occur subsequently. When properly done, the net result is the elimination of all evidence of crosstalk.

0.2 Recording Binaural Signals

Signals representing ear sounds may be recorded (binaural recording), in advance of crosstalk cancellation, by two pickup methods. One uses microphones fitted in the ears of an artificial head. The other uses free-space microphones whose signals have been processed to simulate transmissions around an acoustic obstacle (human head) to specific points on the obstacle (ears).

* Presented at the 85th Convention of the Audio Engineering Society, Los Angeles, 1988 November 3-6.

J. Audio Eng. Soc., Vol. 37, No. 1/2, 1989 January/February

The second of these pickup methods, including its source-to-ear processing, is known as binaural synthesis, and it may include the processing of as many different microphone signals as may be suitable for a given project. It may also include reverberant-field synthesis as needed. The correspondence with multitrack stereo synthesis is notable: pan functions replaced by binaural simulation for specific imaging directions and reverberation units replaced by simulation of spatial (binaural space) reverberation, such as being developed by Kendall et al. [1]. After the completion of all binaural processing, crosstalk canceling is the means of producing the master transaural recording.

For concert-hall recording, an artificial head would be used, and it and the orchestra would be deployed for optimal pickup. Under ideal conditions this may suffice. However, further microphone deployments may be considered to represent early reflection and late reflection hall-sound pickup. The signals from these latter would be delayed and subjected to binaural synthesis needed to produce the decorrelated ear sounds deemed suitable for hall-sound representation. The final step in the production is conversion to transaural.

0.3 Transaural Options

Some recording engineers may wish to use only a part of the transaural technology. In multitrack work, for example, it might be decided that only a few of the tracks require the precise imaging of binaural synthesis, or that only a portion of the performing ensemble requires the spatial delineation available through artificial-head pickup. Such artistic decisions remain, of course, with the producing authority, and it is the responsibility of the engineer to provide incisive imaging, to the extent possible, where desired. Transaural technology may be viewed as providing improved options for that purpose, not necessarily a whole new recording style.

A better choice for incisive imaging, however, cannot be made. In a previous paper, Cooper, using calculations from Bauck's thesis [2], showed [3, Fig. 8] the required loudspeaker-signal specifications for two examples of imaging. None of the conventional stereo methods produces signals that in any way resemble these specifications, except at low frequencies. Conventional stereo has not sought to devise loudspeaker signals to meet imaging-signal specifications at the ears, as was required in these calculations, except in the low-frequency work of Blumlein [4]. Specifically, none of the existing pan-pot formulas meet these specifications, nor do any of the stereo microphone arrays, whether coincident or spaced, whether using directional elements or not.

Some recording engineers, seeking a spacious effect, use widely spaced microphones in a concert-hall setting. It is known, of course, that the signals so obtained are highly decorrelated, and it is also a known fact, in concert-hall acoustics, that highly decorrelated ear sounds are identified with spacious acoustic impressions. Unfortunately, the interaural correlation will always be greater than the correlation at the loudspeakers, because of crosstalk. The net result is that the spacious effect is perceived as confined to an "acoustic stage," as in a different space from that of the listener. An important aspect of the concert-hall experience is lost.

The use of widely spaced microphones with binaural synthesis and suitable delay, however, will give the recording engineer much greater control over the representation of the sound of the hall. Thus many more venues may be exploited to advantage. At the same time, a full spatial envelopment of the listener can be provided to the extent desired. Many recording engineers will discover, also, that imaging and spaciousness are not mutually exclusive, but, as has long been known in concert-hall acoustics, belong together. Placing them together is natural in transaural technology.

At first the recording engineer will want to try only the simplest things from transaural technology. Indeed, it is likely that only the simpler equipment will become available at first. Existing techniques will necessarily continue to be used, and the improvements of transaural technology will, in some instances, be adapted to that. For reviews of existing techniques, the writings of Eargle may be consulted [5]. The evolution of such techniques to suit a binaural style of recording is not amenable to detailed prediction, and will not be attempted here. It is possible, however, to sketch a catalog of specific kinds of transaural-related equipment, the development of which may be foreseen. Some of these items are discussed in a later section.


0.4 Binaural Monitoring

Prospects for binaural monitoring apparently have advanced substantially in the recent decade. It had long been the experience that earphones characteristically produced "in-the-head" (interior) sounds and, with binaural material, sounds that were much more vulnerable to front-back bias than is the case with natural hearing. The problem has been traced to a disturbance in the conch resonance of the human pinna. The conch is the principal cavity in the pinna, and its resonance involves its acoustic near field even at some distance from the ear. Disturbance of this near field causes an "at-the-ear" judgment, as may be easily demonstrated by placing a hand near the ear. Earphones are ordinarily placed near enough to disturb this resonance (besides possibly deforming the pinnas). Equalization to restore the resonance restores natural, exterior hearing. A complication is that a significant part of the resonance effect varies with direction, so that a direction assignment for equalization seems necessary.

A way to avoid a directional assignment has been sought via the use of a diffuse-field reference [6]. An alternate approach uses a frontal-incidence plane wave (free field) as the reference. The argument for the latter is that it avoids a front-back bias while not impairing back localizations. In a later section we find evidence to support a variation in this free-field approach.

These issues take on a sharper focus if related to the style of equalization used for the artificial head. Obviously, a free-field equalization for the head mandates the corresponding free-field equalization for the earphones to be used with that head. In this case, a large part of the conch resonance is removed in the head equalization (a use of natural ear molds in artificial heads accounts for this resonance being modeled, although modeling of canal resonance has long been omitted) and then restored in the earphone equalization. Presumably a similar rationale supports diffuse-field equalization, both for the head [7] and for the earphones. We are unable to report any complete experience with diffuse-field equalization, but we can report remarkably good experiences with free-field equalization.

The recording engineer should understand that the equalizations discussed here and below are not matters to be accommodated with the EQ facilities on a mixing board. It is appropriate to regard these design equalization requirements as to be met internally, to be inherent characteristics of the device or of an accessory specific to the device. In the same way, the sometimes strenuous equalizations undertaken in some highly valued microphones are of concern to the design engineer, not the recording engineer. It is sufficient if the recording engineer deems the overall characteristic as apt for his or her needs.

The advent of binaural monitoring will prove to be a substantial convenience in comparison to loudspeaker monitoring, especially for location work or other situations in which access to a proper listening room is inconvenient. Transaural monitoring (with loudspeakers) can, of course, be made available as needed.


0.5 Beginnings of Transaural Recording

Transaural stereo had its first trials in 1962 by B. S. Atal and M. R. Schroeder [8]-[12]. They used a powerful (for its day) mainframe computer, an IBM 7090, to perform digital finite-impulse response (FIR) filtering for crosstalk "planting" combined with equalization, using functions derived from testing an artificial head. They also saw binaural synthesis as a related process, and they were striving for a spatial synthesis of reverberation. Their transaural trials, however, were designed to reproduce the actual sounds of known concert halls and were based on binaural recordings made in those halls. In those days, earphone listening produced only interior sounds, so that transaural conversion was mandatory for their purposes. The Atal-Schroeder results were described [12] as "nothing less than amazing." The listener experienced authentic, exterior, spatial envelopment as well as authoritative imaging to the front and sides, in elevation, and even behind.

Unfortunately the reports of this work left lasting impressions of a heroic technology producing fragile results: the listening space had to be anechoic, and the listener could not move by more than about 10° or 3 in (75 mm) without spoiling the effect. Later work by Damaske [13], with "90° crosstalk filters," a code-word designation, did little to dispel these disheartening impressions. He found that reverberant listening spaces degraded the effect, damaging side imaging and causing front-back ambiguity. Other work over the past quarter century, including the Q-Biphonic development [14], has not significantly advanced the technology nor changed overall impressions of dim prospects for transaural recording.

0.6 Present Prospects

Brightened prospects are suggested by our work, reported herewith. By casting the crosstalk-canceling filters in shuffler form, we are able to greatly simplify the technology: a handful of operational-amplifier chips, or the equivalent on a digital signal-processing chip, suffice. This economy (not having to use FIR filters) is a consequence of our discovery that the shuffler consists entirely of minimum-phase filters. The simplification also reveals a structure that allows secure control over the design of equalization as independent of the crosstalk canceling. Thus we are able to simplify the crosstalk function, more particularly at short wavelengths, to make the effect of cancellation quite tolerant of listener movement.

Listeners find that a 30° head rotation produces a benign, albeit noticeable to some, change in auditory perspective. Imaging at 90° is less tolerant. Comparable effects are noticed for lateral movement over a range equal to the loudspeaker spacing, but there is more tolerance for forward-backward motion. We have no data for transaural systems designed for a wider loudspeaker spacing, and we are not entirely satisfied with explanations we offer in Sec. 1.7. Perhaps some credit is owed to good equalization.



The significance of equalization has become clear to us through our experiences with recordings made with a Neumann KU-80 head, which is equalized to provide a correct ear-canal entrance signal, and with the Aachen head (Aachener Kopf, or AK) devised by Gierlich and Genuit [15], which is equalized for a flat free-field response for frontal incidence. Recordings from the KU-80 are unfit for loudspeaker playback (the KU-81 should do better), showing a poor stereo effect with a very large "hole in the middle," while the same playback from the AK shows a stereo excellence unattainable by any other known stereo array.

With equalizations as given, crosstalk cancellation, besides revealing the qualities noted by Atal and Schroeder, actually "covers" the hole in the KU-80 presentation, while for the AK, it corrects image placements, sharpens the images, extends the range of image placement, removes front-back ambiguity, extends the perception of depth, and completes the spatial envelopment. Thus the equalization we would have used (see explanation in Sec. 1.5), based on free-field incidence at 30° and being nearly the same as in the AK, was confirmed in its correctness.

As a result of our experience with actual trials of differing equalizations in different rooms, we are in a position to be more precise than Damaske could be, about the significance of listening-space acoustics. It is, in fact, a misunderstanding that an anechoic space need be used. Atal and Schroeder regarded listener-space reverberation to be a contaminant in their studies of concert-hall sound, and they wished to exclude it. Specifically, we did identify one minor aspect of listener-space acoustics, one easily avoided, that accounts for the effects noted by Damaske.

The integrity of the crosstalk paths from loudspeakers to ears can be compromised by competing reflected paths that differ in delay from the primary paths by amounts of less than 1 (or perhaps 2) ms. Substantial contributions from such paths can begin to impair side imaging and allow some appearance of front-back ambiguity. Ordinary care taken in the setup to avoid significant early-reflection paths obviates any deleterious effects. Longer delayed reflections merely appear as "early" reflections in the concert-hall sense. These are attributed by the listener to the performing space, usually as minor augmentations in its reverberance. Ensuring good equalization guarantees that if a user is so careless with the setup as to allow early reflections, the playback of a transaural recording will exhibit a gradual degradation from a quality that is "nothing less than amazing" to one that is at least "excellent."
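As a rough illustration of the 1 to 2 ms window mentioned above, a delay difference can be converted into an extra path length. The sketch below assumes a speed of sound of 343 m/s (roughly room-temperature air); that value and the helper name `extra_path_m` are illustrative assumptions, not figures taken from the paper.

```python
# Assumed speed of sound in air (about 20 degrees C); not a value from the paper.
SPEED_OF_SOUND_M_PER_S = 343.0


def extra_path_m(delay_ms: float) -> float:
    """Extra path length (metres) of a reflection arriving delay_ms after the direct sound."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0


# Path-length excess corresponding to the 1 ms and 2 ms limits quoted in the text.
limit_1ms = extra_path_m(1.0)
limit_2ms = extra_path_m(2.0)
```

Under that assumption, a reflected path that exceeds the direct path by less than roughly a third of a metre falls inside the 1 ms window, which suggests why nearby surfaces beside the loudspeakers or listener are the ones to watch.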

0.7 Summary

The principal purpose of this paper is to report on improvements we have discovered in a particular signal-processing scheme, the crosstalk-canceling scheme of Atal and Schroeder. These improvements, which are largely practical, offer the possibility of a significant restructuring of stereo recording to make for extraordinary improvements in stereo quality.


We have cited economies in processing following from the discovery that minimum-phase filters may be used exclusively, have cited a revealed structure that allows the shaping of crosstalk filters independently of equalization filters, and have cited bases for robust results, in contrast to a previously supposed fragility. This robustness consists of a tolerance for listener movement and a tolerance for nonanechoic listening-space acoustics.

In the following sections we review the Atal-Schroeder scheme and show how its lattice-arrayed filters may be seen to be equivalent to a shuffler array, develop the corresponding formulas, and illustrate these with plots based on the spherical-model head. We cast the shuffler functions in a form that exhibits a factoring into an equalization part and a crosstalk-canceling part, and we illustrate the significance of these with plots taken from old data for the so-called CBS-NASA head [16].

In so doing, we point out that crosstalk canceling is a process that is an inverse of the process we have called binaural synthesis, and we provide a block diagram of a multiple-input binaural synthesizer.

Finally, we turn to aspects of transaural technology that are less related to recording. We introduce the concept of virtual loudspeakers, whereby a given pair of actual loudspeakers may be replaced by a number of virtual loudspeakers at arbitrary positions. This may be used to solve the problem of too closely spaced loudspeakers in stereo television, for example. It also may be used to present cinema-surround stereo via only two loudspeakers without loss of surround effect, or to similarly present full-sphere ambisonic surround. Also, we can resurrect a contribution by Bauer [17] to provide binaural-like listening to stereo material, making for inexpensive, accurate "Bauer boxes."

1 CROSSTALK-CANCELING FILTERS

1.1 Atal-Schroeder Filters

The Atal-Schroeder crosstalk canceler is shown in Fig. 1(a), adapted from [12]. In Schroeder's notation, S represents the transfer function from a source (loudspeaker) to a same-side (ipsilateral) ear, while A is the transfer function to an alternate-side (contralateral) ear. The acoustic layout is symmetric, so that the transfer functions from the LF loudspeaker to the ears equal those for the RF loudspeaker. The notation C = -A/S is used for the filter in the cross path. Elementary algebra may be used [11] to show that a signal introduced at the top left does indeed appear, unchanged and uncontaminated, at the left ear of the listener, and so on.

Schroeder also treats the requirements of causality (realizability) [11]. It is clear from first principles that A involves a greater delay than does S, so that C is causal. This is also seen from Fig. 1(b), adapted from Møller [18], which shows a plot of the impulse response c(τ) for the cross filter C(ω), as determined for the Neumann KU-80i head at 45° incidence. Thus C is realizable as an FIR digital filter, or a transversal analog filter, and so also for C². Then 1/(1 - C²) is realized by placing the C² filter in a recursive loop. The terminal filter 1/S is not causal on its face, but with its impulse response padded with sufficient delay, the same in both channels, causal representations are obtained. These realizations were signal-processing routines in an IBM 7090 computer.

The impulse response plotted in Fig. 1(b) is of short duration, which shows that crosstalk cancellation is speedily completed, requiring the listening space to be anechoic for only the first few milliseconds. This is equivalent to the finding, stated in the Introduction, that it is sufficient, in the listening setup, to exclude early reflection paths.

The brevity of this impulse response bears also on questions of equalization style, as will be seen later.

1.2 Shuffler Filters

The Atal-Schroeder scheme may be seen to be equivalent to the lattice arrangement of filters shown in Fig. 2, provided that the filter in the cross path is

$$A' = \frac{-A}{S^2 - A^2} \tag{1a}$$

and that the one in the same-side path is

$$S' = \frac{S}{S^2 - A^2} \tag{1b}$$

These may be seen to be the matrix elements (S' on the diagonal, A' on the counterdiagonal) of the 2 × 2 matrix that is inverse to the acoustic matrix evident from Fig. 1.

The shuffler arrangement of filters, also shown in Fig. 2, may be seen to be equivalent to the lattice. There the filter for the sum of inputs with both parts positive is denoted by P', while the filter for the difference (sum with one input negative) has been denoted by N'. Equivalence demands that

$$N' = \frac{S' - A'}{2} \tag{2a}$$

and

$$P' = \frac{S' + A'}{2} \tag{2b}$$

Division by 2 would be omitted for difference-sum networks designed with uniform 3-dB losses so that, without loss of generality, we write

$$N' = \frac{1}{N} = \frac{1}{S - A} \tag{3a}$$

and

$$P' = \frac{1}{P} = \frac{1}{S + A} \tag{3b}$$

Thus the matrix of the shuffler transfer functions, diagonal with elements N' and P', is the inverse of the diagonal acoustic matrix for difference-and-sum ear sounds with elements N = S - A and P = S + A.

1.3 Minimum-Phase Characteristics

In 1977 Mehrgardt and Mellert [19] showed experimentally that the head-related transfer functions are of minimum phase to within a frequency-independent delay, a delay that is incident-angle dependent. They proceeded via the Hilbert transform of the log-magnitude response to calculate the minimum-phase part of the phase response. The remainder, or excess-phase part, was then identified as being of frequency-independent slope. The result was experimental in that measured, and smoothed, response functions were used in the calculations. Thus, S and A have excess-phase parts that differ in the amount of frequency-independent delay. Considering S alone, however, Schroeder found the delay to be ignorable for the purposes of constructing 1/S, as we have seen.

To discuss pairs of filter functions, we introduce the concept of joint minimum phase. To be of joint minimum phase, a set of filter functions is to have a common excess phase, and this excess is to be a (bounded) frequency-independent delay. Then removing the excess phase to a common factor leaves a delay-normalized set of filters that are of minimum phase in the ordinary sense. They are also at least conditionally stable, so that products, ratios, and reciprocals are in the set. Thus A and S are not of joint minimum phase, and neither are A' and S'.

Of course, in the Atal-Schroeder filters, joint minimum phase is not at issue, since the 2 × 2 matrix has S' along the diagonal and A' along the counterdiagonal. On the other hand, the shuffler filter has N' and P' along the same diagonal (the counterdiagonal is zero) so that it would seem odd if N' and P' were not of joint minimum phase, odd because then the difference signal would be required to become more and more out of step with the sum, as frequency increases. However that may be, we made the same sort of check that Mehrgardt and Mellert had made and found that N' and P' are indeed of joint minimum phase, so also for N and P.
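The contrast between the lattice (S' on the diagonal, A' on the counterdiagonal) and the shuffler (N' and P' on one diagonal) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the toy head model (S = 1, A a delayed and attenuated copy), the gain 0.7, the 0.25 ms delay, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def lattice(S, A, left, right):
    """Lattice canceler: same-side filter S' and cross filter A' from the inverse of [[S, A], [A, S]]."""
    det = S**2 - A**2
    s_prime, a_prime = S / det, -A / det
    return s_prime * left + a_prime * right, a_prime * left + s_prime * right


def shuffler(S, A, left, right):
    """Shuffler canceler: filter the difference by 1/(S - A) and the sum by 1/(S + A), then re-mix."""
    n = (left - right) / (S - A)
    p = (left + right) / (S + A)
    return (p + n) / 2, (p - n) / 2


def acoustic(S, A, spk_left, spk_right):
    """Symmetric loudspeaker-to-ear transmission: direct path S, crosstalk path A."""
    return S * spk_left + A * spk_right, A * spk_left + S * spk_right


# Hypothetical toy head model evaluated at a few frequencies (values are assumptions).
omega = 2 * np.pi * np.linspace(100.0, 5000.0, 50)
S = np.ones_like(omega, dtype=complex)
A = 0.7 * np.exp(-1j * omega * 0.25e-3)

# Arbitrary complex spectra standing in for the binaural (ear-signal) inputs.
rng = np.random.default_rng(0)
left = rng.standard_normal(50) + 1j * rng.standard_normal(50)
right = rng.standard_normal(50) + 1j * rng.standard_normal(50)

spk_lattice = lattice(S, A, left, right)
spk_shuffler = shuffler(S, A, left, right)
ears = acoustic(S, A, *spk_shuffler)
```

In this sketch both structures produce the same loudspeaker spectra, and passing those through the acoustic paths returns the original ear signals, which is the sense in which the shuffler is "equivalent to the lattice." The division by 2 appears in the re-mix stage here; as the text notes, it can be absorbed into the sum-difference networks.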

A practical consequence is that magnitudes alone, |N| and |P|, or their reciprocals are a sufficient specification (phase is redundant as being calculable by Hilbert transform), whether for filter synthesis or in determining the head functions to be measured experimentally. Also, since any common frequency-independent delay may be omitted, the programming and hardware requirements of an FIR realization are reduced. In fact, a non-FIR (or IIR) filter may be programmed at low cost. Successive fitting of a cascade of biquadratic forms (ratios of frequency-dependent, second-order transfer functions) is a natural approach, and these take scarcely more than a half-dozen lines of code each in a typical DSP chip. In analog-filter synthesis, the "biquad" is also a natural choice of synthesis element.
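The statement that phase is calculable from magnitude can be illustrated with the standard discrete cepstral method, which is a sampled-frequency counterpart of taking the Hilbert transform of the log magnitude. This is a sketch under stated assumptions, not the procedure used in the paper: the function name, the FFT length, and the two-tap test filter are all hypothetical.

```python
import numpy as np


def minimum_phase_from_magnitude(mag):
    """Recover the minimum-phase phase (radians) from magnitude samples on a full, even-length FFT grid."""
    n = len(mag)
    # Real cepstrum: inverse FFT of the log magnitude.
    cepstrum = np.fft.ifft(np.log(mag)).real
    # Fold the anticausal half onto the causal half; a minimum-phase system has a causal complex cepstrum.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    # The FFT of the folded cepstrum is log|H| + j*phase; keep the imaginary part.
    return np.fft.fft(cepstrum * fold).imag


# Hypothetical test filter 1 - 0.5 z^-1, minimum phase because its zero lies inside the unit circle.
n_fft = 512
H_true = np.fft.fft([1.0, -0.5], n_fft)
phase_est = minimum_phase_from_magnitude(np.abs(H_true))
max_err = np.max(np.abs(phase_est - np.angle(H_true)))
```

For this test filter the phase rebuilt from |H| alone agrees with the true phase to within numerical precision. For measured head data the magnitude would first need smoothing and a grid fine enough to keep cepstral aliasing small, which this sketch does not address.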

1.4 Structure of |N| and |P|

Head data are most often measured in the form of |A|, |S|, and τ, of which the last is the interaural phase delay, redundant in that part calculable from the Hilbert transform of log |A/S|. Nevertheless, it is easy to see that these data are sufficient to determine |N| and |P|, since the magnitudes of the phasor difference and sum are, according to the triangle rule, simply

$$|N| = \left(|A|^2 + |S|^2 - 2|AS| \cos \omega\tau\right)^{1/2} \tag{4a}$$

and

$$|P| = \left(|A|^2 + |S|^2 + 2|AS| \cos \omega\tau\right)^{1/2} \tag{4b}$$
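The triangle rule for the phasor difference and sum can be checked against direct complex arithmetic. The sketch below is illustrative only: the magnitudes 0.6 and 1.0, the 0.3 ms interaural phase delay, the test frequencies, and the function name are assumptions, and τ is taken to be the full phase delay between A and S.

```python
import numpy as np


def diff_sum_magnitudes(mag_A, mag_S, tau, omega):
    """Return (|N|, |P|) for N = S - A and P = S + A from |A|, |S| and interaural phase delay tau."""
    base = mag_A**2 + mag_S**2
    cross = 2.0 * mag_A * mag_S * np.cos(omega * tau)
    return np.sqrt(base - cross), np.sqrt(base + cross)


# Hypothetical values, for illustration only.
omega = 2 * np.pi * np.array([200.0, 1000.0, 3000.0])
tau = 0.3e-3
mag_N, mag_P = diff_sum_magnitudes(0.6, 1.0, tau, omega)

# Direct check: build the phasors explicitly and take magnitudes.
A_phasor = 0.6 * np.exp(-1j * omega * tau)
direct_N = np.abs(1.0 - A_phasor)
direct_P = np.abs(1.0 + A_phasor)
```

The two routes agree, and the alternation of |N| and |P| with frequency comes entirely from the cos ωτ term: where the cosine is positive the sum dominates, and where it is negative the difference does.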



$$\frac{P}{E(\theta_0)} \tag{8b}$$

The reference direction has been taken to be 0° for the AK (Aachen head), but for loudspeakers to be placed at 30°, a 30° reference would be more appropriate. When plotted with the same incidence angle as the reference, the frequency-response curves for |₀N| and |₀P| intersect one another at the constant level 0 dB. Such a plot for a spherical-model head [2], [3] is shown in Fig. 3, solid line (a). These plots are actually for the reciprocals |1/₀N| and |1/₀P|, as would be used in a crosstalk canceler, but the decibel scale makes it easy to interpret the plots also for the direct functions |₀N| and |₀P|. The dashed line shows a possible modification, to be discussed later. Curve (b) shows the equalization 1/|E|. A crosstalk canceler based on these curves has been tried, and its performance is extremely satisfying.

Plots for more realistic models of the human head (see Fig. 13) resemble the solid-line plot (a) in Fig. 3 but differ remarkably in equalization from plot (b). The reason is that the spherical-model head is one whose functions are quite smooth [20] because pinnas are omitted. Inclusion of pinnas on a realistic model invokes the large conch resonance, profoundly altering the 1/|E| curve.

For example, data [21] for the CBS-NASA head provide the equalization curves of Fig. 4. A curve for 30° and one for 0° are shown. For clarity, the plot was made with a 3-dB displacement inserted between the curves. It will be seen that the curves differ by little in comparison to the range of variation that they encompass. Thus a 0° equalization could substitute for a 30° one with little effect, but omission of such equalization would be a serious matter.

The seriousness of the matter became evident from our loudspeaker playback of recordings from the Neumann KU-80 head. As indicated in the Introduction, the stereo effect was of a "hole in the middle you could drive a truck through," as one listener said. When converted to transaural, using the crosstalk canceler built with the functions of Fig. 3, Schroeder's description of "nothing less than amazing" spatial and imaging qualities certainly applied, but it was possible to notice that the equalization was "a little off." Later, the appearance of a "hole" tendency in this recording would alert us to early reflections in a listening setup. As we also noted, recordings from the Aachen head (0° equalization) provided stereo of unequaled excellence by ordinary standards. Certainly, no "hole" was observed, even without cancellation.

1.6 System Transfer Functions

In the following, M will be used to designate either N or P. It will be understood to be a function of frequency and incidence angle. Thus for natural directional hearing, either member of the pair of overall transfer functions from a source at angle θ to the ears is designated

$$H_n = M_n(\theta) \tag{9}$$

and it is the transfer function for the difference or sum in ear signals, depending on whether N or P is substituted for M. The sources are to be considered one at a time, whether a direct source or one of the many components of reverberation. Superposition is applicable in linear acoustics. The subscript n is used to designate a natural head, the head of the listener. A signal-theoretic basis for understanding directional hearing would begin at this point.

Fig. 3. Shuffler filter characteristics in crosstalk canceling for spherical-model head. (a) Magnitudes of 1/N and 1/P normalized against curve shown in (b). Because curves (a) are free of the idiosyncratic detail for specific heads (as in Fig. 13), such characteristics are tolerant of variations in listener-head shapes and positions. Dashed curves show a possible modification of the envelope of the alternations. Because the filters are of joint minimum phase, the phase data are redundant and not shown.

Fig. 4. Equalization curves from data [21] for the CBS-NASA head [16]. Curves for both 30° and 0° reference directions are shown, displaced from one another by 3 dB. The decibel range of variation (conch resonance) greatly exceeds any difference.


For listening via loudspeakers to a binaural recording,

the transfer functions (subscript b) are designated as

(10)

in which the artificial head (or equivalent in binaural synthesis), subscript a, is designated as being equalized for the loudspeaker positions θ₀. This equalization prevents the nondirectional part of the conch resonance, already present in the listener's ears, from being introduced a second time, a minimum requirement of equalization.

In this instance, the sounds at the listener's ears will

be drawn from an extremely restricted set, in comparison

to all possible sound combinations. The restriction is

to that of linear combinations of the sounds at the ears

of the artificial head, namely "shuffled" ear sounds,

but combinations that otherwise closely resemble ear

sounds themselves.

For a reasonably apt artificial head the joint angle-dependent spectral magnitudes will be determined by the artificial head, except for shuffling, to closely resemble those of natural hearing. The result, as confirmed in listening, is an extremely plausible directional portrayal, much more so than available by any means in conventional stereo. Even so, listeners do greatly appreciate the improvement they experience in being provided "unshuffled" ear sounds, those that embody the alternations in their difference-and-sum spectra that are characteristics of their own ear sounds. Since these alternations depend strongly on the cosine of the interaural phase, the significance of this element is confirmed. This unshuffling is provided in the crosstalk cancellation of transaural stereo.

For transaural listening, the transfer function (subscript t) is

$$H_t = \frac{{}_0M_a(\theta)\, M_n(\theta_0)}{{}_0M_x(\theta_0)} \tag{11}$$

in which subscript x designates transfer functions of the head used to model the crosstalk cancellation. This equation shows the use of equalized functions, for the reference direction θ₀, for both Mₐ and Mₓ. If these differ in any of their characteristics, each is to be equalized against its own characteristics. The appearance of a conch resonance (for θ₀) is, as in the above, reserved for the listener's head Mₙ.

If $M_x$ is the same as $M_n$, for example, then Eq. (11) describes the simulation of natural directional hearing except for the substitution of the artificial head (and ears) for the listener's own. Clearly, if all three heads are the same, then Eq. (11) is identical to Eq. (9), and an exact simulation of natural hearing would be the result.
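The reduction to natural hearing can be checked numerically at a single frequency. The sketch below is illustrative only. It assumes that Eq. (11) has the form 0Ma(θ) Mn(θ0) / 0Mx(θ0) and that an equalized function is the raw head function divided by that head's own equalizer value for the reference direction. The helper names `equalize` and `transaural`, and all numeric values, are hypothetical and not taken from the paper.

```python
# Single-frequency check: when one head serves for recording (a),
# canceling (x), and listening (n), the transaural chain returns the
# natural head function. All numbers are made-up complex gains.

def equalize(m, e_ref):
    """Hypothetical equalization: divide a head function by the head's
    own equalizer value e_ref for the reference direction theta0."""
    return m / e_ref

def transaural(m_a, e_a, m_n_ref, m_x_ref, e_x):
    """Assumed form of Eq. (11): 0Ma(theta) * Mn(theta0) / 0Mx(theta0)."""
    return equalize(m_a, e_a) * m_n_ref / equalize(m_x_ref, e_x)

m_theta = 0.8 - 0.3j   # head function for the source direction theta
m_ref = 1.1 + 0.4j     # head function for the loudspeaker direction theta0
e_ref = 1.7            # equalizer value for theta0

# Same head everywhere: the equalizer and the theta0 functions cancel.
h_same = transaural(m_theta, e_ref, m_ref, m_ref, e_ref)
print(abs(h_same - m_theta) < 1e-12)
```

With differing heads the factors no longer cancel exactly, which is the situation addressed under practical design considerations.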

In this last statement direct proof is seen of what is generally regarded as the "unquestionable validity" of the transaural plan for stereo recording and reproduction. Of course, this provides the transaural design engineer an extremely strong vantage position from which to undertake departures in the service of practicality. It is usually one of the strengths of starting from an optimal position that departures from the optimum in design parameters usually produce remarkably small effects.

1.7 Practical Design Considerations

Except for a custom-designed crosstalk canceler, it is not to be expected that $M_x$ will be the same as $M_n$, and a commercial release of a transaural recording would have to embody an $M_x$ that would be required to be satisfactory for a wide range of listener heads, each with its own $M_n$. Generally this is not a difficult requirement. It has been found, for example, that the crosstalk canceler based on a spherical-model head [2], [3] produces immensely satisfying results for a wide range of listeners' heads. Heads that are somewhat small may be placed somewhat nearer the loudspeakers, and those that are somewhat large may be placed at a somewhat greater distance, as may be seen from the structure of head functions, but the exact placement does not seem to be a critical matter for most listeners.

What is probably the case is not that a sphere is necessarily a best fit, but that it is a "comfortable" fit for most heads just because of its inexactness. While the advantages of inexactness merit further exploration, we have tried another aspect for inexact treatment, the domain of wavelengths shorter than about 50 mm (frequencies higher than about 6 kHz). The first experimental crosstalk-canceler filters followed, after a somewhat abrupt transition, the null-crosstalk contour of Eq. (6) for the shorter wavelengths. We attribute the tolerance in listener movement to these aspects of inexactness in $M_x$ filters.

The choice of a rather abrupt "cut" in our first experimental canceler may have been somewhat extreme. We do notice a tendency for sibilantlike sounds and clicklike sounds to be mislocated, generally toward the front. This is a confirmation, extended to short wavelengths, of the importance of interaural phase. Although this style of design variation has proved instructive, we are now inclined to rely on a more uniform distribution of inexactness, of which the spherical functions are a good example. Another variation of interest is that of introducing a gradual taper, as shown in Fig. 3(a), dashed line, wherein the upper and lower envelopes approach the null-crosstalk contour in a somewhat less accelerated manner for short wavelengths, replacing the more abrupt cut.

We visualize these styles of inexactness as defining a volume of space near each ear of the listener, a space over which cancellation is satisfactorily accurate. We visualize this volume as being of smaller extent for the shorter wavelengths, and we suppose that it is appropriate to be less exact at these shorter wavelengths. We also believe, despite our successes with spherical functions, that we need to continue to investigate this problem. Thus the tolerance we have gained for listener movement, already satisfactory for most purposes, may be extended.

J. Audio Eng. Soc., Vol. 37, No. 1/2, 1989 January/February



2 BINAURAL SYNTHESIS

2.1 Synthesis Filters

Shuffler filters based on the direct functions N and P are used to simulate the progress around the head to the ears of two sounds of incidence angles $\theta_i$ at once. Instead of an inverse crosstalk filter, it is the direct crosstalk filter that is to be constructed. Of course, if only one of these signals is desired, one of the inputs to the shuffler may be left silent. Degenerate forms of the shuffler are used for 0° and 180°.

The shuffler synthesizer implements Eq. (9) in effect, but provides equalized ear signals instead, thus actually simulating the use of an equalized artificial head. The transfer function may be written

(12)

using the same convention that M may stand for either of N or P. The subscript s is used to denote head functions used in synthesis, even though these might have been measured for an artificial head, a natural head, or derived from a mathematical model. The transfer function is for a source simulated at position $\theta_i$, where i is a symbol for the indexing over a discrete set of incidence angles.

This transfer function may be written in greater detail as

$$H_s = \frac{M_s(\theta_i)}{E_s(\theta_0)} \qquad (13)$$

which may be written in factored form as

$$H_s = {}_0E(\theta_i)\,\mathcal{M}(\theta_i) \qquad (14)$$

in which the factors are

$${}_0E(\theta_i) = \frac{E_s(\theta_i)}{E_s(\theta_0)} \qquad (15)$$

and

$$\mathcal{M}(\theta_i) = \frac{M_s(\theta_i)}{E_s(\theta_i)} \qquad (16)$$

In particular, "tonal-color" characteristics, for the two ears jointly, are represented by the factor ${}_0E$. It seems that this color is used in a part of the directional hearing process (spectral pattern recognition) at a level below consciousness, but that only the directional result is presented at the level of consciousness. For example, speaking voices from behind would have an extremely "hollow" sound, as will be seen, if the hearing mechanism did not function as indicated.

This hollowness can be heard only under exceptional circumstances, such as a binaural recording played without crosstalk cancellation. In this example, a voice was recorded with the AK while the speaking person moved around the artificial head. Listeners heard the voice move outside the space between the 30° loudspeakers, barely into the side quadrants, in the listener space. While the original movement had been through to the back quadrant, the listeners heard movement that turned forward again into the front quadrant, but with an altered vocal quality, that "hollow" quality. Some listeners, when particularly neutral, transparent-sounding loudspeakers were used, would hear the voice "jump" to the back quadrant before the change in quality had become explicit. The listeners that stayed with the frontal localization presumably did so because of the "visual knowledge" that the loudspeakers were in front and because the equalization of the AK is not exactly suited to 30° presentation. With crosstalk cancellation, the transition to the back quadrant was characterized by continuity.

Changes in tonal color for sounds presented in elevation seem also to be responsible for impressions of elevated localization, and, again, the coloration is prevented from appearing in consciousness. One of us re



exclusively of the phasor interaural transmission ratio A/S. Its magnitude determines K, and its phase is equal to the argument of the cosine in Eq. (17). Thus $\mathcal{M}$ and A/S are essentially equivalent signal-theoretic bases for their role in directional hearing. This role appears largely to be the determination of the lateral aspect of the localization angle, as distinguished from front-back and elevation aspects. Plots of $|S/A|$ and interaural phase delay may be found in Mertens [22]. These are adapted here as Fig. 5, showing interaural phase delay in microseconds, and Fig. 6, showing the interaural level difference $|S/A|$ in decibel units, both plotted versus incidence angle.

The plot of interaural phase delay, Fig. 5, shows clearly that a substantial part of that delay is frequency independent, seeming to plot a trend toward a high-frequency-limit curve (lower bound) that is ramplike in increase and decrease. A ray-acoustic approximation [3, Fig. 1(b)] of this limit is $(a/c)(\theta_c - |\theta_c - \theta| + \sin\theta)$, where a is the head radius, for example, 90 mm; c is the speed of sound, for example, 345 m/s; and $\theta_c = \pi/2$. This is 671 μs at $\theta = \theta_c$. Increments in midfrequency delay arise from Hilbert transforms of $\log|S/A|$. The Rayleigh, low-frequency, total-delay limit $3(a/c)\sin\theta$, with 3a/c = 783 μs, disagrees with the 140-Hz plot. Heads other than this one of papier-mache agree more closely with Rayleigh.
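The two limiting delays quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using the example values from the text (a = 90 mm, c = 345 m/s); the function names are chosen here for illustration only.

```python
import math

def ray_delay_us(theta_deg, a=0.090, c=345.0):
    """High-frequency (ray-acoustic) interaural delay limit in microseconds:
    (a/c) * (theta_c - |theta_c - theta| + sin(theta)), theta_c = pi/2."""
    theta = math.radians(theta_deg)
    theta_c = math.pi / 2
    return 1e6 * (a / c) * (theta_c - abs(theta_c - theta) + math.sin(theta))

def rayleigh_delay_us(theta_deg, a=0.090, c=345.0):
    """Low-frequency (Rayleigh) total-delay limit: 3 * (a/c) * sin(theta)."""
    return 1e6 * 3 * (a / c) * math.sin(math.radians(theta_deg))

# Tabulate both limits over the front and back half of the horizontal plane.
for angle in (0, 30, 60, 90, 120, 150, 180):
    print(angle, round(ray_delay_us(angle)), round(rayleigh_delay_us(angle)))
```

Both expressions are symmetric about 90°, in line with the near mirroring of front and back delays seen in Fig. 5.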

However that may be, it is seen from Fig. 5 that the interaural phase delays for directions less than 90° very nearly mirror those for directions exceeding 90°. Thus it may be supposed that interaural delay is not relied upon, to any great extent, in distinguishing sounds in back from those in front. On the other hand, it is seen that interaural delay is a steep function of direction,

Fig. 5. Plots of interaural difference in phase delay (μs) versus angle of incidence (deg) for various frequencies (Hz). Diffraction theory indicates a maximum delay at low frequencies that is much less than shown for 140 Hz. A low-frequency mechanical resonance in this papier-mache head is suspected. Adapted from [22].


for directions in front, and a steep function again, for directions in back. Thus a strong reliance may be placed on interaural phase, from a signal-theoretic point of view, for a precise determination of the lateral component of direction.

In Eq. (17) it is seen that interaural phase appears, for the sum-and-difference ear sounds, as an alternation in their spectral magnitudes because of the cosine. The extent to which the alternation appears does depend on the value of K. The upper and lower envelopes of these alternations, the extrema in $|\mathcal{M}|$, are given by

$$|\mathcal{M}|_{\mathrm{ex}} = (1 \pm K)^{1/2} \qquad (20)$$

in which K is determined from $|S/A|$, as shown in Eq. (18). Thus Fig. 6 may be studied to estimate the directional dependence of these envelope functions. To assist in this estimation, envelope contours for the three highest curves in Fig. 6 have been computed, with the results plotted in Fig. 7.
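The envelope computation can be sketched as follows. Eq. (18) is not reproduced in this part of the text, so the relation used here, K = 2r/(1 + r²) with r the interaural magnitude ratio, is an assumption adopted for illustration; it is the value that follows if the sum and difference signals are normalized by the joint rms of the two ear signals. The function name is hypothetical.

```python
import math

def envelope_db(ild_db):
    """Upper and lower alternation envelopes, (1 +/- K)**0.5, in decibels.

    Assumption (not from the text): K = 2r / (1 + r*r), where r is the
    interaural magnitude ratio implied by the level difference ild_db.
    """
    r = 10 ** (-abs(ild_db) / 20)
    k = 2 * r / (1 + r * r)
    upper = 10 * math.log10(1 + k)   # 20*log10 of (1 + K)**0.5
    lower = 10 * math.log10(1 - k) if k < 1 else float("-inf")
    return upper, lower

# Larger level differences pull the two envelopes toward each other.
for ild in (0, 3, 6, 12, 20):
    print(ild, envelope_db(ild))
```

Under this assumption a level difference of 0 dB gives the widest envelope spread, and the spread narrows as the head shadow deepens.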

Generally it may be said that Figs. 6 and 7 show that the upper and lower envelope contours lie closer together at higher frequencies and for directions near 90°, conditions making for the deepest head shadow for the contralateral ear. Also, the alternations in magnitude, along the frequency scale, are most rapid near 90°. Toward the front, less than 60°, the alternations are largest in magnitude, as they also are toward the back, beyond 120°. Thus the most reliance, for directional hearing, on interaural phase should lie toward the front or toward the back, and there should be somewhat less reliance on interaural phase near 90°, but only at the higher frequencies. At 4 kHz and below, the reliance would appear to be substantial. Some front-back asymmetry is seen in these curves, but it is not clear whether directional hearing can rely on these asymmetries to resolve front-back determinations. These asymmetries are not consistent with frequency, to be sure, but one should be wary of hearing's potential of making much of seemingly insignificant, even idiosyncratic, detail. Nevertheless it behooves us to look elsewhere for the front-back cues.

Fig. 6. Plots of interaural difference in level $|S/A|$ (dB) versus angle of incidence (deg) for various frequencies (Hz). Adapted from [22].

We have used head data supplied by Torick [21] to plot ${}_0E(\theta)$ against frequency for a reference direction $\theta_0 = 90^\circ$, a medial angle. Plots for θ of 120°, 150°, and 180°, a back-angle family, are shown in Fig. 8, while plots for the front-angle family, 0°, 30°, and 60°, are shown in Fig. 9. In Fig. 8 it is seen that differences in spectral transmission between back angles are not very large, and are mostly in the region above 4 kHz. Between front angles, the differences are seen from Fig. 9 to be not very large either, and are mostly in the "presence" region from 1 to about 4 kHz. Thus we come to the conclusion that there is little to rely upon, in terms of spectral color, for distinguishing among the back angles or among the front angles, although we cannot dismiss the possibility entirely that some reliance is placed there. It is clear, however, that the two families are remarkably different from each other.

Fig. 7. Plots of alternation envelopes, $(1 \pm K)^{1/2}$, versus angle of incidence for the three highest frequencies of Fig. 6.

Fig. 8. Plots of normalized rms transmission to two ears jointly for a family of back angles of incidence. The reference direction for normalization is taken to be 90°. Data are for the CBS-NASA head. The contrast with Fig. 9 is remarkable.

Fig. 9. Plots as in Fig. 8, except that a family of front angles of incidence are shown. The contrast with Fig. 8 is remarkable.

A direct indication of front-back difference may be shown as in Fig. 10, plots of ${}_0E(\theta)$ for the reference direction of 0° and θ equal to 90°, 120°, 150°, and 180°. This is the front-back difference against 0°. These curves indicate the front-back difference as characterized by a marked depression in level in the range from about 1 to 6 kHz, along with elevations that are equally striking in the range from about 6 to 10 kHz. The 90° curve is included as a "back" spectrum because its shape qualifies it as a family member and one whose emphasis on high frequencies tends to elevate interaural level difference to an importance not identified elsewhere. It is not difficult to imagine the "hollow" sound, as discussed above, that these transmission characteristics would cause if they were ordinarily consciously heard. This altered spectral quality does indeed, however, appear to be the principal determinant of back sound in discrimination from front sound.

This review of hearing characteristics allows certain rules of thumb to be identified, namely, that interaural phase, represented as amplitude alternations in the spectra of difference-and-sum ear signals, is the discriminant for the lateral component of image position, while variations in joint spectral transmission are the discriminant for the front-back component and, presumably, for the elevation component. However, it also allows certain areas of uncertainty to be noted.

These various points of observation will be of limited help in the design of the binaural-synthesis filters because the best rule will doubtlessly prove to be slavish simulations of the best measured head-related transfer functions available. At least these observations, together with those the designer's own experience may develop, will make for a certain intelligent slavishness.



2.3 Basic Synthesis Array

An array of filters for the sum and difference signals is shown in Fig. 11. On the right-hand side, several inputs are designated, one for each of the incidence angles to be simulated. For each left-angle input, a matching input is shown for the symmetric right-hand angle, and sum and difference signals are shown as being formed from these symmetric inputs. The signal pairs are then transmitted through ${}_0N$ and ${}_0P$ filters, ${}_0N$ for difference signals and ${}_0P$ for sum signals. Each of these filters is designed to match the specific angle designations for the inputs, ${}_0N(\theta_i)$ and ${}_0P(\theta_i)$ separately for each $\theta_i$ designated at the input. The filtered difference signals are then combined in a common sum, and the filtered sum signals are then combined in a common sum. These common sums are then further combined in difference-and-sum fashion to form simulated binaural outputs.
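The signal flow just described can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the filters are represented as short FIR coefficient lists, the factor of one half in the final combination is a scaling convention assumed here, and all function names are hypothetical.

```python
def convolve(x, h):
    """Plain FIR convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def add(a, b):
    """Sample-wise sum, zero-padding the shorter list."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]

def sub(a, b):
    """Sample-wise difference a - b."""
    return add(a, [-v for v in b])

def synthesize(pairs):
    """pairs: one (left_in, right_in, n_fir, p_fir) tuple per angle.
    Returns simulated (left_ear, right_ear) signals."""
    diff_bus, sum_bus = [0.0], [0.0]
    for left_in, right_in, n_fir, p_fir in pairs:
        # Difference signal through the N filter, sum signal through P.
        diff_bus = add(diff_bus, convolve(sub(left_in, right_in), n_fir))
        sum_bus = add(sum_bus, convolve(add(left_in, right_in), p_fir))
    # Final difference-and-sum stage (0.5 is an assumed scaling).
    left_ear = [0.5 * v for v in add(sum_bus, diff_bus)]
    right_ear = [0.5 * v for v in sub(sum_bus, diff_bus)]
    return left_ear, right_ear

# Example: same-side response s, cross-side response a, source on the left.
s, a = [1.0, 0.5, 0.0], [0.0, 0.3, 0.2]
left, right = synthesize([([1.0], [0.0], sub(s, a), add(s, a))])
print(left, right)
```

With N taken as the difference and P as the sum of the same-side and cross-side responses, an impulse at the left input yields the same-side response at the left ear and the cross-side response at the right ear, so only two filters per angle pair are needed instead of four.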

This basic array may be thought of as a discrete-angle, binaural, panoramic mixer. Variations may include linear mixing arrangements and level adjustments to be provided at each of the inputs. Also, some of the inputs may receive outputs from an array of reverberators to form synthetic binaural-space reverberation systems in the manner of Kendall et al. [1]. Low-cost versions could provide for a very limited set of angles as supplementary pan functions for use with standard mixing boards, and transaural outputs could be provided for productions that are being developed primarily in conventional stereo. Further variations may be conceived. Some of these are described hereafter.

Fig. 10. Plots showing front-back difference as a family of back-angle joint transmissions normalized against transmission for the incidence angle of 0°, taken as the reference direction.

Fig. 11. Basic binaural synthesizer. Inputs for a discrete, symmetric set of simulated incidence angles are shown. A multiplicity of shuffler filters, based on head transmission functions, each specific to the incidence angle to be simulated, are used. The output is binaural.

3 PROSPECTIVE PRODUCTION PROCESSORS

3.1 Equalizers

All commercially available artificial heads of quality suitable for professional applications stand in need of further equalization, since neither ear-canal nor diffuse-field equalizations are appropriate in transaural recording. Ear resonances can be allowed only once in the signal chain, in the listener's ear. Also, Fig. 1(b) shows that directional information is received too rapidly for any diffuseness to develop.

Among those heads providing ear-canal signals, we list Neumann KU-80, KEMAR,¹ B&K 4128, and the Aachen head (AK). The extent of the 30° free-field equalization required may be estimated from Fig. 4, although those data are not specific to these heads. The Aachen head is also available with external free-field equalization for 0° incidence. The diffuse-field equalization devised by Killion [7] for KEMAR, and provided by Neumann for their newer KU-81, reduces further equalization needs to moderate corrections, as is also true of the AK. Such further equalization may be provided by the manufacturer or a third party.

All commercially available earphones for binaural monitoring similarly stand in need of 30° free-field equalization. A few manufacturers are currently providing diffuse-field equalization [6], and an (inadvisable) interest in such standardization continues [23]. We are aware of only one earphone set, the Stax Pro Lambda, that has been accurately equalized against a free-field reference by a third party [15], but for 0°, not 30°. A decision by a third party to supply external equalization for any but a selected few models entails a substantial risk that only professional needs could justify. Volume distribution of earphones suitably equalized by the manufacturer probably lies some distance in the future.

3.2 Monitoring

Facilities for earphone monitoring require 30° free-field equalization as above, if it is not internal to the earphone. If the program material to be monitored is in the form of loudspeaker signals (whether transaural or conventional stereo), there would also be needed a binaural-synthesizer version of a circuit devised by Bauer [17], the so-called Bauer box. The two inputs would be processed to simulate 30°.

Loudspeaker monitoring would require transaural monitor equipment to derive the proper signals from binaural material. It could embody a crosstalk canceler of standard grade adopted for mass distribution. Some means of assurance of adherence to a standard would be needed for full reliance on such monitoring. Also, the prospective availability of postproduction consumer equipment, such as Bauer boxes and loudspeaker placement compensators described below, requires such standardization.

¹ KEMAR is a registered trademark of Knowles Electronics.

The acoustic characteristics of a loudspeaker-monitoring facility demand the usual attention. In addition, the most accurate, most transparent, most self-effacing loudspeakers should be chosen for such use and placed to avoid early reflections.

3.3 Hall-Sound-Pickup Synthesizer

An arrangement involving hall-sound-pickup microphones is shown in Fig. 12. Two omnidirectional microphones are flanking the artificial head. The signals from these are delayed and provided to a binaural synthesizer. The latter may need inputs only for 90°, 120°, and 150° to provide sufficient flexibility, especially if more than two hall-sound-pickup microphones are needed in particular halls.

For the flanking microphones not too far back from the orchestra, the hall-sound pickup would enhance early reflections (concert-hall concept) in the 10-20-ms range, and 90° synthesis angles would be suitable, along with a choice of delay only somewhat more than the microphone-head distance. For microphones placed far enough back to represent the whole reverberation field, synthesis would be at 120°, with a delay somewhat more than the microphone-head distance. The 150° synthesis angle would probably be used infrequently. The relative level would follow the usual prescription of several decibels below that for a plainly audible effect. For good concert halls, an almost subliminal contribution, if any, would be sufficient.
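The delay setting described above amounts to the acoustic travel time over the microphone-to-head distance plus a small allowance. The sketch below illustrates the arithmetic only; the speed of sound of 345 m/s is the example value used earlier in the paper, while the function name and the `margin_ms` allowance are hypothetical and not specified by the text.

```python
def pickup_delay_ms(distance_m, margin_ms=2.0, c=345.0):
    """Delay for a hall-sound-pickup microphone: travel time over the
    microphone-to-head distance plus a small margin (margin_ms is a
    hypothetical allowance, not a value given in the text)."""
    return 1000.0 * distance_m / c + margin_ms

# A microphone 3.45 m from the artificial head, no extra margin.
print(pickup_delay_ms(3.45, margin_ms=0.0))
```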

3.4 Transaural Panoramic Mixer

A transaural panoramic mixer is meant primarily as a supplement to the pan functions of an ordinary stereo mixer. It would be capable of replacing some of the existing facilities solely to enhance the imaging qualities by accurate synthesis for a limited number of channels, or for special effects. A transaural converter would be a part of the equipment.

Fig. 12. Layout for use of a hall-sound-pickup synthesizer. Flanking-microphone signals are delayed and subjected to binaural synthesis simulating incidence angles from a limited set of back angles. The binaural signals so derived are mixed at reduced level with the signals from the main-pickup artificial head.




3.5 Binaural Panoramic Mixer

A binaural panoramic mixer would be a full elaboration of the basic synthesis array discussed in connection with Fig. 11. It would otherwise correspond to a full stereo mixing console, except that binaural pan functions would be used and the signals would be in binaural format. Monitoring would be possible in either binaural or transaural format.

3.6 Transaural Converter

Transaural conversion need be done only once, except for monitoring, in the processing of a complete production, and there are good reasons for doing it only once. The conversion would adhere to standards specified for mass distribution, and it would be executed in an off-line facility capable of providing standards assurance. At present, many producers use an off-line facility for the conversion of digital masters to a final format as release masters. A similar concept applies here.

3.7 Processing Technology

All of these processing concepts may be realized in either digital or analog form. Conversions between analog and digital data streams are, of course, to be kept to a minimum, and this consideration will determine the technology to be used in each instance. Equipment for some of the processing steps should be made available in both technologies.

4 VIRTUAL LOUDSPEAKERS

A virtual loudspeaker is a transaural image synthesized to simulate the effect of a loudspeaker placed at a specified image location. The process involves binaural synthesis followed by transaural conversion. For example, an experimental processor has been constructed that makes a pair of loudspeakers placed at 15° sound as if the loudspeakers had been placed at 30°. Applications are indicated below.
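For a symmetric loudspeaker pair, the chain of binaural synthesis followed by transaural conversion collapses into one filter for the sum channel and one for the difference channel. The sketch below checks this at a single frequency with complex gains. It is an illustration under stated assumptions: the same-side and cross-side transfer values are made-up numbers, and the function names are hypothetical.

```python
def ears(s, a, left, right):
    """Ear signals produced by a symmetric loudspeaker pair at one
    frequency: s is the same-side transfer, a the cross-side transfer."""
    return s * left + a * right, a * left + s * right

def virtual_pair(u_left, u_right, s_real, a_real, s_virt, a_virt):
    """Drive signals for the real pair that mimic the virtual pair."""
    p_gain = (s_virt + a_virt) / (s_real + a_real)   # sum-channel filter
    n_gain = (s_virt - a_virt) / (s_real - a_real)   # difference-channel filter
    x_sum = p_gain * (u_left + u_right)
    x_diff = n_gain * (u_left - u_right)
    return 0.5 * (x_sum + x_diff), 0.5 * (x_sum - x_diff)

# Made-up transfer values for the real (15 deg) and virtual (30 deg) pairs.
s15, a15 = 1.0 + 0.05j, 0.6 - 0.2j
s30, a30 = 0.9 + 0.1j, 0.3 - 0.4j
program = (0.7 - 0.2j, -0.1 + 0.5j)

drive = virtual_pair(program[0], program[1], s15, a15, s30, a30)
print(ears(s15, a15, *drive))    # ear signals from the processed real pair
print(ears(s30, a30, *program))  # ear signals the virtual pair would give
```

The two printed pairs agree to rounding error, which is the sense in which the virtual-loudspeaker process is exact for the head model it is built on.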

4.1 Correction of Loudspeaker Placement

Some users may find that a loudspeaker placement that is convenient for their listening-room layout, and that avoids early reflections, may make for an inconvenient listening position unless the equal-distance 30° rule is violated. In such cases, virtual-loudspeaker electronics can provide a 30° impression for loudspeakers placed at other angles. An adjustment for unequal distances may also be provided.

4.2 TV Expander

Another example of correction of loudspeaker placement is found in the so-called TV expander. Television receivers usually offer cabinet-mounted loudspeakers that are spaced much too close together (less than 15°) to provide a good stereo effect. Present-day TV expanders, usually involving some kind of ad hoc processing of the difference channel, are too imprecise to preserve the producer's intentions. The virtual-loudspeaker expander is, in principle, exact.

4.3 Centered Virtual Loudspeaker

Sound systems for large-screen television applications often lack a facility found in cinema exhibition, a centered, behind-the-screen loudspeaker, important for a realistic presentation of dialogue. The substitute phantom image from two loudspeakers unfortunately does not sound the same as that single loudspeaker. A centered virtual loudspeaker would be a significant improvement.

4.4 Virtual Loudspeakers in Back

Some television sound systems are designed to supply special-effects signals (derived from cinema sound tracks) to loudspeakers placed behind the viewer. Unfortunately many viewers cannot provide space behind their favorite viewing position nor bear the expense of such loudspeakers. Virtual loudspeakers may be substituted. Similarly, certain ambience-enhancing systems require loudspeakers placed behind the listener. These also can be virtual.

4.5 Surround Stereo

Ear-sound-oriented, transaural stereo is a full-sphere (includes imaging in elevation) surround-stereo system. While it is most naturally used as a straightforward enhancement of the basic virtues of conventional stereo, it may certainly be used to provide any of the astonishing demonstrations of loudspeaker-oriented quadraphonic systems of a previous era.

An exemplary sound-field-oriented surround-stereo system [3] is the Ambisonic system UHJ, for which a substantial body of program material, in full-sphere B format [24], exists. Some of this may be recast, using virtual-loudspeaker processing, for rerelease in transaural format.

5 INSTRUMENTATION-GRADE CANCELER

A need exists for a crosstalk canceler satisfying the original aspirations of Atal and Schroeder. Accurate documentation of the subjective experience of a sonic event requires an instrumentation-grade artificial head and recording means, together with an acoustic presentation means of equal quality. Loudspeaker presentation through an instrumentation-grade crosstalk canceler is the option that will provide full assurance that the sounds will be heard as exterior to the listener's head.

Such a canceler will use head functions as closely

modeled on a replica of a representative head as possible

and, where necessary, will use data taken for a specific

listener. An example of canceler curves for a specific

head is shown in Fig. 13. A digital canceler would be

able to accept data files for different listeners and adjust

the filters accordingly. In any case, the canceler would

be accurately faithful to its head model over the whole

audio-frequency range.

Applications abound in environmental acoustics, psychoacoustic and otological research laboratories, and audiometric and otological clinics, to name a few. In critical applications, replacement of earphones of dubious characteristics and flawed exteriorization could prove decisive.

6 CONCLUSIONS

We have shown that crosstalk canceling of well-prepared binaural-stereo program material, to make transaural recordings, can be accomplished with a technology that is simpler than previously supposed, and can produce recordings that may be played as ordinary stereo recordings, but that reveal "amazing" natural spatial and imaging effects that are more robust, with respect to listener movement and playback acoustics, than previously supposed. The recording of such "well-prepared" binaural material is seen as a crucial starting point for making a good transaural recording.

Artistic considerations are of major importance, of course, and we have also shown that recent technical advances in understanding the importance of correct equalizations must be implemented to support the artistic intent. We have argued that this support requires implementation at the equipment-design level. We have explored the relation of equalization with respect to the maintenance of an excellent stereo effect under all conditions of playback, with respect to the prospects of monitoring with binaural headphones and with respect to preserving the integrity of localization.

We have provided a brief survey of the variety of processing that may be accomplished within our conception of transaural-binaural technology. This has included the processing necessary in record production and a few items that the consumer could use to advantage. We also note instrumentation applications.

The expectation is that some of this transaural-binaural technology would be implemented in the near future as the industry begins to see how the technology will help its practitioners reach their goals more directly and more easily. The eventual outcome of the infusion of new technology may not be predicted with assurance, but the prospects for a dramatic improvement in stereo quality do appear bright.

Fig. 13. Shuffler filter characteristics in crosstalk canceling for a specific listener head [16] and a loudspeaker placement of 30°. Solid-line curves show magnitudes of 1/N and 1/P normalized according to solid-line curve of Fig. 4. Dashed curves show envelopes of alternations. Extensive idiosyncratic detail indicates that a crosstalk canceler based on curves of Fig. 3(a) would be more tolerant of variations in listeners' heads and positions.

7 ACKNOWLEDGMENT

We wish to thank the many persons who have listened to our experimental transaural recordings and offered their critical comments. We tried their patience with recordings of differing equalizations, and some with not-the-lowest noise floor, and their patience survived. We would particularly like to thank those who offered us playback facilities that happened to prove instructive in regard to early reflections. Their patience was sometimes not rewarded by hearing the merits we claimed. In other cases, our own ineptness left a bad impression. We are grateful, also, for those listeners who delighted us by being entirely enthusiastic.

We owe special thanks to Wade Bray of Jaffe Acoustics for providing us with digital tapes made with the Aachen head. Our studies of these recordings impressed us with the importance of reconsidering the whole question of equalization.

Finally, y



[7] M. C. Killion, "Equalization Filter for Eardrum Pressure Recording Using a KEMAR Manikin," J. Audio Eng. Soc., vol. 27, pp. 13-16 (1979 Jan./Feb.).

[8] B. S. Atal and M. R. Schroeder, "Apparent Sound Source Translator," U.S. patent 3,236,949 (1966 Feb. 22).

[9] M. R. Schroeder and B. S. Atal, "Computer Simulation of Sound Transmission in Rooms," IEEE Conv. Rec., pt. 7, pp. 150-155 (1963).

[10] M. R. Schroeder, "Digital Simulation of Sound Transmission in Reverberant Spaces," J. Acoust. Soc. Am., vol. 47, pp. 424-431 (1970 Feb.).

[11] M. R. Schroeder, "Computer Models for Concert Hall Acoustics," Am. J. Phys., vol. 41, pp. 461-471 (1973 Apr.).

[12] M. R. Schroeder, "Models of Hearing," Proc. IEEE, vol. 63, pp. 1332-1350 (1975 Sept.).

[13] P. Damaske, "Head-Related Two-Channel Stereophony with Loudspeaker Reproduction," J. Acoust. Soc. Am., vol. 50, pt. 2, pp. 1109-1115 (1971 Oct.).

[14] T. Mori, G. Fujiki, N. Takahashi, and F. Maruyama, "Precision Sound-Image-Localization Technique Utilizing Multitrack Tape Masters," J. Audio Eng. Soc. (Engineering Reports), vol. 27, pp. 32-38 (1979 Jan./Feb.).

[15] H. W. Gierlich and K. Genuit, "Processing Artificial-Head Recordings," J. Audio Eng. Soc. (Engineering Reports), vol. 37, this issue, pp. 35-40. Also, W. Bray, private communication (1987 Nov.).

[16] E. L. Torick, A. Di Mattia, A. J. Rosenheck, L. A. Abbagnaro, and B. B. Bauer, "An Electronic Dummy for Acoustical Testing," J. Audio Eng. Soc., vol. 16, pp. 397-403 (1968 Oct.).

[17] B. B. Bauer, "Stereophonic Earphones and Binaural Loudspeakers," J. Audio Eng. Soc., vol. 9, pp. 148-151 (1961 Apr.).

[18] H. Møller, "Cancellation of Crosstalk in Artificial-Head Recordings Reproduced through Loudspeakers," J. Audio Eng. Soc., vol. 37, this issue, pp. 31-34.

[19] S. Mehrgardt and V. Mellert, "Transformation Characteristics of the External Human Ear," J. Acoust. Soc. Am., vol. 61, pp. 1567-1576 (1977 June).

[20] D. H. Cooper and J. L. Bauck, "Corrections to L. Schwarz, 'On the Theory of Diffraction of a Plane Soundwave Around a Sphere' ['Zur Theorie der Beugung einer ebenen Schallwelle an der Kugel,' Akust. Z., vol. 8, pp. 91-117 (1943)]," J. Acoust. Soc. Am., vol. 80, pp. 1793-1802 (1986 Dec.).

[21] E. L. Torick, private communication (1975 Nov.).

[22] H. Mertens, "Directional Hearing in Stereophony - Theory and Experimental Verification," EBU Rev., pt. A, no. 92, pp. 146-168 (1965 Aug.).

[23] J. S. Russotti, T. P. Santoro, and G. B. Haskell, "Proposed Technique for Earphone Calibration," J. Audio Eng. Soc., vol. 36, pp. 643-650 (1988 Sept.).

[24] M. A. Gerzon, "Ambisonics in Multichannel Broadcasting and Video," J. Audio Eng. Soc., vol. 33, pp. 859-871 (1985 Nov.).

THE AUTHORS

Duane H. Cooper was born in 1923. He earned a Ph.D. in physics at California Institute of Technology in 1955 and is currently associate professor of physics and electrical engineering at the University of Illinois. He teaches circuits, systems, modulation, random processes, electrodynamics, and acoustics. He contributed to the theory of disk recording, invented the skew-sampling method of tracing-error correction, and contributed to the theory of multichannel stereo. He made the first prototype Cooper Time Cube, and he invented the first working version (UMX) of the soundfield stereo system now called Ambisonics. Dr. Cooper is a member of the American Physical Society and the Acoustical Society of America, a senior member of the Institute of Electrical and Electronics Engineers, and a fellow and honorary member of the Audio Engineering Society. He has served the AES as governor, vice president, and president. He is now vice president of the AES Educational Foundation. Dr. Cooper holds the Society's Emile Berliner Award and Gold Medal.

Jerald L. Bauck was born in 1955. He earned a B.S. degree in electrical engineering at Kansas State University in 1977 and an M.S. degree in electrical engineering at the University of Illinois in 1979. He is currently an electrical engineering doctoral candidate at the University of Illinois. He worked for five years with Motorola's government electronics group in Scottsdale, Arizona, where he earned four patents and the Motorola Engineering Award in 1983. Mr. Bauck is a member of the Institute of Electrical and Electronics Engineers and of the Audio Engineering Society. His current interests include tomographic imaging in synthetic aperture radar and audio imaging.
