prospects for transaural recording - cooper e bauck
Prospects for Transaural Recording*
DUANE H. COOPER AND JERALD L. BAUCK
University of Illinois, Urbana, IL 61801, USA
Transaural stereo, generic for binaural stereo processed for cancellation of loudspeaker-to-ear crosstalk, results from the use of minimum-phase filters in shuffler configuration. Simplifying the filters further at short wavelengths makes the listener position noncritical. Full spatial qualities appear in a conventional stereo playback that avoids early reflections. Inverse shufflers provide precise transaural pan functions for multitrack work.
0 INTRODUCTION
Transaural stereo (generic term) is a stereo-system
plan that, like binaural stereo, takes the end point of
the recording-reproducing chain to be the actual sounds
at the ears. It contrasts with the taking of loudspeaker
sounds as the end point, which is necessarily the plan
of conventional stereo. It differs from binaural in that
the sounds for each ear, rather than being supplied by
direct signal chains ending at earphones, result
indirectly, instead, from the preparation of structured
composite signals to be supplied to the loudspeakers.
0.1 Crosstalk Cancellation
The composite-signal structure is subsequently inverted
(decomposition) in the intervening loudspeaker-to-ear
transmission to produce the intended sounds at
the ears. On the way to the ears, in addition to the
direct transmission, left to left and right to right, there
occur the cross transmissions of left to right and right
to left. The latter are traditionally called crosstalk (from
telephony), and the composition-decomposition
scheme cited is a nonadaptive precancellation of crosstalk. It consists of the "planting" of a crosstalk process,
in advance, that is devised to be the inverse of the
acoustic crosstalk expected to occur subsequently. When
properly done, the net result is the elimination of all
evidence of crosstalk.
0.2 Recording Binaural Signals
Signals representing ear sounds may be recorded
(binaural recording), in advance of crosstalk cancellation,
by two pickup methods. One uses microphones
fitted in the ears of an artificial head. The other uses
free-space microphones whose signals have been processed
to simulate transmissions around an acoustic
obstacle (human head) to specific points on the obstacle
(ears).

* Presented at the 85th Convention of the Audio Engineering Society, Los Angeles, 1988 November 3-6.
J. Audio Eng. Soc., Vol. 37, No. 1/2, 1989 January/February

The second of these pickup methods, including its
source-to-ear processing, is known as binaural synthesis,
and it may include the processing of as many
different microphone signals as may be suitable for a
given project. It may also include reverberant-field
synthesis as needed. The correspondence with multitrack
stereo synthesis is notable: pan functions replaced
by binaural simulation for specific imaging directions
and reverberation units replaced by simulation of spatial
(binaural space) reverberation, such as being developed
by Kendall et al. [1]. After the completion of all binaural
processing, crosstalk canceling is the means of producing
the master transaural recording.
For concert-hall recording, an artificial head would
be used, and it and the orchestra would be deployed
for optimal pickup. Under ideal conditions this may
suffice. However, further microphone deployments may
be considered to represent early reflection and late
reflection hall-sound pickup. The signals from these
latter would be delayed and subjected to binaural synthesis
needed to produce the decorrelated ear sounds
deemed suitable for hall-sound representation. The final
step in the production is conversion to transaural.
0.3 Transaural Options
Some recording engineers may wish to use only a
part of the transaural technology. In multitrack work,
for example, it might be decided that only a few of the
tracks require the precise imaging of binaural synthesis,
or that only a portion of the performing ensemble requires
the spatial delineation available through artificial-head
pickup. Such artistic decisions remain, of course,
with the producing authority, and it is the responsibility
of the engineer to provide incisive imaging, to the extent
possible, where desired. Transaural technology may
be viewed as providing improved options for that purpose,
not necessarily a whole new recording style.
A better choice for incisive imaging, however, cannot
be made. In a previous paper, Cooper, using calculations
from Bauck's thesis [2], showed [3, Fig. 8] the required
loudspeaker-signal specifications for two examples of
imaging. None of the conventional stereo methods
produces signals that in any way resemble these specifications,
except at low frequencies. Conventional
stereo has not sought to devise loudspeaker signals to
meet imaging-signal specifications at the ears, as was
required in these calculations, except in the low-frequency
work of Blumlein [4]. Specifically, none of the
existing pan-pot formulas meet these specifications,
nor do any of the stereo microphone arrays, whether
coincident or spaced, whether using directional elements
or not.
Some recording engineers, seeking a spacious effect,
use widely spaced microphones in a concert-hall setting.
It is known, of course, that the signals so obtained are
highly decorrelated, and it is also a known fact, in
concert-hall acoustics, that highly decorrelated ear
sounds are identified with spacious acoustic impressions.
Unfortunately, the interaural correlation will always
be greater than the correlation at the loudspeakers,
because of crosstalk. The net result is that the spacious
effect is perceived as confined to an "acoustic stage,"
as in a different space from that of the listener. An
important aspect of the concert-hall experience is lost.
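The effect of crosstalk on correlation can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes a toy acoustic model in which each ear receives its same-side loudspeaker signal plus the opposite loudspeaker signal attenuated by a gain `g` and delayed by `d` samples, and it takes "correlation" to mean the peak of the normalized cross-correlation over a short lag window. All names and values are hypothetical.

```python
import numpy as np

def delayed(x, d):
    """Delay x by d samples, zero-padding the start."""
    return np.concatenate([np.zeros(d), x[:-d]])

def peak_correlation(a, b, max_lag):
    """Largest |normalized cross-correlation| over lags in [-max_lag, max_lag]."""
    norm = np.sqrt(np.dot(a, a) * np.dot(b, b))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(a[lag:], b[:len(b) - lag])
        else:
            c = np.dot(a[:lag], b[-lag:])
        best = max(best, abs(c) / norm)
    return best

rng = np.random.default_rng(0)
n = 48000
left_spk = rng.standard_normal(n)    # decorrelated loudspeaker feeds,
right_spk = rng.standard_normal(n)   # as from widely spaced microphones

g, d = 0.5, 12                       # assumed crosstalk gain and delay (samples)
left_ear = left_spk + g * delayed(right_spk, d)
right_ear = right_spk + g * delayed(left_spk, d)

spk_corr = peak_correlation(left_spk, right_spk, 48)
ear_corr = peak_correlation(left_ear, right_ear, 48)
print(f"loudspeaker correlation {spk_corr:.3f}, interaural correlation {ear_corr:.3f}")
```

In this toy model the interaural peak correlation approaches g/(1 + g^2), which is 0.4 for g = 0.5, while the loudspeaker feeds stay near zero. Real head transfer functions are frequency dependent, so the sketch shows only the direction of the effect.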
The use of widely spaced microphones with binaural
synthesis and suitable delay, however, will give the
recording engineer much greater control over the representation
of the sound of the hall. Thus many more
venues may be exploited to advantage. At the same
time, a full spatial envelopment of the listener can be
provided to the extent desired. Many recording engineers
will discover, also, that imaging and spaciousness
are not mutually exclusive, but, as has long been known
in concert-hall acoustics, belong together. Placing them
together is natural in transaural technology.
At first the recording engineer will want to try only
the simplest things from transaural technology. Indeed,
it is likely that only the simpler equipment will become
available at first. Existing techniques will necessarily
continue to be used, and the improvements of transaural
technology will, in some instances, be adapted to that.
For reviews of existing techniques, the writings of Eargle
may be consulted [5]. The evolution of such techniques
to suit a binaural style of recording is not amenable
to detailed prediction, and will not be attempted here.
It is possible, however, to sketch a catalog of specific
kinds of transaural-related equipment, the development
of which may be foreseen. Some of these items are
discussed in a later section.
0.4 Binaural Monitoring
Prospects for binaural monitoring apparently have
advanced substantially in the recent decade. It had long
been the experience that earphones characteristically
produced "in-the-head" (interior) sounds and, with binaural
material, sounds that were much more vulnerable
to front-back bias than is the case with natural hearing.
The problem has been traced to a disturbance in the
conch resonance of the human pinna. The conch is the
principal cavity in the pinna, and its resonance involves
its acoustic near field even at some distance from the
ear. Disturbance of this near field causes an "at-the-ear"
judgment, as may be easily demonstrated by placing
a hand near the ear. Earphones are ordinarily placed
near enough to disturb this resonance (besides possibly
deforming the pinnas). Equalization to restore the resonance
restores natural, exterior hearing. A complication
is that a significant part of the resonance effect
varies with direction, so that a direction assignment
for equalization seems necessary.
A way to avoid a directional assignment has been
sought via the use of a diffuse-field reference [6]. An
alternate approach uses a frontal-incidence plane wave
(free field) as the reference. The argument for the latter
is that it avoids a front-back bias while not impairing
back localizations. In a later section we find evidence
to support a variation in this free-field approach.
These issues take on a sharper focus if related to the
style of equalization used for the artificial head. Obviously,
a free-field equalization for the head mandates
the corresponding free-field equalization for the earphones
to be used with that head. In this case, a large
part of the conch resonance is removed in the head
equalization (a use of natural ear molds in artificial
heads accounts for this resonance being modeled,
although modeling of canal resonance has long been
omitted) and then restored in the earphone equalization.
Presumably a similar rationale supports diffuse-field
equalization, both for the head [7] and for the earphones.
We are unable to report any complete experience with
diffuse-field equalization, but we can report remarkably
good experiences with free-field equalization.
The recording engineer should understand that the
equalizations discussed here and below are not matters
to be accommodated with the EQ facilities on a mixing
board. It is appropriate to regard these design equalization
requirements as to be met internally, to be inherent
characteristics of the device or of an accessory
specific to the device. In the same way, the sometimes
strenuous equalizations undertaken in some highly
valued microphones are of concern to the design engineer,
not the recording engineer. It is sufficient if
the recording engineer deems the overall characteristic
as apt for his or her needs.
The advent of binaural monitoring will prove to be
a substantial convenience in comparison to loudspeaker
monitoring, especially for location work or other situations
in which access to a proper listening room is
inconvenient. Transaural monitoring (with loudspeakers)
can, of course, be made available as needed.
0.5 Beginnings of Transaural Recording
Transaural stereo had its first trials in 1962 by B. S.
Atal and M. R. Schroeder [8]-[12]. They used a powerful
(for its day) mainframe computer, an IBM 7090,
to perform digital finite-impulse response (FIR) filtering
for crosstalk "planting" combined with equalization,
using functions derived from testing an artificial head.
They also saw binaural synthesis as a related process,
and they were striving for a spatial synthesis of reverberation.
Their transaural trials, however, were designed
to reproduce the actual sounds of known concert halls
and were based on binaural recordings made in those
halls. In those days, earphone listening produced only
interior sounds, so that transaural conversion was
mandatory for their purposes. The Atal-Schroeder results
were described [12] as "nothing less than amazing."
The listener experienced authentic, exterior, spatial
envelopment as well as authoritative imaging to the
front and sides, in elevation, and even behind.
Unfortunately the reports of this work left lasting
impressions of a heroic technology producing fragile
results: the listening space had to be anechoic, and the
listener could not move by more than about 10° or
3 in (75 mm) without spoiling the effect. Later work
by Damaske [13], with "90° crosstalk filters," a code-word
designation, did little to dispel these disheartening
impressions. He found that reverberant listening spaces
degraded the effect, damaging side imaging and causing
front-back ambiguity. Other work over the past quarter
century, including the Q-Biphonic development [14],
has not significantly advanced the technology nor
changed overall impressions of dim prospects for transaural
recording.
0.6 Present Prospects
Brightened prospects are suggested by our work,
reported herewith. By casting the crosstalk-canceling
filters in shuffler form, we are able to greatly simplify
the technology: a handful of operational-amplifier chips,
or the equivalent on a digital signal-processing chip,
suffice. This economy (not having to use FIR filters)
is a consequence of our discovery that the shuffler consists
entirely of minimum-phase filters. The simplification
also reveals a structure that allows secure control
over the design of equalization as independent of the
crosstalk canceling. Thus we are able to simplify the
crosstalk function, more particularly at short wavelengths,
to make the effect of cancellation quite tolerant
of listener movement.
Listeners find that a 30° head rotation produces a
benign, albeit noticeable to some, change in auditory
perspective. Imaging at 90° is less tolerant. Comparable
effects are noticed for lateral movement over
a range equal to the loudspeaker spacing, but there is
more tolerance for forward-backward motion. We have
no data for transaural systems designed for a wider
loudspeaker spacing, and we are not entirely satisfied
with explanations we offer in Sec. 1.7. Perhaps some
credit is owed to good equalization. 1
The significance of equalization has become clear
to us through our experiences with recordings made
with a Neumann KU-80 head, which is equalized to
provide a correct ear-canal entrance signal, and with
the Aachen head (Aachener Kopf, or AK) devised by
Gierlich and Genuit [15], which is equalized for a flat
free-field response for frontal incidence. Recordings
from the KU-80 are unfit for loudspeaker playback (the
KU-81 should do better), showing a poor stereo effect
with a very large "hole in the middle," while the same
playback from the AK shows a stereo excellence unattainable
by any other known stereo array.
With equalizations as given, crosstalk cancellation,
besides revealing the qualities noted by Atal and
Schroeder, actually "covers" the hole in the KU-80
presentation, while for the AK, it corrects image
placements, sharpens the images, extends the range of
image placement, removes front-back ambiguity, extends
the perception of depth, and completes the spatial
envelopment. Thus the equalization we would have
used (see explanation in Sec. 1.5), based on free-field
incidence at 30° and being nearly the same as in the
AK, was confirmed in its correctness.
As a result of our experience with actual trials of
differing equalizations in different rooms, we are in a
position to be more precise than Damaske could be,
about the significance of listening-space acoustics. It
is, in fact, a misunderstanding that an anechoic space
need be used. Atal and Schroeder regarded listener-space
reverberation to be a contaminant in their studies
of concert-hall sound, and they wished to exclude it.
Specifically, we did identify one minor aspect of listener-space
acoustics, one easily avoided, that accounts
for the effects noted by Damaske.
The integrity of the crosstalk paths from loudspeakers
to ears can be compromised by competing reflected
paths that differ in delay from the primary paths by
amounts of less than 1 (or perhaps 2) ms. Substantial
contributions from such paths can begin to impair side
imaging and allow some appearance of front-back
ambiguity. Ordinary care taken in the setup to avoid
significant early-reflection paths obviates any deleterious
effects. Longer delayed reflections merely appear
as "early" reflections in the concert-hall sense. These
are attributed by the listener to the performing space,
usually as minor augmentations in its reverberance.
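As a rough guide to what a 1 or 2 ms delay difference means in a room, the delay can be converted to an extra path length. The sketch below is only arithmetic; the speed of sound (343 m/s, air near room temperature) is an assumed value that the paper does not state, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air near 20 degrees C

def extra_path_m(delay_ms):
    """Extra path length (m) of a reflection arriving delay_ms after the direct sound."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0

for delay_ms in (1.0, 2.0):
    print(f"{delay_ms:.0f} ms later -> {extra_path_m(delay_ms):.3f} m longer path")
```

Under that assumption, reflected paths longer than the direct path by less than roughly a third of a meter (or perhaps two thirds) are the ones to avoid in the setup.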
Ensuring good equalization guarantees that if a user
is so careless with the setup as to allow early reflections,
the playback of a transaural recording will exhibit a
gradual degradation from a quality that is "nothing less
than amazing" to one that is at least "excellent."
0.7 Summary
The principal purpose of this paper is to report on
improvements we have discovered in a particular signal-processing
scheme, the crosstalk-canceling scheme of
Atal and Schroeder. These improvements, which are
largely practical, offer the possibility of a significant
restructuring of stereo recording to make for extraordinary
improvements in stereo quality.
We have cited economies in processing following
from the discovery that minimum-phase filters may be
used exclusively, have cited a revealed structure that
allows the shaping of crosstalk filters independently
of equalization filters, and have cited bases for robust
results, in contrast to a previously supposed fragility.
This robustness consists of a tolerance for listener
movement and a tolerance for nonanechoic listening-space
acoustics.
In the following sections we review the Atal-Schroeder
scheme and show how its lattice-arrayed
filters may be seen to be equivalent to a shuffler array,
develop the corresponding formulas, and illustrate these
with plots based on the spherical-model head. We cast
the shuffler functions in a form that exhibits a factoring
into an equalization part and a crosstalk-canceling part,
and we illustrate the significance of these with plots
taken from old data for the so-called CBS-NASA head
[16].
In so doing, we point out that crosstalk canceling is
a process that is an inverse of the process we have
called binaural synthesis, and we provide a block diagram
of a multiple-input binaural synthesizer.
Finally, we turn to aspects of transaural technology
that are less related to recording. We introduce the
concept of virtual loudspeakers, whereby a given pair
of actual loudspeakers may be replaced by a number
of virtual loudspeakers at arbitrary positions. This may
be used to solve the problem of too closely spaced
loudspeakers in stereo television, for example. It also
may be used to present cinema-surround stereo via only
two loudspeakers without loss of surround effect, or
to similarly present full-sphere ambisonic surround.
Also, we can resurrect a contribution by Bauer [17] to
provide binaural-like listening to stereo material,
making for inexpensive, accurate "Bauer boxes."
1 CROSSTALK-CANCELING FILTERS
1.1 Atal-Schroeder Filters
The Atal-Schroeder crosstalk canceler is shown in
Fig. 1(a), adapted from [12]. In Schroeder's notation,
S represents the transfer function from a source
(loudspeaker) to a same-side (ipsilateral) ear, while A is the
transfer function to an alternate-side (contralateral) ear.
The acoustic layout is symmetric, so that the transfer
functions from the LF loudspeaker to the ears equal
those for the RF loudspeaker. The notation C = -A/S
is used for the filter in the cross path. Elementary
algebra may be used [11] to show that a signal introduced
at the top left does indeed appear, unchanged and
uncontaminated, at the left ear of the listener, and so
on.
Schroeder also treats the requirements of causality
(realizability) [11]. It is clear from first principles that
A involves a greater delay than does S, so that C is
causal. This is also seen from Fig. 1(b), adapted from
Møller [18], which shows a plot of the impulse response
c(τ) for the cross filter C(ω), as determined for the
Neumann KU-80i head at 45° incidence. Thus C is
realizable as an FIR digital filter, or a transversal analog
filter, and so also for C². Then 1/(1 - C²) is realized
by placing the C² filter in a recursive loop. The terminal
filter 1/S is not causal on its face, but with its impulse
response padded with sufficient delay, the same in both
channels, causal representations are obtained. These
realizations were signal-processing routines in an IBM
7090 computer.
The impulse response plotted in Fig. 1(b) is of short
duration, which shows that crosstalk cancellation is
speedily completed, requiring the listening space to be
anechoic for only the first few milliseconds. This is
equivalent to the finding, stated in the Introduction,
that it is sufficient, in the listening setup, to exclude
early reflection paths.
The brevity of this impulse response bears also on
questions of equalization style, as will be seen later.
1.2 Shuffler Filters
The Atal-Schroeder scheme may be seen to be
equivalent to the lattice arrangement of filters shown
in Fig. 2, provided that the filter in the cross path is
$$A' = \frac{-A}{S^2 - A^2} \qquad (1a)$$
and that the one in the same-side path is
$$S' = \frac{S}{S^2 - A^2} \qquad (1b)$$
These may be seen to be the matrix elements (S' on
the diagonal, A' on the counterdiagonal) of the 2 × 2
matrix that is inverse to the acoustic matrix evident
from Fig. 1.
The shuffler arrangement of filters, also shown in
Fig. 2, may be seen to be equivalent to the lattice.
There the filter for the sum of inputs with both parts
positive is denoted by P', while the filter for the difference
(sum with one input negative) has been denoted
by N'. Equivalence demands that
$$N' = \frac{S' - A'}{2} \qquad (2a)$$
and
$$P' = \frac{S' + A'}{2} \qquad (2b)$$
Division by 2 would be omitted for difference-sum
networks designed with uniform 3-dB losses so that,
without loss of generality, we write
$$N' = \frac{1}{N} = \frac{1}{S - A} \qquad (3a)$$
and
$$P' = \frac{1}{P} = \frac{1}{S + A} \qquad (3b)$$
Thus the matrix of the shuffler transfer functions,
diagonal with elements N' and P', is the inverse of the
diagonal acoustic matrix for difference-and-sum ear
sounds with elements N = S - A and P = S + A.
1.3 Minimum-Phase Characteristics
In 1977 Mehrgardt and Mellert [19] showed experimentally
that the head-related transfer functions are
of minimum phase to within a frequency-independent
delay, a delay that is incident-angle dependent. They
proceeded via the Hilbert transform of the log-magnitude
response to calculate the minimum-phase part of the
phase response. The remainder, or excess-phase part,
was then identified as being of frequency-independent
slope. The result was experimental in that measured,
and smoothed, response functions were used in the
calculations. Thus, S and A have excess-phase parts that
differ in the amount of frequency-independent delay.
Considering S alone, however, Schroeder found the
delay to be ignorable for the purposes of constructing
1/S, as we have seen.
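The Hilbert-transform check described above can be sketched numerically. The example below is an illustration under stated assumptions, not the procedure of [19]: the minimum phase is obtained from the log magnitude by the real-cepstrum folding method (a discrete equivalent of the Hilbert-transform relation), and the test response is a hypothetical two-tap minimum-phase filter preceded by a 5-sample delay rather than measured head data.

```python
import numpy as np

def minimum_phase(magnitude):
    """Minimum phase (radians) implied by |H| on a full, even-length FFT grid."""
    n = len(magnitude)
    cepstrum = np.fft.ifft(np.log(magnitude)).real  # real cepstrum of log|H|
    folded = np.zeros(n)
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2.0 * cepstrum[1:n // 2]     # fold onto the causal side
    folded[n // 2] = cepstrum[n // 2]
    return np.fft.fft(folded).imag                  # imaginary part of log H_min

# Hypothetical response: minimum-phase taps (1, -0.5) after a pure delay.
n, delay = 1024, 5
h = np.zeros(n)
h[delay], h[delay + 1] = 1.0, -0.5
H = np.fft.fft(h)

omega = 2.0 * np.pi * np.arange(n) / n              # radians per sample
excess = np.unwrap(np.angle(H)) - minimum_phase(np.abs(H))

# The excess phase should be a straight line whose slope is minus the delay.
half = slice(0, n // 2 + 1)
estimated_delay = -np.polyfit(omega[half], excess[half], 1)[0]
print(f"estimated excess delay: {estimated_delay:.6f} samples")
```

For this toy response the excess phase comes out as a straight line of frequency-independent slope, and the slope recovers the inserted delay. With measured data the log magnitude would first need smoothing, as the text notes.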
To discuss pairs of filter functions, we introduce the
concept of joint minimum phase. To be of joint minimum
phase, a set of filter functions is to have a common
excess phase, and this excess is to be a (bounded)
frequency-independent delay. Then removing the excess
phase to a common factor leaves a delay-normalized
set of filters that are of minimum phase in the ordinary
sense. They are also at least conditionally stable, so
that products, ratios, and reciprocals are in the set.
Thus A and S are not of joint minimum phase, and
neither are A' and S'.
Of course, in the Atal-Schroeder filters, joint minimum
phase is not at issue, since the 2 × 2 matrix has
S' along the diagonal and A' along the counterdiagonal.
On the other hand, the shuffler filter has N' and P'
along the same diagonal (the counterdiagonal is zero)
so that it would seem odd if N' and P' were not of joint
minimum phase; odd because then the difference signal
would be required to become more and more out of
step with the sum, as frequency increases. However
that may be, we made the same sort of check that
Mehrgardt and Mellert had made and found that N'
and P' are indeed of joint minimum phase, so also for
N and P.
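The matrix statements of Secs. 1.1 and 1.2 can be checked at a single frequency with a few lines of linear algebra. This is a minimal sketch: the complex values chosen for S and A are arbitrary assumptions, the function names are hypothetical, and the shuffler is modeled with sum-difference networks of uniform 3-dB loss (a factor of 1/sqrt(2) at each end), as the text allows.

```python
import numpy as np

def acoustic_matrix(S, A):
    """Loudspeaker-to-ear paths: S on the diagonal, A on the counterdiagonal."""
    return np.array([[S, A], [A, S]])

def atal_schroeder_matrix(S, A):
    """Cross filter C = -A/S, recursive loop 1/(1 - C^2), terminal filter 1/S."""
    C = -A / S
    return np.array([[1.0, C], [C, 1.0]]) / (1.0 - C * C) / S

def lattice_matrix(S, A):
    """Same-side filter S' and cross filter A' of the lattice."""
    det = S * S - A * A
    return np.array([[S / det, -A / det], [-A / det, S / det]])

def shuffler_matrix(S, A):
    """Sum and difference, filter by P' = 1/(S+A) and N' = 1/(S-A), then recombine."""
    sum_diff = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    filters = np.diag([1.0 / (S + A), 1.0 / (S - A)])
    return sum_diff @ filters @ sum_diff

# Arbitrary single-frequency values; A is weaker and more delayed than S.
S = 1.0 * np.exp(-0.3j)
A = 0.4 * np.exp(-1.1j)

identity = np.eye(2)
for build in (atal_schroeder_matrix, lattice_matrix, shuffler_matrix):
    at_ears = acoustic_matrix(S, A) @ build(S, A)
    print(build.__name__, np.allclose(at_ears, identity))
```

All three arrangements give the identity at the ears, which is the sense in which they are equivalent. Only the shuffler form is diagonal between its sum-difference networks, so only there do N' and P' act on separate signals.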
A practical consequence is that magnitudes alone,
|N| and |P|, or their reciprocals are a sufficient specification
(phase is redundant as being calculable by
Hilbert transform), whether for filter synthesis or in
determining the head functions to be measured experimentally.
Also, since any common frequency-independent
delay may be omitted, the programming and
hardware requirements of an FIR realization are reduced.
In fact, a non-FIR (or IIR) filter may be programmed
at low cost. Successive fitting of a cascade
of biquadratic forms (ratios of frequency-dependent,
second-order transfer functions) is a natural approach,
and these take scarcely more than a half-dozen lines
of code each in a typical DSP chip. In analog-filter
synthesis, the "biquad" is also a natural choice of synthesis
element.
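To give a sense of how little code a biquad needs, here is a minimal sketch of a cascade of second-order sections in the transposed direct-form II structure. The structure, the function names, and the coefficients in the usage lines are assumptions for illustration; they are not fitted to any head data, and the paper does not say which biquad form was used.

```python
def biquad_step(coeffs, state, x):
    """Run one sample through a biquad (transposed direct form II).

    coeffs = (b0, b1, b2, a1, a2) for
    H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).
    state is a two-element list that is updated in place.
    """
    b0, b1, b2, a1, a2 = coeffs
    y = b0 * x + state[0]
    state[0] = b1 * x - a1 * y + state[1]
    state[1] = b2 * x - a2 * y
    return y

def cascade(sections, samples):
    """Filter samples through the biquad sections in series."""
    states = [[0.0, 0.0] for _ in sections]
    out = []
    for x in samples:
        for coeffs, state in zip(sections, states):
            x = biquad_step(coeffs, state, x)
        out.append(x)
    return out

# Hypothetical sections: a pass-through and a one-pole smoother (pole at 0.5).
sections = [(1.0, 0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, -0.5, 0.0)]
impulse_response = cascade(sections, [1.0, 0.0, 0.0, 0.0])
print(impulse_response)
```

A section is minimum phase when its zeros as well as its poles lie inside the unit circle, so a fit to |1/N| or |1/P| would constrain both.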
1.4 Structure of |N| and |P|
Head data are most often measured in the form of
|A|, |S|, and τ, of which the last is the interaural
phase delay, redundant in that part calculable from the
Hilbert transform of log |A/S|. Nevertheless, it is easy
to see that these data are sufficient to determine |N|
and |P|, since the magnitudes of the phasor difference
and sum are, according to the triangle rule, simply
$$|N| = \left(|A|^2 + |S|^2 - 2|AS| \cos \omega\tau\right)^{1/2} \qquad (4a)$$
and
$$|P| = \left(|A|^2 + |S|^2 + 2|AS| \cos \omega\tau\right)^{1/2} \qquad (4b)$$
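The triangle rule can be checked against direct phasor arithmetic. The sketch below assumes that ωτ is the phase by which A lags S, uses arbitrary values for the magnitudes, delay, and frequency, and takes the sum magnitude from the same rule with the sign of the cosine term reversed. The function name is hypothetical.

```python
import numpy as np

def shuffler_magnitudes(mag_A, mag_S, tau, omega):
    """|N| = |S - A| and |P| = |S + A| from magnitudes and interaural phase delay."""
    cross = 2.0 * mag_A * mag_S * np.cos(omega * tau)
    mag_N = np.sqrt(mag_A**2 + mag_S**2 - cross)
    mag_P = np.sqrt(mag_A**2 + mag_S**2 + cross)
    return mag_N, mag_P

# Arbitrary illustrative values: 1 kHz, 0.25 ms interaural phase delay.
mag_S, mag_A = 1.0, 0.4
omega, tau = 2.0 * np.pi * 1000.0, 0.25e-3

# Direct phasor construction for comparison; the common phase is arbitrary.
S = mag_S * np.exp(1j * 0.7)
A = mag_A * np.exp(1j * (0.7 - omega * tau))

mag_N, mag_P = shuffler_magnitudes(mag_A, mag_S, tau, omega)
print(mag_N, abs(S - A))
print(mag_P, abs(S + A))
```

The two routes agree, which is why |A|, |S|, and τ suffice and no separate phase measurement of N or P is needed.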
$${}_0P = \frac{P}{E(\theta_0)} \qquad (8b)$$
The reference direction has been taken to be 0° for the
AK (Aachen head), but for loudspeakers to be placed
at 30°, a 30° reference would be more appropriate.
When plotted with the same incidence angle as the
reference, the frequency-response curves for $|{}_0N|$ and
$|{}_0P|$ intersect one another at the constant level 0 dB.
Such a plot for a spherical-model head [2], [3] is shown
in Fig. 3, solid line (a). These plots are actually for
the reciprocals $|1/{}_0N|$ and $|1/{}_0P|$, as would be used in
a crosstalk canceler, but the decibel scale makes it easy
to interpret the plots also for the direct functions $|{}_0N|$
and $|{}_0P|$. The dashed line shows a possible modification,
to be discussed later. Curve (b) shows the equalization
1/|E|. A crosstalk canceler based on these curves has
been tried, and its performance is extremely satisfying.
Plots for more realistic models of the human head
(see Fig. 13) resemble the solid-line plot (a) in Fig. 3
but differ remarkably in equalization from plot (b).
The reason is that the spherical-model head is one whose
functions are quite smooth [20] because pinnas are
omitted. Inclusion of pinnas on a realistic model invokes
the large conch resonance, profoundly altering the 1/|E|
curve.
For example, data [21] for the CBS-NASA head provide
the equalization curves of Fig. 4. A curve for 30°
and one for 0° are shown. For clarity, the plot was
made with a 3-dB displacement inserted between the
curves. It will be seen that the curves differ by little
in comparison to the range of variation that they encompass.
Thus a 0° equalization could substitute for a
30° one with little effect, but omission of such equalization
would be a serious matter.
The seriousness of the matter became evident from
our loudspeaker playback of recordings from the Neumann
KU-80 head. As indicated in the Introduction,
the stereo effect was of a "hole in the middle you could
drive a truck through," as one listener said. When converted
to transaural, using the crosstalk canceler built
with the functions of Fig. 3, Schroeder's description
of "nothing less than amazing" spatial and imaging
qualities certainly applied, but it was possible to notice
that the equalization was "a little off." Later, the appearance
of a "hole" tendency in this recording would
alert us to early reflections in a listening setup. As we
also noted, recordings from the Aachen head (0° equalization)
provided stereo of unequaled excellence by
ordinary standards. Certainly, no "hole" was observed,
even without cancellation.
1.6 System Transfer Functions
In the following, M will be used to designate either
N or P. It will be understood to be a function of frequency
and incidence angle. Thus for natural directional
hearing, either member of the pair of overall transfer
functions from a source at angle $\theta$ to the ears is designated
$$H_n = M_n(\theta) \qquad (9)$$
and it is the transfer function for the difference or sum
in ear signals, depending on whether N or P is substituted
for M. The sources are to be considered one at a time,
whether a direct source or one of the many components
of reverberation. Superposition is applicable in linear
acoustics. The subscript n is used to designate a natural
head, the head of the listener. A signal-theoretic basis
for understanding directional hearing would begin at
this point.
Fig. 3. Shuffler filter characteristics in crosstalk canceling for spherical-model head. (a) Magnitudes of 1/N and 1/P normalized against curve shown in (b). Because curves (a) are free of the idiosyncratic detail for specific heads (as in Fig. 13), such characteristics are tolerant of variations in listener-head shapes and positions. Dashed curves show a possible modification of the envelope of the alternations. Because the filters are of joint minimum phase, the phase data are redundant and not shown.
Fig. 4. Equalization curves from data [21] for the CBS-NASA head [16]. Curves for both 30° and 0° reference directions are shown, displaced from one another by 3 dB. The decibel range of variation (conch resonance) greatly exceeds any difference.
For listening via loudspeakers to a binaural recording,
the transfer functions (subscript b) are designated as
$$H_b = {}_0M_a(\theta)\,M_n(\theta_0) \qquad (10)$$
in which the artificial head (or equivalent in binaural
synthesis), subscript a, is designated as being equalized
for the loudspeaker positions $\theta_0$. This equalization
prevents the nondirectional part of the conch resonance,
already present in the listener's ears, from being introduced
a second time, a minimum requirement of
equalization.
In this instance, the sounds at the listener's ears will
be drawn from an extremely restricted set, in comparison
to all possible sound combinations. The restriction is
to that of linear combinations of the sounds at the ears
of the artificial head, namely "shuffled" ear sounds,
but combinations that otherwise closely resemble ear
sounds themselves.
For a reasonably apt artificial head the joint angle-dependent
spectral magnitudes will be determined by
the artificial head, except for shuffling, to closely resemble
those of natural hearing. The result, as confirmed
in listening, is an extremely plausible directional portrayal,
much more so than available by any means in
conventional stereo. Even so, listeners do greatly appreciate
the improvement they experience in being
provided "unshuffled" ear sounds, those that embody
the alternations in their difference-and-sum spectra that
are characteristics of their own ear sounds. Since these
alternations depend strongly on the cosine of the interaural
phase, the significance of this element is confirmed.
This unshuffling is provided in the crosstalk
cancellation of transaural stereo.
For transaural listening, the transfer function (sub
script t) is
H = 0Ma(S)Mn(S0)t oMx(So)
(11)
in which subscript x designates transfer functions of
the head used to model the crosstalk cancellation. This
equation shows the use of equalized functions, for the
reference direction θ₀, for both Ma and Mx. If these
differ in any of their characteristics, each is to be
equalized against its own characteristics. The
appearance of a conch resonance (for θ₀) is, as in the above,
reserved for the listener's head Mn.
If Mx is the same as Mn, for example, then Eq. (11)
describes the simulation of natural directional hearing
except for the substitution of the artificial head (and
ears) for the listener's own. Clearly, if all three heads
are the same, then Eq. (11) is identical to Eq. (9), and
an exact simulation of natural hearing would be the
result.
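The reduction described above can be spelled out under two assumptions that this section does not state explicitly: that Eq. (11) is the quotient of the equalized artificial-head function times the listener's own function, taken over the equalized canceler function, and that "equalizing" a head function means dividing it by a factor that depends only on the reference direction θ₀. The symbol Q for that factor is hypothetical and introduced here only for illustration.

```latex
\begin{align*}
{}_0M_a(\theta) &= \frac{M_a(\theta)}{Q_a(\theta_0)}, \qquad
{}_0M_x(\theta_0) = \frac{M_x(\theta_0)}{Q_x(\theta_0)}, \\[4pt]
{}_tH &= \frac{{}_0M_a(\theta)\,M_n(\theta_0)}{{}_0M_x(\theta_0)}
      = M_a(\theta)\,
        \frac{M_n(\theta_0)}{M_x(\theta_0)}\,
        \frac{Q_x(\theta_0)}{Q_a(\theta_0)} .
\end{align*}
```

With Mx equal to Mn the middle ratio is unity, leaving the artificial head's directional function times a direction-independent equalization ratio. With all three heads the same, both ratios are unity and the result is the listener's own head function for the direction θ, which is natural hearing.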
In this last statement direct proof is seen of what is
generally regarded as the "unquestionable validity" of
the transaural plan for stereo recording and
reproduction. Of course, this provides the transaural design
engineer an extremely strong vantage position from
which to undertake departures in the service of
practicality. It is one of the strengths of starting
from an optimal position that departures from the
optimum in design parameters usually produce remarkably
small effects.
1.7 Practical Design Considerations
Except for a custom-designed crosstalk canceler, it
is not to be expected that Mx will be the same as Mn,
and a commercial release of a transaural recording would
have to embody an Mx that would be required to be
satisfactory for a wide range of listener heads, each
with its own Mn. Generally this is not a difficult
requirement. It has been found, for example, that the
crosstalk canceler based on a spherical-model head [2],
[3] produces immensely satisfying results for a wide
range of listeners' heads. Heads that are somewhat
small may be placed somewhat nearer the loudspeakers,
and those that are somewhat large may be placed at a
somewhat greater distance, as may be seen from the
structure of head functions, but the exact placement
does not seem to be a critical matter for most listeners.
What is probably the case is not that a sphere is
necessarily a best fit, but that it is a "comfortable" fit
for most heads just because of its inexactness. While
the advantages of inexactness merit further exploration,
we have tried another aspect for inexact treatment, the
domain of wavelengths shorter than about 50 mm
(frequencies higher than about 6 kHz). The first
experimental crosstalk-canceler filters followed, after a
somewhat abrupt transition, the null-crosstalk contour
of Eq. (6) for the shorter wavelengths. We attribute
the tolerance in listener movement to these aspects of
inexactness in Mx filters.
The choice of a rather abrupt "cut" in our first
experimental canceler may have been somewhat extreme.
We do notice a tendency for sibilantlike sounds and
clicklike sounds to be mislocated, generally toward
the front. This is a confirmation, extended to short
wavelengths, of the importance of interaural phase.
Although this style of design variation has proved
instructive, we are now inclined to rely on a more uniform
distribution of inexactness, of which the spherical
functions are a good example. Another variation of
interest is that of introducing a gradual taper, as shown
in Fig. 3(a), dashed line, wherein the upper and lower
envelopes approach the null-crosstalk contour in a
somewhat less accelerated manner for short
wavelengths, replacing the more abrupt cut.

We visualize these styles of inexactness as defining
a volume of space near each ear of the listener, a space
over which cancellation is satisfactorily accurate. We
visualize this volume as being of smaller extent for the
shorter wavelengths, and we suppose that it is
appropriate to be less exact at these shorter wavelengths.
We also believe, despite our successes with spherical
functions, that we need to continue to investigate this
problem. Thus the tolerance we have gained for listener
movement, already satisfactory for most purposes, may
be extended.
J. Audio Eng. Soc., Vol. 37, No. 1/2, 1989 January/February
2 BINAURAL SYNTHESIS
2.1 Synthesis Filters
Shuffler filters based on the direct functions N and
P are used to simulate the progress around the head to
the ears of two sounds of incidence angles ±θi at once.
Instead of an inverse crosstalk filter, it is the direct
crosstalk filter that is to be constructed. Of course, if
only one of these signals is desired, one of the inputs
to the shuffler may be left silent. Degenerate forms of
the shuffler are used for 0° and 180°.
The shuffler synthesizer implements Eq. (9) in effect,
but provides equalized ear signals instead, thus actually
simulating the use of an equalized artificial head. The
transfer function may be written
(12)
using the same convention that M may stand for either
of N or P. The subscript s is used to denote head
functions used in synthesis, even though these might have
been measured for an artificial head, a natural head,
or derived from a mathematical model. The transfer
function is for a source simulated at position θi, where
i is a symbol for the indexing over a discrete set of
incidence angles.
This transfer function may be written in greater detail
as
(13)
which may be written in factored form as
(14)
in which the factors are
(15)
and
(16)

In particular, "tonal-color" characteristics, for the two
ears jointly, are represented by the factor ₀E. It seems
that this color is used in a part of the directional hearing
process (spectral pattern recognition) at a level below
consciousness, but that only the directional result is
presented at the level of consciousness. For example,
speaking voices from behind would have an extremely
"hollow" sound, as will be seen, if the hearing
mechanism did not function as indicated.

This hollowness can be heard only under exceptional
circumstances, such as a binaural recording played
without crosstalk cancellation. In this example, a voice
was recorded with the AK while the speaking person
moved around the artificial head. Listeners heard the
voice move outside the space between the 30°
loudspeakers, barely into the side quadrants, in the listener
space. While the original movement had been through
to the back quadrant, the listeners heard movement
that turned forward again into the front quadrant, but
with an altered vocal quality, that "hollow" quality.
Some listeners, when particularly neutral,
transparent-sounding loudspeakers were used, would hear the voice
"jump" to the back quadrant before the change in quality
had become explicit. The listeners that stayed with the
frontal localization presumably did so because of the
"visual knowledge" that the loudspeakers were in front
and because the equalization of the AK is not exactly
suited to 30° presentation. With crosstalk
cancellation, the transition to the back quadrant was
characterized by continuity.
Changes in tonal color for sounds presented in
elevation seem also to be responsible for impressions of
elevated localization, and, again, the coloration is
prevented from appearing in consciousness. One of us
re
exclusively of the phasor interaural transmission ratio
A/S. Its magnitude determines K, and its phase is equal
to the argument of the cosine in Eq. (17). Thus 'M and
A/S are essentially equivalent signal-theoretic bases
for their role in directional hearing. This role appears
largely to be the determination of the lateral aspect of
the localization angle, as distinguished from
front-back and elevation aspects. Plots of |S/A| and interaural
phase delay may be found in Mertens [22]. These are
adapted here as Fig. 5, showing interaural phase delay
in microseconds, and Fig. 6, showing the interaural
level difference |S/A| in decibel units, both plotted
versus incidence angle.
The plot of interaural phase delay, Fig. 5, shows
clearly that a substantial part of that delay is frequency
independent, seeming to plot a trend toward a
high-frequency-limit curve (lower bound) that is ramplike
in increase and decrease. A ray-acoustic approximation
[3, Fig. 1(b)] of this limit is (a/c)(θc − |θc − θ| + sin θ),
where a is the head radius, for example, 90 mm; c
is the speed of sound, for example, 345 m/s; and θc =
π/2. This is 671 μs at θ = θc. Increments in midfrequency
delay arise from Hilbert transforms of log |S/A|. The
Rayleigh, low-frequency, total-delay limit
3(a/c) sin θ, with 3a/c = 783 μs, disagrees with the
140-Hz plot. Heads other than this one of papier-mache
agree more closely with Rayleigh.
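The two delay limits quoted above are easy to evaluate numerically. The following is a minimal sketch, using the example values from the text (a = 90 mm, c = 345 m/s); the function names are hypothetical and chosen here only for illustration.

```python
import math

def ray_acoustic_delay_us(theta_deg, a=0.090, c=345.0):
    """High-frequency limit (a/c)(theta_c - |theta_c - theta| + sin theta), in microseconds."""
    theta = math.radians(theta_deg)
    theta_c = math.pi / 2  # reference angle of 90 degrees
    return 1e6 * (a / c) * (theta_c - abs(theta_c - theta) + math.sin(theta))

def rayleigh_delay_us(theta_deg, a=0.090, c=345.0):
    """Low-frequency limit 3(a/c) sin theta, in microseconds."""
    return 1e6 * 3.0 * (a / c) * math.sin(math.radians(theta_deg))

if __name__ == "__main__":
    # Tabulate both limits over the front and back half planes.
    for angle in (0, 30, 60, 90, 120, 150, 180):
        print(f"{angle:3d} deg  ray {ray_acoustic_delay_us(angle):6.1f} us"
              f"  Rayleigh {rayleigh_delay_us(angle):6.1f} us")
```

At 90° the two functions return roughly 671 μs and 783 μs, the figures given in the text, and both are mirror-symmetric about 90°.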
However that may be, it is seen from Fig. 5 that the
interaural phase delays for directions less than 90° very
nearly mirror those for directions exceeding 90°. Thus
it may be supposed that interaural delay is not relied
upon, to any great extent, in distinguishing sounds in
back from those in front. On the other hand, it is seen
that interaural delay is a steep function of direction,
for directions in front, and a steep function again, for
directions in back. Thus a strong reliance may be placed
on interaural phase, from a signal-theoretic point of
view, for a precise determination of the lateral
component of direction.

Fig. 5. Plots of interaural difference in phase delay versus angle of incidence for various frequencies (Hz). Diffraction theory indicates a maximum delay at low frequencies that is much less than shown for 140 Hz. A low-frequency mechanical resonance in this papier-mache head is suspected. Adapted from [22].
In Eq. (17) it is seen that interaural phase appears,
for the sum-and-difference ear sounds, as an alternation
in their spectral magnitudes because of the cosine. The
extent to which the alternation appears does depend
on the value of K. The upper and lower envelopes of
these alternations, the extrema in |'M|, are given by
$$|{}'M|_{\mathrm{ex}} = (1 \pm K)^{1/2} \qquad (20)$$
in which K is determined from |S/A|, as shown in Eq.
(18). Thus Fig. 6 may be studied to estimate the
directional dependence of these envelope functions. To
assist in this estimation, envelope contours for the three
highest curves in Fig. 6 have been computed, with the
results plotted in Fig. 7.
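The envelope estimate can be sketched numerically. Eq. (18) is not reproduced in this part of the text, so the relation used below is an assumption: if the sum and difference signals S ± A are normalized to unit joint rms, then K = 2r/(1 + r²) with r = |S/A|. The function names are hypothetical.

```python
import math

def alternation_k(level_diff_db):
    """K from the interaural level difference |S/A| in dB.

    ASSUMPTION: K = 2r / (1 + r^2), r = |S/A|; this stands in for Eq. (18),
    which is not available in this section.
    """
    r = 10.0 ** (level_diff_db / 20.0)
    return 2.0 * r / (1.0 + r * r)

def envelope_extrema(level_diff_db):
    """Upper and lower envelopes (1 + K)^(1/2) and (1 - K)^(1/2), as in Eq. (20)."""
    k = alternation_k(level_diff_db)
    upper = math.sqrt(1.0 + k)
    lower = math.sqrt(max(0.0, 1.0 - k))  # guard against rounding just below zero
    return upper, lower

if __name__ == "__main__":
    for db in (0, 3, 6, 12, 20):
        upper, lower = envelope_extrema(db)
        print(f"{db:2d} dB level difference: upper {upper:.3f}, lower {lower:.3f}")
```

Under this assumption a large level difference (deep head shadow) gives a small K, so the two envelopes close in on unity, while equal levels give K = 1 and the widest alternation.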
Generally it may be said that Figs. 6 and 7 show that
the upper and lower envelope contours lie closer together
at higher frequencies and for directions near 90°,
conditions making for the deepest head shadow for the
contralateral ear. Also, the alternations in magnitude,
along the frequency scale, are most rapid near 90°.
Toward the front, less than 60°, the alternations are
largest in magnitude, as they also are toward the back,
beyond 120°. Thus the most reliance, for directional
hearing, on interaural phase should lie toward the front
or toward the back, and there should be somewhat less
reliance on interaural phase near 90°, but only at the
higher frequencies. At 4 kHz and below, the reliance
would appear to be substantial. Some front-back
asymmetry is seen in these curves, but it is not clear
whether directional hearing can rely on these
asymmetries to resolve front-back determinations. These
asymmetries are not consistent with frequency, to be
sure, but one should be wary of hearing's potential of
making much of seemingly insignificant, even
idiosyncratic, detail. Nevertheless it behooves us to look
elsewhere for the front-back cues.

Fig. 6. Plots of interaural difference in level |S/A| versus angle of incidence for various frequencies (Hz). Adapted from [22].
We have used head data supplied by Torick [21] to
plot ₀E(θ) against frequency for a reference direction
θ₀ = 90°, a medial angle. Plots for θ of 120°, 150°, and
180°, a back-angle family, are shown in Fig. 8, while
plots for the front-angle family, 0°, 30°, and 60°,
are shown in Fig. 9. In Fig. 8 it is seen that differences
in spectral transmission between back angles are not
very large, and are mostly in the region above 4 kHz.
Between front angles, the differences are seen from
Fig. 9 to be not very large either, and are mostly in
the "presence" region from 1 to about 4 kHz. Thus we
come to the conclusion that there is little to rely upon,
in terms of spectral color, for distinguishing among
the back angles or among the front angles, although
we cannot dismiss the possibility entirely that some
reliance is placed there. It is clear, however, that the
two families are remarkably different from each other.

Fig. 7. Plots of alternation envelopes, square root of 1 ± K, versus angle of incidence for three highest frequencies of Fig. 6.

Fig. 8. Plots of normalized rms transmission to two ears jointly for a family of back angles of incidence. The reference direction for normalization is taken to be 90°. Data are for the CBS-NASA head. The contrast with Fig. 9 is remarkable.

Fig. 9. Plots as in Fig. 8, except that a family of front angles of incidence are shown. The contrast with Fig. 8 is remarkable.
A direct indication of front-back difference may be
shown as in Fig. 10, plots of ₀E(θ) for the reference
direction of 0° and θ equal to 90°, 120°, 150°, and
180°. This is the front-back difference against 0°. These
curves indicate the front-back difference as
characterized by a marked depression in level in the range
from about 1 to 6 kHz, along with elevations that are
equally striking in the range from about 6 to 10 kHz.
The 90° curve is included as a "back" spectrum because
its shape qualifies it as a family member and one whose
emphasis on high frequencies tends to elevate interaural
level difference to an importance not identified
elsewhere. It is not difficult to imagine the "hollow" sound,
as discussed above, that these transmission
characteristics would cause if they were ordinarily consciously
heard. This altered spectral quality does indeed,
however, appear to be the principal determinant of back
sound in discrimination from front sound.
This review of hearing characteristics allows certain
rules of thumb to be identified, namely, that interaural
phase, represented as amplitude alternations in the
spectra of difference-and-sum ear signals, is the
discriminant for the lateral component of image position,
while variations in joint spectral transmission are the
discriminant for the front-back component and,
presumably, for the elevation component. However, it
also allows certain areas of uncertainty to be noted.
These various points of observation will be of limited
help in the design of the binaural-synthesis filters
because the best rule will doubtlessly prove to be slavish
simulations of the best measured head-related transfer
functions available. At least these observations, together
with those the designer's own experience may develop,
will make for a certain intelligent slavishness.
2.3 Basic Synthesis Array
An array of filters for the sum and difference signals
is shown in Fig. 11. On the right-hand side, several
inputs are designated, one for each of the incidence
angles to be simulated. For each left-angle input, a
matching input is shown for the symmetric right-hand
angle, and sum and difference signals are shown as
being formed from these symmetric inputs. The signal
pairs are then transmitted through ₀N and ₀P filters, ₀N
for difference signals and ₀P for sum signals. Each of
these filters is designed to match the specific angle
designations for the inputs, ₀N(θi) and ₀P(θi) separately
for each θi designated at the input. The filtered difference
signals are then combined in a common sum, and the
filtered sum signals are then combined in a common
sum. These common sums are then further combined
in difference-and-sum fashion to form simulated
binaural outputs.
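The signal flow of this array can be sketched in a few lines. This is an illustration only, under stated assumptions: the filters are represented as short FIR tap lists (hypothetical stand-ins for the ₀N and ₀P filters, not measured head data), the final recombination is scaled by one half so that unit filters pass the inputs through unchanged, and all function and parameter names are invented here.

```python
def fir(signal, taps):
    """Direct-form FIR filtering, output truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def synthesize_binaural(inputs, filters):
    """Discrete-angle shuffler array.

    inputs:  {angle_deg: (left_side_signal, right_side_signal)}
    filters: {angle_deg: (n_taps, p_taps)}  # hypothetical difference/sum filters
    Returns (left_ear, right_ear) as lists of samples.
    """
    length = len(next(iter(inputs.values()))[0])
    diff_bus = [0.0] * length
    sum_bus = [0.0] * length
    for angle, (left_in, right_in) in inputs.items():
        n_taps, p_taps = filters[angle]
        diff = [l - r for l, r in zip(left_in, right_in)]  # difference signal
        summ = [l + r for l, r in zip(left_in, right_in)]  # sum signal
        for i, v in enumerate(fir(diff, n_taps)):          # common difference bus
            diff_bus[i] += v
        for i, v in enumerate(fir(summ, p_taps)):          # common sum bus
            sum_bus[i] += v
    # Final difference-and-sum stage; the 0.5 scaling is an assumption.
    left_ear = [0.5 * (s + d) for s, d in zip(sum_bus, diff_bus)]
    right_ear = [0.5 * (s - d) for s, d in zip(sum_bus, diff_bus)]
    return left_ear, right_ear

if __name__ == "__main__":
    demo_inputs = {30: ([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])}  # impulse from the left at 30 deg
    demo_filters = {30: ([1.0, -0.2], [1.0, 0.2])}          # made-up taps
    print(synthesize_binaural(demo_inputs, demo_filters))
```

Leaving one input of a pair silent, as in the demo, simulates a single source at that angle, as the text notes for the shuffler.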
This basic array may be thought of as a discrete-angle,
binaural, panoramic mixer. Variations may
include linear mixing arrangements and level adjustments
to be provided at each of the inputs. Also, some of the
inputs may receive outputs from an array of
reverberators to form synthetic binaural-space reverberation
systems in the manner of Kendall et al. [1]. Low-cost
versions could provide for a very limited set of angles
as supplementary pan functions for use with standard
mixing boards, and transaural outputs could be provided
for productions that are being developed primarily in
conventional stereo. Further variations may be
conceived. Some of these are described hereafter.

Fig. 10. Plots showing front-back difference as a family of back-angle joint transmissions normalized against transmission for the incidence angle of 0°, taken as the reference direction.

Fig. 11. Basic binaural synthesizer. Inputs for a discrete, symmetric set of simulated incidence angles are shown. A multiplicity of shuffler filters, based on head transmission functions, each specific to the incidence angle to be simulated, are used. The output is binaural.
3 PROSPECTIVE PRODUCTION PROCESSORS
3.1 Equalizers
All commercially available artificial heads of quality
suitable for professional applications stand in need of
further equalization, since neither ear-canal nor
diffuse-field equalizations are appropriate in transaural
recording. Ear resonances can be allowed only once in
the signal chain, in the listener's ear. Also, Fig. 1(b)
shows that directional information is received too
rapidly for any diffuseness to develop.

Among those heads providing ear-canal signals, we
list Neumann KU-80, KEMAR,¹ B&K 4128, and the
Aachen head (AK). The extent of the 30° free-field
equalization required may be estimated from Fig. 4,
although those data are not specific to these heads. The
Aachen head is also available with external free-field
equalization for 0° incidence. The diffuse-field
equalization devised by Killion [7] for KEMAR, and provided
by Neumann for their newer KU-81, reduces further
equalization needs to moderate corrections, as is also
true of the AK. Such further equalization may be
provided by the manufacturer or a third party.
All commercially available earphones for binaural
monitoring similarly stand in need of 30° free-field
equalization. A few manufacturers are currently
providing diffuse-field equalization [6], and an
(inadvisable) interest in such standardization continues [23].
We are aware of only one earphone set, the Stax Pro
Lambda, that has been accurately equalized against a
free-field reference by a third party [15], but for 0°,
not 30°. A decision by a third party to supply external
equalization for any but a selected few models entails
a substantial risk that only professional needs could
justify. Volume distribution of earphones suitably
equalized by the manufacturer probably lies some
distance in the future.
3.2 Monitoring
Facilities for earphone monitoring require 30°
free-field equalization as above, if it is not internal to the
earphone. If the program material to be monitored is
in the form of loudspeaker signals (whether transaural
or conventional stereo), there would also be needed a
binaural-synthesizer version of a circuit devised by
Bauer [17], the so-called Bauer box. The two inputs
would be processed to simulate 30°.
Loudspeaker monitoring would require transaural
monitor equipment to derive the proper signals from
binaural material. It could embody a crosstalk canceler
of standard grade adopted for mass distribution. Some
means of assurance of adherence to a standard would
be needed for full reliance on such monitoring. Also,
the prospective availability of postproduction consumer
equipment, such as Bauer boxes and loudspeaker
placement compensators described below, requires such
standardization.

¹ KEMAR is a registered trademark of Knowles Electronics.
The acoustic characteristics of a loudspeaker-monitoring
facility demand the usual attention. In addition,
the most accurate, most transparent, most self-effacing
loudspeakers should be chosen for such use and placed
to avoid early reflections.
3.3 Hall-Sound-Pickup Synthesizer
An arrangement involving hall-sound-pickup
microphones is shown in Fig. 12. Two omnidirectional
microphones are flanking the artificial head. The signals
from these are delayed and provided to a binaural
synthesizer. The latter may need inputs only for 90°,
120°, and 150° to provide sufficient flexibility,
especially if more than two hall-sound-pickup
microphones are needed in particular halls.
For the flanking microphones not too far back from
the orchestra, the hall-sound pickup would enhance
early reflections (concert-hall concept) in the 10-20-ms
range, and 90° synthesis angles would be suitable,
along with a choice of delay only somewhat more than
the microphone-head distance. For microphones placed
far enough back to represent the whole reverberation
field, synthesis would be at 120°, with a delay
somewhat more than the microphone-head distance. The
150° synthesis angle would probably be used
infrequently. The relative level would follow the usual
prescription of several decibels below that for a plainly
audible effect. For good concert halls, an almost
subliminal contribution, if any, would be sufficient.
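A delay "somewhat more than the microphone-head distance" is a distance expressed as propagation time. The sketch below converts one to the other, assuming the 345 m/s example value for the speed of sound used earlier in the paper; the safety margin `extra_ms` and the function names are hypothetical, since the text gives no figure for "somewhat more".

```python
SPEED_OF_SOUND = 345.0  # m/s, example value from the paper

def propagation_delay_ms(distance_m, c=SPEED_OF_SOUND):
    """Time for sound to travel distance_m, in milliseconds."""
    return 1000.0 * distance_m / c

def pickup_delay_ms(mic_to_head_m, extra_ms=2.0):
    """Delay for a flanking microphone: travel time plus a margin.

    extra_ms is an illustrative assumption, not a value from the paper.
    """
    return propagation_delay_ms(mic_to_head_m) + extra_ms

if __name__ == "__main__":
    for d in (2.0, 3.45, 5.0, 6.9):
        print(f"{d:4.2f} m -> travel {propagation_delay_ms(d):5.1f} ms, "
              f"suggested delay {pickup_delay_ms(d):5.1f} ms")
```

With this speed of sound, microphone-head distances of about 3.5 to 7 m correspond to the 10-20-ms range mentioned for early reflections.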
3.4 Transaural Panoramic Mixer
A transaural panoramic mixer is meant primarily as
a supplement to the pan functions of an ordinary stereo
mixer. It would be capable of replacing some of the
existing facilities solely to enhance the imaging qualities
by accurate synthesis for a limited number of channels,
or for special effects. A transaural converter would be
a part of the equipment.
Fig. 12. Layout for use of a hall-sound-pickup synthesizer. Flanking-microphone signals are delayed and subjected to binaural synthesis simulating incidence angles from a limited set of back angles. The binaural signals so derived are mixed at reduced level with the signals from the main-pickup artificial head.
3.5 Binaural Panoramic Mixer
A binaural panoramic mixer would be a full
elaboration of the basic synthesis array discussed in
connection with Fig. 11. It would otherwise correspond
to a full stereo mixing console, except that binaural
pan functions would be used and the signals would be
in binaural format. Monitoring would be possible in
either binaural or transaural format.
3.6 Transaural Converter
Transaural conversion need be done only once, except
for monitoring, in the processing of a complete
production, and there are good reasons for doing it only
once. The conversion would adhere to standards
specified for mass distribution, and it would be executed
in an off-line facility capable of providing standards
assurance. At present, many producers use an off-line
facility for the conversion of digital masters to a final
format as release masters. A similar concept applies here.
3.7 Processing Technology
All of these processing concepts may be realized in
either digital or analog form. Conversions between
analog and digital data streams are, of course, to be kept
to a minimum, and this consideration will determine
the technology to be used in each instance. Equipment
for some of the processing steps should be made
available in both technologies.
4 VIRTUAL LOUDSPEAKERS
A virtual loudspeaker is a transaural image
synthesized to simulate the effect of a loudspeaker placed at
a specified image location. The process involves
binaural synthesis followed by transaural conversion.
For example, an experimental processor has been
constructed that makes a pair of loudspeakers placed at
15° sound as if the loudspeakers had been placed at
30°. Applications are indicated below.
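One way to read "binaural synthesis followed by transaural conversion" is as a single shuffler evaluated frequency by frequency. The sketch below works at one frequency for a left-right symmetric head, with S the same-side and A the alternate-side transmission to the ears; the sum channel is scaled by the ratio of (S + A) for the target angle to that for the actual angle, and the difference channel by the corresponding ratio of (S − A). The complex values are made up for illustration, equalization is ignored, and all names are hypothetical.

```python
def shuffler_gains(s_target, a_target, s_actual, a_actual):
    """Sum- and difference-channel gains: synthesis for the target angle
    divided by (i.e., followed by cancellation of) the actual-angle paths."""
    p_gain = (s_target + a_target) / (s_actual + a_actual)
    n_gain = (s_target - a_target) / (s_actual - a_actual)
    return p_gain, n_gain

def virtual_speaker_feeds(left, right, p_gain, n_gain):
    """Shuffle, scale, and unshuffle the two program signals."""
    s = p_gain * (left + right)
    d = n_gain * (left - right)
    return 0.5 * (s + d), 0.5 * (s - d)

def ear_signals(spk_left, spk_right, s_path, a_path):
    """Acoustic paths from a symmetric loudspeaker pair to the two ears."""
    return (s_path * spk_left + a_path * spk_right,
            a_path * spk_left + s_path * spk_right)

if __name__ == "__main__":
    # Made-up single-frequency head data for 15 deg (actual) and 30 deg (target).
    s15, a15 = 1.0 + 0.0j, 0.8 - 0.3j
    s30, a30 = 1.0 + 0.1j, 0.5 - 0.4j
    left, right = 0.7 + 0.2j, -0.3 + 0.5j

    p_gain, n_gain = shuffler_gains(s30, a30, s15, a15)
    feeds = virtual_speaker_feeds(left, right, p_gain, n_gain)
    print(ear_signals(*feeds, s15, a15))    # through the real 15-deg paths
    print(ear_signals(left, right, s30, a30))  # what 30-deg loudspeakers would give
```

The two printed pairs agree to rounding error, which is the sense in which such a processor is exact in principle; a practical design would still need the minimum-phase filter realizations and equalization discussed elsewhere in the paper.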
4.1 Correction of Loudspeaker Placement
Some users may find that a loudspeaker placement
that is convenient for their listening-room layout, and
that avoids early reflections, may make for an
inconvenient listening position unless the equal-distance 30°
rule is violated. In such cases, virtual-loudspeaker
electronics can provide a 30° impression for
loudspeakers placed at other angles. An adjustment for
unequal distances may also be provided.
4.2 TV Expander
Another example of correction of loudspeaker
placement is found in the so-called TV expander.
Television receivers usually offer cabinet-mounted
loudspeakers that are spaced much too close together (less
than 15°) to provide a good stereo effect. Present-day
TV expanders, usually involving some kind of ad
hoc processing of the difference channel, are too
imprecise to preserve the producer's intentions. The
virtual-loudspeaker expander is, in principle, exact.
4.3 Centered Virtual Loudspeaker
Sound systems for large-screen television applications
often lack a facility found in cinema exhibition, a
centered, behind-the-screen loudspeaker, important for a
realistic presentation of dialogue. The substitute
phantom image from two loudspeakers unfortunately does
not sound the same as that single loudspeaker. A
centered virtual loudspeaker would be a significant
improvement.
4.4 Virtual Loudspeakers in Back
Some television sound systems are designed to supply
special-effects signals (derived from cinema sound
tracks) to loudspeakers placed behind the viewer.
Unfortunately many viewers cannot provide space behind
their favorite viewing position nor bear the expense of
such loudspeakers. Virtual loudspeakers may be
substituted. Similarly, certain ambience-enhancing systems
require loudspeakers placed behind the listener. These
also can be virtual.
4.5 Surround Stereo
Ear-sound-oriented, transaural stereo is a full-sphere
(includes imaging in elevation) surround-stereo system.
While it is most naturally used as a straightforward
enhancement of the basic virtues of conventional stereo,
it may certainly be used to provide any of the astonishing
demonstrations of loudspeaker-oriented quadraphonic
systems of a previous era.
An exemplary sound-field-oriented surround-stereo
system [3] is the Ambisonic system UHJ, for which a
substantial body of program material, in full-sphere
B-format [24], exists. Some of this may be recast, using
virtual-loudspeaker processing, for rerelease in transaural
format.
5 INSTRUMENTATION-GRADE CANCELER
A need exists for a crosstalk canceler satisfying the
original aspirations of Atal and Schroeder. Accurate
documentation of the subjective experience of a sonic
event requires an instrumentation-grade artificial head
and recording means, together with an acoustic
presentation means of equal quality. Loudspeaker
presentation through an instrumentation-grade crosstalk
canceler is the option that will provide full assurance that
the sounds will be heard as exterior to the listener's
head.
Such a canceler will use head functions as closely
modeled on a replica of a representative head as possible
and, where necessary, will use data taken for a specific
listener. An example of canceler curves for a specific
head is shown in Fig. 13. A digital canceler would be
able to accept data files for different listeners and adjust
the filters accordingly. In any case, the canceler would
be accurately faithful to its head model over the whole
audio-frequency range.
Applications abound in environmental acoustics,
psychoacoustic and otological research laboratories,
and audiometric and otological clinics, to name a few.
In critical applications, replacement of earphones of
dubious characteristics and flawed exteriorization could
prove decisive.
6 CONCLUSIONS
We have shown that crosstalk canceling of well-prepared
binaural-stereo program material, to make
transaural recordings, can be accomplished with a technology
that is simpler than previously supposed, and can
produce recordings that may be played as ordinary stereo
recordings, but that reveal "amazing" natural spatial
and imaging effects that are more robust, with respect
to listener movement and playback acoustics, than
previously supposed. The recording of such
"well-prepared" binaural material is seen as a crucial starting
point for making a good transaural recording.
Artistic considerations are of major importance, of
course, and we have also shown that recent technical
advances in understanding the importance of correct
equalizations must be implemented to support the artistic
intent. We have argued that this support requires
implementation at the equipment-design level. We have
explored the relation of equalization with respect to
the maintenance of an excellent stereo effect under all
conditions of playback, with respect to the prospects
of monitoring with binaural headphones and with respect
to preserving the integrity of localization.
We have provided a brief survey of the variety of
processing that may be accomplished within our
conception of transaural-binaural technology. This has
included the processing necessary in record production
and a few items that the consumer could use to
advantage. We also note instrumentation applications.

The expectation is that some of this transaural-binaural
technology would be implemented in the near
future as the industry begins to see how the technology
will help its practitioners reach their goals more directly
and more easily. The eventual outcome of the infusion
of new technology may not be predicted with assurance,
but the prospects for a dramatic improvement in stereo
quality do appear bright.

Fig. 13. Shuffler filter characteristics in crosstalk canceling for a specific listener head [16] and a loudspeaker placement of 30°. Solid-line curves show magnitudes of 1/N and 1/P normalized according to solid-line curve of Fig. 4. Dashed curves show envelopes of alternations. Extensive idiosyncratic detail indicates that a crosstalk canceler based on curves of Fig. 3(a) would be more tolerant of variations in listeners' heads and positions.
7 ACKNOWLEDGMENT
We wish to thank the many persons who have listened to our experimental transaural recordings and offered
their critical comments. We tried their patience with
recordings of differing equalizations, and some with
not-the-lowest noise floor, and their patience survived.
We would particularly like to thank those who offered
us playback facilities that happened to prove instructive
in regard to early reflections. Their patience was sometimes
not rewarded by hearing the merits we claimed.
In other cases, our own ineptness left a bad impression.
We are grateful, also, for those listeners who delighted
us by being entirely enthusiastic.
We owe special thanks to Wade Bray of Jaffe Acoustics
for providing us with digital tapes made with the
Aachen head. Our studies of these recordings impressed
us with the importance of reconsidering the whole
question of equalization.
Finally, y
[7] M. C. Killion, "Equalization Filter for Eardrum
Pressure Recording Using a KEMAR Manikin," J. Audio
Eng. Soc., vol. 27, pp. 13-16 (1979 Jan./Feb.).
[8] B. S. Atal and M. R. Schroeder, "Apparent Sound
Source Translator," U.S. patent 3,236,949 (1966 Feb.
22).
[9] M. R. Schroeder and B. S. Atal, "Computer
Simulation of Sound Transmission in Rooms," IEEE
Conv. Rec., pt. 7, pp. 150-155 (1963).
[10] M. R. Schroeder, "Digital Simulation of Sound
Transmission in Reverberant Spaces," J. Acoust. Soc.
Am., vol. 47, pp. 424-431 (1970 Feb.).
[11] M. R. Schroeder, "Computer Models for Concert
Hall Acoustics," Am. J. Phys., vol. 41, pp. 461-471
(1973 Apr.).
[12] M. R. Schroeder, "Models of Hearing," Proc.
IEEE, vol. 63, pp. 1332-1350 (1975 Sept.).
[13] P. Damaske, "Head-Related Two-Channel
Stereophony with Loudspeaker Reproduction," J. Acoust.
Soc. Am., vol. 50, pt. 2, pp. 1109-1115 (1971 Oct.).
[14] T. Mori, G. Fujiki, N. Takahashi, and F. Maruyama,
"Precision Sound-Image-Localization Technique
Utilizing Multitrack Tape Masters," J. Audio
Eng. Soc. (Engineering Reports), vol. 27, pp. 32-38
(1979 Jan./Feb.).
[15] H. W. Gierlich and K. Genuit, "Processing
Artificial-Head Recordings," J. Audio Eng. Soc.
(Engineering Reports), vol. 37, this issue, pp. 35-40. Also,
W. Bray, private communication (1987 Nov.).
[16] E. L. Torick, A. Di Mattia, A. J. Rosenheck,
L. A. Abbagnaro, and B. B. Bauer, "An Electronic
Dummy for Acoustical Testing," J. Audio Eng. Soc.,
vol. 16, pp. 397-403 (1968 Oct.).
[17] B. B. Bauer, "Stereophonic Earphones and Binaural Loudspeakers," J. Audio Eng. Soc., vol. 9, pp. 148-151 (1961 Apr.).
[18] H. Møller, "Cancellation of Crosstalk in
Artificial-Head Recordings Reproduced through
Loudspeakers," J. Audio Eng. Soc., vol. 37, this issue, pp.
31-34.
[19] S. Mehrgardt and V. Mellert, "Transformation
Characteristics of the External Human Ear," J. Acoust.
Soc. Am., vol. 61, pp. 1567-1576 (1977 June).
[20] D. H. Cooper and J. L. Bauck, "Corrections
to L. Schwarz, 'On the Theory of Diffraction of a Plane
Soundwave Around a Sphere' ['Zur Theorie der Beugung
einer ebenen Schallwelle an der Kugel,' Akust.
Z., vol. 8, pp. 91-117 (1943)]," J. Acoust. Soc. Am.,
vol. 80, pp. 1793-1802 (1986 Dec.).
[21] E. L. Torick, private communication (1975
Nov.).
[22] H. Mertens, "Directional Hearing in
Stereophony - Theory and Experimental Verification," EBU
Rev., pt. A, no. 92, pp. 146-168 (1965 Aug.).
[23] J. S. Russotti, T. P. Santoro, and G. B. Haskell,
"Proposed Technique for Earphone Calibration,"
J. Audio Eng. Soc., vol. 36, pp. 643-650 (1988 Sept.).
[24] M. A. Gerzon, "Ambisonics in Multichannel
Broadcasting and Video," J. Audio Eng. Soc., vol. 33,
pp. 859-871 (1985 Nov.).
THE AUTHORS
Duane H. Cooper was born in 1923. He earned a Ph.D. in physics at California Institute of Technology in 1955 and is currently associate professor of physics and electrical engineering at the University of Illinois. He teaches circuits, systems, modulation, random processes, electrodynamics, and acoustics. He contributed to the theory of disk recording, invented the skew-sampling method of tracing-error correction, and contributed to the theory of multichannel stereo. He made the first prototype Cooper Time Cube, and he invented the first working version (UMX) of the soundfield stereo system now called Ambisonics. Dr. Cooper is a member of the American Physical Society, the Acoustical Society of America, a senior member of the Institute of Electrical and Electronics Engineers, and a fellow and honorary member of the Audio Engineering Society. He has served the AES as governor, vice president, and president. He is now vice president of the AES Educational Foundation. Dr. Cooper holds the Society's Emile Berliner Award and Gold Medal.
Jerald L. Bauck was born in 1955. He earned a B.S. degree in electrical engineering at Kansas State University in 1977 and an M.S. degree in electrical engineering at the University of Illinois in 1979. He is currently an electrical engineering doctoral candidate at the University of Illinois. He worked for five years with Motorola's government electronics group in Scottsdale, Arizona, where he earned four patents and the Motorola Engineering Award in 1983. Mr. Bauck is a member of the Institute of Electrical and Electronics Engineers and of the Audio Engineering Society. His current interests include tomographic imaging in synthetic aperture radar and audio imaging.