
    Prospects for Transaural Recording*

    DUANE H. COOPER AND JERALD L. BAUCK

    University of Illinois, Urbana, IL 61801, USA

    Transaural stereo, generic for binaural stereo processed for cancellation of loudspeaker-to-ear crosstalk, results from the use of minimum-phase filters in shuffler configuration. Simplifying the filters further at short wavelengths makes the listener position noncritical. Full spatial qualities appear in a conventional stereo playback that avoids early reflections. Inverse shufflers provide precise transaural pan functions for multitrack work.

    0 INTRODUCTION

    Transaural stereo (generic term) is a stereo-system plan that, like binaural stereo, takes the end point of the recording-reproducing chain to be the actual sounds at the ears. It contrasts with the taking of loudspeaker sounds as the end point, which is necessarily the plan of conventional stereo. It differs from binaural in that the sounds for each ear, rather than being supplied by direct signal chains ending at earphones, result indirectly, instead, from the preparation of structured composite signals to be supplied to the loudspeakers.

    0.1 Crosstalk Cancellation

    The composite-signal structure is subsequently inverted (decomposition) in the intervening loudspeaker-to-ear transmission to produce the intended sounds at the ears. On the way to the ears, in addition to the direct transmission, left to left and right to right, there occur the cross transmissions of left to right and right to left. The latter are traditionally called crosstalk (from telephony), and the composition-decomposition scheme cited is a nonadaptive precancellation of crosstalk. It consists of the "planting" of a crosstalk process, in advance, that is devised to be the inverse of the acoustic crosstalk expected to occur subsequently. When properly done, the net result is the elimination of all evidence of crosstalk.
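As a minimal numerical sketch of this precancellation (not the authors' implementation), the loudspeaker-to-ear transmission at a single frequency can be written as a symmetric 2 × 2 matrix and inverted in advance. The same-side and cross-side gains `S` and `A`, the 250 µs crosstalk delay, the 0.6 crosstalk level, and the function name `precancel` are illustrative assumptions, not data from the paper.

```python
import numpy as np

def precancel(ear_targets, S, A):
    """Return loudspeaker signals whose acoustic mixing yields ear_targets.

    S: same-side (direct) transmission, A: cross-side (crosstalk) transmission,
    both complex gains at one frequency. A symmetric layout is assumed.
    """
    acoustic = np.array([[S, A],
                         [A, S]], dtype=complex)
    # "Planting" the inverse crosstalk in advance amounts to applying the matrix inverse.
    return np.linalg.solve(acoustic, np.asarray(ear_targets, dtype=complex))

# Hypothetical values at 1 kHz: crosstalk is weaker and arrives 250 us later.
freq_hz = 1000.0
S = 1.0 + 0.0j
A = 0.6 * np.exp(-2j * np.pi * freq_hz * 250e-6)

wanted = np.array([1.0, 0.0])            # sound intended for the left ear only
speakers = precancel(wanted, S, A)
at_ears = np.array([[S, A], [A, S]]) @ speakers
print(np.round(at_ears, 12))             # left ear receives the signal, right ear none
```

Both loudspeakers are driven even though only one ear is meant to hear the sound; the second loudspeaker supplies the cancelling component.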

    0.2 Recording Binaural Signals

    Signals representing ear sounds may be recorded (binaural recording), in advance of crosstalk cancellation, by two pickup methods. One uses microphones fitted in the ears of an artificial head. The other uses free-space microphones whose signals have been processed to simulate transmissions around an acoustic obstacle (human head) to specific points on the obstacle (ears).

    * Presented at the 85th Convention of the Audio Engineering Society, Los Angeles, 1988 November 3-6.

    J. Audio Eng. Soc., Vol. 37, No. 1/2, 1989 January/February

    The second of these pickup methods, including its source-to-ear processing, is known as binaural synthesis, and it may include the processing of as many different microphone signals as may be suitable for a given project. It may also include reverberant-field synthesis as needed. The correspondence with multitrack stereo synthesis is notable: pan functions replaced by binaural simulation for specific imaging directions and reverberation units replaced by simulation of spatial (binaural space) reverberation, such as being developed by Kendall et al. [1]. After the completion of all binaural processing, crosstalk canceling is the means of producing the master transaural recording.
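The multiple-input binaural synthesis described above can be sketched as a sum of per-direction filterings, one pair of ear filters per track. This is only an outline under stated assumptions: `toy_hrir_pair` is a hypothetical stand-in (a pure delay plus a gain) for measured or modeled head responses, and the 0.7 ms maximum interaural delay and the 0.4 shadowing factor are invented for illustration.

```python
import numpy as np

def toy_hrir_pair(azimuth_deg, fs=48000, n_taps=64):
    """Hypothetical stand-in for head responses: pure delay plus gain.

    Positive azimuth is to the listener's right. Returns (left, right) impulse responses.
    """
    side = np.sin(np.radians(azimuth_deg))
    itd = 0.0007 * abs(side)                 # assumed interaural delay, seconds
    far_gain = 1.0 - 0.4 * abs(side)         # assumed head-shadow attenuation
    near, far = np.zeros(n_taps), np.zeros(n_taps)
    near[0] = 1.0
    far[int(round(itd * fs))] = far_gain
    return (far, near) if azimuth_deg >= 0 else (near, far)

def binaural_synthesis(tracks, azimuths_deg, fs=48000, n_taps=64):
    """Sum each mono track, filtered for its assigned direction, into two ear signals."""
    length = max(len(t) for t in tracks) + n_taps - 1
    left, right = np.zeros(length), np.zeros(length)
    for track, azimuth in zip(tracks, azimuths_deg):
        h_left, h_right = toy_hrir_pair(azimuth, fs, n_taps)
        y_left = np.convolve(track, h_left)
        y_right = np.convolve(track, h_right)
        left[:len(y_left)] += y_left
        right[:len(y_right)] += y_right
    return left, right

# Two hypothetical tracks: one panned hard right, one half left.
click = np.zeros(16)
click[0] = 1.0
ear_left, ear_right = binaural_synthesis([click, 0.5 * click], [90.0, -45.0])
print(ear_left.shape, ear_right.shape)
```

In a real project the toy responses would be replaced by filters derived from an artificial head or a head model, and the resulting ear signals would then go to the crosstalk canceler.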

    For concert-hall recording, an artificial head would be used, and it and the orchestra would be deployed for optimal pickup. Under ideal conditions this may suffice. However, further microphone deployments may be considered to represent early-reflection and late-reflection hall-sound pickup. The signals from these latter would be delayed and subjected to binaural synthesis needed to produce the decorrelated ear sounds deemed suitable for hall-sound representation. The final step in the production is conversion to transaural.

    0.3 Transaural Options

    Some recording engineers may wish to use only a part of the transaural technology. In multitrack work, for example, it might be decided that only a few of the tracks require the precise imaging of binaural synthesis, or that only a portion of the performing ensemble requires the spatial delineation available through artificial-head pickup. Such artistic decisions remain, of course, with the producing authority, and it is the responsibility of the engineer to provide incisive imaging, to the extent possible, where desired. Transaural technology may be viewed as providing improved options for that purpose, not necessarily a whole new recording style.

    A better choice for incisive imaging, however, cannot be made. In a previous paper, Cooper, using calculations from Bauck's thesis [2], showed [3, Fig. 8] the required loudspeaker-signal specifications for two examples of imaging. None of the conventional stereo methods produces signals that in any way resemble these specifications, except at low frequencies. Conventional stereo has not sought to devise loudspeaker signals to meet imaging-signal specifications at the ears, as was required in these calculations, except in the low-frequency work of Blumlein [4]. Specifically, none of the existing pan-pot formulas meet these specifications, nor do any of the stereo microphone arrays, whether coincident or spaced, whether using directional elements or not.

    Some recording engineers, seeking a spacious effect, use widely spaced microphones in a concert-hall setting. It is known, of course, that the signals so obtained are highly decorrelated, and it is also a known fact, in concert-hall acoustics, that highly decorrelated ear sounds are identified with spacious acoustic impressions. Unfortunately, the interaural correlation will always be greater than the correlation at the loudspeakers, because of crosstalk. The net result is that the spacious effect is perceived as confined to an "acoustic stage," as in a different space from that of the listener. An important aspect of the concert-hall experience is lost.
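The effect of crosstalk on interaural correlation can be illustrated with a simple simulation. The sketch below is not from the paper: it assumes two independent white-noise loudspeaker feeds, models crosstalk as a single delayed and attenuated copy of the opposite channel (12 samples at 48 kHz, level 0.6, both hypothetical), and measures correlation as the peak normalized cross-correlation within ±1 ms.

```python
import numpy as np

def peak_correlation(x, y, max_lag):
    """Largest normalized cross-correlation magnitude over lags in [-max_lag, max_lag]."""
    x = x - x.mean()
    y = y - y.mean()
    norm = np.sqrt(np.dot(x, x) * np.dot(y, y))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(x[:len(x) - lag], y[lag:])
        else:
            c = np.dot(x[-lag:], y[:len(y) + lag])
        best = max(best, abs(c) / norm)
    return best

rng = np.random.default_rng(0)
fs = 48000
spk_left = rng.standard_normal(fs)       # two independent (decorrelated) loudspeaker feeds
spk_right = rng.standard_normal(fs)

delay = 12                               # assumed crosstalk lag, 0.25 ms at 48 kHz
gain = 0.6                               # assumed crosstalk level relative to the direct path

def add_crosstalk(direct, other):
    """Ear signal = direct loudspeaker sound plus delayed, attenuated opposite channel."""
    out = direct.copy()
    out[delay:] += gain * other[:-delay]
    return out

ear_left = add_crosstalk(spk_left, spk_right)
ear_right = add_crosstalk(spk_right, spk_left)

max_lag = 48                             # search +/- 1 ms
spk_corr = peak_correlation(spk_left, spk_right, max_lag)
ear_corr = peak_correlation(ear_left, ear_right, max_lag)
print(spk_corr, ear_corr)
```

In this toy model the ear-signal correlation peaks near gain / (1 + gain²) at a lag equal to the crosstalk delay, whereas the loudspeaker feeds remain essentially uncorrelated.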

    The use of widely spaced microphones with binaural synthesis and suitable delay, however, will give the recording engineer much greater control over the representation of the sound of the hall. Thus many more venues may be exploited to advantage. At the same time, a full spatial envelopment of the listener can be provided to the extent desired. Many recording engineers will discover, also, that imaging and spaciousness are not mutually exclusive, but, as has long been known in concert-hall acoustics, belong together. Placing them together is natural in transaural technology.

    At first the recording engineer will want to try only the simplest things from transaural technology. Indeed, it is likely that only the simpler equipment will become available at first. Existing techniques will necessarily continue to be used, and the improvements of transaural technology will, in some instances, be adapted to that. For reviews of existing techniques, the writings of Eargle may be consulted [5]. The evolution of such techniques to suit a binaural style of recording is not amenable to detailed prediction, and will not be attempted here. It is possible, however, to sketch a catalog of specific kinds of transaural-related equipment, the development of which may be foreseen. Some of these items are discussed in a later section.


    0.4 Binaural Monitoring

    Prospects for binaural monitoring apparently have advanced substantially in the recent decade. It had long been the experience that earphones characteristically produced "in-the-head" (interior) sounds and, with binaural material, sounds that were much more vulnerable to front-back bias than is the case with natural hearing. The problem has been traced to a disturbance in the conch resonance of the human pinna. The conch is the principal cavity in the pinna, and its resonance involves its acoustic near field even at some distance from the ear. Disturbance of this near field causes an "at-the-ear" judgment, as may be easily demonstrated by placing a hand near the ear. Earphones are ordinarily placed near enough to disturb this resonance (besides possibly deforming the pinnas). Equalization to restore the resonance restores natural, exterior hearing. A complication is that a significant part of the resonance effect varies with direction, so that a direction assignment for equalization seems necessary.

    A way to avoid a directional assignment has been sought via the use of a diffuse-field reference [6]. An alternate approach uses a frontal-incidence plane wave (free field) as the reference. The argument for the latter is that it avoids a front-back bias while not impairing back localizations. In a later section we find evidence to support a variation in this free-field approach.

    These issues take on a sharper focus if related to the style of equalization used for the artificial head. Obviously, a free-field equalization for the head mandates the corresponding free-field equalization for the earphones to be used with that head. In this case, a large part of the conch resonance is removed in the head equalization (a use of natural ear molds in artificial heads accounts for this resonance being modeled, although modeling of canal resonance has long been omitted) and then restored in the earphone equalization. Presumably a similar rationale supports diffuse-field equalization, both for the head [7] and for the earphones. We are unable to report any complete experience with diffuse-field equalization, but we can report remarkably good experiences with free-field equalization.

    The recording engineer should understand that the equalizations discussed here and below are not matters to be accommodated with the EQ facilities on a mixing board. It is appropriate to regard these design equalization requirements as to be met internally, to be inherent characteristics of the device or of an accessory specific to the device. In the same way, the sometimes strenuous equalizations undertaken in some highly valued microphones are of concern to the design engineer, not the recording engineer. It is sufficient if the recording engineer deems the overall characteristic as apt for his or her needs.

    The advent of binaural monitoring will prove to be a substantial convenience in comparison to loudspeaker monitoring, especially for location work or other situations in which access to a proper listening room is inconvenient. Transaural monitoring (with loudspeakers) can, of course, be made available as needed.


    0.5 Beginnings of Transaural Recording

    Transaural stereo had its first trials in 1962 by B. S. Atal and M. R. Schroeder [8]-[12]. They used a powerful (for its day) mainframe computer, an IBM 7090, to perform digital finite-impulse-response (FIR) filtering for crosstalk "planting" combined with equalization, using functions derived from testing an artificial head. They also saw binaural synthesis as a related process, and they were striving for a spatial synthesis of reverberation. Their transaural trials, however, were designed to reproduce the actual sounds of known concert halls and were based on binaural recordings made in those halls. In those days, earphone listening produced only interior sounds, so that transaural conversion was mandatory for their purposes. The Atal-Schroeder results were described [12] as "nothing less than amazing." The listener experienced authentic, exterior, spatial envelopment as well as authoritative imaging to the front and sides, in elevation, and even behind.

    Unfortunately the reports of this work left lasting impressions of a heroic technology producing fragile results: the listening space had to be anechoic, and the listener could not move by more than about 10° or 3 in (75 mm) without spoiling the effect. Later work by Damaske [13], with "90° crosstalk filters," a code-word designation, did little to dispel these disheartening impressions. He found that reverberant listening spaces degraded the effect, damaging side imaging and causing front-back ambiguity. Other work over the past quarter century, including the Q-Biphonic development [14], has not significantly advanced the technology nor changed overall impressions of dim prospects for transaural recording.

    0.6 Present Prospects

    Brightened prospects are suggested by our work, reported herewith. By casting the crosstalk-canceling filters in shuffler form, we are able to greatly simplify the technology: a handful of operational-amplifier chips, or the equivalent on a digital signal-processing chip, suffice. This economy (not having to use FIR filters) is a consequence of our discovery that the shuffler consists entirely of minimum-phase filters. The simplification also reveals a structure that allows secure control over the design of equalization as independent of the crosstalk canceling. Thus we are able to simplify the crosstalk function, more particularly at short wavelengths, to make the effect of cancellation quite tolerant of listener movement.

    Listeners find that a 30° head rotation produces a benign, albeit noticeable to some, change in auditory perspective. Imaging at 90° is less tolerant. Comparable effects are noticed for lateral movement over a range equal to the loudspeaker spacing, but there is more tolerance for forward-backward motion. We have no data for transaural systems designed for a wider loudspeaker spacing, and we are not entirely satisfied with explanations we offer in Sec. 1.7. Perhaps some credit is owed to good equalization.


    The significance of equalization has become clear to us through our experiences with recordings made with a Neumann KU-80 head, which is equalized to provide a correct ear-canal entrance signal, and with the Aachen head (Aachener Kopf, or AK) devised by Gierlich and Genuit [15], which is equalized for a flat free-field response for frontal incidence. Recordings from the KU-80 are unfit for loudspeaker playback (the KU-81 should do better), showing a poor stereo effect with a very large "hole in the middle," while the same playback from the AK shows a stereo excellence unattainable by any other known stereo array.

    With equalizations as given, crosstalk cancellation, besides revealing the qualities noted by Atal and Schroeder, actually "covers" the hole in the KU-80 presentation, while for the AK, it corrects image placements, sharpens the images, extends the range of image placement, removes front-back ambiguity, extends the perception of depth, and completes the spatial envelopment. Thus the equalization we would have used (see explanation in Sec. 1.5), based on free-field incidence at 30° and being nearly the same as in the AK, was confirmed in its correctness.

    As a result of our experience with actual trials of differing equalizations in different rooms, we are in a position to be more precise than Damaske could be, about the significance of listening-space acoustics. It is, in fact, a misunderstanding that an anechoic space need be used. Atal and Schroeder regarded listener-space reverberation to be a contaminant in their studies of concert-hall sound, and they wished to exclude it. Specifically, we did identify one minor aspect of listener-space acoustics, one easily avoided, that accounts for the effects noted by Damaske.

    The integrity of the crosstalk paths from loudspeakers to ears can be compromised by competing reflected paths that differ in delay from the primary paths by amounts of less than 1 (or perhaps 2) ms. Substantial contributions from such paths can begin to impair side imaging and allow some appearance of front-back ambiguity. Ordinary care taken in the setup to avoid significant early-reflection paths obviates any deleterious effects. Longer delayed reflections merely appear as "early" reflections in the concert-hall sense. These are attributed by the listener to the performing space, usually as minor augmentations in its reverberance.
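For setup purposes, the 1 ms and 2 ms delay limits can be converted to extra path length. The short calculation below is an illustration only; the speed of sound of 343 m/s (dry air near 20 °C) is an assumption not stated in the paper, and `extra_path_m` is a hypothetical helper name.

```python
SPEED_OF_SOUND_M_PER_S = 343.0   # assumed: dry air at roughly 20 degrees C

def extra_path_m(delay_ms):
    """Extra distance travelled by a reflection arriving delay_ms after the direct sound."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0

for delay_ms in (1.0, 2.0):
    print(f"{delay_ms:.0f} ms -> {extra_path_m(delay_ms):.3f} m of extra path")
```

Under that assumption, reflecting surfaces that lengthen the loudspeaker-to-ear path by less than about 0.34 m (or perhaps 0.69 m) are the ones to avoid.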

    Ensuring good equalization guarantees that if a user is so careless with the setup as to allow early reflections, the playback of a transaural recording will exhibit a gradual degradation from a quality that is "nothing less than amazing" to one that is at least "excellent."

    0.7 Summary

    The principal purpose of this paper is to report on improvements we have discovered in a particular signal-processing scheme, the crosstalk-canceling scheme of Atal and Schroeder. These improvements, which are largely practical, offer the possibility of a significant restructuring of stereo recording to make for extraordinary improvements in stereo quality.


    We have cited economies in processing following from the discovery that minimum-phase filters may be used exclusively, have cited a revealed structure that allows the shaping of crosstalk filters independently of equalization filters, and have cited bases for robust results, in contrast to a previously supposed fragility. This robustness consists of a tolerance for listener movement and a tolerance for nonanechoic listening-space acoustics.

    In the following sections we review the Atal-Schroeder scheme and show how its lattice-arrayed filters may be seen to be equivalent to a shuffler array, develop the corresponding formulas, and illustrate these with plots based on the spherical-model head. We cast the shuffler functions in a form that exhibits a factoring into an equalization part and a crosstalk-canceling part, and we illustrate the significance of these with plots taken from old data for the so-called CBS-NASA head [16].

    In so doing, we point out that crosstalk canceling is a process that is an inverse of the process we have called binaural synthesis, and we provide a block diagram of a multiple-input binaural synthesizer.

    Finally, we turn to aspects of transaural technology that are less related to recording. We introduce the concept of virtual loudspeakers, whereby a given pair of actual loudspeakers may be replaced by a number of virtual loudspeakers at arbitrary positions. This may be used to solve the problem of too closely spaced loudspeakers in stereo television, for example. It also may be used to present cinema-surround stereo via only two loudspeakers without loss of surround effect, or to similarly present full-sphere ambisonic surround. Also, we can resurrect a contribution by Bauer [17] to provide binaural-like listening to stereo material, making for inexpensive, accurate "Bauer boxes."

    1 CROSSTALK-CANCELING FILTERS

    1.1 Atal-Schroeder Filters

    The Atal-Schroeder crosstalk canceler is shown in Fig. 1(a), adapted from [12]. In Schroeder's notation, S represents the transfer function from a source (loudspeaker) to a same-side (ipsilateral) ear, while A is the transfer function to an alternate-side (contralateral) ear. The acoustic layout is symmetric, so that the transfer functions from the LF loudspeaker to the ears equal those for the RF loudspeaker. The notation $C = -A/S$ is used for the filter in the cross path. Elementary algebra may be used [11] to show that a signal introduced at the top left does indeed appear, unchanged and uncontaminated, at the left ear of the listener, and so on.

    Schroeder also treats the requirements of causality (realizability) [11]. It is clear from first principles that A involves a greater delay than does S, so that C is causal. This is also seen from Fig. 1(b), adapted from Møller [18], which shows a plot of the impulse response $c(\tau)$ for the cross filter $C(\omega)$, as determined for the Neumann KU-80i head at 45° incidence. Thus C is realizable as an FIR digital filter, or a transversal analog filter, and so also for $C^2$. Then $1/(1 - C^2)$ is realized by placing the $C^2$ filter in a recursive loop. The terminal filter $1/S$ is not causal on its face, but with its impulse response padded with sufficient delay, the same in both channels, causal representations are obtained. These realizations were signal-processing routines in an IBM 7090 computer.

    The impulse response plotted in Fig. 1(b) is of short duration, which shows that crosstalk cancellation is speedily completed, requiring the listening space to be anechoic for only the first few milliseconds. This is equivalent to the finding, stated in the Introduction, that it is sufficient, in the listening setup, to exclude early reflection paths.

    The brevity of this impulse response bears also on questions of equalization style, as will be seen later.

    1.2 Shuffler Filters

    The Atal-Schroeder scheme may be seen to be equivalent to the lattice arrangement of filters shown in Fig. 2, provided that the filter in the cross path is

    $$A' = \frac{-A}{S^2 - A^2} \qquad \text{(1a)}$$

    and that the one in the same-side path is

    $$S' = \frac{S}{S^2 - A^2} \qquad \text{(1b)}$$

    These may be seen to be the matrix elements (S' on the diagonal, A' on the counterdiagonal) of the 2 × 2 matrix that is inverse to the acoustic matrix evident from Fig. 1.

    The shuffler arrangement of filters, also shown in Fig. 2, may be seen to be equivalent to the lattice. There the filter for the sum of inputs with both parts positive is denoted by P', while the filter for the difference (sum with one input negative) has been denoted by N'. Equivalence demands that

    $$N' = \frac{S' - A'}{2} \qquad \text{(2a)}$$

    and

    $$P' = \frac{S' + A'}{2} \qquad \text{(2b)}$$

    Division by 2 would be omitted for difference-sum networks designed with uniform 3-dB losses so that, without loss of generality, we write

    $$N' = \frac{1}{N} = \frac{1}{S - A} \qquad \text{(3a)}$$

    and

    $$P' = \frac{1}{P} = \frac{1}{S + A} \qquad \text{(3b)}$$

    Thus the matrix of the shuffler transfer functions, diagonal with elements N' and P', is the inverse of the diagonal acoustic matrix for difference-and-sum ear sounds with elements $N = S - A$ and $P = S + A$.

    1.3 Minimum-Phase Characteristics

    In 1977 Mehrgardt and Mellert [19] showed experimentally that the head-related transfer functions are of minimum phase to within a frequency-independent delay, a delay that is incident-angle dependent. They proceeded via the Hilbert transform of the log-magnitude response to calculate the minimum-phase part of the phase response. The remainder, or excess-phase part, was then identified as being of frequency-independent slope. The result was experimental in that measured and smoothed response functions were used in the calculations. Thus, S and A have excess-phase parts that differ in the amount of frequency-independent delay. Considering S alone, however, Schroeder found the delay to be ignorable for the purposes of constructing $1/S$, as we have seen.
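Returning briefly to Sec. 1.2, the equivalence of the lattice and shuffler arrangements can be checked numerically. The sketch below takes the lattice filters to be the elements of the inverse of the symmetric acoustic matrix, S' = S/(S² − A²) and A' = −A/(S² − A²), and the shuffler filters to be N' = 1/(S − A) and P' = 1/(S + A). The head model (unity S, with A a weaker copy delayed by 250 µs) and the random test spectra are hypothetical choices made only so that the check is runnable.

```python
import numpy as np

def lattice(left_in, right_in, S, A):
    """Lattice canceler: S' on the same-side paths, A' on the cross paths."""
    det = S**2 - A**2
    s_prime, a_prime = S / det, -A / det
    return (s_prime * left_in + a_prime * right_in,
            a_prime * left_in + s_prime * right_in)

def shuffler(left_in, right_in, S, A):
    """Shuffler: form difference and sum, filter each once, then recombine."""
    diff = (left_in - right_in) / (S - A)      # N' = 1 / (S - A)
    summ = (left_in + right_in) / (S + A)      # P' = 1 / (S + A)
    # The factor 1/2 restores the division that the 3-dB networks would absorb.
    return 0.5 * (summ + diff), 0.5 * (summ - diff)

# Hypothetical head model on a frequency grid.
freqs = np.linspace(50.0, 8000.0, 200)
S = np.ones_like(freqs, dtype=complex)
A = 0.6 * np.exp(-2j * np.pi * freqs * 250e-6)

rng = np.random.default_rng(1)
left_in = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
right_in = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)

lat = lattice(left_in, right_in, S, A)
shf = shuffler(left_in, right_in, S, A)

# Acoustic transmission of the lattice outputs to the two ears.
ear_left = S * lat[0] + A * lat[1]
ear_right = A * lat[0] + S * lat[1]
print(np.max(np.abs(lat[0] - shf[0])), np.max(np.abs(ear_left - left_in)))
```

The shuffler needs two filters where the lattice needs four, which is the structural economy the text describes.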

    To discuss pairs of filter functions, we introduce the concept of joint minimum phase. To be of joint minimum phase, a set of filter functions is to have a common excess phase, and this excess is to be a (bounded) frequency-independent delay. Then removing the excess phase to a common factor leaves a delay-normalized set of filters that are of minimum phase in the ordinary sense. They are also at least conditionally stable, so that products, ratios, and reciprocals are in the set. Thus A and S are not of joint minimum phase, and neither are A' and S'.

    Of course, in the Atal-Schroeder filters, joint minimum phase is not at issue, since the 2 × 2 matrix has S' along the diagonal and A' along the counterdiagonal. On the other hand, the shuffler filter has N' and P' along the same diagonal (the counterdiagonal is zero) so that it would seem odd if N' and P' were not of joint minimum phase; odd because then the difference signal would be required to become more and more out of step with the sum, as frequency increases. However that may be, we made the same sort of check that Mehrgardt and Mellert had made and found that N' and P' are indeed of joint minimum phase, so also for N and P.
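A check of this kind can be sketched in discrete form: derive the minimum phase from the log magnitude (folding the real cepstrum is the sampled counterpart of the Hilbert-transform relation) and compare it with the actual phase. The example below does not use head data. It assumes a toy model in which S is unity and A is a copy attenuated by 0.5 and delayed by 8 samples, so it only illustrates the procedure and the distinction between A, which carries a pure excess delay, and the difference and sum functions.

```python
import numpy as np

def minimum_phase_from_magnitude(mag):
    """Phase (radians) of the minimum-phase response having magnitude `mag`.

    `mag` must be strictly positive and sampled on a full, even-length FFT grid.
    """
    n = len(mag)
    cepstrum = np.fft.ifft(np.log(mag)).real
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    # The imaginary part of the folded-cepstrum spectrum is the minimum phase.
    return np.fft.fft(cepstrum * fold).imag

n = 1024
omega = 2 * np.pi * np.arange(n) / n          # radians per sample
delay, gain = 8, 0.5                          # hypothetical interaural delay and level
S = np.ones(n, dtype=complex)
A = gain * np.exp(-1j * omega * delay)
N, P = S - A, S + A

err_N = np.max(np.abs(minimum_phase_from_magnitude(np.abs(N)) - np.angle(N)))
err_P = np.max(np.abs(minimum_phase_from_magnitude(np.abs(P)) - np.angle(P)))
err_A = np.max(np.abs(minimum_phase_from_magnitude(np.abs(A)) - np.angle(A)))
print(err_N, err_P, err_A)
```

In this toy case the phases of N and P are recovered from their magnitudes alone, while the phase of A is not, because its delay is excess phase.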

    A practical consequence is that magnitudes alone, $|N|$ and $|P|$, or their reciprocals are a sufficient specification (phase is redundant as being calculable by Hilbert transform), whether for filter synthesis or in determining the head functions to be measured experimentally. Also, since any common frequency-independent delay may be omitted, the programming and hardware requirements of an FIR realization are reduced. In fact, a non-FIR (or IIR) filter may be programmed at low cost. Successive fitting of a cascade of biquadratic forms (ratios of frequency-dependent, second-order transfer functions) is a natural approach, and these take scarcely more than a half-dozen lines of code each in a typical DSP chip. In analog-filter synthesis, the "biquad" is also a natural choice of synthesis element.
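The remark about a half-dozen lines per section can be made concrete with a minimal biquad cascade. This is a generic sketch in transposed direct form II, not the authors' DSP code; the coefficients in the example are hypothetical and are not fitted to any head data.

```python
def biquad(x, b, a):
    """Filter sequence x with one second-order section (transposed direct form II).

    b = (b0, b1, b2) and a = (a1, a2); the leading denominator coefficient is taken as 1.
    """
    b0, b1, b2 = b
    a1, a2 = a
    z1 = z2 = 0.0
    y = []
    for sample in x:
        out = b0 * sample + z1
        z1 = b1 * sample - a1 * out + z2
        z2 = b2 * sample - a2 * out
        y.append(out)
    return y

def cascade(x, sections):
    """Run x through each (b, a) section in turn."""
    for b, a in sections:
        x = biquad(x, b, a)
    return x

# Hypothetical sections: a one-pole smoother and its exact inverse.
pole_section = ((1.0, 0.0, 0.0), (-0.5, 0.0))
zero_section = ((1.0, -0.5, 0.0), (0.0, 0.0))

impulse = [1.0] + [0.0] * 7
smoothed = cascade(impulse, [pole_section])
restored = cascade(impulse, [pole_section, zero_section])
print(smoothed[:4], restored[:4])
```

Because the pole section is minimum phase, its reciprocal is a stable filter, and cascading the two returns the original impulse; the same property is what allows 1/N and 1/P to be realized directly.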

    1.4 Structure of $|N|$ and $|P|$

    [Fig. 2. Lattice and shuffler arrangements of filters.]

    Head data are most often measured in the form of $|A|$, $|S|$, and $|\tau|$, of which the last is the interaural phase delay, redundant in that part calculable from the Hilbert transform of $\log |A/S|$. Nevertheless, it is easy to see that these data are sufficient to determine $|N|$ and $|P|$, since the magnitudes of the phasor difference and sum are, according to the triangle rule, simply

    $$|N| = \left(|A|^2 + |S|^2 - 2|AS| \cos \omega\tau\right)^{1/2} \qquad \text{(4a)}$$

    and

    $$|P| = \left(|A|^2 + |S|^2 + 2|AS| \cos \omega\tau\right)^{1/2} \qquad \text{(4b)}$$
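The triangle rule, with a minus sign on the cosine term for the difference magnitude and a plus sign for the sum magnitude, can be verified against a direct phasor computation. The data in this sketch are hypothetical (flat |S|, a gently falling |A|, and a constant 0.25 ms interaural delay) and serve only to exercise the formulas; `shuffler_magnitudes` is an illustrative name.

```python
import numpy as np

def shuffler_magnitudes(mag_A, mag_S, tau, omega):
    """|N| and |P| from measured magnitudes and interaural phase delay (triangle rule)."""
    cross = 2.0 * mag_A * mag_S * np.cos(omega * tau)
    base = mag_A**2 + mag_S**2
    return np.sqrt(base - cross), np.sqrt(base + cross)

# Hypothetical measurement set.
freqs = np.linspace(100.0, 5000.0, 50)
omega = 2 * np.pi * freqs
mag_S = np.ones_like(freqs)
mag_A = 0.8 / (1.0 + freqs / 4000.0)
tau = 250e-6

mag_N, mag_P = shuffler_magnitudes(mag_A, mag_S, tau, omega)

# Cross-check against the phasor difference and sum, with A lagging S by omega * tau.
S = mag_S.astype(complex)
A = mag_A * np.exp(-1j * omega * tau)
print(np.max(np.abs(mag_N - np.abs(S - A))), np.max(np.abs(mag_P - np.abs(S + A))))
```

Only the relative phase between A and S enters, which is why the interaural delay is the single phase-like quantity needed alongside the two magnitudes.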


    $${}_0P = \frac{P}{E(\theta_0)} \qquad \text{(8b)}$$

    The reference direction has been taken to be 0° for the AK (Aachen head), but for loudspeakers to be placed at 30°, a 30° reference would be more appropriate.

    When plotted with the same incidence angle as the reference, the frequency-response curves for $|{}_0N|$ and $|{}_0P|$ intersect one another at the constant level 0 dB. Such a plot for a spherical-model head [2], [3] is shown in Fig. 3, solid line (a). These plots are actually for the reciprocals $|1/{}_0N|$ and $|1/{}_0P|$, as would be used in a crosstalk canceler, but the decibel scale makes it easy to interpret the plots also for the direct functions $|{}_0N|$ and $|{}_0P|$. The dashed line shows a possible modification, to be discussed later. Curve (b) shows the equalization $|1/E|$. A crosstalk canceler based on these curves has been tried, and its performance is extremely satisfying.

    Plots for more realistic models of the human head (see Fig. 13) resemble the solid-line plot (a) in Fig. 3 but differ remarkably in equalization from plot (b). The reason is that the spherical-model head is one whose functions are quite smooth [20] because pinnas are omitted. Inclusion of pinnas on a realistic model invokes the large conch resonance, profoundly altering the $|1/E|$ curve. For example, data [21] for the CBS-NASA head provide the equalization curves of Fig. 4. A curve for 30° and one for 0° are shown. For clarity, the plot was made with a 3-dB displacement inserted between the curves. It will be seen that the curves differ by little in comparison to the range of variation that they encompass. Thus a 0° equalization could substitute for a 30° one with little effect, but omission of such equalization would be a serious matter.

    The seriousness of the matter became evident from our loudspeaker playback of recordings from the Neumann KU-80 head. As indicated in the Introduction, the stereo effect was of a "hole in the middle you could drive a truck through," as one listener said. When converted to transaural, using the crosstalk canceler built with the functions of Fig. 3, Schroeder's description of "nothing less than amazing" spatial and imaging qualities certainly applied, but it was possible to notice that the equalization was "a little off." Later, the appearance of a "hole" tendency in this recording would alert us to early reflections in a listening setup. As we also noted, recordings from the Aachen head (0° equalization) provided stereo of unequaled excellence by ordinary standards. Certainly, no "hole" was observed, even without cancellation.

    1.6 System Transfer Functions

    In the following, M will be used to designate either N or P. It will be understood to be a function of frequency and incidence angle. Thus for natural directional hearing, either member of the pair of overall transfer functions from a source at angle $\theta$ to the ears is designated

    $$H_n = M_n(\theta) \qquad \text{(9)}$$

    and it is the transfer function for the difference or sum in ear signals, depending on whether N or P is substituted for M. The sources are to be considered one at a time, whether a direct source or one of the many components of reverberation. Superposition is applicable in linear acoustics. The subscript n is used to designate a natural head, the head of the listener. A signal-theoretic basis for understanding directional hearing would begin at this point.

    Fig. 3. Shuffler filter characteristics in crosstalk canceling for
    spherical-model head. (a) Magnitudes of 1/N and 1/P normalized
    against curve shown in (b). Because curves (a) are free of the
    idiosyncratic detail for specific heads (as in Fig. 13), such
    characteristics are tolerant of variations in listener-head shapes
    and positions. Dashed curves show a possible modification of the
    envelope of the alternations. Because the filters are of joint
    minimum phase, the phase data are redundant and not shown.

    J. Audio Eng. Soc., Vol. 37, No. 1/2, 1989 January/February

    Fig. 4. Equalization curves from data [21] for the CBS-NASA head
    [16]. Curves for both 30° and 0° reference directions are shown,
    displaced from one another by 3 dB. The decibel range of variation
    (conch resonance) greatly exceeds any difference.

    For listening via loudspeakers to a binaural recording,

    the transfer functions (subscript b) are designated as

    (10)

    in which the artificial head (or equivalent in binaural synthesis),
    subscript a, is designated as being equalized for the loudspeaker
    positions $\theta_0$. This equalization prevents the nondirectional
    part of the conch resonance, already present in the listener's
    ears, from being introduced a second time, a minimum requirement
    of equalization.

    In this instance, the sounds at the listener's ears will be drawn
    from an extremely restricted set, in comparison to all possible
    sound combinations. The restriction is to that of linear
    combinations of the sounds at the ears of the artificial head,
    namely "shuffled" ear sounds, but combinations that otherwise
    closely resemble ear sounds themselves.

    For a reasonably apt artificial head the joint angle-dependent
    spectral magnitudes will be determined by the artificial head,
    except for shuffling, to closely resemble those of natural hearing.
    The result, as confirmed in listening, is an extremely plausible
    directional portrayal, much more so than available by any means in
    conventional stereo. Even so, listeners do greatly appreciate the
    improvement they experience in being provided "unshuffled" ear
    sounds, those that embody the alternations in their
    difference-and-sum spectra that are characteristics of their own
    ear sounds. Since these alternations depend strongly on the cosine
    of the interaural phase, the significance of this element is
    confirmed. This unshuffling is provided in the crosstalk
    cancellation of transaural stereo.

    For transaural listening, the transfer function (subscript t) is

    $H_t = \dfrac{{}_0M_a(\theta)\, M_n(\theta_0)}{{}_0M_x(\theta_0)}$    (11)

    in which subscript x designates transfer functions of the head
    used to model the crosstalk cancellation. This equation shows the
    use of equalized functions, for the reference direction
    $\theta_0$, for both $M_a$ and $M_x$. If these differ in any of
    their characteristics, each is to be equalized against its own
    characteristics. The appearance of a conch resonance (for
    $\theta_0$) is, as in the above, reserved for the listener's head
    $M_n$.

    If $M_x$ is the same as $M_n$, for example, then Eq. (11)
    describes the simulation of natural directional hearing except for
    the substitution of the artificial head (and ears) for the
    listener's own. Clearly, if all three heads are the same, then
    Eq. (11) is identical to Eq. (9), and an exact simulation of
    natural hearing would be the result.
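
    The reduction of Eq. (11) to Eq. (9) for identical heads can be
    checked numerically at a single frequency. The sketch below is
    illustrative only. It assumes that an equalized function is the raw
    head function divided by that head's own equalization value for the
    reference direction (called `e_ref` here); the function names and
    all numeric values are hypothetical.

```python
def equalized(m, e_ref):
    """Equalized head function: raw value divided by the head's own
    reference-direction equalization (an assumed form of equalization)."""
    return m / e_ref


def transaural(m_a_theta, e_a_ref, m_n_ref, m_x_ref, e_x_ref):
    """Eq. (11) at one frequency:
    H_t = 0M_a(theta) * M_n(theta0) / 0M_x(theta0)."""
    return equalized(m_a_theta, e_a_ref) * m_n_ref / equalized(m_x_ref, e_x_ref)


# Hypothetical complex head-function values at one frequency.
m_theta = complex(0.42, -0.31)   # M(theta), source direction
m_ref = complex(0.80, 0.15)      # M(theta0), loudspeaker direction
e_ref = 1.7                      # equalization value for theta0

# All three heads identical: the result should equal M_n(theta), Eq. (9).
h_t = transaural(m_theta, e_ref, m_ref, m_ref, e_ref)
print(abs(h_t - m_theta))
```

    With different heads for recording, cancellation, and listening,
    the same function shows how far the result departs from
    $M_n(\theta)$.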

    In this last statement direct proof is seen of what is generally
    regarded as the "unquestionable validity" of the transaural plan
    for stereo recording and reproduction. Of course, this provides
    the transaural design engineer an extremely strong vantage
    position from which to undertake departures in the service of
    practicality. It is usually one of the strengths of starting from
    an optimal position that departures from the optimum in design
    parameters usually produce remarkably small effects.

    1.7 Practical Design Considerations

    Except for a custom-designed crosstalk canceler, it is not to be
    expected that $M_x$ will be the same as $M_n$, and a commercial
    release of a transaural recording would have to embody an $M_x$
    that would be required to be satisfactory for a wide range of
    listener heads, each with its own $M_n$. Generally this is not a
    difficult requirement. It has been found, for example, that the
    crosstalk canceler based on a spherical-model head [2], [3]
    produces immensely satisfying results for a wide range of
    listeners' heads. Heads that are somewhat small may be placed
    somewhat nearer the loudspeakers, and those that are somewhat
    large may be placed at a somewhat greater distance, as may be seen
    from the structure of head functions, but the exact placement does
    not seem to be a critical matter for most listeners.

    What is probably the case is not that a sphere is necessarily a
    best fit, but that it is a "comfortable" fit for most heads just
    because of its inexactness. While the advantages of inexactness
    merit further exploration, we have tried another aspect for
    inexact treatment, the domain of wavelengths shorter than about
    50 mm (frequencies higher than about 6 kHz). The first
    experimental crosstalk-canceler filters followed, after a somewhat
    abrupt transition, the null-crosstalk contour of Eq. (6) for the
    shorter wavelengths. We attribute the tolerance in listener
    movement to these aspects of inexactness in $M_x$ filters.

    The choice of a rather abrupt "cut" in our first experimental
    canceler may have been somewhat extreme. We do notice a tendency
    for sibilantlike sounds and clicklike sounds to be mislocated,
    generally toward the front. This is a confirmation, extended to
    short wavelengths, of the importance of interaural phase. Although
    this style of design variation has proved instructive, we are now
    inclined to rely on a more uniform distribution of inexactness, of
    which the spherical functions are a good example. Another
    variation of interest is that of introducing a gradual taper, as
    shown in Fig. 3(a), dashed line, wherein the upper and lower
    envelopes approach the null-crosstalk contour in a somewhat less
    accelerated manner for short wavelengths, replacing the more
    abrupt cut.

    We visualize these styles of inexactness as defining a volume of
    space near each ear of the listener, a space over which
    cancellation is satisfactorily accurate. We visualize this volume
    as being of smaller extent for the shorter wavelengths, and we
    suppose that it is appropriate to be less exact at these shorter
    wavelengths. We also believe, despite our successes with spherical
    functions, that we need to continue to investigate this problem.
    Thus the tolerance we have gained for listener movement, already
    satisfactory for most purposes, may be extended.


    2 BINAURAL SYNTHESIS

    2.1 Synthesis Filters

    Shuffler filters based on the direct functions N and P are used to
    simulate the progress around the head to the ears of two sounds of
    incidence angles $\theta_i$ at once. Instead of an inverse
    crosstalk filter, it is the direct crosstalk filter that is to be
    constructed. Of course, if only one of these signals is desired,
    one of the inputs to the shuffler may be left silent. Degenerate
    forms of the shuffler are used for 0° and 180°.

    The shuffler synthesizer implements Eq. (9) in effect, but
    provides equalized ear signals instead, thus actually simulating
    the use of an equalized artificial head. The transfer function may
    be written

    (12)

    using the same convention that M may stand for either of N or P.
    The subscript s is used to denote head functions used in
    synthesis, even though these might have been measured for an
    artificial head, a natural head, or derived from a mathematical
    model. The transfer function is for a source simulated at position
    $\theta_i$, where i is a symbol for the indexing over a discrete
    set of incidence angles.

    This transfer function may be written in greater detail as

    $H_s = \dfrac{E_s(\theta_i)\, M_s(\theta_i)}{E_s(\theta_0)\, E_s(\theta_i)}$    (13)

    which may be written in factored form as

    $H_s = {}_0E_s(\theta_i)\, {}'M_s(\theta_i)$    (14)

    in which the factors are

    ${}_0E_s(\theta_i) = \dfrac{E_s(\theta_i)}{E_s(\theta_0)}$    (15)

    and

    ${}'M_s(\theta_i) = \dfrac{M_s(\theta_i)}{E_s(\theta_i)}$    (16)

    particular, "tonal-color" characteristics, for the two ears
    jointly, are represented by the factor ${}_0E$. It seems that this
    color is used in a part of the directional hearing process
    (spectral pattern recognition) at a level below consciousness, but
    that only the directional result is presented at the level of
    consciousness. For example, speaking voices from behind would have
    an extremely "hollow" sound, as will be seen, if the hearing
    mechanism did not function as indicated.

    This hollowness can be heard only under exceptional circumstances,
    such as a binaural recording played without crosstalk
    cancellation. In this example, a voice was recorded with the AK
    while the speaking person moved around the artificial head.
    Listeners heard the voice move outside the space between the 30°
    loudspeakers, barely into the side quadrants, in the listener
    space. While the original movement had been through to the back
    quadrant, the listeners heard movement that turned forward again
    into the front quadrant, but with an altered vocal quality, that
    "hollow" quality. Some listeners, when particularly neutral,
    transparent-sounding loudspeakers were used, would hear the voice
    "jump" to the back quadrant before the change in quality had
    become explicit. The listeners that stayed with the frontal
    localization presumably did so because of the "visual knowledge"
    that the loudspeakers were in front and because the equalization
    of the AK is not exactly suited to 30° presentation. With
    crosstalk cancellation, the transition to the back quadrant was
    characterized by continuity.

    Changes in tonal color for sounds presented in elevation seem also
    to be responsible for impressions of elevated localization, and,
    again, the coloration is prevented from appearing in
    consciousness. One of us re


    exclusively of the phasor interaural transmission ratio A/S. Its
    magnitude determines K, and its phase is equal to the argument of
    the cosine in Eq. (17). Thus ${}'M$ and A/S are essentially
    equivalent signal-theoretic bases for their role in directional
    hearing. This role appears largely to be the determination of the
    lateral aspect of the localization angle, as distinguished from
    front-back and elevation aspects. Plots of |S/A| and interaural
    phase delay may be found in Mertens [22]. These are adapted here
    as Fig. 5, showing interaural phase delay in microseconds, and
    Fig. 6, showing the interaural level difference |S/A| in decibel
    units, both plotted versus incidence angle.

    The plot of interaural phase delay, Fig. 5, shows clearly that a
    substantial part of that delay is frequency independent, seeming
    to plot a trend toward a high-frequency-limit curve (lower bound)
    that is ramplike in increase and decrease. A ray-acoustic
    approximation [3, Fig. 1(b)] of this limit is
    $(a/c)(\theta_c - |\theta_c - \theta| + \sin\theta)$, where a is
    the head radius, for example, 90 mm; c is the speed of sound, for
    example, 345 m/s; and $\theta_c = \pi/2$. This is 671 μs at
    $\theta = \theta_c$. Increments in midfrequency delay arise from
    Hilbert transforms of log |S/A|. The Rayleigh, low-frequency,
    total-delay limit $3(a/c)\sin\theta$, with 3a/c = 783 μs,
    disagrees with the 140-Hz plot. Heads other than this one of
    papier-mache agree more closely with Rayleigh.
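
    The two delay limits quoted above are easy to evaluate. The
    following sketch uses the example values from the text (head
    radius 90 mm, speed of sound 345 m/s); the function names are
    hypothetical and the code is an illustration rather than part of
    the original analysis.

```python
import math

HEAD_RADIUS_M = 0.090     # example head radius a from the text
SPEED_OF_SOUND = 345.0    # example speed of sound c from the text, m/s


def ray_delay_us(theta_deg, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """High-frequency (ray-acoustic) interaural delay in microseconds:
    (a/c) * (theta_c - |theta_c - theta| + sin(theta)), theta_c = pi/2."""
    theta = math.radians(theta_deg)
    theta_c = math.pi / 2
    return (a / c) * (theta_c - abs(theta_c - theta) + math.sin(theta)) * 1e6


def rayleigh_delay_us(theta_deg, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Low-frequency (Rayleigh) total-delay limit in microseconds:
    3 * (a/c) * sin(theta)."""
    return 3.0 * (a / c) * math.sin(math.radians(theta_deg)) * 1e6


for angle in (30, 60, 90, 120, 150):
    print(angle, round(ray_delay_us(angle)), round(rayleigh_delay_us(angle)))
```

    Both expressions give the same value for an angle and its mirror
    image about 90°, for example 60° and 120°.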

    However that may be, it is seen from Fig. 5 that the interaural
    phase delays for directions less than 90° very nearly mirror those
    for directions exceeding 90°. Thus it may be supposed that
    interaural delay is not relied upon, to any great extent, in
    distinguishing sounds in back from those in front. On the other
    hand, it is seen that interaural delay is a steep function of
    direction, for directions in front, and a steep function again,
    for directions in back. Thus a strong reliance may be placed on
    interaural phase, from a signal-theoretic point of view, for a
    precise determination of the lateral component of direction.

    Fig. 5. Plots of interaural difference in phase delay versus angle
    of incidence for various frequencies (Hz). Diffraction theory
    indicates a maximum delay at low frequencies that is much less
    than shown for 140 Hz. A low-frequency mechanical resonance in
    this papier-mache head is suspected. Adapted from [22].

    In Eq. (17) it is seen that interaural phase appears, for the
    sum-and-difference ear sounds, as an alternation in their spectral
    magnitudes because of the cosine. The extent to which the
    alternation appears does depend on the value of K. The upper and
    lower envelopes of these alternations, the extrema in $|{}'M|$,
    are given by

    $|{}'M|_{\mathrm{ex}} = (1 \pm K)^{1/2}$    (20)

    in which K is determined from |S/A|, as shown in Eq. (18). Thus
    Fig. 6 may be studied to estimate the directional dependence of
    these envelope functions. To assist in this estimation, envelope
    contours for the three highest curves in Fig. 6 have been
    computed, with the results plotted in Fig. 7.
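
    A short sketch may help in reading envelope values off an
    interaural level difference. Eq. (18) is not reproduced in this
    part of the text, so the code assumes the form that follows from
    adding and subtracting two phasors S and A and normalizing by
    their joint rms level, namely K = 2r/(1 + r^2) with r = |A/S|.
    That form, and the function names, are assumptions made for
    illustration.

```python
import math


def k_from_level_difference(ild_db):
    """Assumed relation K = 2r / (1 + r^2), with r = |A/S| <= 1 taken
    from the interaural level difference |S/A| in decibels."""
    r = 10.0 ** (-abs(ild_db) / 20.0)
    return 2.0 * r / (1.0 + r * r)


def envelopes_db(ild_db):
    """Upper and lower envelopes (1 +/- K)^(1/2) of Eq. (20), in dB."""
    k = k_from_level_difference(ild_db)
    upper = 10.0 * math.log10(1.0 + k)
    lower = 10.0 * math.log10(1.0 - k) if k < 1.0 else float("-inf")
    return upper, lower


for ild in (0.0, 6.0, 12.0, 20.0):
    print(ild, envelopes_db(ild))
```

    Under this assumption the envelopes draw together as the level
    difference grows, which is the behavior described for deep head
    shadow.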

    Generally it may be said that Figs. 6 and 7 show that the upper
    and lower envelope contours lie closer together at higher
    frequencies and for directions near 90°, conditions making for the
    deepest head shadow for the contralateral ear. Also, the
    alternations in magnitude, along the frequency scale, are most
    rapid near 90°. Toward the front, less than 60°, the alternations
    are largest in magnitude, as they also are toward the back, beyond
    120°. Thus the most reliance, for directional hearing, on
    interaural phase should lie toward the front or toward the back,
    and there should be somewhat less reliance on interaural phase
    near 90°, but only at the higher frequencies. At 4 kHz and below,
    the reliance would appear to be substantial. Some front-back
    asymmetry is seen in these curves, but it is not clear whether
    directional hearing can rely on these asymmetries to resolve
    front-back determinations. These asymmetries are not consistent
    with frequency, to be sure, but one should be wary of hearing's
    potential of making much of seemingly insignificant, even
    idiosyncratic, detail. Nevertheless it behooves us to look
    elsewhere for the front-back cues.

    Fig. 6. Plots of interaural difference in level |S/A| versus angle
    of incidence for various frequencies (Hz). Adapted from [22].

    We have used head data supplied by Torick [21] to plot
    ${}_0E(\theta)$ against frequency for a reference direction
    $\theta_0$ = 90°, a medial angle. Plots for $\theta$ of 120°,
    150°, and 180°, a back-angle family, are shown in Fig. 8, while
    plots for the front-angle family, 0°, 30°, and 60°, are shown in
    Fig. 9. In Fig. 8 it is seen that differences in spectral
    transmission between back angles are not very large, and are
    mostly in the region above 4 kHz. Between front angles, the
    differences are seen from Fig. 9 to be not very large either, and
    are mostly in the "presence" region from 1 to about 4 kHz. Thus we
    come to the conclusion that there is little to rely upon, in terms
    of spectral color, for distinguishing among the back angles or
    among the front angles, although we cannot dismiss the possibility
    entirely that some reliance is placed there. It is clear, however,
    that the two families are remarkably different from each other.

    A direct indication of front-back difference may be shown as in
    Fig. 10, plots of ${}_0E(\theta)$ for the reference direction of
    0° and $\theta$ equal to 90°, 120°, 150°, and 180°. This is the
    front-back difference against 0°. These curves indicate the
    front-back difference as characterized by a marked depression in
    level in the range from about 1 to 6 kHz, along with elevations
    that are equally striking in the range from about 6 to 10 kHz. The
    90° curve is included as a "back" spectrum because its shape
    qualifies it as a family member and one whose emphasis on high
    frequencies tends to elevate interaural level difference to an
    importance not identified elsewhere. It is not difficult to
    imagine the "hollow" sound, as discussed above, that these
    transmission characteristics would cause if they were ordinarily
    consciously heard. This altered spectral quality does indeed,
    however, appear to be the principal determinant of back sound in
    discrimination from front sound.

    Fig. 7. Plots of alternation envelopes, square root of 1 ± K,
    versus angle of incidence for three highest frequencies of Fig. 6.

    Fig. 8. Plots of normalized rms transmission to two ears jointly
    for a family of back angles of incidence. The reference direction
    for normalization is taken to be 90°. Data are for the CBS-NASA
    head. The contrast with Fig. 9 is remarkable.

    Fig. 9. Plots as in Fig. 8, except that a family of front angles
    of incidence are shown. The contrast with Fig. 8 is remarkable.

    This review of hearing characteristics allows certain rules of
    thumb to be identified, namely, that interaural phase, represented
    as amplitude alternations in the spectra of difference-and-sum ear
    signals, is the discriminant for the lateral component of image
    position, while variations in joint spectral transmission are the
    discriminant for the front-back component and, presumably, for the
    elevation component. However, it also allows certain areas of
    uncertainty to be noted. These various points of observation will
    be of limited help in the design of the binaural-synthesis filters
    because the best rule will doubtlessly prove to be slavish
    simulations of the best measured head-related transfer functions
    available. At least these observations, together with those the
    designer's own experience may develop, will make for a certain
    intelligent slavishness.


    2.3 Basic Synthesis Array

    An array of filters for the sum and difference signals is shown in
    Fig. 11. On the right-hand side, several inputs are designated,
    one for each of the incidence angles to be simulated. For each
    left-angle input, a matching input is shown for the symmetric
    right-hand angle, and sum and difference signals are shown as
    being formed from these symmetric inputs. The signal pairs are
    then transmitted through ${}_0N$ and ${}_0P$ filters, ${}_0N$ for
    difference signals and ${}_0P$ for sum signals. Each of these
    filters is designed to match the specific angle designations for
    the inputs, ${}_0N(\theta_i)$ and ${}_0P(\theta_i)$ separately for
    each $\theta_i$ designated at the input. The filtered difference
    signals are then combined in a common sum, and the filtered sum
    signals are then combined in a common sum. These common sums are
    then further combined in difference-and-sum fashion to form
    simulated binaural outputs.

    This basic array may be thought of as a discrete-angle, binaural,
    panoramic mixer. Variations may include linear mixing arrangements
    and level adjustments to be provided at each of the inputs. Also,
    some of the inputs may receive outputs from an array of
    reverberators to form synthetic binaural-space reverberation
    systems in the manner of Kendall et al. [1]. Low-cost versions
    could provide for a very limited set of angles as supplementary
    pan functions for use with standard mixing boards, and transaural
    outputs could be provided for productions that are being developed
    primarily in conventional stereo. Further variations may be
    conceived. Some of these are described hereafter.

    Fig. 10. Plots showing front-back difference as a family of
    back-angle joint transmissions normalized against transmission for
    the incidence angle of 0°, taken as the reference direction.

    Fig. 11. Basic binaural synthesizer. Inputs for a discrete,
    symmetric set of simulated incidence angles are shown. A
    multiplicity of shuffler filters, based on head transmission
    functions, each specific to the incidence angle to be simulated,
    are used. The output is binaural.
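
    The signal flow of the basic array can be sketched in a few lines.
    This is a minimal illustration, not the authors' implementation:
    it assumes FIR realizations of the difference and sum filters, a
    factor of 1/2 in the output shuffler, and hypothetical names
    (`fir`, `accumulate`, `synthesize`) and tap values throughout.

```python
def fir(signal, taps):
    """Direct-form FIR filtering (full convolution) of a sample list."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += x * h
    return out


def accumulate(bus, signal):
    """Add a signal into a mixing bus, growing the bus if needed."""
    if len(bus) < len(signal):
        bus.extend([0.0] * (len(signal) - len(bus)))
    for k, v in enumerate(signal):
        bus[k] += v


def synthesize(inputs, n_taps, p_taps):
    """inputs maps angle -> (left_signal, right_signal) for the symmetric
    pair; n_taps / p_taps map angle -> FIR taps for the difference / sum
    filters of that angle."""
    diff_bus, sum_bus = [], []
    for angle, (left, right) in inputs.items():
        diff = [l - r for l, r in zip(left, right)]
        total = [l + r for l, r in zip(left, right)]
        accumulate(diff_bus, fir(diff, n_taps[angle]))
        accumulate(sum_bus, fir(total, p_taps[angle]))
    # Pad both buses to a common length before the output shuffler.
    n = max(len(diff_bus), len(sum_bus))
    diff_bus += [0.0] * (n - len(diff_bus))
    sum_bus += [0.0] * (n - len(sum_bus))
    # Output shuffler; the factor 1/2 is an assumed normalization.
    left_ear = [(s + d) / 2 for s, d in zip(sum_bus, diff_bus)]
    right_ear = [(s - d) / 2 for s, d in zip(sum_bus, diff_bus)]
    return left_ear, right_ear


# With pass-through filters the array returns its inputs unchanged.
left_out, right_out = synthesize(
    {30: ([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])},
    n_taps={30: [1.0]},
    p_taps={30: [1.0]},
)
print(left_out, right_out)
```

    Only two filters per angle pair are needed because the symmetric
    left and right inputs share the same difference and sum filters.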

    3 PROSPECTIVE PRODUCTION PROCESSORS

    3.1 Equalizers

    All commercially available artificial heads of quality suitable
    for professional applications stand in need of further
    equalization, since neither ear-canal nor diffuse-field
    equalizations are appropriate in transaural recording. Ear
    resonances can be allowed only once in the signal chain, in the
    listener's ear. Also, Fig. 1(b) shows that directional information
    is received too rapidly for any diffuseness to develop.

    Among those heads providing ear-canal signals, we list Neumann
    KU-80, KEMAR,¹ B&K 4128, and the Aachen head (AK). The extent of
    the 30° free-field equalization required may be estimated from
    Fig. 4, although those data are not specific to these heads. The
    Aachen head is also available with external free-field
    equalization for 0° incidence. The diffuse-field equalization
    devised by Killion [7] for KEMAR, and provided by Neumann for
    their newer KU-81, reduces further equalization needs to moderate
    corrections, as is also true of the AK. Such further equalization
    may be provided by the manufacturer or a third party.

    All commercially available earphones for binaural monitoring
    similarly stand in need of 30° free-field equalization. A few
    manufacturers are currently providing diffuse-field equalization
    [6], and an (inadvisable) interest in such standardization
    continues [23]. We are aware of only one earphone set, the Stax
    Pro Lambda, that has been accurately equalized against a
    free-field reference by a third party [15], but for 0°, not 30°.
    A decision by a third party to supply external equalization for
    any but a selected few models entails a substantial risk that only
    professional needs could justify. Volume distribution of earphones
    suitably equalized by the manufacturer probably lies some distance
    in the future.

    3.2 Monitoring

    Facilities for earphone monitoring require 30° free-field
    equalization as above, if it is not internal to the earphone. If
    the program material to be monitored is in the form of loudspeaker
    signals (whether transaural or conventional stereo), there would
    also be needed a binaural-synthesizer version of a circuit devised
    by Bauer [17], the so-called Bauer box. The two inputs would be
    processed to simulate 30°.

    Loudspeaker monitoring would require transaural monitor equipment
    to derive the proper signals from binaural material. It could
    embody a crosstalk canceler of standard grade adopted for mass
    distribution. Some means of assurance of adherence to a standard
    would be needed for full reliance on such monitoring. Also, the
    prospective availability of postproduction consumer equipment,
    such as Bauer boxes and loudspeaker placement compensators
    described below, requires such standardization.

    The acoustic characteristics of a loudspeaker-monitoring facility
    demand the usual attention. In addition, the most accurate, most
    transparent, most self-effacing loudspeakers should be chosen for
    such use and placed to avoid early reflections.

    ¹ KEMAR is a registered trademark of Knowles Electronics.

    3.3 Hall-Sound-Pickup Synthesizer

    An arrangement involving hall-sound-pickup microphones is shown in
    Fig. 12. Two omnidirectional microphones are flanking the
    artificial head. The signals from these are delayed and provided
    to a binaural synthesizer. The latter may need inputs only for
    90°, 120°, and 150° to provide sufficient flexibility, especially
    if more than two hall-sound-pickup microphones are needed in
    particular halls.

    For the flanking microphones not too far back from the orchestra,
    the hall-sound pickup would enhance early reflections
    (concert-hall concept) in the 10-20-ms range, and 90° synthesis
    angles would be suitable, along with a choice of delay only
    somewhat more than the microphone-head distance. For microphones
    placed far enough back to represent the whole reverberation field,
    synthesis would be at 120°, with a delay somewhat more than the
    microphone-head distance. The 150° synthesis angle would probably
    be used infrequently. The relative level would follow the usual
    prescription of several decibels below that for a plainly audible
    effect. For good concert halls, an almost subliminal contribution,
    if any, would be sufficient.
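
    The delay prescription above amounts to a propagation time plus a
    small margin. The sketch below assumes the example speed of sound
    of 345 m/s used earlier; the function name, the 5-m spacing, and
    the margin value are hypothetical and serve only as an
    illustration.

```python
SPEED_OF_SOUND = 345.0  # m/s, the example value used earlier in the text


def pickup_delay_ms(mic_to_head_m, margin_ms=2.0):
    """Delay for a flanking microphone: the propagation time over the
    microphone-to-head distance plus a small, hypothetical margin."""
    return mic_to_head_m / SPEED_OF_SOUND * 1000.0 + margin_ms


# A microphone 5 m from the head: about 14.5 ms of propagation delay,
# to which the margin is added.
print(round(pickup_delay_ms(5.0, margin_ms=0.0), 2))
print(round(pickup_delay_ms(5.0), 2))
```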

    3.4 Transaural Panoramic Mixer

    A transaural panoramic mixer is meant primarily as a supplement to
    the pan functions of an ordinary stereo mixer. It would be capable
    of replacing some of the existing facilities solely to enhance the
    imaging qualities by accurate synthesis for a limited number of
    channels, or for special effects. A transaural converter would be
    a part of the equipment.

    Fig. 12. Layout for use of a hall-sound-pickup synthesizer.
    Flanking-microphone signals are delayed and subjected to binaural
    synthesis simulating incidence angles from a limited set of back
    angles. The binaural signals so derived are mixed at reduced level
    with the signals from the main-pickup artificial head.

    3.5 Binaural Panoramic Mixer

    A binaural panoramic mixer would be a full elaboration of the
    basic synthesis array discussed in connection with Fig. 11. It
    would otherwise correspond to a full stereo mixing console, except
    that binaural pan functions would be used and the signals would be
    in binaural format. Monitoring would be possible in either
    binaural or transaural format.

    3.6 Transaural Converter

    Transaural conversion need be done only once, except for
    monitoring, in the processing of a complete production, and there
    are good reasons for doing it only once. The conversion would
    adhere to standards specified for mass distribution, and it would
    be executed in an off-line facility capable of providing standards
    assurance. At present, many producers use an off-line facility for
    the conversion of digital masters to a final format as release
    masters. A similar concept applies here.

    3.7 Processing Technology

    All of these processing concepts may be realized in either digital
    or analog form. Conversions between analog and digital data
    streams are, of course, to be kept to a minimum, and this
    consideration will determine the technology to be used in each
    instance. Equipment for some of the processing steps should be
    made available in both technologies.

    4 VIRTUAL LOUDSPEAKERS

    A virtual loudspeaker is a transaural image synthesized to simulate the effect of a loudspeaker placed at a specified image location. The process involves binaural synthesis followed by transaural conversion. For example, an experimental processor has been constructed that makes a pair of loudspeakers placed at 15° sound as if the loudspeakers had been placed at 30°. Applications are indicated below.
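The two-stage process just described (binaural synthesis for the desired image location, then transaural conversion for the actual loudspeaker location) can be illustrated at a single frequency. The following is only a minimal sketch under stated assumptions: `toy_head` is a hypothetical stand-in for measured head functions (unit same-side response, a delayed and attenuated cross-side response), and the function names are illustrative, not taken from the paper. The sum and difference channels are processed separately, in the manner of a shuffler, for a symmetric head and loudspeaker pair.

```python
import cmath
import math


def toy_head(angle_deg, freq_hz, head_radius_m=0.0875, c=343.0):
    """Hypothetical head model for a symmetric loudspeaker pair at +/-angle_deg.

    Returns (S, A): same-side and cross-side loudspeaker-to-ear transfer values.
    """
    theta = math.radians(angle_deg)
    # Rigid-sphere style interaural delay (an assumed stand-in for real data).
    delay_s = (head_radius_m / c) * (theta + math.sin(theta))
    # Assumed frequency-independent head shadow; purely illustrative.
    shadow = 1.0 / (1.0 + math.sin(theta))
    return 1.0 + 0j, shadow * cmath.exp(-2j * math.pi * freq_hz * delay_s)


def ear_signals(left, right, angle_deg, freq_hz):
    """Ear signals produced by loudspeaker feeds (left, right) at +/-angle_deg."""
    s, a = toy_head(angle_deg, freq_hz)
    return s * left + a * right, a * left + s * right


def virtual_speaker_feeds(left, right, real_deg, virtual_deg, freq_hz):
    """Feeds for loudspeakers at real_deg that imitate loudspeakers at virtual_deg."""
    s_r, a_r = toy_head(real_deg, freq_hz)
    s_v, a_v = toy_head(virtual_deg, freq_hz)
    # Sum channel: synthesize with (S+A) of the virtual pair, cancel (S+A) of the real pair.
    total = (left + right) * (s_v + a_v) / (s_r + a_r)
    # Difference channel: the same idea with (S-A).
    diff = (left - right) * (s_v - a_v) / (s_r - a_r)
    return (total + diff) / 2, (total - diff) / 2


if __name__ == "__main__":
    x_left, x_right = 1.0 + 0j, 0.3 - 0.2j
    for f in (200.0, 1000.0, 4000.0):
        feeds = virtual_speaker_feeds(x_left, x_right, 15.0, 30.0, f)
        heard = ear_signals(feeds[0], feeds[1], 15.0, f)
        wanted = ear_signals(x_left, x_right, 30.0, f)
        print(f, max(abs(h - w) for h, w in zip(heard, wanted)))
```

Within this toy model, the ear signals from the processed 15° pair match those of an unprocessed 30° pair to rounding error. A practical processor would use measured or modeled head functions over the whole band instead of `toy_head`.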

    4.1 Correction of Loudspeaker Placement

    Some users may find that a loudspeaker placement that is convenient for their listening-room layout, and that avoids early reflections, may make for an inconvenient listening position unless the equal-distance 30° rule is violated. In such cases, virtual-loudspeaker electronics can provide a 30° impression for loudspeakers placed at other angles. An adjustment for unequal distances may also be provided.
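The text does not say how an adjustment for unequal distances would be made. One simple possibility, sketched here purely as an assumption, is to delay and attenuate the nearer loudspeaker so that its sound arrives at the listener at the same time and level as that of the farther one. The sketch assumes free-field propagation with inverse-distance level falloff; the function name and the default sample rate and sound speed are illustrative.

```python
def unequal_distance_trim(near_m, far_m, sample_rate_hz=48000.0, c=343.0):
    """Return (delay_samples, gain) to apply to the nearer loudspeaker.

    Assumes free-field propagation and a 1/r level law (illustrative only).
    """
    if not 0 < near_m <= far_m:
        raise ValueError("expected 0 < near_m <= far_m")
    # Extra travel time of the farther loudspeaker, expressed in samples.
    delay_samples = (far_m - near_m) / c * sample_rate_hz
    # Level ratio that brings the nearer loudspeaker down to the farther one.
    gain = near_m / far_m
    return delay_samples, gain


if __name__ == "__main__":
    print(unequal_distance_trim(2.0, 2.5))
```

In a real room, reflections and loudspeaker directivity would make this at best a first-order correction.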

    4.2 TV Expander

    Another example of correction of loudspeaker placement is found in the so-called TV expander. Television receivers usually offer cabinet-mounted loudspeakers that are spaced much too close together (less than 15°) to provide a good stereo effect. Present-day TV expanders, usually involving some kind of ad hoc processing of the difference channel, are too imprecise to preserve the producer's intentions. The virtual-loudspeaker expander is, in principle, exact.

    4.3 Centered Virtual Loudspeaker

    Sound systems for large-screen television applications often lack a facility found in cinema exhibition, a centered, behind-the-screen loudspeaker, important for a realistic presentation of dialogue. The substitute phantom image from two loudspeakers unfortunately does not sound the same as that single loudspeaker. A centered virtual loudspeaker would be a significant improvement.

    4.4 Virtual Loudspeakers in Back

    Some television sound systems are designed to supply special-effects signals (derived from cinema sound tracks) to loudspeakers placed behind the viewer. Unfortunately, many viewers cannot provide space behind their favorite viewing position nor bear the expense of such loudspeakers. Virtual loudspeakers may be substituted. Similarly, certain ambience-enhancing systems require loudspeakers placed behind the listener. These also can be virtual.

    4.5 Surround Stereo

    Ear-sound-oriented, transaural stereo is a full-sphere (includes imaging in elevation) surround-stereo system. While it is most naturally used as a straightforward enhancement of the basic virtues of conventional stereo, it may certainly be used to provide any of the astonishing demonstrations of loudspeaker-oriented quadraphonic systems of a previous era.

    An exemplary sound-field-oriented surround-stereo system [3] is the Ambisonic system UHJ, for which a substantial body of program material, in full-sphere B format [24], exists. Some of this may be recast, using virtual-loudspeaker processing, for rerelease in transaural format.

    5 INSTRUMENTATION-GRADE CANCELER

    A need exists for a crosstalk canceler satisfying the original aspirations of Atal and Schroeder. Accurate documentation of the subjective experience of a sonic event requires an instrumentation-grade artificial head and recording means, together with an acoustic presentation means of equal quality. Loudspeaker presentation through an instrumentation-grade crosstalk canceler is the option that will provide full assurance that the sounds will be heard as exterior to the listener's head.

    Such a canceler will use head functions as closely

    modeled on a replica of a representative head as possible

    and, where necessary, will use data taken for a specific

    listener. An example of canceler curves for a specific

    head is shown in Fig. 13. A digital canceler would be

    able to accept data files for different listeners and adjust

    the filters accordingly. In any case, the canceler would

    be accurately faithful to its head model over the whole

    audio-frequency range.
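The idea of a digital canceler that accepts data files for different listeners can be sketched as follows. This is an illustrative sketch only: it assumes a left-right symmetric head, so that the canceler reduces to a sum filter 1/(S+A) and a difference filter 1/(S-A), and it represents a listener's data as a hypothetical table of same-side (S) and cross-side (A) transfer values per frequency. The table values and function names are invented for the example and are not measured data.

```python
def shuffler_filters(head_data):
    """Build canceler filters from one listener's data.

    head_data maps freq_hz -> (S, A) for one loudspeaker angle.
    Returns freq_hz -> (sum_filter, diff_filter) = (1/(S+A), 1/(S-A)).
    """
    filters = {}
    for freq_hz, (s, a) in sorted(head_data.items()):
        filters[freq_hz] = (1.0 / (s + a), 1.0 / (s - a))
    return filters


def play_through_canceler(ear_left, ear_right, head_data, filters, freq_hz):
    """Return the ear signals that result when binaural input is canceled and played."""
    sum_f, diff_f = filters[freq_hz]
    total = (ear_left + ear_right) * sum_f
    diff = (ear_left - ear_right) * diff_f
    feed_l, feed_r = (total + diff) / 2, (total - diff) / 2
    s, a = head_data[freq_hz]
    # Acoustic path from the loudspeakers to the ears of the same listener.
    return s * feed_l + a * feed_r, a * feed_l + s * feed_r


if __name__ == "__main__":
    # Hypothetical listener table; a real one would come from measurements.
    listener = {500.0: (1.0 + 0j, 0.6 - 0.2j), 2000.0: (0.9 + 0.1j, 0.2 + 0.3j)}
    filters = shuffler_filters(listener)
    print(play_through_canceler(1.0 + 0j, 0.25j, listener, filters, 500.0))
```

Swapping in another listener's table and rebuilding the filters is all that changes between listeners in this sketch. The cancellation is exact only when the table matches the head actually present.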

    Applications abound in environmental acoustics,

    psychoacoustic and otological research laboratories,

    and audiometric and otological clinics, to name a few.

    In critical applications, replacement of earphones of

    dubious characteristics and flawed exteriorization could

    prove decisive.

    6 CONCLUSIONS

    We have shown that crosstalk canceling of well-prepared binaural-stereo program material, to make transaural recordings, can be accomplished with a technology that is simpler than previously supposed, and can produce recordings that may be played as ordinary stereo recordings, but that reveal "amazing" natural spatial and imaging effects that are more robust, with respect to listener movement and playback acoustics, than previously supposed. The recording of such "well-prepared" binaural material is seen as a crucial starting point for making a good transaural recording.

    Artistic considerations are of major importance, of course, and we have also shown that recent technical advances in understanding the importance of correct equalizations must be implemented to support the artistic intent. We have argued that this support requires implementation at the equipment-design level. We have explored the relation of equalization with respect to the maintenance of an excellent stereo effect under all conditions of playback, with respect to the prospects of monitoring with binaural headphones, and with respect to preserving the integrity of localization.

    We have provided a brief survey of the variety of processing that may be accomplished within our conception of transaural-binaural technology. This has included the processing necessary in record production and a few items that the consumer could use to advantage. We also note instrumentation applications. The expectation is that some of this transaural-binaural technology would be implemented in the near future as the industry begins to see how the technology will help its practitioners reach their goals more directly and more easily. The eventual outcome of the infusion of new technology may not be predicted with assurance, but the prospects for a dramatic improvement in stereo quality do appear bright.

    Fig. 13. Shuffler filter characteristics in crosstalk canceling for a specific listener head [16] and a loudspeaker placement of 30°. Solid-line curves show magnitudes of 1/N and 1/P normalized according to the solid-line curve of Fig. 4. Dashed curves show envelopes of alternations. Extensive idiosyncratic detail indicates that a crosstalk canceler based on curves of Fig. 3(a) would be more tolerant of variations in listeners' heads and positions.

    7 ACKNOWLEDGMENT

    We wish to thank the many persons who have listened to our experimental transaural recordings and offered their critical comments. We tried their patience with recordings of differing equalizations, and some with not-the-lowest noise floor, and their patience survived. We would particularly like to thank those who offered us playback facilities that happened to prove instructive in regard to early reflections. Their patience was sometimes not rewarded by hearing the merits we claimed. In other cases, our own ineptness left a bad impression. We are grateful, also, for those listeners who delighted us by being entirely enthusiastic.

    We owe special thanks to Wade Bray of Jaffe Acoustics for providing us with digital tapes made with the Aachen head. Our studies of these recordings impressed us with the importance of reconsidering the whole question of equalization.

    Finally, y

    [7] M. C. Killion, "Equalization Filter for Eardrum Pressure Recording Using a KEMAR Manikin," J. Audio Eng. Soc., vol. 27, pp. 13-16 (1979 Jan./Feb.).

    [8] B. S. Atal and M. R. Schroeder, "Apparent Sound Source Translator," U.S. patent 3,236,949 (1966 Feb. 22).

    [9] M. R. Schroeder and B. S. Atal, "Computer Simulation of Sound Transmission in Rooms," IEEE Conv. Rec., pt. 7, pp. 150-155 (1963).

    [10] M. R. Schroeder, "Digital Simulation of Sound Transmission in Reverberant Spaces," J. Acoust. Soc. Am., vol. 47, pp. 424-431 (1970 Feb.).

    [11] M. R. Schroeder, "Computer Models for Concert Hall Acoustics," Am. J. Phys., vol. 41, pp. 461-471 (1973 Apr.).

    [12] M. R. Schroeder, "Models of Hearing," Proc. IEEE, vol. 63, pp. 1332-1350 (1975 Sept.).

    [13] P. Damaske, "Head-Related Two-Channel Stereophony with Loudspeaker Reproduction," J. Acoust. Soc. Am., vol. 50, pt. 2, pp. 1109-1115 (1971 Oct.).

    [14] T. Mori, G. Fujiki, N. Takahashi, and F. Maruyama, "Precision Sound-Image-Localization Technique Utilizing Multitrack Tape Masters," J. Audio Eng. Soc. (Engineering Reports), vol. 27, pp. 32-38 (1979 Jan./Feb.).

    [15] H. W. Gierlich and K. Genuit, "Processing Artificial-Head Recordings," J. Audio Eng. Soc. (Engineering Reports), vol. 37, this issue, pp. 35-40. Also, W. Bray, private communication (1987 Nov.).

    [16] E. L. Torick, A. Di Mattia, A. J. Rosenheck, L. A. Abbagnaro, and B. B. Bauer, "An Electronic Dummy for Acoustical Testing," J. Audio Eng. Soc., vol. 16, pp. 397-403 (1968 Oct.).

    [17] B. B. Bauer, "Stereophonic Earphones and Binaural Loudspeakers," J. Audio Eng. Soc., vol. 9, pp. 148-151 (1961 Apr.).

    [18] H. Møller, "Cancellation of Crosstalk in Artificial-Head Recordings Reproduced through Loudspeakers," J. Audio Eng. Soc., vol. 37, this issue, pp. 31-34.

    [19] S. Mehrgardt and V. Mellert, "Transformation Characteristics of the External Human Ear," J. Acoust. Soc. Am., vol. 61, pp. 1567-1576 (1977 June).

    [20] D. H. Cooper and J. L. Bauck, "Corrections to L. Schwarz, 'On the Theory of Diffraction of a Plane Soundwave Around a Sphere' ['Zur Theorie der Beugung einer ebenen Schallwelle an der Kugel,' Akust. Z., vol. 8, pp. 91-117 (1943)]," J. Acoust. Soc. Am., vol. 80, pp. 1793-1802 (1986 Dec.).

    [21] E. L. Torick, private communication (1975 Nov.).

    [22] H. Mertens, "Directional Hearing in Stereophony - Theory and Experimental Verification," EBU Rev., pt. A, no. 92, pp. 146-168 (1965 Aug.).

    [23] J. S. Russotti, T. P. Santoro, and G. B. Haskell, "Proposed Technique for Earphone Calibration," J. Audio Eng. Soc., vol. 36, pp. 643-650 (1988 Sept.).

    [24] M. A. Gerzon, "Ambisonics in Multichannel Broadcasting and Video," J. Audio Eng. Soc., vol. 33, pp. 859-871 (1985 Nov.).

    THE AUTHORS

    Duane H. Cooper was born in 1923. He earned a Ph.D. in physics at California Institute of Technology in 1955 and is currently associate professor of physics and electrical engineering at the University of Illinois. He teaches circuits, systems, modulation, random processes, electrodynamics, and acoustics. He contributed to the theory of disk recording, invented the skew-sampling method of tracing-error correction, and contributed to the theory of multichannel stereo. He made the first prototype Cooper Time Cube, and he invented the first working version (UMX) of the soundfield stereo system now called Ambisonics. Dr. Cooper is a member of the American Physical Society, the Acoustical Society of America, a senior member of the Institute of Electrical and Electronics Engineers, and a fellow and honorary member of the Audio Engineering Society. He has served the AES as governor, vice president, and president. He is now vice president of the AES Educational Foundation. Dr. Cooper holds the Society's Emile Berliner Award and Gold Medal.

    Jerald L. Bauck was born in 1955. He earned a B.S. degree in electrical engineering at Kansas State University in 1977 and an M.S. degree in electrical engineering at the University of Illinois in 1979. He is currently an electrical engineering doctoral candidate at the University of Illinois. He worked for five years with Motorola's government electronics group in Scottsdale, Arizona, where he earned four patents and the Motorola Engineering Award in 1983. Mr. Bauck is a member of the Institute of Electrical and Electronics Engineers and of the Audio Engineering Society. His current interests include tomographic imaging in synthetic aperture radar and audio imaging.