
Upload: mihaidumitru

Post on 03-Oct-2015


    MANDATORY PRINCIPLES, RECOMMENDED METHODS AND VALUES FOR DIGITIZATION PROJECTS OF AUDIO ARCHIVES

    Mihai DUMITRU Romanian Radio Broadcasting Corporation, Technical Department

    ABSTRACT

    Standards and best practices provide a foundation for preservation work by outlining expectations and goals for the output of analogue sound capture in a digital preservation system. The technologies, formats, procedures and techniques for capturing analogue sound that technical experts have developed must be adequately implemented. This ensures high-quality output from a preservation system with long-term usability, sustainability and product interoperability. Digitization of analogue sources as PCM audio at a 96 kHz or 192 kHz sampling frequency and a 24-bit word length is evaluated and presented as the best compromise. The evaluation relies on mathematical formulae, simulation software and product data sheets rather than on electrical measurements. It also relies on a dynamic range evaluation of a complete audio signal chain, correlated with statistics about human hearing, rather than on subjective evaluations and listening tests. The trade-offs of high sampling frequencies in typical real digital systems are presented, together with considerations about recently promoted technical specifications, established consensus and perception tests.

    1 INTRODUCTION

    Long-term preservation of audio data is hopeless: the carriers are unstable, the commercial lifetimes of formats seem to be getting shorter and shorter, and the amount of data to be stored increases every day. Magnetic tapes for analogue and digital audio recordings typically contain the following materials: magnetic oxide, polyurethane binder, polyester base and carbon back coating.

    [Figure: magnetic tape cross-section; layer labels: magnetic particle, binder, lubricant reservoir, top coat, back coat, substratum]

    All three components (magnetic particles, binder and backing) are potential sources of failure for a magnetic tape medium. The polyester base is considered very stable under typical storage conditions. The binder, however, is subject to hydrolysis, which results in the so-called sticky-shed syndrome. The Magnetic-Media Industries Association of Japan (MIAJ) has concluded that the shelf life of magnetic tape under normal conditions is controlled by the binder rather than by the magnetic particles ("DDS Specs Drive DAT Reliability," Computer Technology Review, 13 (5), May 1993). Here, shelf life refers to both recorded and unrecorded media, because the life of the binder is independent of whether or not the tape has ever been recorded. Accordingly, analogue and digital tape formats share many physical attributes as they relate to aging and life expectancy. The greatest advantage of digital formats is that each duplicate matches the quality of the original. The risk is that when failure occurs, it is complete failure.


    Analogue technology is well documented, and operators and engineering expertise are still available. Over time this will diminish, because younger people in the industry do not use analogue as their primary recording method. Digital media formats, on the other hand, require special knowledge of the technology used. CD-R and DVD-R both use organic dyes that respond similarly to temperature and humidity over time. Manufacturers have conducted accelerated aging tests by subjecting discs to higher temperature and humidity and then extrapolating the future failure point. Using similar methods, the National Institute of Standards and Technology (NIST) found that DVD-R authoring discs last 25 years. Replicated discs (DVD-Video and audio CDs) use aluminium as the reflective layer, whereas CD-R/DVD-R use more stable silver. Aluminium can be subject to rot as the metal oxidizes if the disc was not properly sealed during manufacturing. All DVD discs are constructed by gluing two polycarbonate discs together, using one of two methods: UV-cured lacquer (considered more stable) or thermal melt glue. Separation of the discs can cause failure. MO discs use heat and magnetism to mark the disc. A NIST study concluded with 95% confidence that an MO disc will last 57 years at room temperature and 90% humidity. Helical-scan recording formats for digital tape (AIT, SAIT, DTRS, DAT, HDCAM, Hi8) are considered higher risk, because misalignment of the recording heads or warpage of the media base can cause data retrieval failure. Sony quotes a 30-year life expectancy under proper storage conditions (from the Sony web site). Linear recording formats for digital tape (DLT, SDLT, LTO-1, 2, 3, 4) are considered more reliable because the recording head has a fixed position. Metal particle (MP) and metal evaporated (ME) tapes use extra-thin coatings (ME tape has no binder at all) that provide better wrap and signal performance, but at the price of being more fragile [1].

    All practical media degrade. The substrate and the active layer deteriorate over time, and optimal storage can slow this process but not halt it. With good-quality media and optimal storage, magnetic recordings will last for many decades. The problem is always finding a drive that will play the medium. (In 20 years' time, how many 1/4-inch open-reel recorders will still be around? How many DAT machines? [2]) From this point forward, only digital solutions will be available. Organizations responsible for preserving audio can expect digital media formats to evolve faster than ever before, resulting in less compatibility regionally and worldwide. The challenge is to choose archive strategies that compensate for these dynamics. One thing digital recording does very reliably is avoid generation loss. So, if by reliability we mean the ability to keep recorded audio for an indefinite period, digital becomes the only choice. Unlike analogue recordings, digital audio can be copied and recopied without loss of signal and without added noise or distortion. Producing digital surrogates for preservation therefore seems to answer the linked issues of preservation and access.


    A well-engineered digital recorder has an effective error correction system that restores the data to its original binary values, provided the error rate is within limits. It is therefore possible to determine whether a digital medium is deteriorating by playing it from time to time. If the error rate starts to increase, the medium is heading for failure. However, if we make a digital copy onto a new medium (which may even be a new type of medium) before the error rate exceeds the capacity of the error correction, or before the player becomes obsolete, we can start the race again. The question of reliability thus comes down to how well the digital copying is administered.

    A survey of ten European broadcasters indicates that their audio holdings are mainly on quarter-inch tape and on shellac and vinyl discs, as well as on cassettes, DAT, CD, MiniDisc and Tandberg QIC cartridges. For all broadcasters, problems with analogue replay equipment are becoming more and more difficult to overcome, but the problems caused by digital carriers in the same situation are even greater. Consequently, there are strong arguments for migrating all recordings made in digital form to a stable storage system and format before the recording technology they depend on becomes obsolete. In other words, digital preservation (a planned, organised and standardised transfer to suitable storage) should be initiated as soon as possible, so that the digital files can be migrated successfully and simply when necessary [3].

    An automatically accessible, self-controlling and self-regenerating archival system, also known as a digital mass storage system (DMSS), is the most appropriate solution. Such a system has the following features [4, 5, 6]:
    - audiovisual data are managed as computer files in mass storage systems, e.g. libraries or robots of magnetic tape cartridges;
    - an open file architecture accommodates all audio data together with catalogue/content information and written text (metadata);
    - access time is not of major importance;
    - data integrity is controlled automatically, and the information is copied onto new carriers automatically before errors can no longer be fully corrected;
    - once new storage media and systems become available through technical development, automated migration is implemented.

    IASA reached similar conclusions. After extensive pilot projects, digital mass storage systems have been installed in major archives for the storage of large audio collections. Such systems perform tasks such as data integrity checking, refreshment and, finally, migration automatically, with a minimum of manpower (cf. IASA-TC 04, 6.2). The benefits of a networked radio station with digital mass storage over a conventional radio sound archive are broadly described in several documents [7].
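    The periodic integrity checking and automatic refreshment described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not a description of how any particular DMSS works. It assumes that a SHA-256 checksum is recorded for each file at ingest, and that the storage layer reports a measured error rate at each periodic check. The names `FileRecord`, `verify_fixity` and `needs_refresh` are hypothetical, and so is the policy of copying at half the error-correction limit.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical catalogue entry: file payload plus the checksum taken at ingest."""
    name: str
    data: bytes
    sha256_at_ingest: str
    error_rates: list = field(default_factory=list)  # one measured rate per periodic check


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_fixity(record: FileRecord) -> bool:
    """True if the stored bytes still match the checksum recorded at ingest."""
    return sha256_hex(record.data) == record.sha256_at_ingest


def needs_refresh(record: FileRecord, correction_limit: float, margin: float = 0.5) -> bool:
    """Decide whether to copy the file onto a new carrier now.

    Assumed policy: copy when the latest error rate reaches `margin` times the
    error-correction limit, or when the rate is rising fast enough that the
    next check would reach that point.
    """
    if not record.error_rates:
        return False
    threshold = margin * correction_limit
    latest = record.error_rates[-1]
    if latest >= threshold:
        return True
    if len(record.error_rates) >= 2:
        trend = latest - record.error_rates[-2]
        return trend > 0 and latest + trend >= threshold
    return False


if __name__ == "__main__":
    payload = b"example audio payload"
    rec = FileRecord("tape_0001.wav", payload, sha256_hex(payload), [1e-7, 2e-6, 4e-5])
    print(verify_fixity(rec))                         # True
    print(needs_refresh(rec, correction_limit=1e-4))  # True: rising towards the limit
```

    The trend test in this sketch mirrors the "start the race again" argument: the copy is triggered while the error correction can still restore every bit, not after it has failed.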

    2 CONSERVATION, MIGRATION, RESTORATION

    Before a digital preservation procedure starts, several types of project must be initiated [8]:

    Conservation: improving storage conditions to make existing material last longer.


    Migration: this is the main immediate work for audio and video materials, and for decaying film. Preservation via migration consists of transferring master material from old formats to new ones. The same technical process, transfer from one format to another, is also used to make viewing copies, usually in lower quality. All formats benefit from cost-effective methods for producing viewing copies.

    Restoration: archive media can be of varying quality, and modern technology can significantly improve the result of a migration (transfer) process. Restoration has tended to be seen as too expensive to incorporate in cost-effective transfers. However, some of the data necessary for restoration is calculated as a matter of course in the digitisation stage. A major goal for this kind of project could therefore be to integrate the workflow so that restoration becomes more cost-effective and more archive material can benefit from the power of digital restoration.

    Presto published a model of the range of activities associated with digitisation. The model begins with the actions needed to identify materials for transfer, gather them together and transport them, with their metadata, to the digitisation area. It ends with providing web access and updating the metadata (catalogue). These activities are shown in the following figure.

    [Figure 1 diagram boxes: Composition (identify and assemble materials); Digitisation (create a digital master copy plus low-data-rate versions); New Media Creation (create a new archive item, physical or electronic); Update Archive (replace old item with new, update metadata)]

    Figure 1 Digitisation and associated activities

    Of course, the above figure does not show everything: each box could be broken down into many more steps and costs. For example, the Composition process could be broken down into more steps, each with two parallel streams: actions concerning documentation (metadata) and actions on the actual media.

    The primary goal of transfer work for digital preservation is the creation of a surrogate, that is, an accurate, authentic and very high quality representation of the original. It is therefore necessary to identify the appropriate equipment and operating personnel to be involved in the preservation system. Both the abilities of the staff and the equipment used greatly affect the success of the analogue playback stage. The engineer must understand how field recordings carried on obsolete and deteriorating historic formats may be optimally reproduced despite degradation. This requires taking into account the specific characteristics of both the individual recording and the format itself. The engineer must also align, calibrate and verify the performance of the playback machine, which itself must be able to reproduce the recording at the highest fidelity possible [9].

    Recommendations for experienced preservation transfer personnel


    IASA-TC 03 [10] and TC 04 [11] state that equipment must be optimally adjusted and maintained. They also suggest that playback requires knowledge of historic audio technologies and a technical awareness of advances in replay technology. The CLIR/LC [12] report, Capturing Analog Sound, addresses this directly, suggesting that there are many areas in which a trained ear and years of experience are by far the most important tools. In some archives, fragile audio recordings are being handled, played and transferred for digital preservation by staff who have limited experience working with audio recordings. Such staff may also have little knowledge of the sonic characteristics and weaknesses of various audio formats.

    Recommendations and basic audio engineering principles regarding all signal chain components and technical spaces used for preservation transfer work

    IASA-TC 04 stipulates: The combination of reproduction equipment, signal cables, mixers and other audio processing equipment should have specifications that equal or exceed those of digital audio at the specified sampling rate and bit depth. The quality of the replay equipment, audio path, target format and standards must exceed that of the original carrier. The CLIR/LC report discusses the need for accurate monitoring systems to evaluate quality, as well as test equipment to evaluate potential problems. Richard Warren's storage document, published in the ARSC Journal [13], recommends a Noise Criteria (NC) level of 20-25 dB for critical listening areas. More generally, he also calls for proper acoustical conditions that prevent the room from distorting the sounds being studied. According to IASA-TC 04, any transfer should attempt to extract the optimal signal from the original for two reasons. First, [as] the original carrier may deteriorate, future replay may not achieve the same quality, or may in fact become impossible. Secondly, signal extraction is such a time-consuming effort that financial considerations call for optimization at the first attempt.

    Given that the most direct and clean signal path must be used from source to destination, it is very important to highlight the weakest link in the digital chain: the point of conversion from analogue to digital. Choices about conversion technologies, together with the selection of digital formats, resolutions, carriers and technology systems, will impose irreversible limits on the effectiveness of digital preservation, as will the quality of the audio being encoded. Optimal signal extraction from original carriers is the indispensable starting point of every digitization process [14].

    Once the need to copy has been faced, the selection of the storage format becomes the next major issue. The sound archiving community is rallying around the European Broadcasting Union's Broadcast Wave Format (BWF). BWF [EBU Tech 3285] complies with the specification of the .wav format but adds a number of metadata fields, in the same way that TIFF (Tagged Image File Format) does for images. The International Association of Sound Archives recommends linear BWF files for archiving, for two reasons: the simplicity and ubiquity of linear PCM (interleaved for stereo), and the wide acceptance of the BWF format in the archiving community.

    All responsible archiving groups and associations strongly argue against any format that uses lossy data compression or perceptual coding, whether in archival recordings or in recordings eventually intended for archives. MP3 (MPEG-1 Audio Layer III), MiniDisc and any form of streamed audio all employ bit-rate reduction or data compression, and should not be used in archival processes, including field recording. Perceptually coded audio cannot be decompressed back to the original. The discarded part of the audio is lost forever, permanently limiting the quality and use of that audio thereafter.
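    To make the linear PCM recommendation concrete, the following Python sketch uses the standard `wave` module to write uncompressed, interleaved stereo PCM at 96 kHz / 24 bits, and then computes the resulting data rate. It is only an illustration. `wave` writes a plain RIFF/WAVE file and does not add the BWF `bext` metadata chunk defined in EBU Tech 3285, which an archival workflow would still need. The test tone and the function names are assumptions made for the example.

```python
import io
import math
import wave

FS = 96_000     # sampling frequency, Hz (section 3.1.1)
BITS = 24       # word length (section 3.1.2)
CHANNELS = 2    # stereo; wave stores frames as interleaved L/R samples


def pcm24_frame(left: float, right: float) -> bytes:
    """Pack one stereo frame as little-endian signed 24-bit integers."""
    full_scale = 2 ** (BITS - 1) - 1
    out = b""
    for x in (left, right):
        v = max(-full_scale - 1, min(full_scale, round(x * full_scale)))
        out += v.to_bytes(3, "little", signed=True)
    return out


def write_linear_pcm(fileobj, n_frames: int = 960) -> None:
    """Write a 1 kHz test tone at about -6 dBFS as uncompressed linear PCM."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(BITS // 8)   # 3 bytes per sample
        w.setframerate(FS)
        frames = bytearray()
        for n in range(n_frames):
            s = 0.5 * math.sin(2 * math.pi * 1000 * n / FS)
            frames += pcm24_frame(s, s)
        w.writeframes(bytes(frames))


def data_rate_bits_per_second(fs: int = FS, bits: int = BITS, channels: int = CHANNELS) -> int:
    """Data rate of uncompressed linear PCM."""
    return fs * bits * channels


if __name__ == "__main__":
    buf = io.BytesIO()
    write_linear_pcm(buf)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        print(r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())  # 2 3 96000 960
    print(data_rate_bits_per_second())  # 4608000
```

    At 96 kHz / 24 bits, a stereo stream needs 4,608,000 bit/s. That is about 2.07 GB per hour of audio, before any metadata is added.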

    3 RECOMMENDED METHODS TO IDENTIFY THE BEST POSSIBLE A/D CONVERTERS FOR CRITICAL AUDIO APPLICATIONS

    The conversion and storage system consists of three parts: the analogue-to-digital conversion hardware, the computer system and the storage system. We are aware of the difficulties of using transducers in a complete audio signal chain. The microphone converts signals from acoustical to electrical, and the loudspeaker converts them back again from electrical to acoustical. However, to keep audio signals meaningfully audible for an indefinitely long time, they must be stored under the best conditions the digital domain offers. Hence the A/D converter becomes the key component in the signal path, as the choice of A/D converter irrevocably affects the fidelity of the resulting signal. To assess the degree of transparency, three aspects must be considered: the converter's electrical measurements, its subjective aural performance, and its operating parameters such as sampling frequency and word length. Finally, the signal level at the converter input, the converter component design, and external conditions such as grounding and shielding can greatly affect the fidelity of the resulting file. The choice of an A/D converter must therefore be based on an evaluation of both technical measurements and subjective listening.

    In converting analogue audio to a digital data stream, the A/D converter should not colour the audio or add any extra noise. It must exhibit audio transparency, that is, it should neither add to nor subtract from the sound. In practice, the A/D converter built into a computer's sound card does not, and cannot, meet the required specifications, because of low-cost circuitry and the inherent electrical noise inside a computer. A discrete (stand-alone) A/D converter that meets professional specifications is always recommended. Recent generations of computers have sufficient power to manipulate large audio files. Once the audio is in the digital domain, the integrity of the files should be maintained. As noted above, the critical point in the preservation process is the conversion of analogue audio to digital. This conversion relies on the A/D converter and on entering the data into the system, either through the sound card or through another data port.

    Requirements (dynamic range) for digital audio processing: mixing consoles >288 dB; storage >144 dB
    Available distribution media: CD-A 96 dB; SACD (with noise shaping) 120 dB; DVD-A 144 dB
    Peak levels in music performances: classical music 90-118 dB SPL; rock music 115-129 dB SPL; jazz music 114-127 dB SPL; others 116-127 dB SPL
    Headroom: 6-9 dB; footroom: 6-9 dB
    Techniques for increasing dynamic range
    Just audible noise level: mean threshold ~4 dB SPL for 20 kHz low-pass filtered white noise (typical detection levels span -2 to 9 dB SPL)
    Wide-band noise levels in listening rooms: 20-35 dBA SPL
    Available transducers and equipment (dynamic range): microphones 110-115 dB, or 120-125 dB with A/D incorporated; A/D converters 115-130 dB
    Reproduction system limitations (dynamic range): D/A converters >110 dB; power amplifiers 110-120 dB; loudspeakers (1 m peak output) consumer 112-120 dB SPL, professional 128-131 dB SPL

    Figure 2 The dynamic range values in a complete audio signal chain

    Unfortunately, many historic recordings were made with very limited audio bandwidth and a high noise floor. Even so, any digitization must use the best possible signal chain to capture and preserve as much information as possible. This is the more prudent approach, because in any archival conversion project the cost of digitization equipment is trivial compared with the cost of labour. An archival conversion signal chain must therefore provide very high fidelity. However, only an ideal converter has no sound of its own. Most converters are certainly not transparent, and only the best can approach transparency. The factors that influence A/D converter fidelity must be evaluated using electrical measurements complemented by subjective evaluations and listening tests. These factors are the sampling frequency, quantization word length, dither, converter chip architecture, converter component design, input audio preamplifier and signal levels.


    3.1 Recommended Values

    3.1.1 Sampling frequency: 96 kHz, 192 kHz

    IASA-TC 04 [12], the CLIR/LC document [13] and Ken Pohlmann's article on converters [15] recommend sampling rates higher than 44.1 kHz, for several reasons:

    Many musical instruments can produce information in higher frequency ranges, including inaudible high-frequency harmonic content that still affects our perception of sounds. A cymbal might have a response of 90 dB SPL (sound pressure level) beyond 60 kHz, and a violin might have content beyond 100 kHz.

    The binaural time response leads to improved imaging in multichannel recordings. A 15 μs difference between pulses can be heard, and this is shorter than the time between two samples at 48 kHz (20.8 μs). The corresponding sample period is 22.7 μs at 44.1 kHz and 5.2 μs at 192 kHz.

    The temporal response: musical instruments can generate transients with rise times of less than 10 μs, and some reverberation may comprise arrivals spaced regularly at intervals of less than 2 μs.

    The anti-aliasing filter and signal processing performance: a lower-order slope can be employed, which improves the time-domain response.

    Noise such as clicks and pops on a disc, and other inaudible high-frequency information, must be captured accurately. Future signal processing algorithms that can exploit higher-frequency information will then have enough data to work as effectively as possible. Some of this noise lies in frequency ranges higher than can be captured at 44.1 kHz.

    In accordance with these arguments, we present the step responses (Figure 3) and impulse responses (Figure 4) of a digital finite impulse response (FIR) filter of the kind used in most ADCs. The filter was analysed with the Filter Design and Analysis Tool user interface in the MATLAB workspace.
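    The timing figures quoted above follow from the sample period, 1/F_s. The short Python sketch below reproduces that arithmetic and compares each period with the 15 μs time difference cited in the text. It only restates the arithmetic of the argument and says nothing further about the time resolution of a complete band-limited system.

```python
ITD_THRESHOLD_US = 15.0  # audible time difference between pulses quoted in the text


def sample_period_us(fs_hz: float) -> float:
    """Time between two consecutive samples, in microseconds."""
    return 1e6 / fs_hz


for fs in (44_100, 48_000, 96_000, 192_000):
    t = sample_period_us(fs)
    relation = "shorter" if t < ITD_THRESHOLD_US else "longer"
    print(f"{fs:>7} Hz: {t:5.1f} us, {relation} than {ITD_THRESHOLD_US:g} us")
```

    The loop prints 22.7, 20.8, 10.4 and 5.2 μs. Only the 96 kHz and 192 kHz periods fall below the 15 μs figure.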

    [Figure 3 plots: "Step Response, Fs=48000Hz, Fpass=20000Hz" and "Step Response, Fs=96000Hz, Fpass=20000Hz"; axes: Time (ms), Amplitude]

    Figure 3 Step responses of an equiripple FIR filter for two different sampling frequencies: 48 kHz and 96 kHz


    [Figure 4 panels: filter specification sketch (Mag. (dB) versus f (Hz), marking Apass, Astop, Fpass, Fstop and Fs/2); "Magnitude Response (dB) - Lowpass Equiripple FIR, Fs = 192000 Hz, Fpass = 20000 Hz" for Fstop = 24000, 48000 and 96000 Hz (axes: Frequency (kHz), Magnitude (dB)); "Impulse Response - Lowpass Equiripple FIR, Fs = 192000 Hz, Fpass = 20000 Hz" for the same three Fstop values (axes: Time (ms), Amplitude)]

    Figure 4 Impulse responses of an equiripple FIR filter for different attenuation slopes
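    The effect in Figure 4 is that a wider transition band gives a shorter impulse response. This can be estimated without MATLAB by using Kaiser's empirical formula for the length of an equiripple FIR filter. The Python sketch below is an approximation, not the FDATool design used for the figures. The passband and stopband ripples (δp = 0.001 and δs = 0.00001, i.e. about 100 dB of stopband attenuation) are assumed values, because the text does not state them.

```python
import math


def kaiser_fir_length(fs, f_pass, f_stop, delta_pass=1e-3, delta_stop=1e-5):
    """Kaiser's estimate of the number of taps of an equiripple low-pass FIR filter."""
    transition = (f_stop - f_pass) / fs                        # normalised transition width
    atten = -20 * math.log10(math.sqrt(delta_pass * delta_stop))
    order = (atten - 13) / (14.6 * transition)
    return math.ceil(order) + 1                                # taps = order + 1


if __name__ == "__main__":
    FS = 192_000
    for f_stop in (24_000, 48_000, 96_000):                    # Fstop values of Figure 4
        taps = kaiser_fir_length(FS, 20_000, f_stop)
        span_ms = 1000 * taps / FS                             # duration of the impulse response
        print(f"Fstop={f_stop} Hz: ~{taps} taps, ~{span_ms:.2f} ms")
```

    Under these assumptions, the estimate drops from about 222 taps (about 1.16 ms) for Fstop = 24 kHz to 13 taps (about 0.07 ms) for Fstop = 96 kHz at Fs = 192 kHz. This is consistent with the argument that a gentler anti-aliasing slope improves the time-domain response.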

    3.1.2 Quantization Word Length: 24 bits

    The word length of the converter is the length of the output digital word, and hence the number of bits used to represent the amplitude of the audio samples.


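    An ideal ADC (with no internal noise of its own) has its quantization noise spread over the band from DC to the folding frequency F_s/2. Relative to digital full scale, the noise that falls within the band from DC to F_B can be determined using the following equation:

    $$N_{\mathrm{ideal}}(\mathrm{DC} \ldots F_B) = -\left[6.02\,n + 1.76 + 10\log_{10}\frac{F_s/2}{F_B}\right]\ \mathrm{dBFS}$$

    [Figure: converter model; labels H(f), S_a(t), S_q(n)]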

    This equation is based on several assumptions:
    a) a linear model of quantization, with a sampling rate that satisfies the sampling theorem, i.e. the signal is sampled at more than twice the highest frequency in the input signal (F_s > 2F_B);
    b) the noise of analogue-to-digital conversion is mainly due to the error sequence of the quantization process (with quantization step size q and a resolution of n bits);
    c) the error sequence is a stationary random process, uncorrelated with itself and with the input signal;
    d) the quantization error ε is uniformly distributed over a quantization step:

    $$p(\varepsilon) = \begin{cases} 1/q, & |\varepsilon| \le q/2 \\ 0, & |\varepsilon| > q/2 \end{cases}$$

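    The quantization noise power, i.e. the mean square of the quantization error ε, is given by [16]:

    $$E\{\varepsilon^2\} = \frac{q^2}{12} = \frac{1}{3 \cdot 2^{2n}}$$

    where the second form assumes a full-scale range of ±1, so that q = 2/2^n.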

    In the digital domain, signal levels are expressed relative to digital full scale, as defined in AES17 [17]. Full scale is the level of a sine wave whose peak value equals the maximum positive digital value:

    $$S_a(t) = U_v \cos(2\pi t / T)$$

    Setting U_v = 1, the average power of the reference signal is:

    $$E\{S_a^2(t)\} = \frac{U_v^2}{2} = \frac{1}{2}$$

    Accordingly, the signal-to-noise ratio within the audio bandwidth F_B, with ε denoting the quantization error, is:

    $$\mathrm{SNR} = 10\log_{10}\frac{E\{S_a^2\}}{E\{\varepsilon^2\}} + 10\log_{10}\frac{F_s/2}{F_B} = 6.02\,n + 1.76 + 10\log_{10}\frac{F_s/2}{F_B}\ \mathrm{dB}$$

    This result holds when the quantizer, a nonlinear device, behaves in a statistical sense like IUDN (independent uniformly distributed noise). Although quantization acts nonlinearly on signals, it acts linearly on their probability densities. The quantizer can then be treated as a linear device, with the quantization noise modelled as a source of additive noise. The statistical properties of this noise are known and fixed: mean = 0, variance = q²/12, uncorrelated with the quantizer input.
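    The IUDN result can be checked numerically. The Python sketch below quantizes a full-scale cosine (U_v = 1, step q = 2/2^n) with an ideal rounding quantizer. It then compares the measured full-band SNR with 6.02n + 1.76 dB. The test frequency, the sample count and the absence of clipping are assumptions of the sketch, chosen so that the error behaves like independent uniform noise.

```python
import math


def quantize(x: float, n_bits: int) -> float:
    """Ideal mid-tread quantizer for a full-scale range of +/-1 (step q = 2 / 2**n).

    No clipping is applied, so x = 1.0 maps one code above the largest positive
    code a real converter has; this does not affect the noise statistics here.
    """
    q = 2.0 / 2 ** n_bits
    return q * round(x / q)


def simulated_snr_db(n_bits: int, n_samples: int = 100_000) -> float:
    """Measured SNR of a full-scale cosine over the whole band DC to Fs/2."""
    signal_power = noise_power = 0.0
    for k in range(n_samples):
        # normalised frequency chosen arbitrarily so the error does not repeat quickly
        x = math.cos(2 * math.pi * 0.1234567 * k)
        e = quantize(x, n_bits) - x
        signal_power += x * x
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)


def theoretical_snr_db(n_bits: int, fs: float, fb: float) -> float:
    """6.02 n + 1.76 + 10 log10((Fs/2) / F_B), as derived above."""
    return 6.02 * n_bits + 1.76 + 10 * math.log10((fs / 2) / fb)


if __name__ == "__main__":
    print(f"simulated  : {simulated_snr_db(16):.2f} dB")
    print(f"theoretical: {theoretical_snr_db(16, 48_000, 24_000):.2f} dB")  # full band: F_B = Fs/2
```

    For n = 16 the simulated value should be close to the 98.08 dB predicted by the formula when F_B = F_s/2.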


    If the quantizer input S_a has a PDF (probability density function) that does not satisfy any of the quantizing theorems, the quantization noise will not have IUDN properties. These properties can be obtained by adding a suitably designed independent dither signal d to the quantizer input. This usually means that each dither sample is produced by a pseudorandom number generator. A D/A converter then converts the number to an analogue level, which is added to the quantizer input before quantization. The total output noise ε + d (quantization error plus dither) should be independent of the quantizer input S_a, or at least uncorrelated with it, to satisfy the ideal objective of a linear quantizer. The quantizer is then linearized by the dither, and the IUDN model prevails. The price paid for this technique is the increased noise power due to the dither signal. For example, with Gaussian dither whose standard deviation is q/2, the noise of the ideal converter increases to:

    $$\frac{q^2}{12} + \frac{q^2}{4} = \frac{q^2}{3}\quad(+6\ \mathrm{dB})$$

    Alternatively, with triangular dither whose amplitude range is ±q, the total output noise power becomes:

    $$\frac{q^2}{12} + \frac{q^2}{6} = \frac{q^2}{4}\quad(+4.77\ \mathrm{dB})$$

    From the statistical point of view of second-order moments, triangular probability density function (TPDF) dither ensures the desired IUDN behaviour much better. With this dither, the ideal signal-to-noise ratio of the converter becomes:

    $$\mathrm{SNR} = 10\log_{10}\frac{E\{S_a^2\}}{q^2/4} + 10\log_{10}\frac{F_s/2}{F_B} = 6.02\,n - 3.01 + 10\log_{10}\frac{F_s/2}{F_B}\ \mathrm{dB}$$
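    The noise increases quoted for Gaussian and triangular dither can also be reproduced by simulation. The Python sketch below makes several assumptions of its own: non-subtractive dither, a random test input well inside full scale, and a fixed random seed. It measures the total error ε + d and expresses it in dB relative to q²/12. The results should come out near 0 dB without dither, about 4.77 dB with TPDF dither of ±q, and about 6 dB with Gaussian dither of standard deviation q/2.

```python
import math
import random


def no_dither(rng, q):
    return 0.0


def tpdf_dither(rng, q):
    """Triangular PDF spanning +/-q: sum of two independent uniform(-q/2, q/2) values."""
    return rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)


def gaussian_dither(rng, q):
    """Gaussian dither with standard deviation q/2."""
    return rng.gauss(0.0, q / 2)


def noise_increase_db(dither, n_bits=16, n_samples=100_000, seed=1):
    """Total output noise (quantization error + dither) in dB relative to q**2/12."""
    q = 2.0 / 2 ** n_bits
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-0.5, 0.5)                  # arbitrary input well inside full scale
        y = q * round((x + dither(rng, q)) / q)     # dither is added before quantization
        acc += (y - x) ** 2                         # total error = quantization error + dither
    return 10 * math.log10((acc / n_samples) / (q * q / 12))


if __name__ == "__main__":
    for name, d in (("none", no_dither), ("TPDF", tpdf_dither), ("Gaussian", gaussian_dither)):
        print(f"{name:8s}: +{noise_increase_db(d):.2f} dB")
```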

    Number of bits | Fs = 44,100 Hz | Fs = 48,000 Hz | Fs = 96,000 Hz | Fs = 192,000 Hz
    16             | 93.73 dBFS     | 94.10 dBFS     | 97.11 dBFS     | 100.12 dBFS
    24             | 141.89 dBFS    | 142.26 dBFS    | 145.27 dBFS    | 148.28 dBFS

    SNR of an ideal ADC (with unshaped TPDF dither of 2 LSB peak-to-peak amplitude), in unweighted bandwidth (20,000 Hz) measurement conditions

    However, this is only the converter's theoretical figure. A more effective measure of converter quality, one that accounts for the converter's errors, is the ENOB (effective number of bits):

    $$\mathrm{ENOB} = \frac{\text{dynamic range} - 1.76}{6.02}$$

    For example, a 24-bit converter with a measured dynamic range of 125 dB provides only 20.5 bits of resolution.

    Nevertheless, a well-designed 24-bit converter will provide a noise floor that lies at the limits of audibility, offering the potential for the highest fidelity required of a complete audio signal chain. Some statistics about human hearing could make the debate about the converter resolution required for transparency easier. Listeners weigh the determining factors (sound pressure level, frequency content and duration) differently. Loudness, for example, unlike electrical level, is subjective.
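    The table values and the ENOB example can be recomputed directly from the formulas above. The Python sketch below uses the rounded constants 6.02 and 3.01, with F_B = 20,000 Hz as in the table caption.

```python
import math

F_B = 20_000  # unweighted measurement bandwidth of the table, Hz


def ideal_snr_tpdf_db(n_bits: int, fs: float, fb: float = F_B) -> float:
    """SNR = 6.02 n - 3.01 + 10 log10((Fs/2) / F_B) for an ideal ADC with TPDF dither."""
    return 6.02 * n_bits - 3.01 + 10 * math.log10((fs / 2) / fb)


def enob(dynamic_range_db: float) -> float:
    """Effective number of bits from a measured dynamic range."""
    return (dynamic_range_db - 1.76) / 6.02


if __name__ == "__main__":
    for n in (16, 24):
        row = [f"{ideal_snr_tpdf_db(n, fs):.2f}" for fs in (44_100, 48_000, 96_000, 192_000)]
        print(n, row)
    # 16 ['93.73', '94.10', '97.11', '100.12']
    # 24 ['141.89', '142.26', '145.27', '148.28']
    print(f"{enob(125):.1f}")  # 20.5
```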


    Our sense of hearing assesses loudness by how the cilia and corresponding auditory nerve fibres in the basilar membrane of the inner ear are excited. This excitation is distributed on the membrane by frequency bands, forming a kind of biological spectrum analyzer. Each frequency excites a certain zone on the basilar membrane, and the contributions of all excited zones add up to the total loudness.

    The Fletcher/Munson curves were constructed from subjective responses to sinusoidal tones presented frontally. The phon values were defined by 1 kHz sinusoidal tones measured in dB, and these levels give the phon curves their names. For example, the 40-phon curve has an intensity of 40 dB with a 1 kHz tone.

    Several corrections to the Fletcher/Munson curves were made and included in ISO 226, a standard for the hearing threshold of sine waves under free-field conditions. ISO 454 adapts these values to diffuse-field conditions.

    [Figure 5: equal-loudness contours with the threshold of pain and the approximate ranges of music and speech, compared with the dynamic range of a 16-bit A/D, a 24-bit A/D and a 32-bit DSP (8 extra bits as a guard band against computation errors).]

    Figure 5 - Equal-loudness contours as described by ISO 226 versus the dynamic range of high-quality audio A/D converters and DSPs


    The analysis18 of the sound levels of acoustic noise (taking into account the ability of listeners to detect noise, 3.8 dB SPL being the just audible level of white noise) and of the sound level of music (taking into account the 120-129 dB SPL peak levels of some music performances) gives the necessary dynamic range: 122-124 dB (Figure 5). Accordingly, if a digital system produces processing artefacts above the noise floor of the input signal, these artefacts will be audible under certain circumstances. The archival conversion of old recording signals, with low intensity or limited frequency content (Figure 6 a) and b)), should be followed by digital processing designed to keep processing noise below the noise floor of the input, where it could otherwise become audible.

| Year | Old recording medium | Dynamic range (dB) | Frequency bandwidth (Hz) |
|---|---|---|---|
| 1897 | Shellac discs | 28 | 168-2 000 |
| 1931 | Vinyl long-play records | 60 | 30-10 000 |
| 1944 | Decca FFRR (Full Frequency Range Recordings) | 60 | 10-15 000 |

    Table 1 - Dynamic range and frequency bandwidth of gramophone discs

    [Figure 6 panels: a) power spectrum estimate, Example 1: old gramophone disc, specific background noise; b) power spectrum estimate, Example 2: old gramophone disc, specific background noise. Axes: frequency (0-20 kHz) versus magnitude (dB); estimates computed with Hamming, Kaiser and Chebyshev windows.]

    Figure 6 - Specific background noise of old gramophone discs (two examples)

    It is important to quantize with a word length that is somewhat longer than what may be immediately required. The larger dynamic range provided by the recommended 24-bit word length supplies greater headroom, which makes level setting less critical.


    Also, a well-designed 24-bit converter offers the potential for the requested highest fidelity of a complete audio signal chain, providing a noise floor that lies at the limits of audibility (Figure 7).

    [Figure 7 panels: a) and b) power spectrum estimate, modern musical recording fragment, 24 bit, Fs = 192 000 Hz; a) frequency axis 0-90 kHz, b) enlarged low-frequency part; magnitude (dB); Hamming, Kaiser and Chebyshev windows.]

    Figure 7 - Fragment of a recent piano recording, made with an extremely low self-noise microphone and a 24-bit (192 kHz sampling) digital recorder: a) wide-bandwidth power spectrum estimate, limited at the Nyquist frequency (half the sampling frequency); b) enlarged part of the same estimate, including only frequencies up to 25 kHz
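The spectral estimates of Figures 6 and 7 were computed with Hamming, Kaiser and Chebyshev windows, up to the Nyquist frequency. The sketch below shows one way to obtain such an estimate: a single windowed FFT frame, normalised by the window's coherent gain so a full-scale sine reads about 0 dB. The frame length, Kaiser beta and normalisation are assumptions, not the author's settings; a Chebyshev window is available separately as `scipy.signal.windows.chebwin`.

```python
import numpy as np

def magnitude_spectrum_db(x, fs, window="hamming", kaiser_beta=8.6):
    """Single-frame windowed spectrum estimate, in dB relative to a full-scale
    sine, for frequencies from 0 Hz up to the Nyquist frequency fs/2."""
    n = len(x)
    if window == "hamming":
        w = np.hamming(n)
    elif window == "kaiser":
        w = np.kaiser(n, kaiser_beta)
    else:
        raise ValueError("window must be 'hamming' or 'kaiser'")
    spectrum = np.fft.rfft(np.asarray(x) * w)
    # Dividing by the window's coherent gain makes a full-scale sine read ~0 dB
    amplitude = 2.0 * np.abs(spectrum) / np.sum(w)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)          # 0 ... fs/2
    return freqs, 20.0 * np.log10(np.maximum(amplitude, 1e-15))

# Usage: full-scale 1 kHz sine sampled at 192 kHz (10 Hz bin spacing)
fs = 192_000
t = np.arange(19_200) / fs
x = np.sin(2 * np.pi * 1_000 * t)
for name in ("hamming", "kaiser"):
    freqs, level_db = magnitude_spectrum_db(x, fs, name)
    print(name, freqs[np.argmax(level_db)], round(level_db.max(), 3), freqs[-1])
```

Averaging several such frames (Welch's method) would lower the variance of the noise-floor estimate, which matters when comparing background noise a few dB apart.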

    In order for the DSP to maintain the SNR established by the A/D converter, all intermediate DSP calculations require higher-precision processing. Digital processing, as illustrated in Figure 8, effectively decreases the useful word length, because the cascaded mathematical operations, truncation and rounding add error to the least significant bit (LSB).

    [Figure 8 block diagram: the analogue input Sa(t) enters the A/D converter (quantization error eq plus dither d, output xn(t) + eq + d), followed by an FIR filter built from z^-1 delay elements, coefficient multipliers b0, b1, b2, b3, ..., bm and adders; each stage is annotated with arithmetic precision, rounding/truncation errors ep, memory precision and saturation effects; the filtered output ~yn(t) feeds the D/A converter, which reconstructs the output yn(t).]

    Figure 8 - Error sources in a digitisation operation including FIR filter processing
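The loss of useful word length in Figure 8 can be illustrated by rounding every coefficient product of a direct-form FIR filter to a fixed word length and comparing the result with a double-precision reference. This is a simplified sketch under stated assumptions: the random test signal, the 31 Hann-shaped taps and the word lengths are hypothetical; only product rounding is modelled (sums of on-grid values stay exact); coefficients are kept in double precision; and saturation is ignored.

```python
import numpy as np

def quantize(x, bits):
    """Round to the nearest step of a signed fixed-point word with `bits` bits
    (full scale -1..1, step 2**(1 - bits))."""
    step = 2.0 ** (1 - bits)
    return np.round(np.asarray(x) / step) * step

def fir_rounded_products(x, coeffs, word_bits):
    """Direct-form FIR where each coefficient product is rounded to
    `word_bits` before accumulation (sums of on-grid values are exact)."""
    taps = len(coeffs)
    delayed = np.zeros((taps, len(x)))
    for k in range(taps):
        delayed[k, k:] = x[: len(x) - k]         # x[n - k], zero before start
    products = quantize(coeffs[:, None] * delayed, word_bits)
    return products.sum(axis=0)

rng = np.random.default_rng(1)
x = quantize(0.5 * rng.uniform(-1.0, 1.0, 2000), 24)   # hypothetical 24-bit A/D output
coeffs = np.hanning(31)
coeffs /= coeffs.sum()                                   # arbitrary low-pass taps, unity DC gain
reference = np.convolve(x, coeffs)[: len(x)]             # double-precision result

lsb_noise = 2.0 ** (1 - 24) / np.sqrt(12.0)              # rms of a single 24-bit rounding
rms_error = {}
for word_bits in (24, 32):
    err = fir_rounded_products(x, coeffs, word_bits) - reference
    rms_error[word_bits] = np.sqrt(np.mean(err ** 2))
    ratio = rms_error[word_bits] / lsb_noise
    print(f"{word_bits}-bit products: error = {ratio:.2f} x one 24-bit rounding")
```

With 24-bit products the error of one filter pass already exceeds a single LSB rounding by a few times; keeping 8 extra guard bits (32-bit words) pushes it far below the 24-bit noise floor, in line with the 32-bit DSP annotation of Figure 5.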

    4 THE TRADE-OFFS OF HIGH SAMPLING FREQUENCIES IN TYPICAL REAL DIGITAL SYSTEMS

    4.1 Preliminary considerations

    The non-linear phase distortion caused by the anti-aliasing filter may create harmonic distortion and audible degradation. Since the analogue anti-aliasing filter is the limiting factor in controlling the bandwidth and phase distortion of the input signal, a high-performance anti-aliasing filter is required to obtain high resolution and minimum distortion. While a Nyquist-rate A/D converter performs the quantization in a single sampling interval to the full precision of the converter, an oversampling converter generally uses a sequence of coarsely quantized data at the input oversampling rate $F_{ms} = 2^{m+1} F_B$ ($m$ being the frequency doubling factor), followed by a digital-domain decimation process that computes a more precise estimate of the analogue input at the lower output sampling rate, $F_S$, which is the same as that used by Nyquist samplers. Regardless of the quantization process, oversampling has immediate benefits for the anti-aliasing filter. Oversampling and special filtering, designed to shape the noise away from the passband, are the key elements of sigma-delta modulation.

    The general formula for the SNR19 of an ideal N-th order sigma-delta modulation converter with an n-bit quantizer is:

$$SNR = 10\log_{10}\left(\frac{3}{2}\,2^{2n}\right) + 10\log_{10}\left(\frac{(2N+1)\,2^{(2N+1)m}}{\pi^{2N}}\right)$$

$$SNR = 6.02\,n + 1.76 + 10\log_{10}(2N+1) - 9.94\,N + 3.01\,(2N+1)\,m\ \text{dBFS}$$

    In practice, of course, no actual realization can achieve this theoretical performance (167.15 dBFS, for example, is the estimated SNR of a 5th-order sigma-delta modulation, 64x oversampled, 1-bit A/D converter).

    Example 1: Specific components

    The CS5381 is a complete analog-to-digital integrated circuit converter for digital audio systems, designed by Cirrus Logic. The CS5381 uses a 5th-order, multi-bit delta-sigma modulator followed by digital filtering and decimation, which removes the need for an external anti-alias filter. Designed for audio systems requiring wide dynamic range, negligible distortion and low noise, such as A/V receivers, DVD-R, CD-R, digital mixing consoles, and effects processors, the CS5381 has the following main features20:

    - 24-bit conversion;
    - 120 dB dynamic range;
    - -110 dB THD+N;
    - supports all audio sample rates, including 192 kHz.
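The theoretical sigma-delta figure quoted earlier in this section can be reproduced from the general formula, which helps put the 120 dB dynamic range of a real component such as the CS5381 in perspective. The sketch evaluates both the rounded-constant form and the exact logarithmic form; the example parameters (1-bit quantizer, 5th order, m = 6 for 64x oversampling) are those stated in the text.

```python
import math

def sigma_delta_snr_dbfs(n_bits, order, m):
    """Ideal SNR of an order-N sigma-delta modulator with an n-bit quantizer,
    oversampled by 2**m (rounded-constant form)."""
    return (6.02 * n_bits + 1.76 + 10.0 * math.log10(2 * order + 1)
            - 9.94 * order + 3.01 * (2 * order + 1) * m)

def sigma_delta_snr_exact_dbfs(n_bits, order, m):
    """Same quantity from the unrounded expression."""
    signal_term = 1.5 * 4.0 ** n_bits
    shaping_term = (2 * order + 1) * 2.0 ** ((2 * order + 1) * m) / math.pi ** (2 * order)
    return 10.0 * math.log10(signal_term) + 10.0 * math.log10(shaping_term)

print(f"{sigma_delta_snr_dbfs(1, 5, 6):.2f} dBFS (rounded constants)")
print(f"{sigma_delta_snr_exact_dbfs(1, 5, 6):.2f} dBFS (exact)")
```

Each extra doubling of the oversampling rate adds 3.01(2N+1) dB, about 33 dB for a 5th-order loop, which is why the theoretical figures far exceed what analogue noise and circuit non-idealities allow in practice.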

    Example 2: Stand-alone equipment

    The Apogee AD-8000 and the RME ADI-8 DS eight-channel converters, highly regarded in this high-end sector, have the following specifications21:


| Parameter | Apogee AD-8000 (channels 1-8) | RME ADI-8 DS (channels 1-8) |
|---|---|---|
| SNR (dB), rms unweighted | 107-109 | 113.5 |
| SNR (dB), rms A-weighted | 108-113 | 117 |
| Frequency response | Fs1 = 44.1 kHz: 20.72 kHz (0.4 dB) or 21.44 kHz (3 dB); Fs2 = 88.2 kHz: 41.01 kHz (0.4 dB) or 42.89 kHz (3 dB); Fs3 = 96.0 kHz: 44.67 kHz (0.4 dB) or 46.52 kHz (3 dB) | Fs = 44.1 kHz: 10 Hz (0.1 dB) to 20.81 kHz (0.4 dB); 10 Hz (0.1 dB) to 21.44 kHz (3 dB) |
| THD+N (dB) | -105 | -107 |

    Joshua D. Reiss, in his recent, already cited article "Understanding sigma-delta modulation: the solved and unsolved issues", described several limitations of practical sigma-delta modulation: limit cycles, idle tones, harmonic distortion, dead zones, noise modulation, and stability.

| Definitions included in the cited article | Conclusions about these issues |
|---|---|
| Limit cycles: the occurrence of a repeating sequence in the output bitstream; for audio applications, these are possible audible artefacts. | It may be considered a mostly solved problem. |
| Idle tones: a discrete peak in the frequency spectrum of the output of a sigma-delta modulator, superimposed on a background of noise. | There is no theoretical basis for the well-defined and simple relationships observed between the input signal and the frequencies of the tones. |
| Harmonic distortion: unwanted peaks that are due to harmonics or aliasing of the input signal, and those that bear no apparent relationship to the input frequency. | It is not a well-understood phenomenon, but it is clearly related to idle tones. |
| Dead zones: a range of input for which the sigma-delta modulator may produce the same average output value. | There are no reported problems in high-order or commercial designs. |
| Noise modulation: the quantization noise power depends on the signal, and it can be perceived after the quantization of audio signals. | There is no well-established theory even for low-order sigma-delta modulators. |
| Stability: with given initial conditions and constant input, the stable behaviour of higher-order sigma-delta modulators is questionable. | A better understanding of stability is necessary so that robust, high-performance implementations can be developed. |

    Although dithering could be an effective solution for all the above issues, it is not indicated for the stability of low-bit quantizers, as it decreases the stable range of a sigma-delta modulator.
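Limit cycles, the first issue in the list above, are easy to reproduce with the simplest possible loop. The sketch below is a hypothetical first-order, 1-bit sigma-delta modulator (one integrator, sign quantizer, no dither), not a model of any commercial converter. With a constant input it settles into a short repeating bit pattern whose average equals the input, so its output spectrum contains discrete tones rather than noise.

```python
def sigma_delta_1st_order(samples):
    """First-order, 1-bit sigma-delta modulator (no dither).

    The integrator accumulates the difference between the input and the
    previous output bit; the quantizer outputs +1 or -1 from its sign."""
    integrator = 0.0
    previous = 0.0
    bits = []
    for s in samples:
        integrator += s - previous
        previous = 1.0 if integrator >= 0.0 else -1.0
        bits.append(previous)
    return bits

idle = sigma_delta_1st_order([0.0] * 8)       # zero input: alternating +1/-1
half = sigma_delta_1st_order([0.5] * 400)     # DC input 0.5: period-4 pattern
print(idle)
print(half[:8], sum(half) / len(half))
```

Adding dither before the quantizer breaks up such patterns, at the cost of the reduced stable input range noted above for low-bit quantizers.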

    Further, the evaluation of the limiting factors in typical real digital systems should be done in conjunction with the objective of audio preservation, from a present practical perspective22:
    - reproduction bandwidths (greater than 20 kHz) offered to the consumer as a higher-fidelity specification;
    - recording, transmission/storage resources, amplification and sound radiation aspects;
    - old and new sound carriers (SACD, DVD-A, HD DVD, Blu-ray HD) in relation to the generally limited bandwidth of available sound reproducers.

| Promoted technical specifications | Established consensus, perception tests |
|---|---|
| Expanded high-frequency limit of the audio chain, up to two and a half octaves above 20 kHz | Sound is perceived, via bone conduction, up to 100 kHz as a single noise-like pitch; high-intensity sound above 20 kHz may be perceived as pain; propagation in air is less directive and increasingly lossy at higher frequency. |
| SACD and DVD-A carriers are capable of 100 kHz replay bandwidth; HD DVD and Blu-ray have higher storage capacity, with potential for multiple wide-band audio channels. | The significant ultrasonic noise that accompanies noise-shaping D/A converters requires a low-pass filter restricting the bandwidth (to less than 50 kHz) before the signal reaches the end audio amplifier. |
| Higher-frequency or extra-bandwidth reproducers | Many elements of the replay channel (decoders, amplifiers) have low-pass filters at 20-25 kHz, some of them (switching-technology power amplifiers) to combat their tendency for electromagnetic radiation. Loudspeaker designers have to overcome conflicting requirements at higher frequency: the necessary sensitivity, in opposition with the reduced diaphragm area imposed by the directivity and continuous-response (without high-Q resonances) characteristics. The room behaviour (more absorption) and the ear sensitivity (more directional) restrict the benefits of a higher frequency range to those sounds with a direct path to the entrance of the ear canal. |
| Super-tweeters | The extended response performance should be achieved and validated without any intermodulation in the common audible range. The commercially promoted add-on tweeters and matching crossovers seem to have this convenient effect, more or less subtly. |

    The extensive brain scanner investigation23 with ultrasound stimuli noticed quite complex physiological effects. Further work suggested that the previously reported phenomenon was related to the whole body being exposed to the ultrasonic sound field, not just the ears. However, other investigations24, firmly separating the audible energy band from the inaudible one by using very steep band filtering in the experiment, reiterated that 20 kHz is entirely sufficient for sound reproduction.

    4.2 The benefits of high sampling rate in anti-alias and anti-image filtering design

    Anti-aliasing and anti-image filtering are performed, in almost all audio A/D and D/A converter subsystems, by a gentle, non-critical, low-order analogue low-pass filter in conjunction with an oversampled converter and a high-order digital brickwall filter. The digital filter, using a finite impulse response (FIR) of one or more stages, provides the necessary sharp cut-off. The FIR filter can be designed with exact linear phase, and the filter structure is always stable with respect to the quantized filter coefficients. The minimum length of an FIR low-pass filter is related to three parameters: the transition region width, the maximum pass-band error (ripple) and the minimum stop-band rejection. According to the estimations of several authors25, the minimum value of the filter order N follows directly from the digital filter specifications: normalized passband edge angular frequency $\omega_p$, normalized stopband edge angular frequency $\omega_s$, peak passband ripple $\delta_p$, and peak stopband ripple $\delta_s$. Kaiser, for example, developed a rather simple approximate formula:

$$N \approx \frac{-20\log_{10}\sqrt{\delta_p\,\delta_s} - 13}{14.6\,(\omega_s - \omega_p)/2\pi}$$
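Kaiser's estimate makes the benefit of a higher sampling rate concrete: for the same ripple specification, a wider normalized transition band gives a much shorter filter. The sketch below is a direct transcription of the formula, with $(\omega_s - \omega_p)/2\pi$ written as $(f_{stop} - f_{pass})/F_S$; the ripple and band-edge values in the usage lines are hypothetical illustrations, not the specifications behind Figures 9 and 10.

```python
import math

def kaiser_fir_order(delta_p, delta_s, f_pass_hz, f_stop_hz, fs_hz):
    """Kaiser's approximate minimum FIR order.

    delta_p, delta_s: peak passband and stopband ripple (linear, not dB).
    (omega_s - omega_p) / (2*pi) equals (f_stop - f_pass) / fs
    when omega = 2*pi*f/fs.
    """
    transition = (f_stop_hz - f_pass_hz) / fs_hz
    return (-20.0 * math.log10(math.sqrt(delta_p * delta_s)) - 13.0) / (14.6 * transition)

# Hypothetical specification: +/-0.001 dB passband ripple, 100 dB stopband rejection
delta_p = 10 ** (0.001 / 20) - 1
delta_s = 10 ** (-100 / 20)

# 20 kHz passband, full attenuation at 0.5 Fs: 24 kHz (Fs = 48 kHz) or 48 kHz (Fs = 96 kHz)
order_48k = kaiser_fir_order(delta_p, delta_s, 20_000, 24_000, 48_000)
order_96k = kaiser_fir_order(delta_p, delta_s, 20_000, 48_000, 96_000)
print(f"Fs = 48 kHz: N ~ {math.ceil(order_48k)}")
print(f"Fs = 96 kHz: N ~ {math.ceil(order_96k)}")
```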

    In this context, a sharp cutoff or a narrow transition band implies a very long FIR filter, whereas a wider transition band allows a shorter FIR filter. Parks and Burrus26 proposed the following alternative formula for the very wide band filter case:

$$N \approx \frac{-20\log_{10}\delta_p + 5.94}{27\,(\omega_s - \omega_p)/2\pi}$$

    In this circumstance, the estimated filter order depends more on the passband ripple. The passband response, especially, should approximate the ideal of being flat in a way that minimises the maximum distortion of the real filter response.


    In consequence, filter design algorithms rely on iterative optimization techniques in order to minimize the error between the desired frequency response and that of the DSP-generated filter. Equiripple linear-phase FIR filter design has become a mainstay of FIR filter design since the classic work by McClellan and Parks. The basic idea of the Parks-McClellan algorithm27 is to minimize the peak absolute value of the weighted error given by the difference between the frequency response of the digital transfer function (designed response, $H(e^{j\omega})$) and the desired frequency response (ideal response, $D(e^{j\omega})$), according to the following equation:

$$\varepsilon = \max_{\omega \in \mathcal{R}} \left| W(e^{j\omega}) \left[ H(e^{j\omega}) - D(e^{j\omega}) \right] \right|$$

    where $\mathcal{R}$ denotes the passband and stopband regions (the transition band is left unconstrained). The linear-phase property ensures that the frequency response of the filter can be written28:

$$H(e^{j\omega}) = H_p(\omega)\,H_{pp}(\omega), \qquad H_p(\omega) = \exp\big(j(a\omega + b)\big), \qquad a, b \in \mathbb{R}, \qquad H_{pp}: \mathbb{R} \to \mathbb{R}$$

    that is, as a phase factor (linear phase) in cascade with a real frequency response, which can be expressed as a sum of cosines. The sum-of-cosines term can in turn be expanded as a sum of cosine powers, i.e. a Chebyshev polynomial in $\cos(\omega)$. With this decomposition, algorithms such as the Remez exchange procedure can be used to design optimal min-max approximations to a desired response. In accordance with this design idea, the filter passband response (and similarly the stopband response) can be considered as the desired flat response with an additional error response.

    Figures 9 and 10 illustrate possible responses (designed with the REMEZ algorithm, in the Signal Processing Toolbox of the MATLAB workspace) of some high-end equipment, when equiripple linear-phase FIR anti-alias and anti-image filters are used.
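The factorization $H(e^{j\omega}) = H_p(\omega)H_{pp}(\omega)$ holds for any FIR filter with symmetric coefficients. The sketch below checks it numerically for a hypothetical 41-tap windowed-sinc low-pass filter, chosen only because it is easy to build without extra libraries; it is not the Parks-McClellan design used for the figures (for an actual equiripple design, SciPy provides `scipy.signal.remez`). After removing the phase factor $e^{-j\omega M/2}$ (i.e. $a = -M/2$, $b = 0$), the remaining response is real and equals an explicit sum of cosines.

```python
import numpy as np

M = 40                                   # filter order (M + 1 taps), hypothetical
n = np.arange(M + 1)
cutoff = 0.4                             # cutoff as a fraction of the Nyquist frequency
# Windowed-sinc low-pass: symmetric taps, h[n] == h[M - n]
h = cutoff * np.sinc(cutoff * (n - M / 2)) * np.hamming(M + 1)

omega = np.linspace(0.0, np.pi, 512)
H = np.exp(-1j * np.outer(omega, n)) @ h          # H(e^{j omega})

H_p = np.exp(-1j * omega * M / 2)                 # linear-phase factor: a = -M/2, b = 0
H_pp = H / H_p                                    # remaining response, real-valued

# The same real response written explicitly as a sum of cosines
half = M // 2
H_cos = h[half] + 2.0 * sum(h[half + k] * np.cos(k * omega) for k in range(1, half + 1))

print("max |Im H_pp|:", np.max(np.abs(H_pp.imag)))
print("max |H_pp - cosine sum|:", np.max(np.abs(H_pp.real - H_cos)))
```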

    [Figure 9 panels: a) lowpass equiripple FIR, 118 taps, frequency response, Fs = 48 kHz (marker: -3.02 dB at 21.41 kHz); b) the same filter, passband magnified (magnitude scale +/-1x10^-3 dB, 0-20 kHz); c) the same filter, stopband magnified (23-24 kHz, -100 to -180 dB); d) lowpass equiripple FIR, 147 taps, frequency response, Fs = 96 kHz (marker: -3.08 dB at 21.42 kHz); e) the same filter, passband magnified; f) the same filter, stopband magnified (25-45 kHz). Axes: frequency (kHz) versus magnitude (dB).]

    Figure 9 - FIR filter specifications for 48 kHz sampling rate (a, b, c), in conjunction with a critical, high-order analogue low-pass filter; the same specifications (d, e, f) for a 2x oversampling equivalent filter, in conjunction with a gentle, lower-order analogue low-pass filter

    [Figure 10 panels: a) lowpass equiripple FIR, 22 taps, frequency response, Fs = 96 kHz (marker: -3.07 dB at 29.80 kHz); b) the same filter, passband magnified (+/-1x10^-3 dB, 0-20 kHz); c) the same filter, stopband magnified (44.5-48 kHz); d) lowpass equiripple FIR, 41 taps, frequency response, Fs = 192 kHz (marker: -3.00 dB at 29.91 kHz); e) the same filter, passband magnified (+/-1x10^-3 dB, 0-20 kHz); f) the same filter, stopband magnified (50-90 kHz). Axes: frequency (kHz) versus magnitude (dB).]

    Figure 10 - Gentle digital low-pass filters with very small errors in the 20 kHz band, using high-frequency sampling: 96 kHz (a, b, c) or 192 kHz (d, e, f)

    The passband response of this kind of digital filter is clearly not ideally flat, having a specific additional error response in the form of a constant ripple. This error can be approximated by a cosinusoidal shape in the frequency domain, indicating pre- and post-echoes in the time domain.

    Figures 9 and 10 show echo amplitudes below -80 dB and timing variations between 0.1 ms (approximately, at 192 kHz sampling rate) and 1.2 ms (at 48 kHz sampling rate). These values are, however, far from those found to be quite perceptible by untrained listeners (-30 dB at +/-40 ms).
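The link between a cosinusoidal passband ripple and pre- and post-echoes is the classic paired-echo argument: a response $1 + \varepsilon\cos(\omega T)$ corresponds to the main impulse plus two echoes of amplitude $\varepsilon/2$ at $\pm T$. The sketch checks this with an inverse FFT and converts a ripple in dB to an echo level. The +/-0.001 dB ripple and the DFT size and echo delay are assumptions for illustration (the ripple value only matches the span of the magnified passband plots).

```python
import numpy as np

def echo_level_db(ripple_db):
    """Level (dB re the main impulse) of each echo produced by a cosinusoidal
    passband ripple of +/-ripple_db: half the linear ripple amplitude."""
    eps = 10.0 ** (ripple_db / 20.0) - 1.0
    return 20.0 * np.log10(eps / 2.0)

# Numerical check: H[k] = 1 + eps*cos(2*pi*k*T/N) -> impulses at 0, +T and -T
N, T, eps = 1024, 12, 1e-4       # DFT size, echo delay in samples, ripple (hypothetical)
k = np.arange(N)
H = 1.0 + eps * np.cos(2 * np.pi * k * T / N)
h = np.fft.ifft(H).real          # h[N - T] is the pre-echo at -T (circular index)

print(f"main {h[0]:.6f}, post-echo {h[T]:.2e}, pre-echo {h[N - T]:.2e}")
print(f"+/-0.001 dB ripple -> echoes at {echo_level_db(0.001):.1f} dB")
```

The echo delay is set by the ripple period in frequency: a ripple period of $\Delta f$ gives echoes at $\pm 1/\Delta f$, so narrower ripples produce echoes further from the main impulse.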

    Taking into account the growing requirement for the restoration of degraded sources, with improved resolution of impulsive signals and improved perception of musical transient attack passages, it is recommended to repeat perception experiments noting the difference between 48 kHz and 96 kHz or 192 kHz in localisation accuracy with available real, less-than-ideal filters. Real anti-alias and anti-image filters should reach their full attenuation inside a more or less wide transition region, in order to avoid alias- or image-specific distortions. In accordance with this principle, for systems operating at a low sampling frequency and requiring a small transition region (0.45Fs to 0.5Fs), it is very difficult to achieve the desired performance, even with the highest-performance integrated circuits and filter designs.

    4.2.1 The effect of aliasing during the digitization process

    The aliasing caused by the reflection of the audio signal spectrum about the folding frequency (0.5Fs) during sampling, in an analogue-to-digital conversion, produces frequency-shifted signals in the audio band. Poor rejection of the alias components in the transition region (above 19-20 kHz, for example) may have only small direct effects, inaudible for most listeners. However, any intermodulation mechanism likely to occur in the following processing and reproduction stages could produce audible distortion at lower frequencies. The alias signal will then intermodulate with harmonics of the original signal, generating a-harmonic signals as intermodulation distortion. The solution is to have full attenuation at half the sample frequency. On the other hand, it is necessary to have as wide a frequency response as possible for different sampling-rate applications. For example, most digital filters implemented as anti-alias filters in A/D conversion at 44.1-96 kHz sampling rates start at 45% and reach full attenuation at 55% of the sample frequency (Table 2).

| Parameter | Min | Typ | Max | Unit |
|---|---|---|---|---|
| Single Speed Mode (2 kHz to 50 kHz sample rates) | | | | |
| Passband (-0.1 dB) | 0 | - | 0.47 | Fs |
| Passband ripple | - | - | +/-0.035 | dB |
| Stopband | 0.58 | - | - | Fs |
| Stopband attenuation | -95 | - | - | dB |
| Total group delay (Fs = output sample rate), tgd | - | 12/Fs | - | s |
| Dual Speed Mode (50 kHz to 100 kHz sample rates) | | | | |
| Passband (-0.1 dB) | 0 | - | 0.45 | Fs |
| Passband ripple | - | - | +/-0.035 | dB |
| Stopband | 0.68 | - | - | Fs |
| Stopband attenuation | -92 | - | - | dB |
| Total group delay (Fs = output sample rate), tgd | - | 9/Fs | - | s |
| Quad Speed Mode (100 kHz to 200 kHz sample rates) | | | | |
| Passband (-0.1 dB) | 0 | - | 0.24 | Fs |
| Passband ripple | - | - | +/-0.035 | dB |
| Stopband | 0.78 | - | - | Fs |
| Stopband attenuation | -97 | - | - | dB |
| Total group delay (Fs = output sample rate), tgd | - | 5/Fs | - | s |

    Table 2 - Digital filter characteristics of CS5381 (120 dB, 192 kHz, multi-bit audio A/D converter), Cirrus Logic - Product information
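The folding described in 4.2.1 maps any component between 0.5Fs and Fs to Fs - f. A minimal sketch of that mapping, assuming ideal sampling (the function name is illustrative):

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz appears after sampling at
    fs_hz, folded into the baseband 0 ... fs/2."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f

# Components that survive a filter whose stopband starts above 0.5 Fs
print(alias_frequency(26_200, 48_000))   # folds to 21 800 Hz
print(alias_frequency(27_500, 48_000))   # end of a 22.5-27.5 kHz transition region
```

For the single-speed mode in Table 2, components between 0.5Fs and the stopband edge at 0.58Fs are not fully attenuated and fold back into the 0.42Fs to 0.5Fs range, that is, just above 20 kHz at Fs = 48 kHz.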

    The filter exemplified in Table 2, at 48 kHz sampling frequency, offers 22.5 kHz as the passband edge and 27.5 kHz as the end of the transition region to full stopband attenuation. In this case, the a-harmonic mirrored frequencies Fs - f (where f > 0.5Fs), reproduced by a loudspeaker, could intermodulate with the audible signal and create new audible frequency components, so-called Aliasing Intermodulation Distortion.

    James Boyk carried out, in 1992-1997, measurements of several instruments, mainly in the Music Lab at the California Institute of Technology, capturing their ultrasonic extension and energy29 (with a Hewlett Packard 3567 FFT analyzer and two quarter-inch microphones, a Bruel & Kjaer 4135 model and an Aco/Pacific 7016 model). He gave interesting information about the highest frequency at which harmonics are still present (Table 3, for instruments with harmonics) and about the highest frequency at which the sound level is at least 10 dB above background (Table 4, for instruments without harmonics).

| Instrument with harmonics | SPL (dB) | Harmonics still present | Percentage of power above 20 kHz |
|---|---|---|---|
| 1. Trumpet (Harmon mute) | 96 | >50 kHz | 0.5% |
| 2. Trumpet (Harmon mute) | 76 | >80 kHz | 2% |
| 3. Trumpet (straight mute) | 83 | >85 kHz | 0.7% |
| 4. French horn (bell up) | 113 | >90 kHz | 0.03% |
| 5. French horn (mute) | 99 | >65 kHz | 0.05% |
| 6. French horn | 105 | >55 kHz | 0.1% |
| 7. Violin (double-stop) | 87 | >50 kHz | 0.04% |
| 8. Violin (sul ponticello) | 77 | >35 kHz | 0.02% |
| 9. Oboe | 84 | >40 kHz | 0.01% |

    Table 3 - Frequency extension and ultrasonic energy of some instruments with harmonics

| Instrument without harmonics | SPL (dB) | Sound level at least 10 dB above background | Percentage of power above 20 kHz |
|---|---|---|---|
| 1. Speech sibilant | 72 | >40 kHz | 1.7% |
| 2. Claves | 104 | >102 kHz | 3.8% |
| 3. Rimshot (jazz music) | 73 | >90 kHz | 6% |
| 4. Crash cymbal | 108 | >102 kHz | 40% |
| 5. Triangle | 96 | >90 kHz | 1% |
| 6. Keys jangling | 71 | >60 kHz | 68% |
| 7. Piano | 111 | >70 kHz | 0.02% |

    Table 4 - Frequency extension and ultrasonic energy of some instruments without harmonics

    This evidence is not a confirmation of ultrasound perception abilities, but it provides knowledge of an ultrasound reality that might interfere, indirectly, with the recording-reproducing process. There are areas where the desired quality of the audio restoration process is strongly related to prior signal enhancement, due to the very poor high-frequency response of most early recordings. In this case, with the high-frequency information of the recorded signal buried deep in noise, it is important to predict these low-level components using an adequate model, including the frequency characteristics of instruments. Even if we ignore the frequency extension and ultrasonic energy of instruments, the non-linear behaviour of the stages following the digital-to-analogue conversion could cause intermodulation distortion artefacts. So, the poor rejection of alias components in the transition region and the nonlinearities in the signal path (the behaviour of loudspeakers being a good example), generating intermodulation between frequency components of the signal, increase the uncertainty during the evaluation process of audio restoration work.

    4.2.2 The effect of imaging during audio signal reproduction

    Even though the audibility and relevance of signals above 20 kHz remain a matter of debate, all images above the folding frequency (0.5Fs), especially for a lower sample rate Fs, could provoke distortion artefacts in the audio band. It is necessary to take into consideration, once again, the potential non-linear behaviour of the electronic and electromechanical stages following the digital-to-analogue conversion. Accordingly, the effects of high-amplitude, high-frequency input signal components (below half the sample frequency, 0.5Fs), whose image components lie above 0.5Fs (more or less attenuated by the image filter of the D/A converter), should be evaluated in correlation with the specific non-linearity of amplifiers, loudspeakers or other parts of the system.

    To maximise archiving quality, interrelated with the necessary conditions for further restoration and post-production activities, several investigations (objective analysis and subjective listening tests) have to be done:
    - of the response of various tweeters, in order to evaluate their significant amounts of intermodulation products below 20 kHz when driven by ultrasonic signals;
    - of amplifiers that can produce distortion products below 20 kHz, audible (even with difficulty), in the absence of other signals below 20 kHz.

    The quality of sound systems should be judged using harmonic and intermodulation distortion measurement numbers in the context of the perception of their effects. On their own, these numbers remain purely mathematical relationships, without any further consideration of the characteristics of the receiver, the human ear.
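The aliasing intermodulation mechanism described above can be simulated with a two-tone signal passed through a memoryless second-order nonlinearity. In this sketch the tone frequencies and the -21.84 dB residual alias level are taken from Figure 11 b); the nonlinearity $y = x + a_2 x^2$ with $a_2 = 0.01$ is a hypothetical reading of "1% second-order intermodulation", not the author's simulation model. The difference product $f_2 - f_1$ lands at 3.9 kHz, well inside the audio band.

```python
import numpy as np

fs = 96_000                          # simulation rate; N = fs gives 1 Hz FFT bins
t = np.arange(fs) / fs
f1 = 17_900                          # audible component (Hz)
f_alias = 21_800                     # 26.2 kHz folded about 24 kHz by 48 kHz sampling
alias_gain = 10 ** (-21.84183 / 20)  # residual alias level read from Figure 11 b)

x = np.cos(2 * np.pi * f1 * t) + alias_gain * np.cos(2 * np.pi * f_alias * t)

a2 = 0.01                            # hypothetical 1 % second-order coefficient
y = x + a2 * x ** 2                  # memoryless second-order nonlinearity

amplitude = 2 * np.abs(np.fft.rfft(y)) / len(y)   # cosine amplitude per 1 Hz bin
imd_hz = f_alias - f1                              # second-order difference product
imd_db = 20 * np.log10(amplitude[imd_hz])
print(f"IMD product at {imd_hz} Hz: {imd_db:.1f} dB re the {f1} Hz tone")
```

The product level is simply $20\log_{10}(a_2) $ plus the alias level, about -62 dB here, consistent with the roughly -60 dB shown in Figure 11 b). Replacing the inputs with the imaging case of Figure 11 c) (21 kHz tone, 27 kHz image at -32 dB) gives a 6 kHz product near -72 dB under the same assumption.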

    Real systems can have frequency-dependent nonlinearities, most notably loudspeakers, limiting their performance at high amplitudes. Besides, the recent application of psychoacoustics to audio data compression problems demonstrates the dominant role of masking in hearing acuity.

    [Figure 11 panels: a) example of an instrument with ultrasound energy; b) a-harmonic signal as 1% intermodulation product (second-order products), due to non-linearities in the signal path, when aliasing distortion is present: magnitude response (dB), 2Fs = 96 kHz, half-band anti-alias filter (marker: -21.84 dB at 26.20 kHz); f2 = 26.2 kHz, aliased to f2 = 21.8 kHz, with f1 = 17.9 kHz gives the second-order IMD product f2 - f1 at 3.9 kHz (about -60 dB); labels: aliasing distortion, intermodulation distortion; c) a-harmonic signal as 1% intermodulation product (second-order products), due to non-linearities in the signal path, when imaging distortion is present: magnitude response (dB), 2Fs = 96 kHz, half-band anti-image filter (marker: -32.00 dB at 27.02 kHz); f2 = 27 kHz (0 dB), imaged at -32 dB, with f1 = 21 kHz gives f2 - f1 at 6 kHz (about -70 dB); labels: imaging distortion, intermodulation distortion.]

    u ated e ects of inadequate low pass filtering in A/D and D/A, 48kHz sampling frequency subsystems, producing frequency shifted signal in the audio band;

    Crash cymbals recording as an example of instrument with ultrasound energy (copyright, James Boyk) Second order IMD as effect of poor alias rejection at close to the 0.5Fs frequency, using half-band anti-alias filter (as decimation filter before the decrease sampling rate stage in an oversampling A/D converter) Second order IMD as effect of poor image rejection at close to the 0.5Fs frequency, using half-band anti-image filter (as interpolation filter after the increase sampling rate stage in an oversampling D/A converter).

    5 CONCLUSIONS

    The requirements for higher resolution in the acquisition of impulsive signals, and for better perception of transient musical attacks when restoring degraded sources, should be analyzed under modern conditions: extended bandwidth, gentle filtering, and improved phase and impulse characteristics. The effort to increase bandwidth should be correlated with new design results for loudspeakers with improved off-axis response and better sound quality at high frequencies, with diaphragm resonances located well outside the audible range. Under these conditions, the transfer work for digital preservation, understood as the creation of a surrogate (an accurate, authentic, and very high-quality representation of the original), can start, once all the necessary and adequate equipment and the operating personnel to be involved in the preservation system have been identified.

    The evaluation of the factors that influence A/D converter fidelity described here indicates that, by reducing distortion mechanisms with filters designed for higher sampling frequencies and relaxed transition regions, the localisation of sound sources could be improved and the audibility of the echo reduced [30]. Analogue recordings, with their different audio fidelity characteristics, should be digitized using a high-quality A/D converter, to minimize the risk of losing information from the original source. For a good transcription, the merits of the audio conversion equipment, with optimum coverage of the limits of human hearing, should be considered before any evaluation of the time and effort needed to achieve the result:

    • 96 or 192 kHz sampling frequency, for a wide audio bandwidth, good temporal response, and improved low-pass filter characteristics;

    • 24-bit word length, for a large dynamic range, with more headroom in level setting and a good margin for the effects of rounding in subsequent digital signal processing;

    • more than one conversion of the same analogue source, using different converters and critically monitoring input and output levels with high-quality D/A converters, high-quality loudspeakers, and suitable room acoustics.
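    The numerical side of the first two recommendations can be checked with a short Python sketch (standard library only). It uses the textbook ideal-quantizer SNR, 6.02N + 1.76 dB for a full-scale sine, and Kaiser's order estimate for a Kaiser-window low-pass FIR, M = (A - 8) / (2.285 Δω). Kaiser's formula is swapped in here for illustration (the references list Parks-McClellan equiripple design [27]); the 20 kHz passband edge, the stopband edge at Fs/2, the 100 dB attenuation target and the function names are all assumptions. The point is qualitative: raising Fs from 48 to 96 or 192 kHz widens the transition band from 4 kHz to 28 or 76 kHz, so the filter, and the pre-ringing of a linear-phase FIR ahead of its main peak, becomes several times shorter.

```python
import math


def ideal_quantizer_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine:
    20*log10(2**N) + 10*log10(1.5), about 6.02*N + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def transition_width_hz(fs_hz: float, passband_hz: float = 20_000.0) -> float:
    """Anti-alias transition band, assuming the passband ends at passband_hz
    and full rejection is required by the Nyquist frequency fs_hz/2."""
    return fs_hz / 2 - passband_hz


def kaiser_fir_order(fs_hz: float, atten_db: float,
                     passband_hz: float = 20_000.0) -> int:
    """Kaiser's order estimate for a Kaiser-window low-pass FIR:
    M = (A - 8) / (2.285 * delta_omega), delta_omega in rad/sample."""
    delta_omega = 2 * math.pi * transition_width_hz(fs_hz, passband_hz) / fs_hz
    return math.ceil((atten_db - 8) / (2.285 * delta_omega))


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit word length: {ideal_quantizer_snr_db(bits):.1f} dB")
    for fs in (48_000, 96_000, 192_000):
        order = kaiser_fir_order(fs, atten_db=100.0)   # 100 dB target is an assumption
        # A linear-phase FIR of order M peaks at sample M/2; the taps before the
        # peak are the pre-ringing that precedes a transient.
        pre_ms = 1000 * (order / 2) / fs
        print(f"{fs / 1000:.0f} kHz: transition {transition_width_hz(fs) / 1000:.0f} kHz, "
              f"order ~{order}, pre-ringing ~{pre_ms:.2f} ms")
```

Under these assumptions the estimated order drops from 77 at 48 kHz to 22 at 96 kHz and 17 at 192 kHz, and the pre-ringing span from roughly 0.8 ms to about 0.1 ms or less. The 24-bit figure (about 146 dB) is a theoretical ceiling; real converters are limited by their analogue stages, which is why the recommendation stresses headroom and rounding margin rather than the full theoretical range.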

    6 BIBLIOGRAPHY AND REFERENCES

    1 Watanabe K., FPC Inc., A Kodak Company, Evolution Availability Longevity, Joint Technical Symposium, 2004

    2 Watkinson J., Is digital storage more reliable than analogue?, Resolution, November/December 2002

    3 Bradley K., Critical Choices, Critical Decisions: Sound Archiving and Changing Technology, 2004

    4 Schuller D., Preserving Audio and Video Recordings in the Long-term, International Preservation News, 14, 1997. (On-line): http://www.ifla.org/VI/4/news/14-97.htm

    5 Schuller D., Preserving the Facts for the Future: Principles and Practices for the Transfer of Analog Audio Documents into the Digital Domain, Journal of the Audio Engineering Society, 49 (2001), 7/8, 618-621

    6 Hafner A., The Suedwestrundfunk (SWR) and the Mass Storage Systems in Its Radio Sound Archives: Concepts and some Performance/Cost Aspects, 106th Audio Engineering Society Convention, Munich, Germany, May 08-11, 1999

    7 Herla S., Houpert J. and Lott F., From Single-Carrier Sound Archive to BWF Online Archive: A New Optimized Workstation Concept, Journal of the Audio Engineering Society, 49, 7/8, 2001, pp. 606-617

    8 Presto Space, Preservation Status, Annual Report on Preservation Issues for European Audiovisual Collections, Deliverable D22.4 DIS4, 31/01/2009

    9 Casey M. (Indiana University) and Gordon B. (Harvard University), Best Practices for Audio Preservation, http://www.dlib.indiana.edu/projects/sounddirections/bestpractices2007/

    10 IASA-TC 03: The Safeguarding of the Audio Heritage: Ethics, Principles and Preservation Strategy, Version 3, December 2005, http://www.iasa-web.org/IASA_TC03/IASA_TC03.pdf

    11 IASA-TC 04: Guidelines on the Production and Preservation of Digital Audio Objects

    12 Capturing Analog Sound for Digital Preservation: Report of a Roundtable Discussion of Best Practices for Transferring Analog Discs and Tapes, CLIR/LC, NRPB (Council on Library and Information Resources and the Library of Congress under the auspices of the National Recording Preservation Board)

    13 Warren R., Jr., Storage of Sound Recordings, ARSC Journal 24, no. 2 (1993)

    14 Bradley K., Critical Choices, Critical Decisions: Sound Archiving and Changing Technology, 2004

    15 Pohlmann K. C., Measurement and Evaluation of Analog-to-Digital Converters Used in the Long-Term Preservation of Audio Recordings (roundtable discussion, Issues in Digital Audio Preservation Planning and Management, Washington, DC, March 10-11, 2006). Also available online: http://www.clir.org/activities/details/AD-Converters-Pohlmann.pdf

    16 Reiss J. D., Understanding sigma-delta modulation: the solved and unsolved issues, J. Audio Eng. Soc., Vol. 56, No. 1/2, January/February 2008

    17 AES17, AES standard method for digital audio engineering - Measurement of digital audio equipment, J. Audio Eng. Soc., Vol. 46, No. 5, pp. 428-447, May 1998

    18 Fielder L., Dynamic Range Issues in the Modern Digital Audio Environment, Proceedings AES UK Conference Managing the Bit Budget, 3-19 (May 1994)

    19 Reiss J. D., Understanding sigma-delta modulation: the solved and unsolved issues, J. Audio Eng. Soc., Vol. 56, No. 1/2, January/February 2008

    20 Cirrus Logic, CS5381: 120 dB, 192 kHz, multi-bit audio A/D converter, Advance product information

    21 Sandmann T., Comparative test 24-bit-converters Apogee AD-8000 and RME ADI-8 DS, PMA Production Management

    22 Colloms M., Do we need an ultrasonic bandwidth for higher fidelity sound reproduction?, Proceedings of the Institute of Acoustics, Vol. 28, Pt. 8, 2006

    23 Oohashi T. et al., Inaudible high-frequency sounds affect brain activity: hypersonic effect, Journal of Neurophysiology, 83:3548-3558, 2000, http://jn.physiology.org/cgi/content/full/83/6/3548

    24 Nishiguchi et al., Perceptual discrimination between musical sounds with and without very high frequency components, NHK Laboratory Note no. 486, AES 115th Convention, 2003

    25 Mitra S., Digital Signal Processing: A Computer-Based Approach, McGraw-Hill, Second edition, 2001

    26 Parks T.W. and Burrus C.S., Digital Filter Design, Wiley, 1987

    27 Parks T.W. and McClellan J.H., Chebyshev approximation for nonrecursive digital filters with linear phase, IEEE Trans. on Circuit Theory, CT-19: 189-194, 1972

    28 Stanomir D., Discrete signals and systems, Bucharest, Athena, 1997

    29 Boyk J., There's life above 20 kilohertz! A survey of musical instrument spectra to 102 kHz, California Institute of Technology, Music Lab, 1997, http://www.cco.caltech.edu/~musiclab

    30 Dunn J., Anti-alias and anti-image filtering: The benefits of 96kHz sampling rate formats for those who cannot hear above 20kHz, 104th AES Convention, Amsterdam, May 1998

    1 INTRODUCTION

    2 CONSERVATION, MIGRATION, RESTORATION

    3 RECOMMENDED METHODS TO IDENTIFY THE BEST POSSIBLE A/D CONVERTERS FOR CRITICAL AUDIO APPLICATIONS

    3.1 Recommended Values

    3.1.1 Sampling frequency: 96 kHz, 192 kHz

    3.1.2 Quantization Word Length: 24 bits

    4 THE TRADE-OFFS OF HIGH SAMPLING FREQUENCIES IN TYPICAL REAL DIGITAL SYSTEMS

    4.1 Preliminary considerations

    4.2 The benefits of high sampling rate in anti-alias and anti-image filtering design

    4.2.1 The effect of aliasing during digitization process

    4.2.2 The effect of imaging during audio signal reproduction

    5 CONCLUSIONS

    6 BIBLIOGRAPHY AND REFERENCES