mpeg 4 structured audio: algorithmic sound for the internet and beyond cs division university of...

MPEG 4 Structured Audio:

Algorithmic Sound

for the Internet and Beyond

CS Division, University of California at Berkeley
www.cs.berkeley.edu/~johnw

John Lazzaro, John Wawrzynek

Sep 1, 1999

MPEG 4 Structured Audio

Outline:
- Motivation for structured audio
- Introduction to MP4-SA
- Example encoding
- C translator
- Physical Instrument Modeling
- Hardware Architectures
- Future directions

Digital Audio Basics

[Figure: audio waveform, amplitude vs. time]

16-bit samples at a 44.1 kHz sample rate.

mono: 705.6 kbps
Cell-phone network: 5-10 kbps
Dialup modems: 50 kbps
xDSL: 128 to 1000 kbps

Traditional Compression: encoder → decoder

How well does this work?
True lossless: 2.5X reduction (Shorten, T. Robinson, Cambridge University)
“Perceptually lossless”: 10X-20X reduction (MP3, Dolby AC3, …)
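The 705.6 kbps figure is 16 bits per sample times 44,100 samples per second. A minimal C sketch of that arithmetic, plus the reduction needed to fit a stream into a given channel (the channel rates come from the list above; the helper names are illustrative):

```c
/* Uncompressed PCM bit rate in bits per second. */
long pcm_bitrate(long bits_per_sample, long sample_rate, long channels)
{
    return bits_per_sample * sample_rate * channels;
}

/* Compression ratio needed to fit a stream of `rate_bps`
   into a channel of `channel_bps`. */
double needed_ratio(long rate_bps, long channel_bps)
{
    return (double)rate_bps / (double)channel_bps;
}
```

For example, needed_ratio(pcm_bitrate(16, 44100, 1), 50000) is about 14.1, so perceptual coding at 10X-20X only just fits a 50 kbps dialup modem, while a 5 kbps cell-phone channel would need about 141X.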

The Kolmogorov alternative:

Write a computer program that generates the desired audio stream.

Transmit the computer program.

To decode, execute the program.

MPEG-4 Structured Audio (MP4-SA) uses this approach.

Final draft standard: Nov 15, 1998.

Eric Scheirer, Editor (MIT Media Lab). http://sound.media.mit.edu/~eds/mpeg4/

Similar to PostScript!

MP4-SA Encoding

MP4-SA encoding may be a creative act: writing a program, either directly (emacs) or indirectly (GUI, webpage). In this case, MP4-SA is a lossless compressor.

Encoding may also be automatic: given a sound, an encoder writes a program that generates the sound. Automatic encoding is a hard problem in the general case.

MP4-SA Decoders

MP4-SA decoders are interpreters or compilers.

Key Application: Music Production

Modern music production is computer-based.

Musicians enter performances into computers as control information, not audio waveforms.

Digital synthesizers, effects, and mixes create the final audio, under engineer/producer control.

[Figure: “The Program”: synthesis algorithms, effects “boxes”, mixers; musical performance and mix-down control information; “The Decoder”: sound rendering.]

MP4-SA Maps to Modern Music Production

Network: premium on low bandwidth.


Ideal format for collaborative productions, remixes, ...

File System: standard framework.

MPEG 4 Structured Audio:

A binary file format that encodes:
- programs in the SAOL programming language (say: “sail”);
- scores in the musical score language SASL;
- MIDI data (legacy support);
- audio sample data.

Result is normative: an MP4-SA file will sound identical on all compliant decoders.

Different from MIDI files, which can sound different on different synthesizers.

MPEG 4 Standard

Structured Audio: One “component” in the MPEG audio standard.

[Figure: MPEG 4 comprises audio, system, and video parts. Audio coding divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS). SA is specified in ISO/IEC 14496-3, section 5.]


Advanced Audio Coding (AAC): successor to MP3; delivers the highest quality audio, at the highest bit rate.

Time-Frequency (T/F) Coding: meant for a moderate bit/sec range, with moderate quality.

Code Excited Linear Prediction (CELP): low bit-rate coder; works best as a speech coder.

Parametric coders: very-low bit-rate coders; work best as speech coders.

Text-to-Speech (TTS): takes phonetic and prosodic control information and produces synthesized speech.

“System” level: includes mechanisms for composing and synchronizing audio (& video) components.

Why SAOL and MP4-SA? Why not Java?

Musical performances have temporal structure that changes over several timescales:
- Sample-by-sample: 10’s of usec
- Amplitude & timbre envelopes: 10’s of msec
- Note-by-note: 100’s of msec

Writing sound generation code in a conventional language results in code dominated by time-scale management. Hard to maintain, hard to optimize.

Time management is built into SAOL.

A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.

Work is scheduled to happen:
- at the a-rate (the audio sample rate)
- at the k-rate (envelope control rate)
- at the i-rate (rate for new notes)

Language variables are typed as a/k/i-rate.

A language statement is scheduled based on the rate of the variables it contains.
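A simplified sketch of that rule in C, assuming the rates are ordered i < k < a and a statement takes the fastest rate among its variables (the full SAOL rate rules cover more cases; Rate and statement_rate are illustrative names):

```c
/* Rates ordered from slowest to fastest. */
typedef enum { RATE_I = 0, RATE_K = 1, RATE_A = 2 } Rate;

/* A statement runs at the fastest rate among the variables it uses.
   A statement with no variables (constants only) is treated as i-rate. */
Rate statement_rate(const Rate *vars, int nvars)
{
    Rate r = RATE_I;
    for (int i = 0; i < nvars; i++)
        if (vars[i] > r)
            r = vars[i];
    return r;
}
```

For instance, a statement that mixes an asig with an ivar is scheduled at the a-rate, while one that uses only ivars runs once per note.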

SAOL, SASL, and Scheduling:

Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.

A SAOL subprogram (called an instr or instrument) serves as the instrument.

SASL commands (called score lines) act to play notes on SAOL instruments.

Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

Single Note Execution Trace

A SAOL instrument contains all the instructions for playing a note:
- code that runs at note launch (once per i-pass);
- code that models timbre evolution at the k-rate (once per k-pass);
- code that generates audio samples at the a-rate (once per a-pass).

Executing a note (k-rate: 4 kHz, a-rate: 40 kHz):

time (us)  pass
0          i-pass
0          k-pass
0          a-pass
25         a-pass
50         a-pass
...
225        a-pass
250        k-pass
250        a-pass
275        a-pass
300        a-pass
...
475        a-pass
500        k-pass
500        a-pass
525        a-pass
...
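The trace above can be reproduced with a small scheduling loop. This is a sketch for a single note instance, assuming integer microsecond timestamps and an a-rate that is an integer multiple of the k-rate (PassKind and schedule are hypothetical names, not sfront's):

```c
typedef enum { I_PASS, K_PASS, A_PASS } PassKind;

/* Record the passes for one note from time 0 up to (not including) end_us.
   Returns the number of events written (at most max_events). */
int schedule(int a_rate_hz, int k_rate_hz, int end_us,
             PassKind *kinds, int *times_us, int max_events)
{
    int a_period_us = 1000000 / a_rate_hz;   /* 25 us at 40 kHz */
    int a_per_k = a_rate_hz / k_rate_hz;     /* 10 a-passes per k-pass */
    int n = 0;

    /* i-pass: once, at note launch */
    if (n < max_events) { kinds[n] = I_PASS; times_us[n] = 0; n++; }

    for (int i = 0; i * a_period_us < end_us; i++) {
        int t = i * a_period_us;
        /* a k-pass precedes every a_per_k-th a-pass */
        if (i % a_per_k == 0 && n < max_events) {
            kinds[n] = K_PASS; times_us[n] = t; n++;
        }
        if (n < max_events) {
            kinds[n] = A_PASS; times_us[n] = t; n++;
        }
    }
    return n;
}
```

With schedule(40000, 4000, 500, ...), the loop emits the i-pass, then a k-pass before every tenth a-pass, matching the 25 us and 250 us spacing in the trace.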

An example:

SAOL instrument tone plays a gated sine wave (SAOL code follows).

This SASL file plays a melody on tone:

    0.5  tone 0.75 52 0.25
    1.5  tone 0.75 64 0.25
    2.5  tone 0.5  63 0.25
    3    tone 0.25 59 0.2
    3.25 tone 0.25 61 0.225
    3.5  tone 0.5  63 0.225
    4    tone 0.5  64 0.25
    5    end

Each note line gives, in order: when the instance is launched, the instrument name, how long the instrument runs, and the instance parameters (note number, loudness).
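A sketch of reading the score lines above in C, assuming only the two statement forms used in this example (a note line and an end line); real SASL has further statement types, and ScoreLine and parse_score_line are illustrative names:

```c
#include <stdio.h>
#include <string.h>

/* One parsed line: <time> <instr> <dur> <note> <loudness>, or <time> end. */
typedef struct {
    double time, dur, note, loudness;
    char instr[32];
    int is_end;
} ScoreLine;

/* Returns 1 on success, 0 if the line matches neither form. */
int parse_score_line(const char *line, ScoreLine *out)
{
    char name[32];
    double t;
    if (sscanf(line, "%lf %31s", &t, name) != 2)
        return 0;
    memset(out, 0, sizeof *out);
    out->time = t;
    if (strcmp(name, "end") == 0) {
        out->is_end = 1;
        return 1;
    }
    strcpy(out->instr, name);   /* name holds at most 31 chars + NUL */
    /* skip time and name, then read duration and the two parameters */
    return sscanf(line, "%*f %*s %lf %lf %lf",
                  &out->dur, &out->note, &out->loudness) == 3;
}
```

Each parsed note line would launch one instance of tone at its start time, passing note and loudness as instance parameters.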

SAOL code for tone:

    instr tone (note, loudness) {
      ivar a;      // sets oscillator frequency
      ksig env;    // envelope output
      asig x, y;   // oscillator state
      asig init;

      a = 2*sin(3.14159265*cpsmidi(note)/s_rate);
      env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

      if (init == 0) {  // first a-pass only
        x = loudness;
        init = 1;
      }

      x = x - a*y;      // the FLOPS happen in
      y = y + a*x;      // these 3 statements
      output(y*env);    // creates audio output
    }                   // end of instr tone







The same code, by rate:
- i-rate (once, at note launch): the assignment to a.
- k-rate (once per k-pass): the assignment to env.
- a-rate (once per a-pass): the if (init == 0) block, x = x - a*y, y = y + a*x, and output(y*env).

SAOL: Unique Features

Rate semantics: i/k/a-rate execution.

Vector arithmetic: e.g., A = B + C means A[i] = B[i] + C[i] for i = 1 to n.

All floating-point arithmetic.

Extensive built-in audio function library: signal generators, table operators, pitch converters, filters, FFT, sample-rate conversion, effects, ...
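As an example of a library signal generator, a sketch of a piecewise-linear envelope in the spirit of SAOL's kline(x1, dur1, x2, dur2, x3, ...), as used for env in tone. The normative behavior is defined in the standard; this version assumes t >= 0 and simply holds the final value after the last segment, and kline_at is an illustrative name:

```c
/* pts = {x1, dur1, x2, dur2, x3, ...}, npts odd.
   Starts at x1, ramps to x2 over dur1 seconds, to x3 over dur2, and so on.
   Zero-length segments are skipped; the last value is held afterwards. */
double kline_at(const double *pts, int npts, double t)
{
    double start = 0.0;
    for (int i = 0; i + 2 < npts; i += 2) {
        double x0 = pts[i], dur = pts[i + 1], x1 = pts[i + 2];
        if (t < start + dur)
            return x0 + (x1 - x0) * (t - start) / dur;
        start += dur;
    }
    return pts[npts - 1];
}
```

For the first note (dur = 0.75), the points {0, 0.1, 0.5, 0.55, 0.5, 0.1, 0} give a 0.1 s rise to 0.5, a 0.55 s hold, and a 0.1 s release.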

SAOL: Unique Features

Instrument communication through bus structures:

Dynamic instrument creation and control.

Scheduler and language support for MIDI and SASL scores.

[Figure: instruments A, B, C, and D communicating through a bus.]
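A minimal sketch of bus-style communication, assuming a bus is simply a shared buffer that sender instruments add into and a downstream instrument reads once per block. The Bus type and both instruments are hypothetical, not SAOL syntax:

```c
#include <string.h>

#define BLOCK 4   /* samples per block in this sketch */

/* A bus: senders add into it; a downstream instrument reads the mix. */
typedef struct { float buf[BLOCK]; } Bus;

void bus_clear(Bus *b) { memset(b->buf, 0, sizeof b->buf); }

/* Hypothetical source instrument: adds a constant level to the bus. */
void source_instr(Bus *b, float level)
{
    for (int i = 0; i < BLOCK; i++)
        b->buf[i] += level;
}

/* Hypothetical effect instrument: reads the bus and applies a gain. */
void gain_instr(const Bus *in, float gain, float *out)
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = gain * in->buf[i];
}
```

Clearing the bus, running two sources at 0.25 and 0.5, and then the gain instrument at 2.0 yields 1.5 in every sample.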

Sfront: a SAOL-to-C translator

foo.mp4 → sfront → sa.c

Converts MP4-SA files to a C program that, when executed, produces audio.

Runs on UNIX and Win98/NT. Licensed under the GNU General Public License (GPL). www.cs.berkeley.edu/~lazzaro/sa

[Figure: foo.mp4, containing SAOL, SASL, MIDI, and uncompressed samples, is translated by sfront into sa.c.]

Handles SAOL, SASL, MIDI, uncompressed samples.

Sfront Benchmarks

Sfront version 0.36

Machine: 450 MHz Pentium III, 128 MB, gcc version egcs-2.91.66, -O3 optimizer

Audio sample rate: 44.1 kHz for all examples

MP3 compression ratio = 11

Sizes in bytes. M = mono, S = stereo. Ratio = uncompressed size / MP4-SA size.

Name       Audio (s)  M/S  Time A (s)  Time B (s)  Uncompressed  MP3      MP4-SA   Ratio
abar       10         M    2.38        4.13        880132        79829    1353     650
beat       60         S    7.51        11.51       10584380      960036   2382     4443
elpelele   265        S    308         317.88      46840860      4248846  1711968  27
flute      10         M    0.82        2.23        880132        79823    1163     756
marimba    10         M    2.08        3.75        880132        79823    1119     786
perc       59         S    40.6        60.68       10384484      942069   1355803  7.6
pc         67         S    91.4        96.17       11841500      1074152  27265    434
scr1       128        S    43.2        46.73       22626428      2052591  316861   71.4
string     10         M    2.37        4.15        880132        79824    1390     633
vowels     10         M    0.89        2.35        880264        80033    1261     698

abar: Aluminum bar strike, physical model
beat: FM models and algorithmic composition
elpelele: Solo piano with multi-sample piano sound
flute: Blown pipe physical model
marimba: Marimba physical model
perc: Sampled drums, reverb and spatialization
pc: FM models in multi-voice composition
scr1: Piano with single sample
string: Plucked string physical model
vowels: Vowel formant synthesizer
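The ratio column is the uncompressed size divided by the MP4-SA size. A small C check on three rows copied from the benchmark table (Row and the helper names are illustrative):

```c
/* Sizes in bytes, copied from three rows of the benchmark table. */
typedef struct { const char *name; double wav, mp3, mp4sa; } Row;

static const Row rows[] = {
    {"abar",   880132.0,   79829.0,    1353.0},
    {"beat", 10584380.0,  960036.0,    2382.0},
    {"perc", 10384484.0,  942069.0, 1355803.0},
};

/* Uncompressed size / MP4-SA size. */
double mp4sa_ratio(const Row *r) { return r->wav / r->mp4sa; }

/* How many times smaller the MP4-SA file is than the MP3 file. */
double advantage_over_mp3(const Row *r) { return r->mp3 / r->mp4sa; }
```

For beat, the MP4-SA file is about 400 times smaller than the MP3; for perc (sampled drums) it is larger than the MP3, consistent with sampling natural instruments being a poor fit for MP4-SA.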

Sfront Performance Summary:

Rendering (file decoding): current performance: a benchmark suite of moderately complex MP4-SA streams computes in a time equivalent to the duration of the audio it generates, on a 400 MHz UltraSPARC and a 450 MHz Pentium.

Real-time interaction: with a MIDI keyboard at acceptable latency (~20 ms), and with microphone input.

Interesting Issues:

MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space. Physical modeling: good. Sampling natural instruments: bad.

If models are chosen carefully, compression ratios of 100 to 10,000 are possible.

Physical Modeling is relatively immature, but holds much promise.

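Struck/Plucked Instrument Model

[Figure: a striker (attack section) excites linear modes (resonances) M1, M2, M3, through Mn, which sum to the output; spectrum of amplitude vs. frequency.]

Each mode is a digital resonator: $y_n = a_1 y_{n-1} + a_2 y_{n-2} + x_n$, where $a_1$ and $a_2$ are the resonator constants.

Aluminum bar sounds: single strike, multiple strikes.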

Examples: struck bars, bells, drums, plucked strings

Parameters: striker characteristics, resonator constants

Blown Instrument Model

Examples: pipes, flutes, etc.

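[Figure: an excitation (jet) drives a non-linear element coupled in a feedback loop with a linear element (the resonant modes of the tube) through signals x and y; spectrum of amplitude vs. frequency.]

Parameters: shape of non-linear function, resonator constants.

Blown pipe sounds: brass pipe, overblown.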

Physical Modeling Summary

Models the instrument, not the sound.

Advantages over traditional synthesis techniques (FM, sample-based):
- Compact descriptions.
- Physical parameterization leads to more intuitive control and lower control bandwidth.
- State-accurate simulation leads to efficiency in re-excitation and emulation of otherwise missing effects.
- Ultimately, more realistic sounds.

Physical Modeling Summary (cont.)

Disadvantages: potential for high computational complexity.

Approaches:
- PDE (partial differential equation): would be nice, but probably not practical.
- ODE (ordinary differential equation, lumped-circuit models): practical and very general; captures the essential physics.
- Wave-guide filters provide a more efficient alternative in some cases.
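Wave-guide models replace banks of resonators with delay lines. The classic Karplus-Strong plucked string, a simple relative of wave-guide filters (named here as a swapped-in example, not the model used in the benchmarks), needs only a delay line, an averaging filter, and a loss factor. KSString and the helpers are illustrative names:

```c
#define KS_MAX 1024

typedef struct {
    float line[KS_MAX];
    int len, pos;
    float loss;   /* < 1: extra damping per trip around the loop */
} KSString;

/* "Pluck": fill the delay line with a deterministic pseudo-random burst
   in [-1, 1]. Assumes 2 <= len <= KS_MAX. */
void ks_pluck(KSString *s, int len, float loss)
{
    unsigned int seed = 12345u;
    s->len = len;
    s->pos = 0;
    s->loss = loss;
    for (int i = 0; i < len; i++) {
        seed = seed * 1103515245u + 12345u;   /* simple LCG */
        s->line[i] = (float)((seed >> 16) & 0x7fff) / 16383.5f - 1.0f;
    }
}

float ks_tick(KSString *s)
{
    int next = (s->pos + 1) % s->len;
    float out = s->line[s->pos];
    /* averaging two adjacent samples low-passes the loop */
    s->line[s->pos] = s->loss * 0.5f * (out + s->line[next]);
    s->pos = next;
    return out;
}
```

With len = 100 at 44.1 kHz, the pitch is roughly 44100 / 100.5 ≈ 439 Hz (the averaging filter adds about half a sample of delay), and each trip around the loop scales the peak level by at most loss.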

Interesting Issues (cont.):

MP4-SA specifies that a decoder must produce audio that “sounds identical” to the result of computing the program accurately.

A new role for psychophysics: instead of using psychophysics to squeeze bits out of a sound representation, MP4-SA decoders will use psychophysics to squeeze FLOPS out of sound computations.

Leverage spectral and temporal masking.

Interesting Issues (cont.):

MP4-SA can be used in a way similar to traditional compression, except that the compression method can be ad hoc:
- A framework for experimentation in encoding.
- Hope for automatic encoding, if done in a voice-specific way: vocals, guitar, sax, and other hard-to-synthesize sounds.

Running SAOL on Conventional Architectures

Lessons learned from SAOL development:

Temporal typing of variables has the nice side effect of marking the inner loops. Typically, the a-rate is 10X to 100X the k-rate.

A-rate code optimization: moving subexpressions into k-rate or i-rate code.

SAOL semantics support a static heap: no recursion, all variables are single-precision floats, no pointers, ... This simplifies optimization.

Other researchers (Giorgio Zoia, ETH) are focusing on blocking all a-passes for an instance, reducing overhead.
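A sketch of the blocking idea, assuming instances do not exchange data (no buses) within one k-period: running all a-passes of one instance before moving to the next gives the same mixed output as visiting every instance on every a-pass, and can keep each instance's state in registers or cache. Osc, NINST, KSMPS, and both mixing functions are illustrative:

```c
#define NINST 3
#define KSMPS 10   /* a-passes per k-pass, as in the 40/4 kHz trace */

typedef struct { float a, x, y; } Osc;   /* coupled-form oscillator state */

static float osc_step(Osc *o)
{
    o->x -= o->a * o->y;
    o->y += o->a * o->x;
    return o->y;
}

/* Interleaved: on each a-pass, visit every instance. */
void mix_interleaved(Osc *inst, float *out)
{
    for (int s = 0; s < KSMPS; s++) {
        out[s] = 0.0f;
        for (int i = 0; i < NINST; i++)
            out[s] += osc_step(&inst[i]);
    }
}

/* Blocked: run all a-passes of one instance, then the next. */
void mix_blocked(Osc *inst, float *out)
{
    for (int s = 0; s < KSMPS; s++)
        out[s] = 0.0f;
    for (int i = 0; i < NINST; i++)
        for (int s = 0; s < KSMPS; s++)
            out[s] += osc_step(&inst[i]);
}
```

Because each output sample is summed over the instances in the same order in both versions, the results match exactly.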

Processors with SIMD FP support (Intel SSE, AMD 3DNow!) will be a good match.

Fixed-Function Hardware for SAOL Accelerators

Unlike MPEG-2 chips, DVD chips, etc., it's not clear how MP4-SA can be accelerated by rolling an ASIC, since every MP4-SA file is a new algorithm.

Common opcodes can be hardwired, and the general characteristics of typical MP4-SA files could be leveraged to specialize a conventional processor design. But the language is only six months old; execution frequencies are not yet known.

Reconfigurable computing architectures might hold promise (however, MP4-SA is all floating point).

Directions / Research Opportunities

Compiler optimizations for:
- SAOL and other languages with rate semantics
- high-performance SIMD architectures
- runtime code specialization

Runtime scheduling under limited compute resources.

SAOL programming environments. Physical modeling. Automatic encoding.
