MPEG 4 Structured Audio:
Algorithmic Sound
for the Internet and Beyond
John Lazzaro, John Wawrzynek
CS Division, University of California at Berkeley
www.cs.berkeley.edu/~johnw
Sep 1, 1999
MPEG 4 Structured Audio
Outline:
  Motivation for structured audio
  Introduction to MP4-SA
  Example encoding
  C translator
  Physical Instrument Modeling
  Hardware Architectures
  Future directions
Digital Audio Basics
[Figure: sampled waveform, amp vs. time; 16-bit samples, 44.1 kHz sample rate]
Mono: 705.6 kbps
Cell-phone network: 5-10 kbps
Dialup modems: 50 kbps
xDSL: 128 to 1000 kbps
Traditional Compression:
[Diagram: encoder, decoder]
How well does this work?
True Lossless: 2.5X reduction
  Shorten, T. Robinson (Cambridge University)
“Perceptually Lossless”: 10X-20X reduction
  MP3, Dolby AC3, …
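The 705.6 kbps figure follows directly from the sample format on this slide. A minimal sketch of the arithmetic, assuming 1 kbps = 1000 bit/s; the function name and the link labels are illustrative, not from the slides:

```python
# Uncompressed PCM bit rate = bits per sample * samples per second * channels.
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100


def pcm_kbps(channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return BITS_PER_SAMPLE * SAMPLE_RATE_HZ * channels / 1000


mono_kbps = pcm_kbps(1)

# Reduction factor needed to push mono audio through each link on the slide.
# The rates picked for each link are taken from the ranges listed above.
links_kbps = {"cell-phone (10 kbps)": 10, "dialup (50 kbps)": 50, "xDSL (128 kbps)": 128}
needed = {name: mono_kbps / rate for name, rate in links_kbps.items()}

for name, factor in needed.items():
    print(f"{name}: {factor:.1f}X reduction needed")
```

Under these assumptions a dialup modem needs roughly a 14X reduction, which is inside the 10X-20X range quoted for perceptually lossless coders, while a cell-phone link needs far more.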
The Kolmogorov alternative:
Write a computer program that generates the desired audio stream.
Transmit the computer program.
To decode, execute the program.
MPEG-4 Structured Audio (MP4-SA) uses this approach.
Final draft standard: Nov 15, 1998.
Eric Scheirer, Editor (MIT Media Lab). http://sound.media.mit.edu/~eds/mpeg4/
Similar to PostScript!
MP4-SA Encoding
Encoding may be a creative act: writing a program, directly (emacs) or indirectly (GUI, webpage). In this case, MP4-SA is a lossless compressor.
Encoding may be automatic -- given a sound, an encoder writes a program that generates the sound. Automatic encoding is a hard problem in the general case.
MP4-SA Decoders
Decoders are interpreters or compilers.
Key Application: Music Production
Modern music production is computer based.
Musicians enter performances into computers as control information, not audio waveforms.
Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control.
MP4-SA Maps to Modern Music Production
[Diagram: “The Program” (synthesis algorithms, effects “boxes”, mixers) and the control information (musical performance, mix-down) go to “The Decoder” (sound rendering).]
Network: premium on low bandwidth.
File System: standard framework. Ideal format for collaborative productions, remixes, ...
MPEG 4 Structured Audio:
A binary file format that encodes:
  The programming language SAOL (say: sail).
  The musical score language SASL.
  Legacy support for MIDI.
  Audio sample data.
Result is normative: an MP4-SA file will sound identical on all compliant decoders.
Different from MIDI files.
MPEG 4 Standard
[Diagram: MPEG 4 comprises audio, system, and video. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (TTS, SA).]
Structured Audio (SA): One “component” in the MPEG audio standard (ISO/IEC 14496-3 sec5).
Advanced Audio Coding (AAC): Successor to MP3; delivers the highest audio quality, at the highest bit rate.
Time-Frequency Coding (T/F): Meant for a moderate bit/sec range, with moderate quality.
Code Excited Linear Prediction (CELP): Low bit rate coder, works best as a speech coder.
Parametric coders: Very low bit rate coder, works best as a speech coder.
Text-to-Speech (TTS): Takes phonetic and prosodic control information, produces synthesized speech.
“System” level includes mechanisms for composing and synchronizing audio (& video) components.
Why SAOL and MP4-SA? Why not Java?
Musical performances have temporal structure that changes over several timescales:
  Sample-by-sample: 10s of usec
  Amplitude & timbre envelopes: 10s of msec
  Note-by-note: 100s of msec
Writing sound generation code in a conventional language results in code dominated by time-scale management: hard to maintain, hard to optimize.
Time management is built into SAOL.
A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion.
Work is scheduled to happen:
  at the a-rate (the audio sample rate)
  at the k-rate (envelope control rate)
  at the i-rate (rate for new notes)
Language variables are typed as a/k/i-rate.
A language statement is scheduled based on the rate of the variables it contains.
SAOL, SASL, and Scheduling:
Sound creation in MP4-SA can be compared to a musician playing notes on an instrument.
A SAOL subprogram (called an instr or instrument) serves as the instrument.
SASL commands (called score lines) act to play notes on SAOL instruments.
Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.
Single Note Execution Trace
SAOL Instruments ... Contain all the instructions for playing a note:
-- Code that runs at note launch. (once per i-pass)
-- Code that models timbre evolution at the k-rate. (once per k-pass)
-- Code to generate audio samples at the a-rate. (once per a-pass)
Executing a Note … (k-rate: 4 kHz, a-rate: 40 kHz)
  time(us)  pass
  0         i-pass
  0         k-pass
  0         a-pass
  25        a-pass
  50        a-pass
  ...
  225       a-pass
  250       k-pass
  250       a-pass
  275       a-pass
  300       a-pass
  ...
  475       a-pass
  500       k-pass
  500       a-pass
  525       a-pass
  ...
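The trace above can be reproduced with a few lines of scheduling logic. This is only a sketch of the pass ordering shown on the slide, not of a real decoder's scheduler; the function name is hypothetical, and it assumes the a-rate is an exact integer multiple of the k-rate.

```python
from fractions import Fraction


def note_trace(k_rate_hz, a_rate_hz, n_a_passes):
    """Return (time_us, pass_name) pairs for one note, in execution order."""
    a_per_k = a_rate_hz // k_rate_hz             # a-passes per k-pass (assumed exact)
    a_period_us = Fraction(1_000_000, a_rate_hz)  # exact, avoids float drift
    trace = []
    for n in range(n_a_passes):
        t = int(n * a_period_us)
        if n == 0:
            trace.append((t, "i-pass"))          # once, at note launch
        if n % a_per_k == 0:
            trace.append((t, "k-pass"))          # before the a-passes of its cycle
        trace.append((t, "a-pass"))
    return trace


for time_us, name in note_trace(4_000, 40_000, 12):
    print(f"{time_us:5d}  {name}")
```

With the slide's rates (4 kHz and 40 kHz) there are ten a-passes per k-pass, 25 us apart, so the second k-pass lands at 250 us.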
An example:
SAOL instrument tone, which plays a gated sine wave. (SAOL code in next slide.)
This SASL file plays a melody on tone:
  0.5 tone 0.75 52 0.25
  1.5 tone 0.75 64 0.25
  2.5 tone 0.5 63 0.25
  3 tone 0.25 59 0.2
  3.25 tone 0.25 61 0.225
  3.5 tone 0.5 63 0.225
  4 tone 0.5 64 0.25
  5 end
Fields on each score line, left to right:
  When instance is launched
  Instrument name
  How long instrument runs
  Instance parameters (note number, loudness)
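To make the field layout of the score concrete, here is a small sketch that splits the example score into its fields. It handles only the two line shapes that appear in this example (instrument lines and `end`); the full SASL language has other commands that this hypothetical `parse_score` helper ignores.

```python
SCORE = """\
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
"""


def parse_score(text):
    """Split score lines into (start, instr, duration, params); stop at 'end'."""
    notes, end_time = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[1] == "end":
            end_time = float(fields[0])
            break
        start, instr, dur = float(fields[0]), fields[1], float(fields[2])
        params = [float(p) for p in fields[3:]]   # here: note number, loudness
        notes.append((start, instr, dur, params))
    return notes, end_time


notes, end_time = parse_score(SCORE)
for start, instr, dur, params in notes:
    print(f"t={start}: launch {instr} for {dur} with {params}")
```

Each parsed tuple corresponds to one instance of the instrument, launched at `start` and running for `dur`.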
SAOL code for tone
instr tone (note, loudness)
{
  ivar a;        // sets osc f
  ksig env;      // env output
  asig x, y;     // osc state
  asig init;

  // i-rate
  a = 2*sin(3.141593*cpsmidi(note)/s_rate);

  // k-rate
  env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);

  // a-rate
  if (init == 0)     // first a-pass only
  {
    x = loudness;
    init = 1;
  }
  x = x - a*y;       // the FLOPS happen in
  y = y + a*x;       // these 3 statements
  output(y*env);     // creates audio output
}                    // end of instr tone
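For readers without a SAOL decoder at hand, the tone instrument can be approximated in Python. This is a sketch, not a conforming implementation: `cpsmidi` and `kline` here are hand-written stand-ins for the SAOL core opcodes of the same name, the envelope is hard-coded to the breakpoints used in the instrument, and the default rates (40 kHz and 4 kHz) are borrowed from the execution-trace slide.

```python
import math


def cpsmidi(note):
    """MIDI note number to frequency in Hz (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def kline(t, dur):
    """Envelope from the instrument: 0 -> 0.5 in 0.1 s, hold, 0.5 -> 0 in 0.1 s."""
    if t < 0.1:
        return 0.5 * t / 0.1
    if t < dur - 0.1:
        return 0.5
    return max(0.0, 0.5 * (dur - t) / 0.1)


def render_tone(note, loudness, dur, s_rate=40_000, k_rate=4_000):
    """Render one note of 'tone' as a list of samples (assumes dur >= 0.2 s)."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)  # i-rate: once per note
    x, y = loudness, 0.0                                  # state after the init block
    a_per_k = s_rate // k_rate
    env = 0.0
    out = []
    for n in range(int(dur * s_rate)):
        if n % a_per_k == 0:                              # k-rate: envelope update
            env = kline(n / s_rate, dur)
        x = x - a * y                                     # a-rate: oscillator update
        y = y + a * x
        out.append(y * env)                               # a-rate: audio output
    return out


samples = render_tone(note=69, loudness=0.25, dur=0.75)
print(len(samples), max(abs(s) for s in samples))
```

The two-multiply update of `x` and `y` is a recursive sine oscillator: with `a = 2*sin(pi*f/s_rate)` it oscillates at frequency `f`, so no call to `sin` is needed inside the sample loop.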
SAOL: Unique Features
Rate semantics: i/k/a-rate execution
Vector arithmetic: ex: A = B + C means, for i = 1..n, A[i] = B[i] + C[i]
All floating-point arithmetic.
Extensive built-in audio function library: signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...
SAOL: Unique Features
Instrument communication through bus structures.
[Diagram: A, B, C, and D connected through a bus]
Dynamic instrument creation and control.
Scheduler and language support for MIDI and SASL scores.
Sfront - a SAOL-to-C translator
[Diagram: foo.mp4 (SAOL, SASL, MIDI, uncompressed samples) -> sfront -> sa.c]
Converts MP4-SA files to a C program that, when executed, produces audio.
Handles SAOL, SASL, MIDI, uncompressed samples.
Runs on UNIX, Win98/NT.
Licensed under the GNU General Public License (GPL).
www.cs.berkeley.edu/~lazzaro/sa
Sfront Benchmarks
Sfront version 0.36
Machine: 450 MHz Pentium III, 128 MB, gcc version egcs-2.91.66, -O3 optimizer
Audio sample rate: 44.1 kHz for all examples
MP3 compression ratio = 11

Example   Description                                Length (s)  Ch  Timings          Audio bytes  MP3 bytes  MP4-SA bytes  MP4-SA ratio
abar      Aluminum bar strike, physical model        10          M   2.38    4.13     880132       79829      1353          650
beat      FM Models and Algorithmic Composition      60          S   7.51    11.51    10584380     960036     2382          4443
elpelele  Solo Piano with Multi-Sample Piano Sound   265         S   308     317.88   46840860     4248846    1711968       27
flute     Blown Pipe Physical Model                  10          M   0.82    2.23     880132       79823      1163          756
marimba   Marimba Physical Model                     10          M   2.08    3.75     880132       79823      1119          786
perc      Sampled Drums, Reverb and Spatialization   59          S   40.6    60.68    10384484     942069     1355803       7.6
pc        FM Models in Multi-voice Composition       67          S   91.4    96.17    11841500     1074152    27265         434
scr1      Piano with Single Sample                   128         S   43.2    46.73    22626428     2052591    316861        71.4
string    Plucked String Physical Model              10          M   2.37    4.15     880132       79824      1390          633
vowels    Vowel Formant Synthesizer                  10          M   0.89    2.35     880264       80033      1261          698
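The compression ratios in the benchmark figures can be checked by hand. The sketch below assumes that the two size figures per example are the uncompressed audio size and the MP4-SA file size, both in bytes; that reading is an interpretation of the benchmark numbers, though it does reproduce the listed ratios. Only four of the ten examples are included.

```python
# (example, audio bytes, MP4-SA bytes), as read from the benchmark figures above.
ROWS = [
    ("abar", 880132, 1353),          # physical model
    ("beat", 10584380, 2382),        # FM models
    ("elpelele", 46840860, 1711968),  # multi-sample piano
    ("perc", 10384484, 1355803),     # sampled drums
]


def compression_ratio(audio_bytes, sa_bytes):
    """Uncompressed audio size divided by MP4-SA file size."""
    return audio_bytes / sa_bytes


ratios = {name: compression_ratio(audio, sa) for name, audio, sa in ROWS}
for name, r in ratios.items():
    print(f"{name:10s} {r:8.1f}")
```

The spread is the point: the model-based examples compress by factors in the hundreds or thousands, while the sample-heavy ones (elpelele, perc) fall below the MP3 ratio of 11 in one case and only modestly above it in the other.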
Sfront Performance Summary:
Rendering (file decoding): Current performance: a benchmark suite of moderately complex MP4-SA streams computes in a time equivalent to the audio it generates, on a 400 MHz UltraSPARC & 450 MHz Pentium.
Real-time interaction: with a MIDI keyboard with acceptable latency (~20 ms) and microphone input.
Interesting Issues:
MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space.
  Physical modeling: good
  Sampling natural instruments: bad
If models are chosen carefully, compression ratios of 100 to 10,000 are possible.
Physical Modeling is relatively immature, but holds much promise.
Struck/Plucked Instrument Model
[Diagram: striker (attack section) feeding linear modes (resonances) M1, M2, M3, ..., Mn, leading to the output; inset plot of amplitude vs. frequency]
Digital resonator: Y[n] = Y[n-1] + Y[n-2] + X[n]
Examples: struck bars, bells, drums, plucked strings
Parameters: striker characteristics, resonator constants
Aluminum Bar Sounds: single strike, multiple strikes
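The resonator equation on this slide is shown without coefficients. As an illustration only, the sketch below uses the textbook two-pole form y[n] = b1*y[n-1] + b2*y[n-2] + x[n], with b1 = 2r cos(theta) and b2 = -r^2. That choice of coefficients, the single-sample striker, the function names, and the mode values in the usage line are all assumptions, not the authors' model.

```python
import math


def resonator(x, freq_hz, decay_s, s_rate=44_100):
    """Two-pole resonator: y[n] = b1*y[n-1] + b2*y[n-2] + x[n]."""
    r = math.exp(-1.0 / (decay_s * s_rate))    # pole radius; r < 1 gives a decaying mode
    theta = 2.0 * math.pi * freq_hz / s_rate   # pole angle sets the mode frequency
    b1, b2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b1 * y1 + b2 * y2 + xn
        out.append(yn)
        y2, y1 = y1, yn
    return out


def struck_bar(modes, n_samples, s_rate=44_100):
    """Sum a bank of resonators (M1..Mn) excited by a one-sample strike."""
    strike = [1.0] + [0.0] * (n_samples - 1)
    mix = [0.0] * n_samples
    for freq_hz, decay_s, gain in modes:
        for i, v in enumerate(resonator(strike, freq_hz, decay_s, s_rate)):
            mix[i] += gain * v
    return mix


# Hypothetical mode list: (frequency in Hz, decay time in s, gain).
bar = struck_bar([(441.0, 0.1, 1.0), (1200.0, 0.05, 0.5)], n_samples=4410)
print(len(bar), max(abs(v) for v in bar))
```

In this form the "resonator constants" are each mode's frequency and decay time, and the "striker characteristics" are whatever shapes the excitation signal; a longer or softer strike would replace the single impulse.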
Blown Instrument Model
Examples: pipes, flutes, etc.
[Diagram: excitation (jet) with a non-linear element mapping x to y, and a tube acting as a linear element (resonant modes); plots of y vs. x and of amplitude vs. frequency]
Parameters: shape of non-linear function, resonator constants
Blown Pipe Sounds: brass pipe, overblown
Physical Modeling Summary
Models the instrument, not the sound.
Advantages over traditional synthesis techniques (FM, sample-based):
  Compact descriptions.
  Physical parameterization leads to: more intuitive control, lower control bandwidth.
  State-accurate simulation leads to: efficiency in re-excitation, emulation of otherwise missing effects.
Ultimately - more realistic sounds.
Physical Modeling Summary (cont.)
Disadvantages: potential for high computational complexity.
Approaches:
  PDE (partial differential equation) approach would be nice, but probably not practical.
  ODE (ordinary differential equation, lumped circuit models) practical and very general. Captures essential physics.
  Wave-guide filters provide a more efficient alternative in some cases.
Interesting Issues (cont.):
MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
A new role for psychophysics: instead of using psychophysics to squeeze bits out of a sound representation, MP4-SA decoders will use psychophysics to squeeze FLOPS out of sound computations.
Leverage spectral and temporal masking.
Interesting Issues (cont.):
MP4-SA can be used in a way similar to traditional compression, except that the compression method can be ad hoc:
  Framework for experimentation in encoding.
  Hope for automatic encoding, if done in a voice-specific way: vocals, guitar, sax, and other hard-to-synthesize sounds.
Running SAOL on Conventional Architectures
Lessons learned from SAOL development:
  Temporal typing of variables has the nice side effect of marking the inner loops. Typically, a-rate = 10X to 100X k-rate.
  A-rate code optimization: moving subexpressions into k-rate or i-rate.
  SAOL semantics support a static heap. No recursion, all variables single-precision floats, no pointers ... simplifies optimization.
  Other researchers (Giorgio Zoia - ETH) are focusing on blocking all a-passes for an instance, reducing overhead.
  Processors with SIMD FP support (Intel SSE, AMD 3DNow!) will be a good match.
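The a-rate optimization mentioned above amounts to hoisting: anything in the sample loop that only changes per note or per k-pass is computed at that slower rate. A minimal sketch of the idea follows; the function names, the decay constant, and the ratio of ten a-passes per k-pass are illustrative assumptions and do not come from sfront.

```python
import math


def render_naive(freq_hz, n, a_per_k=10, s_rate=44_100):
    """Recompute everything on every a-pass."""
    out = []
    for i in range(n):
        env = math.exp(-0.001 * (i // a_per_k))   # changes only once per k-pass
        step = 2 * math.pi * freq_hz / s_rate     # never changes during the note
        out.append(env * math.sin(step * i))
    return out


def render_hoisted(freq_hz, n, a_per_k=10, s_rate=44_100):
    """Same output, with subexpressions moved to i-rate and k-rate."""
    step = 2 * math.pi * freq_hz / s_rate         # "i-rate": once per note
    env = 0.0
    out = []
    for i in range(n):
        if i % a_per_k == 0:                      # "k-rate": once per k-pass
            env = math.exp(-0.001 * (i // a_per_k))
        out.append(env * math.sin(step * i))      # only this stays at the a-rate
    return out


same = render_naive(440.0, 1000) == render_hoisted(440.0, 1000)
print("identical output:", same)
```

Because SAOL variables carry their rate in their type, a translator can find such subexpressions from the declarations alone, which is the "nice side effect" noted in the first bullet.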
Fixed-Function Hardware for SAOL Accelerators
Unlike MPEG-2 chips, DVD chips, etc., it's not clear how MP4-SA can be accelerated by rolling an ASIC, since every MP4-SA file is a new algorithm.
Common opcodes can be hardwired, and the general characteristics of typical MP4-SA files could be leveraged to specialize a conventional processor design. But the language is only six months old; execution frequencies are not known.
Reconfigurable computing architectures might hold promise (however, MP4-SA is all floating point).
Directions / Research Opportunities
Compiler optimizations for:
  SAOL and other languages with rate semantics
  high-performance SIMD architectures
  runtime code specialization
Runtime scheduling under limited compute resources.
SAOL programming environments.
Physical modeling.
Automatic encoding.