MPEG-4 Structured Audio
Eric D. Scheirer [email protected]
Machine Listening Group
MIT Media Laboratory
Editor, ISO 14496-3 (MPEG-4 Audio)
Project Bar-B-Q 1999
Guadalupe River Ranch
15 Oct 1999
MPEG-4 Structured Audio, A New Standard for Interactive Sound, in the Creation of Which Tom White did not Run the Whole Show, but Only Played a Small (Though Valuable) Part
What’s this all about?
MPEG-4 is not just about compression
MPEG-4 shows one way for the IA world to move beyond wavetable synthesis
Overview
What is MPEG?
What is MPEG-4 Structured Audio?
Why was it created?
How does it work?
How can it be used in IA applications?
What is its current status?
A brief note on MPEG-4 AudioBIFS
Intellectual property in MPEG-4
Structured Audio and AudioBIFS are free
  All patentable IP has been released to public domain
  No licensing or other costs to build tools & players
  (Standard itself costs $300 for printing/bureaucracy)
SA and AudioBIFS are open standards
  Companies competing through cooperation
  Interoperability makes the whole pie bigger
  MPEG processes for improving/correcting standard
  MIT has no veto over the future of the standard
What is MPEG?
MPEG is ISO/IEC JTC1 SC29 WG11
  A subcommittee of the Int’l Standards Organization
  The “Moving Picture Experts Group”
MPEG-1: 1993 (ISO 11172)
  Digital audio/video coding (MP3)
MPEG-2: 1994-7 (ISO 13818)
  Digital coding for broadcast
MPEG-4: 1998 (ISO 14496)
  Object based, synthetic/natural, interactive coding
MPEG Marketplace Model
[Diagram, shown in three builds. Labels: MPEG Committee; MPEG Standard; Server-side tools makers; Client-side tools makers; Authoring tools; Playback tools; Content developers; Content consumers; MPEG Content. The second build adds the callout “This talk”; the third adds the callout “The business opportunities”.]
MPEG-4 Audio
High-quality sound
  Based on MPEG-AAC algorithm: twice as good as MP3
Low-bitrate sound
  For WWW and cellular: speech/music as low as 4 kbps
Synthetic sound
  Interface to Text-to-Speech synthesizers
  High-quality audio synthesis with Structured Audio
AudioBIFS
  Mix and postproduce multi-track sound streams
MPEG-4 Structured Audio
Transmit structured description of sound
  Use real-time synthesis to play sound
  “PostScript for audio”
Based on new (to MPEG) technology
  SAOL: New music synthesis language
  SASL: New music control format
A lot of related technology in academia
  Csound, Music-11, SynthScript, Nyquist, CLM, ...
Standardization goals
Provide synthetic sound in MPEG-4
Bring algorithmic synthesis to wider community
  Standardize academic state-of-the-art; don’t innovate
Get new companies to work on synthesis
  Implementation required for full MPEG-4 system
Set a higher bar for PC sound architecture
  Drive forward the world of sound on PCs!
[Slide callouts grouping the goals above: “Stated goals”; “Secret goals”]
MPEG-4 SA decoding process
[Diagram. Labels: Bitstream header; Bitstream; SAOL Decoder; SASL/MIDI Decoder; Control parameters; Samples; Reconfigurable Synthesis Engine; Multichannel high-quality audio.]
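The decoding flow named in the diagram can be illustrated with a toy Python sketch. It is not the normative decoder: `SynthesisEngine`, `sine_instr`, the dictionary standing in for a decoded bitstream header, and the tuple format of the score events are all hypothetical names chosen for illustration. The only idea taken from the slide is that one part of the stream configures the synthesis engine and another part supplies control events that drive it.

```python
import math

SAMPLE_RATE = 8000  # assumed output rate for this sketch


def sine_instr(freq, amp, n_samples):
    """Hypothetical instrument: a plain sine tone of n_samples samples."""
    return [amp * math.sin(2.0 * math.pi * freq * n / SAMPLE_RATE)
            for n in range(n_samples)]


class SynthesisEngine:
    """Toy stand-in for a reconfigurable synthesis engine."""

    def __init__(self):
        self.instruments = {}

    def configure(self, header):
        # 'header' plays the role of the decoded bitstream header:
        # a mapping from instrument name to a callable that makes samples.
        self.instruments.update(header)

    def render(self, score, total_seconds):
        # 'score' plays the role of decoded control events:
        # (start time in s, instrument name, duration in s, instrument args).
        out = [0.0] * int(total_seconds * SAMPLE_RATE)
        for start, name, dur, args in score:
            note = self.instruments[name](*args, int(dur * SAMPLE_RATE))
            offset = int(start * SAMPLE_RATE)
            for i, sample in enumerate(note):
                if offset + i < len(out):
                    out[offset + i] += sample  # mix the note into the output
        return out


engine = SynthesisEngine()
engine.configure({"tone": sine_instr})
audio = engine.render(
    [(0.0, "tone", 0.5, (440.0, 0.5)), (0.5, "tone", 0.5, (880.0, 0.25))],
    total_seconds=1.0,
)
```

Swapping the dictionary passed to `configure` changes what the same score sounds like, which is the sense in which the engine is reconfigurable in this sketch.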
What SAOL looks like
A C-like language
Based on the Music-N model
Variables hold audio signals
Unit generators do basic functions
Instruments controlled by score or MIDI
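instr beep(mp, vol) {
  // mp: MIDI pitch number; vol: peak amplitude
  asig wave;                        // audio-rate signal
  ksig env;                         // control-rate signal
  table sig(harm, 2048, 1, 1);      // 2048-point wavetable built by the harm generator
  wave = oscil(sig, cpsmidi(mp));   // oscillator at the pitch's frequency in Hz
  env = kline(0, dur*0.05, vol,     // attack: ramp up to vol over 5% of the note
              dur*0.6, vol,         // sustain: hold vol for 60%
              dur*0.35, 0);         // release: ramp down to 0 over the last 35%
  output(wave * env);
}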
SAOL: Structured Audio Orchestra Language
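For readers who do not know the Music-N family, the following Python sketch approximates what the `beep` instrument computes. It is an illustration under assumptions, not a SAOL implementation: `harm_table`, `kline_at`, and `SAMPLE_RATE` are names invented here, the table is assumed to be a sum of the first two harmonics at equal amplitude, and the envelope is evaluated at every sample rather than at a separate control rate.

```python
import math

SAMPLE_RATE = 8000  # assumed audio rate, kept low so the sketch runs quickly


def cpsmidi(note):
    """Convert a MIDI note number to Hz (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def harm_table(size, *amps):
    """One cycle of a waveform built as a sum of harmonics with the given amplitudes."""
    return [
        sum(a * math.sin(2.0 * math.pi * (k + 1) * i / size)
            for k, a in enumerate(amps))
        for i in range(size)
    ]


def kline_at(t, start, *segments):
    """Piecewise-linear envelope value at time t; segments are dur1, val1, dur2, val2, ..."""
    prev = start
    for i in range(0, len(segments), 2):
        seg_dur, target = segments[i], segments[i + 1]
        if t < seg_dur:
            return prev + (target - prev) * t / seg_dur
        t -= seg_dur
        prev = target
    return prev  # past the last segment: hold the final value


def beep(mp, vol, dur):
    """Render one note: a table-lookup oscillator shaped by the envelope."""
    table = harm_table(2048, 1, 1)
    freq = cpsmidi(mp)
    phase = 0.0  # position within one cycle, in [0, 1)
    out = []
    for n in range(int(dur * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        env = kline_at(t, 0, dur * 0.05, vol, dur * 0.6, vol, dur * 0.35, 0)
        wave = table[int(phase * len(table)) % len(table)]
        out.append(wave * env)
        phase = (phase + freq / SAMPLE_RATE) % 1.0
    return out


samples = beep(69, 0.5, 1.0)  # one second of A440 at half amplitude
```

The point of the comparison is how much of this the SAOL version gets from built-in unit generators: the oscillator, the envelope, and the pitch conversion are each a single call there.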
SAOL capabilities
Many nice features built in
  Wavetable manipulation
  FFT/IFFT
  Multitap delay lines
  Arrays of signals
  FIR & IIR filters
  Effects routing
  Granular synthesis
  3-D audio interface
  Dynamic layering and triggering
SAOL is extensible-from-within
  (Allows encapsulation and structured programming)
Any kind of synthesis can be used in SAOL
Example
“Xanadu” (Joseph Kung)
  60 seconds long, 44 kHz stereo (10.5 MB as WAVE)
  2.2 KB in header
  4.2 KB in bitstream (= 0.07 KB/s, about 0.56 kbps)
  No samples anywhere, only algorithmic synthesis
More than 1200:1 “compression”, no loss of quality
  Could be controlled/restructured interactively
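The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes 16-bit PCM at 44.1 kHz for the WAVE file and treats 1 KB as 1000 bytes; neither assumption is stated on the slide. Note that 4.2 KB spread over 60 seconds is 0.07 kilobytes per second, which is about 0.56 kilobits per second.

```python
SAMPLE_RATE_HZ = 44_100   # assumption: "44 kHz" means 44.1 kHz
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM
DURATION_S = 60

# Size of the uncompressed audio data (ignoring the small WAVE file header).
wave_bytes = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * DURATION_S

header_bytes = 2.2e3      # assumption: 1 KB = 1000 bytes
bitstream_bytes = 4.2e3
structured_bytes = header_bytes + bitstream_bytes

ratio = wave_bytes / structured_bytes
bitstream_kbytes_per_s = bitstream_bytes / DURATION_S / 1000
bitstream_kbits_per_s = bitstream_bytes * 8 / DURATION_S / 1000

print(f"WAVE size:       {wave_bytes / 1e6:.2f} MB")
print(f"Structured size: {structured_bytes / 1e3:.1f} KB")
print(f"Ratio:           {ratio:.0f}:1")
print(f"Bitstream rate:  {bitstream_kbytes_per_s:.2f} KB/s = {bitstream_kbits_per_s:.2f} kbit/s")
```

Under these assumptions the ratio comes out above the slide's "more than 1200:1", so the slide's claim is conservative.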
MPEG-MMA relationship
MIDI can control MPEG-4 SA synth
  SASL = more flexible, more tightly coupled
DLS-2 synthesis embedded in SA synth
  Do wavetable in series or parallel with other techniques
“Wavetable-only” profile of MPEG-4
  MIDI + DLS-2 + compressed audio + video (no SAOL)
  Logical path of progression from today to tomorrow
Lots of help from MMA - appreciated!
  MPEG is ready to help in the other direction (MIDI-DLA?)
Applications ideas
MPEG-4 is not an application!
  It’s a tool - enables functionality and interoperability
  Implementations could be hardware, software, both
  Authoring tools also very important
Use MPEG-4 SA like Staccato Synthcore
Use MPEG-4 SA like Beatnik
Use MPEG-4 SA like Koan
Use MPEG-4 SA for new music applications
Application example: Gaming
[Diagram. Labels: Host program (game); MPEG-4 enabled sound card; Startup: MPEG-4 synthesis/effects algorithms; Runtime: MPEG-4 & MIDI controls; Multichannel, 3-D, post-processed sound; MPEG-4 algorithm and sample editors; MPEG-4 algorithm marketplace.]
Not just music -- parametric sound effects as well
All audio programming and asset development in SAOL
No host-language audio programming needed
Host APIs (e.g. DirectMusic) can generate controls
Embedded MPEG-4 side can do this too, if useful
Current status
Standard and reference software finished
Many implementation projects starting
  Creative Tech Center: Compression & Interactive Audio
  Studer + EPFL: “ThreeDSpace” project
  Hobbyist projects (Java API, ActiveX plugin)
  Others: Be Inc., Sseyo, Kings College, UC Berkeley, Catholic U. Leuven, Q-Team DE, Nokia, ...
  3 complete implementations already!
A few authoring tools projects
Active mailing list for developers
A brief note on AudioBIFS
BIFS is scene-description part of MPEG-4
  “Binary Format for Scenes”
  Based on VRML, but with many new features
AudioBIFS is the audio mixing part
  Stream audio in multitrack format
  Deliver mixdown instructions in AudioBIFS
  Mixing, spatialization, effects in SAOL, multichannel
  Terminal-adaptive capability
  Candidate for “PC DSP architecture”?
AudioBIFS - scene graph model
[Diagram. Labels: Natural Decoder; Synthetic Decoder; AudioSource; AudioSource; AudioBIFS manipulation; Sound.]
Streaming compressed audio & synthesis controls
Decode into raw audio samples
Inject sound into scene graph
Create sound object with AudioBIFS (mixing, filtering, reverb, etc)
Attach sound to main scene (spatially position if desired)
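The steps listed for this diagram can be sketched as a tiny scene graph in Python. This is a toy model only: the `AudioSource` and `Sound` class names are borrowed from the diagram labels, while `Mix`, the `render` method, and the distance-based gain are hypothetical and do not reproduce the normative AudioBIFS nodes or their fields.

```python
class AudioSource:
    """Leaf node: injects already-decoded samples into the graph."""

    def __init__(self, samples):
        self.samples = list(samples)

    def render(self):
        return list(self.samples)


class Mix:
    """Hypothetical mixing node: weighted sum of its children's samples."""

    def __init__(self, children, gains):
        self.children = children
        self.gains = gains

    def render(self):
        rendered = [child.render() for child in self.children]
        length = max(len(r) for r in rendered)
        out = [0.0] * length  # shorter inputs are treated as zero-padded
        for gain, samples in zip(self.gains, rendered):
            for i, s in enumerate(samples):
                out[i] += gain * s
        return out


class Sound:
    """Top node: attaches the mixed audio to the scene at some distance."""

    def __init__(self, child, distance=0.0):
        self.child = child
        self.distance = distance

    def render(self):
        gain = 1.0 / (1.0 + self.distance)  # hypothetical attenuation rule
        return [gain * s for s in self.child.render()]


natural = AudioSource([1.0, 1.0])          # output of a natural-audio decoder
synthetic = AudioSource([0.5, 0.5, 0.5])   # output of a synthetic decoder
scene_sound = Sound(Mix([natural, synthetic], gains=[0.5, 1.0]), distance=1.0)
mixed = scene_sound.render()
```

Because each node only asks its children for samples, natural and synthetic streams are handled identically once decoded, which mirrors the slide's point that both feed the same graph.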
Summary
MPEG-4 Structured Audio
  The international standard for algorithmic sound synthesis
MPEG-4 AudioBIFS
  The international standard for audio postproduction
New market opportunities for
  Hardware/software MPEG-4 players (embedded or not)
  Authoring tools (editors, sequencers)
  Advanced interactive audio content
What was this all about?
MPEG-4 is not just about compression
MPEG-4 shows one way for the IA world to move beyond wavetable synthesis
For more information
MPEG home page
  http://www.cselt.it/mpeg
  Requirements, future of MPEG
MPEG-4 SA home page
  http://sound.media.mit.edu/mpeg4
  Draft standard, code, mailing lists, matchmaking
[email protected], technical papers, discussion available