Paris Video Tech #2 - Presentation by Jean-Yves Avenard
Behind the scenes: The Media stack in Firefox
What we support
Out of the box, Firefox can handle the following codecs:
• Video: VP8 (ffvp8), VP9 (ffvp9) and Theora (libtheora)
• Audio: Vorbis (libvorbis), Opus (libopus) and FLAC (ffmpeg)
Firefox relies on installed system frameworks for H264, AAC and MP3:
• Windows: Media Foundation Transform (MFT). Supports hardware acceleration in combination with D3D9 and D3D11. Not available on XP. The N and KN editions (sold in Europe and Korea) require installing extra packages.
• Mac: VideoToolbox (supports hardware acceleration) and CoreMedia.
• Linux and others: FFmpeg. Software decoding only.
Media Source Extensions
Everything in the current draft specification is supported except:
• MPEG-TS
• Raw AAC and MP3 streams
• Anything related to the Track elements
Limitations:
• All multi-channel audio tracks are downmixed to stereo.
• Only one source buffer type (audio or video) at once.
Media Source Extensions
Always supported when we have local decoders:
video/mp4: H264, AAC, MP3. Soon Opus and FLAC.
video/webm: VP8, VP9, Vorbis and Opus.
Note for WebM: the VP8 and VP9 codecs are only available by default if one of these conditions is true:
• No H264 decoder found
• No hardware acceleration (typically blacklisted drivers)
• Machine is deemed fast enough
• The media.mediasource.webm.enabled preference is set to true
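Because the available containers and codecs depend on the platform decoders and on the conditions above, a page can probe support before creating its source buffers. A minimal sketch, assuming a hypothetical helper pickSupportedType and an illustrative candidate list (neither comes from the talk). The probe is passed in so the snippet also runs outside a browser; in a page you would pass a wrapper around MediaSource.isTypeSupported.

```typescript
// Probe signature. In a browser: (t) => MediaSource.isTypeSupported(t).
type TypeProbe = (mimeType: string) => boolean;

// Illustrative candidates, ordered by preference (an assumption, not from the talk).
const CANDIDATES: string[] = [
  'video/webm; codecs="vp9, opus"',
  'video/webm; codecs="vp8, vorbis"',
  'video/mp4; codecs="avc1.42E01E, mp4a.40.2"',
];

// Hypothetical helper: returns the first candidate the probe accepts,
// or undefined when none is supported.
function pickSupportedType(
  candidates: string[],
  isSupported: TypeProbe
): string | undefined {
  for (let i = 0; i < candidates.length; i++) {
    const candidate = candidates[i];
    if (isSupported(candidate)) {
      return candidate;
    }
  }
  return undefined;
}
```

In a page this would be called as pickSupportedType(CANDIDATES, (t) => MediaSource.isTypeSupported(t)), and the result passed to addSourceBuffer.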
HTML5 Media Element Architecture (Plain)
All operations between the media element and the media stack are asynchronous and use a Promise-like communication mechanism.
[Diagram components: JS; HTML Media Element (manages events and user operations); Media Stack (loading, demuxing, decoding); Video Compositor; Audio Renderer. Attributes shown: currentTime, readyState. Operations shown: Load, Play / Pause, Seek.]
Media Stack (plain)
Asynchronous. Heavily multi-threaded.
[Diagram components: MediaResource, MediaCache, MediaDecoderStateMachine, MediaFormatReader, MediaDataDemuxer, MediaDataDecoder (several instances), Platform Module.]
HTML5 Media Element Architecture (MSE)
All operations between the media element and the media stack are asynchronous and use a Promise-like communication mechanism.
[Diagram components: JS; MediaSource with two SourceBuffers; HTML Media Element (manages events and user operations); Media Stack (loading, demuxing, decoding); Video Compositor; Audio Renderer. Attributes shown: currentTime, readyState. Operations shown: Load, Play / Pause, Seek.]
Media Stack (MSE)
Asynchronous. Heavily multi-threaded.
[Diagram components: MediaSourceResource, TrackBuffer, MediaDecoderStateMachine, MediaFormatReader, MediaDataDemuxer, MediaDataDecoder (several instances), Platform Module.]
Implementation Notes
• Mostly written in C++
• All demuxers are written in house. While we often use external libraries to provide core features, we control the entire demuxing chain.
MSE Implementation Notes
Eviction strategies:
• In Firefox 50 and earlier: 100MB video source buffer, 30MB audio source buffer (both were 100MB in 48 and earlier).
• In Firefox 51 and later: 100MB video, 10MB audio.
First, attempt to evict data located before currentTime.
Second, attempt to evict future data found after a discontinuity.
In the future, we are considering dropping the fixed size and instead basing eviction on the duration of data buffered (e.g. 30s for both audio and video).
Combined maximum buffer size shared across all source buffers.
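The two-step eviction order above can be sketched as follows. This is not Gecko's code: the BufferedRange shape, the planEviction name, evicting whole ranges only, and picking the furthest future range first are all assumptions made to keep the example small. Each entry models one contiguous buffered range, so any range starting after currentTime is separated from the playback position by a discontinuity.

```typescript
// One contiguous buffered range (hypothetical model, not a Gecko type).
interface BufferedRange {
  start: number; // seconds
  end: number;   // seconds
  bytes: number; // size of the data held in this range
}

// Hypothetical planner: returns the ranges to drop, in eviction order,
// until at least bytesNeeded bytes would be freed (or nothing is left).
function planEviction(
  ranges: BufferedRange[],
  currentTime: number,
  bytesNeeded: number
): BufferedRange[] {
  const sorted = ranges.slice().sort((a, b) => a.start - b.start);
  const evicted: BufferedRange[] = [];
  let freed = 0;

  // Step 1: data located before currentTime, oldest first.
  for (const r of sorted) {
    if (freed >= bytesNeeded) break;
    if (r.end <= currentTime) {
      evicted.push(r);
      freed += r.bytes;
    }
  }

  // Step 2: future data found after a discontinuity.
  // Assumption: start with the range furthest from the playback position.
  for (const r of sorted.slice().reverse()) {
    if (freed >= bytesNeeded) break;
    if (r.start > currentTime) {
      evicted.push(r);
      freed += r.bytes;
    }
  }

  return evicted;
}
```

The range containing currentTime is never selected, so a call can return fewer bytes than requested; a real implementation would then have to reject the append.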
Media Most Common Issues
• Buggy video drivers
  Solutions: blacklisting, out-of-process decoding.
• Unsupported media file
  Solutions: Decoder: tough luck. Demuxer: fix it.
• Security
  Solutions: rewriting some components in Rust.
MSE Most Common Issues
• Bad muxing, in particular invalid tagging of keyframes.
• Invalid timestamps or gaps in data (in Firefox 51 and earlier, playback will not go over a gap larger than 125ms; 500ms in 52).
• Having to rely on platform decoders, with their limitations or unique behaviour, especially on Windows.
• Chrome-centric code, or code relying on invalid Chrome behaviour.
• Not listening to appendBuffer events, especially buffer full.
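For the gap issue in the list above, a page can check its buffered ranges for holes that playback would stall on. A minimal sketch, assuming a hypothetical findBlockingGaps helper and a plain array of ranges; the 125ms and 500ms limits come from the list above, while treating a gap exactly equal to the limit as playable is an assumption.

```typescript
// Plain time interval in seconds (hypothetical type for this sketch).
interface TimeSpan {
  start: number;
  end: number;
}

// Hypothetical helper: returns every hole between consecutive buffered
// ranges that is strictly larger than maxGapSeconds.
function findBlockingGaps(buffered: TimeSpan[], maxGapSeconds: number): TimeSpan[] {
  const sorted = buffered.slice().sort((a, b) => a.start - b.start);
  const gaps: TimeSpan[] = [];
  for (let i = 1; i < sorted.length; i++) {
    const gapStart = sorted[i - 1].end;
    const gapEnd = sorted[i].start;
    if (gapEnd - gapStart > maxGapSeconds) {
      gaps.push({ start: gapStart, end: gapEnd });
    }
  }
  return gaps;
}
```

In a browser the input would be built from video.buffered by reading start(i) and end(i) for each index up to buffered.length. Passing 0.125 models Firefox 51 and earlier, 0.5 models Firefox 52.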
HTML5 Media Element Architecture (EME)
EME only works in combination with MSE.
[Diagram components: JS; MediaKeys with two MediaKey sessions; HTML Media Element (manages events and user operations); Media Stack (loading, demuxing, decoding); Video Compositor; Audio Renderer. Attributes shown: currentTime, readyState. Operations shown: Load, Play / Pause, Seek.]
Media Stack (EME)
The CDM runs in its own child process, within a sandbox.
Decrypted and decoded data is fed back into our media stack for rendering.
[Diagram components: MediaSourceResource, TrackBuffer, MediaDecoderStateMachine, MediaFormatReader, MediaDataDemuxer, Platform Module, EMEVideoDataDecoder, EMEAudioDataDecoder, two MediaKey sessions.]
EME Support
• Currently supported CDMs: Google’s Widevine, Adobe’s Primetime and ClearKey.
• No access to Microsoft PlayReady or Apple FairPlay. This prevents us from having access to hardware decoding for encrypted content.
• Netflix only delivers 720p; the same goes for Amazon with some content.
Gecko Future Improvements
• Out of process GPU decoding. When a driver crashes we can immediately recover with zero visible consequences
• Suspend decoding for videos when in the background to reduce CPU usage and increase battery life
• E10S: increasing the number of content processes
How can you help yourselves
• Test using Firefox!
• Firefox's MSE implementation is very rigorous and 100% per spec.
• If it works in Firefox it will work with other compliant browsers. It’s also more likely to work with all other browsers.
• You’re better off testing with Firefox