Introduction to Multimedia Synchronization
Klara Nahrstedt, CS598KN
Content
Notion of Synchronization
Intra-object and Inter-object Synchronization
Live and Synthetic Synchronization
Synchronization Requirements
Reference Model for Synchronization
Synchronization in Distributed Environments
Synchronization Specification
Notion of Synchronization
Multimedia synchronization is understood in terms of three relations: the content relation, the spatial relation, and the temporal relation
Content Relation: defines a data dependency between media objects
– Example: the dependency between a filled spreadsheet and graphics that represent the data listed in the spreadsheet
Spatial Relation
Spatial relation is represented through layout relation and defines space used for presentation of a media object on an output device at a certain point of time in a multimedia document
– Example: desktop publishing
– Layout frames are placed on an output device, and content is assigned to each frame
Positioning of layout frames:
– Fixed to a position in the document
– Fixed to a position on a page
– Relative to the position of another frame
The concept of frames is also used for positioning time-dependent objects
– Example: in window-based systems, layout frames correspond to windows, and a video can be positioned in a window
Temporal Relation
Temporal relation defines temporal dependencies between media objects
– Example: lip synchronization
This relation is the focus of our papers; we will not talk about the content or spatial relations
Time-dependent objects represent a media stream, because temporal relations exist between the consecutive units of the stream
Time-independent objects are traditional media such as images or text.
Temporal Relations (2)
Temporal synchronization is supported by many system components:
– OS (CPU scheduling, semaphores during IPC)
– Communication systems (traffic shaping, network scheduling)
– Databases
– Document handling
Synchronization needed at several levels of a Multimedia System
Temporal Relations (3)
Level 1: the OS and lower communication layers handle single streams
– Objective: avoid jitter at the presentation time of one stream
Level 2: on top of this sits the run-time support for the synchronization of multimedia streams (schedulers)
– Objective: bounded skews between the various streams
Level 3: the next level holds the run-time support for synchronization between time-dependent and time-independent media, together with the handling of user interaction
– Objective: bounded skews between time-dependent and time-independent media
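The level-1 objective (no jitter within one stream) is typically met with a playout buffer: every LDU is scheduled at a fixed offset after the first arrival, so playout deadlines are equidistant as long as the offset exceeds the worst-case jitter. A minimal sketch, with illustrative names and timings:

```python
# Sketch of level-1 intra-stream jitter removal with a playout buffer.
# LDUs arrive with network jitter; LDU i is played at
# first_arrival + offset + i * period, which is jitter-free as long as
# the offset is larger than the worst-case jitter.

def playout_times(arrivals_ms, period_ms, offset_ms):
    """Map jittered arrival times to equidistant playout deadlines."""
    base = arrivals_ms[0] + offset_ms
    return [base + i * period_ms for i in range(len(arrivals_ms))]

# 40 ms video frames arriving with up to ~15 ms of jitter:
arrivals = [0, 42, 95, 121, 168]
deadlines = playout_times(arrivals, period_ms=40, offset_ms=20)
print(deadlines)                                         # [20, 60, 100, 140, 180]
print(all(d >= a for a, d in zip(arrivals, deadlines)))  # True: every LDU is present in time
```

Choosing the offset trades added end-to-end delay against robustness to late LDUs, which is exactly the gap-versus-delay tension discussed later.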
Specification of Synchronization
Implicit Specification
– Temporal relations may be specified implicitly during the capturing of the media objects; the goal of the presentation is to present the media in the same way as they were originally captured
– Examples: audio/video recording and playback, VOD applications
Explicit Specification
– Temporal relations may be specified explicitly for presentations composed of independently captured or otherwise created objects
– Example: a slide show, where the presentation designer
  – selects appropriate slides
  – creates the accompanying audio
  – defines the units of the audio presentation stream
  – defines the points in the audio stream at which the slides have to be presented
Inter-object and Intra-Object Synchronization
Intra-object synchronization refers to the time relation between various presentation units of one time-dependent media object
Inter-object synchronization refers to the synchronization between media objects
[Figure: parallel timeline of Audio 1 (40 ms LDUs), video, slides, an animation, and Audio 2]
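The two notions can be made concrete on timestamped LDUs (a sketch; all names and numbers are illustrative): intra-object synchronization bounds the spacing of consecutive LDUs within one stream, while inter-object synchronization bounds the offset between corresponding LDUs of different streams.

```python
# Intra-object sync: deviation of consecutive LDU spacing from the nominal
# period within one stream. Inter-object sync: offset between corresponding
# LDUs of two streams.

def intra_object_jitter(timestamps_ms, period_ms):
    """Worst deviation of consecutive LDU spacing from the nominal period."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return max(abs(g - period_ms) for g in gaps)

def inter_object_skew(stream_a_ms, stream_b_ms):
    """Largest offset between corresponding LDUs of two streams."""
    return max(abs(a - b) for a, b in zip(stream_a_ms, stream_b_ms))

audio = [0, 40, 80, 120]     # audio LDUs every 40 ms, as in the figure
video = [5, 47, 83, 126]     # video LDUs, slightly late and uneven
print(intra_object_jitter(video, 40))   # 4
print(inter_object_skew(audio, video))  # 7
```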
Classification of Synchronization Units
Logical Data Units (LDUs), at increasing granularity:
– Samples or pixels
– Notes or frames
– Movements or scenes
– A symphony or a movie
Further distinctions:
– Fixed LDUs vs. variable LDUs
– LDUs specified during recording vs. LDUs defined by the user
– Open LDUs vs. closed LDUs
Live Synchronization
The goal is to exactly reproduce at the presentation the temporal relations as they existed during the capturing process
Temporal relation information must therefore be captured during capturing
Live sync is needed in conversational services
– Video conferencing, video phone
– Recording and retrieval applications are instead considered retrieval services, i.e., presentations with delay
Synthetic Synchronization
Temporal relations are artificially specified
Often used in presentation- and retrieval-based systems with stored data objects that are arranged to provide new combined multimedia objects
– Authoring and tutoring systems
Synchronization editors are needed to support flexible synchronization relations between media
Two phases: (1) the specification phase defines the temporal relations, (2) the presentation phase presents the data in a synchronized mode
– Example: 4 recorded audio messages relate to parts of an engine in an animation; the animation sequence shows a slow 360-degree rotation of the engine.
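The engine example can be written down as a tiny synthetic specification (a sketch; the clip names, angles, and rotation duration are illustrative assumptions). The specification phase records which audio message belongs to which rotation angle; the presentation phase turns the relations into start times once the rotation speed is known.

```python
# Specification phase: artificial temporal relations, here expressed as
# (audio clip, rotation angle at which the clip starts). All values are
# illustrative, not from the lecture.
spec = [("audio_1", 0), ("audio_2", 90), ("audio_3", 180), ("audio_4", 270)]

def presentation_schedule(spec, rotation_duration_s):
    """Presentation phase: map angles to start times for one 360-degree turn."""
    return {clip: rotation_duration_s * angle / 360 for clip, angle in spec}

print(presentation_schedule(spec, rotation_duration_s=40.0))
# {'audio_1': 0.0, 'audio_2': 10.0, 'audio_3': 20.0, 'audio_4': 30.0}
```

Because the relations are stored symbolically (angles, not wall-clock times), the same specification yields a consistent presentation at any rotation speed.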
Synchronization Requirements
For intra-object synchronization:
– Accuracy concerning jitter and end-to-end delays in the presentation of LDUs
For inter-object synchronization:
– Accuracy in the parallel presentation of media objects
Implications of a blocking method:
– Fine for time-independent media
– Gap problem for time-dependent media
What does the blocking of a stream mean for the output device?
– Should previous parts be repeated in the case of speech or music?
– Should the last picture of a stream be shown?
– How long may such a gap exist?
Synchronization Requirements (2)
Solutions to the gap problem:
– Restricted blocking method
  – Switch to an alternative presentation if the gap between late video and audio exceeds a predefined threshold
  – Show the last picture as a still image
– Re-sampling of a stream
  – Speed up or slow down streams for the purpose of synchronization
  – Off-line re-sampling: used after the capturing of media streams with independent devices
    Example: a concert captured with two independent audio and video devices
  – Online re-sampling: used during a presentation when a gap between media streams occurs at run time
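Online re-sampling amounts to nudging the playback rate of the late stream so the gap closes gradually. A minimal sketch, assuming an illustrative correction window and a 5% cap on the rate change (both are assumptions, not values from the lecture):

```python
# Online re-sampling sketch: choose a playback-rate factor that removes the
# detected skew over a correction window, clamped so the speed change stays
# small enough to be imperceptible.

def correction_rate(skew_ms, window_ms, max_deviation=0.05):
    """Playback-rate factor that removes `skew_ms` over `window_ms`.

    skew_ms > 0 means the stream is behind and must briefly speed up;
    the factor is clamped to 1 +/- max_deviation.
    """
    rate = 1.0 + skew_ms / window_ms
    return max(1.0 - max_deviation, min(1.0 + max_deviation, rate))

print(round(correction_rate(40, 2000), 4))    # 1.02 -> play 2% faster for 2 s
print(round(correction_rate(-500, 2000), 4))  # 0.95 -> slow-down clamped at 5%
```

A large skew therefore cannot be corrected instantly; it is spread over several correction windows, which is why the restricted-blocking alternatives above remain necessary for severe gaps.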
Synchronization Requirements (3)
Lip synchronization requirements refer to the temporal relation between the audio and video streams of a human speaking
The time difference between related audio and video LDUs is called the synchronization skew
Streams are in sync if the skew lies within a bound, and out of sync if it exceeds a larger bound
Bounds:
– Audio/video in sync: -80 ms ≤ skew ≤ 80 ms
– Audio/video out of sync: skew < -160 ms or skew > 160 ms
– Transient: -160 ms ≤ skew < -80 ms, or 80 ms < skew ≤ 160 ms
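The bounds above translate directly into a three-state classifier (a sketch; skew is signed in ms, following the slide's bounds):

```python
# Classify audio/video skew against the lip-sync bounds given above:
# |skew| <= 80 ms is in sync, |skew| in (80, 160] ms is transient,
# |skew| > 160 ms is out of sync.

def lip_sync_state(skew_ms):
    if -80 <= skew_ms <= 80:
        return "in sync"
    if -160 <= skew_ms < -80 or 80 < skew_ms <= 160:
        return "transient"
    return "out of sync"

print(lip_sync_state(40))    # in sync
print(lip_sync_state(-120))  # transient
print(lip_sync_state(200))   # out of sync
```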
Synchronization Requirements (4)
Pointer synchronization requirements are very important in computer-supported cooperative work (CSCW)
We need synchronization between graphics, pointers, and audio
Comparison:
– Lip-sync error is a skew between 40 and 60 ms
– Pointer-sync error is a skew between 250 and 1500 ms
Bounds:
– Pointer/audio/graphics in sync: -500 ms ≤ skew ≤ 750 ms
– Out of sync: skew < -1000 ms or skew > 1250 ms
– Transient: -1000 ms ≤ skew < -500 ms, or 750 ms < skew ≤ 1250 ms
Synchronization Requirements (5)
Digital audio on CD-ROM:
– Maximum allowable jitter in perception experiments: 5-10 ns; other experiments suggest 2 ms
The combination of audio and animation is not as stringent as lip synchronization:
– Maximum allowable skew: +/- 80 ms
Stereo audio is tightly coupled:
– Maximum allowable skew: 20 ms; because of listening errors, the suggested skew is +/- 11 ms
Loosely coupled audio channels (speaker and background music):
– Maximum allowable skew: 500 ms
Synchronization Requirements (6)
Production-level synchronization should be guaranteed prior to the presentation of data at the user interface
– For example, when recording synchronized data for subsequent playback
– Stored data should be captured and recorded with no skew
– For playback, the defined lip-sync boundaries are 80 ms
– For simultaneous playback at a local and a remote workstation, the sync skew should be between -160 ms and 0 ms (video should be ahead of audio for the remote station due to pre-fetching)
Synchronization Requirements (7)
Presentation-level synchronization should be defined at the user interface
This synchronization focuses on human perception
– Examples:
  – Video and image, overlay: +/- 240 ms
  – Video and image, non-overlay: +/- 500 ms
  – Audio and image (music with notes): +/- 5 ms
  – Audio and slide show (loosely coupled image): +/- 500 ms
  – Audio and text (text annotation): +/- 240 ms
Reference Model for Synchronization
Synchronization of multimedia objects is classified with respect to a four-level system:
SPECIFICATION LEVEL: an open layer comprising applications and tools that allow the creation of synchronization specifications (e.g., sync editors); editing and formatting; mapping of user QoS to abstractions at the object level
OBJECT/SERVICE LEVEL: operates on all types of media and hides the differences between discrete and continuous media; plans, coordinates, and initiates presentations
STREAM LEVEL: operates on multiple media streams and provides inter-stream synchronization; resource reservation and scheduling
MEDIA LEVEL: operates on a single stream, treated as a sequence of LDUs, and provides intra-stream synchronization; file and device access
Synchronization in Distributed Environments
Information of synchronization must be transmitted with audio and video streams, so that the receiver side can synchronize the streams
Delivery of the complete sync information can be done before the start of the presentation
– Used in synthetic synchronization
– Advantage: simple implementation
– Disadvantage: presentation delay
Delivery of the complete sync information can use out-of-band communication via a separate sync channel
– Used in live synchronization
– Advantage: no additional presentation delay
– Disadvantage: an additional channel is needed, and additional errors can occur
Synchronization in Distributed Environments (2)
Delivery of the complete synchronization information can use in-band communication via multiplexed data streams, i.e., the synchronization information is carried in the headers of the multimedia PDUs
– Advantage: related sync information is delivered together with the media units
– Disadvantage: difficult to use for multiple sources
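In-band delivery can be sketched as a small PDU header carrying the sync information (stream id, sequence number, media timestamp) in front of each media unit. The 12-byte layout below is an illustrative assumption; it resembles, but is not, a real RTP header.

```python
# In-band sync sketch: each multimedia PDU carries its own sync information
# in a fixed header, so the receiver can reorder and align streams without
# a separate sync channel.
import struct

HEADER = struct.Struct("!HHQ")  # stream id (u16), seq no (u16), timestamp in ms (u64)

def pack_pdu(stream_id, seq, timestamp_ms, payload):
    """Prepend the sync header to a media payload."""
    return HEADER.pack(stream_id, seq, timestamp_ms) + payload

def unpack_pdu(pdu):
    """Recover the sync information and payload at the receiver."""
    stream_id, seq, ts = HEADER.unpack_from(pdu)
    return stream_id, seq, ts, pdu[HEADER.size:]

pdu = pack_pdu(1, 7, 123_456, b"frame-bytes")
print(unpack_pdu(pdu))  # (1, 7, 123456, b'frame-bytes')
```

The slide's advantage is visible directly: the sync information cannot be separated from its media unit in transit. The multiple-source disadvantage arises because each source stamps with its own clock, which is what the clock-synchronization slide below addresses.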
Synchronization in Distributed Environments (3) Location of Synchronization Operations
It is possible to synchronize media objects by recording them together and keeping them together as one object, i.e., combining objects into a new media object during creation; the synchronization operation then happens at the recording site
The synchronization operation can be placed at the sink. In this case the demand on bandwidth is larger, because additional sync information must be transported
The synchronization operation can be placed at the source. In this case the demand on bandwidth is smaller, because the streams are multiplexed according to the synchronization requirements
Synchronization in Distributed Environments (4) Clock Synchronization
Consider synchronization accuracy between clocks at source and destination
Global time-based synchronization needs clock synchronization
In order to re-synchronize, we can allocate buffers at the sink and start transmission of audio and video in advance, or use NTP (Network Time Protocol) to bound the maximum clock offset
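NTP bounds the clock offset using four timestamps per request/response exchange. The calculation below is the textbook offset/delay formula, not a real NTP implementation; t1 is the client send time, t2 the server receive time, t3 the server send time, and t4 the client receive time, all in ms.

```python
# NTP-style offset estimation: two timestamps taken at the server, two at
# the client, combined under the assumption of symmetric network delay.

def ntp_offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2  # estimated (server clock - client clock)
    delay = (t4 - t1) - (t3 - t2)         # round-trip network delay
    return offset, delay

# Client clock 100 ms behind the server, 30 ms symmetric one-way delay:
print(ntp_offset_and_delay(t1=0, t2=130, t3=140, t4=70))  # (100.0, 60)
```

The estimate is exact only for symmetric delays; asymmetry turns into a bounded offset error, which is why the slide speaks of bounding the maximum clock offset rather than eliminating it.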
Synchronization in Distributed Environments (5) Other Synchronization Issues
Synchronization in a distributed environment is a multi-step process
– Sync must be considered during object acquisition (during video digitization)
– Sync must be considered during retrieval (synchronize access to frames of a stored video)
– Sync must be considered during delivery of LDUs to the network (traffic shaping)
– Sync must be considered during transport (use isochronous protocols if possible)
– Sync must be considered at the sink (sync delivery to the output devices)
Synchronization Specification Methods
Interval-based Specification
– The presentation duration of an object is considered as an interval
– Examples of operations: A before(0) B, A overlaps B, A starts B, A equals B, A during B, A while(0,0) B
– Advantage: easy to handle open LDUs and therefore user interactions
– Disadvantage: the model does not include skew specifications
[Figure: interval-based specification of Audio 1, Video 1, slides, an animation, and Audio 2]
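A few of the interval operations can be written as predicates over (start, end) pairs in ms (a sketch; before(0) is read as "A ends exactly 0 ms before B starts", and the interval values are illustrative):

```python
# Interval-based relations as predicates over (start, end) pairs.

def before(a, b, delay=0):
    """A before(delay) B: B starts exactly `delay` ms after A ends."""
    return b[0] - a[1] == delay

def overlaps(a, b):
    """A overlaps B: A starts first, B starts before A ends, B ends last."""
    return a[0] < b[0] < a[1] < b[1]

def during(a, b):
    """A during B: A lies strictly inside B."""
    return b[0] < a[0] and a[1] < b[1]

audio1 = (0, 40)
video1 = (0, 100)
audio2 = (40, 80)

print(before(audio1, audio2))    # True: audio2 starts as audio1 ends
print(during(audio2, video1))    # True: audio2 lies inside video1
print(overlaps(audio1, video1))  # False: they start together
```

The disadvantage noted above shows up here as well: the predicates say nothing about how much skew is tolerable when the relation is only approximately met.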
Synchronization Specification (2)
Control Flow-based Specification – Hierarchical Approach
– The flow of concurrent presentation threads is synchronized at predefined points of the presentation
– Basic hierarchical specification: (1) serial synchronization and (2) parallel synchronization of actions
– An action can be atomic or compound
– An atomic action handles the presentation of a single media object, user input, or a delay
– Compound actions are combinations of synchronization operators and atomic actions
– Delay as an atomic action allows modeling of further synchronization (e.g., a delay in a serial presentation)
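The serial/parallel hierarchy can be modeled as a tree whose total duration is the sum over serial children and the maximum over parallel children. A sketch with illustrative node encoding and durations in seconds:

```python
# Hierarchical specification as nested tuples:
#   ("atomic", name, duration)   - single media object, user input, or delay
#   ("serial", [children])       - children one after another
#   ("parallel", [children])     - children start together, wait for all

def duration(node):
    kind = node[0]
    if kind == "atomic":
        return node[2]
    children = node[1]
    if kind == "serial":
        return sum(duration(c) for c in children)
    if kind == "parallel":
        return max(duration(c) for c in children)
    raise ValueError(kind)

presentation = ("serial", [
    ("parallel", [("atomic", "audio_1", 10), ("atomic", "video", 12)]),
    ("atomic", "delay", 2),
    ("parallel", [("atomic", "slides", 8), ("atomic", "audio_2", 9)]),
])
print(duration(presentation))  # 23  (= max(10, 12) + 2 + max(8, 9))
```

The disadvantage on the next slide is visible here: durations had to be attached to every atomic action, and nothing in the tree expresses allowable skews.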
Synchronization Specification (3)
Control Flow-based Specification – Hierarchical Approach
Advantages: easy to understand; natural support of hierarchy; integration of interactive objects is easy
Disadvantages: additional descriptions of skews and QoS are necessary; presentation durations must be added
[Figure: hierarchical specification tree combining Audio 1, video, slides, an animation, and Audio 2]
Synchronization Specification (4)
Control flow-based Synchronization Specification – Timed Petri Nets
Advantage: timed Petri nets allow all kinds of synchronization specifications
Disadvantage: complex specifications are difficult, and the model offers insufficient abstraction of media object content, because media objects must be split into sub-objects
[Figure: timed Petri net with an input place holding a token, a transition, and an output place]
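A minimal timed Petri net in the spirit of the figure (a sketch, not a general Petri-net engine): a transition fires when every input place holds a token, consuming them and, after the transition's delay, depositing tokens in the output places. The place and transition names are illustrative.

```python
# Timed Petri net sketch. A marking maps place names to token counts; a
# transition is (input places, output places, firing delay in ms).

def fire(marking, transition):
    """Fire one timed transition if enabled; return (new_marking, delay)."""
    inputs, outputs, delay = transition
    if not all(marking.get(p, 0) > 0 for p in inputs):
        return marking, None  # not enabled: an input place lacks a token
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new, delay

# Audio and video LDUs must both be ready before the joint 40 ms playout:
t_play = (["audio_ready", "video_ready"], ["played"], 40)
m0 = {"audio_ready": 1, "video_ready": 1}
print(fire(m0, t_play))  # ({'audio_ready': 0, 'video_ready': 0, 'played': 1}, 40)
```

This rendezvous-before-firing behavior is exactly how Petri nets express inter-stream synchronization points, and splitting a stream into per-LDU sub-objects is what makes large specifications unwieldy.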
Summary
Different synchronization frameworks:
– Little's synchronization framework (Boston University)
  Goal: support retrieval and delivery of multimedia
– Firefly system (Buchanan and Zellweger)
  Goal: automatically generate consistent presentation schedules for interactive documents
– HyTime: a standard with a hypermedia/time-based structuring language
  Goal: standard for the structured representation of hypermedia information
  HyTime is an application of the Standard Generalized Markup Language (SGML)