Introduction to Multimedia Synchronization
Klara Nahrstedt, CS598KN
Content
Notion of Synchronization
Intra-object and Inter-object Synchronization
Live and Synthetic Synchronization
Synchronization Requirements
Reference Model for Synchronization
Synchronization in Distributed Environments
Synchronization Specification
Notion of Synchronization
Multimedia synchronization is understood in terms of three relations: the content relation, the spatial relation, and the temporal relation
Content Relation: defines a data dependency between media objects
– Example: the dependency between a filled spreadsheet and graphics that represent the data listed in the spreadsheet
Spatial Relation
Spatial relation is represented through layout relation and defines space used for presentation of a media object on an output device at a certain point of time in a multimedia document
– Example: desktop publishing
– Layout frames are placed on an output device, and content is assigned to each frame
Positioning of layout frames:
– Fixed to a position in the document
– Fixed to a position on a page
– Relative to the position of another frame
The concept of frames is also used for positioning time-dependent objects
– Example: in window-based systems, layout frames correspond to windows, and a video can be positioned in a window
Temporal Relation
Temporal relation defines temporal dependencies between media objects
– Example: lip synchronization
This relation is the focus of our papers; we will not talk about the content or spatial relations
Time-dependent objects represent a media stream, because temporal relations exist between the consecutive units of the stream
Time-independent objects are traditional media such as images or text.
Temporal Relations (2)
Temporal synchronization is supported by many system components:
– OS (CPU scheduling, semaphores during IPC)
– Communication systems (traffic shaping, network scheduling)
– Databases
– Document handling
Synchronization needed at several levels of a Multimedia System
Temporal Relations (3)
Level 1: the OS and lower communication layers handle single streams
– Objective: avoid jitter at the presentation time of one stream
Level 2: on top of this sits the run-time support for the synchronization of multimedia streams (schedulers)
– Objective: bounded skews between the various streams
Level 3: the next level holds the run-time support for synchronization between time-dependent and time-independent media, together with the handling of user interaction
– Objective: bounded skews between time-dependent and time-independent media
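The level-1 objective (no jitter within one stream) is typically met with a playout buffer: every LDU is scheduled at a fixed offset after the first arrival, so playout deadlines are equidistant as long as the offset exceeds the worst-case jitter. A minimal sketch, with illustrative names and timings:

```python
# Sketch of level-1 intra-stream jitter removal with a playout buffer.
# LDUs arrive with network jitter; LDU i is played at
# first_arrival + offset + i * period, which is jitter-free as long as
# the offset is larger than the worst-case jitter.

def playout_times(arrivals_ms, period_ms, offset_ms):
    """Map jittered arrival times to equidistant playout deadlines."""
    base = arrivals_ms[0] + offset_ms
    return [base + i * period_ms for i in range(len(arrivals_ms))]

# 40 ms video frames arriving with up to ~15 ms of jitter:
arrivals = [0, 42, 95, 121, 168]
deadlines = playout_times(arrivals, period_ms=40, offset_ms=20)
print(deadlines)                                         # [20, 60, 100, 140, 180]
print(all(d >= a for a, d in zip(arrivals, deadlines)))  # True: every LDU is present in time
```

Choosing the offset trades added end-to-end delay against robustness to late LDUs, which is exactly the gap-versus-delay tension discussed later.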
Specification of Synchronization
Implicit Specification
– Temporal relations may be specified implicitly during the capturing of the media objects; the goal of the presentation is to present the media in the same way as they were originally captured
– Examples: audio/video recording and playback, VOD applications
Explicit Specification
– Temporal relations may be specified explicitly for presentations composed of independently captured or otherwise created objects
– Example: a slide show, where the presentation designer
  – selects appropriate slides
  – creates the accompanying audio
  – defines the units of the audio presentation stream
  – defines the points in the audio stream at which the slides have to be presented
Inter-object and Intra-Object Synchronization
Intra-object synchronization refers to the time relation between various presentation units of one time-dependent media object
Inter-object synchronization refers to the synchronization between media objects
[Figure: parallel timeline of Audio 1 (40 ms LDUs), video, slides, an animation, and Audio 2]
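The two notions can be made concrete on timestamped LDUs (a sketch; all names and numbers are illustrative): intra-object synchronization bounds the spacing of consecutive LDUs within one stream, while inter-object synchronization bounds the offset between corresponding LDUs of different streams.

```python
# Intra-object sync: deviation of consecutive LDU spacing from the nominal
# period within one stream. Inter-object sync: offset between corresponding
# LDUs of two streams.

def intra_object_jitter(timestamps_ms, period_ms):
    """Worst deviation of consecutive LDU spacing from the nominal period."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return max(abs(g - period_ms) for g in gaps)

def inter_object_skew(stream_a_ms, stream_b_ms):
    """Largest offset between corresponding LDUs of two streams."""
    return max(abs(a - b) for a, b in zip(stream_a_ms, stream_b_ms))

audio = [0, 40, 80, 120]     # audio LDUs every 40 ms, as in the figure
video = [5, 47, 83, 126]     # video LDUs, slightly late and uneven
print(intra_object_jitter(video, 40))   # 4
print(inter_object_skew(audio, video))  # 7
```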
Classification of Synchronization Units
Logical Data Units (LDUs), at increasing granularity:
– Samples or pixels
– Notes or frames
– Movements or scenes
– A symphony or a movie
Further distinctions:
– Fixed LDUs vs. variable LDUs
– LDUs specified during recording vs. LDUs defined by the user
– Open LDUs vs. closed LDUs
Live Synchronization
The goal is to exactly reproduce at the presentation the temporal relations as they existed during the capturing process
Temporal relation information must therefore be captured during capturing
Live sync is needed in conversational services
– Video conferencing, video phone
– Recording and retrieval applications are instead considered retrieval services, i.e., presentations with delay
Synthetic Synchronization
Temporal relations are artificially specified
Often used in presentation- and retrieval-based systems with stored data objects that are arranged to provide new combined multimedia objects
– Authoring and tutoring systems
Synchronization editors are needed to support flexible synchronization relations between media
Two phases: (1) the specification phase defines the temporal relations, (2) the presentation phase presents the data in a synchronized mode
– Example: 4 recorded audio messages relate to parts of an engine in an animation; the animation sequence shows a slow 360-degree rotation of the engine.
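The engine example can be written down as a tiny synthetic specification (a sketch; the clip names, angles, and rotation duration are illustrative assumptions). The specification phase records which audio message belongs to which rotation angle; the presentation phase turns the relations into start times once the rotation speed is known.

```python
# Specification phase: artificial temporal relations, here expressed as
# (audio clip, rotation angle at which the clip starts). All values are
# illustrative, not from the lecture.
spec = [("audio_1", 0), ("audio_2", 90), ("audio_3", 180), ("audio_4", 270)]

def presentation_schedule(spec, rotation_duration_s):
    """Presentation phase: map angles to start times for one 360-degree turn."""
    return {clip: rotation_duration_s * angle / 360 for clip, angle in spec}

print(presentation_schedule(spec, rotation_duration_s=40.0))
# {'audio_1': 0.0, 'audio_2': 10.0, 'audio_3': 20.0, 'audio_4': 30.0}
```

Because the relations are stored symbolically (angles, not wall-clock times), the same specification yields a consistent presentation at any rotation speed.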
Synchronization Requirements
For intra-object synchronization:
– Accuracy concerning jitter and end-to-end delays in the presentation of LDUs
For inter-object synchronization:
– Accuracy in the parallel presentation of media objects
Implications of a blocking method:
– Fine for time-independent media
– Gap problem for time-dependent media
What does the blocking of a stream mean for the output device?
– Should previous parts be repeated in the case of speech or music?
– Should the last picture of a stream be shown?
– How long may such a gap exist?
Synchronization Requirements (2)
Solutions to the gap problem:
– Restricted blocking method
  – Switch to an alternative presentation if the gap between late video and audio exceeds a predefined threshold
  – Show the last picture as a still image
– Re-sampling of a stream
  – Speed up or slow down streams for the purpose of synchronization
  – Off-line re-sampling: used after the capturing of media streams with independent devices
    Example: a concert captured with two independent audio and video devices
  – Online re-sampling: used during a presentation when a gap between media streams occurs at run time
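Online re-sampling amounts to nudging the playback rate of the late stream so the gap closes gradually. A minimal sketch, assuming an illustrative correction window and a 5% cap on the rate change (both are assumptions, not values from the lecture):

```python
# Online re-sampling sketch: choose a playback-rate factor that removes the
# detected skew over a correction window, clamped so the speed change stays
# small enough to be imperceptible.

def correction_rate(skew_ms, window_ms, max_deviation=0.05):
    """Playback-rate factor that removes `skew_ms` over `window_ms`.

    skew_ms > 0 means the stream is behind and must briefly speed up;
    the factor is clamped to 1 +/- max_deviation.
    """
    rate = 1.0 + skew_ms / window_ms
    return max(1.0 - max_deviation, min(1.0 + max_deviation, rate))

print(round(correction_rate(40, 2000), 4))    # 1.02 -> play 2% faster for 2 s
print(round(correction_rate(-500, 2000), 4))  # 0.95 -> slow-down clamped at 5%
```

A large skew therefore cannot be corrected instantly; it is spread over several correction windows, which is why the restricted-blocking alternatives above remain necessary for severe gaps.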
Synchronization Requirements (3)
Lip synchronization requirements refer to the temporal relation between the audio and video streams of a human speaking
The time difference between related audio and video LDUs is called the synchronization skew
Streams are in sync if the skew lies within a bound, and out of sync if it exceeds a larger bound
Bounds:
– Audio/video in sync: -80 ms ≤ skew ≤ 80 ms
– Audio/video out of sync: skew < -160 ms or skew > 160 ms
– Transient: -160 ms ≤ skew < -80 ms, or 80 ms < skew ≤ 160 ms
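The bounds above translate directly into a three-state classifier (a sketch; skew is signed in ms, following the slide's bounds):

```python
# Classify audio/video skew against the lip-sync bounds given above:
# |skew| <= 80 ms is in sync, |skew| in (80, 160] ms is transient,
# |skew| > 160 ms is out of sync.

def lip_sync_state(skew_ms):
    if -80 <= skew_ms <= 80:
        return "in sync"
    if -160 <= skew_ms < -80 or 80 < skew_ms <= 160:
        return "transient"
    return "out of sync"

print(lip_sync_state(40))    # in sync
print(lip_sync_state(-120))  # transient
print(lip_sync_state(200))   # out of sync
```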
Synchronization Requirements (4)
Pointer synchronization requirements are very important in computer-supported cooperative work (CSCW)
We need synchronization between graphics, pointers, and audio
Comparison:
– Lip-sync error is a skew between 40 and 60 ms
– Pointer-sync error is a skew between 250 and 1500 ms
Bounds:
– Pointer/audio/graphics in sync: -500 ms ≤ skew ≤ 750 ms
– Out of sync: skew < -1000 ms or skew > 1250 ms
– Transient: -1000 ms ≤ skew < -500 ms, or 750 ms < skew ≤ 1250 ms
Synchronization Requirements (5)
Digital audio on CD-ROM:
– Maximum allowable jitter in perception experiments: 5-10 ns; other experiments suggest 2 ms
The combination of audio and animation is not as stringent as lip synchronization:
– Maximum allowable skew: +/- 80 ms
Stereo audio is tightly coupled:
– Maximum allowable skew: 20 ms; because of listening errors, the suggested skew is +/- 11 ms
Loosely coupled audio channels (speaker and background music):
– Maximum allowable skew: 500 ms
Synchronization Requirements (6)
Production-level synchronization should be guaranteed prior to the presentation of data at the user interface
– For example, when recording synchronized data for subsequent playback
– Stored data should be captured and recorded with no skew
– For playback, the defined lip-sync boundaries are 80 ms
– For simultaneous playback at a local and a remote workstation, the sync skew should be between -160 ms and 0 ms (video should be ahead of audio for the remote station due to pre-fetching)
Synchronization Requirements (7)
Presentation-level synchronization should be defined at the user interface
This synchronization focuses on human perception
– Examples:
  – Video and image, overlay: +/- 240 ms
  – Video and image, non-overlay: +/- 500 ms
  – Audio and image (music with notes): +/- 5 ms
  – Audio and slide show (loosely coupled image): +/- 500 ms
  – Audio and text (text annotation): +/- 240 ms
Reference Model for Synchronization
Synchronization of multimedia objects is classified with respect to a four-level system:
SPECIFICATION LEVEL: an open layer comprising applications and tools that allow the creation of synchronization specifications (e.g., sync editors); editing and formatting; mapping of user QoS to abstractions at the object level
OBJECT/SERVICE LEVEL: operates on all types of media and hides the differences between discrete and continuous media; plans, coordinates, and initiates presentations
STREAM LEVEL: operates on multiple media streams and provides inter-stream synchronization; resource reservation and scheduling
MEDIA LEVEL: operates on a single stream, treated as a sequence of LDUs, and provides intra-stream synchronization; file and device access
Synchronization in Distributed Environments
Information of synchronization must be transmitted with audio and video streams, so that the receiver side can synchronize the streams
Delivery of the complete sync information can be done before the start of the presentation
– Used in synthetic synchronization
– Advantage: simple implementation
– Disadvantage: presentation delay
Delivery of the complete sync information can use out-of-band communication via a separate sync channel
– Used in live synchronization
– Advantage: no additional presentation delay
– Disadvantage: an additional channel is needed, and additional errors can occur
Synchronization in Distributed Environments (2)
Delivery of the complete synchronization information can use in-band communication via multiplexed data streams, i.e., the synchronization information is carried in the headers of the multimedia PDUs
– Advantage: related sync information is delivered together with the media units
– Disadvantage: difficult to use for multiple sources
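In-band delivery can be sketched as a small PDU header carrying the sync information (stream id, sequence number, media timestamp) in front of each media unit. The 12-byte layout below is an illustrative assumption; it resembles, but is not, a real RTP header.

```python
# In-band sync sketch: each multimedia PDU carries its own sync information
# in a fixed header, so the receiver can reorder and align streams without
# a separate sync channel.
import struct

HEADER = struct.Struct("!HHQ")  # stream id (u16), seq no (u16), timestamp in ms (u64)

def pack_pdu(stream_id, seq, timestamp_ms, payload):
    """Prepend the sync header to a media payload."""
    return HEADER.pack(stream_id, seq, timestamp_ms) + payload

def unpack_pdu(pdu):
    """Recover the sync information and payload at the receiver."""
    stream_id, seq, ts = HEADER.unpack_from(pdu)
    return stream_id, seq, ts, pdu[HEADER.size:]

pdu = pack_pdu(1, 7, 123_456, b"frame-bytes")
print(unpack_pdu(pdu))  # (1, 7, 123456, b'frame-bytes')
```

The slide's advantage is visible directly: the sync information cannot be separated from its media unit in transit. The multiple-source disadvantage arises because each source stamps with its own clock, which is what the clock-synchronization slide below addresses.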
Synchronization in Distributed Environments (3) Location of Synchronization Operations
It is possible to synchronize media objects by recording them together and keeping them together as one object, i.e., combining objects into a new media object during creation; the synchronization operation then happens at the recording site
The synchronization operation can be placed at the sink. In this case the demand on bandwidth is larger, because additional sync information must be transported
The synchronization operation can be placed at the source. In this case the demand on bandwidth is smaller, because the streams are multiplexed according to the synchronization requirements
Synchronization in Distributed Environments (4) Clock Synchronization
Consider synchronization accuracy between clocks at source and destination
Global time-based synchronization needs clock synchronization
In order to re-synchronize, we can allocate buffers at the sink and start transmission of audio and video in advance, or use NTP (Network Time Protocol) to bound the maximum clock offset
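NTP bounds the clock offset using four timestamps per request/response exchange. The calculation below is the textbook offset/delay formula, not a real NTP implementation; t1 is the client send time, t2 the server receive time, t3 the server send time, and t4 the client receive time, all in ms.

```python
# NTP-style offset estimation: two timestamps taken at the server, two at
# the client, combined under the assumption of symmetric network delay.

def ntp_offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2  # estimated (server clock - client clock)
    delay = (t4 - t1) - (t3 - t2)         # round-trip network delay
    return offset, delay

# Client clock 100 ms behind the server, 30 ms symmetric one-way delay:
print(ntp_offset_and_delay(t1=0, t2=130, t3=140, t4=70))  # (100.0, 60)
```

The estimate is exact only for symmetric delays; asymmetry turns into a bounded offset error, which is why the slide speaks of bounding the maximum clock offset rather than eliminating it.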
Synchronization in Distributed Environments (5) Other Synchronization Issues
Synchronization in a distributed environment is a multi-step process
– Sync must be considered during object acquisition (during video digitization)
– Sync must be considered during retrieval (synchronize access to frames of a stored video)
– Sync must be considered during delivery of LDUs to the network (traffic shaping)
– Sync must be considered during transport (use isochronous protocols if possible)
– Sync must be considered at the sink (sync delivery to the output devices)
Synchronization Specification Methods
Interval-based Specification
– The presentation duration of an object is considered as an interval
– Examples of operations: A before(0) B, A overlaps B, A starts B, A equals B, A during B, A while(0,0) B
– Advantage: easy to handle open LDUs and therefore user interactions
– Disadvantage: the model does not include skew specifications
[Figure: interval-based specification of Audio 1, Video 1, slides, an animation, and Audio 2]
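A few of the interval operations can be written as predicates over (start, end) pairs in ms (a sketch; before(0) is read as "A ends exactly 0 ms before B starts", and the interval values are illustrative):

```python
# Interval-based relations as predicates over (start, end) pairs.

def before(a, b, delay=0):
    """A before(delay) B: B starts exactly `delay` ms after A ends."""
    return b[0] - a[1] == delay

def overlaps(a, b):
    """A overlaps B: A starts first, B starts before A ends, B ends last."""
    return a[0] < b[0] < a[1] < b[1]

def during(a, b):
    """A during B: A lies strictly inside B."""
    return b[0] < a[0] and a[1] < b[1]

audio1 = (0, 40)
video1 = (0, 100)
audio2 = (40, 80)

print(before(audio1, audio2))    # True: audio2 starts as audio1 ends
print(during(audio2, video1))    # True: audio2 lies inside video1
print(overlaps(audio1, video1))  # False: they start together
```

The disadvantage noted above shows up here as well: the predicates say nothing about how much skew is tolerable when the relation is only approximately met.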
Synchronization Specification (2)
Control Flow-based Specification – Hierarchical Approach
– The flow of concurrent presentation threads is synchronized at predefined points of the presentation
– Basic hierarchical specification: (1) serial synchronization and (2) parallel synchronization of actions
– An action can be atomic or compound
– An atomic action handles the presentation of a single media object, user input, or a delay
– Compound actions are combinations of synchronization operators and atomic actions
– Delay as an atomic action allows modeling of further synchronization (e.g., a delay in a serial presentation)
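The serial/parallel hierarchy can be modeled as a tree whose total duration is the sum over serial children and the maximum over parallel children. A sketch with illustrative node encoding and durations in seconds:

```python
# Hierarchical specification as nested tuples:
#   ("atomic", name, duration)   - single media object, user input, or delay
#   ("serial", [children])       - children one after another
#   ("parallel", [children])     - children start together, wait for all

def duration(node):
    kind = node[0]
    if kind == "atomic":
        return node[2]
    children = node[1]
    if kind == "serial":
        return sum(duration(c) for c in children)
    if kind == "parallel":
        return max(duration(c) for c in children)
    raise ValueError(kind)

presentation = ("serial", [
    ("parallel", [("atomic", "audio_1", 10), ("atomic", "video", 12)]),
    ("atomic", "delay", 2),
    ("parallel", [("atomic", "slides", 8), ("atomic", "audio_2", 9)]),
])
print(duration(presentation))  # 23  (= max(10, 12) + 2 + max(8, 9))
```

The disadvantage on the next slide is visible here: durations had to be attached to every atomic action, and nothing in the tree expresses allowable skews.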
Synchronization Specification (3)
Control Flow-based Specification – Hierarchical Approach
Advantages: easy to understand; natural support of hierarchy; integration of interactive objects is easy
Disadvantages: additional descriptions of skews and QoS are necessary; presentation durations must be added
[Figure: hierarchical specification tree combining Audio 1, video, slides, an animation, and Audio 2]
Synchronization Specification (4)
Control flow-based Synchronization Specification – Timed Petri Nets
Advantage: timed Petri nets allow all kinds of synchronization specifications
Disadvantage: complex specifications are difficult, and the model offers insufficient abstraction of media object content, because media objects must be split into sub-objects
[Figure: timed Petri net with an input place holding a token, a transition, and an output place]
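A minimal timed Petri net in the spirit of the figure (a sketch, not a general Petri-net engine): a transition fires when every input place holds a token, consuming them and, after the transition's delay, depositing tokens in the output places. The place and transition names are illustrative.

```python
# Timed Petri net sketch. A marking maps place names to token counts; a
# transition is (input places, output places, firing delay in ms).

def fire(marking, transition):
    """Fire one timed transition if enabled; return (new_marking, delay)."""
    inputs, outputs, delay = transition
    if not all(marking.get(p, 0) > 0 for p in inputs):
        return marking, None  # not enabled: an input place lacks a token
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new, delay

# Audio and video LDUs must both be ready before the joint 40 ms playout:
t_play = (["audio_ready", "video_ready"], ["played"], 40)
m0 = {"audio_ready": 1, "video_ready": 1}
print(fire(m0, t_play))  # ({'audio_ready': 0, 'video_ready': 0, 'played': 1}, 40)
```

This rendezvous-before-firing behavior is exactly how Petri nets express inter-stream synchronization points, and splitting a stream into per-LDU sub-objects is what makes large specifications unwieldy.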
Summary
Different synchronization frameworks:
– Little's synchronization framework (Boston University)
  Goal: support retrieval and delivery of multimedia
– Firefly system (Buchanan and Zellweger)
  Goal: automatically generate consistent presentation schedules for interactive documents
– HyTime: a standard with a hypermedia/time-based structuring language
  Goal: standard for the structured representation of hypermedia information
  HyTime is an application of the Standard Generalized Markup Language (SGML)