
Performance measurements on the Heidelberg audio/video distribution system: methodology and results

Andreas Cramer*, Richard Hofmann+ and Norbert Luttenberger*

The integration of continuous presentation media like audio and video into current workstation and networking environments imposes tight timing constraints on A/V data transfer. These constraints are summarized under the term Quality of Service (QoS), which comprises a number of parameters. Among those, delay and delay jitter are the most important. In the study presented in this paper, measurement and modelling methods have been applied to analyse the delay and delay jitter incurred by audio/video streams. An experimental audio/video distribution system called HeiDi (Heidelberg A/V Distribution System), intended for the fully digital distribution of compressed, packetized audio and video over a token ring local area network, served as testbed for the analysis. Measurements were conducted with the ZM4 (abbreviation for the German name Zählmonitor 4) monitoring system developed at the University of Erlangen-Nürnberg especially for performance analysis of distributed systems. For modelling, Monitoring Petri Nets are used, a class of Petri nets which lends itself well to the integration of measurement and modelling. Results indicate that the audio/video subsystem and the communication subsystem must be closely integrated to yield sufficient performance.

Keywords: multimedia, performance, monitoring, modelling, audio/video, video distribution

The demand to enrich the man-machine interface of computer systems by the integrated application of different presentation media like text, graphics, image, audio, video and animation has led to the development of multimedia systems, a development effectively transforming the computer from a computing instrument into a communication instrument. Multimedia systems will enable a plethora of new applications, from information kiosks to computer-based training systems to integrated desktop conferencing stations comprising live audio/video and shared applications.

*IBM European Networking Center, Broadband Multimedia Communications Department, PO Box 103068, D-69020 Heidelberg, Germany (email: {cramer, lu}@heidelbg.vnet.ibm.com)
+Computer Science Dept. (IMMD 7), University of Erlangen-Nürnberg, Martensstr. 3, Germany (email: rhof- [email protected])
Paper received: 5 July 1993; revised paper received: 4 January 1994

In the framework of multimedia systems development, it has been a prevailing goal to arrive at fully digital solutions for audio and video. Though analogue solutions are feasible for local audio/video playback, they would require a costly separate network in a distributed environment, at least in the wide area case. Obviously, only compressed video can be used; uncompressed video would require a very high communication bandwidth and, apart from technical limitations, increase usage costs to an unacceptable degree.

The mentioned presentation media can be grouped into discrete media such as text and graphics, which are time-invariant, and continuous media such as audio and video, where the continuous change of presentation values over time contributes to the semantics of the medium. The correct handling of continuous media introduces the need for real-time processing into workstation and PC operating systems, and computer networks. These systems have so far been designed under the assumption that a fair share of the available resources (network bandwidth, processing time, buffer memory, etc.) has to be granted to all requesting activities. Real-time processing, on the other hand, implies that, before starting a real-time stream, it is made sure that the required amount of resources is available, that it is reserved for exclusive use by this stream, and that all real-time activities are given priority over non-real-time activities in their access to these resources. As Herrtwich1 outlines, both non-real-time and real-time environments are not new in themselves, but for the design of multimedia systems the challenge lies in the combination of both environments.



Real-time requirements for continuous media processing together with requirements concerning allowable error rates, etc. are often summarized under the term Quality of Service (QoS). QoS is normally described by a whole set of parameters, of which delay, sometimes also called latency, is one of the most important.

For microphone and camera captured audio/video streams (live audio/video), the term delay obviously denotes the time span that elapses between the capturing and the play-out of an individual audio or video frame. A frame is a set of sampled data which describes the recorded scene for one instant of time. For pre-recorded audio/video (stored audio/video), a definition of the term delay is not so obvious, because stored frames have lost their relation to real time. Nevertheless, the correct playback of stored audio/video requires that a smooth audio/video data stream can be maintained. Therefore, in this type of application not the delay itself is critical, but its variance, also called delay jitter. In both cases, delay jitter must not exceed certain thresholds; otherwise, discontinuities will arise.
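As an illustration of these two quantities, the following sketch computes the per-frame delay and a delay jitter figure from capture and play-out timestamps. The timestamp values, the choice of the standard deviation as the jitter measure, and all names are our own assumptions, not part of HeiDi.

```c
/* Sketch: per-frame delay and delay jitter from capture/play-out timestamps.
 * The sample data and the jitter definition (standard deviation) are
 * illustrative assumptions; timestamps are in milliseconds. */
#include <stdio.h>
#include <math.h>

#define N 5

int main(void)
{
    /* hypothetical capture and play-out times of five frames (ms) */
    double capture[N]  = {   0.0,  33.3,  66.7, 100.0, 133.3 };
    double play_out[N] = { 210.0, 245.1, 276.9, 312.0, 344.0 };

    double delay[N], mean = 0.0, var = 0.0;

    for (int i = 0; i < N; i++) {
        delay[i] = play_out[i] - capture[i];   /* end-to-end delay of frame i */
        mean += delay[i] / N;
    }
    for (int i = 0; i < N; i++)
        var += (delay[i] - mean) * (delay[i] - mean) / N;

    /* delay jitter taken here as the standard deviation of the delay;
     * other definitions (e.g. max minus min) are possible */
    printf("mean delay %.1f ms, jitter (std. dev.) %.1f ms\n", mean, sqrt(var));
    return 0;
}
```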

In conversational applications, an audio/video delay of more than approx. 250 ms can severely annoy conference participants, a phenomenon well known from transatlantic telephone calls. It is therefore very important to carefully tune the audio/video sub-systems of multimedia applications so that a delay below this value is achieved (for a list of related performance parameters cf. Russell2). Tuning first requires the analysis of the different sources of delay.

In this paper, we present a thorough effort to analyse the delay incurred by a live audio/video stream in a distributed multimedia system. This analysis was carried out with the help of a distributed, event-driven, hybrid monitoring system called ZM4, designed at the University of Erlangen-Nürnberg, Germany. This monitoring system, applied in a certain configuration, allows us to trace the flow of individual frames from a sending system through a network to a receiving system.

The multimedia system we analysed is a prototype audio/video distribution application for both live and stored audio/video. It is called HeiDi (Heidelberg A/V Distribution System), and was designed at IBM’s European Networking Center in Heidelberg, Germany. The underlying network was a 16 Mbit/s token ring. The goals were to demonstrate the feasibility of current networking technology for multimedia applications, to explore the limitations of current communication protocols, to design necessary enhancements, to prove the implementation concept, and finally, to study delay characteristics.

The paper is organized as follows: in the following section, we give a brief overview of HeiDi. We then present the ZM4 monitoring system used for the delay measurements. A formal model for the dynamic behaviour of HeiDi is presented, measurement results are given and some conclusions are drawn.


HeiDi APPLICATION SCENARIOS AND ARCHITECTURE

Before explaining the HeiDi architecture, we give a brief outline of the two application scenarios which were defined for an audio/video distribution service like HeiDi. We call them ‘Employee Information’ and ‘Production Supervision’.

HeiDi application scenarios

In large enterprises, it is often useful to supplement conventional printed information and learning material with video magazines for, for instance, internal information, computer-based training, product information, etc. The long-term viability of this approach apparently depends upon the ability to take advantage of already-existing resources for distribution of the magazines, in particular workstations and networks inside the company's offices. From a technical point of view, this means that audio/video information stored on a file server has to be distributed to a large number of workstations over digital networks. Following the TV distribution paradigm, it was decided for HeiDi that the distribution should take place to all listeners simultaneously.

For highly-integrated, intelligent Production Supervision systems, it is required to concentrate all different types of process control information on a small number of displays. Alongside, for example, the schematic display of the production process, window-based visual process monitoring must be integrated with the display of measurement values and alarm indicators. In contrast with the presentational type of application mentioned above, this scenario comes close to fully interactive multimedia applications like desktop conferencing, because it needs camera-captured, live video.

Both scenarios gave us an excellent framework to study the behaviour of audio/video streams in LAN-connected workstation environments. For delay measurements, we concentrated on the Production Supervision scenario.

Environment and configuration

For transport over the 16 Mbit/s token ring, video information must be compressed. In the HeiDi context, Digital Video Interactive (DVI) was used for this purpose3. Though offering a set of programmable compression ratios, DVI was especially designed to produce output at the data rate of a CD player (approx. 1.2 Mbit/s), where it yields high-quality motion video interleaved with audio. DVI supports two different video compression techniques, RTV and PLV. RTV (Real Time Video) is a real-time compression/decompression technique for processing video streams with up to 30 frames/s. The human-perceived quality of RTV streams is lower than that of PLV (Presentation Level Video) streams.




PLV is a proprietary, asymmetric compression technique requiring off-line compression, but allowing real-time decompression, also with up to 30 frames/s. The RTV quality can be parameterized, but the data rate can then increase above the CD level. In both cases, the basic data unit is called a frame, and incorporates audio and video information. For further details, the reader is referred elsewhere4.

In the HeiDi system, the DVI-based ActionMedia adapter, jointly developed by Intel and IBM, was used for RTV capture and playback and for PLV playback. To be usable with window-based user interfaces, the playback function includes mixing of a video stream with the standard graphics output of a workstation display sub-system.

The prototype setup shown in Figure 1 demonstrates a conceivable configuration for both scenarios described above. It consists of three client workstations and two server workstations connected via a 16 Mbit/s token ring network.

Both servers are IBM PS/2 Model 80 with a 25 MHz Intel 80386 processor running version 1.3 of the OS/2 operating system. One server supplies the clients with stored video programs in compressed, digital form (for the Employee Information scenario). The other server is attached to a remote-controlled camera, and provides the clients with live video (for the Production Super- vision scenario). Both servers distribute their respective video information to the clients by means of a multicast feature (see below).

The three client workstations are IBM PS/2 Model 90 with a 33 MHz Intel 80486 processor, also running OS/2 version 1.3. The display shows the OS/2 Presentation Manager desktop. At any client, one of the two mentioned video streams is shown in a 'video window'. The video window has a size of 256 × 240 pixels.

HeiDi communication architecture

Only a brief outline of the HeiDi architecture is given here; for further details, the reader is referred elsewhere5,6. Instead of a completely new protocol design specialized for audio/video data transfer, it was a HeiDi design goal to keep, as far as possible, to conventional communication protocols. However, existing protocols and services had to be modified to support multicast and rate enforcement.

In the context of shared medium networks, a multicast facility allows data directed towards a group of recipients to be transmitted only once, so no replicated data exists in the network. To address the group of recipients, the token ring group address mechanism was used.

Rate enforcement is part of a flow control technique, i.e. a way to prevent a fast sender from overrunning a slow receiver. Rate enforcement applies, for example, to a situation where a sender reads a large audio/video file from disk and transmits it frame-wise to a receiver with a fixed-rate audio/video display system. If both sender and receiver have agreed upon a defined rate and allocated the required resources, rate enforcement makes window-oriented flow control schemes superfluous, and thus helps to smooth the flow of data between sender and receiver.

Figure 1 Configuration



In the supervision scenario, rate enforcement was turned off.
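The idea of sender-side rate enforcement described above can be sketched as follows: frames are released strictly at the agreed frame period, so no window or acknowledgement from the receiver is needed. The send routine, the 30 frames/s rate and the POSIX timing calls are our own illustration, not the HeiDi transport interface.

```c
/* Sketch: sender-side rate enforcement by pacing frame transmissions.
 * send_frame() and the 30 frames/s rate are illustrative assumptions. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

static void send_frame(int n) { printf("frame %d sent\n", n); }

int main(void)
{
    const long period_ns = 1000000000L / 30;     /* agreed rate: 30 frames/s */
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (int n = 0; n < 90; n++) {               /* pace three seconds' worth of frames */
        send_frame(n);
        /* advance the deadline by one frame period and sleep until then;
         * no window-oriented flow control is involved */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) { next.tv_sec++; next.tv_nsec -= 1000000000L; }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
    return 0;
}
```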

The communication subsystem of HeiDi has a layered structure according to the OSI reference model, comprising layers 1-4, plus an application-specific layer on top of the transport layer in support of audio/video distribution.

The Physical, Medium Access Control and Data Link layers are implemented on the token ring adapters. They are based on the IEEE 802.5 token ring and IEEE 802.2 LLC type 1 and 2 definitions. The token ring architecture is flexible enough to ensure smooth, continuous delivery of audio/video, even over a heavily loaded ring, when supported by a bandwidth management system using MAC priorities and bandwidth reservation7. The connectionless (LLC type 1) protocol variant was applied for multicast transfer of audio/video streams. To hide the differences between different networks and to simplify the implementation of the network layer, a generic Data Link Service (DLS) interface was developed on top of the LLC interface.

For the Network Service, a modified X.25 packet level protocol is used. From the OSI-standardized version, all error recovery functions were removed, because in most cases a retransmission of a lost or erroneous packet cannot be performed before the deadline of the data packet expires. Packets out of sequence or erroneous packets are simply dropped, and no re-ordering occurs. Thus, NSDUs are guaranteed to be in sequence, but there is no guarantee that all of them will be delivered. This reduced X.25 protocol was subsequently enhanced to support multicast data streams.
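The receive-side behaviour of such a reduced protocol can be pictured as below: a packet is delivered only if it is error-free and its sequence number is higher than the last one delivered, so delivered data are in sequence but possibly with gaps. Types, names and the sequence-number scheme are assumptions for illustration, not the HeiDi protocol code.

```c
/* Sketch: receive-side filtering in a reduced network protocol.  Erroneous,
 * duplicate or late packets are dropped; nothing is retransmitted or
 * re-ordered.  All names are illustrative. */
#include <stdio.h>
#include <stdbool.h>

struct packet { unsigned seq; bool crc_ok; const char *payload; };

static unsigned last_delivered = 0;        /* sequence number of last NSDU passed up */

static void deliver(const struct packet *p)
{
    printf("delivered #%u: %s\n", p->seq, p->payload);
    last_delivered = p->seq;
}

static void on_packet(const struct packet *p)
{
    if (!p->crc_ok || p->seq <= last_delivered)
        return;                            /* drop: erroneous, duplicate or late packet */
    deliver(p);                            /* gaps are possible, order is preserved     */
}

int main(void)
{
    struct packet rx[] = {
        {1, true,  "a"}, {2, true, "b"}, {4, true, "d"},   /* #3 was lost on the ring */
        {3, true,  "c"},                                    /* late: dropped           */
        {5, false, "e"},                                    /* corrupted: dropped      */
        {6, true,  "f"},
    };
    for (unsigned i = 0; i < sizeof rx / sizeof rx[0]; i++)
        on_packet(&rx[i]);
    return 0;
}
```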

For the Transport Service, two ISO transport protocols were used: IS 8073 (classes 0 and 1) for connection-oriented and IS 8602 for connectionless communication. The transport service was enhanced by 'association'-oriented service elements in support of a multitarget service8 and by a rate enforcement technique for precompressed audio/video streams. The service is unreliable in the sense that data given to the transport service at the sender side are not guaranteed to be delivered to active group recipients. It is, however, guaranteed that only complete, error-free and in-sequence TSDUs (Transport Service Data Units) are delivered, although possibly with gaps (i.e. TSDUs may be missing). The mentioned rate enforcement is incorporated in the transport service interface at the sender side. To interleave audio/video data with status information about the video sequence, a Continuous Media Communication Service and the related protocol were designed. The mentioned status information comprises setup parameters (compression algorithm and algorithm-specific parameters) and additional information on, for example, the title of the clip, its total duration and the current position in time. All this information is sent in-band, i.e. as a single stream of data packets merged with the audio/video data stream.

HeiDi implementation architecture

Figure 2 shows the layering of the HeiDi building blocks in a client/server configuration and indicates their interaction. The top-level components (User Interface) manage the interface to the human user, and are of no concern for the dynamic behaviour of the system.

In the next layer, we find the audio/video stream handlers, which were developed in an object-oriented methodology. The essential abstractions provided by different classes are sink, source and stream. A source object generates data, applying one of the methods provided by the layer below. In a similar manner of abstraction, a sink object consumes data. Once opened, sink and source objects are connected by establishing a stream object. Stream objects define the interaction between source and sink objects and control the data transfer between them. This methodology makes it easy to develop different types of applications just by configuring various source and sink objects with stream objects.
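This source/sink/stream abstraction can be pictured roughly as in the sketch below, using C structures with function pointers; all type and function names are our own, not the HeiDi class interfaces.

```c
/* Sketch: source, sink and stream abstractions with function pointers.
 * All names and the frame layout are hypothetical. */
#include <stdio.h>
#include <string.h>

struct frame { unsigned char data[8]; size_t len; };

struct source { int (*get)(struct source *, struct frame *); int count; };
struct sink   { void (*put)(struct sink *, const struct frame *); };

/* a stream connects one opened source to one opened sink */
struct stream { struct source *src; struct sink *snk; };

static int file_get(struct source *s, struct frame *f)
{
    if (s->count-- <= 0) return 0;               /* no more frames               */
    memset(f->data, 0xAB, sizeof f->data);       /* stand-in for real frame data */
    f->len = sizeof f->data;
    return 1;
}

static void net_put(struct sink *snk, const struct frame *f)
{
    (void)snk;
    printf("sink consumed %zu bytes\n", f->len); /* stand-in for a transport call */
}

/* the stream object controls the data transfer between source and sink */
static void stream_run(struct stream *st)
{
    struct frame f;
    while (st->src->get(st->src, &f))
        st->snk->put(st->snk, &f);
}

int main(void)
{
    struct source src = { file_get, 3 };
    struct sink   snk = { net_put };
    struct stream st  = { &src, &snk };          /* configure an application by wiring objects */
    stream_run(&st);
    return 0;
}
```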

In the following, the applied methods are explained. AVSS here denotes a file prefetcher for files in the AVSS file format, which reads stored audio/video data from a hard disk and caches them for later use. AVK is the Intel-supplied Audio Video Kernel offering an Application Programmer's Interface to control the real-time capturing and feeding of compressed audio/video data from/to the ActionMedia adapter. The AVK maintains internal buffers and indicates their state to the stream handler processes.

Figure 2 Sender and receiver configurations

The HeiDi communication sub-system can be divided into network access and protocol processing components. Network access is handled by the token ring device driver. The protocol processing components offer the Data Link Service, the Network Service and the Transport Service to the audio/video stream handlers.

Not shown in Figure 2 is a buffer management system (BMS) which provides virtual data copy (i.e. pointer propagation) instead of physical data copy to all other components once data has been placed in a BMS administered buffer. In addition, the BMS gives a coherent view on buffers irrespective of their location in main memory or adapter memory, and it ensures that all processes use buffers in the same manner.

HeiDi dynamic behaviour

Very important for the dynamic behaviour of the system is the mapping of the mentioned components onto OS/2 sessions and threads. The dotted ellipses in Figure 2 indicate which components run together in one thread.

Threads responsible for the real-time processing of audio/video data run inside a real-time environment (RTE). The boundary between this real-time environment and the conventional non-real-time environment (NRTE), where the user interface processes run, is included in Figure 2. In OS/2, basic support for establishing the RTE and NRTE is given by the operating system priority classes and the levels within each class. The threads being part of the RTE are placed in a higher priority class.

We now describe the flow of data for a live video application starting with data capturing, followed by the transport over the network, and ending with data presentation.

On the server side, the stream handler is in charge of pumping audio/video data from the ActionMedia adapter to the network. A transport cycle starts with the stream handler requesting a video frame from the AVK. The AVK copies all available frames from the ActionMedia adapter's internal buffer into the buffer of the stream object. This is a physical copy operation, because the AVK does not make use of the BMS virtual copy mechanism. If the AVK has no previously compressed video frame available, the stream handler suspends itself for one frame time (approx. 30 ms given a rate of 30 frames/s), and the procedure is repeated. After obtaining video data, the stream handler acquires audio data using the same sequence of operations.

After receiving both audio and video data, the stream handler merges them into a single frame together with an application protocol header. The frame is subsequently transmitted to the sink object, which at the sending station encapsulates the transport service interface.

The transport layer breaks up the whole frame into a number of PDUs with a maximum size of 4296 bytes. All PDUs (Protocol Data Units) are propagated one by one through the network and data link layers and, finally, enqueued in the device driver queue. After processing a complete frame, the stream handler continues polling the source object for further frames.
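The segmentation step can be sketched as follows; the 4296-byte maximum is taken from the text, while the function names and the enqueue stand-in are our assumptions.

```c
/* Sketch: breaking one frame into PDUs of at most 4296 bytes each.
 * enqueue_pdu() stands in for handing a segment to the lower layers. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_PDU 4296

static void enqueue_pdu(const unsigned char *p, size_t len, int last)
{
    (void)p;
    printf("PDU of %zu bytes%s\n", len, last ? " (end of frame)" : "");
}

static void send_frame(const unsigned char *frame, size_t len)
{
    size_t off = 0;
    while (off < len) {
        size_t chunk = len - off;
        if (chunk > MAX_PDU) chunk = MAX_PDU;
        enqueue_pdu(frame + off, chunk, off + chunk == len);
        off += chunk;
    }
}

int main(void)
{
    size_t frame_len = 26384;                    /* e.g. a large PLV reference frame */
    unsigned char *frame = calloc(frame_len, 1);
    if (!frame) return 1;
    send_frame(frame, frame_len);                /* yields 6 full PDUs plus one remainder */
    free(frame);
    return 0;
}
```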

The device driver resumes activity when a PDU is enqueued. It interrupts the execution of the stream handler because it has the highest static priority in the system. The device driver physically copies a PDU onto the Token Ring adapter, which then transmits it to the clients.

Processing at the client-side starts with the network device driver thread physically copying a PDU from the network adapter and enqueueing it to the receive thread. The receive thread resumes activity after the enqueue operation, and passes the PDU through the different protocol layers for processing. After a complete frame has been reassembled in the transport layer, it is enqueued to the stream handler. The stream handler dequeues frames and gives them to the sink object, running AVK to display the video. After successful delivery of a frame to the sink object, the stream handler polls the source object. Should no frame be available, it suspends itself. It is resumed after a frame has arrived. If the AVK cannot immediately accept a new frame, because its internal buffer is exhausted, the stream thread will suspend itself for about a frame’s time. In the client system, the network device driver thread has the highest, the stream handler thread the lowest, priority.
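The client-side stream handler behaviour just described amounts to the loop sketched below. The stand-ins for the smoothing buffer, the AVK interface and the suspension are invented for illustration; the 33 ms back-off corresponds to roughly one frame time at 30 frames/s.

```c
/* Sketch: client stream handler loop with simple stand-ins for the smoothing
 * buffer and the AVK interface.  All names, sizes and timings are assumptions;
 * the real handler is an OS/2 thread that blocks instead of exiting. */
#include <stdio.h>
#include <stdbool.h>

#define FRAME_MS 33                               /* about one frame time at 30 frames/s  */

static int smoothing_buffer = 5;                  /* frames currently queued (stand-in)   */
static int avk_space        = 2;                  /* frames the AVK buffer still accepts  */

static bool dequeue_frame(void)  { if (smoothing_buffer == 0) return false;
                                   smoothing_buffer--; return true; }
static bool avk_accepts(void)    { return avk_space > 0; }
static void avk_display(void)    { avk_space--; printf("frame handed to AVK\n"); }
static void sleep_ms(int ms)     { printf("suspend %d ms\n", ms);
                                   avk_space++;   /* play-out drains AVK while we sleep */ }

void client_stream_handler(void)
{
    while (dequeue_frame()) {                     /* real handler waits for the next frame */
        while (!avk_accepts())                    /* AVK internal buffer exhausted          */
            sleep_ms(FRAME_MS);                   /* back off for roughly one frame time    */
        avk_display();
    }
}

int main(void) { client_stream_handler(); return 0; }
```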

PERFORMANCE MEASUREMENT METHODOLOGY

Methodological issues

With the aim of gaining an insight into the dynamic behaviour of a distributed system, event-driven monitoring is a suitable method. Event-driven monitoring represents the dynamic behaviour of a system by a sequence of events. An event is an atomic, timeless action. In contrast to event-driven monitoring is the method of time-driven monitoring or sampling, which only provides statistical information about system execution, and is therefore insufficient for behaviour analysis9.

The actual generation of events depends upon the monitoring technique used. As hybrid monitoring combines source-related event specification with small interference with the object system's behaviour, we decided to use this monitoring technique. In hybrid monitoring, the events are generated by monitoring instructions called 'event statements', which are inserted into the object source code at user-selected points, a process called instrumentation.



These instructions write an event token to a hardware port which is accessible by an external hardware monitor.

Whenever the monitor device recognizes an event, it stores an event record. An event record comprises at least the event token stating the event id, the information where it happened, and a time stamp. The time stamp is generated by the external hardware monitor and represents the acquisition time of the event record. Beside these fields, an event record can contain optional fields describing additional aspects of the event that occurred. The sequence of events is stored as an event trace.
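A hybrid instrumentation point and the resulting event record can be pictured as below. The port is represented by an ordinary variable standing in for the memory-mapped hardware port the monitor would observe, and the record layout is our own illustration, not the exact ZM4 format.

```c
/* Sketch: hybrid-monitoring event statement and event record.  HW_PORT is a
 * stand-in variable; in the real setup it would be the address of a hardware
 * port visible to the external monitor.  The record layout is illustrative. */
#include <stdint.h>
#include <stdio.h>

static volatile uint16_t HW_PORT;                 /* stand-in for the monitored port */

/* the event statement inserted into the object source code */
#define EVENT(token) (HW_PORT = (uint16_t)(token))

/* what the external monitor stores per recognized event */
struct event_record {
    uint16_t token;        /* event id, as written by the event statement  */
    uint16_t node;         /* where it happened (which probed node)        */
    uint64_t timestamp;    /* acquisition time, from the global clock      */
    uint32_t optional;     /* optional field describing the event further  */
};

enum { EV_FRAME_CAPTURED = 1, EV_FRAME_SENT = 2 };

int main(void)
{
    EVENT(EV_FRAME_CAPTURED);                     /* instrumentation point 1 */
    /* ... processing of one frame would happen here ... */
    EVENT(EV_FRAME_SENT);                         /* instrumentation point 2 */
    printf("last token written: %u\n", (unsigned)HW_PORT);
    return 0;
}
```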

So far, we have described a method to collect a local event trace from each node of a distributed system. But as the nodes of a distributed system exchange information, the locally generated event traces are effectively interrelated, a fact that is not necessarily revealed in a local event trace. To get a global view of the behaviour of a distributed system it is, therefore, required to merge all local event traces into one global event trace. The causal relationship between SEND and RECEIVE events can help to establish a global ordering of events10, but if monitoring is to provide not only a correct sequence of events but also performance figures, it is necessary to introduce time. There are mechanisms to estimate global time from local observations in a distributed system11,12. While these estimates of a global time correctly reveal all causally related events, they only have a precision limited to the communication delay between the underlying processes. As this precision was not sufficient for performance figures, we decided to enhance the local event collection mechanism as described above with a facility for global time distribution. Apart from higher precision, it makes our monitoring system applicable also for parallel systems using shared memory13,14.

ZM4 hardware architecture and configuration

The monitor system used for the HeiDi performance analysis is called Zählmonitor 4 (ZM4). It was designed and implemented at the University of Erlangen-Nürnberg. ZM4 is structured as a master/slave system with a control and evaluation computer (CEC) as the master and a scalable number of monitor agents (MA) as slaves (see Figure 3).

Figure 3 Distributed architecture of ZM4

The distance between the CEC and the MAs can be up to 1000 meters. The CEC controls the measurement activities of the MAs, stores the measured data and provides the user with a powerful and universal toolset for event trace processing (see the following subsection).

The MAs are standard PC/AT-compatible machines interconnected via a LAN, i.e. Ethernet with TCP/IP, forming the ZM4 data channel to distribute control information and to collect measured data. Each MA is equipped with up to four dedicated probe units (DPU), which link the MA and the nodes of the object system. The MAs control the DPUs and buffer the measured event traces on their local disks. The DPUs are responsible for event recognition, time stamping, event recording and for high-speed buffering of event traces. In the case of hybrid monitoring, the object system itself normally presents the event token in a form suitable for the DPU. In this case, no special event detector is needed, and the DPU has to be interfaced to the object system only electrically and mechanically.

A recognized event token together with its time stamp is written into a high-speed FIFO buffer located on the DPU within one 100 ns cycle of the globally synchronized clock. This buffer has a depth of 32K event records and helps to overcome the restricted bandwidth (10,000 events/s) of the MA's local disk for bursts of events. The FIFO buffer can be read while monitoring is in progress. This enables continuous monitoring, i.e. there is no restriction on the length of a trace.

To all DPU time-stamping clocks, synchronization information is distributed via the ZM4 tick channel from the central measure tick generator (MTG). To achieve a guaranteed global precision of 100 ns for time stamping and a clock skew of less than 5 ns, a sophisticated two-level fault-tolerant synchronization scheme was developed, which is supported by the tick channel protocol. On each DPU, a basic PLL level filters the synchronization signal on the tick channel, and thereby ensures that the clocks on all DPUs proceed at exactly the same rate.






A token level, in contrast, is responsible for the global start and stop of all clocks. To prove the correctness of all time stamps, and to correct them in case of an error, the synchronization scheme is enhanced by the concept of synchronization events. At a fixed rate, the MTG broadcasts so-called sync tokens, which are recorded by the event recorder in the same manner as regular events from the object system. The time stamp assigned to such a synchronization event is known a priori because of the fixed rate at which they are generated. Time stamps in a physically distributed configuration may be adjusted after the measurement according to the known wire length.

ZM4 event processing tool set SIMPLE

The software tool set SIMPLE (Source-related and Integrated Multiprocessor and -computer Performance evaluation, modeling, and visualization Environment) is aimed at monitor-independent event trace processing, problem-oriented interpretation of event traces, and support for distributed monitoring. These aims and our approach to reaching them are described in the remainder of this section.

Monitor-independent event trace processing means that event traces of arbitrary format, i.e. traces collected by different monitors, can be processed by the full set of SIMPLE trace processing tools. To achieve this in a flexible manner, an abstraction step was introduced: an event trace is looked upon as a generic abstract data structure, i.e. a sequence of event records, with each event record being subdivided into a variable number of record fields. Each field has a certain type from a set of TDL types; additionally, if an event record field is of the type TOKEN (an enumeration type), a string value is assigned to each numerical value to give it a symbolic name. Based on this approach, a set of generic query and access functions for event trace processing called POET (Problem-Oriented Event Trace interface) was implemented. POET thus forms the interface between the trace data and the trace processing algorithms, which makes the latter independent of particular trace formats. POET receives all necessary information from a key file containing a complete trace and event record description. To make this description human-readable, and to guarantee correctness of the key file and its fast decoding, an event trace description language (TDL) together with a TDL compiler (TDLC) was developed.
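The idea of describing an event record once and accessing it generically can be sketched as below. This is a greatly simplified stand-in for the TDL/POET machinery: the descriptor table plays the role of the key file, and the field names, layout and token names are invented.

```c
/* Sketch: description-driven access to event records, in the spirit of the
 * TDL/POET approach.  The descriptor table stands in for the key file; the
 * record layout and all names are invented for illustration. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum field_type { F_TIME, F_TOKEN, F_U16 };

struct field_desc {
    const char     *name;
    enum field_type type;
    size_t          offset;                   /* byte offset inside the raw record */
    const char    **token_names;              /* symbolic names for TOKEN fields   */
};

static const char *tokens[] = { "", "FRAME_CAPTURED", "FRAME_SENT" };

/* "key file": one descriptor per record field */
static const struct field_desc desc[] = {
    { "ACQUISITION", F_TIME,  0, NULL   },
    { "EVENT",       F_TOKEN, 8, tokens },
    { "NODE",        F_U16,  10, NULL   },
};

static void print_record(const uint8_t *rec)
{
    for (size_t i = 0; i < sizeof desc / sizeof desc[0]; i++) {
        const struct field_desc *d = &desc[i];
        if (d->type == F_TIME) {
            uint64_t t; memcpy(&t, rec + d->offset, sizeof t);
            printf("%s=%llu ", d->name, (unsigned long long)t);
        } else {
            uint16_t v; memcpy(&v, rec + d->offset, sizeof v);
            if (d->type == F_TOKEN) printf("%s=%s ", d->name, d->token_names[v]);
            else                    printf("%s=%u ", d->name, v);
        }
    }
    printf("\n");
}

int main(void)
{
    uint8_t raw[12] = {0};
    uint64_t ts = 123456; uint16_t tok = 1, node = 3;
    memcpy(raw + 0, &ts, 8); memcpy(raw + 8, &tok, 2); memcpy(raw + 10, &node, 2);
    print_record(raw);                        /* evaluation tools see symbolic names */
    return 0;
}
```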

The same considerations apply to the problem-oriented interpretation of event traces, which means that all SIMPLE trace processing tools (block EVALUATION in Figure 4) refer to an event inside the monitored application directly by the same symbolic identifier as used by the application itself (instead of displaying hexadecimal event codes, for example). This is achieved especially by assigning strings to the numeric values of the above-mentioned enumeration type TOKEN.

Distributed monitoring typically leads to several disjoint local event traces. To yield a global view of the dynamic behaviour of the monitored system, all these local event traces are merged into one global trace in order of ascending event time stamp values (MERGE component). This requires an accurate global clock at all local monitors, which is guaranteed by the ZM4 architecture. Together with merging, a key file called syskey is automatically generated for the global trace to allow for access by the POET routines.
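Merging can be sketched as a timestamp-ordered merge of the per-node traces; the two-trace version below is illustrative only and omits the key-file generation.

```c
/* Sketch: merging two local event traces into one global trace by ascending
 * time stamp.  A real merge handles an arbitrary number of traces; names and
 * sample data are illustrative. */
#include <stdio.h>
#include <stdint.h>

struct ev { uint64_t ts; int node; int token; };

static void merge(const struct ev *a, size_t na, const struct ev *b, size_t nb)
{
    size_t i = 0, j = 0;
    while (i < na || j < nb) {
        const struct ev *e;
        if (j >= nb || (i < na && a[i].ts <= b[j].ts)) e = &a[i++];
        else                                           e = &b[j++];
        printf("t=%llu node=%d token=%d\n",
               (unsigned long long)e->ts, e->node, e->token);
    }
}

int main(void)
{
    /* hypothetical local traces from a sender node (0) and a receiver node (1) */
    struct ev sender[]   = { {100, 0, 1}, {140, 0, 2}, {180, 0, 1} };
    struct ev receiver[] = { {120, 1, 3}, {165, 1, 4} };
    merge(sender, 3, receiver, 2);
    return 0;
}
```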

Based on the universal event trace access interface described above, a modular, open-ended environment of software tools for trace administration, interpretation, filtering, clustering, statistical evaluation, flow-oriented evaluation, modelling and visualization (block EVALUATION) was conceived. This environment houses standard software like the S statistics package15 or the Timed Petri Net package GreatSPN16, and in-house implemented tools such as GANTT (drawing state vs. time diagrams), FACT (find activities), TRCSTAT (elementary statistical analysis of event traces)17 and AICOS (automatic instrumentation)18. An overview of ZM4/SIMPLE and a comparison with similar projects in the monitoring arena can be found elsewhere19.

DELAY MODELLING

In this section, we develop a formal model describing the flow of frames from a HeiDi server to a HeiDi client. It will serve to analyse the different sources of delay and delay jitter, and thus help to explain the delay measurements. We first introduce a new modelling method, namely Monitoring Nets, then present a net for HeiDi and, finally, discuss the flow through the net.


Figure 4 SIMPLE architecture




Monitoring nets

Monitoring nets (M-nets) have been derived from generalized stochastic Petri nets (GSPN) as defined by Marsan et al.20. The main difference from the GSPN definition is that for M-nets we substitute the exponentially delayed transition type of GSPNs with a transition type having an empirical delay distribution. Transitions of this type represent activities whose duration is not known and therefore should be measured. In this manner, M-nets serve the purpose of specifying measurement points.

Transition types are now discussed in more detail. Similar to GSPNs, the firing of a transition in M-nets is delayed after enabling according to a delay parameter θi which is assigned to each transition ti. Note that tokens remain available for conflicting transitions during the delay time (see below). According to their delay parameters, we distinguish three classes of transitions:

θi = 0

A transition of this type, called an immediate transition, fires without delay immediately after enabling. It is graphically represented by a bar.

θi = const.

The firing of a transition of this type, called a deterministic transition, is delayed for a constant amount of time after enabling. It is graphically represented by a filled black box.

P(θi > t) = F(t)

The firing of a transition of this type, called an empirical transition, is delayed for an empirically distributed amount of time after enabling. It is graphically represented by a white box.

The basic firing rules given above must be enhanced by rules for conflict resolution. Two or more enabled transitions are said to be in conflict if firing one of them disables the other(s). A conflict is resolved in favour of the transition with the shortest remaining delay. A conflict between immediate transitions ti is resolved randomly by assigning branching probabilities pi.

When an enabled transition is disabled during its firing delay by the firing of a conflicting transition, the delay is restarted after the transition is enabled again. A new marking is produced after transition firing by putting as many tokens into the transition's output places as given by the multiplicity of the connecting arcs.
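The three transition classes differ only in how the firing delay is drawn. The sketch below shows how a simulator might sample such a delay; the sample table for the empirical class and all values are invented for illustration, not measured HeiDi data.

```c
/* Sketch: drawing a firing delay for the three M-net transition classes.
 * The 'empirical' class is driven by a table of measured delay samples;
 * the values and the selection scheme are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>

enum t_class { T_IMMEDIATE, T_DETERMINISTIC, T_EMPIRICAL };

struct transition {
    enum t_class cls;
    double       const_delay;       /* used for deterministic transitions       */
    const double *samples;          /* measured delays, used for empirical ones */
    size_t       n_samples;
};

static double firing_delay(const struct transition *t)
{
    switch (t->cls) {
    case T_IMMEDIATE:     return 0.0;                               /* fires at once */
    case T_DETERMINISTIC: return t->const_delay;                    /* fixed delay   */
    default:              return t->samples[rand() % t->n_samples]; /* empirical     */
    }
}

int main(void)
{
    static const double measured[] = { 12.0, 15.5, 31.3, 18.0, 22.5 };  /* ms, invented */
    struct transition frame_time = { T_DETERMINISTIC, 33.3, NULL, 0 };
    struct transition handler    = { T_EMPIRICAL, 0.0, measured, 5 };
    struct transition fork       = { T_IMMEDIATE, 0.0, NULL, 0 };

    printf("play-out cycle: %.1f ms\n", firing_delay(&frame_time));
    printf("stream handler: %.1f ms\n", firing_delay(&handler));
    printf("immediate:      %.1f ms\n", firing_delay(&fork));
    return 0;
}
```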

Figure 5 Data flow model for HeiDi

HeiDi flow model

In the HeiDi flow model as depicted in Figure 5, we see the flow of DVI frames from the server side (left) to the client side (right). Three main sections can be identified in the model (from left to right): capturing of DVI frames, 'piping' of frames through the server threads, the network and the client threads, and play-out of frames. The loop-shaped sub-nets for frame capturing and play-out model the fixed-rate processing scheme of the ActionMedia adapter, which is controlled by its clock. These subnets are dominated by a deterministic transition whose delay corresponds to one frame time. The 'capture loop' produces frames at a fixed rate, the 'play-out loop' consumes them at the same fixed rate (due to an unavoidable difference in the speed of the clocks controlling capturing and play-out, the rates may actually be slightly different). Activity in the capture loop starts after the DVI-internal pipe for inter-frame coding has been filled (fill compression pipe), which introduces a delay of several frame times from actual camera capturing to the output of the ActionMedia adapter. (A frame time is the inverse of the frame rate.) Activity in the play-out loop starts after the first frame is passed from the smoothing buffer to the ActionMedia adapter and some initial processing has been done (transition init) on the adapter. The delay for init is not related to the frame time. In case no frames are available in the smoothing buffer, because of increased delay in the software/network pipe or because of frame loss, discontinuities will arise that decrease the audio/video quality. We assume the delay of the discontinuity transition to be equal to the maximum allowable phase shift of the play-out function. The discontinuity transition should not be seen as a transition which actually models a system activity; it is instead an 'indicator' whose firing shows that the upper bound for delay has been violated. The modelling of the software/network pipe is only very rough. So as not to overcomplicate the net, we neglected the following aspects:

• As mentioned above, some software threads poll their input buffer. If they find their buffer empty, they suspend themselves to 'sleep' for a certain amount of time and resume after being 'woken up' by a timer signal. This polling delay is not modelled separately, but is contained in the processing delay of the stream handler.
• The delay for the network transmission is comprehended, together with both the sender's and the receiver's device driver processing times, in the network transition.
• For the sake of simplicity, we modelled frame loss by attaching a frame loss transition to the client input buffer as a transition in conflict with the receive thread transition.
• All buffers have a limited, configurable size. Whenever a buffer is full, a frame cannot be forwarded to this buffer. As frames have different sizes, it is not possible to give the buffer capacity in frame units.
• The competition of threads for the processor resource is not modelled, which means that a performance evaluation of the net as depicted would give better values than reality.




Even with these restrictions, the M-net shown above allows us to analyse the different sources of delay and delay jitter. This will be done in the remainder of this section.

Sources of delay and delay jitter

In the model, various sources of delay and delay jitter can be identified:

1. Compression time
DVI applies both intra- and inter-frame compression. For inter-frame compression, several frames have to be analysed in sequence and therefore have to be kept on the ActionMedia adapter. This introduces a delay of several frame times.

2. Network transmission time together with processing time at the server and the client
On top of adding a fixed time delay, client/server processing and network transmission especially contribute to delay jitter, because processing and transmission times vary with the frame sizes. We therefore analysed the frame sizes and give the results in Table 1. For this analysis, a sample video clip was recorded three times using different compression methods, namely PLV, 'high-quality' RTV and 'normal-quality' RTV, with 'high/normal-quality' referring to two different sets of initialization parameters for the ActionMedia adapter. The clip runs approximately 4 min 46 s.

Table 1 PLV and RTV frame sizes

                              PLV        RTV (high)   RTV (normal)
Number of frames              8600       8600         8600
Minimum (bytes)               738        622          622
Maximum (bytes)               26,384     8334         7294
Average (bytes)               4968.38    5365.53      2994.97
Median (bytes)                2230       6210         3090
Standard deviation (bytes)    4810.98    2235.10      1296.03
Average data rate (Mbit/s)    1.1924     1.2877       0.7188
Number of delta frames        8433       1426         8199
Number of reference frames    167        7174         401
Ratio delta/reference         50.50      0.20         20.45

The minimum frame sizes in all three columns are similar, all lying between 500 and 1000 bytes. However, for PLV, the maximum frame is more than 35 times larger than the minimum frame, and also for RTV the max-to-min ratios have remarkable values of more than 13:1 and 11:1, respectively. In other clips, maximum frame sizes of 31,020 bytes for PLV and of more than 10,000 bytes for RTV have been observed. The large differences between individual frame sizes are also reflected in the vastly differing values for the average frame size and the median, and in the high standard deviations.

Frame size differences of this magnitude cause significant delay jitter, because the network transmission time and the time required for the five copy operations per frame are strongly related to the frame size.

The average data rates shown in Table 1 have been calculated from the average frame sizes. Although the high-quality RTV data rate is only 0.1 Mbit/s higher than the PLV data rate, a very important dividing line lies between them, namely the 1.2 Mbit/s data rate of a CD-ROM player. The high-quality RTV data rate exceeds this rate.

In both PLV and RTV streams, there are two different types of frames. A frame including all the information necessary to be decompressed and displayed is called a 'reference frame'. The other type of frame, resulting from inter-frame compression, is called a 'delta frame' and contains only the differences from the predecessor frame.

The last three rows of Table 1 show the number of delta frames, the number of reference frames and the ratio of delta to reference frames. These figures are important for error handling. If a reference frame is lost, all following delta frames cannot be used for display; the sequence cannot be restarted before the next reference frame has arrived. If a delta frame is lost, the display is distorted until the next reference frame has arrived. Using PLV compression, a lost frame is followed by an average of 25 delta frames.



Thus, a perceptible error is displayed for approx. 1 s. For normal-quality RTV compression, the average error display time is only approx. 0.33 s. With high-quality RTV, an error is barely perceptible.

Examining PLV and RTV frame sizes in sequential order reveals a cyclic pattern. For PLV, two small delta frames, between 500 and 5000 bytes in size, are followed by one large delta frame with a size between 8000 and 14,000 bytes. Periodically, a reference frame appears which is bigger than these delta frames. For high-quality RTV, five reference frames in the range of 5000-8400 bytes are followed by one delta frame with a size of 622 or 654 bytes. Normal-quality RTV has the same pattern, except that the bigger frames, being in the range of 2000-7000 bytes, are also delta frames.

3. Initial processing in the play-out loop
Initial processing in the play-out loop also introduces a considerable delay into the system. We will discuss this further in the next section.

In our analysis so far, we have identified some delay components that are obviously unavoidable, like compression/decompression delay, network transmission delay and frame processing delay. Special care has to be taken with both of the latter components. As they have great variances, resulting from vastly differing frame sizes, a smoothing buffer has to be provided which equalizes the delay at the cost of adding delay. The size of this smoothing buffer thus has to be calculated very carefully. When made too large, delay is increased beyond an acceptable upper bound; when made too small, discontinuities may arise. Finally, the last delay component mentioned in our list, namely the initial processing in the play-out loop, is in itself not avoidable, but it can be compensated for by discarding those frames from the smoothing buffer that arrive while the initial processing is going on.
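One way to reason about the smoothing buffer is to ask how much start-up buffering is needed so that fixed-rate play-out never runs dry for a given arrival pattern; the sketch below computes this from hypothetical per-frame arrival times. The arrival times, the frame period and the calculation itself are our own illustration, not part of the HeiDi design procedure.

```c
/* Sketch: minimum start-up buffering (in ms) so that fixed-rate play-out
 * never underruns, given frame arrival times at the smoothing buffer.
 * Arrival times and the frame period are hypothetical. */
#include <stdio.h>

int main(void)
{
    const double period = 33.3;                         /* one frame time, ms */
    /* hypothetical arrival times at the client smoothing buffer (ms) */
    const double arrival[] = { 0.0, 30.1, 71.5, 98.0, 180.2, 200.4, 232.0 };
    const int n = sizeof arrival / sizeof arrival[0];

    /* if play-out of frame 0 starts at time d, frame i is needed at d + i*period;
     * the smallest feasible d is max_i (arrival[i] - i*period) */
    double d = 0.0;
    for (int i = 0; i < n; i++) {
        double needed = arrival[i] - i * period;
        if (needed > d) d = needed;
    }
    printf("start-up buffering of %.1f ms avoids discontinuities\n", d);
    printf("(about %.1f frame times of data)\n", d / period);
    return 0;
}
```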

HeiDi PERFORMANCE RESULTS

Measurement setup

As delay is most critical for applications including live video, we conducted delay measurements for a configuration with live, RTV-compressed video, comprising a live server and one client. The total delay can be grouped into:


• the capture delay in the server ActionMedia adapter,
• the frame processing delay (server stream handler, transport service, client stream handler), and
• the play-out delay in the client ActionMedia adapter.

With the hybrid monitoring approach presented earlier, it is possible to trace the internal behaviour of the server and the client software systems and to deduce the network delay from these values. But unfortunately, the capture and play-out delays incurred in the ActionMedia adapter cannot be measured directly, since this adapter does not offer a facility for hybrid instrumentation. Therefore we had to design a measurement setup for direct measurement of server and client software system activities, and for indirect measurement of the partial delays related to the ActionMedia adapters.

For indirect measurement of the capture delay, the fact was used that a sudden change of the scene recorded by the camera results in a significantly different observable frame size. In the measurement setup shown in Figure 6, sudden scene changes were generated by a third computer (the display station) which periodically changed the contents of its display. The camera attached to the server station was directed at this display. The display station was also attached to the ZM4 and was programmed to generate monitoring events with every display change. The ActionMedia capture delay equals the time elapsing between a display change (indicated by the related event generated by the display station) and the arrival of the first frame with a significantly distinct frame size in the server software system. Unfortunately, no comparably simple way of measuring the play-out delay could be found.

Figure 6 Measurement setup

For the measurements which are reported in this paper, a sequence of a blank screen, a screen filled with dots, a blank screen, a screen filled with note signs, a blank screen, a screen filled with the character 'A', and a blank screen was generated by the display station. The capture function was configured such that only reference frames should be recorded. This quality is at least as good as the high-quality RTV compression mentioned in the previous section.

Figure 7 shows the sequence of frame sizes for one display sequence. The frame sizes show three significant changes, corresponding to the changes of the display contents. The base line, around 5000 bytes, is associated with the blank screen. The first change, up to 7500 bytes, is correlated with the dots, the second with the note sign and the third with the 'A'. Interesting are the little peaks down to 622 or 654 bytes. There are 14 of these, and they are delta frames included in the measurement, although the ActionMedia capture function was configured not to produce deltas. The occurrence of these delta frames could not be explained by the authors.




Figure 7 Sequence of frame sizes (frame size in bytes vs. frame number)

Delay evaluation

Following a frame on its way from capture to play-out, we first deal with the capture delay, i.e. the delay from the camera to the stream handler. As only six pattern changes were monitored, only six values for this partial delay could be extracted from the measurement data. These values are in the range from 103 to 135.7 ms. Remembering that one frame time here is 33 ms, the measured values indicate that the DVI algorithm permanently keeps three frames in its compression pipe, needed for inter-frame compression. The difference between the smallest and the biggest delay value, approx. 30 ms, is due to the polling algorithm applied by the stream handler. The stream handler suspends its activity for one OS/2 timer tick in case it does not find a frame in its input buffer. As the OS/2 timer granularity is 31.25 ms, polling adds a delay varying between 0 and 31.25 ms to the capture delay.

Figure 8 depicts the frame processing delay, including the processing delay in the server, the transport delay and the processing delay in the client. Monitoring was started while the HeiDi system was still idle, so that all initialization operations are contained in the trace. The x-axis of Figure 8 indicates the frame number, the y-axis depicts the frame processing delay.

Figure 8 Frame processing delay (delay vs. frame number)

As the graph shows, the first transmitted frame, being a reference frame, is delayed for approx. 100 ms, i.e. three frame times. This results from the above-mentioned capture delay, which here becomes visible as initial server stream handler polling delay. On the client side, this frame is used for initializing the play-out process on the ActionMedia adapter and inside the AVK. This initialization lasts about 500 ms. During this time, all arriving frames are stored in the smoothing buffer, because the client stream handler is not able to take a frame out of the smoothing buffer during the initialization process. After the initialization process, they are copied from the smoothing buffer in a loop into the AVK buffer until the 'base-line' delay of approx. 50 ms is reached.

To explain the following three delay peaks, which are obviously correlated with the frame size changes, it is necessary to investigate the individual parts of the frame processing delay. As indicated above, the total frame processing delay can be subdivided into three disjoint components, namely the server frame processing delay, the transport delay and the client frame processing delay. Table 2 contains the significant values for the distributions of the frame processing delay and its individual parts.

Table 2 Partial delays

                         Total      Server SH   Transp. Service   TR Drv.   TR Drv. (w/o cp.)
Number of data points    423        424         423               913       913
Maximum value (ms)       567.086    111.542     41.427            15.150    14.170
Minimum value (ms)       20.473     2.11        7.32              2.099     1.068
Mean (ms)                102.995    18.223      19.674            7.122     3.325
Median (ms)              52.037     13.233      18.617            8.529     3.003
Std. deviation (ms)      98.093     14.819      3.510             1.962     0.888




Column ‘Server SH’ gives values for the delay within the server stream handler. During this time, the stream handler polls AVK for video and audio data. They are merged into one frame and passed to the transport service. The server stream handler time has a large maximum value of approx. 111 ms, but the mean value of 18 ms and the relatively low standard deviation indicate that this maximum value is not representative for the distribution. The standard deviation of 14.8ms results from the mentioned timer tick granularity becoming effective in the polling loop.

The end-to-end transport delay (column 'Transp. Service') covers the time between the transport service call at the server side and the moment the frame is enqueued in the smoothing buffer by the transport service on the client side. The measurements were made on an empty token ring. The figures, with a mean value of 19.7 ms and a median of 18.6 ms, are rather low, and even the standard deviation of 3.5 ms is remarkably small. This indicates that the transmission of the frames over the network does not increase the total delay significantly.
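These per-frame transport delays are obtained by pairing, for each frame, the timestamp of the transport service call on the server with the timestamp of the enqueue operation on the client, and then computing summary statistics. The sketch below illustrates only this final step; the array layout of matched timestamps is an assumption for illustration and does not reflect the actual ZM4/SIMPLE trace format.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Mean, median and standard deviation of per-frame transport delays,
 * given matched timestamps (in ms): transport service call on the server
 * and enqueue into the client smoothing buffer.  Layout is assumed. */
static int cmp_dbl(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

void transport_delay_stats(const double *t_send, const double *t_enqueue, size_t n)
{
    if (n < 2)
        return;

    double *delay = malloc(n * sizeof *delay);
    double sum = 0.0, sq = 0.0;

    for (size_t i = 0; i < n; i++) {
        delay[i] = t_enqueue[i] - t_send[i];   /* end-to-end transport delay */
        sum += delay[i];
    }
    double mean = sum / n;
    for (size_t i = 0; i < n; i++)
        sq += (delay[i] - mean) * (delay[i] - mean);

    qsort(delay, n, sizeof *delay, cmp_dbl);    /* simple median: middle element */
    printf("mean %.3f ms, median %.3f ms, std. dev. %.3f ms\n",
           mean, delay[n / 2], sqrt(sq / (n - 1)));
    free(delay);
}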

Although the delay values for the transport service are very low, it is interesting to have a closer look at the delay which is associated with the token ring adapters (last two columns of Table 2). Note that the number of token ring packets is greater than the number of transport service TSDUs because of segmentation within the transport layer.

The partial delay related to token ring transmission (column 'TR Drv.') includes copying a packet to the token ring adapter at the sender side, transmission of the packet over the token ring, and copying the packet off the token ring adapter at the receiver side. The mean value and the median for the token ring transmission delay are very low, approx. 7.1 ms and 8.5 ms per packet, respectively. These figures can be compared with those excluding both driver copy operations (column 'TR Drv. (w/o cp.)'). Here, the mean value and the median are 3.3 ms and 3.0 ms, respectively. The transmission time for a maximal-sized (4300 byte) packet over a 16 Mbit/s line is over 2 ms. The pure processing overhead of the sending and receiving token ring adapters is about 1 ms, as indicated by the minimum delay value.
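The segmentation into token ring packets and the share of pure wire time in the per-packet delay follow from simple arithmetic. The packet size of 4300 bytes and the 16 Mbit/s ring speed are taken from the text; the example TSDU size in the sketch is our own, chosen only for illustration.

#include <stdio.h>

int main(void)
{
    const double ring_bps   = 16e6;    /* 16 Mbit/s token ring                */
    const int    max_pkt    = 4300;    /* maximal token ring packet (bytes)   */
    const int    tsdu_bytes = 12000;   /* example frame (TSDU) size, assumed  */

    int    packets = (tsdu_bytes + max_pkt - 1) / max_pkt;   /* segmentation  */
    double wire_ms = max_pkt * 8.0 / ring_bps * 1000.0;      /* ~2.15 ms      */

    printf("%d byte TSDU -> %d token ring packets\n", tsdu_bytes, packets);
    printf("wire time for a maximal packet: %.2f ms\n", wire_ms);
    return 0;
}

The wire time of approx. 2.15 ms for a maximal packet is consistent with the 'over 2 ms' quoted above.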

From the values given in Table 2 it appears that the biggest partial delay is neither the delay within the server stream handler nor the end-to-end transport delay, but the client frame processing delay. An in-depth analysis has shown that this partial delay is dominated by the time during which a frame waits in the smoothing buffer until it can be copied into the AVK buffer. To analyse this delay, Figure 9 shows the number of frames in the smoothing buffer over time. The pattern of behaviour that appears is directly correlated with the pattern of the frame processing delay in Figure 8.

Immediately after the start, the number of frames contained in the smoothing buffer increases steeply to 16 frames. This behaviour has the same explanation as before, namely the AVK and ActionMedia initialization. During the initialization phase, arriving frames are queued in the smoothing buffer and are processed neither by AVK nor by the ActionMedia adapter.

Figure 9 Number of frames in the smoothing buffer

The three following peaks are obviously directly correlated with the increases of the frame sizes shown in Figure 7. A closer look at the event traces reveals that the number of frames in the smoothing buffer starts to increase each time after an 'AVK-Full' message has been received by the stream handler, indicating that the AVK buffer is full. The size of the AVK buffer was configured to 128 Kbyte. Figure 7 shows that the mean frame size in the initial delay phase is about 5000 byte, which means that more than 16 frames (cf. Figure 9) are stored in the AVK buffer in this phase (16 * 5000 = 80,000 byte). But when the frame size increases with the pattern changes, the total amount of data exceeds the 128 Kbyte limit, so that an 'AVK-Full' message is generated. The delay from the camera to the screen is not really increased, but it becomes visible now because the smoothing buffer fills up once the limited AVK buffer space is exhausted.
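The interplay of frame size, the 128 Kbyte AVK buffer and the smoothing buffer can be illustrated with a small occupancy model: as long as frames stay around 5000 bytes, all 16 buffered frames fit into the AVK buffer; when the frame size jumps, the 128 Kbyte limit is hit and the surplus frames back up in the smoothing buffer. The sketch below is a deliberately simplified model of this effect, not HeiDi code, and the larger frame size used is an invented example.

#include <stdio.h>

#define AVK_LIMIT   (128 * 1024)   /* configured AVK buffer size (bytes)     */
#define FRAMES_HELD 16             /* frames permanently held after start-up */

/* For a given steady-state frame size: how many of the buffered frames fit
 * into the AVK buffer, and how many back up in the smoothing buffer? */
static void occupancy(int frame_size)
{
    int in_avk = AVK_LIMIT / frame_size;
    if (in_avk > FRAMES_HELD)
        in_avk = FRAMES_HELD;
    printf("frame size %6d byte: %2d frames in AVK buffer, %2d in smoothing buffer%s\n",
           frame_size, in_avk, FRAMES_HELD - in_avk,
           in_avk < FRAMES_HELD ? "  (AVK-Full)" : "");
}

int main(void)
{
    occupancy(5000);    /* initial phase: 16 * 5000 = 80,000 byte < 128 Kbyte   */
    occupancy(12000);   /* after a pattern change (example value): AVK-Full     */
    return 0;
}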

To summarize the results of this section: although the median frame processing delay is in the 50 ms range (cf. Table 2), all delay components add up to a total delay of more than 20 frame times (660 ms). Most of the delay is due to the initialization delay, which lets all system buffers fill up. It could be observed that there are three frames in the compression pipe, at least one frame in the software system, and at least 16 frames in the smoothing and AVK buffers. From this initial delay the system can never recover, as both the capture and the play-out loop run at the same frame rate, i.e. the sum of frames in the smoothing buffer and the AVK buffer remains the same over the whole session, which means that the initial delay of approx. 650 ms will never be reduced. This delay is definitely too high for a dialogue system. In video conferencing, end-to-end delay should be less than 250 ms, which means that the maximum total buffer capacity should not exceed approx. 8 frames [2] if the frame rate is 30 frames/s.
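The 250 ms conferencing budget translates directly into this bound on the total number of frames that may be buffered anywhere between camera and screen, as the following short calculation shows.

#include <stdio.h>

int main(void)
{
    const double budget_ms  = 250.0;   /* end-to-end delay budget for conferencing */
    const double frame_rate = 30.0;    /* frames per second                        */

    double max_frames = budget_ms / (1000.0 / frame_rate);   /* = 7.5 frames */
    printf("at most %.1f, i.e. approx. 8, frames may be buffered in total\n",
           max_frames);
    return 0;
}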

From these investigations it should be clear that delay, once introduced into the system, can only be reduced by adjusting the relative speed of sender and receiver or, more easily, by dropping frames. The latter means that discontinuities are deliberately introduced into the system in favour of delay reduction. The frame dropping mechanism should be implemented in the receiver software: it is easier to drop frames there because the receiver has all the information concerning the total delay of the system. Since multiple clients can be connected to a single video stream, the server should not decrease its data rate by skipping frames; otherwise the A/V adapters at some other clients could run empty and the video would start stuttering. The frame dropping mechanism should preferably drop unimportant frames, i.e. delta frames which are immediately followed by reference frames. To give the stream handler better control over the buffering behaviour, the size of the AVK buffer should be decreased to a size where only a very small number of frames can be buffered.
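A receiver-side frame-dropping rule along these lines could look as follows. The sketch assumes that each frame carries a flag marking it as a reference or a delta frame and that the stream handler can look one frame ahead in the smoothing buffer; all types and names are placeholders of ours, not the actual HeiDi interfaces.

#include <stddef.h>

/* Placeholder frame descriptor; the real HeiDi structures differ. */
typedef struct frame {
    struct frame *next;     /* next (newer) frame in the smoothing buffer */
    int           is_ref;   /* 1 = reference frame, 0 = delta frame       */
    double        age_ms;   /* time this frame has spent in the buffer    */
} frame_t;

extern void release_frame(frame_t *f);   /* hypothetical: return buffer to pool */

/* Drop 'unimportant' frames from the head of the smoothing buffer: a delta
 * frame immediately followed by a reference frame can be skipped without
 * visible damage, because the reference frame resets the decoder state.
 * Dropping stops once the buffered delay is back under the given bound. */
frame_t *drop_old_frames(frame_t *head, double max_age_ms)
{
    while (head && head->next &&
           head->age_ms > max_age_ms &&
           !head->is_ref && head->next->is_ref) {
        frame_t *skip = head;
        head = head->next;
        release_frame(skip);
    }
    return head;
}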

CONCLUSIONS

The HeiDi system represents an important first step towards integrated, distributed multimedia systems usable within the existing workstation and networking environments. An essential technical achievement of the HeiDi system is the integration of audio and video into a conventional networking and workstation environ- ment. Audio and video use the same storage media, processing environment and network as do data, text and graphics.

During the development of the HeiDi system, it soon became obvious that the dynamics of such a system cannot be fully understood without the help of a distributed monitor like the ZM4. The measurements conducted with the ZM4 were focused on delay analysis. They revealed that a major delay emerged at a point in the transmission pipe where it had not been anticipated.

Beyond demonstrating the value of performance analysis for understanding the behaviour of multimedia systems, the presented experience also illustrates a shift in its goals. Performance analysis studies undertaken for 'conventional' data processing systems mostly aimed at maximizing throughput and/or minimizing response time; for multimedia systems, the aim of performance analysis has shifted towards assuring the timely delivery of data, regardless of the achieved throughput or response time.

In our example, the timely delivery of continuous media data, i.e. the delivery with minimal delay, requires that the smoothing buffer on the receiver side be controlled very tightly.

Three conclusions can be drawn from that:

1. To achieve minimal delay, it is necessary to discard 'old' frames from the smoothing buffer. This makes the different paradigms of data and audio/video transmission very obvious, because in 'pure' data transmission it would be inconceivable that, under normal circumstances, data packets are discarded intentionally. The upper bound for the number of audio/video frames to be discarded is the number of discontinuities that human perception tolerates.

2. The smoothing buffer is allocated between the receive thread, acting as a 'network stream handler', and the display stream handler. On the architectural level, there must be a decision as to which of the stream handlers actually implements the strategy for handling the smoothing buffer. Additionally, common buffer handling strategies like FIFO must be enhanced for stream handling; sample strategies in this direction include, for example, FIFO forward all, FIFO forward latest only with given threshold, etc. (a sketch of one such strategy follows this list).

3. Presentational and conversational audio/video require different dimensioning of the smoothing buffer, and also the buffer handling strategies are different. In the interests of a small delay, all frame buffers must be configured to the bare minimum for conversational applications.
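As an illustration of the second point, a 'FIFO forward latest only with given threshold' strategy could be sketched as follows: below the threshold the smoothing buffer behaves like a plain FIFO; once the threshold is exceeded, only the newest frame is forwarded and all older frames are discarded. The code is our own sketch of this idea, not part of HeiDi, and the forwarding and discard routines are hypothetical.

#include <stddef.h>

/* Minimal smoothing buffer: frames are kept as opaque pointers,
 * oldest first.  Capacity and threshold are illustrative. */
typedef struct {
    void  *frame[64];
    size_t count;
    size_t threshold;
} smoothing_buf_t;

extern void forward_to_display(void *frame);   /* hypothetical */
extern void discard(void *frame);              /* hypothetical */

void forward_latest_only(smoothing_buf_t *b)
{
    if (b->count == 0)
        return;

    if (b->count <= b->threshold) {
        /* normal FIFO behaviour: forward the oldest frame */
        forward_to_display(b->frame[0]);
        for (size_t i = 1; i < b->count; i++)
            b->frame[i - 1] = b->frame[i];
        b->count--;
    } else {
        /* threshold exceeded: drop everything but the latest frame */
        for (size_t i = 0; i + 1 < b->count; i++)
            discard(b->frame[i]);
        forward_to_display(b->frame[b->count - 1]);
        b->count = 0;
    }
}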

Issues for further research can be identified as follows:

1. For a full delay analysis, a measurement interface allowing us to track individual frames from source (i.e. camera) to sink (i.e. display) is required.

2. Further research is required for establishing a synchronization relationship between continuous and discrete media. In a computer-supported desktop videoconference, for example, the movement of a pointer within a diagram must be synchronized to an audio/video stream when, for instance, the floor plan of a building is explained in a conference between an architect and his/her customer [21].

3. The measurements have shown significantly varying DVI frame sizes. This imposes some problems for the Quality-of-Service negotiations in, for example, ATM networks. Either the problem is solved by over-reservation of bandwidth or, given that the actual traffic characteristics are known, a best-effort approach is sufficient. The latter requires the careful study of traffic patterns in different types of networks.

Much of the progress to be achieved in the indicated directions depends upon the availability of working prototypes. The whole area of multimedia will not come to maturity without practical experiments on these prototypes. The interaction of both multimedia developers and performance analysts is required in all this work.

ACKNOWLEDGEMENTS

We gratefully acknowledge Ralf Steinmetz (ENC) and Rainer Klar (IMMD 7) for their valuable comments, and Frank Markgraf and Stefan Mengler (ENC) for their practical support.


REFERENCES

1  Herrtwich, R G The HeiProjects: Support for Distributed Multimedia Applications, IBM ENC Technical Report No. 43.9206, Heidelberg (1992)
2  Russell, J 'Multimedia Networking Performance Requirements', Proc. TriComm, Plenum Press, New York (1993) pp 187-197
3  Luther, A C Digital Video in the PC Environment, McGraw-Hill, New York (1991)
4  Green, J L 'The Evolution of the DVI System Software', Commun. ACM, Vol 35 No 1 (January 1992) pp 52-67
5  Cramer, A et al. The Heidelberg Multimedia Communication System: Multicast, Rate Enforcement and Performance on Single User Workstations, IBM ENC Technical Report No. 43.9212, Heidelberg (1992)
6  Cramer, A, Farber, M, McKellar, B and Steinmetz, R 'Experiences with the Heidelberg Multimedia Communication System', in Danthine, A and Spaniol, O (eds), High Performance Networking, D4-1-20 (1992)
7  Nagarajan, R and Vogt, C Guaranteed-Performance Transport of Multimedia Traffic over the Token Ring, IBM ENC Technical Report No. 43.9201, Heidelberg (1992)
8  Ravindran, K, Sankhla, M and Gupta, P 'Multicast Models and Routing Algorithms for High-Speed Multi-Service Networks', IEEE Int. Conf. Distrib. Comput. Syst., Yokohama, Japan (June 1992)
9  Klar, R 'Hardware/Software-Monitoring', Informatik-Spektrum (Das aktuelle Schlagwort), Vol 8 (1985) pp 37-40
10 Lamport, L 'Time, Clocks, and the Ordering of Events in a Distributed System', Commun. ACM, Vol 21 No 7 (July 1978) pp 558-565
11 Duda, A, Harrus, G, Haddad, Y and Bernard, G 'Estimating Global Time in Distributed Systems', Proc. 7th Int. Conf. on Distributed Computing Systems, Berlin, Germany (September 1987)
12 Hofmann, R 'Globale Zeitskala für lokale Ereignisspuren', in Walke, B and Spaniol, O (eds), Messung, Modellierung und Bewertung von Rechen- und Kommunikationssystemen, Aachen, Germany (September 1993) pp 333-345
13 Luttenberger, N Monitoring of Multiprocessor and Multicomputer Systems, PhD Thesis, Arbeitsberichte des Instituts für Mathematische Maschinen und Datenverarbeitung der Universität Erlangen-Nürnberg (1989)
14 Hofmann, R Gesicherte Zeitbezüge für die Leistungsanalyse in parallelen und verteilten Systemen, PhD Thesis, Arbeitsberichte des IMMD, Universität Erlangen-Nürnberg (1993)
15 Becker, R A, Chambers, J M and Wilks, A R The New S Language: A Programming Environment for Data Analysis and Graphics, Wadsworth, Pacific Grove (1988)
16 Chiola, G GreatSPN Users' Manual (1987)
17 Mohr, B SIMPLE User's Guide Version 5.3, Technical Report 3/92, CS Dept. 7, University of Erlangen-Nürnberg (1992)
18 Dauphin, P, Hartleb, F, Kienow, M, Mertsiotakis, V and Quick, A PEPP: Performance Evaluation of Parallel Programs - User's Guide, Technical Report 5/92, CS Dept. 7, University of Erlangen-Nürnberg (April 1992)
19 Mohr, B Ereignisbasierte Rechneranalysesysteme zur Bewertung paralleler und verteilter Systeme, PhD Thesis, VDI Verlag, Fortschritt-Berichte, Reihe 10, Nr. 221
20 Ajmone Marsan, M, Balbo, G and Conte, G 'A Class of Generalized Stochastic Petri Nets for the Performance Evaluation of Multiprocessor Systems', ACM Trans. Comput. Syst., Vol 2 No 2 (May 1984) pp 93-122
21 Steinmetz, R 'Synchronization Properties in Multimedia Systems', IEEE J. Selected Areas in Commun., Vol 8 No 3 (April 1990) pp 401-412
