ieee transactions on parallel and distributed …schmidt/pdf/tpds.pdfreal-time embedded...

13
The Design and Performance of Real-Time Java Middleware Angelo Corsaro, Student Member, IEEE, and Douglas C. Schmidt, Member, IEEE Abstract—More than 90 percent of all microprocessors are now used for real-time and embedded applications. The behavior of these applications is often constrained by the physical world. It is therefore important to devise higher-level languages and middleware that meet conventional functional requirements, as well as dependably and productively enforce real-time constraints. This paper provides two contributions to the study of languages and middleware for real-time and embedded applications. We first describe the architecture of jRate, which is an open-source ahead-of-time-compiled implementation of the RTSJ middleware. We then show performance results obtained using RTJPerf, which is an open-source benchmarking suite that systematically compares the performance of RTSJ middleware implementations. This paper shows that, while research remains to be done to make RTSJ a bullet-proof technology, the initial results are promising. The performance and predictability of jRate provides a baseline for what can be achieved by using ahead- of-time compilation. Likewise, RTJPerf enables researchers and practitioners to evaluate the pros and cons of RTSJ middleware systematically as implementations mature. Index Terms—Real-time middleware, real-time Java, QoS-enabled middleware platforms, object-oriented languages, real-time resource management, performance evaluation. æ 1 INTRODUCTION 1.1 Current Challenges T HE vast majority of all microprocessors are now used for embedded systems, in which computer processors control physical, chemical, or biological processes or devices in real-time. Examples of such systems include telecommunication networks (e.g., wireless phone services), telemedicine (e.g., remote surgery), manufacturing process automation (e.g., hot rolling mills), and defense applications (e.g., avionics mission computing systems). These real-time embedded systems are increasingly being connected via wireless and wireline networks. Designing real-time embedded systems that implement their required capabilities are dependable and predictable, and are parsimonious in their use of limited computing resources is hard; building them on time and within budget is even harder. Moreover, due to global competition for marketshare and engineering talent, many companies are now also faced with the problem of developing and delivering new products in short timeframes. It is therefore essential that the production of real-time embedded systems can take advantage of languages, middleware, tools, and methods that enable higher software productiv- ity, without unduly degrading the quality of service (QoS). 1.2 The State of the Art Many real-time embedded systems are still developed in C, and increasingly in C++. While writing in C and C++ is more productive than assembly code, they are not the most productive or error-free programming languages. A key source of errors in C/C++ stems from their memory management mechanisms, which require programmers to allocate and deallocate memory manually. Moreover, C++ is a feature rich, complex language with a steep learning curve, which makes it hard to find and retain experienced real-time embedded developers who are trained to use it well. Real-time embedded software should ultimately be synthesized from high-level specifications expressed with domain-specific modeling tools [1]. Until those tools mature, however, a considerable amount of real-time embedded software still needs to be programmed by software developers. Ideally, these developers should use programming languages and middleware that shield them from many accidental complexities, such as type errors, memory management, real-time scheduling enforcement, and steep learning curves. Java [2] has become an attractive choice because of its rapidly growing programmer base, its simplicity, its safety, and especially for its cheaper main- tenance cost when compared to C/C++. Conventional Java implementations are unsuitable for developing real-time embedded systems, however, mostly due to the fact that 1) the scheduling of Java threads is purposely under- specified, 2) Java is a garbage collected language and most of the precise garbage collectors known in literature [3] are not suitable for real-time systems, and 3) Java provides coarse-grained control over memory allocation and access, i.e., it allows applications to allocate objects on the heap, but provides no control over the type of memory in which objects are allocated. To address these problems, the Real-Time Java Experts Group has defined the RTSJ [4], which provides the following capabilities: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003 1 . A. Corsaro is with the Department of Computer Science and Engineering, Washington University, 1 Brookings Drive, Box 1045, St. Louis, MO 63130. E-mail: [email protected]. . D.C. Schmidt is with the Institute for Software Integrated Systems, Vanderbilt University, Nashville, 1829, Station B, Vanderbilt University, Nashville, TN 37235. E-mail: [email protected]. Manuscript received 8 Dec. 2002; revised 5 Aug. 2003; accepted 5 Aug. 2003. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 118752. 1045-9219/03/$17.00 ß 2003 IEEE Published by the IEEE Computer Society

Upload: others

Post on 06-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

The Design and Performance ofReal-Time Java Middleware

Angelo Corsaro, Student Member, IEEE, and Douglas C. Schmidt, Member, IEEE

Abstract—More than 90 percent of all microprocessors are now used for real-time and embedded applications. The behavior of these

applications is often constrained by the physical world. It is therefore important to devise higher-level languages and middleware that

meet conventional functional requirements, as well as dependably and productively enforce real-time constraints. This paper provides

two contributions to the study of languages and middleware for real-time and embedded applications. We first describe the architecture

of jRate, which is an open-source ahead-of-time-compiled implementation of the RTSJ middleware. We then show performance results

obtained using RTJPerf, which is an open-source benchmarking suite that systematically compares the performance of RTSJ

middleware implementations. This paper shows that, while research remains to be done to make RTSJ a bullet-proof technology, the

initial results are promising. The performance and predictability of jRate provides a baseline for what can be achieved by using ahead-

of-time compilation. Likewise, RTJPerf enables researchers and practitioners to evaluate the pros and cons of RTSJ middleware

systematically as implementations mature.

Index Terms—Real-time middleware, real-time Java, QoS-enabled middleware platforms, object-oriented languages, real-time

resource management, performance evaluation.

1 INTRODUCTION

1.1 Current Challenges

THEvast majority of all microprocessors are now used forembedded systems, in which computer processors

control physical, chemical, or biological processes ordevices in real-time. Examples of such systems includetelecommunication networks (e.g., wireless phone services),telemedicine (e.g., remote surgery), manufacturing processautomation (e.g., hot rolling mills), and defense applications(e.g., avionics mission computing systems). These real-timeembedded systems are increasingly being connected viawireless and wireline networks.

Designing real-time embedded systems that implementtheir required capabilities are dependable and predictable,and are parsimonious in their use of limited computingresources is hard; building them on time and within budgetis even harder. Moreover, due to global competition formarketshare and engineering talent, many companies arenow also faced with the problem of developing anddelivering new products in short timeframes. It is thereforeessential that the production of real-time embeddedsystems can take advantage of languages, middleware,tools, and methods that enable higher software productiv-ity, without unduly degrading the quality of service (QoS).

1.2 The State of the Art

Many real-time embedded systems are still developed in C,and increasingly in C++. While writing in C and C++ is

more productive than assembly code, they are not the mostproductive or error-free programming languages. A keysource of errors in C/C++ stems from their memorymanagement mechanisms, which require programmers toallocate and deallocate memory manually. Moreover, C++is a feature rich, complex language with a steep learningcurve, which makes it hard to find and retain experiencedreal-time embedded developers who are trained to use itwell.

Real-time embedded software should ultimately besynthesized from high-level specifications expressed withdomain-specific modeling tools [1]. Until those toolsmature, however, a considerable amount of real-timeembedded software still needs to be programmed bysoftware developers. Ideally, these developers should useprogramming languages and middleware that shield themfrom many accidental complexities, such as type errors,memory management, real-time scheduling enforcement,and steep learning curves. Java [2] has become an attractivechoice because of its rapidly growing programmer base, itssimplicity, its safety, and especially for its cheaper main-tenance cost when compared to C/C++. Conventional Javaimplementations are unsuitable for developing real-timeembedded systems, however, mostly due to the fact that1) the scheduling of Java threads is purposely under-specified, 2) Java is a garbage collected language and mostof the precise garbage collectors known in literature [3] arenot suitable for real-time systems, and 3) Java providescoarse-grained control over memory allocation and access,i.e., it allows applications to allocate objects on the heap, butprovides no control over the type of memory in whichobjects are allocated.

To address these problems, the Real-Time Java ExpertsGroup has defined the RTSJ [4], which provides thefollowing capabilities:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003 1

. A. Corsaro is with the Department of Computer Science and Engineering,Washington University, 1 Brookings Drive, Box 1045, St. Louis, MO63130. E-mail: [email protected].

. D.C. Schmidt is with the Institute for Software Integrated Systems,Vanderbilt University, Nashville, 1829, Station B, Vanderbilt University,Nashville, TN 37235. E-mail: [email protected].

Manuscript received 8 Dec. 2002; revised 5 Aug. 2003; accepted 5 Aug. 2003.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number 118752.

1045-9219/03/$17.00 � 2003 IEEE Published by the IEEE Computer Society

Page 2: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

. New memory management models that can be usedin lieu of garbage collection.

. Access to raw physical memory.

. A higher resolution time granularity suitable forreal-time systems.

. Stronger guarantees on thread semantics whencompared to regular Java, i.e., the most eligiblerunnable thread is always run.

Until recently, there were no implementations of the RTSJ,

which hampered the adoption of Java in real-time

embedded systems. It also hampered systematic empirical

analysis of the pros and cons of the RTSJ programming

model. Several implementations of RTSJ are now available,

however, including the RTSJ RI from TimeSys [5] and jRate

from Washington University.This paper significantly extends our earlier work [6], [7]

by providing:

. More extensive coverage of jRate capabilities, suchas its memory regions implementation and itssupport for custom allocators.

. A detailed analysis and comparison of scopedmemory that quantifies the trade offs associatedwith different types of scoped memory.

. Extensive new test results based on the commercialversion of Linux/RT from TimeSys, which providesmany features (such as higher resolution timer,support for priority inversion control via priorityinheritance and priority ceiling protocols, andresource reservation) not supported by the GPLversion of the TimeSys Linux/RT.

The main contribution of this paper is to describe the

techniques used by jRate to implement the RTSJ middle-

ware, as well as to illustrate empirically the performance

pitfalls and trade offs that certain RTSJ design decisions

incur.

1.3 Paper Organization

The remainder of the paper is organized as follows:

Section 2 provides a brief overview of the RTSJ; Section 3

describes the architecture and design rationale of jRate;

Section 4 analyzes the empirical results obtained by

benchmarking jRate and the TimeSys RTSJ RI using our

RTJPerf [6] benchmarking suite; Section 5 compares our

work on jRate with related research; and Section 6

summarizes our work and outlines future plans for

improving the next generation of RTSJ middleware for

real-time embedded applications.

2 THE REAL-TIME SPECIFICATION FOR JAVA

The RTSJ extends the Java API and refines the semantics of

certain constructs to support the development of real-time

applications. In the reminder of this section, we will provide

an overview of the RTSJ key features.

2.1 Memory

The RTSJ extends the Java memory model by providing

memory areas other than the heap. These memory areas are

characterized by the anticipated lifetime of the contained

objects (immortal, scoped) as well as the time taken forallocation (linear, variable).

Objects allocated within the (singleton) Immortal Memoryhave the same lifetime as the application: They are nevercollected. Each Scoped memory area is equipped with areference count of the number of threads active in its area.The lifetime of objects allocated in such an area is keyed tothe reference count.

Additionally, scoped memory areas provide bounds onthe allocation time; currently, variable (VTMemory) andlinear-time (LTMemory) allocators are accommodated. Forlinear allocation time, the RTSJ requires that the timeneeded to allocate the n > 0 bytes to hold the class instancemust be bounded by a polynomial function fðnÞ � Cn forsome constant C > 0.1

For Java Virtual Machine (JVM) and application devel-opers alike, scoped memory is one of the more interestingfeatures added to Java by the RTSJ. Objects allocated withina scoped memory are not garbage collected individually;instead, a reference-counting mechanism detects when allobjects in a scope should be collected. Safety of scopedmemory areas is ensured by reliance upon 1) a set of rulesimposed on entrance of scoped memories, and 2) a set ofrules that govern the legality of reference between objectsallocated in different memory areas.

2.2 Threads

The RTSJ extends the existing Java threading model withtwo new types of real-time threads: RealtimeThread andNoHeapRealtimeThread. The NoHeapRealtimeTh-

read can have an execution eligibility higher than thegarbage collector. A NoHeapRealtimeThread can there-fore neither allocate nor reference any heap objects. Thescheduler controls the execution eligibility of the instances ofthis class by using the SchedulingParameters asso-ciated with it.

2.3 Scheduling

The RTSJ introduces the concept of a Schedulable object.The execution of Schedulable entities is managed by thescheduler that holds a reference to them. The RTSJ providea scheduling API that is sufficiently general to implementcommonly used scheduling algorithms, such as RateMonotonic (RM), Earliest Deadline First (EDF), Least LaxityFirst (LLF), Maximum Urgency First (MAU), etc. However,the only required scheduler for a RTSJ-compliant imple-mentation is a priority preemptive scheduler that candistinguish 28 different priorities.

2.4 Asynchrony

The RTSJ defines mechanisms to bind the execution ofprogram logic to the occurrence of internal and/or externalevents. In particular, the RTSJ provides a way to associatean asynchronous event handler to some application-specificor external events. There are two types of asynchronousevent handlers defined in RTSJ:

. The AsyncEventHandler class, which does nothave a thread permanently bound to it—nor is it

2 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003

1. This bound does not include the time taken by an object’s constructoror a class’s static initializers.

Page 3: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

guaranteed that there will be a separate thread foreach AsyncEventHandler. The RTSJ simply re-quires that, after an event is fired, the execution of allits associated AsyncEventHandlers will be dis-patched.

. The BoundAsyncEventHandler class, which has areal-time thread associated with it permanently. Theassociated real-time thread is used throughout itslifetime to handle event firings.

Event handlers can also be specified a no-heap, whichmeans that the thread used to handle the event must be aNoHeapRealtimeThread.

The RTSJ also introduces the concept of AsynchronousTransfer of Control (ATC), which allows a thread toasynchronously transfer the control from a locus ofexecution to another.

2.5 Time and Timers

Real-time embedded systems often use timers to performcertain actions at a given time in the future, as well as atperiodic future intervals. For example, timers can be used tosample data, play music, transmit video frames, etc. TheRTSJ provides two types of timers:

. OneShotTimer, which generates an event at theexpiration of its associated time interval and

. PeriodicTimer, which generates events periodi-cally.

OneShotTimers and PeriodicTimers events arehandled by AsyncEventHandlers. The RTSJ also sup-ports high resolution timers and high resolution clocks.

3 JRATE OVERVIEW

jRate is an open-source RTSJ-based real-time Java imple-mentation that we are developing atWashington University.jRate extends the open-source GNU Compiler for Java (GCJ)front-endand runtime system[8] toprovide anahead-of-timecompiled middleware platform for developing RTSJ-com-pliant applications.Oneof jRate’s key research goals is that ofusing Generative Programming (GP) [9] in order to make itpossible to have a configurable, customizable, and yetefficient RTSJ implementation. The jRate architecture shownin Fig. 1a differs from the JVM model shown in Fig. 1b sincethere is no JVM interpreting the Java bytecode. Instead, jRatecompiles RTSJ applications into native code. The Java andRTSJ services, such as garbage collection, real-time threads,and scheduling, are accessible via the GCJ and jRate runtimesystems, respectively.

jRate supports most of the RTSJ features described inSection 2. We describe these features below and indicatewhere the design and performance of these features isdiscussed in subsequent sections of this paper.

3.1 Memory Areas

jRate supports scoped memory and immortal memory.Fig. 2 shows how these memory areas are implemented injRate. This diagram shows how each memory area isassociated with an allocator. There are two parallel classhierarchies: one set of Java classes for the memory areas andone set of C++ classes for the allocators. The memorymanagement strategy is delegated to native allocators andthe binding between memory area and type of allocator canbe deferred until creation time. This design provides userswith the flexibility to experiment with different allocatorsstrategies and to choose the type of allocator that best fitstheir application usage patterns.

jRate also provides a strategy that enables users todecide which type of memory (such as linear time memoryor variable time memory) should implement immortalmemory. Although the RTSJ does not mandate howimmortal memory is implemented, we believe it is im-portant to allow users to specify which type of implementa-tion to configure. jRate’s immortal memory implementationcan be configured when an application is launched. Itsscoped memory implementation also exposes a nonstan-dard extension that uses nonthread safe allocators to avoidthe overhead of unnecessary locks if a memory area isalways accessed by a single thread.

Fig. 2 illustrates a new type of scoped memory—calledCTMemory—provided by jRate. CTMemory trades offallocation time for the memory area creation time. Thismemory area is zeroed at initialization time and the amountused is also zeroed each time the memory reference countdrops to zero.2 This feature provides constant timeallocation for objects created within the CTMemory, whichis useful for real-time applications. The structure ofCTMemory is depicted in Fig. 4. The type field distinguishesdifferent types of objects. Different types of objects must betreated differently, e.g., some must be finalized, whereas

CORSARO AND SCHMIDT: THE DESIGN AND PERFORMANCE OF REAL-TIME JAVA MIDDLEWARE 3

2. The reference count associated with a scoped memory is representedby the number of real-time threads in it that are currently active, i.e., haveentered the scoped memory but have not yet exited.

Fig. 1. The jRate architecture.

Fig. 2. The jRate memory region structure.

Page 4: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

others need not be finalized. jRate, as described in [10], iscurrently the only RTSJ implementation which implementsa constant time algorithm for checking the validity ofmemory references. The performance of jRate scopedmemory implementation is analyzed in Section 4.3.1.

3.2 Real-Time Threads and Scheduling

jRate supports real-time threads of type RealtimeThread

using a priority preemptive scheduler based on the under-lying real-time OS priority preemptive scheduler—jRate’sthreads are directly mapped to native OS threads. Aninteresting characteristic of the RTSJ is that each Realti-

meThread is associated with a scope stack that 1) keepstrack of the set of memory regions that have been enteredby the thread and 2) can detect cycles in the entered scopes.jRate’s scope stack implementation uses data structureswhich allow to perform all scope stack operations inconstant time.

For example, the findFirstScope() operation de-fined in the RTSJ, scans the scope stack from top to bottomand returns the first scoped memory area found. Ifimplemented as suggested by the RTSJ specification, thisoperation would have a time complexity of OðnÞ, where n isthe length of the stack. jRate enhances the stack datastructure suggested in the RTSJ by maintaining a linked listof the scoped memory in the stack, along with the index ofthe topmost scoped memory, as shown in Fig. 3. This designallows a constant time implementation of both find-

FirstScope(), push(), and pop(). In fact, find-

FirstScope() simply has to return the value of thepointer to the top scoped memory, while push and pop

have, respectively, to detect if the memory area beingpushed or popped is a scoped memory and, if so, updatethe top pointer for the scoped memory and the pointer inthe linked list. Something else that is worth noticing is thatjRate avoids using instanceof in order to determine the typeof a the memory area being pushed or popped since this isusually implemented as a linear time operation in theheight of the class hierarchy. To have a constant time typeidentification, jRate uses its own type encoding for all thoseclasses that often require an instanceof .

When a scope stack is destroyed and the reference countsof all scoped memory areas still on the stack must bedecremented, jRate knows exactly which entries are scopedmemory and which are not. This knowledge enables it toelide a test on the type of memory area and avoids blindlysearching for scoped memories on the stack.

The size of the jRate scope stack is fixed after the real-time thread is created. This design makes jRate moreefficient by avoiding the use of pointers to implement thelinked list. The performance of jRate thread implementationis analyzed in Section 4.3.2.

3.3 Asynchrony

jRate provides a robust and efficient asynchronous eventhandling implementation that avoids priority inversion andprovides lock free dispatch on most platforms.3 jRate usespriority queues ordered by the execution eligibility of thehandlers to dispatch asynchronous events. Executioneligibility is the ordering mechanism used throughoutjRate, e.g., it is used to achieve total ordering of schedulableentities whose QoS are expressed in various ways. Theperformance of jRate asynchrony implementation is ana-lyzed in Section 4.3.3.

3.4 High Resolution Time and Clock

jRate implements the RTSJ high resolution time API.Different implementations of real-time clocks are provided.Depending on the underlying hardware and OS platform,resolution from nanoseconds up to microseconds can beobtained. In particular, on Pentium platforms, high resolu-tion time with a resolution close to the processor frequencyis obtained by using the read time stamp counter (RDTSC)register.

3.5 Timers

jRate implements periodic and one-shot timers in accor-dance to the RTSJ. A thread is associated with each timer(the RI takes a similar approach). Periodic timers areimplemented by relying on the behavior provided byperiodic threads, where as one-shot timers use a real-timethread with custom logic to generate the event at the righttime. The priority of the thread is inherited by the priorityof the most eligible handler registered with the timer. Ananalysis of the performance of the jRate timers implemen-tation appears in [6].

4 PERFORMANCE RESULTS AND ANALYSIS

This section first describes our real-time Java testbed andoutlines the various Java implementations used for the tests.We then present and analyze the results obtained runningmost of the RTJPerf tests [6] in our testbed.

4 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003

Fig. 3. The jRate scope stack structure.

3. On certain platforms, such as Compaq Alpha, the assumptions that werely upon to avoid locking do not hold, so for those platforms jRate mustuse locks.

Fig. 4. The jRate CTMemory structure.

Page 5: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

4.1 Overview of the Hardware and Software Testbed

The test results reported in this section were obtained on anIntel Pentium III 733 MHz with 256 MB RAM, runningLinux RedHat 7.3 with the TimeSys Linux/RT 3.1 kernel[11]. This is the TimeSys Linux/RT NET commercialversion that implements priority inheritance protocols, highresolution timers, and a resource kernel that supportsresource reservation. The Java platforms used to test theRTSJ features are described below:

1. TimeSys RTSJ RI: TimeSys has developed the officialRTSJ RI [5], which is a fully compliant implementa-tion of Java [2], [12] that implements all themandatory features in the RTSJ. The RI is based ona Java 2 Micro Edition (J2ME) (JVM) and supports aninterpreted execution mode, i.e., there is no just-in-time (JIT) compilation. The RI runs on any Linuxplatform and its threading model directly maps Javathreads onto Linux POSIX threads.

2. jRate: As described in Section 3, jRate is an open-source RTSJ-based extension of GCJ front-end andruntime systems that we are developing at Wa-shington University.

4.2 Compiler and Runtime Options

The following options were used when compiling andrunning the RTJPerf benchmarks.

3. TimeSys RTSJ RI:The Java code for the tests wascompiled with jikes [13] using the -O option. TheTimeSys RTSJRI Java Virtual Machine (JVM) wasalways run using the -Xverify:none option. Theenvironment variable that controls the size of theimmortal memory was set as IMMORTAL_

SIZE=9000000.4. jRate: The Java code for the test was compiled with

GCJ with the -O3 flag and statically linked with theGCJ and jRate runtime libraries. The immortalmemory size was set to the same value as the RI.jRate v0.3 was used along with GCJ 3.2.1.

4.3 RTJPerf Benchmarking Results

This section presents a description and the results obtainedfor the RTJPerf tests we ran. We analyze the results andexplain why the TimeSys RI and jRate RTSJ implementa-tions performed differently.4

We provide average and worst-case behavior, along withdispersion indexes, for all the RTSJ Java features wemeasured. For certain tests, we provide sample traces thatare representative of all the measured data. The measure-ments performed in the tests reported in this section arebased on steady state observations, where the system is runto a point at which the transitory behavior effects of coldstarts are negligible before executing the tests.

4.3.1 Memory Benchmark Results

Below we present and analyze the results of the RTJPerf

memory benchmarks that we ran.

Allocation Time Test: Dynamic memory allocation isforbidden or strongly discouraged in many real-timeembedded systems to minimize memory leaks, latency,and nonpredictability. The scoped memory specified by theRTSJ is designed to provide a relatively fast and safe way toallocate memory that has nearly the flexibility of dynamicmemory allocation, but the efficiency and predictability ofstack allocation. The measure of the allocation time and itsdependency on the size of the allocated memory is a goodmeasure of the efficiency of various types of scopedmemory implementations.

To measure the allocation time and its dependency onthe size of the memory allocation request, RTJPerf providesa test that allocates fixed-sized objects repeatedly from ascoped memory region whose type is specified by acommand-line argument. To control the size of the objectallocated, the test allocates an array of bytes. It is possible todetermine the allocation time associated with each type ofscoped memory by running this test with different alloca-tion sizes.

1. Test Settings: To measure the average allocation timeincurred by the RI implementation of LTMemory

and VTMemory, we ran the RTJPerf allocation timetest for allocation sizes ranging from 32 to 16,384bytes. Each test samples 1,000 values of the alloca-tion time for the given allocation size. This test alsomeasured the average allocation time of jRate’sCTMemory implementation described in Section 3.1.

2. Test Results: The data obtained by running theallocation time tests were processed to obtain anaverage, dispersion, and worst-case measure of theallocation time. We compute both the average anddispersion indexes since they indicate the followinginformation:

. how predictable is the behavior of a scopememory implementation,

. how much variation in allocation time canoccur, and

. how the worst-case behavior compares to theaverage-case and to the case that provides a99 percent upper bound.5

Fig. 5 shows the resulting average allocation time forthe different test runs, and it also shows the standarddeviation of the allocation time measured in thevarious test settings.

3. Results Analysis: We now analyze the results of thetests that measured the average and worst-caseallocation times, along with the dispersion for thedifferent test settings:

. Average Measures—As shown in Fig. 5, bothLTMemory and VTMemory provide linear timeallocation with respect to the allocated memorysize. Since similar results were found for othermeasured statistical parameters, we infer thatthe RI implementation of LTMemory andVTMemory are similar, so we focus primarily

CORSARO AND SCHMIDT: THE DESIGN AND PERFORMANCE OF REAL-TIME JAVA MIDDLEWARE 5

4. Explaining certain behaviors requires inspection of the source code ofa particular Java Virtual Machine (JVM) feature, which is not alwaysfeasible for Java implementations that are not open-source.

5. By “99 percent upper bound,” we mean that value that represents anupper bound for the measured values in the 99th percentile of the cases.

Page 6: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

on the LTMemory since our results also apply toVTMemory. jRate has an average allocation timethat is independent of the allocated chunk,which helps analyze the timing of RTSJ code,even without knowing the amount of memorythat will be needed. From Fig. 5, it can be easilyseen that for small memory chunks the jRatememory allocator is nearly 20 times faster thanRI’s LTMemory. For the largest chunk we tested,jRate’s CTMemory is � 120 times faster RI’sLTMemory.

. Dispersion Measures—The standard deviationof the different allocation time cases is shown inFig. 5. This deviation increases with the chunksize allocated for both LTMemory and VTMem-

ory until it reaches 4 Kbytes, where it suddenlydrops and then it starts growing again. OnLinux, a virtual memory page is exactly 4 Kby-tes, but when an array of 4 Kbytes is allocated,the actual memory is slightly larger in order tostore freelist management information. In con-trast, the CTMemory implementation has thesmallest variance and the flattest trend.

The plots in Fig. 6 show the cumulativerelative frequency distribution of the allocationtime for some of the different cases discussedabove. These plots illustrate how the allocationtime is distributed for different types ofmemory and different allocation sizes. Forany given point t on the x axis, the value onthe y axis indicates the relative frequency ofallocation time for which AllocationTime � t.The standard deviations shown in Figs. 6 and5 provide insights on how the measuredallocation time is dispersed and distributed.It is interesting to note that the cumulativefrequency distribution for jRate is S-shaped.Moreover, the values are mostly concentratedin the two flat regions of the “S.” The shape ofdistribution in Fig. 6 reveals a characteristic ofjRate’s allocator implementation. The GCJruntime requires all the objects to be allocated

at the 8-byte boundaries.6 As a result, paddingmay need to be computed, depending whetherthe size of the object plus the size of theheader is a multiple of 8 or not (see Fig. 4).The two regions in the cumulative distributionshow the two different cases.

. Worst-Case Measures—Fig. 5 shows the boundson the allocation time for jRate’s CTMemory andthe RI LTMemory. Each of these graphs depictsthe average and worst-case allocation times,along with the 99 percent upper bound of theallocation time. Fig. 5 illustrates how the worst-case execution time for jRate’s CTMemory is atmost � 1.8 times larger than its average execu-tion time. Fig. 5 shows how the maximum,average, and the 99 percent case for the RILTMemory converge as the size of the allocatedchunk increases. The minimum ratio betweenthe worst and average-case allocation times is� 1.8 for a chunk size of 16K. Figs. 5 and 6 alsocharacterize the distribution of the allocationtime. Fig. 6 shows how, for some allocationsizes, the allocation time for the RI LTMemory iscentered around two points.

Scoped Memory Lifetime Test: Scoped memory is one of thekey features introduced by the RTSJ. It enables applicationsto circumvent the garbage collector, yet still use automaticmemory management, by 1) associating with each memoryscope a reference count that depends on the number of real-time threads within the memory area (i.e., that have enteredthe scope but yet not exited it), and 2) ensuring that all theobjects allocated in the scope are finalized and the spacereclaimed, as soon as the reference count associated withthe memory area drops to zero. Since most RTSJ applica-tions use scoped memory heavily, it is essential tocharacterization its performance precisely.

RTJPerf provides a test that measures 1) the time neededto create a memory scope, 2) the time needed to enter it, and3) the time needed to exit it. The time needed to create ascoped memory area depends on the following factors:

6 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003

Fig. 5. Scoped memory allocation time statistics.

Fig. 6. Allocation time cumulative relative frequency distribution.

6. This is done because the lowest three bits are used to store lockinginformation.

Page 7: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

. The allocation context of the thread that creates thememory scope. The Allocation Time Test measuresthis aspect of memory scope creation time.

. The native C/C++ implementation of scoped mem-ory. The Scoped Memory Lifetime Test measures theefficiency and predictability of the native C/C++implementation of scoped memory.7

The time needed to exit a memory scope is measured by

the case in which its reference count drops to zero as a

result of the thread exiting the scope. In this case, the

memory scope must finalize all the objects allocated within

it and reclaim the used storage.To determine the time needed to enter, exit, and create a

memory scope—and to determine how efficient the im-

plementation is—this test creates a memory scope, enters it,

fills it with objects, and then exits the scope. The test can be

run by configuring the type of scoped memory to be used

and by having the object allocated selectively override the

default finalize method. Measuring this latter point is

important since some Java implementations are smarter

than others in handling the case where an object does not

override the finalizer.

4. Test Settings: To measure scoped memory creation,enter, and exit time, we ran the RTJPerf scopedmemory timing test for memory sizes ranging from4,096 to 1,048,576 bytes. The test was designed toensure that the allocated objects overrode thefinalizer, which enabled a worst-case measurementof the exit time. For each test, 500 values weresampled for each of the measured variables. This testwas run to get the relevant measures for the RI’sLTMemory and jRate’s CTMemory.

5. Test Results: Figs. 7, 8, and 9 contain the average,99 percent, maximum, and standard deviation trendfor memory scope creation, enter, and exit time.Fig. 10 shows the execution time, measured as thetime needed to fill the memory with the objects.

6. Results Analysis: Below, we analyze the results of thetest that measure the creation, enter, and exit timefor a scoped memory area.

. Average Measures—From Fig. 7, we can seehow the jRate CTMemory has better creationtimes, on average, than the RTSJ RI for scopedmemory with size less or equal than 64 KBytes.After this point, jRate starts performing worsethan the RI. The explanation of this behaviorstems from the fact that the CTMemory maps theLinux /dev/zero device and then locks theallocated chunk. Simply locking the memorydoes not reserve physical memory for the callingprocess; however, since the pages might becopy-on-write. To improve the predictability ofan application, therefore, it is usually a goodhabit to make sure that at least a byte each pageof the locked chunk of memory is accessed. Thetime taken to write at least one byte per each

CORSARO AND SCHMIDT: THE DESIGN AND PERFORMANCE OF REAL-TIME JAVA MIDDLEWARE 7

7. The memory used by the scoped memory to allocate an object is notretrieved by the current allocation context, but is allocated in a platform-specific way, e.g., using malloc() or mmap().

Fig. 7. Average, 99 percent, maximum, and standard deviation of the

creation time (�sec).

Fig. 8. Average, 99 percent, maximum, and standard deviation of the

enter time (�sec).

Fig. 9. Average, 99 percent, maximum, and standard deviation of the

exit time (�sec).

Page 8: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

allocated page, has a time complexity of OðnÞ inthe size of the scoped memory allocated, whichmakes creation time depend linearly on memorysize.

Fig. 8 shows the enter time trend for jRate’sCTMemory and for the RI’s LTMemory. Fromthis diagram, we can see that, while the RI entertime varies only slightly with the scopedmemory size, for jRate, the enter time dependsmore on the size. This behavior stems from thefact that jRate’s scoped memory puts pressureon the cache and induces cache misses insucceeding instructions, which degrades theexecution time of the first enter (which is theone measured by the test).

Fig. 9 shows the results for the exit time.From this diagram, we see that, even if jRate’sCTMemory has to zero the allocated memoryother than finalizing the allocated objects, itsperformance is close to the RI. Exit time dependsprimarily on the efficiency of JNI and on theefficiency of the native implementation thatmanages the scope stack, the finalization, andthe cleanup of the memory, if any is needed. Inthis case, jRate’s ahead-of-time compilationmodel plays a minor role, especially as the sizeof the memory grows.

. Dispersion Measures—Figs. 7, 8, 9, and 10show how jRate and the RI have very lowdispersion values for small memory sizes. Thesedispersion values grow with the size of thescoped memory for both implementations. TheRI becomes less predictable as the size of thememory scope increases, culminating in patho-logical cases shown in Figs. 7 and 10.

. Worst-Case Measures—The results for all themeasured variables show that the worst-casemeasures are very close to the average-case forjRate. In contrast, the RI’s worst-case values canbe quite large compared to its average-case

values. The largest difference between averageand worst-case measures appeared in the crea-tion time and in the execution time.

We cannot give a precise answer to thereason of this behavior since the code of the RIwas not available for inspection. A reasonableguess, however, is that the RI allocators relydirectly on the system provided malloc() foreach of the allocated objects. This explanationjustifies both the relatively small creation time,and also the degradation of the predictabilitywhen the allocator creates many objects to fillthe scoped memory.

4.3.2 Thread Benchmark Results

Below, we present and analyze the results from the RTJPerfyield and thread creation latency benchmarks.

Yield Context Switch Test: High levels of thread contextswitching overhead can significantly degrade applicationresponsiveness and predictability. Minimizing this over-head is therefore an important goal of any runtimeenvironment for real-time embedded systems. To measurecontext switching overhead, RTJPerf provides a test inwhich two real-time threads characterized by the sameexecution eligibility yield to each other. Since there are justtwo real-time threads, whenever one thread yields, theother thread will have the highest execution eligibility, so itwill be chosen to run.

1. Test Settings: For each RTSJ platform we evaluated,we collected 1,000 samples of the the context switchtime, which we forced by explicitly yielding theCPU. Real-time threads were used for the RI andjRate, and immortal memory was used as allocationcontext for the threads.

2. Test Results: Table 1 reports the average andstandard deviation for the measured context switchin the various platforms.

3. Results Analysis: Below, we analyze the results of thetests that measure the average context switch time,its dispersion, and its worst-case behavior for thedifferent test settings:

. Average Measures—Table 1 shows how the RIperforms fairly well in this test, i.e., its contextswitch time is only � 2 �s larger than jRate’s.The main reason for jRate’s better performancestems from its use of ahead-of-time compilation.The last row of Table 1 reports the results of aC++-based context switch test described in [14].The table shows how the context switch timemeasured for the RI and jRate is similar to thatfor C++ programs on TimeSys Linux/RT. Thecontext switching time for the RI is less thanthree times larger than that found for C++,

8 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003

Fig. 10. Average, 99 percent, maximum, and standard deviation of the

execution time (�sec).

TABLE 1Yield Context Switch Statistics

Page 9: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

whereas the times for jRate are roughly thesame as those for C++.

. Dispersion Measures—The third column ofTable 1 reports the standard deviation for thecontext switch time. Both jRate and the RIexhibit tight dispersion indexes, indicating thatcontext switching overhead is predictable forthese implementations. In general, the contextswitch time for jRate is as predictable as C++ onour Linux testbed platform.

. Worst-Case Measures—The fourth and fifthcolumn of Table 1 represent the maximum andthe 99 percent bound for the context switch time,respectively. jRate and the RI have 99 percentbound and worst-case context switching timesthat are close to their average-case values.

4. Thread Creation Latency Test: This test measures thetime needed to create a thread, which consists of thetime to create the thread instance itself and the timeto start it. For this test, RTJPerf provides twovariants; here, we refer to the test that creates andstarts a real-time thread from another real-timethread. The results we obtained are presented andanalyzed below.

5. Test Settings: For each platform in our test suite, wecollected 1,000 samples of the thread creation timeand thread start time. Real-time threads were usedfor the RI and jRate. Since we are interested inmeasuring the time taken to create and start athread—while limiting the effects of other de-lays—the threads had immortal memory as theirallocation context, so to avoid garbage collectionoverhead.

6. Test Results: Tables 2 and 3 report the average,standard deviation, maximum, and 99 percentbound for the thread creation time and thread starttime, respectively.

7. Results Analysis: Below, we analyze the results of thetests that measure the average-case, the dispersion,and the worst-case for thread creation time andthread start time.

. Average Measures—The second column ofTables 2 and 3 show that jRate has the bestaverage thread creation time and thread starttime. The RI’s creation time grows linearly in theRT test, which is unusual and may reveal aproblem in how the RI manages the scope stack.The smallest creation time experienced by the RIis around 500 �s (which is much higher thanjRate) and one reason may be the RI’s scopestack implementation.

. Dispersion Measures—The third column ofTables 2 and 3 present the standard deviationfor thread creation time and thread start time,respectively. jRate has the smallest dispersion of

values for the thread startup time and creationtime.

The standard deviation associated with thethread creation time is influenced by thepredictability of the time the memory allocatortakes to provide the memory needed to createthe thread object. In contrast, the standarddeviation of the thread start time dependslargely on the OS, whereas the rest of threadstart time depends on the details of the threadstartup method implementation.

. Worst-Case Measures—The fourth and fifthcolumn of Tables 2 and 3 present the maximumand the 99 percent bound for the thread creationtime, and the thread startup time, respectively.These tables clearly show how jRate has a morepredictable startup time than creation time. Thejitter introduced in the creation time likely stemsfrom the fact that jRate allocates several datastructure via malloc(), and does not take yetadvantage of custom allocators. The RI also hasfairly good startup time, though we cannot makeany comparison with its creation time due to anRImemory leak exposedby this test. Theproblemis related to RI’s management of the scope stacksince it appearsonlywhenaRealtimeThread iscreated by another RealtimeThread.

Periodic Thread Test: Real-time embedded systems oftenhave activities (such as data sampling and control lawevaluation) that must be performed periodically. The RTSJprovides programmatic support for these activities via itsAPIs for scheduling real-time thread execution periodically.To program this RTSJ feature, an application specifies theproper release parameters and uses the waitForNext-

Period() method to schedule thread execution at thebeginning of the next period (the period of the thread isspecified at thread creation time via PeriodicPara-

meters). The accuracy with which successive periodiccomputation are executed is important since excessive jitteris detrimental to most real-time applications.

RTJPerf provides a test that measures the precision atwhich the periodic execution of real-time thread logic ismanaged. This test measures the actual time that elapsesfrom one execution period to the next.

8. Test Settings: This test runs a RealtimeThread thatdoes nothing but reschedule its execution for thenext period. The actual time between each activationwas measured and 1,000 of these measurementswere made for the periods 1ms, 5ms, 10ms, 50ms,100ms, and 500ms.

9. Test Results: Fig. 11 shows average and dispersionvalues that we measured for this test.8

CORSARO AND SCHMIDT: THE DESIGN AND PERFORMANCE OF REAL-TIME JAVA MIDDLEWARE 9

TABLE 2Thread Creation Time Statistics

TABLE 3Thread Startup Time Statistical Indexes

8. Whenever the plot for jRate and the RI overlap, the values for jRateare shown above the graph and the value for the RI are shown below thegraph.

Page 10: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

10. Results Analysis: Below, we analyze the results of thetest that measures the accuracy with which periodica thread’s logic is activated:

. Average Measures—Fig. 11 shows that bothjRate and the RI have an average period that isquite close to the target period. Whereas RI isalways at least several hundreds of microse-conds early, however, jRate is at most severaltens of microseconds late. To understand thereason for this behavior, we inspected the RIimplementation of periodic threads, (i.e., at theimplementation of waitForNextPeriod()),and found that a Java Native Interface (JNI)method call is used to wait for the next period.Without the source for the RI’s JVM, it is hard totell exactly how the native method is imple-mented. On the TimeSyS Linux/RT kernel, jRaterelies on the nanosleep() system call toimplement periodic thread behavior. To pro-duce more accurate periods, a calibration testcan be run at configuration time to obtain a slacktime that should be considered as an approx-imation of the overhead of calling the wait-

ForNextPeriod() and then getting thecontrol back.

As stated near the beginning of Section 4.3,

the measure for each test are made in stationary

conditions. The Periodic Thread Test was inter-

esting since the RI took a relatively long time to

reach the steady state, particularly for small

periods.. Dispersion Measures—Fig. 11 shows the dis-

persion of the measured period for both jRateand the RI has the same trend. While jRategenerally has less dispersed values than the RI,both implementations are quite predictable.

. Worst-Case Measures—As shown in Fig. 11,both jRate and the RI have worst-case behaviorthat is close to the average-case values and the99 percent bound. In general, jRate’s worst-casevalues are closer to the average, but the RIvalues are not much further away.

4.3.3 Asynchrony Benchmark Results

Below, we present and analyze the results of the RTJPerfasynchrony tests that we ran.

Asynchronous Event Handler Dispatch Delay Test: Severalperformance parameters are associated with asynchronousevent handlers. One of the most important is the dispatchlatency, which is the time fromwhen an event is fired towhenits handler is invoked. Events are often associated withalarms or other critical actions that must be handled within ashort time and with high predictability. This RTJPerf testmeasures the dispatch latency for the different types ofasynchronous event handlers prescribed by the RTSJ.

1. Test Settings: To measure the dispatch latencyprovided by different types of asynchronous eventhandlers defined by the RTSJ, we ran the testdescribed above with a fire count of 1,000 for bothRI and jRate. To ensure that each event firing causesa complete execution cycle, we ran the test in“lockstep mode,” where one thread fires an eventand only after the thread that handles the event isdone is the event fired again. To avoid theinterference of the Garbage Collector (GC) whileperforming the test, the real-time thread that firesand handles the event uses scoped memory as itscurrent memory area.

2. Test Results: Fig. 12 shows the trend of the dispatchlatency for successive event firings (since the RI’sAsyncEventHandler trend is completely off thescale, it is reported in a separate plot in Fig. 12). Thedata obtained by running the dispatch delay testswere processed to obtain average and worst-casebehavior, and the dispersion measure of the dispatchlatency. Tables 4 and 5 show the results found forjRate and the RI, respectively.

3. Results Analysis: Below, we analyze the results of thetests that measure the average-case and worst-casedispatch latency, as well as its dispersion, for thedifferent test settings:

. Average Measures—Table 5 illustrates the largeaverage dispatch latency incurred by the RTSJRI AsyncEventHandler. The results in Fig. 12show how the actual dispatch latency increasesas the event count increases. By tracing thememory used when running the test using heapmemory, we found that not only did memoryusage increased steadily, but even invoking theGC explicitly did not free any memory.

These results reveal a problem with how theRI manages the resources associated to threads.The RI’s AsyncEventHandler creates a newthread to handle a new event, and the problemappears to be a memory leak in the underlyingRI memory manager associated with threads,rather than a limitation with the model used tohandle the events. In contrast, the RI’s Boun-

dAsyncEventHandler performs quite well,i.e., its average dispatch latency is slightly lessthan twice as large as the average dispatchlatency for jRate.

10 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003

Fig. 11. Measured period statistics.

Page 11: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

Fig. 12 and Table 4 show that the averagedispatch latency of jRate’s AsyncEventHand-

ler is the same order of magnitude as itsBoundAsyncEventHandler. The differencebetween the two average dispatch latency stemsfrom jRate’s AsyncEventHandler implemen-tation, which uses an executor [15] thread from apool of threads to perform the event firing,rather than having a thread permanently boundto the handler.

. Dispersion Measures—The results in Tables 5and 4, Fig. 12, and Fig. 13 illustrate how jRate’sBoundAsyncEventHandler dispatch latencyincurs the least jitter. The dispatch latency valuedispersion for the RTSJ RI BoundAsyncE

ventHandler is also quite good, though itsjitter is higher than jRate’s AsyncEventHand

ler and BoundAsyncEventHandler. Thehigher jitter in RI may stem from the fact thatthe RI stores the event handlers in a java.u

til.Vector. This data structure achievesthread-safety by synchronizing all method thatget(), add(), or remove() elements from it,which acquires and releases a lock associatedwith the vector for each method.

To avoid the locking overhead incurred by theRI, jRate uses a data structure that associates theevent handler list with a given event and allowsthe contents of the data structure to be readwithout acquiring/releasing a lock. Only mod-ifications to the data structure itself must beserialized. As a result, jRate’s AsyncE

ventHandler dispatch latency is relatively

predictable, even though the handler has nothread bound to it permanently. The jRate threadpool implementation uses LIFO queues for itsexecutor, i.e., the last executor that has completedexecuting is the first one reused. This technique isoften applied in thread pool implementations toleverage cache affinity benefits [16].

. Worst-Case Measures—Table 4 illustrates howthe jRate’s BoundAsyncEventHandler andAsyncEventHandler have worst-case execu-tion time that is close to its average-case. Theworst-case dispatch delay of the RI’s BoundA

syncEventHandler is not as low as the oneprovided by jRate due to differences in how theirevent dispatchingmechanisms are implemented.

4. Asynchronous Event Handler Priority Inversion Test: Ifthe right data structure is not used to maintain thelist of event handlers associated with an event, anunbounded priority inversion can occur during thedispatching of the event. This test therefore mea-sures the degree of priority inversion that occurswhen multiple handlers with different Scheduling-Parameters are registered for the same event. Thistest registers N handlers with an event in order ofincreasing importance. The time between the firingand the handling of the event is then measured forthe highest priority event handler.

By comparing the results for this test with theresult of the test described above, RTJPerf candetermine the degree of priority inversion present inthe underlying RTSJ event dispatching implementa-tion. Section 4.3.3 provides an analysis of theimplementation of the current RI and presents howjRate overcomes some RI shortcomings.

CORSARO AND SCHMIDT: THE DESIGN AND PERFORMANCE OF REAL-TIME JAVA MIDDLEWARE 11

Fig. 12. Dispatch latency trend for successive event firing.

TABLE 4jRate Event Handler’s Dispatch Latency Statistics

TABLE 5RI Event Handler’s Dispatch Latency Statistics

Fig. 13. Cumulative dispatch latency distribution.

Page 12: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

5. Test Settings: This test uses the same settings as theasynchronous event handler dispatch delay test.Only the BoundAsyncEventHandler performanceis measured, however, because the RI’s AsyncE-

ventHandlers are essentially unusable since theirdispatch latency grows linearly with the number ofevents handled (see Fig. 12), which masks anypriority inversions. Moreover, jRate’s AsyncE-

ventHandler performance is similar to its Boun-

dAsyncEventHandler performance, so the resultsobtained from testing one applies to the other. Thecurrent test uses the following two types ofasynchronous event handlers:

. The first is identical to the one used in theprevious test, i.e., it gets a time stamp after thehandler is called and measures the dispatchlatency. This logic is associated with H.

. The second does nothing and is used for thelower priority handlers.

6. Test Results: Fig. 14 shows how the average, standarddeviation, maximum, and 99 percent bound of thedispatch delay changes for H as the number of low-priority handlers increase.

7. Results Analysis: Below, we analyze the results of thetests that measure average-case and worst-casedispatch latency, as well as its dispersion, for jRateand the RI.

. Average Measures—Fig. 14 illustrates that theaverage dispatch latency experienced by H isessentially constant for jRate, regardless of thenumber of low-priority handlers. It growsrapidly; however, as the number of low-priorityhandlers increase for the RI. The RI’s eventdispatching priority inversion is problematic forreal-time applications and stems from the factthat its queue of handlers is implemented with ajava.util.Vector, which is not ordered bythe execution eligibility. In contrast, the priorityqueues in jRate’s event dispatching are orderedby the execution eligibility of the handlers.

Execution eligibility is the ordering mechan-ism used throughout jRate. For example, it isused to achieve total ordering of schedulableentities whose QoS are expressed in differentways.

. Dispersion Measures—Fig. 14 illustrates howH’s dispatch latency dispersion grows as thenumber of low-priority handlers increases in theRI. The dispatch latency incurred by H in the RItherefore not only grows with the number oflow-priority handlers, but its variability in-creases, i.e., its predictability decreases. Incontrast, jRate’s standard deviation increasesvery little as the low-priority handlers increase.As mentioned in the discussion of the averagemeasurements above, the difference in perfor-mance stems from the proper choice of priorityqueue.

. Worst-Case Measures—Fig. 14 illustrates howthe worst-case dispatch delay is largely inde-pendent of the number of low-priority handlersfor jRate. In contrast, worst-case dispatch delayfor the RI increases as the number of low-priority handlers grows. The 99 percent boundis close to the average for jRate and relativelyclose for the RI.

5 RELATED WORK

Although the RTSJ was adopted as a standard fairlyrecently [4], there are already a number of related researchprojects. The following projects are particularly interesting:

. The FLEX [17] provides a Java compiler written inJava, along with an advanced code analysis frame-work. FLEX generates native code for StrongARM orMIPS processors, and can also generate C code. Ituses advanced analysis techniques to automaticallydetect the portions of a Java application that can takeadvantage of certain real-time Java features, such asmemory areas or real-time threads.

. The Real-Time Java for Embedded Systems (RTJES)program [18] is working to mature and demonstratereal-time Java technology. A key objective of theRTJES program is to assess important real-timecapabilities of real-time Java technology via acomprehensive benchmarking effort. This effort isexamining the applicability of real-time Java withinthe context of real-time embedded system require-ments derived from Boeing’s Bold Stroke avionicsmission computing architecture [19].

6 CONCLUDING REMARKS

TheRTSJ is an important emergingmiddlewareplatform thatdefines a standard, high-level, and productive environmentfor developing real-time embedded applications. RTSJencapsulates much of the complexity and platform-specificdetails in the middleware, and provides a powerful set ofprogramming abstractions to application developers. Thispaper presented the architecture of jRate, which is an open-source ahead-of-time compiled RTSJ middleware that wehave created atWashington University. This paper also usedthe open-source RTJPerf benchmarking suite to empirically

12 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2003

Fig. 14. H’s dispatch latency statistics.

Page 13: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …schmidt/PDF/TPDS.pdfreal-time embedded applications. 2THE REAL-TIME SPECIFICATION FOR JAVA The RTSJ extends the Java API and refines

evaluate the performance of RTSJ middleware features injRate and the TimeSys RTSJ RI that are crucial to thedevelopment of real-time embedded applications.

RTJPerf is one of the first open-source benchmarkingsuites designed to evaluate RTSJ-compliant Java implemen-tations empirically. We believe it is important to have anopen benchmarking suite to measure the quality of serviceof RTSJ implementations. RTJPerf not only helps guideapplication developers to select RTSJ features that aresuited to their requirements, but also helps developers ofRTSJ implementations evaluate and improve the perfor-mance of their products.

Although much work remains to ensure predictable andefficient performance under heavy workloads and highcontention, our test results indicate that real-time Java ismaturing to the point where it can be applied to certain typesof real-time applications. In particular, the performance andpredictability of jRate is approaching C++ for some tests. TheTimeSys RTSJ RI also performed relatively well in someaspects, though it has problems with AsyncEventHandlerdispatching delays and priority inversion.

ACKNOWLEDGMENTS

The authors would like to thank Ron Cytron, Peter Dibble,David Holmes, Doug Lea, Doug Locke, Carlos O’Ryan, JohnRegehr, and Gautam Thaker for their constructive sugges-tions that helped to improve earlier drafts of this paper.

REFERENCES

[1] J. Sztipanovits and G. Karsai, “Model-Integrated Computing,”Computer, vol. 30, no. 4, pp. 110-112, Apr. 1997.

[2] K. Arnold, J. Gosling, and D. Holmes, The Java ProgrammingLanguage. Boston: Addison-Wesley, 2000.

[3] R. Jones and R. Lins, Garbage Collection Algorithms for AutomaticDynamic Memory Management. New York: Wiley & Sons, 1996.

[4] G. Bollella, J. Gosling, B. Brosgol, P. Dibble, S. Furr, D. Hardin, andM. Turnbull, The Real-Time Specification for Java. Addison-Wesley,2000.

[5] TimeSys, Real-Time Specification for Java Reference Implementa-tion, www.timesys.com/rtj, 2001.

[6] A. Corsaro and D.C. Schmidt, “Evaluating Real-Time JavaFeatures and Performance for Real-time Embedded Systems,”Proc. Eighth IEEE Real-Time Technology and Applications Symp.,Sept. 2002.

[7] “The Design and Performance of the jRate Real-Time JavaImplementation,” On the Move to Meaningful Internet Systems2002: CoopIS, DOA, and ODBASE, R. Meersman and Z. Tari, eds.,pp. 900-921,Springer Verlag, 2002.

[8] GNU is Not Unix, GCJ: The GNU Complier for Java, http://gcc.gnu.org/java, 2002.

[9] K. Czarnecki and U. Eisenecker, Generative Programming: Methods,Tools, and Applications. Addison-Wesley, 2000.

[10] A. Corsaro and R.K. Cytron, “Efficient Memory-Reference Checksfor Real-Time Java,” Proc. ACM SIGPLAN Conf. Language,Compiler, and Tool for Embedded Systems, pp. 51-58, 2003.

[11] TimeSys, TimeSys Linux/RT 3.0, www.timesys.com, 2001.[12] J. Gosling, B. Joy, and G. Steele, The Java Programming Language

Specification. Addison-Wesley, 1996.[13] IBM, Jikes 1.17, http://www.research.ibm.com/jikes/, 2001.[14] D.C. Schmidt, M. Deshpande, and C. O’Ryan, “Operating System

Performance in Support of Real-Time Middleware,” Proc. SeventhWorkshop Object-Oriented Real-Time Dependable Systems, Jan. 2002.

[15] D. Lea, Concurrent Programming in Java: Design Principles andPatterns, second ed. Addison-Wesley, 2000.

[16] J.D. Salehi, J.F. Kurose, and D. Towsley, “The Effectiveness ofAffinity-Based Scheduling in Multiprocessor Networking,” Proc.IEEE INFOCOM, Mar. 1996.

[17] M.Rinardetal.,“FLEXCompilerInfrastructure,” http://www.flex-compiler.lcs.mit.edu/Harpoon/, 2002.

[18] J. Lawson, “Real-Time Java for Embedded Systems (RTJES),”http://www.opengroup.org/rtforum/jan2002/slides/java/lawson.pdf, 2001.

[19] D.C. Sharp, “Reducing Avionics Software Cost through Compo-nent Based Product Line Development,” Proc. 10th Ann. SoftwareTechnology Conf., Apr. 1998.

Angelo Corsaro received the Laurea degree incomputer engineering in 1999 from the Univer-sity of Catania, and the MS degree in computerscience from Washington University in 2001.Currently, he is a PhD candidate in theComputer Science Department of WashingtonUniversity. His research focuses on real-timeJava extensions and optimizations, program-ming languages, generative programming, real-time distributed systems, software patterns,

optimization techniques, and empirical analysis of object-orientedframeworks and middleware platforms. He is a student member of theIEEE.

Douglas C. Schmidt is a professor in theElectrical Engineering and Computer ScienceDepartment at Vanderbilt University. His re-search focuses on patterns, optimization techni-ques, and empirical analysis of object-orientedframeworks that facilitate the development ofdistributed real-time and embedded (DRE) mid-dleware running over high-speed networks andembedded system interconnects. Dr. Schmidthas served as a Deputy Office Director and a

Program Manager at DARPA, where he led the national R&D effort onDRE middleware. Dr. Schmidt has also served as the cochair for theSoftware Design and Productivity (SDP) Coordinating Group of the USgovernment’s multiagency Information Technology Research andDevelopment (IT R&D) Program, which formulated the multiagencyresearch agenda in software design. He is a member of the IEEE andthe IEEE Computer Society

. For more information on this or any other computing topic,please visit our Digital Library at http://computer.org/publications/dlib.

CORSARO AND SCHMIDT: THE DESIGN AND PERFORMANCE OF REAL-TIME JAVA MIDDLEWARE 13