
Network Livermore Time Sharing System (NLTSS)

S Terry Brugger

Madhavi Gandhi

Greg Streletz

Department of Computer Science

University of California, Davis

March 7, 2001

Abstract

NLTSS was a distributed, object-oriented operating system based on a pure message passing kernel and capabilities. This paper describes the goals of its development, gives an overview of the system, and presents a detailed examination of some of its subsystems and what we learned from its development.

1 Introduction

The history of Lawrence Livermore National Lab is closely intertwined with the history of supercomputing. Ever since the lab's founding in the early 1950's, it has used state of the art machines to solve difficult scientific problems. In the 1960's it started looking at ways that it could better utilize the resources of these huge machines and better serve the scientists that used them. From this, the Livermore Time Sharing System (LTSS) was born. By the late 1970's, with the advent of the microprocessor, local area networks and advances in operating system technology, the Livermore Computing Center set forth to design a new operating system. This 4th generation operating system was "[e]xplicitly designed to support multiprocessing, distribution, extendable object oriented, uniform local & remote IPC, [and] least privilege access control." [21] This successor to LTSS was a specific implementation of a larger framework called LINCS (Livermore Integrated Network Computing System) which allowed distributed access to abstract object interfaces [20]. Designed with this network functionality in mind, the new operating system was called the Network Livermore Time Sharing System, or NLTSS.

The first three years of the project concentrated on research and development of prototypes examining various aspects of multicomputing, network protocols and servers. This period was followed by four years of intense development on the system. Beginning around 1987, NLTSS entered into service on the Cray XMP, running alongside LTSS through the use of a virtual hardware abstraction layer [20]. In 1988 NLTSS entered full production on the Cray XMP and later the YMP. It remained in service until shortly after noon on March 22, 1993, when the last NLTSS installation on the Cray YMP was shut down [9] and replaced with Cray's version of UNIX (UNICOS).

This paper will examine the goals of NLTSS, take an overview look at the system from the contemporary viewpoint of distributed and object-based systems, then look in detail at NLTSS's use of capabilities, its CPU and file servers, and its message passing system.

2 Goals

The primary goal of NLTSS was to design an operating system which allowed the application developer to take advantage of both the multiprocessing capabilities of supercomputer class machines and the distributed computing resources of a network of computers. The NLTSS designers referred to this as a multicomputing environment [21, 24]. The general goals of the system were to facilitate extendibility, be backwards compatible with LTSS and use standard interfaces [21, 24]. Contrasting these goals with operating system technology at the time, they decided that stage 3 of Kuhn's Model of Scientific Progress had been reached and that, rather than try to patch existing systems such as LTSS or UNIX, they should propose an alternative model [21]. They considered UNIX unworkable as it had problems (at that time) with extendibility, security, distributed computing and multiprocessing, and its implementation had numerous problems such as a lack of clean modularization [21].

The designers established numerous goals in order to achieve the general goal of supporting a multicomputing environment. For instance, the system should do symmetric multiprocessing of the kernel, servers and other user level applications. This symmetrical architecture should be general and allow for all tasks to compute and block for I/O. Any task synchronization and switching should be low cost. The system libraries and scheduler should allow for asynchronous system calls and the ability to wait on an arbitrary number of processes. Finally, the system should perform deadlock prevention and detection [21, 20]. There were additional requirements to support the distributed environment in a transparent manner. For instance, the syntax to access local and remote resources should be the same and, in return, the resource should behave the same way regardless of where it was accessed from. The performance difference between a local or remote access should be as minimal as possible. There should be a separation of machine names from the human-readable names, which allows the human-readable names to be completely independent of resource and access location. An object should have the same access rights to a resource regardless of where it is accessing the resource from. All accounting and allocation within the system should be uniform. Finally, the implementation should be such that access is only allowed to objects via well defined interfaces [21, 24, 20].

In order to achieve a highly scalable system, the designers decided early to use a pure message passing architecture with built-in support for the client-server model. For that reason, numerous goals were made for the interprocess communication (IPC), first and foremost being network transparency: IPC calls should look the same whether the client and server are on the same or different machines. To that end, the protocols should be low latency and high throughput. What latency did exist should be hidden as much as possible, and the message system may be accelerated through the use of specialized hardware. Processes should have a great deal of latitude in how they use IPC. This latitude included the ability to use arbitrary communications structures and message lengths, the ability to communicate with any other process (subject to the security policy and mechanism), the ability to use multiple concurrent message streams, and the ability to use both synchronous and asynchronous communication. The message system also needed to be clean in its design and implementation through the use of standard request & reply syntax & encodings, separation of data & control, and separation of requests & replies. Finally, the message system would provide flow and error control and protect the data streams [21, 24, 20].

While NLTSS was a revolutionary advance in operating system architecture at the time, it also needed to support the legacy LTSS environment that its user base was used to, and which existing applications were designed for. To that end, NLTSS needed to support existing user interfaces (specifically the terminal interface) and network services. It also needed to emulate LTSS at the system interface level so that existing application source code could be moved to NLTSS with little to no effort. Finally, it should support incremental evolution by allowing applications to use some of the new functionality while other parts of the code continued to use the legacy interfaces [24, 20].

With the Livermore Computing Center's long history of utilizing and operating supercomputers, they added some additional considerations to the goals of the system which came more from practical experience than purely academic operating system design. For example, one goal was to minimize the human cost involved in learning, maintaining, developing for, or in any other way using the system. The system should be designed under the principle that high performance (and, by extension, high cost) computing resources should be available to as large a user base as possible (not just the users in one building, for instance). Finally, the system should make it easy to integrate new technology: it needed to be flexible and expandable. This implied a modular or object-oriented design [24, 20].

Finally, the designers recognized the issues common to all distributed and object-based systems as discussed in the overview, below, so there were commensurate goals for the system to deal with those issues [21, 20].

3 Overview

3.1 Issues in Distributed Operating Systems

NLTSS was designed from the beginning to be a distributed operating system. As such, it has addressed most of the issues inherent in distributed operating systems. The common issues in distributed operating systems [18] and how these are handled in NLTSS are discussed below.

3.1.1 Global Knowledge

Due to the lack of a global clock and global memory, as well as unpredictable message delays, determining the entire state of the system at any given moment in time is impossible. Thus, it is necessary to use efficient algorithms to approximate this state. NLTSS had no built-in support to achieve global consensus. The NLTSS designers saw this functionality as being under the purview of the application programmer [15].

3.1.2 Naming

The various objects to which a system refers need to be named in some way. The issue of naming is especially important in a distributed system because the various entities of the system need to understand a common naming convention, such that there must be a global name space. NLTSS had such a global name space, implemented with a Directory Server that maps human readable names to machine oriented names. This separation is important as it frees users from the need to follow any naming convention just so the system can locate a given object. It also allows for multiple human readable names for any single object. NLTSS also allows for multiple copies of the same object. Objects in NLTSS can be extended to build new types. Names are ASCII strings up to 16 characters in length. This was seen as a major improvement over NLTSS's predecessor, LTSS, which only allowed 8-10 character names. The namespace in NLTSS did not need to support any type of automatic object replication [21, 24, 20].
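The name space just described can be sketched as a simple mapping in which several human-readable names may refer to the same machine-oriented name. This is our own illustrative model, not NLTSS code; the class and method names are invented, and only the 16-character ASCII constraint comes from the text.

```python
# Illustrative sketch of the NLTSS global name space: ASCII names of at
# most 16 characters, with many human-readable names per object allowed.
class GlobalNameSpace:
    NAME_LIMIT = 16  # NLTSS names were ASCII strings up to 16 characters

    def __init__(self):
        self._names = {}  # human-readable name -> machine-oriented name

    def bind(self, name: str, machine_name: bytes) -> None:
        if len(name) > self.NAME_LIMIT or not name.isascii():
            raise ValueError("names are ASCII strings of at most 16 characters")
        self._names[name] = machine_name

    def resolve(self, name: str) -> bytes:
        return self._names[name]

ns = GlobalNameSpace()
ns.bind("physics.dat", b"\x01\x07")
ns.bind("run42-input", b"\x01\x07")  # a second name for the same object
assert ns.resolve("physics.dat") == ns.resolve("run42-input")
```

Because users never address objects by their machine-oriented names directly, the mapping layer is free to place multiple names, or multiple copies, behind a single object.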

3.1.3 Scalability

A distributed operating system should guarantee that the system performance does not degrade as more system resources are added. Certain operating system techniques that work well for a small number of resources may not work well when the system becomes substantially larger. NLTSS was designed such that all the aspects of its distributed nature were abstracted by the message system, which did not pose any limitations on the scalability of the system [13].

3.1.4 Compatibility

A distributed system may be composed of heterogeneous resources. Compatibility refers to the ability of the operating system to provide portability across these various resources. NLTSS was an implementation of the LINCS architecture, which specified a standard architecture (including network and transport layer protocols) for accessing abstract object interfaces [20]. The NLTSS kernel and many of its lower level components were designed exclusively for the Cray massive vector architecture. In particular, it was designed to accommodate many of the irregularities in the Cray memory and process model [15, 13]. Other components, such as the file server, were compatible across various architectures such as the VAX. NLTSS provided no execution-level compatibility or binary compatibility, as it would have been prohibitively expensive.

3.1.5 Process Synchronization

When multiple processes can access a shared resource, the operating system must synchronize access to the resource in order to ensure integrity. Typically, this is accomplished through mutual exclusion. However, due to the lack of shared memory in a distributed system, providing mutual exclusion is a challenge. As such, NLTSS did not provide a mechanism to detect and recover from deadlock, nor to provide mutual exclusion. These were seen as being under the purview of the application developer. A simple mechanism that application developers could use to achieve mutual exclusion was file locking. Alternatively, they could implement their own system using the flexibility of NLTSS's message system, which allowed both synchronous (blocking) and asynchronous (non-blocking) communication between processes [21, 20]. There were plans to add a semaphore server; however, it was never implemented due to lack of user demand. There were numerous times throughout the development of NLTSS where the servers themselves would go into deadlock. The solution to this was fixing the servers such that the chain of dependencies that caused the deadlock would not occur (deadlock avoidance) [15].
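The file-locking approach mentioned above can be sketched as follows. This is a minimal sketch under our own assumptions, using an exclusive-create lock file on a POSIX system; it is not the actual NLTSS file server interface, whose locking primitives the text does not specify.

```python
# Application-level mutual exclusion via a lock file: the process that
# succeeds in exclusively creating the file holds the lock.
import os

class FileLock:
    def __init__(self, path: str):
        self.path = path

    def acquire(self) -> bool:
        try:
            # O_EXCL makes creation atomic: exactly one process can succeed.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True   # we created the lock file, so we hold the lock
        except FileExistsError:
            return False  # another process holds the lock

    def release(self) -> None:
        os.remove(self.path)

lock = FileLock("/tmp/job42.lock")
if lock.acquire():
    try:
        pass  # critical section: update the shared resource
    finally:
        lock.release()
```

Note that, like the mechanism the text describes, this is entirely cooperative: processes that do not check the lock are not prevented from touching the resource.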

3.1.6 Resource Management

In a distributed system, the resources available to a process can be both local and remote. The operating system must provide a common method for accessing these resources. NLTSS was designed to deal with this issue by considering all resources to be remote [21]. Typically, NLTSS migrates the data to the computation. It does, however, have the ability to perform computation migration through the use of remote servers and message passing. NLTSS also provides distributed scheduling; it can distribute the computational load amongst different nodes through its process scheduling mechanism [21, 24]. Other considerations were given in NLTSS's design with respect to resource management, such as support for accounting of all resources, administrative allocation of resources, and an avoidance of any restrictions on the number or size of resources that an object can use [21, 24, 20]. NLTSS separated the low-level hardware access driver (which had to reside on the machine with the hardware that it controlled) from the policy implementation on the server (such as the File Server or the Process Server, which could be located anywhere on the network) [5].

3.1.7 Security

The two fundamental aspects of security that a distributed system should address are authentication and authorization. Authentication is the means by which a user is verified to be who they say they are, and authorization is the means by which the use of resources is restricted to certain users. In NLTSS, authentication is provided through the use of an Account Server [17]. Authorization is provided through the use of capabilities [16]. Numerous other considerations were given to security in NLTSS's design. For example, it has a consistent multilevel mandatory access control and security policy for all resources, least privilege discretionary access control to all resources with rights passing, and it allows multiple access rights for a given resource. Finally, its security is based on the principle of mutual suspicion between nodes [21, 24, 20].
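The multilevel mandatory access control mentioned here (and detailed in section 4.1) amounts to a dominance check at message delivery time. The sketch below is our own illustration; in particular, the relative ordering of the levels (including PARD) is assumed for the example, not taken from the NLTSS documentation.

```python
# Illustrative multilevel MAC check: the kernel delivers a message only to a
# process whose protection level is equal to or higher than the message's.
LEVELS = {"Unclassified": 0, "PARD": 1, "Secret": 2}  # ordering assumed for illustration

def kernel_may_deliver(message_level: str, process_level: str) -> bool:
    # Deliver only if the receiving process dominates the message's level.
    return LEVELS[process_level] >= LEVELS[message_level]

assert kernel_may_deliver("Unclassified", "Secret")      # reading down: allowed
assert not kernel_may_deliver("Secret", "Unclassified")  # reading up: denied
```

Because all data flow in NLTSS went through message passing, enforcing this single check in the kernel was sufficient to enforce the policy system-wide.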

3.1.8 Structuring

The structuring of an operating system refers to the method by which the operating system components are organized. The major design decision in an operating system is whether it is going to be monolithic or use a microkernel with services separated into individual processes. The microkernel approach lends itself much better to an object-oriented design (where each component is an object with a well defined interface) and the client-server model (where some processes or objects are dedicated servers that handle requests from other processes or servers). NLTSS was designed as a pure microkernel, with separate drivers for low level components and user level servers to provide system services. Everything in NLTSS, including the user level servers and all other processes, was considered an object, which places NLTSS in the realm of an object-oriented operating system. Another design technique that NLTSS leveraged was the principle of separation of policy and mechanism, which allowed for maximum flexibility of the system.

3.2 Issues with Distributed Object-Based Programming Systems

The use of the object paradigm for creating distributed systems can result in a much more extendable, flexible, programmer friendly architecture. These systems are called Distributed Object-Based Programming Systems (DOBPS) [4]. In the creation of a DOBPS, issues beyond those of a generic distributed system need to be addressed. We will briefly explain what these issues (based on the ontology given in [4]) are and how NLTSS addresses them.


3.2.1 Object Structure

As objects are central to the design of a DOBPS, their structure greatly influences the design of the entire system. The two main attributes of object structure are the granularity and composition of the objects.

Granularity The granularity of objects refers to their relative size. There is a tradeoff between how much control the system has over the objects and how many of the system resources are needed to manage those objects. NLTSS considered most everything that the operating system dealt with, such as processes, files, terminals and text strings, to be an object. Hence, NLTSS supported large, medium and small grained objects. It should be noted that not all objects in NLTSS were visible to, and hence managed by, the system. For example, linked lists that existed solely within a user application would not be managed directly by NLTSS [7].

Composition The object composition is the relationship between objects & processes in the system. If processes are part of the object, it is an active composition. If processes exist outside of objects, the system has a passive object composition. Active objects are more resource intensive (as multiple processes may be required to perform a task, one for each object), but they are more amenable to client-server programming. Although processes are objects in NLTSS, not all objects have processes associated with them, hence NLTSS has a passive object composition.

3.2.2 Object Management

Object management deals with the aspects that a DOBPS must address that are orthogonal to the types of issues described above (reliability, integrity, security, etc.), but unique only insofar as the object paradigm differs from the traditional view of systems as procedural and monolithic.

Action Management Action management is concerned with issues such as serializability, atomicity and permanence. In particular, it is characterized by the type of commit system used to ensure the integrity of the objects. In the request scheme, commits are requested by the objects. In the transition scheme, commits are automatically performed by the system as actions complete. NLTSS was designed for the supercomputer environment, where system failures could mean the loss of weeks' worth of computation. It did not do any type of explicit checkpointing, however. Instead, it relied on a feature of the hardware it ran on to dump out a copy of memory to disk in the event of a failure. NLTSS would use the process swap file as a backup to that hardware feature [15].

Synchronization Synchronization is the method by which serializability is achieved. Systems may use a pessimistic (only one action can access an object at any one time) or an optimistic (multiple invocations but no commits until integrity is ensured) scheme.


As noted previously, NLTSS left synchronization up to the application developer.

Security Security ensures that only authorized clients can invoke object methods. The most common schemes are capabilities (which are little tickets that verify the bearer is authorized) or control procedures (where incoming invocation requests are processed by a "guard" that verifies authorization). NLTSS is a pure capabilities based system. This will be explained in greater detail below. Other aspects of NLTSS's security model were discussed above.

Reliability Reliability is the ability of the system to continue to operate when an object fails. It is commonly handled through either object recovery or object replication. In an object recovery scheme, when an object failure is detected it is recreated using either its last committed state (roll-back) or its last committed state with all in-progress actions reapplied to the object (roll-forward). In an object replication scheme, multiple copies of the object are kept such that if one fails, another will service its requests. This scheme is complicated by the need to keep all copies of an object consistent. NLTSS uses a roll-forward object recovery scheme that supported multiple levels of failures. NLTSS is designed to do a deadstart with a minimal loss of state [21, 24, 20]. The most frequent type of failure was when the system got into some type of inconsistent state. This would happen anywhere between every few minutes (usually due to some type of hardware or software bug) and every couple of weeks. In this event, a hotstart "DS" would be performed that would simply scan the memory of the machine and restore all processes to a known, good state. Every few weeks to months the errors would be more severe and require a warmstart "DSU", where the objects would be restored from the memory image and process swap files mentioned above. The most catastrophic failures, the time between which was measured in years, required a full coldstart "DSB". As with a warmstart, as much object and process information as could be restored, would be [15].

While Chin and Chanson [4] do not discuss it, it should be noted that NLTSS was designed with robust communication protocols and distributed exception throwing and handling mechanisms [21, 24, 20].

3.2.3 Object Interaction Management

One of the most important functions of a distributed object-based programming system is managing the interactions between objects through location transparency, invocation handling and failure handling.

Location of an Object A DOBPS is designed to maintain location transparency for all of its objects. In doing so, it needs a way to locate an object when it is invoked. One scheme is to embed the location of the object in that object's name. Another is to use a name server that maps object names to locations. The third scheme is to keep a small cache of mappings and, if the desired object isn't found, broadcast a request for its location. NLTSS uses a combination of the encoding and directory server schemes. NLTSS supports numerous human readable names for any one object [21, 24]. Each object has a single canonical machine readable name [24] called a capability [16]. This capability has embedded into it the location of the object [16]. NLTSS uses a directory server to map the human readable names to capabilities. When the directory server is presented with a human readable name, it finds the corresponding capability, verifies that the requester is allowed to access that object and, if so, returns the capability [12]. Capabilities are discussed in more detail below.
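The lookup path just described can be sketched as below. This is a hedged illustration of the scheme, not NLTSS code: the field names, the access-check shape, and the capability representation are all our own simplifications (the real capability encoding is discussed in section 4.1).

```python
# Sketch: the directory server maps a human-readable name to a capability
# whose machine-oriented name embeds the object's location, after verifying
# that the requester may access the object.
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    host: str            # location embedded in the machine-oriented name
    object_id: int
    rights: frozenset    # e.g. {"read", "write"}

class DirectoryServer:
    def __init__(self):
        self._entries = {}   # name -> (capability, set of allowed requesters)

    def bind(self, name, cap, allowed):
        self._entries[name] = (cap, set(allowed))

    def lookup(self, name, requester):
        cap, allowed = self._entries[name]
        if requester not in allowed:
            raise PermissionError(f"{requester} may not access {name}")
        return cap   # the holder can now contact cap.host directly

ds = DirectoryServer()
cap = Capability("cray-ymp-1", 7, frozenset({"read"}))
ds.bind("results.dat", cap, allowed={"terry"})
assert ds.lookup("results.dat", "terry").host == "cray-ymp-1"
```

Once the capability is in hand, the directory server drops out of the picture: subsequent accesses go straight to the location embedded in the capability.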

System-level Invocation Handling When one object invokes the method of another object, it is the responsibility of the DOBPS to marshal that invocation to the proper object. The two primary schemes used to achieve this are message passing and direct invocation. NLTSS was designed to be a pure message passing system. Any and all object invocations (even those local to the machine) would be executed through a message that was passed to the kernel, then to the kernel of another node if necessary, and finally delivered to the desired recipient [21]. It was discovered that the overhead of such a system (especially the expense of so many context switches) was too high when accessing some system objects, particularly the file server. These select objects were henceforth incorporated into the kernel. The message passing protocol remained the same; the savings was solely in the lack of context switches between the kernel and the services as the messages were passed between them.
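The routing described above can be sketched as follows, under our own simplified model (invented class and method names, no wire protocol): every send goes to the local kernel, which either delivers locally or forwards to the destination kernel, so the sender's call looks identical either way.

```python
# Sketch of pure message-passing invocation with network transparency:
# local and remote sends take the same path through the kernel.
import queue

class Kernel:
    def __init__(self, node):
        self.node = node
        self.peers = {}       # node name -> remote Kernel
        self.mailboxes = {}   # process name -> queue of delivered messages

    def register(self, proc):
        self.mailboxes[proc] = queue.Queue()

    def send(self, dest_node, dest_proc, payload):
        if dest_node == self.node:
            self.mailboxes[dest_proc].put(payload)       # local delivery
        else:
            self.peers[dest_node].send(dest_node, dest_proc, payload)  # forward

a, b = Kernel("A"), Kernel("B")
a.peers["B"] = b
a.register("local_server")
b.register("file_server")

# The sender's call is the same whether the recipient is local or remote:
a.send("A", "local_server", b"read block 5")
a.send("B", "file_server", b"read block 5")
assert b.mailboxes["file_server"].get_nowait() == b"read block 5"
```

The cost the text mentions is visible in this model: each hop through a kernel is, on real hardware, a context switch, which is why hot services like the file server were eventually folded into the kernel while keeping the same message protocol.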

Detecting Invocation Failures It is easy to detect failures that occur before an object invocation is made (existing faults). It is much more difficult to detect failures that occur during invocation (transient faults). It is necessary to do this, however, lest objects block indefinitely on an invocation that will never return. Numerous schemes for handling transient faults exist, from timeouts to invocation probes. NLTSS used a basic timeout scheme to detect faults and killed processes that were blocked waiting for a response that would never arrive. This may have had catastrophic consequences on the system if the transient fault was located in a critical server such as a file server or the process server. On the occasions that such faults happened, the system typically had to be restarted.
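The timeout scheme amounts to bounding the wait on a reply. A minimal sketch, assuming a reply queue per outstanding invocation (our own model; NLTSS killed the blocked process rather than raising an error to it):

```python
# Sketch of timeout-based transient-fault detection: block on a reply for a
# bounded time and treat silence as a failed invocation.
import queue

def invoke_with_timeout(reply_queue, timeout_seconds):
    try:
        return reply_queue.get(timeout=timeout_seconds)
    except queue.Empty:
        # In NLTSS the blocked process was killed; here we just signal failure.
        raise TimeoutError("no reply: treating invocation as failed")

replies = queue.Queue()
replies.put("ok")
assert invoke_with_timeout(replies, 0.1) == "ok"   # reply arrived in time
try:
    invoke_with_timeout(queue.Queue(), 0.1)        # transient fault: no reply
except TimeoutError:
    pass
```

The weakness the text notes is inherent to the scheme: a timeout cannot distinguish a dead server from a slow one, so a stalled critical server takes its blocked clients down with it.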

3.2.4 Resource Management

A DOBPS is particularly concerned with the interaction between objects and system resources, such as object representation in memory and secondary store, or object scheduling.


Representation of Objects in Memory and Secondary Store Objects are persistent if a copy is kept in secondary store and the object can survive a machine failure; otherwise the object is volatile. Persistent objects that reside both in memory and secondary store are considered active; otherwise the object is only found on disk and is considered inactive. A DOBPS may keep either a single copy or multiple copies of an object in memory, depending on the synchronization scheme. When a persistent object is updated, either the entire object can be rewritten to disk (a checkpoint scheme) or only the changes since the last time the whole object was saved can be stored (a log scheme). All objects in NLTSS are persistent. This was accomplished by virtue of the capabilities: as long as a capability to an object existed, the object could be accessed (in this respect, capabilities are like references in a programming language). As explained in the section on action management, NLTSS uses a checkpoint type scheme to back up its process objects.

Object Scheduling and Mobility Object scheduling is concerned with what processor a given object will reside on. This may be either explicit (given by the user) or implicit (determined by the system based on metrics such as the relative load of machines). Object mobility is concerned with how objects are moved from one machine (such as one with a high load) to a different machine. NLTSS was designed for the supercomputer environment, so its scheduler is a distributed batch scheduler. NLTSS uses a combination of explicit and implicit scheduling: the system will schedule the object to reside on whichever machine fulfills the constraints given by the user. The user may specify such metrics as the amount of processing time and memory the object will require, as well as the machine architecture the object should run on. The user has the option of specifying which machine the object will reside on. NLTSS also has the notion of a priority based on the identity of the user that submitted the job (so the users in departments that paid for more of the machine get preference in scheduling). NLTSS did not have explicit support for object mobility. In particular, process objects were tied to one machine once scheduled, and the mobility of other objects (such as files) had to be handled at the application level. Little work was done with object mobility in NLTSS, as there was no support for such a concept in NLTSS's predecessor, LTSS, and the NLTSS user community was more interested in backwards compatibility than new features [7].
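The combination of explicit and implicit scheduling can be sketched as a constraint filter: honor an explicit machine request if one is given, otherwise pick among the machines that satisfy the stated requirements. The field names and the least-loaded tie-break below are our own simplification; the text does not specify how NLTSS chose among acceptable machines.

```python
# Sketch of explicit-plus-implicit batch scheduling: filter machines by the
# user's constraints (architecture, memory), or by an explicit machine name.
def schedule(job, machines):
    if job.get("machine"):                      # explicit placement by the user
        candidates = [m for m in machines if m["name"] == job["machine"]]
    else:                                       # implicit: filter by constraints
        candidates = [m for m in machines
                      if m["arch"] == job["arch"]
                      and m["free_memory"] >= job["memory"]]
    if not candidates:
        return None                             # job waits in the batch queue
    return min(candidates, key=lambda m: m["load"])  # assumed tie-break: least loaded

machines = [{"name": "ymp1", "arch": "cray", "free_memory": 64, "load": 0.9},
            {"name": "ymp2", "arch": "cray", "free_memory": 128, "load": 0.2}]
job = {"arch": "cray", "memory": 100, "machine": None}
assert schedule(job, machines)["name"] == "ymp2"
```

A user-identity priority, as described above, would slot in as an ordering over queued jobs before this per-job placement step.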

3.3 Other Issues Considered

During the design of NLTSS, other issues were considered that were not found in the common literature. They stemmed mostly from Livermore's experience in building and operating large computer operating systems. They included features to ease the operation of the system, such as the ability to put the system into operator/maintenance mode; the ability to relocate, back up and purge objects; status reporting; and performance monitoring. The designers made a conscious effort to separate policy and mechanism. Finally, the designers realized that system implementations are never perfect, so the system had to be easy to debug through the use of tracing, event logs and similar features [21, 20].

4 Design & Implementation

Numerous parts of the design and implementation of NLTSS were discussed in the overview above, particularly those parts that can be compared and contrasted with other distributed systems. Here we will look at some of the components of NLTSS in detail.

4.1 Capabilities / Directory Server

Capabilities are tickets that establish that the bearer has permission to access some object. They are a generalization of capability-list (C-list) operating systems. In C-list systems, the kernel handles all capability granting and tracking. This doesn't scale well for a distributed system [8], so the general capabilities mechanism allows processes to manage their own capabilities. In NLTSS, capabilities are synonymous with an object's machine-oriented name [16].

4.1.1 Capability Creation and Use

The capability representing some object is created, at the time that object is created, by the server that creates the object: the file server creates a capability for every file that is created, the process server creates a capability for every process that is created, and so forth. This capability is passed back to the object that requested the creation. The capability is then sent with every access request to show that the requesting object is authorized to access the given object [16].

Any capability can be passed to another user or object so that the other user or object can access the object represented by the capability. This is the mechanism by which Discretionary Access Control (DAC) is achieved in NLTSS. The designers of NLTSS believed that there was an inalienable right to pass capabilities: if the system tried to disallow capability passing, the object that was granted a capability could always act as a broker between the object that was being accessed and the object that it thought should be allowed access. This capability passing can be done without any server intervention, so it should be done over a secure channel such that the capability can't be snooped and used by a third party [8].
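The create-then-pass lifecycle can be sketched as below. This is an illustration of the idea only: the token here is a random secret, whereas real NLTSS capabilities were structured machine-oriented names, and the server interface is invented.

```python
# Sketch of capability creation and passing as discretionary access control:
# the server hands a capability back to the creator, access requires
# presenting it, and handing the token to another party grants the same access.
import secrets

class FileServer:
    def __init__(self):
        self._objects = {}   # capability token -> file contents

    def create(self, contents):
        cap = secrets.token_hex(8)   # capability returned to the creator
        self._objects[cap] = contents
        return cap

    def read(self, cap):
        if cap not in self._objects:  # no valid capability, no access
            raise PermissionError("invalid capability")
        return self._objects[cap]

server = FileServer()
cap = server.create(b"simulation results")
# Passing the capability to another process grants it the same access,
# with no server involvement in the transfer itself:
other_process_cap = cap
assert server.read(other_process_cap) == b"simulation results"
```

The brokering argument in the text is visible here: even if `read` tried to check who was calling, the original holder could simply perform reads on the other party's behalf, so forbidding the transfer buys nothing.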

NLTSS also implemented Mandatory Access Control (MAC) by virtue of requiring all objects to have an associated capability and all capabilities to have a declared protection level (Secret, PARD, Unclassified, etc.). NLTSS would not allow a user or process to access any object with a higher protection level. This was done by means of flow control on messages at the kernel level. Each message would have a certain protection level. The kernel would verify that messages were only delivered to processes with an equal or higher protection level. The servers would then verify that the protection level of the contents of every message was equal to or lower than the message; otherwise the message was discarded. As all data flow took place via message passing, this mechanism was sufficient to provide flow control [7].

A user or object can assign arbitrary human-readable names to the objects that they hold capabilities for. The capabilities can then be stored in a directory, allowing a user to log out or an object to be removed from memory. The user can then log in, or the object can be reconstructed later, and the capabilities can then be retrieved from the directory server. Directories can store any number of capabilities, and these capabilities may represent any type of object [8, 12]. Hence, NLTSS allows users to store anything in a directory, not just files. This allows for a natural structure where a process object, numerous file objects, a network I/O object and a windows interface I/O object, all representing a single logical job, can be stored together in a directory. Users can then do things such as easily pass the ability to check and/or modify the job to others. It is important to note that directories themselves are objects and are hence accessed with capabilities [8]. This allows directories to be nested inside other directories, hence creating a resource graph. This is properly a directed graph and not a hierarchical tree like the file system in UNIX, as a capability may be stored in any number of directories by just as many different names. The graph represents the global namespace and is distributed among all the Directory Servers in the system [20]. A directory object can even be stored in one of its subdirectories, hence the graph is not acyclic. (This can be simulated in the UNIX filesystem with links; however, those are either limited to a single disk partition or are symbolic and can be ignored.) Each user has their own root directory which contains the system directories they need to have access to underneath it (as opposed to having the user's root be a subdirectory of some global root).

NLTSS provides give and take directories for each user to facilitate resource sharing. Any user has the ability to put capabilities into another user's give directory. That user, who has sole read access to the directory, then has access to whatever capabilities were given. Similarly, every user has a take directory where they have sole write access and all other users have read access: this allows that user to publish a resource for all other users to access [8]. This mechanism is similar to, though not as general as, the formal take-grant model, but it achieves most of the same functionality.

Directories themselves are comprised of numerous fields in addition to the list of capabilities that they store. Directories are labeled with a certain protection level and access rights, and they contain accounting information such as usage counts, times of creation and access, and the account to charge for storage space. Additionally, the user can give each directory a comment up to 48 characters in length. Directories support functionality similar to disk directories, such as create, delete or list, as well as some functionality unique to their generalized nature, such as reduce access, restore or change header [12].
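The directory model described above can be sketched in a few lines: directories map user-chosen names to capabilities, directories are themselves objects reached via capabilities, the namespace graph may contain cycles, and give/take directories fall out of asymmetric access rights. This is an illustrative model only; the class and method names are assumptions, not the actual NLTSS interfaces.

```python
# Hypothetical sketch of NLTSS-style directories: capability-named objects
# that store (name -> capability) entries, forming a directed graph that
# need not be acyclic. Names and rights strings are illustrative.

class Capability:
    """A ticket naming an object, carrying a simple set of rights."""
    def __init__(self, obj, rights):
        self.obj = obj
        self.rights = frozenset(rights)

class Directory:
    """An object that stores capabilities under human-readable names."""
    def __init__(self):
        self.entries = {}   # name -> Capability

    def store(self, cap_to_self, name, cap):
        if "add" not in cap_to_self.rights:
            raise PermissionError("no Add right on directory")
        self.entries[name] = cap

    def lookup(self, cap_to_self, name):
        if "read" not in cap_to_self.rights:
            raise PermissionError("no Read right on directory")
        return self.entries[name]

# A give/take arrangement for user B: others may only Add to B's give
# directory, and anyone may Read B's take directory.
give, take = Directory(), Directory()
b_give_full = Capability(give, {"read", "add", "modify", "destroy"})
b_take_full = Capability(take, {"read", "add", "modify", "destroy"})
a_give_writeonly = Capability(give, {"add"})       # what another user holds
anyone_take_readonly = Capability(take, {"read"})  # published to all users

# A directory capability can be stored inside the directory it names,
# so the resource graph is not acyclic.
d = Directory()
d_cap = Capability(d, {"read", "add"})
d.store(d_cap, "self", d_cap)
```

The give/take asymmetry is enforced entirely by which rights appear in the capability each party holds, not by any identity check in the directory itself.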

4.1.2 Capability Structure

A capability in NLTSS is between 68 and 256 bits in length. It is composed of one 64-bit Network Address and between 4 and 192 bits of additional information. This additional information includes, at a minimum, 4 bits of protection level information (such as Secret, Confidential, etc.). As NLTSS was designed for an environment where classified work was performed, this information was considered essential. There were 8 1-bit fields that defined the type of access allowed: Destroy, Modify, Add, Read, Owner, Free Access and 2 reserved for future use. The Free Access bit, also called the inheritance bit, is particularly interesting. It could be set on a directory capability and given to another process. The receiving process would then have access to the capabilities in that directory, but at no higher level than specified for the directory that was given. For example, if User A has a directory with a bunch of working documents in it (that is, they have Read and Modify permission), they can give a capability to that directory to User B with only the Read and Free Access bits set. User B then can read all the documents in that directory, but cannot modify them. Capabilities also contained 2 bits of information about the server (whether it supports some, all or more features than a standard server), 8 bits of information on the resource type (File, Account, Process, Terminal, etc.), 10 bits of reserved space and up to 160 bits of server-dependent info [16].
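The field widths above can be made concrete with a small packing sketch. The widths (64-bit address, 4-bit protection level, 8 access bits, 2 server bits, 8 resource-type bits, 10 reserved bits, variable server-dependent field) come from the text; the ordering of fields within the word is an assumption made here for illustration.

```python
# Illustrative packing of the NLTSS capability layout into a Python integer.
# Field widths follow the paper; field order is assumed (address in the
# high bits, server-dependent data in the low bits).

ACCESS_BITS = ["destroy", "modify", "add", "read", "owner", "free_access",
               "reserved1", "reserved2"]

def pack_capability(net_addr, protection, access, server_info=0,
                    resource_type=0, server_data=0, server_data_bits=0):
    assert net_addr < (1 << 64) and protection < 16
    acc = 0
    for i, name in enumerate(ACCESS_BITS):
        if name in access:
            acc |= 1 << i
    cap = net_addr
    cap = (cap << 4) | protection       # 4-bit protection level
    cap = (cap << 8) | acc              # 8 one-bit access fields
    cap = (cap << 2) | server_info      # 2 bits of server information
    cap = (cap << 8) | resource_type    # 8-bit resource type
    cap = (cap << 10) | 0               # 10 reserved bits
    cap = (cap << server_data_bits) | server_data   # up to 160 bits
    return cap

def unpack_access(cap, server_data_bits=0):
    acc = (cap >> (server_data_bits + 10 + 8 + 2)) & 0xFF
    return {name for i, name in enumerate(ACCESS_BITS) if acc & (1 << i)}
```

For example, a directory capability handed to User B in the scenario above would carry only the Read and Free Access bits.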

4.1.3 Capability Protection

Protection of capabilities through encryption is optional. Capabilities are encrypted with a secret key known only to the server. This is done by encrypting word 1 and word 3 with the server's secret key using DES and storing the result into word 2 (words in NLTSS were 64 bits long) [16]. This scheme requires that capabilities use all possible 256 bits. While this scheme still leaves capabilities vulnerable to data theft, it prevents inadvertent modification of the capability and greatly reduces the chances that an attacker will guess a valid capability for some object [8].

Alternative methods of protecting the capability were considered, such as access lists, encrypted client address and public key client address protection; each had problems in practice, though. Access list protection is where the server maintains a list of the addresses of clients that are allowed to access each object. Not only is this approach resource intensive for the server, but it is susceptible to a reflection attack against the directory server. Encrypted client address protection embeds the address of the client into the capability and encrypts it using a secret key known only to the server. This relieves the burden of storing access information on the server and, implemented properly, eliminates the susceptibility to reflection attacks. It does this at the cost of additional CPU time, something that is at a premium in the supercomputing environment. The final protection also encrypts the client address into the capability, but does so using public key encryption such that one client can encrypt the address of another client into the capability and pass the capability to that client without going to the server. This reduces the CPU load on the server; however, the computational expense of public key cryptography, especially given processor technology in the 1980's when NLTSS was being written, made this approach infeasible [8].
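The check-word scheme described at the start of this section can be sketched as follows: the server derives word 2 of the four-word (256-bit) capability from words 1 and 3 under its secret key, and rejects any capability whose check word does not match. Since DES is not in the Python standard library, HMAC-SHA-256 truncated to 64 bits stands in for the DES computation here; the structure, not the cipher, is the point.

```python
# Sketch of server-side capability sealing: word 2 (0-indexed word 1) is a
# keyed function of words 1 and 3 (0-indexed words 0 and 2). HMAC stands in
# for the DES encryption NLTSS actually used.
import hmac, hashlib

SERVER_KEY = b"server-secret-key"   # known only to the server (illustrative)

def seal(words):
    """Return a copy of the four 64-bit words with the check word filled in."""
    msg = words[0].to_bytes(8, "big") + words[2].to_bytes(8, "big")
    check = hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()[:8]
    sealed = list(words)
    sealed[1] = int.from_bytes(check, "big")
    return sealed

def verify(words):
    """A capability is valid only if its check word matches."""
    return seal(words)[1] == words[1]
```

As the text notes, this does not protect against outright theft of a capability, but any inadvertent (or malicious) modification of the protected words makes the check fail, and guessing a valid capability requires guessing the 64-bit check word.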

4.2 CPU and Process Control

NLTSS supports breaking processes into lightweight tasks (threads) through the use of a user-level library [20, 5]. This increases portability and allows the threading implementation to be modified independent of the kernel. The kernel itself was developed as a multitasked process so that multiple system calls could be handled concurrently on a multiprocessor system [20]. The CPU and process control is broken up into two pieces: the CPU driver and the Process Server. The CPU driver is the mechanism that actually controls the CPU. The Process Server is the process that enforces the policy by which processes get scheduled. There is one CPU driver per machine; however, a process server may control the job scheduling on multiple machines [5, 1].

4.2.1 CPU Driver

The CPU driver has two tasks. One task handles the time slicing between multiple processes. It will perform the context switch between processes at a given interval (timeslicing). In the event of a significant state change, such as job completion or failure, this task notifies the process server of the change. When a job sends a message, this task will forward the message to the message system and either switch to another process if the sending process has blocked or allow the process to continue [5]. The other task in the CPU driver simply receives new jobs from the Process Server and adds them to the queue. It also notifies the message system of the process' arrival so that any queued messages for it can be delivered [5].

4.2.2 Process Server

The Process Server handles all the functionality associated with processes, such as their creation, destruction, forking and interrupt handling. Each process has associated with it a header that contains information such as the conditions under which it should be halted or notified, protection level, time elapsed, account to charge, memory usage, message information, comment and name. In total, there are 39 fields associated with every process [1].


4.3 Message System

NLTSS provides one system call for input and output: the Message system call [20, 6]. This call was designed to achieve location-independent and efficient communication, support for shared memory multiprocessing, and process protection and domain separation, along with functional simplicity.

4.3.1 Location Independent Communication

The NLTSS message system provides a system call that allows processes on the same computer as well as processes on different computers to communicate with each other. This is done simply by specifying network addresses in the message call. The message system provides a transport-level interface as defined by the ISO model. When two processes on the same computer are communicating, bits are copied. When the communicating processes are on different computers on the network, packets are exchanged. These details are transparent to the user of the message system call. By changing the network address used, access can be changed from local to remote. Similarly, programs written to service resource requests locally can be changed to directly service equivalent requests from anywhere in the network. Clearly, NLTSS eliminates much of the programming work needed to access services and provide access to services in a communication network.

4.3.2 Efficient Input/Output

Efficient I/O is achieved by minimizing domain change overhead (Cray parlance for a context switch) and providing direct access to the peripherals. NLTSS will only perform a domain change before a process' timeslice is up if computation can no longer proceed because it is completely blocked waiting for I/O to complete. The message system chains multiple I/O requests into a single call. Changing to the system domain is avoided by providing the status of I/O requests directly in the memory of the requesting process. NLTSS only supports data transfer directly to and from peripherals, hence avoiding unnecessary copying and buffering of that data.

4.3.3 Support for Shared Memory Multiprocessing

With the support of the NLTSS process server, two or more parallel forked processes may share the same memory space. The NLTSS message system call allows such forked processes to initiate, control and synchronize both execution as well as I/O for any number of parallel tasks sharing memory.

4.3.4 Process Protection and Domain Separation

The NLTSS message system call is the only interface between a process and everything else outside its memory space. The message system guarantees that only data in the buffer pointed to by an Activated Send Buffer table is available to another process. It also guarantees that no data in the memory is externally modified unless it is in the buffer pointed to by an Activated Receive Buffer table. The only exception to this rule is when one process has a capability to another. The NLTSS message system allows protection levels to be assigned to all data so high-protection data does not get exposed to processes that cannot provide that protection.

4.3.5 Functional Simplicity

The NLTSS message system call uses a single data structure to keep the call as simple as possible within the above design constraints. The data structure describes each data transfer, a Send or a Receive, using a single buffer. The buffer table contains To and From network addresses, the status of an individual I/O request, and three "action" bits that describe if the request is to Activate, Cancel or Wait. There are also quite a few other bits that can be set to give more information about the transfer. These include the Done bit, BOM (beginning of message), EOM (end of message), Wake bit, etc.
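The buffer table described above can be sketched as a small record. Field names follow the paper's description; the types, defaults, and the helper constructor are illustrative assumptions, not the actual NLTSS layout.

```python
# Minimal sketch of the single buffer-table data structure used by the
# NLTSS message system call. One table describes one Send or Receive.
from dataclasses import dataclass, field

@dataclass
class BufferTable:
    to_addr: int                 # destination network address
    from_addr: int               # source network address
    data: bytearray = field(default_factory=bytearray)
    # action bits requested by the process
    activate: bool = False
    cancel: bool = False
    wait: bool = False
    # status bits maintained by the message system
    done: bool = False
    bom: bool = False            # beginning of message
    eom: bool = False            # end of message
    wake: bool = False           # wake the process on completion

def send_request(to_addr, from_addr, payload, last=False):
    """Build an activated send buffer table for one transfer."""
    return BufferTable(to_addr, from_addr, bytearray(payload),
                       activate=True, bom=True, eom=last)
```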

4.3.6 Message System Implementation

The NLTSS Message System follows the Rendezvous and Transfer Mechanism [6]. Basic operation of this message system is as follows:

1. The message sending process activates a Send Buffer Table and the message system looks for a corresponding active Receive Buffer Table in the receiving process.

2. If a match is found, this rendezvous leads to a data transfer from the Send Buffer Table to the Receive Buffer Table until one or the other is exhausted. The exhausted buffer table is then marked Done, which may lead to waking up the sender or receiver as required.

3. If no match was found in the receiving process, then the Send Buffer Table is marked as Send Blocked, waiting for a matching receive. The receiving process is notified that a message is waiting to be sent to it. When a receive is activated in the receiving process, the Receive Buffer Table gets activated and it first checks if any send is blocked waiting for it. If so, the message system starts the data transfer from the Send Buffer Table to the Receive Buffer Table. If not, the Receive Buffer Table is marked as Receive Blocked, waiting for a matching send. Note that no notification to the sending process is done here.

If there is a receiving process waiting with its Receive Buffer Table for a send, then this rendezvous is more efficient, since the additional step of notifying the receiver that a sender is waiting to send is eliminated. This is more obvious where the rendezvous is between two processes not on the same machine. Planning to have the receives activated before corresponding sends helps with efficiency.

Transfer of buffered data is asynchronous. If a send buffer is larger than the matching receive buffer, the send will do a partial data transfer and wait for subsequent receives to complete the send. Correspondingly, if the receive buffer is larger than the send, that receive only partially completes until more data can be received. The sender process could put a special message ending marker (EOM, Wake) to force the receive to complete.

NLTSS supports transfer to and from arbitrary bit boundaries. Messages sent to devices get transferred directly from the user's memory to the device if the device is capable of the transfer. Direct device transfers reduce the cost of transfer substantially.
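The rendezvous-and-transfer steps above, including partial transfers when the send and receive buffers differ in size, can be exercised with a small simulation. This is a byte-granular sketch of the described behavior for one pair of processes; the real message system matches buffer tables by network address and operates at bit granularity.

```python
# Executable sketch of the rendezvous mechanism: a send either matches an
# active receive and transfers immediately, or blocks until a receive is
# activated; transfers proceed until one buffer table is exhausted.
class MessageSystem:
    def __init__(self):
        self.blocked_sends = {}     # destination address -> pending send

    def activate_send(self, dest, data, receive=None):
        if receive is None:
            # step 3: no matching receive; mark the send blocked and (in the
            # real system) notify the receiving process that a send waits
            self.blocked_sends[dest] = {"data": data, "sent": 0}
            return "send blocked"
        return self._transfer({"data": data, "sent": 0}, receive)

    def activate_receive(self, dest, capacity):
        receive = {"buf": bytearray(), "capacity": capacity, "done": False}
        send = self.blocked_sends.get(dest)
        if send is None:
            return "receive blocked", receive
        status = self._transfer(send, receive)
        if send["sent"] == len(send["data"]):
            del self.blocked_sends[dest]
        return status, receive

    def _transfer(self, send, receive):
        # step 2: copy until the send or the receive buffer is exhausted,
        # then mark the exhausted table Done
        room = receive["capacity"] - len(receive["buf"])
        chunk = send["data"][send["sent"]:send["sent"] + room]
        receive["buf"] += chunk
        send["sent"] += len(chunk)
        receive["done"] = len(receive["buf"]) == receive["capacity"]
        return "send done" if send["sent"] == len(send["data"]) else "partial"
```

A send larger than the matching receive completes only partially and waits for subsequent receives, exactly as described in the asynchronous-transfer paragraph above.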

4.3.7 Protocol Compatibility Issues

The NLTSS message system can provide local communication by itself, but to communicate with other computers, it requires communication support and protocol compatibility at various levels up to and including the transport level (ISO) [6]. NLTSS supports the Delta-T transport protocol [19] and TCP/IP. There were problems with transport protocols other than Delta-T. For example, most transport protocols only support transfer of data in units of 8-bit bytes, but the NLTSS message system interface supports the transfer of an arbitrary number of data bits. Additionally, most transport-level protocols do not have the concept of protection levels, hence NLTSS users communicating with such protocols may get only a single protection level.

Most transport-level protocols are connection oriented in that an explicit connection open before data transfer and an explicit connection close after transfer is completed are expected. For performance reasons, Delta-T chose not to explicitly open and close the connection for data transfer, and the message system follows that trend. However, the message system also provides the BOM and EOM bits for functional and compatibility reasons. These bits mark the open and close of the connection, respectively.

The NLTSS message system and Delta-T use a fixed-length 64-bit network address. Each network address uniquely identifies a communication port associated with a task or set of tasks in the same process. The network address is broken down into an 8-bit network identifier, an 8-bit machine identifier, an 8-bit sub-machine identifier, a 16-bit process identifier and a 24-bit port number [20]. This may create problems with other protocols that use extendible or other very large forms of network addresses.

The communication itself takes place over streams. Streams are unidirectional connections that are identified by the source address, destination address and a stream number. The stream number is similar to a capability in that knowledge of its value is necessary to access the stream, and it is designed to be unguessable [20]. Interrupts and other out-of-band signals are sent on a separate parallel stream in NLTSS and Delta-T, as a matter of philosophy. This is done to keep the protocol and interface simpler.
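The 64-bit address layout above (8 + 8 + 8 + 16 + 24 bits) can be packed and unpacked mechanically. The field widths come from the text; the ordering of fields from high bits to low bits is an assumption for illustration.

```python
# The fixed 64-bit NLTSS/Delta-T network address, packed into a Python
# integer. Widths are from the paper; bit order is assumed.
FIELDS = [("network", 8), ("machine", 8), ("sub_machine", 8),
          ("process", 16), ("port", 24)]

def pack_address(**values):
    addr = 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        addr = (addr << width) | v
    return addr

def unpack_address(addr):
    out = {}
    for name, width in reversed(FIELDS):   # extract low-order fields first
        out[name] = addr & ((1 << width) - 1)
        addr >>= width
    return out
```

Since the widths sum to exactly 64, every combination of in-range field values yields a distinct 64-bit address.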

4.4 File System

The NLTSS file system provides distributed file access. The basic goal is to allow processes on any processor to access files that reside either locally or remotely. To some degree, the NLTSS file system was patterned after Sun's Network File System (NFS). However, NFS itself fell short of the requirements of a file system for a distributed supercomputer environment [25]. Specifically, NFS provides remote file access by keeping the file data in its remote location and by transferring it to the requesting process in fairly small pieces. While this strategy usually is sufficient for the requirements of a typical distributed environment, it is insufficient for the needs of a distributed supercomputer environment due to the high-bandwidth characteristics of supercomputers. To use these high-performance machines efficiently, it is necessary to migrate the data to the processor. Ideally, this migration is performed at a high speed, and the migrated data is then cached local to the processor in order to allow it to be accessed quickly during computation. The NLTSS file system provides this functionality.

4.4.1 Utility of a Distributed File System

The NLTSS file system allows a process on a given machine to read and write not only local files, but remote files as well [3]. This transparent distributed file system capability is of great utility in a distributed supercomputer environment, in which migrating certain functionality from number-crunching supercomputers to workstations is a desirable goal. For example, a numerically intensive calculation can be run on a supercomputer, and the resulting data can be visualized by using graphics software on a local workstation. By doing this, the supercomputer is not burdened with graphics computations, which are more suited to be performed on the workstation. Moreover, the graphics algorithms on the workstation can use the services of the distributed file system to access the data on the supercomputer in a fairly transparent fashion, with no greater effort than would be required to analyze local data. Another example of the usefulness of the distributed file system was pointed out by the authors of NLTSS [3]. They observed that the file system allowed programmers to use a full-featured text editor on a VAX to directly edit a file on a Cray, for which the available text editing facilities were less sophisticated.

4.4.2 The NLTSS File Resource

A disk file under NLTSS is referred to as a file resource, and is identified by a capability that authorizes access to it. A file resource is essentially a record consisting of several fields [10]. The Body field is the field of the file resource that contains the actual file data; it contains unformatted data written there by a customer process. The other file resource fields describe the attributes and location of the Body, and are collectively referred to as the Header [2]. The Header consists of the following fields: Allocated Length, Altered Flag, Comment, Fragmented List, Length, Lifetime, Lock Type, Logical Unit, Notification Address, Physical Unit, Protection Level, Storage Account, Subtype, Time Last Read, Time Last Written, Time of Creation, Total Reads, Total Writes, Unique File ID, Usage Hint and User Checksum [10]. As can be seen from this list of file resource fields, an NLTSS file resource encompasses much more than just the file data.

4.4.3 File Capabilities

A file capability identifies a file resource and authorizes access to it. In order for a process to perform any operation on a file, the process must have a capability for that file. Each file capability includes the following access bits: Destroy, Modify, Add, and Read [10]. The process can perform a given type of operation on the file only if the corresponding access bit is set. The Read and Destroy access permissions have the expected meanings; Read gives the process read access to any field of the file resource, while Destroy gives the process the ability to destroy the file. Destroy also gives permission to revoke capabilities to the file. The Modify permission gives the process the ability to modify any field of the file resource except the Allocated Length field. The Add permission gives the process the ability to increase the size of the Body field, which necessarily includes the ability to alter the Allocated Length field.
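The access rules above, in particular the split between Modify and Add over the Allocated Length field, can be captured in a couple of predicate functions. This is a paraphrase of the text's rules, not the actual file server's check; the function and field names are illustrative.

```python
# Sketch of the file-capability access rules: which header fields a holder
# of a given set of access bits may change.

def may_write_field(access_bits, fld):
    """Can a holder of these access bits modify this file-resource field?"""
    if fld == "Allocated Length":
        return "add" in access_bits      # only Add may grow the Body
    return "modify" in access_bits

def may_destroy(access_bits):
    # Destroy also carries the right to revoke capabilities to the file.
    return "destroy" in access_bits
```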

4.4.4 File System Components

The NLTSS file system performs disk I/O through the use of several basic file system components: file servers, hardware interface processes (HIPs), I/O interfaces, and disk drivers.

File Server The file server for a specific computer is responsible for managing all of the files on the disks of that computer. The file server handles the allocation and deallocation of disk space, maintains a catalog of files currently residing on the disk, manages file headers, controls access to files using the access bits of file capabilities, and handles I/O requests [2, 10]. NLTSS file servers support fragmented files, can operate in parallel with the process requesting disk access, and can reside on a machine other than the one whose disk is being managed [2]. Also, the file server was designed to ensure that the failure of a single disk would not result in the loss of files.

The file server provides four categories of functionality: file resource management (sometimes referred to as file header management), file access management, I/O request management, and operations pertaining to the handling of reply monologs [10]. Reply monologs and the NLTSS Control Monolog Record (CMR) are used to provide communication between the file server and the customer process during the process of file I/O, and will not be described in detail here. The other three categories will be discussed briefly.

File resource management encompasses the following operations: Change, Create, Create and Write, Destroy, and Interrogate [10]. Change is used to change a writable field of a file resource. Create is used to create a new file. Create and Write is used to create a read-only file and to write the (static) data that this file is to contain. Destroy, as its name suggests, is used to destroy a file. The final file resource management operation, Interrogate, is used to query a readable field of a file resource.

File access management involves the operations New Capability, Reduce Access, and Set Lock [10]. New Capability provides the functionality for generating a new capability to a given file, or for invalidating existing capabilities. Reduce Access is used to reduce the access permissions associated with the capability to a file. Set Lock is used to set a lock (pertaining to either read, write, or read/write operations) on all fields of a file resource.

I/O request management operations include Pattern, Supply Server Data Address, Read, and Write [10]. Read and Write are used to read data from the Body field of a file resource or to write data to this field, respectively. Pattern is used to write a specified pattern over the Body field of a file. Supply Server Data Address is used to request the file server to provide a capability containing the address of the data mover by which data will be transferred to or from the customer process.

Disk Driver and I/O Interface The disk driver is low-level software that controls hardware-specific aspects of the process of actually reading data from and writing data to the physical disk hardware. The I/O interface provides a somewhat higher-level interface to the low-level disk driver software. At the time NLTSS was designed, it was decided that the disk driver and interface that were already resident on the Crays at LLNL would be retained and incorporated into the NLTSS file system [11].

Hardware Interface Process (HIP) Because of the decision to reuse pre-existing disk driver and interface software, the NLTSS file system was obligated to replicate the I/O request format expected by this software. Therefore, an additional level of software was needed to translate between NLTSS disk I/O requests and disk driver I/O requests. This software interface was named the Hardware Interface Process (HIP) [11]. Another benefit of this arrangement is that it provides an abstraction that encapsulates the device-specific code of the device driver and provides a uniform I/O interface to the HIP. Therefore, the transition to different disk hardware, for example, would be simplified. The HIP consists of several different tasks, which are coordinated with each other using standard semaphore synchronization mechanisms [11].


4.4.5 Input Output Descriptors (IODs) and Component Interaction

IODs Input Output Descriptors (IODs) are 192-bit data structures that are passed as messages between NLTSS file servers, HIPs, and disk driver I/O interfaces, and which facilitate the communication between these components. There are four varieties of IOD: a File-Server-to-Disk-HIP IOD, a Disk-HIP-to-I/O-Interface IOD, an I/O-Interface-to-Disk-HIP IOD, and a Disk-HIP-to-File-Server IOD. Although all four types of IOD are 192 bits in size, the exact structure of the fields that compose each IOD differs somewhat from type to type [11].
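The four IOD varieties can be modeled as tagged records of a fixed 192-bit width. Only the common shape is shown here; the per-type field layouts differ in the real system and are not reproduced.

```python
# Sketch of an IOD as a tagged, fixed-width record. Type tags paraphrase the
# four varieties named in the text; the contents are left opaque.
from dataclasses import dataclass

IOD_TYPES = ("FS_TO_HIP", "HIP_TO_IO_INTERFACE",
             "IO_INTERFACE_TO_HIP", "HIP_TO_FS")
IOD_BITS = 192

@dataclass(frozen=True)
class IOD:
    kind: str      # one of IOD_TYPES
    bits: int      # the record contents, at most 192 bits

    def __post_init__(self):
        if self.kind not in IOD_TYPES:
            raise ValueError("unknown IOD type")
        if not 0 <= self.bits < (1 << IOD_BITS):
            raise ValueError("IOD must fit in 192 bits")
```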

File System Component Interaction To illustrate the roles played by the various components of the NLTSS file system, and to describe the interactions (via IODs) between these components, we must consider what happens during a generic file I/O operation. First, the file server receives a logical I/O request from a customer process. This request for a data transfer contains a file capability, the address of the first bit of data to be transferred, the total number of bits to be transferred, the type of data transfer, and the network addresses to be used for the transfer [11]. The file server then examines the file capability and determines if the requesting process has the requisite permissions to perform the requested file operation. If so, the file server proceeds to construct one or more IODs from the data in the logical I/O request. These are File-Server-to-Disk-HIP IODs, and there is one for each physical disk fragment specified in the logical I/O request (up to a maximum of eight fragments) [11]. These IODs are then transferred to the Hardware Interface Process (HIP).

The HIP is responsible for translating the request(s) from the file server into a format that the disk driver understands. In addition, it has the crucial role of obtaining the memory addresses within the customer process at which the data transfer is to take place. It obtains these data directly from the customer process itself, since the file server has no knowledge of this information [11]. Note that this is required only if the data transfer is a disk-to-memory or memory-to-disk transfer, and is not necessary for a disk-to-disk transfer (which is just a file copy, and does not involve any memory address within the customer process). Likewise, for a disk-to-memory or memory-to-disk transfer, the HIP needs to "lock down" the requesting process to prevent it from being swapped out of memory on a context switch during the file transfer. It does this by issuing a privileged message system request called an I/O Block [11]. Next, the HIP constructs one or more Disk-HIP-to-I/O-Interface IODs based on the information in the File-Server-to-Disk-HIP IODs received from the file server (again, there is one IOD for each fragment). These IODs (or as many of them as possible) are then sent to the I/O interface, which essentially relays them to the disk driver by placing them on the disk driver queue.
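The one-IOD-per-fragment fan-out described above, with its limit of eight fragments per logical request, can be sketched directly. The request and IOD structures are illustrative stand-ins, not the actual 192-bit layouts.

```python
# Sketch of the file server turning one logical I/O request into one
# File-Server-to-Disk-HIP IOD per physical disk fragment, up to eight.

MAX_FRAGMENTS = 8

def build_fs_to_hip_iods(request):
    """request: dict with 'capability', 'transfer_type', and 'fragments'
    (a list of physical disk extents). Returns one IOD per fragment."""
    frags = request["fragments"]
    assert 1 <= len(frags) <= MAX_FRAGMENTS, "too many fragments"
    return [{"kind": "FS_TO_HIP",
             "fragment": f,
             "transfer": request["transfer_type"]} for f in frags]
```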


The completion of disk operations is monitored by the I/O interface. When all of the physical fragments of a single logical I/O request are completed, the I/O interface constructs a single I/O-Interface-to-Disk-HIP IOD (regardless of how many fragments there were), and passes it up to the HIP [11]. Alternatively, if an error occurs, the IOD passed up to the HIP indicates this fact.

When the HIP receives an I/O-Interface-to-Disk-HIP IOD, it either sends more Disk-HIP-to-I/O-Interface IODs to the I/O interface or, if the entire logical I/O job is complete, it constructs a Disk-HIP-to-File-Server IOD to pass back up to the file server [11]. Of course, if the HIP receives an error IOD from the I/O interface or detects an error itself, it will send an error IOD of its own to the file server. In either case, it then issues a privileged message system request called an I/O Unblock, which causes the locked-down process to be released; since the I/O operation is complete, the customer process is once again eligible to be swapped out of memory on a context switch. Upon receipt of the Disk-HIP-to-File-Server IOD, the file server can communicate to the customer process that the I/O operation is finished, or that an error has occurred, depending on the content of the IOD. At this point, the entire logical I/O operation is finished.

4.4.6 Performance Considerations

In the initial design of the NLTSS file system, the file server, hardware interface process, and driver/interface components of the file system were all separate processes. Communication between the processes was achieved by passing IOD messages through the NLTSS Message System. Although the encapsulation achieved in this model is conceptually attractive, these components were eventually merged for performance reasons [11]. At first, the file server (FS) and the hardware interface process (HIP) were merged into a single process called the FSHIP. Later in the evolution of the system, the driver functionality was incorporated into the FSHIP as well [11]. In the FSHIP process, there are two queues, one to hold the File-Server-to-Disk-HIP IODs and the other to hold the Disk-HIP-to-File-Server IODs. The FS and the HIP are part of the same FSHIP process, and these two queues are shared between both components; access control is provided through semaphore-based synchronization [11].

Although the individual NLTSS file system components were merged in the sense that they were brought into the same process space, the code for the components remained modularized, thereby preserving much of the original encapsulation, at least conceptually. Meanwhile, a performance enhancement was achieved by converting to a system with a single file system process (per physical disk, of course). Such trade-offs are instructive, and this modification in design was one of the lessons learned in the NLTSS experience.


5 Lessons Learned

NLTSS was intended to be a revolutionary advance in operating system technology. For the most part, it succeeded. Of its long list of goals, most were successfully implemented. The goals that were not met were primarily those for which there was insufficient user demand, such as encryption of communication channels or automatic distributed deadlock detection. The primary change in the design that came about after putting the system into action was the movement of critical services such as the file server and CPU server into the kernel, a sacrifice made purely to avoid the context switch overhead for these heavily used services. The NLTSS team made a conscious effort, though, to keep a clean separation between these components and the low-level kernel, in the hope that, given advances in hardware technology, those services could one day be moved back into user space [7]. It is suspected that some of these boundaries were nevertheless crossed while the team was under pressure to provide higher performance [14].

In retrospect, NLTSS's features do not seem very revolutionary by today's standards. The methods that NLTSS's architects used can be found in most any computer science text that covers distributed operating systems, such as [18]. What is notable about NLTSS is that it addressed issues such as scalability, reliability and integrity before they were identified as being issues common to all distributed systems. Furthermore, NLTSS was more than an academic exercise: it actually supported a demanding user community for over three years of production use. Finally, it implemented all these features on the real-memory Cray architecture, which was not amenable to being controlled by an advanced operating system [15].

During the shutdown ceremony for NLTSS, Dick Watson attributed its demise to the widespread acceptance of UNIX. While NLTSS was backwards compatible with its predecessor, LTSS, it was not compatible with UNIX, which many users demanded [22]. In retrospect, it probably wasn't that the users actually wanted UNIX, just that UNIX was good enough to perform the tasks that the users required. While NLTSS was designed to address what the designers saw as numerous shortcomings in UNIX, the users didn't actually use these advanced features, as they were locked into the LTSS way of doing things. Instead of taking advantage of the new features, users complained that operations took longer to complete and that they couldn't understand the rationale for the change [7]. In the end, the cost of developing and maintaining an operating system was just too high when a version of UNIX that was good enough was readily available. Eight years after his speech at the last NLTSS shutdown, Dick Watson attributed NLTSS's demise to the fact that "it was too revolutionary": it tried to fix all the problems with existing systems, which made it so different from anything else that it never really got accepted [23].


6 Conclusion

NLTSS was a distributed, object-oriented operating system based on a pure message passing kernel and capabilities, developed at Lawrence Livermore National Lab from the late 1970's through 1993. As it was designed to be a full-featured production level operating system supporting a multicomputing environment, it had numerous goals concerning its use of multiprocessing, message passing, interprocess communication and distributed processing. Other goals focused on backwards compatibility, operational considerations and issues common to distributed and object-based operating systems. An overview of the issues in distributed operating systems and distributed object-based programming systems was presented along with how NLTSS addressed each issue. The design and implementation of NLTSS, with particular regard for the capabilities & directory server, CPU & process control, message system, and file system, were covered in detail. Finally, we looked at what can be learned from NLTSS in retrospect.

7 Auspices & Acknowledgements

This paper is a personal work of the authors and was not developed under the auspices of the United States Department of Energy, the University of California or Lawrence Livermore National Lab. Any opinions expressed herein are solely those of the authors and may or may not reflect the views of their respective employers or the University of California, Davis. NLTSS was developed under the auspices of the United States Department of Energy by Lawrence Livermore National Lab under contract number W-7405-ENG-48.

The authors gratefully acknowledge the assistance of Jed Donnelley, David Fisher, Donna Mecozzi, Jim Minton and Dick Watson, without whom this paper would not have been possible.

8 References

[1] Ralph Allen. NLTSS process server. Technical report, Lawrence Livermore National Laboratory, December 1984.

[2] Charles Athey III, Jeffrey Clark, James E. Donnelley, Pierre Du Bois, Donna T. Mecozzi, and James A. Minton. NLTSS. Lawrence Livermore National Laboratory, Internal presentation, date unknown.

[3] Charles Athey III, Kent Crispin, Pierre Du Bois, Donna T. Mecozzi, and James A. Minton. NLTSS/LINCS for a presentation to Nuclear Software Systems Division. Lawrence Livermore National Laboratory, Internal presentation, November 1982.

[4] Roger S. Chin and Samuel T. Chanson. Distributed object-based programming systems. ACM Computing Surveys, 23(1), March 1991.

[5] James E. Donnelley. The internal structure of NLTSS. Technical report, Lawrence Livermore National Laboratory, June 1983.

[6] James E. Donnelley. The NLTSS message system: Definition and implementation for the Cray-1 and Cray X-MP computers. Technical report, Lawrence Livermore National Laboratory, May 1984.

[7] James E. Donnelley. Personal commu-nication, February 16, 2001.

[8] James E. Donnelley and John G. Fletcher. Resource access control in a network operating system. In ACM Pacific '80 Conference, San Francisco, CA, November 1980.

[9] Invitation to NLTSS shutdown ceremony. Lawrence Livermore National Laboratory, Memorandum, March 1993.

[10] Donna T. Mecozzi. NLTSS file server. Technical report, Lawrence Livermore National Laboratory, April 1986.

[11] Donna T. Mecozzi, James A. Minton, Donald L. von Buskirk, and Joe Requa. NLTSS disk I/O. Technical report, Lawrence Livermore National Laboratory, July 1988.

[12] James A. Minton. NLTSS directory server. Technical report, Lawrence Livermore National Laboratory, July 1981.

[13] James A. Minton. Personal communication, February 23, 2001.

[14] James A. Minton. Personal communication, March 6, 2001.

[15] James A. Minton and David Fisher. Personal communication, February 16, 2001.

[16] James A. Minton and Donna T. Mecozzi. What are capabilities? Technical report, Lawrence Livermore National Laboratory, February 1986.

[17] Norm Samuelson. NLTSS account server. Technical report, Lawrence Livermore National Laboratory, September 1985.

[18] Mukesh Singhal and Niranjan G. Shivaratri. Advanced Concepts in Operating Systems. McGraw-Hill, 1994.

[19] Richard W. Watson. DELTA-T protocol specification. Technical report, Lawrence Livermore National Laboratory, December 1982.

[20] Richard W. Watson. Working notes: Motivation, goals and development strategy of NLTSS. Lawrence Livermore National Laboratory, Working notes, July 1987.

[21] Richard W. Watson. The architecture of future operating systems. Lawrence Livermore National Laboratory, Internal presentation, January 1989.

[22] Richard W. Watson. Eulogy to NLTSS. Lawrence Livermore National Laboratory, Internal presentation, March 22, 1993.

[23] Richard W. Watson. Personal communication, January 29, 2001.

[24] Richard W. Watson. Requirements and architectural features of future operating systems. Technical report, Lawrence Livermore National Laboratory, circa 1989.

[25] Richard W. Watson. Distributed computing and operating systems for the supercomputer environment. Technical report, Lawrence Livermore National Laboratory, date unknown.
