a readable tcp in the prolac protocol language

A Readable TCP in the Prolac Protocol Language

February 20, 1998

AbstractImplementing network protocols correctly and readably with high performance is a difficultproblem, one naturally addressed by specialized languages. However, most protocol lan-guages concern themselves with provability and machine verification rather than readability,extensibility, and performance. Prolac is a new statically-typed, object-oriented protocollanguage designed for implementation and human readability, rather than specification andautomatic checking. We present a working partial TCP implementation in Prolac, directlyderived from 4.4BSD. Our implementation is modular—protocol processing is logically di-vided into minimally-interacting pieces; readable—Prolac encourages top-down structureand naming intermediate computations; extensible—subclassing cleanly separates protocolextensions like delayed acknowledgements and slow start, leaving the base protocol clean;and, eventually, efficient—the compiler can remove almost all expensive language features(like dynamic dispatch) using simple global analysis. This TCP implementation is designedfor real use, as well as for teaching and research into TCP extensions.

1 IntroductionNetwork protocol languages and compilers have been an active research area for decades [DB89,CDO96, BB89, Nes90, AP93, HA89]. Protocols are difficult to design and implement; pro-tocol correctness and performance is vital; and protocols tend to be self-contained—allthese factors suggest specialized languages as a solution. Most existing protocol languagesfocus on provability: their underlying theoretical models are designed for verification andtesting, often making efficient compilation for the real world impossible. In fact, in mostlanguages, compiled implementations are substantially slower than hand-coded versions.Recently, the Esterel protocol compiler has reported good performance for real-world pro-tocol implementations [CDO96]; unfortunately, its underlying theoretical model—finite

1

state machines—has serious problems for modeling protocols: the division into states oftendoes not correspond to anything real in the protocol, and relationships between states canbecome very complicated and difficult to change, even in carefully layered protocols [vB86].

Prolac, the language we describe in this paper, takes a different tack. Prolac is focusedon readability rather than provability, and on the human programmer rather than a ma-chine verifier. Its influences are readable programming styles—specifically, object-orientedprogramming and functional programming—rather than a theoretical model of communi-cation. The language contains no communication-related abstractions, which would limitits applicability; instead, it provides enough power for a programmer to create her ownabstractions. Its intended application—network protocols—affected its versions of commonobject-oriented concepts, as well as its rule-based approach. A Yacc-like [Joh75] mechanismlets the programmer escape to C to express implementation details, keeping Prolac itselfclean. Finally, the language was designed for efficiency, and none of its aspects are theo-retically difficult to implement: using Prolac to produce a fast protocol implementation is arealizable goal. The current Prolac compiler already performs some optimizations (removingdynamic dispatches, for example) significantly better than any C++ or Java compiler could.Section 3 describes the Prolac language and its compiler in more detail.

The heart of this paper, however, is section 4’s description of Prolac’s first real-world use:our reimplementation of most of 4.4BSD’s TCP processing in Prolac. Our TCP implementa-tion is modular and readable, a great contrast to the BSD model. Computation is broken intosmall, appropriately named pieces; inheritance is used to break large, complex functionalityinto focused parts joined by relatively simple interfaces. The implementation’s top-downdesign remains visible in the final specification with no overhead incurred. Our frameworkis easily extensible; four logical TCP extensions (delayed acknowledgements, slow start, fastretransmit, and header prediction) have been implemented as extensions to a clean basethrough subtyping. These extensions are simple—each one fits in a single source file withless than 60 lines of Prolac—and can be independently and transparently turned on with nochanges to the base protocol. Our implementation currently runs at user level, but has beendesigned for interfacing with BSD kernel networking structures. (The final submission willpresent the Prolac specification running inside a BSD kernel.)

Although our TCP specification currently runs at user level, it can communicate withother TCPs over a real network using the Berkeley packet filter system [MJ93]; comparingtcpdump output shows that we send equivalent packets to a 4.4BSD kernel (OpenBSD 2.1)performing the same tasks. A simple performance-counter comparison described in section 5shows our TCP input processing running two to three times slower than 4.4BSD. Some ofthis slowdown comes from our simplistic testing infrastructure, which performs more copiesand is generally slower than similar infrastructure in the BSD kernel; some comes from lackof performance tuning in the specification; and some comes from the apples-and-orangestest of kernel vs. user-level. The final submission will present real performance numbers for

2

a kernel Prolac TCP, which, as we describe below, we have reason to believe will be similarin performance to 4.4BSD. Even aside from performance, the TCP specification we presentin this paper is still noteworthy for its readability and extensibility.

2 Related workThis section presents the protocol languages which have most influenced our work. Two ofthe most-studied protocol languages are LOTOS [BB89] and Estelle [DB89], two “formaldescription techniques” originally intended for developing the ISO OSI protocol suite.LOTOS is the higher-level of the two, a functional language based on Milner’s Calculus ofCommunicating Systems [Mil80]. The LOTOS notation is very terse and expressive, duepartially to its functional style—a style we adopted for expressivity and readability. Estelle,in contrast, is a Pascal-like language; an Estelle specification is structured as a set of finitestate machines communicating via broadcast signals. Chung and Sidhu have reported semi-automatic (meaning partially hand-written) implementations of Estelle specifications withreasonable performance [SCB90], but do not report experience with large protocols. Wefind both LOTOS and Estelle specifications extremely difficult to read.

RTAG [And88] is based on a different formal model: context-free attribute grammars.RTAG generally seems more readable than LOTOS and Estelle, a feature that led us tobase Prolac’s predecessor on grammars. (This still survives in Prolac’s “rule” terminology.)However, RTAG’s performance is problematic due to an inherently difficult theoreticalproblem: parallelism in the language.

Two things these protocol languages all have in common are explicit parallelism anddifficulty of efficient implementation. (The problem is most extreme with LOTOS, wherevalid programs can potentially generate an infinite number of processes [WvB89].) Thedesigners of Esterel [BdS91] addressed this problem by removing asynchronous parallelismfrom Estelle, a lead we have followed in Prolac. Impressive performance results have beenreported for a restricted Esterel version of TCP [CDO96]. However, we still find Esterelspecifications difficult to read due to its underlying formal model, which has also beenreported to be difficult to extend [vB86].

3 LanguageThis section is an introduction to the Prolac language. We do not provide a thoroughdescription of the language, focusing instead on general features and design goals.

Prolac is a pragmatic implementation language, not a theoretical or proof-amenablespecification language. It is not based on any model of protocol processing; instead, languagedesign focused on programming questions: How can protocol definitions be made morereadable? Easier to instrument? Easier to extend? Existing languages like C++ and Java are

3

Name Syntax DescriptionType declaration E :> T “E has type T”Arrow E1 ==> E2 “If E1 then E2”Case bars E1 ==> E2 ||| E3 “If E1 then E2 else E3”Let let x = E1 in E2 end Local variable definitionMinimum min(E1, E2) Minimum of E1 and E2; E1 and

E2 are calculated only onceMinimum assign x min= E x = min(x, E)Maximum, maximum assign Analogous

Table 1: New operators in Prolac. Arrow and case bars have very low precedence (lowerthan C’s comma); let has no precedence.

inadequate for various reasons, ranging from inherent inefficiency to syntactic tediousnesswhen dealing with common protocol issues. We aimed to keep Prolac largely conventional,hoping it would therefore be easy to grasp. However, our focus on protocol-related issuesand an additional concentration on simplicity has led to some novel language contributions,such as module operators and implicit rules, which are described below.

3.1 Rules and computation

Rules perform all computation in Prolac, just as functions perform all computation in C. Infact, rules and functions are conceptually identical; the different terminology is intendedto highlight a difference in scale. While a C function is often hundreds of lines long, mostProlac rules are less than 5 lines.

This simple rule returns its argument times two:

twice(n :> int) :> int ::= 2 * n;

Thus, twice(3) == 6.Prolac is an expression language, like the functional languages on which it is modeled

(ML, Haskell). Each rule body is a simple expression, and the value of that expression isthe return value of the rule. In contrast, a C function implementing twice would have areturn statement in its body. Prolac has all of C’s operators, as well as a few of its own; themost important of these are summarized in Table 1. All of the new operators were directlyinspired by real protocol code: the arrow operator, for example, naturally represents theform of conditional processing commonly found in protocols.

One of Prolac’s design principles was the elimination of syntax, particularly routine orboilerplate syntax, in the interest of readability. Prolac rules tend to perform relatively little

4

computation themselves; a large task is divided into many small pieces. We find that, moduloa slight learning curve, the most readable syntax for small rules is often the shortest: nothinggets in the way of understanding the code. Prolac is an expression language because of thisprinciple; several other applications of the principle were inspired by network protocols.First, many rules in a protocol definition return a Boolean value (Is this TCB in the listenstate? Is this a valid acknowledgement?); second, many rules do not take arguments. There-fore, Prolac assumes a bool return type, and parentheses can be left off when calling a rulewith no arguments. These definitions are functionally identical:

closed() :> bool ::= current−state() == 0;closed ::= current−state == 0;

Prolac is wedded to the C language through uninterpreted actions, much like the Yaccparser generator. (The Prolac compiler generates C code from a Prolac program, directlyincluding any C actions when appropriate.) Actions are used to interface with the systemoutside the Prolac protocol specification.

3.2 Modules and object orientation

Prolac rules are organized into related groups called modules, which can also contain data(called fields). Modules may extend other modules through inheritance, and may provide newdefinitions for their supertype’s rules; the correct definition is chosen at runtime (dynamicdispatch). Thus, like any number of languages, Prolac is object oriented.

Prolac’s object orientation is similar to Java’s: the language is statically typed; all code ispart of some module; a module can have at most one parent; every rule is potentially subjectto dynamic dispatch; and Prolac source code is completely order-independent. However,Prolac lacks some Java functionality: there is no interface inheritance, and different rulesmust always have different names.

As in C++, the single mechanism of inheritance is used for interface inheritance, where amodule implements an interface specified by its supertype; and implementation inheritance,where a module adds to or overrides part of its supertype’s definition. Our TCP specificationalso uses inheritance to build modules for complex subsystems from smaller parts. Forexample, the module representing the base protocol’s TCB (transmission control block)is built through successive inheritance from 6 submodules (basics and connection state,windows, output, timeouts, round-trip time measurements, and retransmission).

The examples in Figure 1 demonstrate Prolac syntax for most of these concepts. Themodule keyword introduces a module (equivalent to a Java class); inside a module, the fieldkeyword introduces a field (equivalent to a slot or instance variable). The Debug−TCB−Statemodule is a subtype of TCB−State (a module’s parent, if any, is listed in the module headerafter ‘:>’).

5

module TCB−State {field s :> int;constructor ::= s = 0;set−state(i :> int) ::= s = i;listen ::= s == 1;enter−listen ::= set−state(1);// ... code for other states ...

}

module Debug−TCB−State :> TCB−State {set−state(i :> int) ::={ printf(“entering state %d\n”, i); },s = i;

}

Figure 1: Two simple modules from the TCP specification, implementing the TCB state(i.e., closed, listen, syn-sent . . .). The Debug-TCB-State module additionally reports when theTCB state is changed using a C action.

Prolac modules can optionally have constructors, rules which are run whenever an objectof that module is created. These rules are named ‘constructor’; the TCB−State module has aconstructor initializing the s field (the connection state) to zero, so that each TCB−State objectis initially closed. The appropriate constructor is called automatically if an object is createdon the stack (in a let, for example), but—in the interests of flexibility and simplicity—Prolachas no primitives for heap allocation. Instead, the user must get memory for an object insidea C action (using malloc, for example) and call the constructor directly on the resultingpointer.

The TCB−State module provides a set of rules for testing what state the connection is in(listen, syn−sent, etc.) and a set for changing the state (enter−listen, enter−syn−sent, etc.).The enter−* rules are written using an internal routine, set−state. The Debug−TCB−Statemodule, which prints a message every time the state changes, overrides this internal routine.The version of the set−state routine actually used depends on the run-time type of the object;for instance:

let s1 :> TCB−State, s2 :> Debug−TCB−State ins1.enter−listen, // no messages2.enter−listen // prints a message

end

Here, the set−state rule is a convenient override point; if the enter−* rules didn’t call set−state, the Debug−TCB−State module would have had to override each of the enter−* rules.Finding a simple and powerful set of override points in a protocol definition is an interestingdesign issue, one we discuss more in Section 4.3.

6

module Segment { ...net−to−host { // introduces the net−to−host namespace

net−to−host ::=ip−well−formed && ip−>net−to−host && check−length && tcp−well−formed&& tcp.net−to−host;

ip−well−formed ::= ip−>check−checksum && !ip−>fragmented;// etc.

}... };

Figure 2: Prolac namespaces example. The Segment module represents a TCP segment. Itsnet-to-host namespace contains rules which prepare a segment received from the networkfor protocol processing (checking the checksum, byte-swapping, etc.).

3.3 Naming

Naming variables and functions sensibly—descriptively and naturally—makes any programmuch more readable. In a programming language like Prolac which encourages the useof many small functions, sensible naming is correspondingly more important. In our TCPimplementation, we try to name rules so that the rule’s purpose is intuitive without needingto look at the rule’s definition; without this property, a reader would have to jump nonlinearlyfrom rule to rule to have any hope of understanding the code.

Prolac has a range of features to encourage and support sensible naming. First andsimplest, hyphens ‘−’ are allowed in identifiers. Second, Prolac supports separate namespacesboth inside and outside modules, allowing rules or modules to be grouped into relatedunits. Figure 2 gives an example of namespaces inside modules, an idiom we often use forcomplicated rules requiring helper functions.

Prolac gives the user control over module namespaces through a set of module operators:operators which affect the compiler’s behavior, rather than the running program’s behavior.The first module operators, hide and show, were designed to support loose, flexible accesscontrol in a simple and elegant manner. If M is a module with a feature (rule, namespace,or field) named x, then M hide x is the same module, except that M.x is inaccessible. TheTCP specification uses hide to hide implementation details from module users. The showoperator makes hidden names accessible again.

Another new mechanism, implicit rules, supports convenient naming for rule calls ona module’s fields. To motivate this mechanism, consider a module supporting TCP inputprocessing. This module has two fields: tcb, a pointer to a TCB—the transmission controlblock for the relevant connection—and seg, a pointer to the received segment. The inputmodule calls many rules on each of these objects—for instance, state-testing rules on the

7

TCB (listen, syn−sent, etc.) and flag-testing rules on the segment (syn, fin, ack, etc.). Hereis an excerpt from reset processing demonstrating the flavor of the resulting code:

do ::= assertion(seg−>rst),(tcb−>syn−received ==> do−syn−received)|| (tcb−>closing || tcb−>last−ack || tcb−>time−wait ==> tcb−>close−connection)|| tcb−>abort−connection;

The repetition of tcb and seg actually detracts from the code’s readability. For the over-whelming majority of input processing, there is only one TCB and one segment; why shouldthe user constantly have to specify tcb and seg?

Implicit rules allow repetitive object specifications like this to be left out. Implicit rulesearch begins when the rule corresponding to some name cannot be found. Fields with ausing module operator in effect are searched for the name. If an unambiguous definition isfound in some field, the compiler uses that rule. This allows the code above to be abbreviatedto the following:

field tcb :> *TCB using all;field seg :> *Segment using all;...do ::= assertion(rst), // finds seg−>rst(syn−received ==> do−syn−received) // finds tcb−>syn−received|| (closing || last−ack || time−wait ==> close−connection) // ...and so on|| abort−connection;

The implicit rule mechanism is so powerful it’s dangerous: the programmer cannot alwayseasily find the actual rule called by an expression, which makes the program more fragile.However, implicit rules are actually a great help in our TCP specification. The risk of unread-able programs does not concern us since implicit rule search is entirely under programmercontrol through module operators; the programmer can decide whether to use implicit ruleswell or badly.

3.4 Compilation and optimization

The Prolac compiler, prolacc, compiles Prolac code into C. The compiler attempts to gen-erate natural, high-level C—that is, code resembling hand-written C as much as possible.In particular, the output C has large expressions resembling the Prolac input, and a smallnumber of introduced temporaries. The result is reasonably readable, debuggable with Cdebuggers, and, with certain C compilers, results in better object code than an equivalentlower-level version.

The compiler accepts an entire Prolac program at once and performs some simple global

8

module A {rule1 ::= ...;rule2 ::= ...;

}

module B :> A {rule1 ::= ...;

}

module C :> B {rule2 ::= ...;

}

test(a :> *A, b :> *B, c :> *C) ::=a−>rule1, // not a leaf—generates dynamic dispatchb−>rule1, // leaf: C doesn’t override rule1b−>rule2, // not a leafc−>rule1, // leafc−>rule2; // leaf

Figure 3: Example of leaf analysis. Assume that no other module in the program is a subtypeof B or C

optimizations. This is not a problem even for our large and complex TCP specification,which takes up just 1800 lines of Prolac; the Prolac compiler runs in under 3 seconds on a133MHz Pentium machine.

3.4.1 Leaf analysis

The most important optimization the Prolac compiler performs is leaf analysis, a simpleglobal analysis which removes every single dynamic dispatch in our generated code. Remov-ing dynamic dispatches is absolutely essential for performance: although a dynamic dispatchis slightly more expensive than a conventional function call, the real problem is that Prolacwill not inline a dynamically dispatched rule. The language encourages the use of smallrules for naming extremely simple computations; the only hope of having any sort of goodperformance is therefore aggressive inlining. (It is possible to inline dynamic dispatches withmechanisms such as speculative inlining, which inlines one version of the code in questionand generates a check to see if that version is the correct one. However, these mechanismsare complex and have nonzero overhead.) To show the magnitude of the problem, we re-moved leaf analysis from the Prolac compiler. Even when allowing the compiler to inlineor directly call every rule which was never overridden, the number of dynamic dispatchesjumps to 65, many for trivial rules like send−window (which returns the current send win-dow). Considering that every Prolac rule is potentially dynamically dispatched, however, thesituation is even worse: a naıve compiler (equivalent to an average C++ or Java compiler)would generate 972 dynamic dispatches in our Prolac TCP specification.

The idea behind leaf analysis is simple: if the compiler can prove that the rule beingcalled was never overridden—it is a leaf in the inheritance graph—then that rule can becalled directly, without the need for dynamic dispatch. Figure 3 gives a simple example of

9

leaf analysis at work. The Prolac compiler removes some additional dynamic dispatches witha slightly more complicated analysis.

Leaf analysis is not new. However, our implementation was motivated by, and works sowell because of, characteristics of network protocols. Specifically, the type-related behaviorof a single network protocol is relatively static at runtime: that is, the protocol deals withone kind of packet, one kind of control block, one kind of header, and so on. Another wayto put this in the case of TCP is that there is only one kind of TCP running at any time.We therefore do not use inheritance to demultiplex packets or varieties of TCP; instead,we use inheritance purely for grouping related rules and providing extensions to the baseprotocol. Since the behavior we want is the most extended behavior—if we include delayedacknowledgements, for example, we want to use them on every connection—the module wewant will always be the most derived module (the TCB we want is the most derived TCB,and so forth). But any rule in a most derived module will be a leaf rule, since no moduleexists to override it! It is therefore no surprise that leaf analysis works so well.

Of course, it would be perfectly possible to use inheritance to demultiplex packetsor kinds of processing—to derive TCP and UDP modules from a supertype representinginternet transport protocols, for example. In this case, leaf analysis would appropriately fail,and the necessary dynamic dispatches would be generated. It would continue to be effectivewithin the module hierarchies for the individual protocols, however.

3.4.2 Inlining and outlining

Mosberger et al. [MPBO96] list a number of useful techniques for improving protocolefficiency. Prolac has direct support for three of these: inlining, path inlining, and outlining.Inlining is replacing a function call with the function’s body; path inlining is simply recursiveinlining, where functions called by an inlined body are replaced with their bodies, and so on;and outlining is moving an uncommon case out of the code for common-case processing,thus improving i-cache behavior.

The programmer is given fine-grained control over these optimizations through expres-sion and module operators. Module operator specification is particularly useful, allowingthe programmer to say a rule should usually be inlined without cluttering the call site or therule’s definition—instead, most inline specifications are gathered together after the mod-ule’s body. Prolac does not optimize a protocol definition based on runtime feedback; webelieve that the source should completely specify the protocol’s runtime behavior. Prolac’ssource annotations for optimization are clean and simple specifically to ease programmerspecification.

10

4 TCP specificationThis section describes the structure of our TCP specification, highlighting its readability andextensibility. After an overview, we describe the implementations of the TCB, or transmissioncontrol block; input and output; and various protocol extensions.

4.1 Overview and status

The Prolac TCP implementation is directly based on that of 4.4BSD as described in [WS95].Where our specification is complete, it performs exactly as 4.4BSD does, except for a handfulof corrections. Our initial goal is simply to equal 4.4BSD’s performance, but with a muchmore readable and extensible specification.

Our TCP implementation is not complete at the time of this submission. We implementfull input and output processing including retransmissions; slow start, fast retransmit, andcongestion avoidance; and header prediction. We do not implement keepalive or persisttimers, urgent processing, or TCP options (maximum segment size, timestamps). Thesefeatures will not be difficult to implement cleanly in our framework. In terms of systemintegration, our TCP currently runs at user level using Berkeley packet filters [MJ93]. Theinternals of the Prolac TCP were carefully designed for compatibility with 4.4BSD’s in-kernel structures, however. For example, the Segment module, which represents a receivedor sent data segment, is easily implemented as a chain of mbufs (4.4BSD’s representationof a segment); and the TCP specification uses a Socket module corresponding exactly to the4.4BSD kernel’s struct socket, including many stub routines which currently do nothing(e.g., Socket.mark−readable should wake up all processes waiting on a socket’s read buffer).Reimplementing these interfaces using the actual kernel structures will not make the currentuser-level versions obsolete, as the interfaces will not change. Thus, we will eventually havea TCP implementation that can run inside or outside the kernel depending on a simplechoice of modules—a real advantage for our object-oriented framework.

For the final submission, we will report on a complete in-kernel TCP implementation.

4.2 Organization

The current Prolac TCP implementation consists of 18 Prolac source files containing a totalof 1800 nonempty lines of code. These files are combined by the C preprocessor’s #includemechanism, with the resulting preprocessed source being passed to the Prolac compiler. Afile of support C code is also necessary to provide a means for actually sending and receivingpackets; in our current user-level TCP, this is either Berkeley packet filters, allowing us tosend and receive packets over an actual Ethernet, or Unix pipes for testing.

Modules in the base TCP implementation fall into six categories: utilities, for byte-swapping and checksumming routines; data, for data-centric protocol modules—IP and

11

Category Module Description

Utilities Byte−Order Byte-swappingChecksum Checksumming

Data Headers.IP IP headerHeaders.TCP TCP headerSegment SegmentTCB Transmission control block

Base.TCB Basics and connection stateWindow−M.TCB Send and receive windowsTimeout−M.TCB TimeoutsRTT−M.TCB Round-trip time measurementRetransmit−M.TCB RetransmissionOutput−M.TCB State for BSD-like output

Input Base.Input General input processingBase.Listen Handle input in listen stateBase.Syn−Sent Handle input in syn-sent stateBase.Trim−To−Window Trim segment to fit receive windowBase.Reset Process rstBase.Ack Process ackBase.Reassembly ReassemblyBase.Fin Process fin

Output Base.Output Output processing

Timeouts Base.Timeout Timeouts

Interfaces Tcp−Interface User-level interface (read, write, etc.)Base.Socket Interface to socket layer

Table 2: Module structure of the Prolac TCP specification I: The base protocol

TCP headers, TCP segments, and the TCB; input, for processing received segments; out-put, for sending segments; timeouts, for slow and fast timeout events; and interfaces, forcommunicating with the rest of the system. Table 2 lists the modules constituting the baseprotocol, many of which are described in detail in the following sections.

12

4.3 The TCB

In the RFC definition of TCP [Pos81], in the 4.4BSD TCP implementation, and in ourProlac TCP implementation, the transmission control block, or TCB, stores all persistentTCP-specific data about a connection. This means it’s pretty large: the 4.4BSD TCB struc-ture has 48 fields; our Prolac TCB structure currently has 39. This is too large to be readablydefinable in a single module. Therefore, as mentioned above, we build the TCB by suc-cessive inheritance from six components: basics and connection state; windows; timeouts;round-trip time measurement; retransmission; and output. Each of these components ismade self-contained through hide, the module operator for access control: fields and rulesnecessary only within one component are hidden from the outside world. This interface-based organization also allows choice between components; for example, a programmer caneasily change the algorithm for measuring round-trip time by replacing that component, aslong as the replacement fits the existing interface.

The TCB is mostly passive, meaning that it does not usually act upon other modules—other modules act upon it. (Another formulation would say that the TCB is reactive ratherthan proactive.) This corresponds in flavor to 4.4BSD’s implementation, which isn’t object-oriented—there, the TCB is simply a flat structure. Even in Prolac, however, passive organi-zation seems to be correct for the TCB: TCP processing code is so complex that separatingcode from data generally improves readability.

Even our passive TCB still greatly benefits from object-oriented design. One simplebenefit comes from calling small, descriptive rules instead of performing calculations withthe fields themselves. Here are some examples concerning acknowledgements, taken fromthe base TCB component. (snd una and snd max are fields, representing the first un-acknowledged sequence number and the maximum sequence number sent, respectively.These sequence numbers have type seqint, which represents 32-bit sequence numbers withwraparound; comparisons and min and max operations on seqints behave correctly.)

valid−ack(ackno :> seqint) ::= ackno >= snd una && ackno <= snd max;unseen−ack(ackno :> seqint) ::= ackno > snd una && ackno <= snd max;

Here, valid−ack and unseen−ack both return true iff they are given a good acknowledgementnumber. The difference between them is simple: valid−ack allows duplicate acknowledge-ments, unseen−ack does not. The rule names make this much clearer than the expressions,which differ only subtly (one test is >= in valid−ack and > in unseen−ack). Calling these rulesinstead of referring to snd una and snd max makes the code easier to read, since expressionsdon’t need to be parsed; it also helps prevent errors, since the programmer must activelychoose between rules. The choice between valid−ack and unseen−ack puts the issues moreclearly at stake than the choice between > and >=: this clarity makes the programmer morelikely to choose carefully and correctly.

An important organizing principle in the TCB is the mark rule, a rule that marks the oc-

13

currence of a protocol event. (Mark functions are sometimes called hooks in other contexts.)Mark rules, whose names generally begin with mark−, are explicitly meant to be extended:they are convenient override points as defined in Section 3.2. Other modules are expectedto call the appropriate mark rule when a protocol event occurs; the TCB then performsappropriate processing. Here are a few of the TCB’s mark rules, including the event thattriggers each mark rule and typical actions the mark rule performs.

• mark−receive−syn(seqno :> seqint)

Called when a syn is received on a connection. seqno is the syn’s sequence number.Effect: Sets various TCB fields (e.g., irs, the initial received sequence number, andrcv next, the next receive sequence number expected).

• mark−new−ack(ackno :> seqint)

Called when a new acknowledgement is received. Effect: Removes newly acknowl-edged data from the retransmission queue; moves snd una forward; adjusts sendwindow; updates current round-trip time estimate if appropriate.

• mark−total−ack

Called when all outstanding data has just been acknowledged. Effect: Cancels re-transmission timer.

• mark−send(seqlen :> uint)

Called when a segment is sent. seqlen is the length of the segment in sequence numbers.Effect: Moves snd next and snd max forward; clears pending-acknowledgement anddelayed-acknowledgement flags; adjusts send window; optionally starts round-triptime measurement and retransmission timer.

Most mark rules are multiply overridden with each overriding definition adding to themark rule’s behavior. Figure 4 gives an explicit example of this by showing actual proto-col code for mark−send’s five definitions—four definitions in base modules and one in thedelayed-acknowledgement extension. Each individual definition is small, focused, and clear,although the aggregate behavior is reasonably complex. The individual TCB modules arereadable as well: each module contains limited, focused processing, with complicated be-havior only created through the modules’ aggregation. This style does obscure the aggregatebehavior—the code in Figure 4 was taken from five source files—but when suitably naturaloverride points are chosen, this doesn’t tend to be a problem.

14

Base.TCB.mark−send(seqlen :> uint) ::=// This is the base mark rule. It adjusts some fields and clears some flagsclear−flag(F.pending−ack | F.pending−output),snd next += seqlen,snd max max= snd next;

Window−M.TCB.mark−send(seqlen :> uint) ::=// The window TCB additionally adjusts the send window and clears another flaginline super.mark−send(seqlen), // calls Base.TCB.mark−sendclear−flag(F.need−window−update),snd wnd −= seqlen;

RTT−M.TCB.mark−send(seqlen :> uint) ::=// Decide whether to measure this segment’s round-trip time. After inline super.mark−send, the// sent segment’s sequence number is snd next − seqlen, not snd nextinline super.mark−send(seqlen),(seqlen && !retransmitting && !timing−rtt ==> start−rtt−timer(snd next − seqlen));

Retransmit−M.TCB.mark−send(seqlen :> uint) ::=// Start the retransmit timer if necessaryinline super.mark−send(seqlen),(!is−retransmit−set && !recently−acked ==> start−retransmit−timer);

Delay−Ack.TCB.mark−send(seqlen :> uint) ::=// Clear the delayed acknowledgement flaginline super.mark−send(seqlen),clear−flag(F.delay−ack);

Figure 4: The five mark-send rules defined by the Prolac TCP specification, from the initialdefinition (in Base.TCB) to the most derived version (in Delay-Ack.TCB). Each rule except thefirst explicitly calls its predecessor with super.mark-send(seqlen), resulting in a cumulativeeffect.

15

do−segment ::=(closed ==> reset−drop−fail)|| (listen ==> Listen(seg, tcb).do)|| (syn−sent ==> Syn−Sent(seg, tcb).do)|| other−states;

other−states ::=Trim−To−Window(seg, tcb).do,(rst ==> Reset(seg, tcb).do),(syn ==> reset−drop−fail),(!ack ==> drop−fail),Ack(seg, tcb).do,process−data;

process−data ::=(urg ==> check−urg),let is−fin = Reassembly(seg, tcb).do in(is−fin ==> Fin(seg, tcb).do)

end,send−data−or−ack;

If the state is CLOSED . . .If the state is LISTEN . . .If the state is SYN-SENT . . .

Otherwise,first check sequence number . . .second check the rst bit, . . .fourth, check the syn bit, . . .fifth check the ack field, . . .

sixth, check the urg bit, . . .seventh, process the segment text, . . .eighth, check the fin bit, . . .

and return. [Pos81]

Figure 5: The Prolac implementation, at left, directly echoes the TCP RFC specification, atright. (The missing third step is “check security and precedence”.)

4.4 Input and output

The Prolac TCP specification divides input processing into eight independent modulesbased on processing steps specified in the original TCP RFC [Pos81]; the 4.4BSD sourcesalso largely follow this organization, although the detailed C code tends to obscure this fact.In contrast, Prolac keeps the high-level structure crystal clear: Figure 5 demonstrates thisby comparing an excerpt from our input processing code with headings from the TCP RFC.This top-down organization has no associated cost in Prolac, since the various rules can allbe inlined.

The eight modules that make up input processing do not inherit from one other. Inheri-tance makes sense for the TCB, where one conceptual module is being built (the commonaspect being persistent information about a connection), but not for input processing, wheredifferent modules have very different functions. Instead, all eight input processing modulesare derived from one ancestor, Base.Input. This module provides common functions likedrop−fail, which drops the received segment and returns from input processing.

The received segment, seg, and relevant TCB, tcb, are stored in fields of the commonInput module. This allows the segment and TCB to be passed implicitly from rule to rule

16

within each module; it also enables implicit rule search as described in Section 3.3, a greatreadability advantage. Unfortunately, there is a performance penalty: the segment and TCBare structure members, and therefore not stored in registers by some compilers; and creatingInput objects, or objects derived from Input like Ack and Reassembly, has a small but nonzerooverhead. Section 6 describes a potential solution for these problems.

Output processing, which is smaller and less complex than input processing, is imple-mented in a single module. Output processing follows the 4.4BSD model: a single routine,Output.do, is called whenever any normal kind of output is needed; the Output module thendecides exactly what kind of segment to send. Several small changes were made, includ-ing consistently using sequence number length rather than data length (sequence numberlength includes any syn and fin flags above and beyond data length). These changes, whichwere intended only to make the code more consistent and therefore readable, ended updiscovering a bug in the 4.4BSD code as reported in [WS95]: if a segment just fits in a max-imum segment size, but doesn’t quite fit when options are included, the code could leave afin on the segment when it should be removed. While this is a small bug which had alreadybeen fixed in our OpenBSD kernel, our independent discovery eloquently demonstrates theadvantages of code readability.

4.5 Extensions and hookup

TCP has been extended over time, with some of these extensions—for example, slow start,congestion avoidance, fast retransmit, and fast recovery—becoming standard [Ste97]. Pro-lac’s object orientation makes it easy to extend a protocol without cluttering the base pro-tocol’s definition, an advantage we used in our Prolac TCP specification. We have currentlyimplemented four extensions to our base TCP: delayed acknowledgements; slow start andcongestion avoidance; fast retransmit and fast recovery; and header prediction. These ex-tensions are transparent and independent: any subset of the four1 can be turned on withoutchanging the rest of the system in any way.

This transparency is achieved through a C preprocessor mechanism called hookup. Eachextension consists of several modules inheriting from the base protocol; these moduleschange the base’s behavior, thus implementing the extension. All modules relating to aparticular extension are placed in a single source file. The extension will be used only ifthat source file is #included into the preprocessed source. Table 3 lists the modules whichconstitute each extension and the corresponding source files. Hookup works through macroredefinition; for example, at each point, a macro CUR TCB is defined which expands to the fullname of the “current TCB” (Retransmit−M.TCB, for example). Each TCB extension moduleinherits from CUR TCB, the macro, rather than any specific TCB; the module then redefinesCUR TCB to its own name. Therefore, including different source files, or including them in

1. Almost any subset: fast retransmit requires slow start, but not vice versa.

17

Module Description Source file

Delay−Ack.* Delayed acknowledgements delayack.pc

Delay−Ack.TCBDelay−Ack.ReassemblyDelay−Ack.Timeout

Slow−Start.* Slow start and congestion avoidance slowst.pc

Slow−Start.TCBSlow−Start.Ack

Fast−Retransmit.* Fast retransmit fastret.pc

Fast−Retransmit.TCBFast−Retransmit.Ack

Header−Prediction.* Header prediction predict.pc

Header−Prediction.Input

Table 3: Module structure of the Prolac TCP specification II: Protocol extensions

a different order, automatically changes the inheritance hierarchy. The final hookup stepdefines global modules (TCB instead of Namespace.TCB, for example) to equal the final macrovalues; other modules generally refer to these global modules. Since the global modules arethe most derived modules, the hookup process therefore enables leaf analysis as describedabove.

This arrangement makes extending the TCP protocol simple, natural, and convenient.None of our extensions takes more than 60 lines of Prolac proper. Each extension is con-centrated and readable, since extension-related code is contained in one file rather thanspread throughout thousands of lines of other protocol processing, yet the extension codeis automatically integrated into the protocol through leaf analysis and inlining. Thus, Prolacmakes a very good platform for developing protocol extensions.

5 PerformanceThis section describes some preliminary tests run on our current TCP specification. Themachines involved have 200MHz Pentium Pro processors running OpenBSD 2.1 with noother load; they communicate over a single 10 MB/s Ethernet segment. The test is a simpleclient-server ping-pong of a variable amount of data. Client and server are implementedusing Prolac user-level TCP, which uses the Berkeley packet filter system for direct accessto the Ethernet. For comparison, similar Unix clients and servers were implemented usingthe normal, in-kernel 4.4BSD TCP stack.

We instrumented both BSD input processing and Prolac input processing (the tcp input()

18

function and its Prolac equivalent) using Pentium performance counters; we count process-ing time only, including checksums and time spent preparing a segment for output, but nottime spent after an output segment leaves the TCP protocol stack. (In BSD, this meanswe include all of tcp output() except the call to ip output(); in Prolac, we measure thesame functionality.) Several factors still make this comparison less than perfect: for example,differences in buffering schemes and checksum routines, and cache pollution and other sideeffects of context switching due to our user-level implementation. For the final submission,we will present more detailed comparisons between unmodified in-kernel 4.4BSD TCP anda complete in-kernel Prolac TCP.

Comparing tcpdump output generated by BSD-to-BSD and Prolac-to-BSD commu-nication shows that, for 4-data-byte segments without loss or reordering, Prolac and BSDgenerate exactly equivalent packets (modulo sequence numbers, window sizes, and so forth).This is as expected, given that the Prolac is be a reimplementation of BSD.

The preliminary numbers for BSD and Prolac input processing show Prolac running twoto three times slower than BSD. Both sets of measurements show two clusters, correspondingto segments which pass header prediction (the faster cluster) and other segments. With4-data-byte segments—the fairest comparison, as copying and checksumming have thesmallest effect—BSD takes about 1100 cycles per segment processed by header prediction;the slower cluster takes about 3100 cycles per segment. For Prolac, the correspondingnumbers are 3100 and 5200 cycles, respectively. In BSD, these clusters are very sharp;in Prolac, however, they show a great deal of spread, probably due to the more variableoverhead at user level.

There are many reasons why Prolac’s TCP input processing might be slower than BSD’s.Besides the general overhead of user level, our current implementation suffers from severalslow abstractions meant only for testing—our memory buffer abstraction, for example, willperform more copies than the kernel’s mbufs, and our checksumming routines are quiteslow. (BSD is at a disadvantage, however, because of the Prolac implementation’s incom-pleteness.) We have not carefully tuned our specification, even to the point of making surethat inlining is being sensibly performed (demonstrating a disadvantage of user-specifiedoptimization); we expect that further tuning will make a significant difference—for example,in a previous version of our specification, a couple inline statements resulted in a twofoldperformance improvement. We rarely reperform some very small computations—an occa-sional recomputation allows a much cleaner specification—but, if necessary, these couldbe removed. Finally, some inefficiency is inherently due to Prolac’s approach, the mostimportant example being the problem of commonly-used values stored in memory ratherthan registers described above. We expect that most of these issues will be easily resolved,and have plans for addressing the last. The final submission will report improved, updatedresults.

19

6 Future workThe must pressing future work is to complete our TCP specification and implement it insidea BSD kernel. We also plan to use excerpts from the TCP specification when discussingthe TCP protocol in an undergraduate course; further polishing will be needed to make ituseful in a teaching context. We would also like to implement further TCP extensions—for instance, those from TCP Vegas [BOP94] and TCP for Transactions—and eventuallyfreely distribute a canonical TCP, including all important extensions. Such a specificationshould also facilitate the development of specialized TCPs, such as a TCP optimized forWeb processing.

While the Prolac compiler is currently stable enough for research, further work is cer-tainly needed, particularly in error recovery. More optimizations are also needed, particularlyoptimizations which could place commonly-used fields in registers on occasion.

We have already planned one optimization which, with user intervention, could solve thefield overhead problem mentioned in section 4.4: a new unbox module operator. A modulelabeled unboxed would not have a contiguous memory representation; instead, its fieldswould be stored in separate variables, which could then be stored in registers and passedas arguments as needed. This would eliminate most field referrals in common protocolmodules like Input and hopefully produce a significant speedup. Field modifications andinteractions between boxed and unboxed versions of the same module present problems,but we are hopeful that these can be overcome.

7 ConclusionOur experience with Prolac has been very encouraging. Prolac’s tractability and featurestailored to network protocols made writing TCP an almost pleasant experience, and theresulting specification is significantly more readable than anything else we have seen. OurTCP’s demonstrated extensibility should be very useful for protocol research and develop-ment. Even at user level and without significant optimization effort, our specification runsjust two to three times slower than in-kernel BSD TCP. Just the few, simple global compileroptimizations we have implemented have had great effect—removing all 972 dynamic dis-patches from the generated code, for example—leaving us with high hopes that a readableTCP in Prolac can easily be made just as efficient as, if not more efficient than, our 4.4BSDmodel.

20

References[And88] David P. Anderson. Automated protocol implementation with RTAG. IEEE

Transactions on Software Engineering, 14(3):291–300, March 1988.

[AP93] Mark B. Abbott and Larry L. Peterson. A language-based approach to protocolimplementation. IEEE/ACM Transactions on Networking, 1(1):4–19, February1993.

[BB89] Tommaso Bolognesi and Ed Brinksma. Introduction to the ISO specificationlanguage LOTOS. In Peter H. J. van Eijk, Chris A. Vissers, and Michel Diaz,editors, The formal description technique LOTOS, pages 23–73. North-Holland,1989.

[BdS91] Frederic Boussinot and Robert de Simone. The ESTEREL language. TechnicalReport 1487, INRIA Sophia-Antipolis, July 1991.

[BOP94] Lawrence S. Brakmo, Sean W. O’Malley, and Larry L. Peterson. TCP Vegas:new techniques for congestion detection and avoidance. In Proceedings of theACM SIGCOMM 1994 Conference, pages 24–35, August 1994.

[CDO96] Claude Castelluccia, Walid Dabbous, and Sean O’Malley. Generating efficientprotocol code from an abstract specification. In Proceedings of the ACM SIG-COMM 1996 Conference, pages 60–71, August 1996.

[DB89] P. Dembinski and S. Budkowski. Specification language Estelle. In MichelDiaz, Jean-Pierre Ansart, Jean-Pierre Courtiat, Pierre Azema, and Vijaya Chari,editors, The formal description technique Estelle, pages 35–75. North-Holland,1989.

[HA89] Diane Hernek and David P. Anderson. Efficient automated protocol imple-mentation using RTAG. Report UCB/CSD 89/526, University of California atBerkeley, August 1989.

[Joh75] Stephen C. Johnson. Yacc—Yet Another Compiler-Compiler. Comp. Sci. Tech.Rep. #32, Bell Laboratories, July 1975. Reprinted as PS1:15 in Unix Program-mer’s Manual, Usenix Association, 1986.

[Mil80] Robin Milner. A calculus of communicating systems, volume 92 of Lecture notesin computer science. Springer-Verlag, 1980.

[MJ93] Steven McCanne and Van Jacobson. The BSD packet filter: a new architecturefor user-level packet capture. In USENIX Technical Conference Proceedings,pages 259–269, San Diego, Winter 1993. USENIX.

21

[MPBO96] David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O’Malley.Analysis of techniques to improve protocol processing latency. In Proceedingsof the ACM SIGCOMM 1996 Conference, pages 73–84, August 1996.

[Nes90] Linda Ness. L.0: a parallel executable temporal logic language. In MarkMoriconi, editor, Proceedings of the ACM SIGSOFT International Workshopon Formal Methods in Software Development, pages 80–89, September 1990.

[Pos81] Jon Postel. Transmission Control Protocol: DARPA Internet Program protocolspecification. RFC 793, IETF, September 1981.

[SCB90] Deepinder Sidhu, Anthony Chung, and Thomas P. Blumer. A formal descriptiontechnique for protocol engineering. Technical Report CS-TR-2505, Universityof Maryland at College Park, July 1990.

[Ste97] W. Richard Stevens. TCP slow start, congestion avoidance, fast retransmit, andfast recovery algorithms. RFC 2001, IETF, January 1997.

[vB86] Gregor v. Bochmann. Methods and tools for the design and validation ofprotocol specifications and implementations. Publication #596, Universite deMontreal, October 1986.

[WS95] Gary R. Wright and W. Richard Stevens. TCP/IP Illustrated, Volume 2: TheImplementation. Addison-Wesley, 1995.

[WvB89] Cheng Wu and Gregor v. Bochmann. An execution model for LOTOS specifi-cations. Publication #701, Universite de Montreal, October 1989.

22

a readable tcp in the prolac protocol language

Documents