
  • 8/11/2019 Ziria


    Ziria: Wireless Programming for Hardware Dummies

    Gordon Stewart

    Princeton

    Mahanth Gowda

    UIUC

    Geoffrey Mainland

    Drexel

Božidar Radunović

    MSR

    Dimitrios Vytiniotis

    MSR

    Abstract

Software-defined radio (SDR) brings the flexibility of software to the domain of wireless protocol design, promising both an ideal platform for research and innovation and the rapid deployment of new protocols on existing hardware. However, existing SDR platforms either require careful hand-tuning of low-level code, negating many of the advantages of software, or are too slow to be useful in the real world.

In this paper we present Ziria, the first software-defined radio platform that is both easily programmable and performant. Ziria introduces a novel programming model tailored to wireless physical-layer tasks and captures the inherent and important distinction between data and control paths in this domain. We describe an optimizing compiler for Ziria, provide a detailed evaluation, and give a line-rate Ziria implementation of 802.11a/g.

    1. Introduction

The past few years have witnessed tremendous innovation in the design and implementation of wireless protocols, both in industry and academia (cf. [7, 21, 28]). Much of this innovation has occurred at the physical (PHY) layer of the protocol stack, which manages the translation between radio hardware signals and protocol packets. These innovations have taken the form of numerous new signal processing algorithms and novel coding schemes, both of which have greatly increased the efficiency of existing radio communication channels.

Many of these innovations were first implemented using software-defined radio (SDR) platforms. Software-defined radio refers to wireless protocol design in which a protocol's complex signal processing is performed in software rather than in hardware. There are clear advantages to implementing novel protocol designs in software, such as ease of development, fast and cheap deployment, and a much shorter development cycle compared to a hardware implementation. For example, GnuRadio [2, 10], currently one of the most widely used SDR platforms, is implemented in a combination of C++ and Python, meaning it is very easy to program and extend. Unfortunately, this extensibility and ease of use come at a cost: GnuRadio suffers from serious performance limitations compared to hardware implementations (for example, [25] reports bus transfer delays of hundreds of μs). Signal processing algorithms often require substantial processing power, and modern PHYs impose tight time constraints. For example, WiFi requires a receiver to process each signal sample in 25 ns. Meeting these demands is challenging in general, and impossible with GnuRadio. These performance requirements make GnuRadio and similar platforms of limited utility for testing real network deployments where line-speed operation is critical.

This is not to say that no SDR architecture provides acceptable performance: WARP [23], Sora [29], and TI KeyStone [1] are high-performance hardware-software SDR platforms. These platforms can meet tight timing deadlines, and thus provide real-time support for protocol designers wishing to test at line rate. However, they suffer from another problem, which has seriously limited their adoption: these platforms are difficult to program. FPGA-based platforms, such as WARP, require substantial digital design expertise. CPU- and DSP-based platforms, such as Sora and TI KeyStone, require the ability to write code that is highly tuned to the underlying processor's architecture. For example, Sora relies heavily on externally created lookup tables for performance. Furthermore, different parts of the receiver are manually placed onto different cores in order to balance CPU load. Such contortions are heavily hardware dependent and must be performed manually for each new architecture.

In this paper, we present a new language, Ziria, and a corresponding programming model that closes the gap between performance and flexibility. Ziria consists of two components: (1) a high-level domain-specific language for programming wireless protocols, and (2) an execution model and compiler that transform Ziria programs to line-rate software radio implementations. In this way, Ziria combines the best features of extensible, highly programmable, but low-performance platforms like GnuRadio, with those of high-performance but difficult-to-program platforms such as Sora. To demonstrate that our system supports designing real signal processing code, we present a Ziria implementation of WiFi 802.11a/g. The optimizations implemented in our compiler, which include automatic vectorization, automatic lookup table (LUT) generation, and annotation-guided pipeline parallelization, allow our Ziria implementation to generously meet the timing limits imposed by the 802.11a/g standards; its performance even approaches that of hand-optimized code. To the best of our knowledge, we are the first to present a high-level programming platform that can implement the 802.11a/g protocol on a general-purpose CPU while meeting timing constraints. In summary, our contributions are as follows:

• We present Ziria, the first software radio platform that provides a flexible, high-level programming model while meeting the timing constraints required for line-rate deployments.

• Our compiler for Ziria implements automatic vectorization, automatic lookup table generation, and annotation-guided pipelining, optimizations that are vital to performance and provide up to an order of magnitude speed-up over unoptimized code.

• To demonstrate the viability of Ziria as a platform for developing SDR applications, we implement 802.11a/g. Our implementation meets the timing constraints imposed by the protocol specification, and its performance approaches that of hand-optimized code.

    1 2014/6/16


In Ziria, all signal processing code aside from a few heavily used standard functions, e.g., Fast Fourier Transform (FFT) and Viterbi decoding, is written in the high-level language without reference to the underlying hardware architecture. This leads to a programming model in which Ziria programs closely resemble the original protocol specifications. Nonetheless, the programmer does not have to sacrifice line-rate speed to gain this clarity, as our benchmarks show. Indeed, it is the very high-level nature of Ziria that gives our optimizer the opportunity to transform code into an efficient, low-level implementation.

2. Ziria by example

In this section we give a high-level overview of the main components of the Ziria language and its execution model. We then examine fragments of our 802.11 implementation: first, the imperative innards of a Ziria scrambler block, which is used in our WiFi transmitter to XOR input packet data with a pseudorandom sequence in order to shape the transmitted signal, and then the outer pipeline of our WiFi 802.11a/g receiver. We introduce necessary signal processing concepts along the way.

    2.1 Language and execution model

Execution in Ziria is centered around stream processing: reading values from a (potentially infinite) input stream and writing values to an output stream. For example, at a high level, a WiFi receiver is just a computation that reads a stream of complex numbers from the A/D unit of a radio and outputs a stream of MAC-layer packets.

Language  Ziria provides both traditional stream processors, which map values from input to output streams and never terminate (we call these stream transformers), and stream computers, a novel stream processing language construct. Figure 1 depicts both stream transformers and stream computers. On the left, two transformers are composed vertically into a two-stage pipeline. On the right, a stream computer is composed with a stream transformer.

Like stream transformers, stream computers map stream inputs to stream outputs. However, unlike stream transformers, computers need not execute indefinitely. Instead, they execute for a while, consuming input and producing output, and then halt and return a control value. In Figure 1, this is depicted as the fat Control arrow connecting the computer to the transformer. If a stream computer is at the outermost position in a Ziria program, then its return value is just the program's return value. Otherwise, the control value is used to dynamically reconfigure the rest of the processing pipeline; in the figure, this corresponds to switching from the Computer to the Transformer in the lower right corner. After reconfiguration, the inputs that originally flowed to the computer are routed to the next component in the pipeline, in this case the Transformer. Both components output to the same stream. This dynamic reconfiguration directly reflects the control flow of many PHY-layer protocols, which read a preamble or packet header from the input stream and then reconfigure computation of the rest of the

stream based on data in the preamble. As an example, consider two Ziria stream computer primitives, emit and take. emit takes an expression e and immediately writes its value to the output stream while returning the unit control value, (). take is a computer that pulls a single value from the input stream and immediately halts computation, returning the input as a control value. These basic building blocks are composed in Ziria to perform arbitrary stream processing using Ziria's bind operator, which has the notation x <- c1; c2. Bind runs the stream computation c1 until it produces a return value v, and it then runs the dynamically configured stream processor c2[v/x] (c2 with v substituted for x). For example, the following is a small Ziria program that maps a

    Figure 1. Transformer-transformer composition (left); computer-transformer composition (right).

function f over an input stream of xs, emitting a stream of results, f(x), to the output stream.

repeat (x <- take; emit(f(x)))

The semicolon between take and emit is bind. It produces a program that behaves like take for one step, until take returns a control value x, then behaves like emit for one step, until emit yields the value f(x) onto the output stream. In our small example, this process is repeated indefinitely using Ziria's repeat combinator, which converts a stream computer (x <- take; emit(f(x))) into a stream transformer by reinitializing the computer every time it returns a control value. Note that x <- take binds the control result of take, namely x, in emit(f(x)). Bind is similar to the switch operator in Yampa [13]; however, Yampa conflates control and data paths, whereas Ziria carefully maintains a distinction between the two.
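The take/emit/bind/repeat behavior described above can be approximated with Python generators. The following is our own illustrative sketch of the programming model, not the Ziria runtime; the driver `run`, `map_stream`, and `header_then_payload` are names we invent here:

```python
def run(comp, inputs):
    """Drive a Ziria-style computation, modeled as a generator that
    yields ("take", None) to request one input value and ("emit", v)
    to produce an output value. Returns the list of emitted values."""
    out = []
    it = iter(inputs)
    try:
        op, arg = comp.send(None)                 # prime the generator
        while True:
            if op == "take":
                op, arg = comp.send(next(it))     # feed one input value
            else:                                 # op == "emit"
                out.append(arg)
                op, arg = comp.send(None)
    except StopIteration:                         # input exhausted or computation done
        pass
    return out

def map_stream(f):
    # repeat (x <- take; emit(f(x))): a stream transformer
    while True:
        x = yield ("take", None)
        yield ("emit", f(x))

def header_then_payload(decoders):
    # bind: hdr <- take; the rest of the pipeline is then reconfigured
    # based on the control value hdr, as with a PHY-layer packet header.
    hdr = yield ("take", None)
    dec = decoders[hdr]
    while True:
        x = yield ("take", None)
        yield ("emit", dec(x))

print(run(map_stream(lambda x: x * x), [1, 2, 3]))   # [1, 4, 9]
print(run(header_then_payload({0: lambda x: -x,
                               1: lambda x: 2 * x}), [1, 5, 6]))  # [10, 12]
```

The second example mirrors the dynamic reconfiguration described above: the first input is consumed as a control value, and only then is the data path for the remaining inputs chosen.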

Dataflow composition in Ziria is achieved by sequencing computations with the arrow combinator, >>>. This form of composition is similar to standard stream transformer composition operators in dataflow languages such as StreamIt [31]. It takes a Ziria computation that maps stream inputs to stream outputs and routes its output to the input stream of a second Ziria computation. If either computation returns a control value, then the entire compound computation returns that control value. As illustration, consider the following Ziria program, which computes the square root of -1 and then immediately returns.

emit(-1) >>> repeat (x <- take; emit(sqrt(x)))

This program yields the value -1 onto an intermediate stream (emit(-1)), pulls the -1 off the intermediate stream (x <- take), and then is dynamically reconfigured to emit(sqrt(-1)), which prints sqrt(-1) to the output stream and returns unit.

Execution model  At runtime, each Ziria computation is implemented as a pair of functions, called tick and process.¹ We consider process first. A computation's process function takes a single argument, a value from the Ziria program's input stream, performs the required computation on the input value, and returns a result r of an enumerated type. Results r ::= skip | yield(v) | done(v) are either yield(v), indicating that the value v is ready to be written to the program's output stream, skip, indicating that the program should be called again but that no output is immediately available, or done(v), indicating successful program termination with a returned control value v. Note that only stream computers may produce a result done(v); transformers must either skip or yield.

Tick is used to drive computation of blocks that do not require values from their input stream in order to do useful work. For example, the Ziria program emit 1; emit 2 writes the sequence

¹ Because our compiler generates code in continuation-passing style, tick and process are actually code labels, and calls to these functions are actually jumps. These details are not important here; see Section 4 for a more in-depth discussion.



1, 2 onto its output stream but does not require any input to do so. The tick function returns an enumerated type called a result kind. Result kinds k ::= imm(r) | cons are either imm(r), for immediate, indicating that the current computation is ready either to yield an output value or to return a done value, or cons, which indicates that the computation needs to consume input to proceed. The outer loop of every Ziria program first ticks the computation, processing immediate results along the way, until either cons or imm(done(v)) is returned. On cons, the outer loop proceeds to call the program's process function with a value from the input stream, handles the result, then returns to calling tick once again.

In compound blocks such as x <- c1; c2 and c1 >>> c2, the computation c1 to the left of the bind or arrow combinator need not return an immediate value directly to the driver loop. In c1 >>> c2, c1 yields stream output values by immediately calling the process function of c2. We use a similar strategy when compiling x <- c1; c2: control values produced by c1 are written to a statically allocated private buffer shared by c1 and c2, and references to the variable x in c2 are translated to reads from this shared buffer.
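The tick/process contract can be made concrete with a small Python sketch in which blocks are objects with tick() and process() methods and a driver ticks until cons, then feeds one input. All class and function names below (Map, EmitSeq, Pipe, drive) are ours, invented for illustration; the real compiler emits C in continuation-passing style:

```python
SKIP, YIELD, DONE = "skip", "yield", "done"   # results r
IMM, CONS = "imm", "cons"                     # result kinds k

class Map:
    """A stream transformer: always consumes, then yields f(v)."""
    def __init__(self, f): self.f = f
    def tick(self): return (CONS, None)
    def process(self, v): return (YIELD, self.f(v))

class EmitSeq:
    """A stream computer: emits a fixed sequence, then returns done."""
    def __init__(self, vals): self.vals = list(vals)
    def tick(self):
        if self.vals:
            return (IMM, (YIELD, self.vals.pop(0)))
        return (IMM, (DONE, None))
    def process(self, v):
        raise RuntimeError("EmitSeq never consumes input")

class Pipe:
    """c1 >>> c2: tick c2 first, draining the pipeline right to left."""
    def __init__(self, c1, c2): self.c1, self.c2 = c1, c2
    def tick(self):
        k, r = self.c2.tick()
        if k == IMM:
            return (IMM, r)                   # c2 has a result ready
        k, r = self.c1.tick()                 # c2 needs input: tick c1
        if k == IMM:
            tag, v = r
            if tag == YIELD:                  # push c1's output into c2
                return (IMM, self.c2.process(v))
            return (IMM, r)                   # skip/done propagates
        return (CONS, None)                   # both blocks need input
    def process(self, v):
        tag, u = self.c1.process(v)
        if tag == YIELD:
            return self.c2.process(u)
        return (tag, u)

def drive(block, inputs):
    """Outer loop: tick until cons or done, then process one input."""
    out, it = [], iter(inputs)
    while True:
        k, r = block.tick()
        if k == IMM:
            tag, v = r
            if tag == YIELD:
                out.append(v)
            elif tag == DONE:
                return out
            continue                          # skip: just tick again
        try:
            v = next(it)                      # cons: consume one input
        except StopIteration:
            return out
        tag, u = block.process(v)
        if tag == YIELD:
            out.append(u)
        elif tag == DONE:
            return out

print(drive(Map(lambda x: x + 1), [1, 2, 3]))                    # [2, 3, 4]
print(drive(Pipe(EmitSeq([1, 2]), Map(lambda x: 10 * x)), []))   # [10, 20]
```

The second call shows why ticking proceeds from the downstream end: the EmitSeq computer produces output without any external input, and its done result terminates the whole compound computation.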

    2.2 WiFi 802.11a/g: TX scrambler and RX pipeline

In order to illustrate the versatility of Ziria, this section walks through two examples of real Ziria code. The first, a signal scrambler, is typical of the kind of imperative code that lives within the blocks of a Ziria pipeline. The second example is the full pipeline of our 802.11a/g receiver.

Scrambler  In signal processing domains, the purpose of a scrambler is to XOR input data with a pseudorandom sequence. This reduces the probability of sending data sequences that have undesirable signal properties, such as all 1s or all 0s, over the air (cf. Section 17.3.5.4 of [19]).

1  let scrambler() =
2    letref scrmbl_st: arr[7] bit := {1,1,1,1,1,1,1};
3           tmp: bit; y: bit;
4    in repeat (
5      x <- take;
6      return (
7        tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
8        scrmbl_st[0:5] := scrmbl_st[1:6];
9        scrmbl_st[6] := tmp;
10       y := x ^ tmp);
11     emit (y))

Listing 1. Scrambler function of WiFi 802.11a/g transmitter in Ziria

The scrambler above, a let-bound Ziria computation taking no arguments, is an example of a feedback shift register. The scrambler's body declares three local variables in lines 2 through 3: scrmbl_st, an array of 7 bits that gives the current state of the shift register, and two one-bit references: tmp and y. The scrambler takes a value from the input stream (line 5), then performs an imperative computation that assigns tmp the XOR of taps 3 and 0 in the shift register, shifts the register state left by one, feeds tmp into position 6 of the register, and finally returns the XOR of tmp and the input bit x.
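A direct Python transcription of Listing 1's shift-register logic (our own illustrative code, not part of Ziria) makes the dataflow easy to check. Scrambling an all-zero input with the all-ones seed exposes the raw keystream, which begins with the pattern 0000 1110 1111 0010 of the standard's 127-bit pseudorandom sequence:

```python
def scramble(bits, state=None):
    """Bit-for-bit transcription of the Ziria scrambler in Listing 1:
    tmp = st[3] ^ st[0]; shift left by one; st[6] = tmp; y = x ^ tmp."""
    st = list(state) if state is not None else [1] * 7   # all-ones seed
    out = []
    for x in bits:
        tmp = st[3] ^ st[0]
        st[0:6] = st[1:7]     # shift the register state left by one
        st[6] = tmp           # feed tmp back into position 6
        out.append(x ^ tmp)
    return out

print(scramble([0] * 16))  # [0,0,0,0,1,1,1,0,1,1,1,1,0,0,1,0]
```

Because the feedback path depends only on the register state, never on the input bit, scrambling with the same seed is its own inverse: applying `scramble` twice recovers the original bits.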

There are two interesting aspects to this code. First, because both the standard and the code in the listing operate on bit arrays, one can easily verify by inspection that the listing code matches the definitions found in the WiFi standard. This would be more difficult if we implemented the scrambler directly in a more efficient fashion, say by using an integer rather than a bitvector to store the scrambler state, and by shifting instead of indexing into the array. Second, we remark that, when compiled with the Ziria compiler, the scrambler in Listing 1 compiles to quite efficient code. In the context of our WiFi pipeline, the Ziria compiler first automatically generates a lookup table for the scrambler (cf. Section 4.3), then vectorizes the code to operate on multiple inputs at a time (cf. Section 4.2).

Figure 2. Schematic representation of a WiFi receiver. The shaded box is a transformer; the white boxes are computers.

This gives a 14.8× speedup (cf. Section 5) over the same code compiled without optimizations. As a comparison, in the current Sora implementation, the same scrambler is defined as a hand-written lookup table, an excerpt of which is shown in Listing 2.

const unsigned char
SCRAMBLE_11A_LUT[SCRAMBLE_11A_LUT_SIZE] = {
  0x00, 0x91, 0x22, 0xb3, 0x44, 0xd5, 0x66, 0xf7,
  0x19, 0x88, 0x3b, 0xaa, 0x5d, 0xcc, 0x7f, 0xee,
  // ... 13 lines elided ...
  0x87, 0x16, 0xa5, 0x34, 0xc3, 0x52, 0xe1, 0x70, };

Listing 2. Sora lookup table corresponding to the Ziria scrambler implementation in Listing 1

The lookup table implementation may be efficient, but it gives the reader no insight into the computation being performed. Nor is it easy to verify by inspection that this implementation meets the scrambler specification given in the WiFi standard.
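The reason a scrambler LUT can be generated mechanically is that the feedback path depends only on the 7-bit register state, never on the input bits: for each of the 128 states one can precompute the next 8 keystream bits and the successor state, and scrambling then becomes one table hit plus one XOR per byte. The following Python sketch is our own illustration of that idea; it does not reproduce the Ziria compiler's or Sora's actual table layout or bit ordering:

```python
def keystream_lut():
    """Map each 7-bit scrambler state to (next 8 keystream bits, next state).
    Bit i of the integer state encodes tap i; this encoding is our choice."""
    lut = {}
    for s in range(128):
        st = [(s >> i) & 1 for i in range(7)]
        ks = []
        for _ in range(8):
            tmp = st[3] ^ st[0]      # same feedback as Listing 1
            st[0:6] = st[1:7]
            st[6] = tmp
            ks.append(tmp)
        nxt = sum(b << i for i, b in enumerate(st))
        lut[s] = (ks, nxt)
    return lut

LUT = keystream_lut()

def scramble_bytes(bits, state=0b1111111):
    """Scramble a multiple-of-8 bit list one byte at a time via the LUT."""
    out = []
    for i in range(0, len(bits), 8):
        ks, state = LUT[state]
        out.extend(x ^ k for x, k in zip(bits[i:i + 8], ks))
    return out

# With the all-ones seed, the first 16 keystream bits match the
# bit-at-a-time scrambler: 0000 1110 1111 0010 ...
print(scramble_bytes([0] * 16))
```

The table-driven version must agree bit for bit with the shift-register version, which is exactly the kind of equivalence the Ziria compiler's automatic LUT generation preserves by construction.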

WiFi receiver  The next code example illustrates the other end of the Ziria spectrum: a complete signal processing pipeline for an 802.11a/g receiver. This pipeline consists of a number of signal processing components, a block diagram for which is given in Figure 2. We first describe what each component does, then take a closer look at the corresponding code, which is given in Listing 3.

The first block, Detect Carrier, determines whether a WiFi transmitter is operating on the radio channel by looking for a known constant preamble sequence. Once carrier detection observes the preamble of a valid packet transmission, the pipeline enters a channel estimation phase, operating over the subsequent 160 input samples, in which it attempts to quantify physical effects such as multipath fading on the transmitted signal. The channel information is then used in the channel inversion block of the steady state of the pipeline (in gray) to nullify these effects. This block feeds input data first to a block that decodes the packet header, then to a block that decodes the packet payload. Information from the header decoding phase, such as the data rate, is used to configure the packet decoding stage.

1  let detectCarrier() = CCA() in
2  let channelEstimation = t11aLTS() in
3  let invertChannel(cInfo:int) =
4    t11aDataSymbol() >>> t11aFreqCompensation() >>>
5    t11aFFT() >>> t11aChannelEqualization(cInfo) >>>
6    tPhaseCompensate() >>> tPilotTrack()
7  in read >>> downSample() >>> (
8    detectCarrier();
9    cInfo <- channelEstimation();
10   invertChannel(cInfo) >>> (
11     pInfo <- t11aDecodePLCP();
12     t11aDecode(pInfo))) >>> write

Listing 3. Top-level Ziria pipeline for WiFi 802.11a/g receiver



The Ziria code corresponding to the pipeline of Figure 2 is given in Listing 3. The structure of this code follows that of the block diagram quite closely. After reading from the input stream and down-sampling (line 7), the first operation performed is carrier detection (line 8). Here, detectCarrier() is an alias introduced with a let-binding (line 1) for another Ziria function, CCA(), or clear channel assessment. Carrier detection is sequenced using the bind operator with channel estimation, a Ziria stream computer that returns a channel information value cInfo. The channel information cInfo is passed as an argument to invertChannel, a Ziria function defined at line 3. Finally, we stream the output of channel inversion to a Ziria computer that first decodes the packet header (t11aDecodePLCP()), and then decodes the packet payload (t11aDecode(pInfo)).

    3. Language

In this section, we present more details about the Ziria language and the key ideas behind its type system and operational semantics.

    3.1 Syntax

The key abstraction in Ziria is that of stream computations c, whose syntax is given in Figure 3. We already presented in Section 2 the primitive combinators take, emit, and repeat. The return combinator trivially evaluates its argument and returns it on the control channel. The map(f) combinator applies f to every value in the input stream, emitting the result onto the output stream.

As explained in Section 2, primitive combinators can be composed along the control or the data path with Ziria's composition operators bind (x <- c1; c2) and arrow (c1 >>> c2), respectively. In addition to these constructs, Ziria includes local functions that can bind computations (letfunc) and computation function applications c(e). It also allows for mutable computation-local state, with the letref construct. We will return to the use of shared state between processing pipeline components throughout the rest of the paper, since the careful treatment of state in Ziria is key to a rich set of optimizations that dramatically improve performance. The conditional computation if e then c1 else c2 configures the computation to be either c1 or c2 depending on the value of e. More conventionally, Ziria incorporates a sub-language of imperative expressions (e) and functions (letfun) that can be used in computations such as map(f) and emit e. This imperative sub-language includes constructs for manipulating primitive types such as bits, complex numbers, and arrays. The details are not particularly interesting.

The Ziria type language is stratified into stream (ST) types and expression types. Stream types describe stream computations. For example, a stream transformer computation with type ST T a b (e.g., map(f)) executes indefinitely, transforming a stream of values of type a to a stream of bs. As we originally explained in Section 2, stream computers of type ST (C ν) a b are similar, except that at some point during stream processing they may conclude the computation by returning a value of type ν. The language of expression types is standard, and includes a rich collection of base types, arrays, etc. We remark that the type language disallows arbitrary higher-order functions and partial applications in well-typed Ziria programs, to avoid closure allocation costs. Finally, streams may only contain values with base types.

    3.2 Type system

The Ziria type system (Figure 4) includes a judgment form for typing computations, Γ; Σ; Δ ⊢ c : τ, as well as a judgment (not shown) for typing imperative expressions, Γ; Σ ⊢ e : σ. The typing rules make use of three type variable contexts: Γ maps variables to expression types, Δ maps computation variables to computation types τ, and the store typing Σ maps mutable variables to base

Stream computations and imperative expressions

c ::= x <- c; c                     Bind
    | c >>> c                       Arrow
    | letref x:σ := v in c          Scoped shared state
    | letfun g(x:σ) = m in c        Local function
    | letfunc f(x:σ) = c1 in c2     Local computation
    | c(e)                          Call computation function
    | take                          Move to control channel
    | emit e                        Emit on data channel
    | return m                      Return on control channel
    | repeat c                      Repeat c indefinitely
    | map(f)                        Map over input stream
    | if e then c1 else c2          Conditionals
    | ...

e, m ::= x | v | x := e | m1; m2 | f(e) | ...    Expressions
v ::= () | i | ...                               Values

Computation and expression types

τ ::= ST (C ν) a b                  Computers
    | ST T a b                      Transformers
σ ::= ...                           Expression types
a, b, ν ::= unit | int | complex | ...   Base types

Figure 3. Syntax of Ziria

Γ; Σ; Δ ⊢ c : τ

Γ; Σ; Δ ⊢ c1 : ST (C ν) a b    Γ, x:ν; Σ; Δ ⊢ c2 : ST δ a b
------------------------------------------------------------
Γ; Σ; Δ ⊢ x <- c1; c2 : ST δ a b

Γ; Σ; Δ ⊢ c : ST (C unit) a b
------------------------------
Γ; Σ; Δ ⊢ repeat c : ST T a b

Γ; Σ; Δ ⊢ take : ST (C a) a b
Γ; Σ; Δ ⊢ emit e : ST (C unit) a σ      if Γ; Σ ⊢ e : σ
Γ; Σ; Δ ⊢ return m : ST (C σ) a b       if Γ; Σ ⊢ m : σ
Γ; Σ; Δ ⊢ map(f) : ST T a b             if (f : a -> b) ∈ Γ

(Γ, Σ) ↝ (Γ1, Σ1) (Γ2, Σ2)
Γ1; Σ1; Δ ⊢ c1 : ST δ a b    Γ2; Σ2; Δ ⊢ c2 : ST T b c
--------------------------------------------------------
Γ; Σ; Δ ⊢ c1 >>> c2 : ST δ a c

(Γ, Σ) ↝ (Γ1, Σ1) (Γ2, Σ2)
Γ1; Σ1; Δ ⊢ c1 : ST T a b    Γ2; Σ2; Δ ⊢ c2 : ST δ b c
--------------------------------------------------------
Γ; Σ; Δ ⊢ c1 >>> c2 : ST δ a c

Figure 4. Computation typing (δ ∈ {(C ν), T})

expression types. Type checking the access to a mutable variable looks up the variable in the store typing Σ, whereas accessing a function argument looks it up in Γ (for read-only variables).²

We discuss some of the more interesting typing rules of Ziria, given in Figure 4. The typing judgment for bind (x <- c1; c2) requires that c1 is a stream computer taking streams of as to streams of bs, potentially returning a value of type ν. The c2 component has a stream type ST δ a b in the extended context Γ, x:ν. Note that c2 may be either a stream computer or transformer, but its type determines the type of bind. The rule for repeat c types it as a stream transformer when c is a computer that returns (). take has type ST (C a) a b since it takes a value of type a from the input stream and immediately returns it. It is polymorphic in the output queue and thus can be composed using >>> with stream computations of arbitrary input type b. emit e is a stream computer that evaluates an expression e : σ and yields it onto the output stream, returning the unit value () (ST (C unit) a σ). emit can be composed using >>> to the right of

² The full language also includes a let-binding in the computation and expression languages for introducing immutable local variables. The types of these variables are tracked in Γ.



stream computations with arbitrary output types a. Finally, return lifts an expression m : σ into the language of stream computations (ST (C σ) a b) by evaluating it and returning it on the control path. The typing rules for the >>> combinator are slightly more involved. The arrow combinator (>>>) can appear overloaded, and can be typed with two typing rules. The first says that c1 >>> c2 has type ST δ a c if c1 is a stream computation (transformer or computer) taking as to bs (ST δ a b) and c2 is a stream transformer (ST T b c) taking bs to cs. The second >>> rule is symmetric. The >>> rules type components c1 and c2 of c1 >>> c2 in different contexts in order to prevent communication through shared state. Imagine that c1 and c2 communicate via a buffered channel. An element that has been produced by c1 under a particular view of the shared state and emitted on the communication channel may eventually get processed by c2 under a different view of the shared state, since c1 may have updated the state in the meantime. Under such conditions the semantics of c1 >>> c2 becomes difficult to reason about; in fact, we will see in Sections 4.2 and 4.4 that the presence of shared state invalidates two important optimizations. To avoid these problems we split the contexts into two pairs (Γ1, Σ1) and (Γ2, Σ2), to ensure that no computation writes to state that the other component either reads or writes. For reasons of space we give the rules of this relation just for single-variable contexts:

(x:σ) ↝ ((x:σ), ·) ((x:σ), ·)
(x:σ) ↝ (·, (x:σ)) (·, ·)        (x:σ) ↝ (·, ·) (·, (x:σ))
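The bind, repeat, take, emit, and arrow rules can be exercised with a toy type checker. The encoding below is entirely our own (Ziria's checker is part of its Haskell compiler and is not shown in this paper); it models ST δ a b with δ either ("C", ν) for computers or "T" for transformers, and enforces that at most one side of >>> is a computer:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ST:
    delta: object   # ("C", nu) for a computer, "T" for a transformer
    a: str          # input stream type
    b: str          # output stream type

def take(a, b):
    return ST(("C", a), a, b)          # take : ST (C a) a b

def emit(sigma, a):
    return ST(("C", "unit"), a, sigma)  # emit e : ST (C unit) a sigma

def bind(t1, t2):
    # x <- c1; c2 : c1 must be a computer; c2's delta decides the result
    if not (isinstance(t1.delta, tuple) and t1.delta[0] == "C"):
        raise TypeError("left of bind must be a stream computer")
    if (t1.a, t1.b) != (t2.a, t2.b):
        raise TypeError("bind components must share stream types")
    return ST(t2.delta, t1.a, t1.b)

def repeat(t):
    if t.delta != ("C", "unit"):
        raise TypeError("repeat needs a computer returning unit")
    return ST("T", t.a, t.b)

def arrow(t1, t2):
    # c1 >>> c2 : output of c1 feeds c2; at most one side is a computer
    if t1.b != t2.a:
        raise TypeError("stream type mismatch across >>>")
    if t1.delta != "T" and t2.delta != "T":
        raise TypeError("at most one side of >>> may be a computer")
    delta = t1.delta if t1.delta != "T" else t2.delta
    return ST(delta, t1.a, t2.b)

# repeat (x <- take; emit(f(x))) with f : int -> int  :  ST T int int
print(repeat(bind(take("int", "int"), emit("int", "int"))))
```

Running the example types the mapping pipeline from Section 2 as a transformer from int streams to int streams, exactly as the repeat rule in Figure 4 prescribes.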

    3.3 Reference semantics

Ziria programs compile to an IR in C. We now give a semantics that captures the essence of the IR execution model. In addition to the tick/process-based execution (outlined in Section 2), each component is equipped with an initialization function.

A Ziria computation becomes activated through initialization (judgment ⇝, Figure 6), meaning that it is ready to participate in the main tick/process loop. The judgment S; c ⇝ S′; h initializes the computation c in state S. Arbitrary code may be executed during initialization, hence the state S may be updated to S′. A bind (x <- h1; c) that is activated contains an activated first component h1, ready to be ticked. h1 >>> h2 is the activated version of c1 >>> c2 in which both components are activated. The activated computation {|L, h|} arises from local letref definitions. Simple primitives are activated immediately, but note that emit v and return v only mention values. The initialization of components that include expressions forces their evaluation, and this in turn may cause state updates. Finally, repeat c initializes by activating c to h and reconfiguring to (h; repeat c), which allows the runtime to subsequently tick the first component h.

Runtime  Once a computation c has been initialized, it can participate in the main loop. We model the runtime execution in Figure 5. The judgment ⇒ steps triples of an input buffer I, an activated computation h, and an output component O to a new configuration. The intuition behind the rules in Figure 5 is that we first try to tick a component (using ↓) and, if that returns a result kind of cons, we peel a value off the input stream and push it through the computation's process function (using ⇓). Results are emitted onto the output stream. To make the presentation shorter, the ⇒ relation has no case for done(v) values; we assume that the top-level computation we evaluate is a stream transformer and hence does not return.

Tick and process  As described in Section 2, the process function of a Ziria computation returns a result r, either skip, yield(v), or done(v), while tick returns a result kind k: either an immediate result (imm r) or a request for more input (cons). Figure 7 gives the precise definition of ↓ (tick) and ⇓ (process) for some of the more interesting Ziria computations.

Runtime semantics   I; S; h; O ⇒ I′; S′; h′; O′

S; h ↓ S′; h′; imm(skip)
---------------------------
I; S; h; O ⇒ I; S′; h′; O

S; h ↓ S′; h′; imm(yield(v))
-----------------------------
I; S; h; O ⇒ I; S′; h′; v:O

S; h ↓ S′; h′; cons    S′; h′; v ⇓ S″; h″; skip
------------------------------------------------
(v:I); S; h; O ⇒ I; S″; h″; O

S; h ↓ S′; h′; cons    S′; h′; v ⇓ S″; h″; yield(u)
----------------------------------------------------
(v:I); S; h; O ⇒ I; S″; h″; (u:O)

Figure 5. Runtime semantics

Basic definitions

r ::= skip | yield(v) | done(v)          Results
k ::= imm(r) | cons                      Result kinds
S, L ::= · | S, x ↦ v                    Stores
h ::= {|L, h|} | (x <- h1; c2)           Activated computations
    | h1 >>> h2 | take | emit v | return v | map(f)

Computation initialization (excerpt)   S; c ⇝ S′; h

S; c1 ⇝ S′; h1
--------------------------------------
S; (x <- c1; c2) ⇝ S′; (x <- h1; c2)

S; c1 ⇝ S′; h1    S′; c2 ⇝ S″; h2
--------------------------------------
S; (c1 >>> c2) ⇝ S″; (h1 >>> h2)

S; c ⇝ S′; h
--------------------------------------
S; repeat c ⇝ S′; (h; repeat c)

S, x ↦ v; c ⇝ S′, x ↦ w; h
------------------------------------------------
S; letref x:σ := v in c ⇝ S′; {|x ↦ w, h|}

Figure 6. Computation initialization, used in Figure 7

Tick (excerpt)   S; h ↓ S′; h′; k

S; h1 ↓ S′; _; imm(done(v))    S′; c2[v/x] ⇝ S″; h2    S″; h2 ↓ S‴; h2′; k
---------------------------------------------------------------------------
S; (x <- h1; c2) ↓ S‴; h2′; k

S; h2 ↓ S′; h2′; cons    S′; h1 ↓ S″; h1′; imm(yield(v))    S″; h2′; v ⇓ S‴; h2″; r
------------------------------------------------------------------------------------
S; h1 >>> h2 ↓ S‴; h1′ >>> h2″; imm(r)

S; emit v ↓ S; return (); imm(yield(v))

S; return v ↓ S; return v; imm(done(v))

Process (excerpt)   S; h; v ⇓ S′; h′; r

S, L; h; v ⇓ S′, L′; h′; r
---------------------------------------
S; {|L, h|}; v ⇓ S′; {|L′, h′|}; r

S; h1; v ⇓ S′; _; done(w)    S′; c2[w/x] ⇝ S″; h2
--------------------------------------------------
S; (x <- h1; c2); v ⇓ S″; h2; skip

S; h1; v ⇓ S′; h1′; yield(w)    S′; h2; w ⇓ S″; h2′; r
-------------------------------------------------------
S; h1 >>> h2; v ⇓ S″; h1′ >>> h2′; r

S; take; v ⇓ S; return v; done(v)

Figure 7. Tick and process semantics

The ↓ judgment takes an initialized h and a store S and maps these to a new computation h′, a new store S′, and a result kind k. The ⇓ judgment takes an h that has requested data to process, an input value v, and an initial store S, and returns a new h′, an updated store S′, and a result r. The topmost tick rule for bind defines what happens when the computation h1 returns with imm(done(v)) (with possible side effects to S): first, we reconfigure c2 by substituting



repeat (x <- take; emit(e))   ~~>   letfun f(x) = e in map(f)
x <- return e; c              ~~>   let x = e in c
return e; c                   ~~>   let _ = e in c
times e1 i (return e2)        ~~>   return (for i in (0, e1) e2)
(x <- letfun f(..) = e in c1); c2    ~~>   letfun f(..) = e in (x <- c1; c2)
letfunc f(..) = return e in ... f(..) ...
                              ~~>   letfun f(..) = e in ... return (f(..)) ...
(if e then c1 else c2) >>> c3 ~~>   if e then c1 >>> c3 else c2 >>> c3

Figure 8. Selected rewrite steps performed by the Ziria optimizer

v for x (c2[v/x]) and initialize it to h2. Then we tick h2. Note how we have discarded the local state of h1, as it is no longer needed.

Next consider the tick rule shown for arrow (>>>). While we only show one of four rules for >>> here due to space constraints, all four >>> rules together encode a single pattern of computation: when we tick the computation h1 >>> h2, we always start by ticking h2 first. This is necessary if h2 is a computation like emit that yields values to the output stream without requiring any input. For example, c1 >>> repeat(emit(3)) is a computation that

after initialization will tick indefinitely, producing a stream of 3s and ignoring c1. Ticking from right to left means that we need no intermediate buffering between sub-computations, since we always drain the pipeline before requesting additional input. The other three tick rules for >>>, which we have not shown here, encode the situations in which (1) h2 requires input and h1 has an input to push; (2) h2 requires input and h1 ticks to an immediate result that is skip or done but not yield, in which case the whole computation returns that result; or (3) both h1 and h2 require input, in which case the entire computation returns consume. The process rules for >>> are similar, except that instead of ticking from right to left, we process a value through the pipeline from left to right (i.e., first h1, then h2).

The tick and process rules for base combinators are mostly straightforward. Ticking emit e immediately yields e's value v and steps to return(), returning the unit value the next time the computation is ticked. Ticking take immediately returns consume. There are no process rules for emit and return because the runtime semantics will never call process on these components. The process rule for take converts the take to a return and produces done v as result.
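To make the tick/process discipline concrete, the following is a small executable model. The encoding (Python classes returning (tag, value, next-component) triples) is entirely our own sketch, not the Ziria runtime: it covers only the done/yield/consume results from the excerpt above, and omits skip, repeat, and the store.

```python
class Return:
    """return v: ticking immediately signals done v."""
    def __init__(self, v): self.v = v
    def tick(self): return ("done", self.v, self)

class Emit:
    """emit v: ticking yields v once, then behaves like return ()."""
    def __init__(self, v): self.v, self.fired = v, False
    def tick(self):
        if not self.fired:
            self.fired = True
            return ("yield", self.v, self)
        return ("done", None, self)

class Take:
    """take: ticking requests input; processing input v signals done v."""
    def tick(self): return ("consume", None, self)
    def process(self, v): return ("done", v, Return(v))

class Bind:
    """x <- c1; c2 -- k maps the result of c1 to a freshly initialized c2."""
    def __init__(self, c1, k): self.c1, self.k = c1, k
    def tick(self):
        tag, v, c1 = self.c1.tick()
        if tag == "done":             # reconfigure: discard c1, run c2[v/x]
            return self.k(v).tick()
        self.c1 = c1
        return (tag, v, self)
    def process(self, v):
        tag, w, c1 = self.c1.process(v)
        if tag == "done":
            return self.k(w).tick()
        self.c1 = c1
        return (tag, w, self)

class Pipe:
    """c1 >>> c2: tick the downstream side first (right to left)."""
    def __init__(self, c1, c2): self.c1, self.c2 = c1, c2
    def tick(self):
        tag, v, c2 = self.c2.tick()
        self.c2 = c2
        if tag != "consume":          # c2 yielded or finished on its own
            return (tag, v, self)
        t1, w, c1 = self.c1.tick()    # c2 needs input: tick upstream
        self.c1 = c1
        if t1 == "yield":             # push c1's output through c2
            t2, u, c2b = self.c2.process(w)
            self.c2 = c2b
            return (t2, u, self)
        return (t1, w, self)          # consume/done propagate outward
```

For example, ticking Pipe(Emit(7), Bind(Take(), lambda x: Emit(x * 2))) first ticks the downstream bind, which consumes; the upstream emit then yields 7, which is processed through the bind and re-emitted as 14.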

    4. Compiler

Compiling Ziria, as opposed to implementing it as a library, means that we can apply optimizations to the actual Ziria code for better performance. This section details these optimizations, which include automatic vectorization, automatic lookup table (LUT) generation, and annotation-guided pipelining. Section 5 measures the performance of each optimization via microbenchmarks and on WiFi pipelines.

The Ziria compiler has a standalone frontend, including a parser and typechecker. The frontend generates code in an explicitly typed intermediate language. The code generator outputs C code in continuation-passing style, which allows us to replace calls to the tick, process, and initialization functions with direct jumps to labels.

    4.1 High-level optimizations

The Ziria compiler implements a series of type- and semantics-preserving optimizations that eliminate or fuse computations together in order to decrease the amount of processing that occurs across computations. This means less copying and fewer jumps to the outer runtime loop. Figure 8 gives some of the rewrite rules that the Ziria compiler implements. Particularly interesting are

the rewrite rules (marked †) that convert return statements to let definitions and the auto-map transformation (marked ‡), which replaces a repeated take-emit block with a single map transformer. For example, in our scrambler implementation from Listing 1, the auto-map transformation allows us to extract the scrambler's imperative code (lines 6 through 10) into a let-bound function definition f and replace the entire body of the scrambler component with a call to map(f). This transformation is important for producing performant code, as the overhead measurements in Section 5.2 show.

Other optimizations justified by the runtime semantics of Section 3 include the conversion of computation loops3 to for-loops, inlining and loop unrolling, and floating definitions and conditionals out of computations to enable other optimizations like auto-mapping. Note that the correctness of pushing conditionals above >>> relies on the state invariants that we described in Section 3.
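As a small illustration of that last rule, here is a toy rewriter (our own Python sketch, not the compiler's actual IR) for the Figure 8 step that floats a conditional above >>>, duplicating the downstream computation into both branches:

```python
from dataclasses import dataclass

@dataclass
class If:
    cond: str
    then: object
    els: object

@dataclass
class Pipe:        # left >>> right
    left: object
    right: object

def float_conditional(c):
    """(if e then c1 else c2) >>> c3  ~~>  if e then c1 >>> c3 else c2 >>> c3."""
    if isinstance(c, Pipe) and isinstance(c.left, If):
        i = c.left
        return If(i.cond, Pipe(i.then, c.right), Pipe(i.els, c.right))
    return c        # no rewrite applies; return the term unchanged
```

Applying float_conditional to Pipe(If("e", "c1", "c2"), "c3") yields If("e", Pipe("c1", "c3"), Pipe("c2", "c3")), matching the rule in Figure 8.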

    4.2 Vectorization

Wireless protocols are designed and specified so that each basic component operates at an intuitive data granularity. For example, in Section 2 we saw that in each step of computation, the scrambler of Listing 1 takes a single bit from its input queue and emits a single bit to its output queue. This implementation matches the granularity of the specification quite closely. Unfortunately, the tight timing deadlines of most wireless protocols mean we do not always have the luxury of sacrificing performance by, e.g., packing one bit per char or operating on only one 32-bit precision complex number at a time on a 64-bit CPU. Our pipelines would be much more efficient if we were able to process arrays of elements simultaneously. Ideally, we want to batch the inputs and the outputs of components.

However, this is not straightforward. Different components operate at different data granularities. For example, the WiFi scrambler is followed by an encoder. Depending on the selected data rate, the encoder will be configured to one of three variants, which take 1, 2, or 3 input bits and produce 2, 3, or 4 output bits, respectively. In order to correctly vectorize, we must ensure that the vectorization of the scrambler is matched to the vectorization of all possible variants of the encoder. Otherwise, a component may terminate with un-emitted data. Matching granularities in a large program is tedious and has been done manually in frameworks like Sora. A benefit of our compiler is that it automatically vectorizes programs to operate on arrays, similarly to [9, 26]. The goal of this transformation is to rewrite a computation of type ST t a b to one of type ST t (arr[N] a) (arr[M] b). This process consists of three steps:

1. First, an analysis identifies the number of values that a computer (one of type ST (C v) a b) takes from the input stream (call this in) and emits to the output stream (call this out) before returning. We call this the cardinality information.

2. Next, for every repeat c with an identified cardinality for c, we take the union of two candidate sets of possible vectorizations. The first set comes from scale-up vectorization: all candidates in this set have types of the form ST (C v) (arr[k*m*in] a) (arr[m*out] b). We allow components to take input arrays that are any multiple (k*m) of the input cardinality, but restrict the output array so that its multiplicity (m) divides the input multiplicity. We do not vectorize the output to an arbitrary multiplicity of the output cardinality, to avoid situations where the output array is only half-filled but there is no more data to process on the input. The second set comes from scale-down vectorization, which only applies if the component has a large in or out cardinality (not uncommon in the WiFi pipelines). Scale-down vectorization will create a transformed computation

    3 Defined using thetimescombinator, a variation ofrepeat that repeats acomputation a finite number of times.


that has type ST (C v) (arr[din] a) (arr[dout] b), for some divisors din and dout of in and out, respectively.

3. Once we have identified scale-up and scale-down sets for computations in the pipeline, we must compose them across the bind and arrow operators in the program in a way that maximizes performance. Note that our optimization should aim to increase vectorization sizes across all components equally, as the smallest batch size is likely to be a bottleneck. In order to achieve this balance, we use the optimization framework from [20], which is well-studied in the context of networking. Each arrow and bind operator is assigned a utility number, which is obtained by applying a predefined concave function4 to the corresponding vectorization size. We attempt to find a composition of vectorizations that respects type correctness and maximizes the sum of all utilities. One key benefit of the approach from [20] is that by maximizing the sum of concave utilities we balance the optimization across all batch sizes. The other key benefit is that this can be done very efficiently, in a greedy manner.
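The candidate-set construction and the utility-based selection can be sketched in Python as follows. The bound max_mult, the brute-force search over assignments, and the chain-compatibility predicate in the usage below are illustrative simplifications of our own (the compiler restricts candidates by type and uses a greedy method rather than enumeration):

```python
import math
from itertools import product

def scale_up(in_card, out_card, max_mult=4):
    """Scale-up candidates: inputs of size k*m*in, outputs of size m*out."""
    return {(k * m * in_card, m * out_card)
            for m in range(1, max_mult + 1)
            for k in range(1, max_mult + 1)}

def scale_down(in_card, out_card):
    """Scale-down candidates: divisors of the (possibly large) cardinalities."""
    divisors = lambda n: [d for d in range(1, n + 1) if n % d == 0]
    return {(di, do) for di in divisors(in_card) for do in divisors(out_card)}

def best_vectorization(candidate_sets, compatible):
    """Pick one candidate per component, maximizing the sum of log-utilities
    over all type-compatible assignments (cf. the framework of [20])."""
    best, best_u = None, float("-inf")
    for choice in product(*candidate_sets):
        if not compatible(choice):
            continue
        u = sum(math.log(i) + math.log(o) for i, o in choice)
        if u > best_u:
            best, best_u = choice, u
    return best
```

For two chained components, compatibility just requires each output batch size to match the next input batch size; the concave log utility then prefers balanced, uniformly large batches over one large and one small.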

The following is the automatically vectorized version of the scrambler from Listing 1, where input and output types have been converted to 8-bit arrays that can be conveniently packed into chars in the generated C code.

let vectorized_scrambler (u: unit) =
  letref scrmbl_st: arr[7] bit := {1,1,1,1,1,1,1};
         tmp: bit; y: bit
  in repeat
    (let up_wrap_17 () =
       letref ya_19: arr[8] bit in
       (xa_18 : arr[8] bit) ← take;
       (times 8 (\j_21.
          x ← return xa_18[0*8+j_21*1+0];
          return (
            tmp := scrmbl_st[3]^scrmbl_st[0];
            scrmbl_st[0:+6] := scrmbl_st[1:+6];
            scrmbl_st[6] := tmp;
            y := x^tmp);
          return ya_19[j_21*1+0] := y));
       emit ya_19
     in up_wrap_17())

Listing 4. Auto-vectorized scrambler

Of course this is a relatively complicated computation that includes many sub-computations, but fortunately the optimizer shines here: post-vectorization and post-optimization, this program is automatically converted to use a tight expression-level for-loop, and the whole computation is auto-mapped into a single map computation.

    4.3 Lookup table generation

Lookup tables (LUTs) are used pervasively in Sora for performance. However, as Section 2 demonstrated when comparing Sora's lookup table-based implementation of the scrambler (Listing 2) to the Ziria version (Listing 1), writing functions that use LUTs leads to code that is difficult to write, read, and modify. Furthermore, the size of the LUT may depend on the outcome of other optimizations such as vectorization, leading to frequent LUT recomputation.

Ziria frees the programmer from having to forsake readability for performance, and from the pain of generating LUTs by hand, by providing compiler support for transparently compiling high-level functions to LUTs. Functions may be hand-annotated with the keyword lut, or the programmer may direct the compiler to automatically detect portions of a Ziria program that are amenable to a LUT implementation. In this case, the compiler looks for expressions that are complex enough to be worth converting to a LUT, but whose LUT sizes are reasonable.

4 In our implementation we use log(·), as in [20].

For instance, our auto-LUT analysis automatically identifies that the body of the auto-mapped function obtained by further optimization of Listing 4 has inputs of total bitwidth 15 (scrmbl_st and xa_18) and outputs of total bitwidth 25 (8 for the result of the function, 8 for ya_19, 7 for scrmbl_st, and 2 for tmp and y), and automatically creates the corresponding ~100K LUT. Moreover, Ziria includes a bit permutation primitive, bperm, which is automatically converted to a LUT if the permutation table is statically known. bperm is useful in wireless interleavers, which reduce errors in signal transmissions by applying pseudorandom permutations.
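The analysis above can be mimicked in a few lines of Python: the scrambler step of Listing 4 is a pure function of a 7-bit state and one input bit, so the behavior on an 8-bit chunk can be precomputed for all 2^15 (state, byte) pairs. The dict-based table layout here is our own sketch, not the compiler's packed representation (which also stores the tmp and y bits):

```python
def step(st, x):
    """One scrambler step, following Listing 4: st is a list of 7 bits,
    x an input bit; returns the output bit and the shifted state."""
    tmp = st[3] ^ st[0]
    st = st[1:] + [tmp]          # shift the LFSR, feeding tmp back in
    return x ^ tmp, st

def build_lut():
    """Map every (7-bit state, 8-bit input byte) to (output byte, new state)."""
    lut = {}
    for s in range(128):                         # every 7-bit state
        st0 = [(s >> i) & 1 for i in range(7)]
        for byte in range(256):                  # every 8-bit input chunk
            st, out = st0[:], 0
            for j in range(8):
                y, st = step(st, (byte >> j) & 1)
                out |= y << j
            lut[(s, byte)] = (out, sum(b << i for i, b in enumerate(st)))
    return lut
```

The resulting table has 2^15 = 32768 entries, matching the 15-bit input width identified by the analysis; each lookup replaces eight iterations of the bit-level loop.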

    4.4 Pipelining

As explained in Section 3, when two Ziria computations c1 and c2 are connected by >>>, they cannot mutate shared state and hence can be pipelined onto multiple cores on SMP platforms. To support this kind of parallelization, the Ziria compiler allows user annotations of the form c1 |>>>| c2. This code is automatically pipelined onto two cores by (i) allocating a fresh single-reader single-writer queue q, (ii) transforming the code to c1 >>> write(q) |>>>| read(q) >>> c2, then (iii) spawning two threads: c1 >>> write(q), which is pinned to physical core 1, and read(q) >>> c2, which is pinned to physical core 2. The design of the synchronization queues is adapted from Sora [29].
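The effect of steps (i)-(iii) can be sketched with ordinary threads and a FIFO queue. This is a Python sketch of the shape of the transformation only: the names stage1, stage2, and SENTINEL are illustrative, core pinning is omitted, and Ziria's actual queues are lock-free single-reader single-writer structures rather than queue.Queue.

```python
import threading
import queue

def run_pipelined(stage1, stage2, inputs):
    """Run stage1 and stage2 on separate threads, connected by a queue."""
    q, out, SENTINEL = queue.Queue(maxsize=64), [], object()

    def producer():                 # models c1 >>> write(q)
        for x in inputs:
            q.put(stage1(x))
        q.put(SENTINEL)             # signal end of stream

    def consumer():                 # models read(q) >>> c2
        while True:
            v = q.get()
            if v is SENTINEL:
                break
            out.append(stage2(v))

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return out
```

Because there is a single producer and a single consumer on a FIFO queue, output order matches input order, just as in the sequential pipeline c1 >>> c2.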

    5. Evaluation

In this section we seek to answer the following questions about the performance of Ziria: (1) What is the overhead of Ziria's execution engine? (2) What is the speedup of various compiler optimizations? (3) How does Ziria compare with state-of-the-art implementations of a real-world wireless protocol? To answer these questions we perform two sets of measurements. First, we use a number of microbenchmarks to quantify the overheads of the Ziria combinators bind, arrow (>>>), and pipelined arrow (|>>>|). Then we present an implementation and a detailed evaluation of WiFi (802.11a/g) in Ziria. We show the speedups achievable with the Ziria compiler's automatic vectorizer, lookup table generator, and annotation-guided pipeliner, and the associated overheads, on a real-world wireless physical-layer protocol. As a baseline for comparison, we use Sora [29], one of the few CPU-based SDR platforms with a line-rate implementation of a full WiFi PHY.

    5.1 Methodology

We evaluate the Ziria framework on a Dell T3600 PC with an Intel Xeon E5-1620 CPU at 3.6 GHz, running Windows. In order to compare our performance results to Sora, we use the same C compiler as Sora (Windows Driver Development Kit version 7) and we adapt our runtime to use Sora's runtime libraries. In particular, we use Sora's user-mode threading library, which allows us to run our user-mode threads at the highest priority and pinned to a specific core, effectively preventing the OS from preempting the execution.

We evaluate our framework on various DSP algorithms that are part of a standard WiFi transceiver. Some of these algorithms operate on bits (e.g. most of the TX) and some on complex samples (e.g. most of the RX). The throughput of each algorithm is proportional to the sample width in bits. However, the overhead of the Ziria execution model mainly depends on the number of data items being processed and not their actual widths. Therefore, throughout this section we present throughput in units of Md/s (mega-data per second).

Sora hardware, as well as the hardware of other high-performance SDR platforms, is designed to mitigate I/O overheads. The Sora software component fetches radio samples through the PCI bus at a speed comparable to main memory reads. In our evaluation we focus on measuring the performance of the software component. When we evaluate a component's performance, we read input samples directly from memory and discard them at the output.


[Plots lost in extraction. Each panel shows runtime per datum (us): (a) Bind, versus the number of blocks, for Bind and Baseline; (b) Arrow (>>>), versus the number of blocks, for Repeat, Map, and Baseline; (c) Pipelined arrow (|>>>|), versus the number of sin() calls, for 1 thread and 2 threads.]

Figure 9. Overheads of various Ziria components.

    5.2 Microbenchmarks

Bind We first measure the overhead of bind. We do this by measuring the runtime of a program containing n computation components bound together in sequence, with each component consisting of a single call to sin(). As a performance baseline, we use the runtime of a semantically equivalent program in which all n sin operations are executed in a single component.

The results are depicted in Figure 9(a). The dashed line gives the runtime of the baseline program for n sin computations. The solid line gives runtimes for n sin components bound in sequence. We ran each program 10 times on 20 million inputs and report average execution time per data item (confidence intervals are very small). We verify that the runtime data fit a linear model as a function of n, indicating that the cost of bind grows linearly with the number of components. The cost of a single bind operation on our system, given by the difference of the slopes of the two lines, is around 3 ns.
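The slope-difference calculation can be reproduced on synthetic data. The 13 ns and 10 ns per-component constants below are illustrative placeholders chosen to differ by 3 ns, not the measured values:

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

ns = [10, 20, 50, 100]                       # number of components n
bound    = [n * 0.013 + 0.05 for n in ns]    # synthetic: 13 ns per component
baseline = [n * 0.010 + 0.05 for n in ns]    # synthetic: 10 ns per component

# The per-bind cost is the difference of the two fitted slopes (in us).
cost_of_bind_us = slope(ns, bound) - slope(ns, baseline)
```

On exactly linear data the fitted slopes recover the per-component coefficients, so cost_of_bind_us comes out at 0.003 us (3 ns) by construction; on real measurements the same fit averages out per-run noise.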

Arrow Next, we measure the cost of arrow (>>>). To do so we measure the runtime of a program containing n components composed with >>>, with each component repeating the computation (x ← take; emit(sin(x))). This experiment is labeled Repeat. We also measured the runtime of an optimized version of this program that uses the map computation map(sin(x)), in the experiment labeled Map. As a baseline, we use the runtime of a program in which all n sin calls are merged into a single component. Again, we ran the code 10 times on 20 million inputs each time and report average execution time per data item (confidence intervals are very small).

The results are depicted in Figure 9(b). The cost of a single repeat arrow operation on our system, given by the difference of the slopes of the Repeat and Baseline lines, is around 24 ns, and the cost of a single map arrow is around 1 ns. This difference stems from the fact that the repeat computation has to execute several ticks and procs in each round, whereas the map execution is completely streamlined. However, as discussed in Section 4.1, our high-level optimizations are often able to transform a repeat computation into a map computation and mitigate this overhead. The overhead is typically further amortized over multiple inputs due to vectorization.

Pipelined arrow To gauge the overhead of pipelining Ziria programs onto multiple cores using |>>>|, we measured the runtime of n sin calls when (i) run on a single core, and (ii) divided evenly onto two cores. Figure 9(c) shows the results of this experiment. The red dashed line and the solid black line give the execution time per datum in microseconds when running on a single core and on two cores, respectively. The point at which these two lines intersect, approximately 30 computations per datum, is the point at which we break even, i.e., when pipelining gives speedup rather than slowdown. Furthermore, the speedup is approximately 1.7x at 60 calls and 2x at 90 calls. As we show later in the WiFi evaluation, most signal processing blocks are computationally intensive, and the pipelining benefits are on the high side.
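A back-of-the-envelope model explains the break-even point: with a per-call cost c_sin and a fixed pipelining overhead q_overhead per datum, two cores win once n*c_sin exceeds (n/2)*c_sin + q_overhead, i.e. once n > 2*q_overhead/c_sin. The constants in the usage below are illustrative, chosen only so the break-even lands at 30 calls as in Figure 9(c):

```python
def one_core(n, c_sin):
    """Runtime per datum: all n sin() calls on a single core."""
    return n * c_sin

def two_cores(n, c_sin, q_overhead):
    """Split the n calls evenly across two cores, paying a queue cost."""
    return (n / 2) * c_sin + q_overhead

def break_even(c_sin, q_overhead):
    """Smallest n at which pipelining stops being a slowdown."""
    return 2 * q_overhead / c_sin
```

This crude model captures the crossover but not the exact measured speedups, which also reflect effects it ignores (amortization of queue costs over batches, caching).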

    5.3 Performance of WiFi 802.11a/g

This section evaluates the performance of our Ziria implementation of the physical layer of 802.11a/g, the most popular variant of the WiFi protocol. Our implementation consists of 3k lines of Ziria code in total. We evaluate the data throughput of a number of the processing blocks in our WiFi transmitter (TX) and receiver (RX), as well as the end-to-end performance of both TX and RX.

We compare our Ziria implementation with the manually optimized Sora implementation [29] and the WiFi requirements. We first verified the correctness of each Ziria block against the corresponding block in the Sora implementation. We then profile each block in both implementations by sending them the same input data. Different implementations may use different signal processing algorithms in an effort to improve the receiver's performance. To allow for a fair comparison, our WiFi implementation uses the same receiver algorithms as the Sora implementation.

As a part of Ziria we provide a basic signal processing library. This library provides a high-level interface to very efficient implementations (borrowed from Sora) of three common signal processing blocks: FFT, IFFT, and Viterbi. These blocks are standardized and reused across all modern physical layers (WiFi, WiMax, LTE), and their efficient implementations are already available across a large range of SDR platforms. The library also includes a basic SIMD library for vector operations that operates on Ziria's basic types.

In the case of the end-to-end transmitter and receiver code, different input data may activate different control paths in the code. We profile these various code paths separately, both for the Sora and Ziria implementations. In particular, we profile the signal detection path (CCA) at the receiver, which runs when the receiver is scanning for a packet, as well as the transmitter and the receiver at different data rates (6, 12, 24, and 48 Mbps).

    5.3.1 Receiver

We start by profiling the receiver's building blocks. In order to assess the performance of our vectorization algorithm, we compile each block with no vectorization, with manual vectorization copied from the Sora implementation, and with the automatic vectorization described in Section 4.2. We also enable lookup table generation, but it does not play a major role in the receiver, since most of the receiver blocks operate on complex samples and cannot be converted into lookup tables. The results are given in Table 1 and Table 2. Although Sora's WiFi implementation is manually tuned for high performance, all of the blocks in our high-level Ziria implementation are less than two times slower than the corresponding Sora blocks, and many of them approach the performance of their Sora counterparts. Furthermore, all our end-to-end code paths satisfy the WiFi specification requirement of 40 Msamples per second.

We also observe that vectorization can significantly improve the performance of Ziria code, often by more than an order of magnitude. We see that the performance of our automatic vectorization comes close to the performance of the manually vectorized Ziria code in which we use Sora's vectorization annotations.


performance. Similar considerations apply to VOLK, in which the programmer must choose and manually configure the vectorization widths of each component in a GNURadio pipeline. In Ziria, we avoid this sort of manual configuration by automatically vectorizing and generating lookup tables for compiled Ziria code. The result is high-level code close to the standard specification that still runs fast.

Dataflow languages In addition to work on SDR, Ziria builds on a significant body of programming languages research. Synchronous dataflow languages [8, 11, 12] have been used in embedded and reactive systems for modeling and verification, but, to our knowledge, never to implement line-rate software PHY designs. StreamIt [31], also based on synchronous dataflow, was one of the early works to target DSP applications in software. Example programs in StreamIt include software radios, like WiFi and 3GPP PHY. StreamIt demonstrated that a DSL can enable significant optimizations, and emphasized multicore execution [17].

Ziria learns from StreamIt and makes improvements in several respects. First, StreamIt programs are graphs of independent filters. To express control dependencies between nodes in the graph, one uses splitter (demultiplexing) blocks or teleport messages [32], a form of asynchronous message passing with guaranteed logical delivery bounds. Ziria's explicit treatment of control fortunately helps here. An example of the need for teleport messaging in StreamIt comes from frequency hopping (Figure 12 of [32]), which can be expressed in Ziria as:

aToD >>> letref freq : float = startFreq in repeat (
  newFreq ← rfToIf() >>> fft() >>> checkFreqHop(freq);
  return (freq := newFreq))

The checkFreqHop is a computer that eventually returns a new frequency after emitting some elements with the old frequency. Ziria's treatment of shared state with the restricted typing of >>> exposes pipeline parallelization opportunities. Ziria also exposes state (re-)initialization and associates it with bind reconfiguration. These features are in accordance with lessons distilled from the StreamIt experience [30]. Unlike StreamIt, Ziria does not support data parallelism, which appears to be less applicable in wireless PHY design due to heavy control dependencies in processing pipelines. Finally, although there is a StreamIt WiFi implementation, it is unknown whether it can be deployed at line rate.

Functional programming and functional reactive programming Ziria builds on a broad range of techniques from functional programming. The design of Ziria, its key combinators, and their optimization draw from monads and arrows [18, 22]. A flavor of our bind combinator (called switch) can be found in Yampa [13], a popular functional reactive programming (FRP) framework. However, Yampa encodes control information onto the data channel, whereas Ziria keeps these notions separate. Ziria's tick and process semantics bears some resemblance to the push and pull models of computation. Typically FRP uses the pull model, but there exists recent work on combining the two to improve efficiency [16].

The vectorization transformation is inspired by work on nested data parallelism [9, 26]. Ziria's stream computation semantics was influenced by the work on stream fusion. For example, the result type we use to implement a single step of stream computation in the runtime model of Section 3 is effectively the same as that used to represent streams in Coutts et al. [14].

Finally, there exists recent work on using high-level functional languages for DSP applications that aspires to match the efficiency of low-level implementations [4]. In principle, such a language could replace the imperative sub-language inside Ziria blocks.

    7. Conclusion

We presented Ziria, the first high-level SDR platform with a performant execution model. To validate our design, we built a compiler that performs optimizations done manually in existing CPU-based SDR platforms: vectorization, lookup table generation, and annotation-guided pipelining. To demonstrate Ziria's viability, we used Ziria to build a rate-compliant PHY layer for WiFi 802.11a/g.

    References

[1] Texas Instruments TMS320TCI6616 Communications Infrastructure KeyStone SoC.

[2] Ettus Research, USRP: Universal Software Radio Peripheral.

[3] Microsoft Research, Brick specification, 2011.

[4] E. Axelsson et al. Feldspar: A domain specific language for digital signal processing algorithms. In FMMC, 2010.

[5] H. V. Balan et al. USC SDR, an easy-to-program, high data rate, real time software radio platform. In SRIF, 2013.

[6] M. Bansal et al. OpenRadio: a programmable wireless dataplane. In HotSDN, 2012.

[7] T. Bansal et al. Symphony: Cooperative packet recovery over the wired backbone in enterprise WLANs. In MOBICOM, 2013.

[8] G. Berry and G. Gonthier. The ESTEREL synchronous programming language: Design, semantics, implementation. SCP, 19(2), 1992.

[9] G. E. Blelloch and G. W. Sabot. Compiling collection-oriented languages onto massively parallel computers. JPDC, 8(2):119-134, Feb. 1990.

[10] E. Blossom. GNURadio: tools for exploring the radio frequency spectrum. Linux Journal, 2004(122):4, 2004.

[11] P. Caspi. Lucid Synchrone. In Actes du colloque INRIA OPOPAC, Lacanau. HERMES, November 1993.

[12] P. Caspi and M. Pouzet. Lucid Synchrone, a functional extension of Lustre. Technical report, Univ. Pierre et Marie Curie, Lab. LIP6, 2000.

[13] A. Courtney and C. Elliott. Genuinely functional user interfaces. In Haskell Workshop, 2001.

[14] D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion: From lists to streams to nothing at all. In ICFP, 2007.

[15] A. Dutta, D. Saha, D. Grunwald, and D. Sicker. CODIPHY: Composing on-demand intelligent physical layers. In SRIF, 2013.

[16] C. Elliott. Push-pull functional reactive programming. In Haskell Symposium, 2009.

[17] M. I. Gordon. Compiler Techniques for Scalable Performance of Stream Programs on Multicore Architectures. PhD thesis, 2010.

[18] J. Hughes. Generalising monads to arrows. SCP, 37(1-3), 2000.

[19] IEEE. Part 11: Wireless LAN MAC and PHY specifications: high-speed physical layer in the 5 GHz band, 1999. URL http://standards.ieee.org/getieee802/download/802.11a-1999.pdf.

[20] F. P. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommunications, 8:33-37, 1997.

[21] T. Li et al. CRMA: Collision-resistant multiple access. In MOBICOM, 2011.

[22] E. Moggi. Notions of computation and monads. Inf. Comput., 93(1), 1991.

[23] P. Murphy, A. Sabharwal, and B. Aazhang. Design of WARP: a wireless open-access research platform. In ESPC, 2006.

[24] M. C. Ng et al. Airblue: a system for cross-layer wireless protocol development. In ANCS, 2010.

[25] G. Nychis et al. Enabling MAC protocol implementations on software-defined radios. In NSDI, 2009.

[26] S. Peyton Jones. Harnessing the multicores: Nested data parallelism in Haskell. In APLAS, 2008.


[27] T. Rondeau et al. SIMD programming in GNURadio: Maintainable and user-friendly algorithm optimization with VOLK. In SDR WinnComm, 2013.

[28] S. Sen, R. R. Choudhury, and S. Nelakuditi. No time to countdown: Migrating backoff to the frequency domain. In MOBICOM, 2011.

[29] K. Tan et al. Sora: high performance software radio using general purpose multi-core processors. In NSDI, 2009.

[30] W. Thies and S. Amarasinghe. An empirical characterization of stream programs and its implications for language and compiler design. In PACT, 2010.

[31] W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: a language for streaming applications. In Compiler Construction, 2002.

[32] W. Thies et al. Teleport messaging for distributed stream programs. In PPoPP, 2005.
