
TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 02/11 pp130-141
Volume 20, Number 2, April 2015

A Survey of Language-Based Approaches to Cyber-Physical and Embedded System Development

Paul Soulier*, Depeng Li, and John R. Williams

Abstract: As computers continue to advance, they are becoming more capable of sensing, interacting, and communicating with the physical and cyber world. Medical devices, electronic braking systems in automotive applications, and industrial control systems are examples of the many Cyber-Physical Systems (CPS) that utilize these computing capabilities. Given the potential consequences of software related failures in such systems, a high degree of safety, security, and reliability is often required. Programming languages are important tools used by programmers to develop CPS. They provide a programmer with the ability to transform designs into machine code. Of equal importance is their ability to detect and avoid programming mistakes. The development of CPS has predominantly been accomplished using the C programming language. Although C is a powerful language, it lacks features present in other languages that facilitate the development of reliable systems. This has prompted research into language-based alternatives for improving program quality through the use of programming languages. This paper presents an overview of the characteristics of embedded and cyber-physical systems and the associated requirements imposed on programming languages. This is followed by a survey of relevant research into language-based methods for creating safe, reliable, and robust software for CPS.

Key words: cyber-physical systems; embedded systems; programming languages; type systems

1 Introduction

Cyber-Physical Systems (CPS) exist at the intersection of computation and the physical world. A CPS perceives the world through its sensors and effects change through connected actuators. In a form of feedback, sensors and external inputs influence computation that allows the system to interact with the physical world in a tangible way. CPS exist in various forms, sizes, and complexity, including small, stand-alone devices (e.g., sensor nodes or implanted medical devices) and subcomponents embedded in a larger system (e.g., fly-by-wire systems in aircraft). (The terms embedded system and cyber-physical system are generally used interchangeably.)

• Paul Soulier and Depeng Li are with University of Hawaii, Manoa, HI 96822, USA. E-mail: [email protected]; [email protected].

• John R. Williams is with Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA. E-mail: [email protected].

* To whom correspondence should be addressed.

Manuscript received: 2015-01-26; revised: 2015-03-29; accepted: 2015-03-30

CPS have become an intrinsic part of modern society. They can be found in appliances, medical devices, automotive applications, avionics, military weapons, industrial control systems, power grids, and countless other applications. The seemingly inexorable advances in hardware technology have enabled CPS to expand into new domains. Ubiquitous wireless connectivity has made possible the Internet of Things (IoT), where CPS will undoubtedly play a significant role. As new advancements are made in other disciplines (biotech, medicine, and robotics), it is easy to envision any number of possible applications where CPS will be an essential component.

Given current and potential future applications of CPS, the importance of being able to create safe, reliable, and secure software for these systems is self-evident. Developing such systems has been a long-standing challenge in computer science and software engineering. Software engineering and design methodologies, formal verification, simulation, and various other techniques have been devised to aid in the production of error-free software. Programming languages are another such tool. In much the same way that CPS exist at the boundary of the computational and physical worlds, programming languages bridge the gap between human-created concepts and the machine code that computers use to realize those concepts. Consequently, a language that effectively enables the transformation of concepts into code makes it more likely that the resulting system operates as expected.

The primary goal of this paper is to survey research focused on improving software quality in CPS through language-based techniques. To contextualize the relevance of language-based techniques to CPS, the unique characteristics of CPS are described as well as their influence on the design of programming languages. The contributions of this paper are as follows:

• Describe the elements of CPS that differentiate them from other application domains and influence the design of programming languages.

• Detail languages currently available for CPS development and the aspects of languages that affect their suitability for use as a CPS development tool.

• Survey the works over the period 2000-2014 intended to improve software quality and reliability of CPS through language-based techniques.

• Enumerate current challenges and open problems that exist with language-based techniques.

The structure of the paper is as follows: Section 2 details the aspects of CPS that differentiate them from other application domains. Section 3 covers the importance of programming languages and deficiencies that exist with current tools. Section 4 is a survey of works related to language-based techniques as they relate to CPS. Section 5 discusses open issues and challenges in language-based approaches, and the paper concludes in Section 6.

2 Characteristics of Cyber-Physical Systems

The development of software for CPS places many of the same expectations on a programming language as other application domains. Memory allocation, concurrency, and defining and manipulating data structures are all concerns. The differences between domains become more distinct when the amount of control over these common aspects of programming is examined. Many applications designed for general-purpose computers are not concerned with how fields are organized within a structure, the size of a data structure, where memory comes from when an object is allocated, or even when memory is released. Conversely, CPS are very attuned to these, and many other, aspects of a system. The manner in which data is represented, where it exists within a structure, and where it is stored can all have a dramatic influence on the ability of a CPS to function as needed. CPS also differ in functional requirements: reliability and real-time timing constraints can be significantly more important than in other fields. This section provides an overview of the characteristics of CPS that differentiate them from other application domains.

2.1 Reliability

High reliability is a trait frequently attributed to CPS. Users of general-purpose computing platforms are accustomed to their computer crashing or rebooting to install updates. While such events are unwanted, they seldom result in anything more than an inconvenience. Conversely, the failure of a system controlling a power grid or aircraft flight mechanics can have a significantly more profound impact. Failures of software in CPS can have catastrophic consequences, with examples in aerospace[1], military[2], medical device[3], and avionics[4] applications. These cases underscore the importance of software correctness in CPS and the potential consequences of software-related errors.

2.2 Security

For many CPS, where the device is physically separated from any source of unwanted external influence, security has not typically been a significant concern. As systems continue to grow in complexity, however, embedded systems that are not directly vulnerable to security threats are frequently connected to systems that are. A CPS can therefore be vulnerable to attack even when it is not directly accessible, as was demonstrated by the Stuxnet virus[5]. Wireless communication, Internet connectivity, and the IoT are further exposing CPS to new varieties of security threats. For example, Halperin et al.[6] have demonstrated that some Implantable Medical Devices (IMDs) are subject to a form of wireless attack.

Where software flaws were once the only significant mode of failure for CPS, these systems are now becoming vulnerable to modes of attack similar to those experienced by web services, personal computers, and other wireless or Internet-connected devices. As with reliability, it is primarily the consequence of a security breach that differentiates a CPS from other systems. A breach in a web service or database is likely to compromise data, whereas a breach in a CPS can compromise data and may additionally cause harm to people or property. Security is quickly becoming a significant aspect of CPS design.

2.3 Real-time requirements

Cyber-physical systems frequently interact with physical systems, which often requires them to react within a specific window of time. This differs considerably from other software applications. Consider a word processor: the difference between a 1 ms and a 10 ms response would likely be unnoticeable to a user in most circumstances. The same timing difference in a CPS, however, can have a significant impact. Take, for example, an electronic braking system in an automobile. A similar delay in response time could result in an increased braking distance, with obviously negative consequences. Timing and deadlines are critical in CPS. Software for a CPS can execute without error, properly perform the computation it was designed for, and still fail if it cannot complete the task within the required amount of time. Certain language features, such as garbage collection, can add an element of nondeterminism that complicates the task of developing a system that meets its real-time constraints. Real-time requirements are therefore an important distinction when defining program correctness in the domain of CPS.

2.4 Data representation

Data representation relates to the manner in which a program organizes and manipulates in-memory data structures. For many systems, managing the detailed nuances of how memory is allocated and the specific placement of data is a burden that is best managed by the run-time environment. CPS, on the other hand, care a great deal about these details.

A CPS routinely interfaces directly with hardware or communicates with other devices via well-defined protocols. To accomplish these tasks, a program must have control over the specific layout of data structures down to individual bits. In addition to functional necessity, data representation has a tremendous impact on performance. The organization of a data structure can be tuned to improve data locality and take advantage of CPU cache memory, or fields can be "packed" to minimize memory requirements.
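As an illustration of the bit-level control described above, the following is a minimal sketch in C. The 16-bit status word and all of the names are hypothetical (they are not taken from the paper or from any real device); the sketch only shows how shifts and masks pin every field to a known bit position.

```c
#include <stdint.h>

/* Hypothetical 16-bit status word (assumed layout, for illustration only):
 *   bits 15..12  channel number
 *   bits 11..2   10-bit ADC sample
 *   bit  1       overflow flag
 *   bit  0       data-ready flag
 */
#define STATUS_READY_MASK     0x0001u
#define STATUS_OVERFLOW_MASK  0x0002u
#define STATUS_SAMPLE_SHIFT   2
#define STATUS_SAMPLE_MASK    0x03FFu
#define STATUS_CHANNEL_SHIFT  12
#define STATUS_CHANNEL_MASK   0x000Fu

/* Build a status word; out-of-range inputs are truncated to the field width. */
uint16_t status_pack(unsigned channel, unsigned sample, int overflow, int ready)
{
    return (uint16_t)(((channel & STATUS_CHANNEL_MASK) << STATUS_CHANNEL_SHIFT) |
                      ((sample & STATUS_SAMPLE_MASK) << STATUS_SAMPLE_SHIFT) |
                      (overflow ? STATUS_OVERFLOW_MASK : 0u) |
                      (ready ? STATUS_READY_MASK : 0u));
}

/* Extract the 10-bit sample field. */
unsigned status_sample(uint16_t word)
{
    return (word >> STATUS_SAMPLE_SHIFT) & STATUS_SAMPLE_MASK;
}

/* Extract the 4-bit channel field. */
unsigned status_channel(uint16_t word)
{
    return (word >> STATUS_CHANNEL_SHIFT) & STATUS_CHANNEL_MASK;
}

/* Non-zero when the data-ready flag is set. */
int status_ready(uint16_t word)
{
    return (word & STATUS_READY_MASK) != 0;
}
```

Explicit masks are used here rather than C bit-fields because the ordering of bit-fields within a storage unit is implementation-defined, which makes them a poor fit when the layout is dictated by hardware or a wire protocol.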

2.5 Constrained environment

CPS are known for operating in resource-constrained environments. Memory is typically less plentiful and CPU clock speeds are often slower than on other hardware platforms. For some CPS, advances in hardware technology have enabled the use of fully-featured programming languages such as Java or Swift. However, many CPS still operate in highly constrained environments that do not allow the use of such languages.

Clearly, not all CPS have limited 8-bit processors and a few kilobytes of RAM. Some are equipped with large amounts of memory and powerful CPUs equivalent to those found in general-purpose computers, but still operate within a constrained environment. Systems of this nature are typically built for a specific purpose and have only enough computational ability to adequately perform a defined function. Additional hardware comes at the expense of additional cost, space, or power consumption. Adding more powerful hardware for the sole purpose of enabling the use of a feature-rich language is often not viable.

Another limited resource is energy. For data centers, high-performance clusters, or general-purpose computers, consuming less power is sometimes a goal and can bring financial and environmental benefits, but a constant power source is typically available. Power consumption presents a very different challenge when the energy source is a battery, a common characteristic of mobile devices and many CPS. These systems attempt to conserve power whenever possible, but may still have the opportunity to recharge. For a class of CPS, such as IMDs or remote sensor networks, recharging a battery is either difficult or simply not possible; energy is a finite and consumable resource. For these systems, effective power management is crucial.


2.6 Software updates

Software updates pose yet another challenge to CPS that is not found in many other environments. For many CPS, updating the device requires specific tools and processes and may incur significant costs. A software flaw in an automotive application may require thousands of vehicles to be recalled at great expense to the manufacturer. Some systems can be difficult to update (for example, remotely located sensor networks) or the task may simply not be possible (consider distant unmanned spacecraft).

Downtime is another component of software updates that can have a more significant impact when a CPS is involved. To perform an update, it is not unusual for a system to be taken off-line to complete the process. For general-purpose computing, this is a nuisance but nothing more. For a CPS in an industrial control application or an IMD, downtime may have a significantly larger impact.

3 Programming Languages and Cyber-Physical Systems

A programming language is the primary tool used by programmers to transform requirements and designs into code a computer can execute. A language that enables a programmer to effectively and efficiently describe a concept and detect errors early in the development process will result in more reliable software. Boehm and Basili[7] have proposed that the cost of fixing a software bug increases with each phase of development: a bug detected in the test phase is more expensive than one found during the design process. A language that can assist a developer in correctly realizing designs and detecting errors can have a significant impact on overall software quality. This section examines some of the most common languages presently used for CPS development as well as important language characteristics.

3.1 Current state-of-the-art

There exists a large variety of programming languages that support different paradigms, target specific domains, and are dynamically or statically typed. Even with numerous languages available, when put in context with the characteristics described in Section 2, only a handful are suitable for CPS development. The following is a list of the most common languages used for developing CPS.

• C: The C language[8] is a general-purpose programming language that is statically typed, type-unsafe, and memory-unsafe. It is, by an extremely large margin, the most common language used to develop CPS. C is a powerful language that can be used for virtually any programming task.

• C++: As the successor to C, C++[9] is largely a superset of the C language that adds language constructs for object-oriented programming and various other language features. Like its predecessor, C++ is statically typed and is neither type nor memory safe.

• Assembly: Assembly language is still used in CPS, often to access specific CPU instructions that are otherwise inaccessible in a high-level language. Assembly is untyped and unsafe.

• D: D[10] is a dialect of C and C++ that attempts to address various shortcomings of those languages. D is a statically typed and type-safe language.

• Ada: The Ada programming language[11] was originally developed for the U.S. Department of Defense for high-reliability systems. It is a type-safe and statically typed language. Ada is commonly found in military applications, avionics, and industrial systems that require a high degree of reliability.

3.2 Expression

A language's expressive ability relates to how well it allows a programmer to express the concepts necessary to implement an application. Expressive power also differs from one field to the next. For example, Javascript is better suited to developing a web application than assembly. Conversely, for a programmer who needs to utilize specific CPU instructions, assembly is far more expressive than Python. Languages well suited to developing CPS give a programmer greater control over how data is represented, how it is managed, and where it is located.

Another valued trait of languages used for developing CPS applications is transparency of expression. The term relates to the ability of a programmer to read source code and form a reasonably accurate mental model of the structure of the assembly code produced by the compiler. This characteristic is important for programmers who need to tune performance, understand the runtime costs associated with code, and manage code space for resource-constrained CPS.

3.3 Type system

A type in a programming language is a form of specification that defines various characteristics of the constructs within a language. A type system is the mechanism used to enforce that all specifications defined by the types in a language are adhered to. The primary role of a type system is to help promote program correctness and reduce bugs. This section describes the basic properties of a type system and addresses some issues that deserve special consideration in a language designed for CPS.

Memory and type safety are critical components of the type system. Ideally, the type system should reject any code that can undermine the underlying assumptions and rules of the language. By enforcing the rules of a language, the type system can guarantee the absence of certain classes of programming errors. Type safety ensures that an object created in memory can only be referenced as the type it was created as. Memory safety protects the system from erroneously accessing memory (e.g., by enforcing array boundaries).

Another aspect of the type system is the time at which the rules of the language are enforced. Dynamic type systems offer flexibility and relieve the programmer of a degree of additional specification within a program by automatically checking and enforcing type correctness at runtime. Conversely, static type systems attempt to enforce the type rules at compile time. Dynamic type systems are undesirable in embedded systems because latent type errors are detected only at runtime and are often unrecoverable, resulting in program failure. Static type systems allow type correctness to be verified earlier in the development process. While potentially requiring more effort on behalf of the programmer to properly define the type specifications in the system, this often results in systems with fewer runtime bugs. Due to the nature of embedded systems, namely the difficulty of updating software and the implications of software failures, it is especially important to identify errors early. Consequently, languages for CPS are generally statically typed.

4 Survey of Language-Based Approaches to CPS Development

This section presents a survey of language-based research with the goal of improving overall software quality and programmer productivity. The vast majority of the works found have focused on amending C through language extensions or syntactically similar dialects. The primary areas of research found addressed the following general topics: type and memory safety, concurrency, and memory management. Figure 1 provides an overview of the areas surveyed and the associated works.

4.1 Languages

There is a plentiful and varied selection of programming languages available for virtually every application domain. Language theory has continued to provide new type systems and abstractions to make programming more efficient and reliable. While not every language created gains widespread usage, most application domains periodically adopt new languages to reap the benefits of current technology. As mentioned previously, CPS are somewhat of an exception to this. Only a few research-based languages have been developed to address program safety and low-level programming in the context of CPS.

Cyclone[12], a dialect of C, addresses many of the shortcomings of C while maintaining many of the programming idioms commonly used in C programming. Cyclone, unlike C, provides type and memory safety through the use of additional pointer type specifications and annotations. A region-based memory management scheme guarantees that all memory accesses are safe and that unused memory is released. The language attempts to retain the expressive power and performance of C that are necessary for low-level programming while simultaneously providing language-based safety features.
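To give a feel for region-based memory management, the following is a minimal sketch of a region (arena) allocator in plain C. It is not Cyclone's implementation, and the names `region_t`, `region_init`, `region_alloc`, and `region_reset` are hypothetical. The important difference is that Cyclone checks region lifetimes statically, whereas nothing in this C sketch prevents a pointer from being used after its region has been reset. The sketch also assumes the caller supplies suitably aligned backing storage.

```c
#include <stddef.h>
#include <stdint.h>

/* Allocation granularity (an assumption chosen for simplicity). */
#define REGION_ALIGN (sizeof(void *))

/* A region backed by caller-supplied storage; names are illustrative. */
typedef struct {
    uint8_t *base;  /* start of the backing storage */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far */
} region_t;

void region_init(region_t *r, void *storage, size_t size)
{
    r->base = storage;
    r->size = size;
    r->used = 0;
}

/* Bump-allocate n bytes, rounded up to REGION_ALIGN.
 * Returns NULL when the region cannot satisfy the request. */
void *region_alloc(region_t *r, size_t n)
{
    size_t rounded = (n + REGION_ALIGN - 1) / REGION_ALIGN * REGION_ALIGN;
    if (rounded < n || rounded > r->size - r->used) {
        return NULL;  /* arithmetic overflow or out of space */
    }
    void *p = r->base + r->used;
    r->used += rounded;
    return p;
}

/* Release every object in the region at once. */
void region_reset(region_t *r)
{
    r->used = 0;
}
```

Because objects are released together when the region is reset, allocation and deallocation both take constant time, which is part of the appeal of regions for systems with timing constraints.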

The nesC language[13], also a C dialect, has been specifically designed for Wireless Sensor Networks (WSN) and resource-constrained platforms. The language has been designed to complement the TinyOS operating system, a commonly used OS for embedded systems. The nesC language is still type and memory unsafe, but has added various features to enable a more structured approach to software development. The language provides syntax and semantics that allow programs to be defined with components. Components contain internal implementations and external interfaces for interacting with other components. Additional safety is provided through static program analysis, which can detect some classes of run-time error such as data races.


Fig. 1 Overview of language-oriented research for developing cyber-physical systems.

4.2 Type and memory safety

Type and memory safety are critical components of a programming language that help ensure correctness. Although type and memory safe languages are plentiful, few are suitable for CPS. Because C is the predominant language used for CPS, a significant amount of work has focused on amending the shortcomings of the C/C++ type system through language transformations, extensions, or new dialects.

Listing 1 is a trivial memory copy example that illustrates some of the type and memory safety issues that arise in a typical C program. In this example, the C compiler has no way to verify that the source and destination memory locations are compatible with the range specified by the caller, so an incorrect size may result in memory corruption or a program fault. Furthermore, the routine uses "void" pointers to avoid the need for a duplicate function for every combination of possible types. This, however, prevents the compiler from checking whether the source and destination are compatible types. The works presented in this section attempt to resolve these type issues.

Listing 1 Memory safety code.
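For illustration only, and not as a reproduction of the authors' Listing 1, the following is a minimal sketch of the kind of routine the text describes. The function name `mem_copy` and its signature are assumptions.

```c
#include <stddef.h>

/* Byte-wise copy through untyped pointers (illustrative; the name is assumed).
 * The compiler cannot check that dst and src really refer to at least
 * `size` bytes, nor that they point at compatible types. */
void mem_copy(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < size; i++) {
        d[i] = s[i];
    }
}
```

Given `int a[4]` and `float b[4]`, a call such as `mem_copy(a, b, sizeof b)` is accepted without a diagnostic even though the element types differ, and a call that passes a size larger than either array is also accepted but has undefined behavior at run time. Both problems are invisible to the C type system because the pointer types carry no information about the extent or type of the memory they refer to.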

Necula et al.[13] developed the CCured type system for C to enhance the memory safety of pointer operations through the use of annotations. The type system adds pointer type qualifiers that support programming idioms common to C while enhancing the safety of the language. These qualifiers allow the compiler to statically verify many uses of pointers at compile time. For uses that cannot be checked statically, runtime checks are added to the code. The underlying representation of pointers is determined by the compiler and may vary in size. This presents challenges when interfacing with C libraries built with a standard compiler. In addition, the use of garbage collection potentially limits the use of CCured in certain CPS applications.

Deputy, by Condit et al.[14], provides an extension to the C language in the form of dependent types. Using annotations in C code, the programmer specifies constraints, such as ranges and boundaries, for various types. This enables the compiler to check program correctness by performing static compile-time analysis and inserting runtime checks where necessary. Because the bounds are expressed as annotations that refer to data already present in the program, Deputy is able to avoid changing the program's data representation.
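The idea can be sketched in plain C without using Deputy's actual annotation syntax. In the hand-written example below, the relationship between a pointer and an existing length parameter is stated only in a comment, and the bounds check that an annotation-driven compiler might insert automatically is written out manually. The name `checked_get` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed relationship (a comment here; an annotation in a dependently
 * typed extension): `data` points to exactly `len` readable elements.
 * Note that the pointer keeps its ordinary C representation; the bound
 * comes from a parameter the function already has. */
bool checked_get(const uint8_t *data, size_t len, size_t i, uint8_t *out)
{
    /* The kind of check a compiler could insert when it cannot prove
     * statically that the index is in range. */
    if (data == NULL || out == NULL || i >= len) {
        return false;
    }
    *out = data[i];
    return true;
}
```

When the index is a compile-time constant or is guarded by an enclosing loop condition, a static analysis could discharge the check and emit no run-time code at all, which is the motivation for combining static analysis with inserted checks.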

Cyclone[12] is a dialect of C that enhances the type system to avoid memory and type errors common in C code. By using additional syntax and type inference, Cyclone is capable of performing static analysis and inserting runtime checks when necessary to ensure that memory violations do not occur. The language uses type inference and parametric polymorphism to provide a type-safe alternative to the idiomatic use of "void" pointers shown in Listing 1. Cyclone was developed with the explicit intent to preserve the expressive power of C in developing low-level software.

4.3 Concurrency

Cyber-physical systems routinely interact with physical processes that occur in a non-deterministic fashion. As a result, CPS must manage a number of asynchronous events and use either thread-based or event-based mechanisms to accomplish this. While some debate exists[30–32] as to the better method, CPS have traditionally used event-driven mechanisms when resource constraints are a concern. This is primarily due to the fact that, in practice, thread-based implementations have substantially higher overheads in terms of code and data requirements.

Events are an efficient mechanism. They do, however, place additional burdens on the programmer. Operations that span multiple events require the programmer to manually manage state transitions and data. For processes that involve a large number of states, event-based mechanisms can also become excessively complicated. The pseudo-code in Listing 2 illustrates a simple event-driven process that receives a long data stream from a wireless radio in smaller, 64-byte blocks. The code has the following properties:

• The code implements two states: The first waits for a buffer to become available. Once available, the buffer is acquired and the code transitions to the next state. The second state repeats until all data has been received.

• State data must be explicitly managed. The programmer is required to manage where the information is stored as well as updating it.

• State transitions are also explicitly managed in the form of function pointer call-backs.

• The use of common language constructs, such as loops, is not possible when asynchronous events are present. In this example, loops must be translated manually into state transitions using function pointers and call-backs.

• Reusing code requires the integration of one state machine into another.
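As an illustration only (not the authors' Listing 2), the properties above can be sketched in C as a two-state machine driven by call-backs. All names are hypothetical, and the event loop that would call `rx_dispatch` is assumed to exist elsewhere in the system.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64u

struct rx_ctx;

/* One event: either "buffer available" or "block received" (illustrative). */
typedef struct {
    uint8_t *data;
    size_t   len;
} event_t;

typedef void (*state_fn)(struct rx_ctx *ctx, const event_t *ev);

/* Everything that must survive between events is stored by hand. */
struct rx_ctx {
    state_fn  next;      /* current state, held as a call-back */
    uint8_t  *buffer;    /* destination, set once a buffer is acquired */
    size_t    expected;  /* total length of the incoming stream */
    size_t    received;  /* bytes copied so far */
};

static void state_receive_block(struct rx_ctx *ctx, const event_t *ev);

/* State 1: wait for a buffer, acquire it, move to the receive state. */
static void state_wait_buffer(struct rx_ctx *ctx, const event_t *ev)
{
    if (ev->data == NULL || ev->len < ctx->expected) {
        return;                       /* unusable buffer: stay in this state */
    }
    ctx->buffer = ev->data;
    ctx->received = 0;
    ctx->next = state_receive_block;  /* explicit state transition */
}

/* State 2: the former loop body; it stays armed until all data has arrived. */
static void state_receive_block(struct rx_ctx *ctx, const event_t *ev)
{
    size_t remaining = ctx->expected - ctx->received;
    size_t n = ev->len;
    if (n > BLOCK_SIZE) n = BLOCK_SIZE;
    if (n > remaining)  n = remaining;
    memcpy(ctx->buffer + ctx->received, ev->data, n);
    ctx->received += n;
    if (ctx->received == ctx->expected) {
        ctx->next = NULL;             /* finished: no further state */
    }
}

/* Arm the state machine for a stream of `expected` bytes. */
void rx_start(struct rx_ctx *ctx, size_t expected)
{
    ctx->next = state_wait_buffer;
    ctx->buffer = NULL;
    ctx->expected = expected;
    ctx->received = 0;
}

/* Called by the (hypothetical) event loop for each incoming event. */
void rx_dispatch(struct rx_ctx *ctx, const event_t *ev)
{
    if (ctx->next != NULL) {
        ctx->next(ctx, ev);
    }
}

/* Non-zero once the whole stream has been received. */
int rx_done(const struct rx_ctx *ctx)
{
    return ctx->next == NULL;
}
```

Even in this small case the receive loop has disappeared from the source: it exists only implicitly, as the second state leaving itself installed in `ctx->next` until the byte count is reached.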

Listing 2 Event-based code.

Threads offer, from a programming perspective, a simplified way of managing asynchronous events. The pseudo-code in Listing 3 implements the same functionality as Listing 2. By most standards, the thread-based code is intuitive and needs little explanation beyond the code itself. The thread-based implementation contrasts with the event-driven mechanism in several important ways:

• Common language constructs, specifically loops, are usable.

• Code reuse is simplified and amounts to a simple function call.

• All states are implicitly managed; the programmer is not required to manually save state between asynchronous operations, and memory associated with state is automatically allocated and released.
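A thread-style version of the same receive operation might look like the C sketch below. This is an assumption-laden illustration rather than the paper's listing: `buffer_acquire` and `radio_read` are hypothetical stand-ins for calls that would block on a real system, and here they are simulated over in-memory data so the code runs without an actual thread library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64u

/* Simulated radio: a byte stream consumed from memory. */
struct radio {
    const unsigned char *data;
    size_t               len;
    size_t               pos;
};

static unsigned char pool[256];

/* Stand-in for a call that would block until a buffer is free. */
static unsigned char *buffer_acquire(size_t *capacity)
{
    *capacity = sizeof pool;
    return pool;
}

/* Stand-in for a call that would block until the next block arrives.
 * Returns the number of bytes read, at most max. */
static size_t radio_read(struct radio *r, unsigned char *dst, size_t max)
{
    size_t n = r->len - r->pos;
    if (n > max)
        n = max;
    memcpy(dst, r->data + r->pos, n);
    r->pos += n;
    return n;
}

/* Thread body: an ordinary loop with ordinary local variables. */
size_t receive_stream(struct radio *r, size_t expected, unsigned char **out)
{
    size_t capacity;
    size_t received = 0;   /* kept on the thread's own stack */
    unsigned char *buf = buffer_acquire(&capacity);

    assert(expected <= capacity);
    while (received < expected) {
        size_t want = expected - received;
        if (want > BLOCK_SIZE)
            want = BLOCK_SIZE;
        size_t n = radio_read(r, buf + received, want);
        if (n == 0)
            break;         /* stream ended early */
        received += n;
    }
    *out = buf;
    return received;
}

/* Simulated driver; returns the number of bytes received, or 0 on mismatch. */
size_t thread_demo(void)
{
    unsigned char stream[150];
    struct radio r = { stream, sizeof stream, 0 };
    unsigned char *buf = NULL;
    size_t i, n;

    for (i = 0; i < sizeof stream; i++)
        stream[i] = (unsigned char)i;
    n = receive_stream(&r, sizeof stream, &buf);
    return (buf != NULL && memcmp(buf, stream, sizeof stream) == 0) ? n : 0;
}
```

Reuse here is just a call to `receive_stream`, and the loop counter needs no explicit storage. The price on a real system is that each such thread needs its own stack while it is blocked.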

Clearly, thread-based mechanisms appear to simplify programming: code need not be broken into separate routines for each state, loops are usable, and reusing code amounts to a simple function call. However, threads are not without drawbacks. Traditional thread implementations carry a significant cost in both memory and runtime overhead. Not surprisingly, the general trend of research seeks to provide thread-based semantics while reducing the overhead typically associated with traditional threading implementations. The majority of this research is centered on sensor networks, which is not unexpected given the resource constraints encountered in such systems. Although the focus may be on sensor networks, the work is equally applicable to any embedded or cyber-physical system.

Listing 3 Thread-based code.

One of the primary issues that arises from event-based implementations is complexity. Complex systems often have numerous distinct events associated with a single action. Event-based methods are nevertheless frequently used due to their efficiency. Within the context of event-driven programming, several approaches have been taken to minimize the limitations associated with complexity. The nesC language, developed by Gay et al.[15], is an extension to the C language designed specifically for the highly constrained environment found in sensor network applications. In conjunction with TinyOS[33], the language provides a structured approach to event handling to enhance developer productivity. One drawback of nesC is the focus on resource-constrained systems. The compiler utilizes whole-program compilation to enable effective optimization of type checking; as such, it is not well-suited for large-scale projects.

Kasten and Romer[17] identified the static nature of event-driven software and the management of state information as two limitations of event-based programming. They proposed a language that utilizes finite state machines to enable more flexibility in the construction of software that handles asynchronous events by improving modularity and reducing overall complexity. State data is managed with state variables that behave like traditional local variables, but automatic memory management is provided by the language. This enables efficient sharing of data between states.

Bernauer et al.[16, 34] sought to combine the most favorable characteristics of event- and thread-based paradigms by extending the nesC language to allow a programmer to write code using the semantics of threads. The compiler then transforms this code into equivalent event-based code. The compiler statically allocates memory to store local variables used to maintain state information. Due to the static nature of memory allocation, recursive function calls are not possible, and this language assumes a cooperative multitasking model.

Protothreads (Dunkels et al.[19, 20]) provide a mechanism that permits a programming style similar to the sequential style used with threads. Protothreads are implemented using only standard C language constructs and are designed to have extremely low overhead and to be used in conjunction with an event-driven system. Through the use of C macros, this system interleaves code within a C “switch” statement. All threads of execution share the same stack. This has the advantage of not requiring a unique stack for each distinct thread, but requires the programmer to manually manage state when a blocking operation is performed. Because a standard C compiler is used, the rules associated with Protothreads are not enforced by the compiler, and the burden of adhering to these rules falls on the programmer.
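The switch-based technique can be sketched in a few lines of C. The macros below (`CO_BEGIN`, `CO_WAIT_UNTIL`, `CO_END`) are hypothetical names written for this illustration and are not the Protothreads library's own API; they only show the general idea of recording a resume point and jumping back to it through a `switch`. The event source is simulated with two global variables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macros for illustration; not the Protothreads API. */
struct co { unsigned short line; };   /* resume point: the only per-thread state */

#define CO_WAITING 0
#define CO_DONE    1

#define CO_INIT(c)   ((c)->line = 0)
#define CO_BEGIN(c)  switch ((c)->line) { case 0:
#define CO_WAIT_UNTIL(c, cond)                 \
    do {                                       \
        (c)->line = __LINE__; case __LINE__:   \
        if (!(cond)) return CO_WAITING;        \
    } while (0)
#define CO_END(c)    } (c)->line = 0; return CO_DONE

#define BLOCK_SIZE 64u

struct receiver {
    struct co co;
    size_t    received;  /* kept here because locals do not survive a wait */
    size_t    expected;
};

/* Simulated event inputs, set by the event loop. */
static int    block_ready;
static size_t block_len;

/* Reads like a sequential loop, but returns to the caller at every wait. */
static int receiver_run(struct receiver *rx)
{
    CO_BEGIN(&rx->co);
    rx->received = 0;
    while (rx->received < rx->expected) {
        CO_WAIT_UNTIL(&rx->co, block_ready);
        block_ready = 0;
        rx->received += block_len;
    }
    CO_END(&rx->co);
}

/* Simulated event loop; returns the number of bytes accounted for. */
size_t proto_demo(void)
{
    struct receiver rx = { { 0 }, 0, 150 };
    size_t remaining = 150;

    CO_INIT(&rx.co);
    block_ready = 0;
    while (receiver_run(&rx) == CO_WAITING) {
        /* The "radio" delivers the next block of at most 64 bytes. */
        block_len = remaining > BLOCK_SIZE ? BLOCK_SIZE : remaining;
        remaining -= block_len;
        block_ready = 1;
    }
    return rx.received;
}
```

The sketch also shows the limitation noted above: `received` has to live in the `receiver` structure because a local variable would be lost each time the function returns, and nothing in the compiler checks that the programmer respects this rule.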

Many CPS do not use true parallelism. It is often unnecessary, or the hardware is a uniprocessor. For such systems, the need for costly synchronization mechanisms can be avoided by using cooperative multi-threading. To avoid the additional overhead typically required by threads, various approaches have been devised for stack sharing[21, 22, 24, 35]. These techniques provide the behavior expected from threads without the need for manual state management while reducing memory overhead. This comes at the cost of reduced runtime performance that results from stack swapping and other operating overhead.

The works discussed thus far have focused on systems where parallel execution is not used or does not raise synchronization concerns between threads executing in parallel. However, multi-core hardware is becoming more commonplace. Cyclone[12] provides many desirable traits for programming CPS but does not specifically address concurrency. The work by Grossman[25] proposes an approach to concurrency in Cyclone that remains type-safe and provides race-free access to shared data.

4.4 Memory management

Managing memory allocation has always posed a challenge to programmers. For modern high-level languages, the need for manual memory management has largely been obviated by the use of garbage collecting systems. For CPS, however, manual memory management is still necessary in many circumstances: garbage collectors impose significant runtime overhead and non-deterministic timing effects that are often unacceptable. This section examines some alternatives that attempt to provide efficient automatic memory management while maintaining a sufficient level of runtime performance.

Originally proposed by Tofte et al.[36, 37], region-based memory management provides a compelling mechanism for memory management in CPS. In region-based memory management, each object or structure is allocated in a specific region. The region may be defined automatically by the compiler or manually by the programmer. In either case, the memory associated with a region is not released until none of the objects allocated in the region are needed any longer, at which point the region is released as a whole. In essence, region-based memory techniques attempt to minimize the cost associated with automatic memory management by spreading it over a collection of related objects. The language can statically check that programs are correct at compile time while the compiler inserts code to perform the dynamic management at runtime. In addition to avoiding common pitfalls of memory management, related data can be co-located to produce good locality that can lead to better cache and overall system performance.
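The runtime half of this idea can be illustrated with a minimal region (arena) allocator in C. This is a sketch under stated assumptions, not the mechanism of any system cited here: the names `region_init`, `region_alloc`, and `region_free` are hypothetical, the 16-byte alignment is assumed to be sufficient for the target, and nothing is checked statically, whereas the surveyed languages rely on the compiler or type system for that part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define REGION_ALIGN 16u   /* assumed sufficient for the target's basic types */

/* A region is one contiguous block; objects are carved out of it in order. */
struct region {
    unsigned char *base;
    size_t         capacity;
    size_t         used;
};

/* Returns nonzero on success. */
int region_init(struct region *r, size_t capacity)
{
    r->base     = malloc(capacity);
    r->capacity = capacity;
    r->used     = 0;
    return r->base != NULL;
}

/* Bump allocation: no per-object header and no per-object free. */
void *region_alloc(struct region *r, size_t size)
{
    size_t start = (r->used + (REGION_ALIGN - 1)) & ~(size_t)(REGION_ALIGN - 1);

    if (start > r->capacity || size > r->capacity - start)
        return NULL;
    r->used = start + size;
    return r->base + start;
}

/* Releases every object in the region at once. */
void region_free(struct region *r)
{
    free(r->base);
    r->base     = NULL;
    r->capacity = 0;
    r->used     = 0;
}

struct node {
    int          value;
    struct node *next;
};

/* Builds a ten-element list inside one region and returns the sum of values. */
int region_demo(void)
{
    struct region r;
    struct node *head = NULL;
    int sum = 0;

    if (!region_init(&r, 1024))
        return -1;
    for (int i = 1; i <= 10; i++) {
        struct node *n = region_alloc(&r, sizeof *n);
        assert(n != NULL);
        n->value = i;
        n->next  = head;
        head     = n;
    }
    for (struct node *n = head; n != NULL; n = n->next)
        sum += n->value;
    region_free(&r);   /* one call releases all ten nodes */
    return sum;
}
```

Because the nodes are allocated back to back, the list also ends up contiguous in memory, which is the locality benefit mentioned above. What plain C cannot do is stop a caller from using `head` after `region_free`.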

Gay and Aiken[27] described region-based memory management for dynamic memory. Their approach offers both explicit freeing of regions and reference-counted regions, and is dynamically checked at runtime. Grossman et al.[26] detailed the region-based memory management used in Cyclone[12]. Their system uses additional annotations in code to allow compile-time, static checking of memory regions. The system used in Cyclone also applies regions to stack-allocated memory to prevent invalid references from occurring. In C, it is possible to bind an external reference to a local variable. When the function in which the local variable was declared returns, the memory is released and any reference to that data is no longer valid. The type system in Cyclone prevents this through the use of regions.

Linear types are another interesting potential method of memory management in CPS. With linear types, an object can be referenced by only a single entity. Once that reference ceases to exist, there can be no other references and the object can be released. Linear types require little runtime overhead, making them well suited for CPS. Although linear types provide guaranteed memory management, they come at the expense of sharing data through aliases. Walker and Watkins[28] examined combining linear types and region-based memory management. Event-driven systems, common in CPS, often communicate through messages. Fahndrich et al.[29] discussed efficient and safe message-based communication using linear types.

5 Open Challenges

Many issues pertaining to language-based techniques for improving the software quality of CPS have well-understood solutions. Others are still open challenges that have yet to be addressed. Furthermore, of the issues that have been addressed, there are no languages that incorporate all of the potential techniques. This section reviews important areas of research in language-based approaches to improving software quality in CPS that do not have adequate solutions.

5.1 Combining safety, expression, and performance

Software designs often require trade-offs to achieve specific goals. Additional memory may be needed to meet performance requirements, or a useful abstraction that makes a task easier may degrade performance. In language design, similar issues exist. Abstractions can reduce performance or limit expressiveness. Safety enforced through run-time checking can degrade performance. It remains to be seen if a language can be designed such that it simultaneously offers acceptable levels of safety, expressiveness, and performance.

5.2 Unsafe code

In various circumstances, the rules of a type system interfere with the ability to accomplish a task. Memory management or access to a raw address that contains a memory-mapped register are common examples in CPS. For general-purpose computing, the need for such facilities is rare, and using an unsafe secondary language or less efficient mechanisms for isolated portions of code is a reasonable solution. These situations arise more frequently in CPS, necessitating a more comprehensive solution.

Most languages well-suited for CPS provide the ability to subvert the type system in some manner. Allowing such operations opens the door to various safety and security issues. An adequate solution that provides raw memory access while still providing strong guarantees regarding the integrity of the type system is not present in any language.
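As a small illustration of the kind of operation in question, the C sketch below accesses a register through a pointer manufactured from an integer address. It rests on several assumptions: the register layout, the `CTRL_ENABLE_BIT` name, and the address in the comment are hypothetical, and an ordinary variable stands in for the hardware register so that the code can run on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* On real hardware the address would come from the device documentation,
 * for example (hypothetical address):
 *     #define CTRL_REG_ADDR ((uintptr_t)0x40000000u)
 * Here a plain variable stands in for the register. */
static uint32_t fake_register;

#define CTRL_ENABLE_BIT (1u << 0)   /* hypothetical bit assignment */

/* Integer-to-pointer conversion: the compiler accepts any address,
 * valid or not, so the type system offers no protection here. */
static volatile uint32_t *reg_from_address(uintptr_t addr)
{
    return (volatile uint32_t *)(void *)addr;
}

/* Clears the register, sets the enable bit, and reads the value back. */
uint32_t mmio_demo(void)
{
    volatile uint32_t *ctrl =
        reg_from_address((uintptr_t)(void *)&fake_register);

    *ctrl = 0;                   /* volatile: each access is really performed */
    *ctrl |= CTRL_ENABLE_BIT;
    return *ctrl;
}
```

The cast in `reg_from_address` is exactly the point at which type-system guarantees end: passing a wrong address compiles equally well, which is why the text treats this as an open problem rather than a solved one.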

5.3 Timing semantics

As noted by Lee[38], systems with timing deadlines may execute code correctly but still fail to function as designed by missing a timing constraint. Currently, timing is verified through testing, simulation, or other mechanisms. To date, languages have no mechanism to specify timing requirements in code. With numerous hardware platforms, each with unique timing and performance characteristics, specifying timing requirements in code is difficult.

5.4 Concurrency

This survey has shown works that provide highly efficient concurrency mechanisms, and the Cyclone language provides compiler support for type-safe, preemptive systems using a more traditional “heavy-weight” thread model. However, there does not exist a system that simultaneously addresses both of these aspects of parallel programming. Multi-core hardware is now common, and developing error-free software that exploits this potential parallelism is difficult. Thread-based systems with semaphores or other synchronization primitives still have a high degree of overhead, while event-based systems face a significant increase in code complexity. In the domain of cyber-physical and embedded systems, no adequate solution exists.

5.5 Acceptance

Possibly one of the most significant issues in designing a new language is achieving even a moderate degree of acceptance from the programming community. From an organizational perspective, adopting a new programming language is difficult for a variety of reasons.

• Cost: Selecting a new language can have significant cost overhead, especially to smaller organizations, in both time and financial resources.

• Standardization: CPS are developed for a wide variety of hardware platforms that differ in architecture (RISC vs. CISC, 8-bit vs. 32-bit processors, etc.). A project may use different platforms from one generation to the next. This may also require a change in vendors that supply the compiler tool chain. Without standardization, switching from one vendor to another may result in costly porting efforts due to incompatibilities in compiler implementation.

• Existing Code Base: For any organization that has a substantial code base, using a new language can pose difficult logistical issues. Software engineers must be familiar with multiple languages or must be involved in non-trivial porting efforts.

• Inertia: Learning a new language takes considerable effort for both organizations and individual software developers. It is often easier to simply continue to use existing tools despite known flaws.

6 Conclusions

Cyber-physical systems exist in the Internet of Things, implantable medical devices, smart appliances, and a multitude of other technologies. Advances in computing technology and other fields that rely on computers will likely continue to fuel the growth of CPS. The ability to develop such systems with quality, safety, and security is clearly important. To the best of our knowledge, this is the first survey of language-based techniques for improving software designed for CPS.

As with many application domains, CPS possescharacteristics that make domain-specific languages anecessity. In the first part of this paper, backgroundwas provided to illustrate the unique programmingchallenges often encountered in CPS. Elements ofCPS that differentiate them from other applicationdomains and the associated requirements imposedon programming languages used to build them wereelaborated on. The second part of the paper presenteda survey of the language-based techniques aimed atimproving program correctness for CPS in addition toopen challenges pertaining to the practical adoption anduse of these techniques.

The C language, although powerful, is inherently unsafe and lacks many of the features found in modern programming languages. Despite these limitations and the availability of languages with better safety, C is still the most widely used language for building embedded and cyber-physical systems. Given the potential use and impact of these systems, there is a need to develop safe, reliable, and secure software. It is interesting to note that research in the areas discussed in this paper has dwindled in recent years; one can only speculate as to the reasons for this. As CPS become increasingly complicated and pervasive in society, new languages and language-based methodologies will be crucial to ensuring these systems function as expected.

References

[1] J. Lions, Report by the inquiry board on the Ariane 5 flight 501 failure, Joint Communication ESA-CNES, 1996.

[2] E. Marshall, Fatal error: How Patriot overlooked a Scud, Science, vol. 255, no. 5050, pp. 1347–1347, 1992.

[3] N. G. Leveson and C. S. Turner, An investigation of the Therac-25 accidents, Computer, vol. 26, no. 7, pp. 18–41, 1993.

[4] C. Bolkcom, V-22 Osprey tilt-rotor aircraft, DTIC Document, 2004.

[5] R. Langner, Stuxnet: Dissecting a cyberwarfare weapon, Security & Privacy, IEEE, vol. 9, no. 3, pp. 49–51, 2011.

[6] D. Halperin, T. S. Heydt-Benjamin, B. Ransford, S. S. Clark, B. Defend, W. Morgan, K. Fu, T. Kohno, and W. H. Maisel, Pacemakers and implantable cardiac defibrillators: Software radio attacks and zero-power defenses, in Security and Privacy, 2008. SP 2008. IEEE Symposium on, 2008, pp. 129–142.

[7] B. Boehm and V. R. Basili, Software defect reduction top 10 list, Computer, vol. 34, no. 1, pp. 135–137, 2001.

[8] International Organization for Standardization, Programming languages - C, Geneva, Switzerland, ISO 9899:TC2, 1999.

[9] International Organization for Standardization, Programming language C++, Geneva, Switzerland, ISO 14882:2011, 2011.

[10] A. Alexandrescu, The D Programming Language. Addison-Wesley Professional, 2010.

[11] S. T. Taft, Ada 2005 Reference Manual. Language and Standard Libraries: International Standard ISO/IEC 8652/1995 (E) with Technical Corrigendum 1 and Amendment 1. Springer, 2006, vol. 4348.

[12] T. Jim, J. G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang, Cyclone: A safe dialect of C, in USENIX Annual Technical Conference, General Track, 2002, pp. 275–288.

[13] G. C. Necula, J. Condit, M. Harren, S. McPeak, and W. Weimer, CCured: Type-safe retrofitting of legacy software, ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 27, no. 3, pp. 477–526, 2005.

[14] J. Condit, M. Harren, Z. Anderson, D. Gay, and G. C. Necula, Dependent types for low-level programming, in Programming Languages and Systems. Springer, 2007, pp. 520–535.

[15] D. Gay, P. Levis, R. Von Behren, M. Welsh, E. Brewer, and D. Culler, The nesC language: A holistic approach to networked embedded systems, ACM Sigplan Notices, vol. 38, no. 5, pp. 1–11, 2003.

[16] A. Bernauer, K. Romer, S. Santini, and J. Ma, Threads2events: An automatic code generation approach, in Proceedings of the 6th Workshop on Hot Topics in Embedded Networked Sensors, ACM, 2010, p. 8.

[17] O. Kasten and K. Romer, Beyond event handlers: Programming wireless sensors with attributed state machines, in Proceedings of the 4th International Symposium on Information Processing in Sensor Networks, 2005, p. 7.

[18] A. Adya, J. Howell, M. Theimer, W. J. Bolosky, and J. R. Douceur, Cooperative task management without manual stack management, in USENIX Annual Technical Conference, General Track, 2002, pp. 289–302.

[19] A. Dunkels, O. Schmidt, T. Voigt, and M. Ali, Protothreads: Simplifying event-driven programming of memory-constrained embedded systems, in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems, 2006, pp. 29–42.

[20] A. Dunkels, O. Schmidt, and T. Voigt, Using protothreads for sensor node programming, in Proceedings of the REALWSN, 2005.

[21] S. Rossetto and N. d. L. R. Rodriguez, A cooperative multitasking model for networked sensors, in ICDCS Workshops, Citeseer, 2006, p. 91.

[22] W. P. McCartney and N. Sridhar, Stackless preemptive multithreading for TinyOS, in Distributed Computing in Sensor Systems and Workshops (DCOSS), 2011 International Conference on, 2011, pp. 1–8.

[23] J. Sallai, M. Maroti, and A. Ledeczi, A concurrency abstraction for reliable sensor network applications, in Reliable Systems on Unreliable Networked Platforms. Springer, 2007, pp. 143–160.

[24] C. Nitta, R. Pandey, and Y. Ramin, Y-threads: Supporting concurrency in wireless sensor networks, in Distributed Computing in Sensor Systems. Springer, 2006, pp. 169–184.

[25] D. Grossman, Type-safe multithreading in Cyclone, ACM Sigplan Notices, vol. 38, no. 3, pp. 13–25, 2003.

[26] D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney, Region-based memory management in Cyclone, ACM Sigplan Notices, vol. 37, no. 5, pp. 282–293, 2002.

[27] D. Gay and A. Aiken, Language support for regions, ACM Sigplan Notices, vol. 36, no. 5, pp. 70–80, 2001.

[28] D. Walker and K. Watkins, On regions and linear types, ACM Sigplan Notices, vol. 36, no. 10, pp. 181–192, 2001.

[29] M. Fahndrich, M. Aiken, C. Hawblitzel, O. Hodson, G. Hunt, J. R. Larus, and S. Levi, Language support for fast and reliable message-based communication in Singularity OS, ACM SIGOPS Operating Systems Review, vol. 40, no. 4, pp. 177–190, 2006.

[30] H.-J. Boehm, Threads cannot be implemented as a library, ACM Sigplan Notices, vol. 40, no. 6, pp. 261–268, 2005.

[31] J. Ousterhout, Why threads are a bad idea (for most purposes), presentation at the 1996 Usenix Annual Technical Conference, San Diego, CA, USA, 1996.

[32] J. R. von Behren, J. Condit, and E. A. Brewer, Why events are a bad idea (for high-concurrency servers), in HotOS, 2003, pp. 19–24.

[33] P. Levis, S. Madden, J. Polastre, R. Szewczyk, K. Whitehouse, A. Woo, D. Gay, J. Hill, M. Welsh, E. Brewer, et al., TinyOS: An operating system for sensor networks, in Ambient Intelligence. Springer, 2005, pp. 115–148.

[34] A. Bernauer and K. Romer, A comprehensive compiler-assisted thread abstraction for resource-constrained systems, in Information Processing in Sensor Networks (IPSN), 2013 ACM/IEEE International Conference on. IEEE, 2013, pp. 167–177.

[35] B. Gu, Y. Kim, J. Heo, and Y. Cho, Shared-stack cooperative threads, in Proceedings of the 2007 ACM Symposium on Applied Computing, 2007, pp. 1181–1186.

[36] M. Tofte and J.-P. Talpin, Region-based memory management, Information and Computation, vol. 132, no. 2, pp. 109–176, 1997.

[37] M. Tofte and L. Birkedal, A region inference algorithm, ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 20, no. 4, pp. 724–767, 1998.

[38] E. A. Lee, Cyber physical systems: Design challenges, in Object Oriented Real-Time Distributed Computing (ISORC), 2008 11th IEEE International Symposium on. IEEE, 2008, pp. 363–369.

Paul Soulier received his BS degree in electrical engineering from the University of Colorado, Colorado Springs in 1999. He has worked in the private sector for 15 years developing real-time, embedded systems. Currently, he is a graduate student at the University of Hawaii, Manoa, specializing in security, cyber-physical systems, and programming languages.

Depeng Li obtained his PhD degree in computer science from Dalhousie University, Canada in 2010. He is currently an assistant professor in the Department of Information and Computer Sciences (ICS) at the University of Hawaii at Manoa (UHM). His research interests are in security, privacy, and applied cryptography. His research projects span areas such as the Internet of Things, smart grids, mobile health-tech, and the physical-human-cyber triad.

John R. Williams (PhD, Swansea University, UK) is a professor of information engineering, civil and environmental engineering, and engineering systems at Massachusetts Institute of Technology, and he is also the director of the Auto-ID Laboratory at MIT. His research area of specialty is large-scale computer analysis applied to both physical systems and to information. He has been named, alongside Bill Gates and Larry Ellison, as one of the 50 most powerful people in Computer Networks.