
IEEE TRANSACTIONS ON ROBOTICS, VOL. 21, NO. 3, JUNE 2005

How UGVs Physically Fail in the Field

Jennifer Carlson, Student Member, IEEE, and Robin R. Murphy, Member, IEEE

Abstract—This paper presents a detailed look at how unmanned ground vehicles (UGVs) fail in the field using information from 10 studies and 15 different models in Urban Search and Rescue or military field applications. One explores failures encountered in a limited amount of time in a real crisis (World Trade Center rescue response). Another covers regular use of 13 robots over two years. The remaining eight studies are field tests of robots performed by the Test and Evaluation Coordination Office at Fort Leonard Wood. A novel taxonomy of UGV failures is presented which categorizes failures based on the cause (physical or human), its impact, and its repairability. Important statistics are derived and illustrative examples of physical failures are examined using this taxonomy. Reliability in field environments is low, between 6 and 20 hours mean time between failures. For example, during the PANTHER study (F. Cook, 1997) 35 failures occurred in 32 days. The primary cause varies: one study showed 50% of failures caused by effectors; another study showed 54% of failures occurred in the control system. Common causes are: unstable control systems, platforms designed for a narrow range of conditions, limited wireless communication range, and insufficient bandwidth for video-based feedback.

Index Terms—Failure, failure analysis, field, meta-study, mobile robot, taxonomy, unmanned ground vehicle (UGV).

NOMENCLATURE

The following list of definitions covers the terminology used in this paper and is provided as a reference.

Availability: Probability that a system will be error free at some given point in time. See (1).

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \tag{1}$$

Control system: A robot subsystem that includes the onboard computer, manufacturer-provided software, and any remote operator control units (OCU).

Effector: Any device that performs actuation and any connections related to those components.

Error: A state within the system which can lead to a failure.

Failure: The inability of the robot or the equipment used with the robot to function normally.

Manuscript received October 23, 2003; revised June 8, 2004. This paper was recommended for publication by Associate Editor N. Sarkar and Editor S. Hutchinson upon evaluation of the reviewers' comments. This work was supported in part by a Department of Energy RIMCC grant and in part by the Office of Naval Research under Grant N00773-99P1543.

The authors are with Computer Science and Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TRO.2004.838027

Fault: Anything which could cause the system to enter an error state.

Field environment: An environment which has not been modified to ensure the safety of the robot or to enhance its performance.

Field-repairable failure: A failure that can be repaired under favorable environmental conditions, by a trained operator, with the equipment that commonly accompanies the robot into the field.

Mistakes: Human failures caused by fallacies in conscious processing.

MTBF: Mean time between failures.

MTTR: Mean time to repair.

Nonfield-repairable failure: A failure that is not field-repairable.

Nonterminal failure: A failure that introduces some noticeable degradation of the robot's capability to perform its mission.

Slips: Human failures caused by fallacies in unconscious processing.

Teleoperated: Manually controlled by an operator at a distance that is too great for the operator to see what the robot is doing [6].

Terminal failure: A failure that terminates the robot's current mission.

UGV: Unmanned ground vehicle. A ground-based mechanical device that can sense and interact with its environment.

I. INTRODUCTION

THE SUBSTANTIAL risk to human life associated with Urban Search and Rescue (USAR) and modern Military Operations in Urban Terrain (MOUT) has led professionals in each of these domains to explore the use of advanced technology, like robotics, to improve safety. Unmanned ground vehicles (UGVs) are appealing for USAR and the military because they can be sent either ahead of (or in place of) humans, do things humans cannot, and can send back a wide variety of data from their on-board sensors. Though a few UGVs have been used in real USAR (e.g., the World Trade Center (WTC) rescue response) and military operations (such as cave reconnaissance in Afghanistan [2]), they are far from commonplace. The final acceptance of UGV technology in these new application areas will depend as much on their reliability as on the capabilities of the robot platform (such as the ability to detect chemical or biological agents). A fragile robot in constant need of maintenance and repair is likely to be left behind to make room for more reliable equipment.

Recent studies on UGVs used in the field have shown a noticeable lack of reliability in real field conditions. In [3] the MTBF for field robots was a little over 6 hours. The analysis in [4] showed that tethered robots required assistance through the tether an average of 2.8 times/min. The Test and Evaluation Coordination Office (TECO) at Fort Leonard Wood has reported an MTBF of less than 20 hours [5].

This paper presents a detailed look at why UGV reliability is so low in the field. It serves as a meta-study drawing on ten studies of ground robots used for either USAR or MOUT applications. Two of the studies focus explicitly on robot failures and explore the problems encountered over a two-week period in a real crisis (WTC rescue response) and day-to-day use over a two-year period. The other eight are field tests of robot platforms carried out by TECO.

The studies cover a wide variety of equipment, from small tracked vehicles capable of changing their geometry to a modified M1 tank (over 60 tons). All of these platforms can be considered to be UGVs. A UGV is defined as a ground-based mechanical device that can sense and interact with its environment. It may possess any level of autonomy with respect to its human operator(s), from manual (where the human has complete control) to fully autonomous (where the robot can carry out assigned tasks on its own). The majority of the robots considered in all ten studies are teleoperated, or manually controlled by an operator at a distance that is too great for the operator to see what the robot is doing [6].

Failure is a term that is given a wide range of definitions, based on the needs of the individual or group assessing the usefulness of technology for a new task. Reliability metrics like MTBF and availability are very sensitive to differences in the criteria used to determine if an event can be called a failure. Therefore, it is difficult to directly compare results across studies or to apply findings to new circumstances. For the purposes of this paper, a failure is defined as the inability of the robot or the equipment used with the robot to function normally. Both complete breakdowns and noticeable degradations are included. This general definition of failure is used due to the wide variety of tasks and vehicles considered.

There are a variety of factors which make field work particularly challenging for robots compared to lab conditions. The field environment is defined as an environment which has not been modified to ensure the safety of the robot or to enhance its performance. Conditions which a robot may encounter in a field environment for USAR and MOUT include: dirt, standing water, rain, intense heat, intense cold, confined spaces (see Fig. 1), uneven surfaces (see Fig. 2), the presence of obstacles with unpredictable movement, and hostile agents. Field robots used by the military and for USAR have to be packed and transported to remote locations. This adds the requirement that the robot itself be designed to be safely packed.

Fig. 1. An Inuktun MicroVGTV inside a confined space training maze.

Fig. 2. An iRobot Packbot climbing a rubble pile used for USAR training.

The rest of this paper is organized as follows. Section II provides an overview of related work in robot reliability studies, evaluation of robots for field applications, and failure analysis methods. Section III describes a new method for UGV failure analysis developed for this study. The method includes a novel taxonomy of UGV failures (Section III-A) based on a synthesis of taxonomies from the fields of dependability computing, human-computer interaction, and robotics. This section also includes a description of the metrics used in this meta-study, and recommendations for improvement for future studies (Section III-B). Section IV provides a more detailed description of the studies from which this paper draws. Section V provides a closer look at failures encountered in the field using the taxonomy defined in Section III-A and examples from the studies described in Section IV which highlight the challenges involved in using UGVs in field environments. Section VI provides some general findings and discusses the implications of those findings for robot manufacturers, robot-related project managers, and other researchers working on developing fault-tolerance for UGVs. Section VI closes with a discussion of possibilities for future work in this area.

II. RELATED WORK

This section provides a brief overview of related work on robot reliability, evaluation of robots used in field environments, and failure analysis methods. Section II-A covers robot reliability studies, and qualitative studies of mobile robot suitability for the USAR domain. The studies covered in Section II-A are not included in the meta-study for one of two reasons: they study robots used in indoor, controlled environments; or they present shortcomings of mobile robots for USAR based on the characteristics of the domain, rather than the examination of documented failures. Section II-B covers the two distinct approaches to failure and reliability analysis found in the literature: characterization and formal validation. Characterization approaches group or classify documented failures based on a pre-defined set of categories, usually examining the set of failures as a whole and each group individually. Formal validation approaches create a model of the system to be studied and then examine that model to determine if it meets reliability criteria.

A. Reliability Studies and Qualitative Analyses

Two studies were found in the literature that, like this meta-study, examined robot failures from multiple sources. Both covered industrial manipulators used primarily in the automotive industry. Beauchamp and Stobbe [7] presented an overview of human factors experimental studies on human-manipulator systems, including a review of documented human-robot system accidents from nine different studies. Starr, Wynne, and Kennedy [8] surveyed failure and maintenance reports from two large automotive plants. The reports covered 200 robots representing five manufacturers over a period of 21 weeks. The robots were found to be down for repair or maintenance for 3.95 hours per week per manufacturing line in the first plant and 1.74 hours in the second. Position failures (recorded events where the robot was clearly off of its programmed path) were the most common at 45%, followed by drive system (hardware) failures at 25%.

The Robots in Exhibitions workshop at the IEEE International Conference on Intelligent Robots and Systems (IROS) 2002 produced two studies which reported the reliability of UGVs actively used for long periods of time. Nourbakhsh [9] describes a set of four autonomous robots used for a period of five years as full-time museum docents. Their robots reached an MTBF of 72-216 hours. In [10], Tomatis et al. describe a similar project and reported an MTBF of 7 hours, similar to the 8.3-hour MTBF found in the CRASAR study [3]. Note that in both cases the failure analysis only covered a few identical or nearly identical robots used in indoor, semi-controlled environments (controlled with the exception of interaction with museum visitors).

Three papers found in the literature have identified potential weaknesses of UGVs for field applications but have not provided quantitative failure data. In [2], Blitch provides a survey of the mobility problems which are keeping current robot technology from populating otherwise well-suited niches within USAR, especially the confined space access niche. He identifies tumble recovery, traction, and the (incorrect) assumption of an obstacle-free working envelope as the key problems and presents some suggestions for solutions. Casper, Micire, and Murphy [11] present an overview of the USAR domain, listing tasks that robots are best suited for, followed by a discussion of the constraints which that application domain places on robotic technology. Sensors are identified as the area which requires the most improvement, though the lack of weather-proofing and the need for invertible chassis (also identified by Blitch) were also mentioned. In [12], Murphy et al. discuss the same issues as Casper et al. [11] but provide some additional discussion on the need for adjustable autonomy, or the ability to change the allocation of control between the robot and its operator, in USAR scenarios.

B. Failure Analysis Approaches

Two classification schemes are particularly relevant to this work. The first is Laprie's work in dependable computing. In 1985, Laprie [13] and his colleagues developed a set of concepts and definitions related to the dependability of computer-based systems. According to Laprie a fault is simply a cause, an error is a state, and a failure is an event. Specifically, failure is defined as a deviation from the specified service as seen by the client. The client may be a human user or another component of the computer system that is trying to use the service. An error is a state within the system which can lead to a failure. A fault is anything which could cause the system to enter an error state. Laprie defines two major fault classes, namely physical faults and human-made faults. Human-made faults are further subdivided into design faults and interaction faults. The taxonomy defines two levels of severity for failures. The consequences of benign failures are comparable to the benefits of the service they are preventing. Malign or catastrophic failures have a cost one or more orders of magnitude higher than the benefit of the service.

Laprie's dependability taxonomy is general enough that it can be applied to a large variety of systems (see [14] for a similar specification used for computer-integrated manufacturing (CIM) systems), but in some aspects it is not suitable for analyzing UGV reliability in the field. A UGV can suffer from an infinite variety of physical faults. Laprie's levels of severity are also difficult to apply since the benefit of a service and the cost of a failure tend to vary widely based on the situation, for example a military training exercise versus a real engagement. Nevertheless, the taxonomy used in this paper (see Fig. 3) is largely rooted in Laprie's. The definition of failure used in this paper (the inability of the robot or the equipment used with the robot to function normally) can be made equivalent to his definition by interpreting "normal" as the behavior specified and agreed on by the service provider and the client. The definitions of errors and faults can also be directly applied.

The second classification scheme comes from two United States military standards, [15] and [16]. These standards, when combined, specify how general equipment failures are classified for reliability analysis. (Unfortunately TECO's studies make no reference to these or any other military standard. Though the assumption could be made that their scheme was at least based on these general standards, this is by no means certain.) These standards define a failure as an event in which any item or part of an item does not perform as previously specified [15]. A fault is the immediate cause of the failure [15]. The standards provide the following designations used to describe a failure: catastrophic (causes item loss); critical (prevents the item from serving its purpose on a given mission); dependent (caused by another failure) or independent; relevant or irrelevant (see [16] for conditions used to determine relevancy); and single point failure (the failure of an item which would result in failure of the entire system). The type of process which caused the failure, called the failure mechanism, is classified as physical, chemical, electrical, thermal, or other.

Fig. 3. The taxonomy of UGV failures used in this paper. Classes are shown with solid lines, attributes with dashed lines.

Formal validation methods have been developed for a wide variety of applications relevant to mobile robotics: software, industrial manipulators, CIM systems, and autonomous systems in general. Some examples include the automata-based validation approach for CIM systems presented by Kokkinaki and Valavanis in [14] and probability-based approaches presented by Sheldon et al. in [17] and Tchangani in [18]. Petri nets were used by Ramaswamy and Valavanis in [19] and González et al. in [20] for discrete event dynamic (DED) systems and multi-manipulator systems, respectively. A software validation system designed for use with robot control software was developed by Simmons, Pecheur, and Srinivasan in [21].

Formal validation approaches are useful for analysis of a single specified task which can be explicitly modeled. For characterization approaches, the generality of the categories is the only limit to the kinds of failures that can be examined in a single study. This meta-study deals with mobile robots used in a wide variety of environments for a wide variety of tasks, which is why this study employs a characterization approach.

III. TAXONOMY OF FAILURES AND METRICS

This section describes the approach taken in this paper to examine mobile robot field failures. This approach was developed iteratively as examples and results were gathered from the ten studies that make up this meta-study. As a result, it is both a tool used to create the findings of this meta-study and a product of those findings. A novel taxonomy of mobile robot failures will be presented in Section III-A. Then, Section III-B will describe the metrics used in this meta-study, followed by a discussion of improved reliability metrics for future analyses of this kind.

A. Taxonomy

In order to gain insight on how and why UGVs fail, single instances of failures cannot be treated as independent and unique events. Important common attributes must be identified and used to categorize failures into well-defined groups. In order to provide a foundation for such insights and to bridge the gaps between the robotics, human-robot interaction, and dependability computing communities, this section defines a classification or taxonomy which can be meaningfully applied to any failure that a UGV used in the field might encounter. The taxonomy (see Fig. 3) is expected to be sufficient for mobile robots in general.

Failures are categorized based on the source of failure (or what Laprie [13] would call the fault) and are divided into physical and human categories, following dependability computing practice. Physical failures are further subdivided based on common systems found in all UGV platforms, these being effector, sensor, control system, power, and communications, following Carlson and Murphy [3]. Effectors are defined as any devices that perform actuation and any connections related to those components. Examples would be the motors, grippers, and treads or wheels. The control system category includes the on-board computer, manufacturer-provided software, and any remote operator control units (OCU). For robots without on-board computers the control system category only covers the control unit.

Following Laprie's categorization, human failures (caused by human-made faults) are subdivided into design and interaction. Design failures are caused by faults or defects introduced during design, construction, or any post-production modifications to a robot. Interaction failures are caused by faults introduced by inadvertent violations of operating procedures. (Note that experimental platforms rarely come with operating procedures, and that such procedures may not exist for new applications of existing platforms. In these cases the presence of an interaction failure is harder to discern, and is likely to require a judgment call on the part of the individual performing the failure analysis.) Drawing from Norman's taxonomy [22], widely accepted in the human-computer interaction community, mistakes and slips are included under interaction. Mistakes are caused by fallacies in conscious processing, such as misunderstanding the situation and doing the wrong thing. Slips are caused by fallacies in unconscious processing, where the operator attempted to do the right thing but was unsuccessful.

Each failure falls into one of these classes. Physical failures also have two attributes. The severity of the physical failure is evaluated based on its impact on the robot's assigned task or mission. A terminal robot failure is one that terminates the robot's current mission, and a nonterminal failure is one that introduces some noticeable degradation of the robot's capability to perform its mission. The repairability of the physical failure is described as field-repairable or nonfield-repairable. A failure is considered field-repairable if it can be repaired under favorable environmental conditions, by a trained operator, with the equipment that commonly accompanies the robot into the field. For example, if a small robot transported in a single backpack encounters a failure, the following conditions need to be met for the failure to be deemed field-repairable: the tools required for the repair are part of the robot's standard support equipment (which all fits in a single backpack), and the skills required to complete the repair are part of standard operator training for that robot platform. Few repairs could be carried out in the worst possible field conditions (e.g., a hurricane); therefore, favorable environmental conditions (conditions which will not prevent the repair) are assumed.
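To make the classification concrete, here is a minimal sketch, in Python, of the taxonomy in Fig. 3 as a data model. It is illustrative only; the type and field names are ours, not artifacts of the source studies.

```python
from dataclasses import dataclass
from enum import Enum, auto

class PhysicalClass(Enum):
    """Subsystems common to all UGV platforms (Section III-A)."""
    EFFECTOR = auto()
    SENSOR = auto()
    CONTROL_SYSTEM = auto()
    POWER = auto()
    COMMUNICATIONS = auto()

class HumanClass(Enum):
    """Human-made fault categories, following Laprie and Norman."""
    DESIGN = auto()
    MISTAKE = auto()   # interaction failure, conscious processing
    SLIP = auto()      # interaction failure, unconscious processing

@dataclass
class PhysicalFailure:
    subsystem: PhysicalClass
    terminal: bool          # True if the failure ended the current mission
    field_repairable: bool  # repairable with standard field kit and training

# Example: a thrown track that ends the mission but can be fixed in the field.
thrown_track = PhysicalFailure(PhysicalClass.EFFECTOR,
                               terminal=True, field_repairable=True)
```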

This taxonomy is used in Section V to classify and study UGV failures in the field. For the CRASAR study [3] and TECO's studies [23] the taxonomy was used to place each failure reported into a single physical class (or leaf in Fig. 3). The classification of the data from the WTC study was more difficult. Aside from four single-instance failures, the failure data from the WTC study is presented in terms of the WTC study's categories. These categories capture common failure states (e.g., track slippage) rather than the attributes of the failures which placed the robot in that state. The details needed to reclassify the most common failures themselves are missing. For this reason, the categories used in the WTC study may appear in several classes within the taxonomy. Each appearance was based on a different set of assumptions which held for a subset of the failures (of unknown size) that fell within that category. Human failures are beyond the scope of this meta-study but will be covered in detail in future work.

B. Metrics

This section will begin by showing how the reliability metrics for mobile robots used in the field were calculated from failure and usage data. These equations were originally and primarily used to analyze the data in the CRASAR study [3]. They were also used in this meta-study to summarize any data provided by the WTC study [4] and TECO's studies [23] (see Section IV).

All the formulas for reliability metrics were taken from the IEEE standards presented in [24]. The MTBF is calculated by (2). This metric provides a rough estimate of how long one can expect to use a robot without encountering failures. Another metric presented in this paper is the failure rate (3), which is simply the inverse of MTBF:

$$\text{MTBF} = \frac{\text{total active usage time}}{\text{number of failures}} \tag{2}$$

$$\text{failure rate} = \frac{1}{\text{MTBF}} \tag{3}$$

Note that the MTBF metric requires records of active usage time, to ensure that idle time is not included. The projected availability of the robot is calculated by (5), where the mean time to repair, MTTR, is defined by (6). Availability, also called reliability, should be interpreted as the probability that the robot will be free of failures at a particular point in time:

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \tag{5}$$

$$\text{MTTR} = \frac{\text{total repair time}}{\text{number of repairs}} \tag{6}$$

Average downtime is the average amount of time between the occurrence of the failure and the completion of the repair that fixed it [see (7)]:

$$\text{average downtime} = \frac{\text{total time from failure occurrence to completed repair}}{\text{number of failures}} \tag{7}$$

Note that availability uses the MTTR rather than the average downtime in the denominator. In this sense, availability is optimistic in that it assumes that the repair for a given failure can begin as soon as the failure occurs. It ignores factors such as the time needed to acquire replacement parts, queuing of repairs from previous failures, and staff availability. By ignoring these factors (which vary over time and field location), availability provides an optimistic but stable indicator of the impact of a given set of failures.
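As a worked illustration of (5): the CRASAR study summarized in Section IV-B reports an average MTBF of eight hours with availability below 50%. Since

$$\frac{8}{8 + 8} = 0.5,$$

an availability below 50% implies an MTTR longer than the eight-hour MTBF.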

The probability that a failure was caused by a component from class $c$ (from the taxonomy, e.g., sensor, effector, slip, etc.) is simply (8). Other values derived for this meta-study were also calculated using standard formulas.

$$P(c) = \frac{\text{number of failures caused by components in class } c}{\text{total number of failures}} \tag{8}$$
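A short Python sketch of these calculations follows, assuming simple usage, failure, and repair logs. It is our reconstruction of the standard formulas above, not code from any of the studies; the record layout and example numbers are invented for illustration.

```python
from collections import Counter

def mtbf(total_usage_hours, failures):            # (2)
    return total_usage_hours / len(failures)

def failure_rate(total_usage_hours, failures):    # (3)
    return len(failures) / total_usage_hours

def availability(mtbf_hours, mttr_hours):         # (5)
    return mtbf_hours / (mtbf_hours + mttr_hours)

def mttr(repair_hours):                           # (6): mean over completed repairs
    return sum(repair_hours) / len(repair_hours)

def average_downtime(downtime_hours):             # (7): failure occurrence -> repair done
    return sum(downtime_hours) / len(downtime_hours)

def class_probabilities(failure_classes):         # (8): P(c) for each taxonomy class
    counts = Counter(failure_classes)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Example with made-up numbers: 40 h of active use, 5 failures, repairs of 1-3 h.
failures = ["effector", "effector", "control", "power", "effector"]
m = mtbf(40.0, failures)
a = availability(m, mttr([1.0, 3.0, 2.0, 1.5, 2.5]))
print(m, a, class_probabilities(failures))
```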

The problem of determining suitable reliability metrics, or even what data need to be gathered, is challenging for new domains or application areas. TECO [25] and CRASAR [26], which provided the ten studies examined in this paper, have begun the process of identifying the characteristics of normal use (and, therefore, expectations for normal behavior) of robots in the field domains of MOUT and USAR, respectively, but their work is still in the preliminary stages. In addition, the realm of possibility is always limited by the ability of an infrastructure to support data collection. In the absence of information about normal behavior and a suitable infrastructure, less precise methods for data collection and metrics must be used. For example, the CRASAR study [3] was forced to rely on the judgment of experienced robot operators to decide whether or not a robot's behavior was normal or indicated a failure, based on their knowledge of the robot platform. Then, the operator had to be trusted to report the incident accurately.

Therefore any reliability study of robots should determine the scope of the planned analysis (one field exercise versus two years of usage, one robot versus as wide a variety of platforms as possible) and the level of infrastructure for data collection that can be maintained for that scope. For long-term efforts, it is important to develop standard procedures for data collection which can be used in every task or mission and with every platform to be considered, and to follow them. (Automated systems should be used whenever possible to improve the frequency and accuracy of data collection.) Within these constraints, as much data as can be gathered should be, as a wide variety of factors can influence mobile robot reliability. The minimum data and standards needed to use this approach are listed as follows:

• Usage data: which robot(s), start time, duration, mission;
• Failure data: which robot(s), time of discovery of failure, which component failed, associated usage data;
• Repair data: which robot(s), start time, duration, required tools, required skills;
• Standards: normal behavior for each robot model, standard field support equipment, standard operator training.

TABLE I
OVERVIEW OF FAILURE INFORMATION AVAILABLE FROM EACH STUDY

Examples of additional data which should be considered for collection are listed as follows; a minimal record schema covering both lists is sketched after this list.

• Environmental conditions: ambient temperature, humidity, terrain type, terrain evenness, terrain dryness, etc.;
• Task or mission attributes: operator experience levels, distance traveled, duration of the task or mission, level of concern for robot safety, etc.;
• Robot condition: the state of each sensor (active/inactive, failed/working), average load on the motors, CPU load average, core temperature, battery level, etc.
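As one possible shape for such standardized collection, the sketch below encodes the minimum usage, failure, and repair records plus a few of the optional attributes as Python dataclasses. The field names and types are assumptions of ours, not a schema prescribed by the studies.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UsageRecord:
    robot_id: str
    start_time: str            # ISO 8601 timestamp
    duration_hours: float      # active usage only; idle time excluded
    mission: str
    terrain_type: Optional[str] = None         # optional environmental attribute
    operator_experience: Optional[str] = None  # optional mission attribute

@dataclass
class FailureRecord:
    robot_id: str
    discovered_at: str
    component: str             # taxonomy leaf, e.g. "effector"
    usage: UsageRecord         # the usage episode the failure occurred in
    terminal: bool = False

@dataclass
class RepairRecord:
    robot_id: str
    failure: FailureRecord
    start_time: str
    duration_hours: float
    tools: list = field(default_factory=list)   # checked against standard field kit
    skills: list = field(default_factory=list)  # checked against operator training
```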

By grouping the usage, failure, and repair data by these attributes, the standard metrics given above (MTBF, failure rate, MTTR, downtime, and availability) for that attribute can be used to compare the relative influence of field, mission, and robot characteristics. This was the basic method used in the CRASAR study [3] and this meta-study. If the results are plotted over time (where the block of time is determined by the granularity of data collection) then trends can be analyzed to determine additional factors to be tested in later experiments. If consistent methods are used for collection, and sufficient volumes of data are available, standard statistical tests can also be used to find correlations between field, mission, and robot characteristics and the reliability metrics.
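A minimal sketch of that grouping step, assuming records shaped like the dataclasses above: compute MTBF separately for each value of a chosen attribute (here, terrain type).

```python
from collections import defaultdict

def mtbf_by_attribute(usage_records, failure_records, attr):
    """Group usage hours and failure counts by one attribute, then compute MTBF per group."""
    hours = defaultdict(float)
    fails = defaultdict(int)
    for u in usage_records:
        hours[getattr(u, attr)] += u.duration_hours
    for f in failure_records:
        fails[getattr(f.usage, attr)] += 1
    # MTBF is undefined for groups with no recorded failures; report None for those.
    return {k: (hours[k] / fails[k] if fails[k] else None) for k in hours}

# e.g. mtbf_by_attribute(usages, failures, "terrain_type")
```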

IV. SOURCE STUDIES

This paper studies findings from two independent sources, the Center for Robot-Assisted Search and Rescue (CRASAR) and TECO, which have produced a total of ten studies on UGV reliability in the field. This section provides background information and important results from those studies. It begins with an overview of the information available from the ten studies, followed by a description of the UGVs examined in Section IV-A. Then, in Section IV-B the CRASAR study [3] is described, followed by the WTC study [4] in Section IV-C, and TECO's studies [23] in Section IV-D. Each section includes a description of the study's (or studies') approach to data gathering and analysis, and a summary of important findings related to UGV reliability in the field.

Table I provides an overview of the information that was available from the ten studies. For each study, the table specifies whether or not the following information was available: a definition of failure, a classification of the failures, and source usage and failure data. It also reports, for each study, the granularity of the data collection process and the format of any reported results.

Table I shows that key information (the definition of failure used, data collection methods, and numeric results) is missing from most of TECO's studies. It also shows that the CRASAR study and the WTC study differed in both data collection and analysis methods. For this reason a true comparative study cannot be performed directly on this set of studies. A goal for this meta-study, therefore, was to gather as much information as possible from these studies, and to use that information to begin developing standard methods for collecting and analyzing UGV failure information. The taxonomy presented in Section III-A is a result of that effort.

A. Robots Surveyed

A total of 24 robots are considered in this paper. They represent 15 different models from 7 manufacturers. They range from small tracked vehicles capable of changing their geometry to a modified M1 tank (over 60 tons). Table II presents basic information on each robot model including the name, size, manufacturer, weight, communication method, traction method, and the studies which analyzed the reliability of the model. The size of a robot is classified as man-packable, man-portable, or not man-portable [4]. A man-packable robot is small enough to be safely carried by one or two people in backpacks. A man-portable robot cannot be easily carried into the field by a person but can be transported in an HMMWV or personal car and can be lifted by one or more people. Robots which are not man-portable require additional equipment to transport them, e.g., a heavy truck or trailer.

The smallest surveyed robots are Inuktun's MicroTracs and MicroVGTV platforms (Fig. 4). Though they have limited sensing capabilities (usually only video and two-way audio, though more can be added), they have been shown to be very useful in a variety of USAR scenarios. Their small size enables them to go where humans and dogs simply cannot fit. They are also the most portable. The robot and all support equipment that is needed can easily be packed into a backpack and carried into the field.

The next size group includes the SOLEM and URBOT (Fig. 5) from Foster-Miller, the Urban and its successor the Packbot (Fig. 6) from iRobot Corporation, and MATILDA (Fig. 7) from Mesa Associates. These robots are at the limit of what can be considered man-packable, requiring two people to carry the robot and its support equipment into the field. They are usually connected to the OCU through a wireless link and carry their own batteries and a computer inside their chassis. With an onboard computer and larger sensor payload capabilities, these robots can function with varied levels of autonomy. Low levels of autonomy, for example guarded motion (teleoperation with obstacle avoidance), can help ease the workload of the operator. High levels of autonomy are still largely experimental and have rarely been used to date in field environments.

TABLE II
THE ROBOTS AND SOME OF THEIR CHARACTERISTICS. ASV REFERS TO ALL SEASONS VEHICLES INC.

Fig. 4. Inuktun MicroVGTV (left) and MicroTracs (right) robots. MicroVGTV photo courtesy of Inuktun Services Ltd.

Fig. 5. Foster-Miller SOLEM (left) and URBOT built on a SOLEM base (right). SOLEM photo courtesy of Foster-Miller. URBOT photo courtesy of U.S. Unmanned Ground Vehicles/Systems Joint Project Office (UGV/S JPO).

Fig. 6. iRobot Urban and ATRV (left). iRobot Packbot (right).

Fig. 7. Mesa Associates MATILDA (left) and Foster-Miller Talon (right). MATILDA photo courtesy of UGV/S JPO. Talon photo courtesy of Foster-Miller.

Fig. 8. iRobot ATRV-Jr. shown here in camouflage (left). SARGE built on a Yamaha Breeze ATV base (right). SARGE photo courtesy of UGV/S JPO.

The Talon (Fig. 7) from Foster-Miller, ATRV-Jr (Fig. 8) and ATRV (Fig. 6) from iRobot, and SARGE (Fig. 8) are all man-portable. They have a wider range and increased flexibility compared to the smaller robots due to their ability to carry more sensors and effectors. The ATRV and SARGE are larger platforms which can also be modified to carry smaller robots. These heterogeneous groups, usually referred to as marsupial, have the advantage of the range of the "mother" robot as well as the maneuverability of the "baby" robots.

The largest group, including the ARTS, DEUCE (Fig. 9), D-7G, and PANTHER (Fig. 10), are all platforms which have been adapted from commercially available heavy construction equipment or military platforms. They each weigh in excess of five tons and require special equipment to transport. These platforms are too large for typical USAR or squad-level MOUT scenarios. TECO has equipped them with a standard teleoperation interface and tested them for demining and debris-clearing tasks.

Fig. 9. ARTS built on an All Seasons Vehicles MD-70 base (left) and Caterpillar DEUCE (right). ARTS photo taken from ARTS study [27]. DEUCE photo courtesy of Caterpillar.

Fig. 10. Caterpillar D-7G and M1 PANTHER II built on a U.S. Army M1 Tank base. Photos courtesy of UGV/S JPO.

B. CRASAR Study Summary

In [3], Carlson and Murphy reported the results of an analysis on failures encountered during the day-to-day use of robots by CRASAR. The CRASAR failure data were collected over a period of two years and include data on 13 robots representing three manufacturers and seven models. User logs, collected failure type, and frequency data served as the source of information for this study. The study was not limited to field environments, but included usage and failures which occurred in the lab as well. The failure data were analyzed by first categorizing each individual failure based on the robot's subsystems (which now serve as physical classes), then using standard manufacturing measures for the reliability of a product from [24] (the same metrics used in this study; see Section III-B for details). The findings showed an average MTBF of eight hours and availability of less than 50%. The effectors were the most common source of failures at 35%, followed by the control system at 29%. For the purposes of this paper only field use and failures (including office and USAR domains) will be considered.

Table III summarizes the general findings on field failures from the analysis. The first two lines describe the field failures recorded for Inuktun's platforms (MicroTracs and MicroVGTV), and iRobot's platforms (Urban, Packbot, ATRV-Jr, and ATRV), respectively. The bottom line considers all the recorded field failures as a whole. The number of recorded failures, percentage of usage in the field over all the recorded usage, and the MTBF are included to describe the overall frequency of field failures.

Table III shows that MTBF by itself does not paint a complete picture. The overall MTBF for Inuktun and iRobot are the same, but the other metrics of reliability like availability are quite different. The primary reason for this is that the Inuktun robots tend to suffer from minor failures, which often take less than 1 min to fix. 70% of the recorded Inuktun failures were repaired in the field. iRobot's robots were more likely to suffer from serious failures that take hours to repair. Based on the findings in the CRASAR study [3], this is due to the maturity level of the platforms rather than differences between manufacturers. Within the same manufacturer, iRobot's ATRV-Jr (production level) has a high availability rate at 80% but the Urban (an early prototype) has only a 22% availability rate.

C. WTC Study Summary

A detailed analysis of the failures encountered by CRASAR while using robots in the WTC rescue operation was reported by Micire in [4]. Each time the robot was deployed, the robot's camera view was recorded. The data used in this analysis were taken from 5.5 hours of video that resulted from the initial two-week rescue phase of the WTC response. The study covers four robots from two manufacturers representing three different models. During the response new techniques were developed for operating tethered robots. These allowed the person feeding the tether to assist the robot operator in overcoming problems. According to the analysis, such assistance was needed an average of 2.8 times/min. Considering minor issues as well as more serious failures, 11% of the search time was lost on average due to traction slippage, 18% due to camera occlusion, and 6% due to lighting adjustments. Seven overall findings and a detailed taxonomy of the environmental and robot relationships are developed from these data.

The WTC study [4] defined failure as any event which hindered the progress of the robot in its search task, and used two levels of failure severity: catastrophic and significant. A catastrophic failure required that the robot be removed from the void it was searching in order to be repaired [4]. A significant failure introduced suboptimal performance [4] but did not keep the robot from continuing its mission. Five categories were used to classify the most common problems encountered by the tethered robots (the MicroVGTV and MicroTracs platforms were used in all but one attempt, which was not included in the analysis) at the WTC. These were defined as follows.

• Gravity assist: Events that occur when the tether manager is required to provide support for the robot through the tether.
• Stuck assist: Events which occur when the tether manager must use the tether to free the robot from some obstacle or terrain that does not permit the robot to move.
• Track slippage: The amount of time that the track mechanisms do not have sufficient contact with the ground surface.
• Occluded camera: Amount of time that the camera view is completely occluded by objects and debris.
• Incorrect lighting: Amount of time that the lights were completely off or in a transition between intensities.

Table IV presents overall statistics from this analysis. The data are broken down by attempt. An attempt is an event in which a robot is inserted into a void in order to search that void. Only attempts for which there are recorded data are included. They are numbered sequentially for simplicity. See [4] for more details on the conditions under which each attempt occurred. The WTC study reported the duration (in seconds) of the track slippage, occluded camera, and incorrect lighting categories. The number of occurrences was not provided for these categories, though it was for gravity and stuck assists. Thus, it is impossible to combine these two distinct measurements into a single statistic. Table IV, therefore, presents the best available synthesis of all robot field failures from the WTC study. It includes the frequency/min of gravity and stuck assists, and the percentage of time spent in the track slippage, occluded camera, or incorrect lighting failure modes.

TABLE III
THE RELIABILITY OF INUKTUN AND IROBOT ROBOTS IN FIELD ENVIRONMENTS FROM THE CRASAR STUDY [3]

TABLE IV
FAILURES ENCOUNTERED BY INUKTUN ROBOTS AT THE WTC FROM [4]

Considering the fact that Inuktun's robots (MicroTracs and MicroVGTV) were not designed to perform USAR search tasks, it is likely that many of the failures referred to in Table IV would still constitute "normal" behavior for those platforms. Therefore the definition of failure used in the WTC analysis is technically broader than the one used in this paper. For the purposes of this meta-study, the WTC study's definition of failure is accepted. This is done in the interests of eventually producing robots which can be expected to show reliable performance under the extreme conditions encountered during the WTC rescue response. In studying these results, it is important to keep in mind that both physical and human failures are covered in [4]. Many of the minor problems summarized in Table IV were not the fault of the robot or the physical components of the robot but of the humans that interacted with it.

An interesting attribute of the WTC study is that there were only four failures that would have been recorded without the detailed analysis that was performed on the 5.5 hours of video. In other words, if the WTC study had been performed at the level of granularity of the other studies, only four failures would have been reported for the entire duration of the rescue response. The detailed video analysis uncovered 136 cases in which the tether manager had to assist the robot, and revealed that one-third of the time was spent in failure modes (tracks slipping, camera occluded, etc.). These minor failures, which were not recorded by the other studies, had a significant impact on the robots' performance.

D. TECO Study Summary

The results from eight studies conducted by TECO, part of the Maneuver Support Center at Fort Leonard Wood, have been posted to the Department of Defense Joint Robotics Program (JRP) library [23]. TECO provides operational test and evaluation expertise to the Chemical, Engineer, and Military Police Schools and assists in the development and execution of Advanced Warfighting Experiments (AWE). Each robotic platform was designed for a specified set of tasks within the Future Combat System (FCS). The overall goal of the studies was to evaluate the feasibility of using the robotic platform for those tasks. These studies were performed on the following: an All-Purpose Remote Transport System (ARTS) for clearing and demining; integration of Chemical, Biological, Radiological, and Nuclear (CBRN) sensor modules on existing robot platforms (URBOT and MATILDA); the delivery of nonlethal munitions from an existing robot platform (SARGE); a UGV-based rapid obscuration system; a D-7G bulldozer, Deployable Universal Combat Earthmover (DEUCE), and an M1 tank (each equipped with a standardized teleoperation system); as well as a variety of smaller platforms (URBOT, Urban, SOLEM, and TAR). All of these studies were carried out in mock military operations which can be considered to be high-fidelity (close enough to a real scenario to produce similar results) field environments.

TABLE V
SUMMARY OF RELIABILITY RESULTS DERIVED FROM TECO'S PANTHER STUDY [1]

For all but the DEUCE study, experimental scenarios were developed and managed by TECO personnel and subject matter experts (where needed). Questionnaires and reports were developed and used to record: operator performance, operator feedback, platform and payload performance, and documentation of unanticipated events (e.g., equipment failure). These were used to develop the assessments and recommendations which appeared in each of the studies' Executive Summaries. Unfortunately, the raw data from these questionnaires and reports were not available. The documents stored in the JRP library for each study consisted mainly of the Executive Summary, Pattern of Analysis (a detailed list of issues explored by the study), blank versions of any questionnaires or reports developed, and often pictures of the platform(s). Only the repositories for the CBRNLOE, DEUCE, and PANTHER studies included a complete Test Report.

TECO has reported an MTBF of less than 20 hours [5], though their studies did not provide enough information to validate that figure. Only the PANTHER study [1] provided details on each failure encountered. None provided sufficiently detailed usage information to calculate the MTBF. For example, the PANTHER study reported that 35 failures were recorded during the 32-day period of the experiments. The time spent conducting the experiments each day (e.g., 4, 8, or 24 hours) was not specified.

Table V provides a summary of the reliability information derived from TECO's PANTHER study. The descriptions of the failures provided in [1] allowed the failures to be classified using the impact attribute (see Section III-A). The number, percentage, and average downtime are presented for terminal and nonterminal failures with the summary of all reported failures given at the bottom.

TABLE VI
PROBABILITY BY PHYSICAL CLASS FROM THE CRASAR STUDY [3]

TABLE VII
PROBABILITY BY PHYSICAL CLASS FROM TECO'S PANTHER STUDY [1]

Most of the failures were terminal in that testing was stopped for the robot that failed until it was repaired. Two recurring sensor failures were nonterminal; therefore, no downtime was reported for those failures. The average downtime was 7.31 hours overall. According to TECO, despite the fact that they had two robots available, 7.2 days were lost due to unscheduled vehicle maintenance occurring on both robots simultaneously.

V. PHYSICAL FAILURES

This section covers in detail the failures reported in all ten studies which do not directly involve a human (that is, physical failures). The majority of failures found in CRASAR's study [3] and TECO's studies are physical failures. The failures encountered at the WTC are presented by category as defined in [4] and Section IV-C. First, the frequency of failures within each class under the physical failure branch of the taxonomy is presented. Then, in Sections V-A through V-E, examples of failures which fall into each category are provided. The examples selected will demonstrate not only how failures can be classified using the taxonomy in Section III-A, but also the challenges associated with using robots in field environments. Each section will also include some discussion of general traits of that physical failure category.

Table VI presents data from the CRASAR analysis described in [3]. It shows the number of field failures reported and the relative frequency of each of the physical classes in the form of probabilities that a failure was caused by component(s) within each class. Note that this information was taken from the source data for [3], rather than the study itself. All lab usage and failures are factored out of these statistics. The failures are grouped by manufacturer with the overall probabilities (calculated for the complete set of field failures) for each category shown at the bottom of the table. The Communications category is not included since none of the field failures included in [3] were communications failures. Note that more field failures were reported for Inuktun than iRobot; therefore, the overall probabilities show more influence from the relative physical class failure rates of Inuktun's robots.

Table VI clearly shows that effectors are the biggest problem, followed by the control system, power, then sensors. Table VII for the PANTHER shows very different results. In TECO's study the primary problem was the new control system. The sensors that came with the control system were second with 26% of the failures, followed by the effectors. Only power has the same probability of causing problems on the smallest and the largest robots. The difference is probably due again to the maturity of the M1 tank platform, which has been standard issue in the U.S. Army for 20 years.

Fig. 11. Dirt found near sensitive equipment inside an iRobot Urban.

A. Effector

Across all the studies analyzed, the most common type of failure was effector failure (failure of components that perform actuation and their connections). Common failure sources in [3] were the shear pin and pinion gear in the geometry shifting mechanism on the MicroVGTV. If the robot encounters resistance while shifting (low clearance in a confined space, for example) the shear pin will break. The pinion gear is a problem because the area that houses it is open to the environment. Dirt and other debris get into that area and cause premature wear. According to TECO, the Urban platform suffers from similar problems (see the URBOT study [25] and Fig. 11). Open gearing for the articulating arms and the drive motor collected debris which caused them to stop working.

Tracked vehicles were more prone to failure than their wheeled counterparts. In [3], 96% of the effector failures occurred on tracked rather than wheeled vehicles. 57% of the effector failures were simply the tracks working off their wheels for various reasons. In the URBOT study [25], the Urban encountered the same problem. The study of the ARTS performed by TECO [27] mentioned two instances in which rocks became stuck in the track guides and sprockets (see Fig. 12). The PANTHER (a modified M1 tank) also threw a track, a failure which took days to repair. Both the D-7 study [29] and the DEUCE study [30] mentioned repeated problems with track slippage.

The WTC study's track slippage category was used to catalog the amount of time the tracks were spinning while the robot remained in place. If the tracks should have had enough friction with the ground surface under those conditions, then the failure would be considered to be an effector failure. Such a failure might have been induced by heat, which in one documented case exceeded 122 degrees Fahrenheit and softened the track enough that it became loose. In another case an aluminum rod became lodged in the track mechanism, where the space tolerance between the track and the platform was less than one-eighth of an inch (see Fig. 13). These are classified as terminal failures, as the robots had to be pulled out of the void and repaired before they could be used again.



Fig. 12. Rock stuck in ARTS track mechanism. Photo taken from the ARTS study [27].

Fig. 13. Failure encountered at the WTC.


B. Sensor

The sensor category covers any failed sensors and any problems with their connections. By far the most common failed sensor in all of the studies was the camera. It was also the only sensor common to all of the robots' sensor suites.

In the WTC study [4], there were two categories of repeated sensor failures: occluded camera and incorrect lighting. Occluded camera is defined as a state in which the entire camera view is blocked by obstacles. Note that this is a conservative definition of occlusion, and this failure was still found to occur during 18%, on average, of the total time the robots spent searching a void. Incorrect lighting included states in which the lights were completely off (in which case the operator could not see) or were in transition between intensities. Lighting was also a problem mentioned during TECO's ARTS study [27]: the camera's iris would not adjust enough to allow the operator to see to maneuver the robot.

The PANTHER study uncovered sensor problems which do not tend to occur under lab conditions. Bumpy terrain, sudden changes in lighting, and rainy weather caused problems for the on-board cameras, making it difficult to control the robot from the remote OCU. The study also mentions other common external problems for cameras used in the field, namely lenses covered in moisture, dirt, or mud.

It is important to note here that the WTC study [4] and TECO, in three separate studies, cited the lack of depth perception as a problem encountered in the field. This appears to be an important feature lacking in current sensor suites, regardless of payload capability, as the ARTS vehicle is considerably larger than the robots used at the WTC. The difference in payload capabilities led TECO to recommend the addition of cameras, whereas the WTC study recommended software and/or cognitive science-based solutions.

C. Control System

Control system failures (problems caused by the on-board computer, manufacturer-provided software, and OCUs) are the most common failures in the PANTHER study [1] and the second most common in the CRASAR study [3]. None were reported in the WTC study. In the PANTHER study [1], more than 6 of the 32 days available for testing were lost to diagnosis and maintenance of the teleoperation system.

The most common control system failures found in the CRASAR study [3] were cases in which the robot was simply unresponsive (60% of field control system failures) and the solution was to cycle the power. Since rebooting the robot and/or OCU solved the problem, it was assumed that the problem was due to the control system, though the source of these problems remains unknown. These problems tend to occur as frequently in the lab as they do in the field.

The PANTHER study [1] revealed a wider variety of control system problems. The most frequently encountered source was the steering system; symptoms ranged from sluggishness to a complete loss of steering, sometimes manifesting in only one direction at a time. The emergency stop switch failed multiple times. The UGVROP study [32] described similar problems, but they seemed to be less frequent. The PANTHER's control system behavior was erratic and unstable in some cases; reported failures include uncontrolled acceleration, the RPMs shooting up to a critical level for no apparent reason, and a system shutdown when the operator tried to switch to teleoperation mode.

D. Power

Based on the analysis in the CRASAR study [3] and TECO's study of the PANTHER [1], power failures produced few of the failures that occur in the field. The WTC analysis [4] revealed no failures due to batteries and their related connections during the two-week rescue response. This was probably because the robots were not used for extended periods of time: the longest period a robot was continuously used was a little over 24 min, so the batteries were not heavily taxed.

In the CRASAR study [3], half of the power failures on the robots were due to the battery and its connections. The PANTHER suffered repeatedly from low batteries and low fuel. One of the two DEUCE platforms tested by TECO [30] suffered from a recurring power failure due to clogged fuel filters, requiring a replacement roughly every six hours. TECO also had problems with the Urban platform [25], due mainly to the fact that operators did not recognize that the drive system was frozen and would subsequently overtax the power system in a vain effort to get the robot to move.



E. Communications

The majority of communications failures in the field were found and described by TECO, with the WTC study [4] providing one example. This is due in part to the fact that wireless communication is a known problem in field environments; therefore, wired robots are more commonly used (94% of usage from [3]) than wireless robots (28% of usage).

Experience with the wireless SOLEM platform at the WTC provides a good example of why these robots are difficult to use remotely in field environments. The structural steel of the World Trade Center had a large impact on the range of the 2-W transmitter the platform was carrying. Instead of the usual mile or more, the robot lost communication with the OCU in under 20 feet. Even up to that point the signal was not very stable: 23.8% of the 7 min the robot spent searching the rubble pile resulted in completely useless video. The robot finally lost contact with the OCU entirely and was never recovered.

Similar problems may have been found with the PANTHER platform tested by TECO [1]. Fourteen percent of the failures encountered during that study were due to video dropout, a good indication of communications failures. The study on the D-7 bulldozer [29] concluded that the nonline-of-sight control requirement for the platform could not be met, because the teleoperation equipment on the robot could not transmit video through any interposed materials or foliage. Video bandwidth and reliability limitations even impacted the performance of operators in line-of-sight scenarios. Occasional static was a problem for ARTS [27] operators as well.

Even if communication technology improves, new problems are likely to emerge, for example the problem of sharing limited bandwidth among many wireless robots in the same area. Limited bandwidth is already a problem for the military, where rules on allowed frequencies are found to be a hindrance to improved signal strength and reliability. It is also a challenge to ensure that wireless transmitters adhere to the established rules; the ARTS, for example, would bleed over onto radio frequencies it was not allowed to use.

F. Attributes

In the taxonomy defined in Section III-A, two additional characteristics were defined in addition to the cause (or fault) of the failure: the repairability of the failure and its impact. Table VIII presents overall metrics for comparing these attributes from the source data of the CRASAR study, using only data collected in the field. The percentage of failures, MTBF, and average downtime are included for each value of the two attributes. Note that this table divides repairability into field-repaired and not-field-repaired rather than field-repairable failures. Field-repaired failures represent a subset of the field-repairable failures, since in some cases the field work was done and, therefore, it was more convenient to bring the robot back to the lab for repair.

TABLE VIII
COMPARISON OF FIELD-REPAIRED VERSUS NOT FIELD-REPAIRED AND TERMINAL VERSUS NONTERMINAL FAILURES FROM THE CRASAR STUDY [3]

Based on Table VIII, terminal failures are more common. This finding is supported by TECO's M1 Panther II study, in which 95% of the recorded failures were terminal. The average downtime shows that terminal failures, based on the definition used in this paper, need not be severe. For example, an unresponsive control system may take less than 1 min to fix (reboot). This is still considered to be a terminal failure as the robot cannot continue its mission until it is repaired. In fact, intermittent nonterminal failures tend to require more diagnosis time and additional technical knowledge of the robotic system. This can drive up the amount of time that a robot is out of service. Field-repaired failures were also more common, though their influence on the robot's reliability is noticeably less, since they require less than 10 min to repair on average. Nonfield-repaired failures, on the other hand, have a significant impact on a robot's availability, as they require weeks to repair on average. This is likely due to shipping overhead and the time it takes to create new custom parts by hand.
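To see why repair time dominates, availability can be estimated with the standard steady-state formula MTBF/(MTBF + MTTR). The sketch below uses illustrative numbers consistent with the ranges above, not Table VIII's actual entries.

```python
def availability(mtbf_hours: float, mean_downtime_hours: float) -> float:
    """Steady-state availability: fraction of time the robot is usable."""
    return mtbf_hours / (mtbf_hours + mean_downtime_hours)

# Illustrative values only, not Table VIII entries: a field repair
# costing ~10 min versus weeks of shipping and custom-part work.
mtbf = 10.0                              # h, within the 6-20 h range
print(availability(mtbf, 10 / 60))       # ~0.98 for quick field fixes
print(availability(mtbf, 14 * 24))       # ~0.03 when repair takes weeks
```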

G. Discussion

While this meta-study is a quantitative analysis of failures, some observations on how to proactively reduce failures come to mind. Close partnerships need to be created between robot manufacturers and groups of interested end-users. Field testing is very important once a suitable prototype is ready, and end-users will be needed to supply realistic scenarios and environments for the tests. Past experience has shown that end-users in the military and search and rescue fields will readily apply their acquired experience and expertise with other familiar tools to new technology. This usually results in new applications and modes of interaction which the original designers never anticipated. User-centered design techniques [33] from human-computer interaction (field testing with end-users is one) can be used to help focus the engineering design team on the specific challenges which must be overcome to provide a new robot with more reliability than its predecessors.

Some of the problems mentioned above can be handled, or at least reduced, by the design team alone. The platform itself, including the chassis and effectors, should be designed for and tested in as many terrain, temperature, and moisture conditions as possible. Alternative solutions to the wireless communication problem need to be explored, for example the use of a combination of wired and wireless communication, or repeaters to boost signal strength (see [34] and [35]). Video quality over wireless links can be improved by providing the robot with the ability to compress the video stream (using lossless compression techniques) before sending it. Ultimately, manufacturers must accept that their robots will suffer from failures in the field, and must design for maintainability. Regular maintenance tasks must be made as painless as possible for the end-user, and any custom parts should be made readily available in the event that more serious failures occur.
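As a concrete but hypothetical instance of the lossless-compression suggestion, a raw frame could be deflated with a general-purpose codec such as zlib before transmission; real imagery with sensor noise will compress far less than the synthetic frame used here.

```python
import zlib

def compress_frame(frame: bytes, level: int = 6) -> bytes:
    """Losslessly compress one raw video frame before transmission."""
    return zlib.compress(frame, level)

def decompress_frame(payload: bytes) -> bytes:
    """Recover the exact original frame on the OCU side."""
    return zlib.decompress(payload)

# A synthetic 320x240 8-bit grayscale frame; real imagery with sensor
# noise and texture will compress far less than this all-zero buffer.
frame = bytes(320 * 240)
payload = compress_frame(frame)
assert decompress_frame(payload) == frame    # lossless round trip
print(len(frame), "->", len(payload), "bytes")
```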



Fig. 14. Summary of classification results using the failure taxonomy from Section III-A, including probabilities for each leaf class and attribute value for military and USAR applications. For example, the probability that a failure was an effector failure is 0.11 for military applications [1] and 0.5 for USAR.

VI. CONCLUSION

This paper has explored the question of how UGVs fail in the field. A novel taxonomy of UGV failures was created to reconcile failure taxonomies from the dependable computing, human-computer interaction (HCI), and robotics communities. Following Laprie [13], failures were divided into physical and human categories, with human subdivided into design and interaction. The HCI classes of mistakes and slips fell under interaction failures. Physical failure classes were taken from Carlson and Murphy [3]. The new taxonomy was then used to explore, in depth, how UGVs fail in the field based on ten studies [1], [4], [25], [27]–[32], [36]. For each category, examples were provided which illustrate the nature of UGV field failures which fall under that category.

Fig. 14 provides a summary of the classification results (from Tables V–VIII) in terms of the taxonomy presented in Section III-A, broken down by field application area (military or USAR). For each leaf class, the probabilities for that category are displayed beneath the leaf in the diagram. As seen in Fig. 14, if both application areas are considered together, the probability of either the effectors or the control system being the cause is near 0.5. Sensor failures are less frequent, with at most 26% of the failures, followed by power at 9%. Communications failures did not appear in any of the studies which provided enough details to derive relative frequencies of this type, but four of the studies [1], [4], [25], [27] reported individual examples or evidence (e.g., video dropout) of such failures.

The attributes are similarly marked with the probability that a given failure will have that attribute value. Field-repairable and terminal failures are more common than their opposites, with 80% and up to 95%, respectively, of the failures covered in this meta-study. The studies did not provide data on conditional probabilities between the classes and attributes, for example, given that the failure is due to an effector, the probability that it will be field-repairable.

Overall, the classification results support three conclusions about mobile robot field failures in general. The most common sources of failure in modern robotic systems are custom-built by hand and increasingly complex (control and effector systems). The most reliable components in modern robot systems are simple (power) and/or mass-produced (sensors). Most failures will interrupt the robot's mission (terminal) but require relatively minor repairs (field-repairable).

Overall robot reliability in field environments is low, with between 6 and 20 hours MTBF, depending on the criteria used to determine if a failure has occurred. Common issues with existing platforms appear to be the following: unstable control systems; chassis and effectors designed and tested for a narrow range of environmental conditions compared to the variety they will ultimately encounter in the field; limited wireless communication range in urban environments; and insufficient wireless bandwidth for video-based feedback.

In order for robots to be accepted in field applications, their reliability in field conditions must improve. To achieve this goal, both the knowledge of the engineers (mechanical, electrical, and software) who designed and built the robots and the expertise of the end-users must be applied. Currently, a state-of-the-art UGV cannot be expected to complete an entire shift (12 hours for USAR or 20 hours for the Department of Defense) without incident. This is important because it means that backup resources might be needed when the robot fails. It is also important that a new robot be tested in a realistic scenario as soon as possible, to learn what capabilities and shortcomings it has before the design is finished. Finally, more field studies and failure analyses are needed to find problems that span application areas (like USAR) and a variety of robot platforms. (Human-robot interaction failures in particular need to be examined, as no studies found cover this area.) This meta-study has used the information gathered from ten studies on mobile robot reliability in the field to develop a general method which can be used by future studies.
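The shift-completion claim can be made concrete under a simplifying assumption that is ours, not the studies': if times between failures are exponentially distributed, the probability of finishing a shift without incident is exp(-shift/MTBF).

```python
import math

def p_no_failure(shift_hours: float, mtbf_hours: float) -> float:
    """Probability of completing a shift without a failure, assuming
    exponentially distributed interfailure times (our assumption)."""
    return math.exp(-shift_hours / mtbf_hours)

# Even at the optimistic end of the reported 6-20 h MTBF range, a
# full 12 h or 20 h shift without incident is far from assured.
for shift, label in ((12, "USAR"), (20, "DoD")):
    for mtbf in (6, 20):
        print(f"{label} {shift} h shift, MTBF {mtbf} h: "
              f"{p_no_failure(shift, mtbf):.0%} chance of no failure")
```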

Rigorously applying specific criteria to a new domain or application area is a challenging task. This is the primary reason that the definition of failure applied in this paper uses the generic, imprecise term "normal" behavior to distinguish failures from all other events. Once field applications of robotics in USAR and MOUT are sufficiently mature, the normal behavior of the robot can be described in measurable terms and a more precise definition of failure can be enforced. At the present state of the art, this can only be done for small-scale, controlled experiments (e.g., using rate of progress toward a goal or the percentage of land mines found as metrics), not at the scope of this meta-study.




TECO's modified platforms provide an additional area in which a comparative study can be done between the reliability of manned and unmanned vehicles, and the characteristics of the failures that each encounters. For example, the results from [3] and this meta-study suggest that tracked UGVs tend to have more effector failures than wheeled UGVs in field environments. It would be interesting to find out if this is true for manned vehicles as well, and if not, what is different about unmanned vehicles that causes this result.

ACKNOWLEDGMENT

The authors would like to thank D. Knichel, Chief of the Test and Evaluation Coordination Office at Fort Leonard Wood, for his assistance, and A. Nelson for his helpful comments. In addition, they would like to thank the reviewers for their careful consideration of this paper.

REFERENCES

[1] F. Cook. (1997) Test Report for the Panther II. U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[2] J. G. Blitch, "Adaptive mobility for rescue robots," Proc. SPIE (Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Defense and Law Enforcement II), vol. 5071, pp. 315–321, 2003.

[3] J. Carlson and R. Murphy, "Reliability analysis of mobile robots," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Taipei, Taiwan, Sep. 2003, pp. 274–281.

[4] M. Micire, "Analysis of the robotic-assisted search and rescue response to the World Trade Center disaster," master's thesis, Univ. South Florida, Tampa, Jul. 2002.

[5] R. Tiron. (2003, May) Lack of autonomy hampering progress of battlefield robots. National Defense. [Online]. Available: www.nationaldefensemagazine.org

[6] R. R. Murphy, Introduction to AI Robotics. Cambridge, MA: The MIT Press, 2000.

[7] Y. Beauchamp and T. Stobbe, "A review of experimental studies on human-robot system situations and their design implications," Int. J. Human Factors Manuf., vol. 5, pp. 283–302, 1995.

[8] A. Starr, R. Wynne, and I. Kennedy, "Failure analysis of mature robots in automated production," Proc. Inst. Mech. Eng.—Pt. B (J. Eng. Manuf.), vol. 213, pp. 813–824, 1999.

[9] I. R. Nourbakhsh, "The mobot museum robot installations," in Proc. IEEE/RSJ IROS 2002 Workshop Robots in Exhibitions, 2002, pp. 14–19.

[10] N. Tomatis et al., "Design and system integration for the expo.02 robot," in Proc. IEEE/RSJ IROS 2002 Workshop Robots in Exhibitions, 2002, pp. 67–72.

[11] J. Casper et al., "Issues in intelligent robots for search and rescue," SPIE Ground Vehicle Technology II, vol. 4, pp. 41–46, 2000.

[12] R. R. Murphy et al., IEEE Int. Conf. Ind. Electron. Control Instrum., Oct. 2000.

[13] J.-C. Laprie, "Dependable computing and fault-tolerance: concepts and terminology," in Proc. 25th Int. Symp. Fault-Tolerant Comput., 1995, pp. 27–30.

[14] A. Kokkinaki and K. Valavanis, "Error specification, monitoring, and recovery in computer-integrated manufacturing: an analytic approach," IEEE Control Theory and Applications, vol. 143, no. 6, pp. 499–508, Nov. 1996.

[15] U.S. Department of Defense Military Standard. (1981, Jun.) Definitions of Terms for Reliability and Maintainability (MIL-STD-721C). [Online]. Available: www.weibull.com/knowledge/milhdbk.htm

[16] U.S. Department of Defense Military Standard. (1978, Feb.) Failure Classification for Reliability Testing (MIL-STD-2074). [Online]. Available: www.weibull.com/knowledge/milhdbk.htm

[17] F. Sheldon et al., "Reliability prediction of distributed embedded fault-tolerant systems," in Proc. 4th Int. Symp. Software Reliability Engineering, 1993, pp. 92–102.

[18] A. Tchangani, "Reliability analysis using Bayesian networks," Studies in Informatics Control, vol. 10, 2001.

[19] S. Ramaswamy and K. Valavanis, "Hierarchical time-extended Petri nets (H-EPNs) based error identification and recovery for multilevel systems," IEEE Trans. Syst., Man, Cybern. B, vol. 26, pp. 164–175, Feb. 1996.

[20] J. L. Gonzalez et al., "Influences of robot maintenance and failures in the performance of a multirobot system," in Proc. 2002 IEEE/RSJ Int. Conf. Intell. Robot. Syst., vol. 3, 2002, pp. 2801–2806.

[21] R. Simmons, C. Pecheur, and G. Srinivasan, "Toward automatic verification of autonomous systems," in Proc. IEEE Int. Conf. Intell. Robot. Syst., vol. 2, 2000, pp. 1410–1415.

[22] D. A. Norman, The Design of Everyday Things. Cambridge, MA: The MIT Press, 2002, pp. 105–140.

[23] (2003) Joint Robotics Program. U.S. Department of Defense, Washington, DC. [Online]. Available: www.jointrobotics.com

[24] IEEE Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems, IEEE Std. 493-1997, 1998.

[25] G. Piskulic. (2000) Military Police/Engineer Urban Robot (URBOT) Concept Experimentation Program (CEP) Report, Engineer Module. U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[26] J. Burke, R. Murphy, M. Coovert, and D. Riddle, "Moonlight in Miami: a field study of human-robot interaction in the context of an urban search and rescue disaster response training exercise," Human-Computer Interaction, vol. 19, pp. 85–116, 2004.

[27] R. Walker and M. Scarlett. (1999) Test Report for the All-Purpose Remote Transport System (ARTS) Executive Summary. U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[28] (2003) Final Report for the Chemical Biological Radiological Nuclear (CBRN) Sensor Module With Robotic Platform Limited Objective Experiment (LOE). U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[29] G. Boxley. (1999) Teleoperational D-7 Dozer Concept Evaluation Program (CEP) Executive Summary. U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[30] F. Cook. (2000) Limited Objective Experimentation Report for the Deployable Universal Combat Earthmover (DEUCE). U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[31] G. Piskulic. (2001) Military Police CEP for Small Robots in Support of Dismounted Troops Utilizing Non-Lethal Weapons. U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[32] R. Weiss. (2001) Final Test Report for the Unmanned Ground Vehicle Rapid Obscurant Platform (UGVROP) Chemical Corps Concept Exploration Program. U.S. Army, Test and Evaluation Coordination Office (TECO), Ft. Leonard Wood, MO. [Online]. Available: www.jointrobotics.com

[33] J. Preece, Y. Rogers, and H. Sharp, Interaction Design: Beyond Human-Computer Interaction. New York: Wiley, 2002.

[34] H. Nguyen et al., "Autonomous mobile communication relays," Proc. SPIE, vol. 4715, pp. 50–57, Apr. 2002.

[35] A. Cerpa and D. Estrin, "ASCENT: adaptive self-configuring sensor network topologies," in Proc. IEEE 21st Annu. Joint Conf. IEEE Comput. Commun. Soc. (INFOCOM 2002), vol. 3, 2002, pp. 1278–1287.

[36] J. Carlson et al., "Follow-up analysis of mobile robot failures," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Apr. 2004, pp. 4987–4994.



Jennifer Carlson (S'00) received the B.S. degree in computer science from the University of South Florida, Tampa, in 2001. She recently completed her master's thesis, "Analysis of how mobile robots fail in the field," based on the work presented in this paper. She is working towards the Ph.D. degree at the Center for Robot-Assisted Search and Rescue (CRASAR), University of South Florida.

Her current work is focused on intelligent fault identification and recovery for autonomous UGVs. Her research interests include robotics, artificial intelligence, and human–robot interaction.

Robin R. Murphy (S'88–M'91) received the B.M.E. degree in mechanical engineering and the M.S. and Ph.D. degrees in computer science from the Georgia Institute of Technology (Georgia Tech), Atlanta, in 1980, 1989, and 1992, respectively, where she was a Rockwell International Doctoral Fellow.

She is currently a Professor in the Computer Science and Engineering Department, University of South Florida, Tampa, with a joint appointment in Cognitive and Neural Sciences in the Department of Psychology, and also serves as the Director of the Center for Robot-Assisted Search and Rescue. She is the author of over 70 publications in the areas of multiagent sensor fusion, human–robot interaction, and rescue robotics, as well as the textbook Introduction to AI Robotics. Her research is funded by the National Science Foundation (NSF), the U.S. Defense Advanced Research Projects Agency (DARPA), the Office of Naval Research (ONR), the Department of Energy (DOE), and industry, and includes ground and aerial unmanned systems.

Dr. Murphy is a member of AAAI, ACM, and SPIE. From 2000 to 2002, she served as the secretary of the IEEE Robotics and Automation Society, and she is the North American co-chair of the Safety, Security and Rescue Robotics technical committee. She co-chaired the 1st IEEE Workshop on Safety, Security and Rescue Robotics, 2003. Since 2000, she has been a member of the U.S. Air Force Scientific Advisory Board and participated in numerous defense studies and panels on unmanned systems. She is a recipient of an NIUSAR Eagle Award for her participation at the World Trade Center.