medical devices: the therac-25

49

Upload: others

Post on 11-Feb-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Medical Devices� The Therac����

Nancy Leveson

University of Washington

� Introduction

Between June ���� and January ����� a computer�controlled radiation ther�apy machine� called the Therac���� massively overdosed six people� Theseaccidents have been described as the worst in the ��year history of medicalaccelerators ���

A detailed accident investigation� drawn from publicly available docu�ments� can be found in Leveson and Turner �� The following account istaken from this report and includes both the factors involved in the overdosesthemselves and the attempts by the users� manufacturers� and governmentsto deal with them� Because this accident was never o�cially investigated�some information on the Therac��� software development� management� andquality control procedures are not available� What is included below hasbeen gleaned from law suits and depositions� government records� and copiesof correspondence and other material obtained from the U�S� Food and DrugAdministration �FDA�� which regulates these devices�

� Background

Medical linear accelerators �linacs� accelerate electrons to create high�energybeams that can destroy tumors with minimal impact on the surrounding

�This appendix is taken from Nancy Leveson� Safeware� System Safety and Computers�Addison�Wesley� ����� Copyright ����� All rights reserved�

App� A � Medical Devices� The Therac��� �

healthy tissue� Relatively shallow tissue is treated with the accelerated elec�trons� to reach deeper tissue� the electron beam is converted into X�ray pho�tons�

In the early ����s� Atomic Energy of Canada Limited �AECL�� and aFrench company called CGR went into business together building linear ac�celerators� The products of this cooperation were ��� the Therac��� a �million electron volt �MeV� accelerator capable of producing X�rays onlyand later ��� the Therac���� a �� MeV� dual�mode �X�rays or electrons�accelerator� Both were versions of older CGR machines� the Neptune andSagittaire� respectively� which were augmented with computer control usinga DEC PDP��� minicomputer� We know that some of the old Therac�� soft�ware routines were reused in the Therac��� and that CGR developed theinitial software�

Software functionality was limited in both machines� The computer merelyadded convenience to the existing hardware� which was capable of standingalone� Industry�standard hardware safety features and interlocks in the un�derlying machines were retained�

The business relationship between AECL and CGR faltered after theTherac��� e�ort� Citing competitive pressures� the two companies did notrenew their cooperative agreement when scheduled in �����

In the mid�����s� AECL had developed a radical new �double pass� con�cept for electron acceleration� A double�pass accelerator needs much lessspace to develop comparable energy levels because it folds the long physicalmechanism required to accelerate the electrons� and it is more economical toproduce� Using this double�pass concept� AECL designed the Therac���� adual�mode linear accelerator that can deliver either photons at �� MeV orelectrons at various energy levels�

Compared with the Therac���� the Therac��� is notably more compact�more versatile� and arguably easier to use� The higher energy takes advantageof the phenomenon of depth dose� As the energy increases� the depth in thebody at which maximum dose build�up occurs also increases� sparing thetissue above the target area� Economic advantages also come into play for thecustomer� since only one machine is required for both treatment modalities

�AECL was an arms�length entity� called a crown corporation� of the Canadian govern�ment� Since the time of the incidents related in this paper� AECL Medical� a division ofAECL� was privatized and is now called Theratronics International� Ltd� Currently� theprimary business of AECL is the design and installation of nuclear reactors�

App� A � Medical Devices� The Therac���

�electrons and photons��Several features of the Therac��� are important in understanding the

accidents� First� like the Therac�� and the Therac���� the Therac��� is con�trolled by a PDP��� computer� However� AECL designed the Therac���to take advantage of computer control from the outset� they did not buildon a stand�alone machine� The Therac�� and Therac��� had been designedaround machines that already had histories of clinical use without computercontrol�

In addition� the Therac��� software has more responsibility for maintain�ing safety than the software in the previous machines� The Therac��� hasindependent protective circuits for monitoring the electron�beam scanningplus mechanical interlocks for policing the machine and ensuring safe opera�tion� The Therac��� relies more on software for these functions� AECL tookadvantage of the computer�s abilities to control and monitor the hardwareand decided not to duplicate all the existing hardware safety mechanismsand interlocks�

Some software for the machines was interrelated or reused� In a letterto a Therac��� user� the AECL quality assurance manager said� �The sameTherac�� package was used by the AECL software people when they startedthe Therac��� software� The Therac��� and Therac��� software programswere done independently starting from a common base� �� The reuse ofTherac�� design features or modules may explain some of the problematicaspects of the Therac��� software design� The quality assurance managerwas apparently unaware that some Therac��� routines were also used in theTherac���� this was discovered after a bug related to one of the Therac���accidents was found in the Therac��� software�

AECL produced the �rst hardwired prototype of the Therac��� in �����and the completely computer�controlled commercial version was available inlate �����

Turntable Positioning� The Therac��� turntable design plays an impor�tant role in the accidents� The upper turntable �see Figure �� rotates ac�cessory equipment into the beam path to produce two therapeutic modes�electron mode and photon mode� A third position �called the �eld lightposition� involves no beam at all� but rather is used to facilitate correct po�sitioning of the patient� Because the accessories appropriate to each mode

App� A � Medical Devices� The Therac���

Mirror

Turntable

Plunger

Flattener

Counterweight

X-ray modetarget

scan magnetElectron mode

actuatorsMicroswitch

microswitch assembly

Figure �� Upper turntable assembly�

App� A � Medical Devices� The Therac��� �

are physically attached to the turntable� proper operation of the Therac���is heavily dependent on the turntable position� which is monitored by threemicroswitches�

The raw� highly concentrated accelerator beam is dangerous to livingtissue� In electron therapy� the computer controls the beam energy �from �to �� MeV� and current� while scanning magnets are used to spread the beamto a safe� therapeutic concentration� These scanning magnets are mountedon the turntable and moved into proper position by the computer� Similarly�an ion chamber to measure electrons is mounted on the turntable and alsomoved into position by the computer� In addition� operator�mounted electrontrimmers can be used to shape the beam if necessary�

For X�ray �or photon� therapy� only one energy level is available� �� MeV�Much greater electron�beam current is required for X�ray mode �some ���times greater than that for electron therapy� �� to produce comparable out�put� Such a high dose�rate capability is required because a �beam �attener�is used to produce a uniform treatment �eld� This �attener� which resemblesan inverted ice cream cone� is a very e�cient attenuator� thus� to get a rea�sonable treatment dose rate out of the �attener� a very high input dose rateis required� If the machine should produce a photon beam with the beam�attener not in position� a high output dose to the patient results� This isthe basic hazard of dual�mode machines� If the turntable is in the wrongposition� the beam �attener will not be in place�

In the Therac���� the computer is responsible for positioning the turntable�and for checking the turntable position� so that a target� �attening �lter�and X�ray ion chamber are directly in the beam path� With the target inplace� electron bombardment produces X�rays� The X�ray beam is shapedby the �attening �lter and measured by the X�ray ion chamber�

No accelerator beam is expected in the third or �eld light turntable posi�tion� A stainless steel mirror is placed in the beam path and a light simulatesthe beam� This lets the operator see precisely where the beam will strike thepatient and make necessary adjustments before treatment starts� There is noion chamber in place at this turntable position� since no beam is expected�

Traditionally� electromechanical interlocks have been used on these typesof equipment to ensure safety � in this case� to ensure that the turntable andattached equipment are in the correct position when treatment is started� Inthe Therac���� software checks were substituted for many of the traditionalhardware interlocks�

App� A � Medical Devices� The Therac��� �

359.2VERIFIEDVERIFIEDVERIFIEDVERIFIEDVERIFIEDVERIFIED

0.0

14.2

01

27.2

TREATMENT MODE : FIX BEAM TYPE: X

50 50UNIT RATE/MINUTEMONITOR UNITSTIME (MIN) 0.27

0ACTUAL

ENERGY (MeV): 25

PRESCRIBED200200

1.00

0

01

27.314.3359

PATIENT NAME : TEST

DATETIMEOPR ID

: 84-OCT-26: 12:55: 8: T25V02-R03

SYSTEMTREATREASON

: BEAM READY: TREAT PAUSE: OPERATOR

OP. MODE : TREAT AUTO

COMMAND: X-RAY 173777

GANTRY ROTATION (DEG)COLLIMATOR ROTATION (DEG)COLLIMATOR X (CM)COLLIMATOR Y (CM)WEDGE NUMBERACCESSORY NUMBER

Figure �� Operator interface screen layout�

The Operator Interface� The description of the operator interface hereapplies to the version of the software used during the accidents� Changesmade as a result of an FDA recall are described later�

The Therac��� operator controls the machine through a DEC VT���terminal� In the general case� the operator positions the patient on thetreatment table� manually sets the treatment �eld sizes and gantry rotation�and attaches accessories to the machine� Leaving the treatment room� theoperator returns to the console to enter the patient identi�cation� treatmentprescription �including mode or beam type� energy level� dose� dose rate�and time�� �eld sizing� gantry rotation� and accessory data� The system thencompares the manually set values with those entered at the console� If theymatch� a veri�ed message is displayed and treatment is permitted� If theydo not match� treatment is not allowed to proceed until the mismatch iscorrected� Figure � shows the screen layout�

When the system was �rst built� operators complained that it took too

App� A � Medical Devices� The Therac��� �

long to enter the treatment plan� In response� AECL modi�ed the softwarebefore the �rst unit was installed� Instead of reentering the data at thekeyboard� operators could simply use a carriage return to copy the treatmentsite data ��� A quick series of carriage returns would thus complete the dataentry� This modi�cation was to �gure in several of the accidents�

The Therac��� could shut down in two ways after it detected an errorcondition� One was a treatment suspend� which required a complete machinereset to restart� The other� not so serious� was a treatment pause� which onlyrequired a single key command to restart the machine� If a treatment pauseoccurred� the operator could press the�P key to �proceed� and resume treat�ment quickly and conveniently� The previous treatment parameters remainedin e�ect� and no reset was required� This feature could be invoked a max�imum of �ve times before the machine automatically suspended treatmentand required the operator to perform a system reset�

Error messages provided to the operator were cryptic� and some merelyconsisted of the word malfunction followed by a number from � to � denoting an analog�digital channel number� According to an FDA memo�randum written after one accident�

The operator�s manual supplied with the machine does not ex�plain nor even address the malfunction codes� The Maintancesic� Manual lists the various malfunction numbers but gives noexplanation� The materials provided give no indication that thesemalfunctions could place a patient at risk�

The program does not advise the operator if a situation existswherein the ion chambers used to monitor the patient are satu�rated� thus are beyond the measurement limits of the instrument�This software package does not appear to contain a safety systemto prevent parameters being entered and intermixed that wouldresult in excessive radiation being delivered to the patient undertreatment�

An operator involved in one of the accidents testi�ed that she had becomeinsensitive to machine malfunctions� Malfunction messages were common�place and most did not involve patient safety� Service technicians would �xthe problems or the hospital physicist would realign the machine and makeit operable again� She said�

App� A � Medical Devices� The Therac��� �

�It was not out of the ordinary for something to stop the machine� � � �It would often give a low dose rate in which you would turn themachine back on� � � �They would give messages of low dose rate�V�tilt� H�tilt� and other things� I can�t remember all the reasonsit would stop� but there was a lot of them��

A radiation therapist at another clinic reported that an average of � dose�rate malfunctions� attributed to underdoses� occurred on some days�

The operator further testi�ed that during instruction she had been taughtthat there were �so many safety mechanisms� that she understood it wasvirtually impossible to overdose a patient�

Hazard Analysis� In March ���� AECL performed a safety analysis onthe Therac���� This analysis was in the form of a fault tree and apparentlyexcluded the software� According to the �nal report� the analysis madeseveral assumptions about the computer and its software�

�� Programming errors have been reduced by extensive test�ing on a hardware simulator and under �eld conditions onteletherapy units� Any residual software errors are not in�cluded in the analysis�

�� Program software does not degrade due to wear� fatigue� orreproduction process�

� Computer execution errors are caused by faulty hardwarecomponents and by �soft� �random� errors induced by alphaparticles and electromagnetic noise�

The fault tree resulting from this analysis does appear to include com�puter failure� although apparently� judging from the basic assumptions above�it considers hardware failures only� For example� in one OR gate leading tothe event of getting the wrong energy� a box contains �Computer selectswrong energy�� and a probability of ����� is assigned to this event� For�Computer selects wrong mode�� a probability of � ���� is given� Thereport provides no justi�cation of either number�

App� A � Medical Devices� The Therac��� �

� Events

Eleven Therac���s were installed� �ve in the United States and six in Canada�Six accidents occurred between ���� and ����� when the machine was �nallyrecalled to make extensive design changes� These changes include addinghardware safeguards against software errors�

Related problems were found in the Therac��� software� but they werenot recognized until after the Therac��� accidents because the Therac���includes hardware safety interlocks� Thus� no injuries resulted�

��� Kennestone Regional Oncology Center� June ����

Details of this accident in Marietta� Georgia� are sketchy because it wasnever investigated� There was no admission that the injury was caused bythe Therac��� until long after the occurrence� despite claims by the patientthat she had been injured during treatment� the obvious and severe radia�tion burns the patient su�ered� and the suspicions of the radiation physicistinvolved�

After undergoing a lumpectomy to remove a malignant breast tumor� a���year�old woman was receiving follow�up radiation treatment to nearbylymph nodes on a Therac��� at the Kennestone facility in Marietta� TheTherac��� had been operating at Kennestone for about six months� otherTherac���s had been operating� apparently without incident� since ����

On June � ����� the patient was set up for a �� MeV electron treatmentto the clavicle area� When the machine turned on� she felt a �tremendousforce of heat� � � this red�hot sensation�� When the technician came in� shesaid� �You burned me�� The technician replied that that was impossible�Although there were no marks on the patient at the time� the treatment areafelt �warm to the touch��

It is unclear exactly when AECL learned about this incident� Tim Still�the Kennestone physicist� said that he contacted AECL to ask if the Therac��� could operate in electron mode without scanning to spread the beam�Three days later the engineers at AECL called the physicist back to explainthat improper scanning was not possible�

In an August ��� ���� letter from AECL to the FDA� the AECL qualityassurance manager said� �In March of ���� AECL received a lawsuit from thepatient involved� � �This incident was never reported to AECL prior to this

App� A � Medical Devices� The Therac��� ��

date� although some rather odd questions had been posed by Tim Still� thehospital physicist�� The physicist at a hospital in Tyler� Texas� where a lateraccident occurred� reported� �According to Tim Still� the patient �led suitin October ���� listing the hospital� manufacturer and service organizationresponsible for the machine� AECL was noti�ed informally about the suit bythe hospital� and AECL received o�cial noti�cation of a law suit in November������

Because of the lawsuit ��led November �� ������ some AECL adminis�trators must have known about the Marietta accident�although no investi�gation occurred at this time� FDA memos point to the lack of a mechanismin AECL to follow up reports of suspected accidents ��

The patient went home� but shortly afterward she developed a reddeningand swelling in the center of the treatment area� Her pain had increased tothe point that her shoulder �froze�� and she experienced spasms� She wasadmitted to a hospital in Atlanta� but her oncologists continued to send herto Kennestone for Therac��� treatments� Clinical explanation was soughtfor the reddening of the skin� which at �rst her oncologist attributed to herdisease or to normal treatment reaction�

About two weeks later� the Kennestone physicist noticed that the patienthad a matching reddening on her back as though a burn had gone rightthrough her body� and the swollen area had begun to slough o� layers ofskin� Her shoulder was immobile� and she was apparently in great pain� Itwas now obvious that she had a radiation burn� but the hospital and herdoctors could provide no satisfactory explanation�

The Kennestone physicist later estimated that the patient received oneor two doses of radiation in the ������ to ������ rad �radiation absorbeddose� range� He did not believe her injury could have been caused by lessthan ����� rads� To understand the magnitude of this� consider that typicalsingle therapeutic doses are in the ��� rad range� Doses of ����� rads canbe fatal if delivered to the whole body� in fact� ��� rads is the accepted�gure for whole�body radiation that will cause death in �� percent of thecases� The consequences of an overdose to a smaller part of the body dependon the tissue�s radio�sensitivity� The director of radiation oncology at theKennestone facility explained their confusion about the accident as due to thefact that they had never seen an overtreatment of that magnitude before ���

Eventually� the patient�s breast had to be removed because of the radia�tion burns� Her shoulder and arm were paralyzed� and she was in constant

App� A � Medical Devices� The Therac��� ��

pain� She had su�ered a serious radiation burn� but the manufacturer andoperators of the machine refused to believe that it could have been caused bythe Therac���� The treatment prescription printout feature of the computerwas disabled at the time of the accident� so there was no hardcopy of thetreatment data� The lawsuit was eventually settled out of court�

From what we can determine� the accident was not reported to the FDAuntil after further accidents in ����� The reporting requirements for med�ical device incidents at that time applied only to equipment manufacturersand importers� not users� The regulations required that manufacturers andimporters report deaths� serious injuries� or malfunctions that could resultin those consequences� but health�care professionals and institutions werenot required to report incidents to manufacturers� The comptroller gen�eral of the U�S� Government Accounting O�ce �GAO�� in testimony beforeCongress on November �� ����� expressed great concern about the viabilityof the incident�reporting regulations in preventing or spotting medical deviceproblems� According to a ���� GAO study� the FDA knew of less than �percent of deaths� serious injuries� or equipment malfunctions that occurredin hospitals ��� The law was amended in ���� to require health�care facilitiesto report incidents to the manufacturer and to the FDA�

At this point� the other Therac��� users were also unaware that anythinguntoward had occurred and did not learn about any problems with the ma�chine until after subsequent accidents� Even then� most of their informationcame through personal communication among themselves�

��� Ontario Cancer Foundation� July ����

The second in this series of accidents occurred about seven weeks after theKennestone patient was overdosed� At that time� the Therac��� at the On�tario Cancer Foundation in Hamilton� Ontario �Canada�� had been in usefor more than six months� On July ��� ����� a forty�year�old patient cameto the clinic for her twenty�fourth Therac��� treatment for carcinoma of thecervix� The operator activated the machine� but the Therac shut down after�ve seconds with an htilt error message� The Therac����s console displayread no dose and indicated a treatment pause�

Since the machine did not suspend and the control display indicated nodose was delivered to the patient� the operator went ahead with a secondattempt at treatment by pressing the �P key �the proceed command�� ex�

App� A � Medical Devices� The Therac��� ��

pecting the machine to deliver the proper dose this time� This was standardoperating procedure� and Therac��� operators had become accustomed tofrequent malfunctions that had no untoward consequences for the patient�Again� the machine shut down in the same manner� The operator repeatedthis process four times after the original attempt�the display showing no

dose delivered to the patient each time� After the �fth pause� the ma�chine went into treatment suspend� and a hospital service technician wascalled� The technician found nothing wrong with the machine� According toa Therac��� operator� this scenario also was not unusual�

After the treatment� the patient complained of a burning sensation� de�scribed as an �electric tingling shock� to the treatment area in her hip� Sixother patients were treated later that day without incident� She came backfor further treatment on July �� and complained of burning� hip pain� andexcessive swelling in the region of treatment� The patient was hospitalizedfor the condition on July �� and the machine was taken out of service�

AECL was informed of the apparent radiation injury and sent a serviceengineer to investigate� The U�S� FDA� the then Canadian Radiation Pro�tection Bureau �RPB��� and users were informed that there was a problem�although the users claim that they were never informed that a patient injuryhad occurred� Users were told that they should visually con�rm the properturntable alignment until further notice �which occurred three months later��

The patient died on November � ����� of an extremely virulent cancer�An autopsy revealed the cause of death as the cancer� but it was noted thathad she not died� a total hip replacement would have been necessary as aresult of the radiation overexposure� An AECL technician later estimatedthe patient had received between ����� and ������ rads�

����� Manufacturer�s Response

AECL could not reproduce the malfunction that had occurred� but suspecteda transient failure in the microswitch used to determine the turntable posi�tion� During the investigation of the accident� AECL hardwired the errorconditions they assumed were necessary for the malfunction and� as a result�found some turntable positioning design weaknesses and potential mechanicalproblems�

�On April �� ����� the Radiation Protection Bureau and the Bureau of Medical Deviceswere merged to form the Bureau of Radiation and Medical Devices �BRMD�

App� A � Medical Devices� The Therac��� �

The computer senses and controls turntable position by reading a �bitsignal about the status of three microswitches in the turntable switch as�sembly� Essentially� AECL determined that a ��bit error in the microswitchcodes �which could be caused by a single open�circuit fault on the switchlines� could produce an ambiguous position message to the computer� Theproblem was exacerbated by the design of the mechanism that extends aplunger to lock the turntable when it is in one of the three cardinal positions�The plunger could be extended when the turntable was way out of position�thus giving a second false position indication� AECL devised a method toindicate turntable position that tolerated a ��bit error so that the code wouldstill unambiguously reveal correct position with any one microswitch failure�

In addition� AECL altered the software so that the computer checkedfor �in transit� status of the switches to keep further track of the switchoperation and turntable position and to give additional assurance that theswitches were working and the turntable was moving�

As a result of these improvements� AECL claimed in its report and corre�spondence with hospitals that �analysis of the hazard rate of the new solutionindicates an improvement over the old system by at least � orders of mag�nitude emphasis added��� However� in its �nal incident report to the FDA�AECL concluded that they �cannot be �rm on the exact cause of the accidentbut can only suspect � � � �� which underscored their inability to determine thecause of the accident with any certainty� The AECL quality assurance man�ager testi�ed that they could not reproduce the switch malfunction and thattesting of the microswitch was �inconclusive�� The similarity of the errantbehavior and the patient injuries in this accident and a later one in Yakima�Washington� provide good reason to believe that the Hamilton overdose wasprobably related to software error rather than to a microswitch failure�

����� Government and User Response

The Hamilton accident resulted in a voluntary recall by AECL� and theFDA termed it a Class II recall� Class II means �a situation in which theuse of� or exposure to� a violative product may cause temporary or medicallyreversible adverse health consequences or where the probability of seriousadverse health consequences is remote�� The FDA audited AECL�s subse�quent modi�cations� and after the modi�cations were made� the users weretold they could return to normal operating procedures�

App� A � Medical Devices� The Therac��� �

As a result of the Hamilton accident� the head of advanced X�ray systemsin the Canadian RPB� Gordon Symonds� wrote a report that analyzed thedesign and performance characteristics of the Therac��� with respect to ra�diation safety� Besides citing the �awed microswitch� the report faulted bothhardware and software components of the Therac�s design� It concluded witha list of four modi�cations to the Therac��� necessary for compliance withCanada�s Radiation Emitting Devices �RED� Act� The RED law� enactedin ����� gives government o�cials power to ensure the safety of radiation�emitting devices�

The modi�cations speci�ed in the Symonds report included redesigningthe microswitch and changing the way the computer handled malfunctionconditions� In particular� treatment was to be terminated in the event ofa dose�rate malfunction� giving a treatment �suspend�� This change wouldhave removed the option to proceed simply by pressing the �P key� Thereport also made recommendations regarding collimator test procedures andmessage and command formats� A November �� ���� letter� signed by thedirector of the Canadian RPB� asked that AECL make changes to the Therac��� based on the Symond�s report �to be in compliance with the RED act��

Although� as noted above� AECL did make the microswitch changes�they did not comply with the directive to change the malfunction pausebehavior into treatment suspends� instead reducing the maximum number ofretries from �ve to three� According to Symonds� the de�ciencies outlinedin the RPB letter of November � were still pending when the next accidenthappened �ve months later�

Immediately after the Hamilton accident� the Ontario Cancer Foundationhired an independent consultant to investigate� He concluded in a September���� report that an independent system �beside the computer� was needed toverify the turntable position and suggested the use of a potentiometer� TheRPB wrote a letter to AECL in November ���� requesting that AECL installsuch an independent interlock on the Therac���� Also� in January �����AECL received a letter from the attorney representing the Hamilton clinic�The letter said that there had been continuing problems with the turntable�including four incidents at Hamilton� and requested the installation of anindependent system �potentiometer� to verify the turntable position� AECLdid not comply� No independent interlock was installed by AECL on theTherac���s at this time� The Hamilton Clinic� however� decided to installone themselves on their machine�

App� A � Medical Devices� The Therac��� ��

��� Yakima Valley Memorial Hospital� December ����

In this accident� as in the Kennestone overdose� machine malfunction wasnot acknowledged until after later accidents were understood�

The Therac��� at Yakima� Washington� had been modi�ed by AECL inSeptember ���� in response to the overdose at Hamilton� During December����� a woman treated with the Therac��� developed erythema �excessivereddening of the skin� in a parallel striped pattern on her right hip� Despitethis� she continued to be treated by the Therac���� as the cause of her reactionwas not determined to be abnormal until January ����� On January �� hertreatments were completed�

The sta� monitored the skin reaction closely and attempted to �nd pos�sible causes� The open slots in the blocking trays in the Therac��� couldhave produced such a striped pattern� but by the time the skin reaction wasdetermined to be abnormal� the blocking trays had been discarded� so theblocking arrangement and tray striping orientation could not be reproduced�A reaction to chemotherapy was ruled out because that should have pro�duced reactions at the other treatment sites and would not have producedstripes� When the doctors discovered that the woman slept with a heatingpad� they thought maybe the burn pattern had been caused by the parallelwires that deliver the heat in such pads� The sta� X�rayed the heating padbut discovered that the wire pattern did not correspond to the erythemapattern on the patient�s hip�

The hospital sta� sent a letter to AECL on January �� and they alsospoke on the phone with the AECL technical support supervisor� On Febru�ary � � the AECL technical support supervisor sent a written response to thedirector of radiation therapy at Yakima saying� �After careful considerationwe are of the opinion that this damage could not have been produced byany malfunction of the Therac��� or by any operator error�� The letter goeson to support this opinion by listing two pages of technical reasons why anoverdose by the Therac��� was impossible� along with the additional argu�ment that there have �apparently been no other instances of similar damageto this or other patients�� The letter ends� �In closing� I wish to advise thatthis matter has been brought to the attention of our Hazards Committee asis normal practice��

The hospital sta� eventually ascribed the patient�s skin reaction to �causeunknown�� In a report written on this �rst Yakima incident after another

App� A � Medical Devices� The Therac��� ��

Yakima overdose a year later� the medical physicist involved wrote�

At that time� we did not believe that the patient� was overdosedbecause the manufacturer had installed additional hardware andsoftware safety devices to the accelerator�

In a letter from the manufacturer dated ���Sep���� it is statedthat �Analysis of the hazard rate resulting from these modi�ca�tions indicates an improvement of at least �ve orders of magni�tude�� With such an improvement in safety ������������� we didnot believe that there could have been any accelerator malfunc�tion� These modi�cations to the accelerator were completed on����Sep����

Even with fairly sophisticated physics support� the hospital sta�� as users�did not have the ability to investigate the possibility of machine malfunctionfurther� They were not aware of any other incidents and� in fact� were toldthat there had been none� so there was no reason for them to pursue the mat�ter� No further investigation of this incident was done by the manufactureror by any government agencies �who did not know about it��

About a year later �February ������ after the second Yakima overdose ledthe hospital sta� to suspect that this �rst injury had been due to a Therac���fault� the sta� investigated and found that the �rst overdose victim had achronic skin ulcer� tissue necrosis �death� under the skin� and was in continualpain� The damage was surgically repaired� skin grafts were made� and thesymptoms relieved� The patient is alive today with minor disability andsome scarring related to the overdose� The hospital sta� concluded that thedose accidentally delivered in the �rst accident must have been much lowerthan in the second� as the reaction was signi�cantly less intense and necrosisdid not develop until six or eight months after exposure� Some other factorsrelated to the place on the body where the overdose occurred also kept herfrom having more signi�cant problems�

��� East Texas Cancer Center� March ���

More is known about the Tyler� Texas� accidents than the others becauseof the diligence of the Tyler hospital physicist� Fritz Hager� without whosee�orts the understanding of the software problems may have been delayedeven further�

App� A � Medical Devices� The Therac��� ��

The Therac��� had been at the East Texas Cancer Center �ETCC� fortwo years before the �rst serious accident� and more than ��� patients hadbeen treated� On March ��� ����� a male patient came into ETCC for hisninth treatment on the Therac���� one of a series prescribed as followup tothe removal of a tumor from his back�

This treatment was to be a �� MeV electron beam treatment of ��� radson the upper back and a little to the left of his spine� for a total of ����� radsover six and a half weeks� He was taken into the treatment room and placedface down on the treatment table� The operator then left the treatmentroom� closed the door� and sat at the control terminal�

The operator had held this job for some time� and her typing e�ciencyhad increased with experience� She could quickly enter prescription data andchange it conveniently with the Therac�s editing features� She entered thepatient�s prescription data quickly� then noticed that she had typed �x� �forX�ray� when she had intended �e� �for electron� mode� This was a commonmistake as most of the treatments involved X�rays� and she had gotten usedto typing this� The mistake was easy to �x� she merely used the �� key toedit the mode entry�

Because the other parameters she had entered were correct� she hit thereturn key several times and left their values unchanged� She reached thebottom of the screen� where it was indicated that the parameters had beenverified and the terminal displayed beam ready� as expected� She hit theone�key command��B for beam on� to begin the treatment� After a moment�the machine shut down and the console displayed the message malfunction��� The machine also displayed a treatment pause� indicating a problemof low priority� The sheet on the side of the machine explained that thismalfunction was a �dose input �� error� The ETCC did not have any otherinformation available in its instruction manual or other Therac��� documen�tation to explain the meaning of malfunction ��� An AECL technicianlater testi�ed that �dose input �� meant that a dose had been delivered thatwas either too high or too low� The messages had been expected to be usedonly during internal company development�

The machine showed a substantial underdose on its dose monitor display�� monitor units delivered whereas the operator had requested ��� monitorunits� She was accustomed to the quirks of the machine� which would fre�quently stop or delay treatment� in the past� the only consequences had beeninconvenience� She immediately took the normal action when the machine

App� A � Medical Devices� The Therac��� ��

merely paused� which was to hit the�P key to proceed with the treatment�The machine promptly shut down with the same malfunction �� errorand the same underdose shown by the dosimetry�

The operator was isolated from the patient� since the machine apparatuswas inside a shielded room of its own� The only way that the operator couldbe alerted to patient di�culty was through audio and video monitors� Onthis day� the video display was unplugged and the audio monitor was broken�

After the �rst attempt to treat him� the patient said that he felt as ifhe had received an electric shock or that someone had poured hot co�ee onhis back� He felt a thump and heat and heard a buzzing sound from theequipment� Since this was his ninth treatment� he knew that this was notnormal� He began to get up from the treatment table to go for help� Itwas at this moment that the operator hit the �P key to proceed with thetreatment� The patient said that he felt like his arm was being shocked byelectricity and that his hand was leaving his body� He went to the treatmentroom door and pounded on it� The operator was shocked and immediatelyopened the door for him� He appeared visibly shaken and upset�

The patient was immediately examined by a physician� who observed in�tense reddening of the treatment area� but suspected nothing more seriousthan electric shock� The patient was discharged and sent home with instruc�tions to return if he su�ered any further reactions� The hospital physicistwas called in� and he found the machine calibration within speci�cations�The meaning of the malfunction message was not understood� The machinewas then used to treat patients for the rest of the day�

In actuality� but unknown to anyone at that time� the patient had receiveda massive overdose� concentrated in the center of the treatment location�After�the�fact simulations of the accident revealed possible doses of ������to ������ rads in less than � second over an area of about � cm�

Over the weeks following the accident� the patient continued to have painin his neck and shoulder� He lost the function of his left arm and had periodicbouts of nausea and vomiting� He was eventually hospitalized for radiation�induced myelitis of the cervical cord causing paralysis of his left arm and bothlegs� left vocal cord paralysis �which left him unable to speak�� neurogenicbowel and bladder� and paralysis of the left diaphragm� He also had a lesionon his left lung and recurrent herpes simplex skin infections� He died fromcomplications of the overdose �ve months after the accident�

App� A � Medical Devices� The Therac��� ��

����� User and Manufacturer Response

The Therac��� was shut down for testing the day after this accident� One lo�cal AECL engineer and one from the home o�ce in Canada came to ETCC toinvestigate� They spent a day running the machine through tests� but couldnot reproduce a Malfunction � � The AECL engineer from the home o�cereportedly explained that it was not possible for the Therac��� to overdose apatient� The ETCC physicist claims that he asked AECL at this time if therewere any other reports of radiation overexposure and that AECL personnel�including the quality assurance manager� told him that AECL knew of noaccidents involving radiation overexposure by the Therac���� This seemsodd since AECL was surely at least aware of the Hamilton accident thathad occurred seven months before and the Yakima accident� and� even bytheir account� learned of the Georgia law suit around this time �which hadbeen �led four months earlier�� The AECL engineers then suggested that anelectrical problem might have caused the problem�

The electric shock theory was checked out thoroughly by an independentengineering �rm� The �nal report indicated that there was no electricalgrounding problem in the machine� and it did not appear capable of givinga patient an electrical shock� The ETCC physicist checked the calibration ofthe Therac��� and found it to be satisfactory� He put the machine back intoservice on April �� ����� convinced that it was performing properly�

��� East Texas Cancer Center� April ���

Three weeks later� on April ��� ����� another male patient was scheduled toreceive an electron treatment at ETCC for a skin cancer on the side of hisface� The prescription was for �� MeV� The same technician who had treatedthe �rst Tyler accident victim prepared this patient for treatment� Much ofwhat follows is from the operator�s deposition�

As with her former patient� she entered the prescription data and thennoticed an error in the mode� Again she used the edit �� key to changethe mode from X�ray to electron� After she �nished editing� she pressed thereturn key several times to place the cursor on the bottom of the screen�She saw the beam ready message displayed and turned the beam on�

Within a few seconds the machine shut down� making a loud noise audiblevia the �now working� intercom� The display showed malfunction ��

App� A � Medical Devices� The Therac��� ��

again� The operator rushed into the treatment room� hearing her patientmoaning for help� He began to remove the tape that had held his headin position and said something was wrong� She asked him what he felt�and he replied� ��re� on the side of his face� She immediately went to thehospital physicist and told him that another patient appeared to have beenburned� Asked by the physicist to described what had happened� the patientexplained that something had hit him on the side of the face� he saw a �ashof light� and he heard a sizzling sound reminiscent of frying eggs� He wasvery agitated and asked� �What happened to me� what happened to me��

This patient died from the overdose on May �� ����� three weeks after theaccident� He had disorientation� which progressed to coma� fever to �� �F�and neurological damage� An autopsy showed an acute high�dose radiationinjury to the right temporal lobe of the brain and the brain stem�

����� User and Manufacturer Response

After this second Tyler accident� the ETCC physicist immediately took themachine out of service and called AECL to alert them to this second apparentoverexposure� The physicist then began a careful investigation of his own�He worked with the operator� who remembered exactly what she had done onthis occasion� After a great deal of e�ort� they were eventually able to elicitthe malfunction �� message� They determined that data entry speedduring editing was the key factor in producing the error condition� If theprescription data was edited at a fast pace �as is natural for someone whohas repeated the procedure a large number of times�� the overdose occurred�It took some practice before the physicist could repeat the procedure rapidlyenough to elicit the malfunction �� message at will�

The next day� an engineer from AECL called and said that he could notreproduce the error� After the ETCC physicist explained that the procedurehad to be performed quite rapidly� AECL could �nally produce a similarmalfunction on its own machine� Two days after the accident� AECL said ithad measured the dosage �at the center of the �eld� to be ������ rads� AnAECL engineer explained that the frying sound heard by the patients wasthe ion chambers being saturated�

In one law suit that resulted from the Tyler accidents� the AECL qualitycontrol manager testi�ed that a �cursor up� problem had been found in theservice �maintenance� mode at other clinics in February or March of ���� and

App� A � Medical Devices� The Therac��� ��

also in the summer of ����� Both times� AECL thought that the softwareproblems had been �xed� There is no way to determine whether there is anyrelationship between these problems and the Tyler accidents�

����� Related Therac��� Problems

The software for both the Therac��� and Therac��� �evolved� from theTherac�� software� Additional functions had to be added because the Therac��� �and Therac���� operate in both X�ray and electron mode� while theTherac�� has only X�ray mode� CGR modi�ed the software for the Therac���to handle the dual modes� When the Therac��� development began� AECLengineers adapted the software from the Therac��� but they also borrowedsoftware routines from the Therac��� to handle electron mode� which wasallowed under their cooperative agreements�

After the second Tyler� Texas� accident� a physicist at the Universityof Chicago Joint Center for Radiation Therapy heard about the Therac��� software problem and decided to �nd out whether the same thing couldhappen with the Therac���� At �rst� the physicist was unable to reproducethe error on his machine� but two months later he found the link�

The Therac��� at the University of Chicago is used to teach students ina radiation therapy school conducted by the center� The center�s physicist�Frank Borger� noticed that whenever a new class of students started usingthe Therac���� fuses and breakers on the machine tripped� shutting downthe unit� These failures� which had been occurring ever since the school hadacquired the machine� might happen three times a week while new studentsoperated the machine and then disappear for months� Borger determinedthat new students make many di�erent types of mistakes and use �creativemethods� of editing parameters on the console� Through experimentation�he found that certain editing sequences correlated with blown fuses and de�termined that the same computer bug �as in the Therac��� software� wasresponsible� The physicist noti�ed the FDA� which noti�ed Therac��� users��

The software error is just a nuisance on the Therac��� because this ma�chine has independent hardware protective circuits for monitoring the elec�tron beam scanning� The protective circuits do not allow the beam to turnon� so there is no danger of radiation exposure to a patient� While theTherac��� relies on mechanical interlocks for monitoring the machine� the

App� A � Medical Devices� The Therac��� ��

Therac��� relies largely on software�

����� The Software Bug

A lesson to be learned from the Therac��� story is that focusing on partic�ular software �bugs� is not the way to make a safe system� Virtually allcomplex software can be made to behave in an unexpected fashion undersome conditions� The basic mistakes here involved poor software engineeringpractices and building a machine that relies on the software for safe oper�ation� Furthermore� the particular coding error is not as important as thegeneral unsafe design of the software overall� Examining the part of the codeblamed for the Tyler accidents is instructive� however� in demonstrating theoverall software design �aws� First the software design is described and thenthe errors believed to be involved in the Tyler accidents and perhaps others�

Therac��� Software Development and Design� AECL claims propri�etary rights to its software design� However� from voluminous documentationregarding the accidents� the repairs� and the eventual design changes� we canbuild a rough picture of it�

The software is responsible for monitoring the machine status� acceptinginput about the treatment desired� and setting the machine up for this treat�ment� It turns the beam on in response to an operator command �assumingthat certain operational checks on the status of the physical machine aresatis�ed� and also turns the beam o� when treatment is completed� when anoperator commands it� or when a malfunction is detected� The operator canprint out hardcopy versions of the CRT display or machine setup parameters�

The treatment unit has an interlock system designed to remove powerto the unit when there is a hardware malfunction� The computer monitorsthis interlock system and provides diagnostic messages� Depending on thefault� the computer either prevents a treatment from being started or� if thetreatment is in progress� creates a pause or a suspension of the treatment�

There are two basic operational modes� treatment mode and servicemode� Treatment mode controls the normal treatment process� In servicemode� the unit can be operated with some of the operational and treatmentinterlocks bypassed� and additional operational commands and characteris�tics may be selected� Service mode is entered only through the use of apassword at the service keyboard�

App� A � Medical Devices� The Therac��� �

The manufacturer describes the Therac��� software as having a stand�alone� real�time treatment operating system� The system does not use astandard operating system or executive� Rather� the real�time executivewas written especially for the Therac��� and runs on a �K PDP������Cycles are allocated to the critical and noncritical tasks using a preemptivescheduler�

The software� written in PDP��� assembly language� has four major com�ponents� stored data� a scheduler� a set of critical and noncritical tasks� andinterrupt services� The stored data includes calibration parameters for theaccelerator setup as well as patient�treatment data� The interrupt routinesinclude

� A clock interrupt service routine� A scanning interrupt service routine� Traps �for software over�ow and computer hardware generated inter�

rupts�� Power up �initiated at power up to initialize the system and pass control

to the scheduler�� Treatment console screen interrupt handler� Treatment console keyboard interrupt handler� Service printer interrupt handler� Service keyboard interrupt handler

The scheduler controls the sequencing of all noninterrupt events and co�ordinates all concurrent processes� Tasks are initiated every ��� second� withthe critical tasks executed �rst and the noncritical tasks executed in anyremaining cycle time� Critical tasks include the following�

� The treatment monitor �Treat� directs and monitors patient setup andtreatment via eight operating phases� These are called as subroutines�depending on the value of the Tphase control variable� Following theexecution of a particular subroutine� Treat reschedules itself� Treatinteracts with the keyboard processing task� which handles operatorconsole communication� The prescription data is cross�checked andveri�ed by other tasks �such as keyboard processor or parameter setupsensor� that inform the treatment task of the veri�cation status viashared variables�

App� A � Medical Devices� The Therac��� �

� The servo task controls gun emission� dose rate �pulse repetition fre�quency�� symmetry �beam steering�� and machine motions� The servotask also sets up the machine parameters and monitors the beam�tilt�error and the �atness�error interlocks�

� The housekeeper task takes care of system status interlocks and limitchecks and displays appropriate messages on the CRT display� It de�codes some information and checks the setup veri�cation�

Noncritical tasks include

� Checksum processor �scheduled to run periodically�� Treatment console keyboard processor �scheduled to run only if it is

called by other tasks or by keyboard interrupts�� This task acts as thecommunication interface between the other software and the operator�

� Treatment console screen processor �run periodically�� This task laysout appropriate record formats for either CRT displays or hard copies�

� Service keyboard processor �run on demand�� This task arbitrates non�treatment�related communication between the therapy system and theoperator�

� Snapshot �run periodically by the scheduler�� Snapshot captures pres�elected parameter values and is called by the treatment task at the endof a treatment�

� Hand control processor �run periodically��� Calibration processor� This task is responsible for a package of tasks

that let the operator examine and change system setup parameters andinterlock limits�

It is clear from the AECL documentation on the modi�cations that thesoftware allows concurrent access to shared memory� that there is no realsynchronization aside from data that are stored in shared variables� andthat the �test� and �set� for such variables are not indivisible operations�Race conditions resulting from this implementation of multitasking playedan important part in the accidents�

Speci�c Design Errors� The following explanation of the speci�c softwareproblems found at this time is taken from the description AECL provided tothe FDA� but clari�ed somewhat� The description leaves some unansweredquestions� but it is the best that can be done with the information available�

App� A � Medical Devices� The Therac��� ��

Mode/Energy Offset (MEOS)

Handler

Date, Time, ID Changes

Reset

Set Up Test

Datent

Pause Treatment

Terminate Treatment

Hand

collimator)(Set upper

Set Up Done

Keyboard

Patient Treatment

Data EntryComplete

Offset parameters Mode/energy5

0

1

2

3

4

6

7

Tphase

Treat

controlvariable

Calibrationtables

Figure � Tasks and subroutines in the code blamed for the Tyler accidents�

The treatment monitor task �Treat� controls the various phases of treat�ment by executing its eight subroutines� The treatment phase indicator vari�able �Tphase� is used to determine which subroutine should be executed �Fig�ure �� Following the execution of a particular subroutine� Treat reschedulesitself�

One of Treat�s subroutines� called Datent �data entry�� communicateswith the keyboard handler task �a task that runs concurrently with Treat�via a shared variable �Data Entry Complete �ag� to determine whether theprescription data has been entered� The keyboard handler recognizes thecompletion of data entry and changes the Data Entry Complete variable todenote this� Once this variable is set� the Datent subroutine detects thevariable�s change in status and changes the value of Tphase from � �Datent�to �Set Up Test�� In this case� the Datent subroutine exits back to theTreat subroutine� which will reschedule itself and begin execution of theSet Up Test subroutine� If the Data Entry Complete variable has not beenset� Datent leaves the value of Tphase unchanged and exits back to Treat�s

App� A � Medical Devices� The Therac��� ��

mainline� Treat will then reschedule itself� essentially rescheduling the Datentsubroutine�

The command line at the lower right�hand corner of the screen �see Fig�ure �� is the cursor�s normal position when the operator has completed allthe necessary changes to the prescription� Prescription editing is signi�ed bymoving the cursor o� the command line� As the program was originally de�signed� the Data Entry Complete variable by itself is not su�cient becauseit does not ensure that the cursor is located on the command line� underthe right circumstances� the data entry phase can be exited before all editchanges are made on the screen�

The keyboard handler parses the mode and energy level speci�ed by theoperator and places an encoded result in another shared variable� the ��byteMode�Energy O�set variable �MEOS�� The low�order byte of this variableis used by another task �Hand� to set the collimator�turntable to the properposition for the selected mode and energy� The high�order byte of the MEOSvariable is used by Datent to set several operating parameters�

Initially� the data�entry process forces the operator to enter the modeand energy except when the photon mode is selected� in which case theenergy defaults to �� MeV� The operator can later edit the mode and energyseparately� If the keyboard handler sets the Data Entry Complete �ag beforethe operator changes the data in MEOS� Datent will not detect the changesbecause it has already exited and will not be reentered again� The uppercollimator �turntable�� on the other hand� is set to the position dictated bythe low�order byte of MEOS by another concurrently running task �Hand�and can therefore be inconsistent with the parameters set in accordance withthe information in the high�order byte� The software appears to contain nochecks to detect such an incompatibility�

The �rst thing Datent does when it is entered is to check whether thekeyboard handler has set the mode and energy in MEOS� If so� it uses thehigh�order byte to index into a table of preset operating parameters andplaces them in the digital�to�analog output table� The contents of this outputtable are transferred to the digital�to�analog converter during the next clockcycle� Once the parameters are all set� Datent calls the subroutine Magnet�which sets the bending magnets� The following shows a simpli�ed pseudocodedescription of relevant parts of the software�

Datent�

App� A � Medical Devices� The Therac��� ��

if mode�energy speci�ed thenbegin

calculate table indexrepeat

fetch parameteroutput parameterpoint to next parameter

until all parameters setcall Magnetif mode�energy changed then return

end

if data entry is complete then set Tphase to if data entry is not complete then

if reset command entered then set Tphase to �return

Magnet�Set bending magnet �agrepeat

Set next magnetcall Ptimeif mode�energy has changed� then exit

until all magnets are setreturn

Ptime�repeat

if bending magnet �ag is set thenif editing taking place then

if mode�energy has changed then exituntil hysteresis delay has expiredClear bending magnet �agreturn

Setting the bending magnets takes about eight seconds� Magnet calls asubroutine called Ptime to introduce a time delay� Since several magnets need

App� A � Medical Devices� The Therac��� ��

to be set� Ptime is entered and exited several times� A �ag to indicate thatthe bending magnets are being set is initialized upon entry to the Magnetsubroutine and cleared at the end of Ptime� Furthermore� Ptime checks ashared variable� set by the keyboard handler� that indicates the presenceof any editing requests� If there are edits� then Ptime clears the bendingmagnet variable and exits to Magnet� which then exits to Datent� But theedit change variable is checked by Ptime only if the bending magnet �ag isset� Because Ptime clears it during its �rst execution� any edits performedduring each succeeding pass through Ptime will not be recognized� Thus�an edit change of the mode or energy� although re�ected on the operator�sscreen and the mode�energy o�set variable� will not be sensed by Datent soit can index the appropriate calibration tables for the machine parameters�

Recall that the Tyler error occurred when the operator made an entryindicating the mode and energy� went to the command line� then moved thecursor up to change the mode or energy and returned to the command line allwithin eight seconds� Because the magnet setting takes about eight secondsand Magnet does not recognize edits after the �rst execution of Ptime� theediting had been completed by the return to Datent� which never detectedthat it had occurred� Part of the problem was �xed after the accident byclearing the bending magnet variable at the end of Magnet �after all themagnets have been set� instead of at the end of Ptime�

But this is not the only problem� Upon exit from the Magnet subroutine�the data entry subroutine �Datent� checks the Data Entry Complete variable�If it indicates that data entry is complete� Datent sets Tphase to andDatent is not entered again� If it is not set� Datent leaves Tphase unchanged�which means it will eventually be rescheduled� But the Data Entry Completevariable only indicates that the cursor has been down to the command line�not that it is still there� A potential race condition is set up� To �x this�AECL introduced another shared variable controlled by the keyboard handlertask that indicates the cursor is not positioned on the command line� If thisvariable is set� then prescription entry is still in progress and the value ofTphase is left unchanged�

����� The Government and User Response

The FDA does not approve each new medical device on the market� Allmedical devices go through a classi�cation process that determines the level

App� A � Medical Devices� The Therac��� ��

of FDA approval necessary� Medical accelerators follow a procedure calledpre�market noti�cation before commercial distribution� In this process� the�rm must establish that the product is substantially equivalent in safety ande�ectiveness to a product already on the market� If that cannot be done tothe FDA�s satisfaction� a pre�market approval is required� For the Therac���� the FDA required only a pre�market noti�cation� After the Therac���accidents� new procedures for approval of software�controlled devices wereadopted�

The agency is basically reactive to problems and requires manufacturersto report serious ones� Once a problem is identi�ed in a radiation�emittingproduct� the FDA is responsible for approving the corrective action plan�CAP��

The �rst reports of the Tyler incidents came to the FDA from the Stateof Texas Health Department� and this triggered FDA action� The FDAinvestigation was well under way when AECL produced a medical devicereport to discuss the details of the radiation overexposures at Tyler� TheFDA declared the Therac��� defective under the Radiation Control for Healthand Safety Act and ordered the �rm to notify all purchasers� investigate theproblem� determine a solution� and submit a corrective action plan for FDAapproval�

The �nal CAP consisted of more than twenty changes to the systemhardware and software� plus modi�cations to the system documentation andmanuals� Some of these changes were unrelated to the speci�c accidents�but were improvements to the general safety of the machine� The full CAPimplementation� including an extensive safety analysis� was not completeuntil more than two years after the Tyler accidents�

AECL made their accident report to the FDA on April ��� ����� Onthat same date� AECL sent out a letter to each Therac user recommending atemporary ��x� to the machine that would allow continued clinical use� Theletter �shown in its complete form� stated�

SUBJECT� CHANGE IN OPERATING PROCEDURES FORTHE THERAC �� LINEAR ACCELERATOR

E�ective immediately� and until further notice� the key usedfor moving the cursor back through the prescription sequence�i�e�� cursor �UP� inscribed with an upward pointing arrow� mustnot be used for editing or any other purpose�

App� A � Medical Devices� The Therac��� �

To avoid accidental use of this key� the key cap must be re�moved and the switch contacts �xed in the open position withelectrical tape or other insulating material� For assistance withthe latter you should contact your local AECL service represen�tative�

Disabling this key means that if any prescription data enteredis incorrect then a �R� reset command must be used and the wholeprescription reentered�

For those users of the Multiport option it also means thatediting of dose rate� dose and time will not be possible betweenports�

On May �� ����� the FDA declared the Therac defective� demanded aCAP� and required renoti�cation of all the Therac customers� In the letterfrom the FDA to AECL� the Director of Compliance� Center for Devices andRadiological Health� wrote�

We have reviewed AECL�s� April �� letter to purchasers and haveconcluded that it does not satisfy the requirements for noti�cationto purchasers of a defect in an electronic product� Speci�cally�it does not describe the defect nor the hazards associated withit� The letter does not provide any reason for disabling the cur�sor key and the tone is not commensurate with the urgency fordoing so� In fact� the letter implies the inconvenience to opera�tors outweighs the need to disable the key� We request that youimmediately renotify purchasers�

AECL promptly made a new notice to users and also requested an exten�sion to produce a CAP� The FDA granted this request�

About this time� the Therac��� users created a user�s group and held their�rst meeting at the annual conference of the American Association of Physi�cists in Medicine� At the meeting� users discussed the Tyler accident andheard an AECL representative present the company�s plans for respondingto it� AECL promised to send a letter to all users detailing the CAP�

Several users described additional hardware safety features that they hadadded to their own machines to provide additional protection� An interlock�that checked gun current values�� which the Vancouver clinic had previouslyadded to their Therac���� was labeled as redundant by AECL� the users

App� A � Medical Devices� The Therac��� �

disagreed� There were further discussions of poor design and other problemsthat caused a ��� to ��percent underdosing in both modes�

The meeting notes said

There was a general complaint by all users present about the lackof information propagation� The users were not happy about re�ceiving incomplete information� The AECL representative coun�tered by stating that AECL does not wish to spread rumors andthat AECL has no policy to �keep things quiet�� The consensusamong the users was that an improvement was necessary�

After the �rst user�s group meeting� there were two user�s group newslet�ters� The �rst� dated fall ����� contained letters from Tim Still� the Kenne�stone physicist� who complained about what he considered to be eight majorproblems he had experienced with the Therac���� These problems includedpoor screen�refresh subroutines that leave trash and erroneous informationon the operator console and some tape�loading problems upon startup thathe discovered involved the use of �phantom tables� to trigger the interlocksystem in the event of a load failure instead of using a checksum� He askedthe question� �Is programming safety relying too much on the software in�terlock routines�� The second user�s group newsletter� in December �����further discussed the implications of the phantom table problem�

AECL produced its �rst CAP on June �� ����� The FDA asked forchanges and additional information about the software� including a softwaretest plan� AECL responded on September �� with several documents de�scribing the software and its modi�cations but no test plan� They explainedhow the Therac��� software evolved from the Therac�� software and statedthat �no single test plan and report exists for the software since both hard�ware and software were tested and exercised separately and together overmany years�� AECL concluded that the current CAP improved �machinesafety by many orders of magnitude and virtually eliminates the possibilityof lethal doses as delivered in the Tyler incident��

An FDA internal memo dated October �� commented on these AECLsubmissions� raising several concerns�

Unfortunately� the AECL response also seems to point out anapparent lack of documentation on software speci�cations and asoftware test plan�

App� A � Medical Devices� The Therac��� �

� � � concerns include the question of previous knowledge ofproblems by AECL� the apparent paucity of software quality as�surance at the manufacturing facility� and possible warnings andinformation dissemination to others of the generic type problems�

� � �As mentioned in my �rst review� there is some confusion onwhether the manufacturer should have been aware of the softwareproblems prior to the ARO�s Accidental Radiation Overdoses� inTexas� AECL had received o�cial noti�cation of a law suit inNovember ���� from a patient claiming accidental over�exposurefrom a Therac��� in Marietta� Georgia�� � � If knowledge of thesesoftware de�ciencies were known beforehand� what would be theFDA�s posture in this case�

� � �The materials submitted by the manufacturer have notbeen in su�cient detail and clarity to ensure an adequate soft�ware quality assurance program currently exists� For example� aresponse has not been provided with respect to the software partof the CAP to the CDRH�s FDA Center for Devices and Radio�logical Health� request for documentation on the revised require�ments and speci�cations for the new software� In addition� ananalysis has not been provided� as requested� on the interactionwith other portions of the software to demonstrate the correctedsoftware does not adversely a�ect other software functions�

The July � letter from the CDRH requested a documentedtest plan including several speci�c pieces of information identi�edin the letter� This request has been ignored up to this point bythe manufacturer� Considering the rami�cations of the currentsoftware problem� changes in software QA attitudes are neededat AECL�

AECL also planned to retain the malfunction codes� but the FDA requiredbetter warnings for the operators� Furthermore� AECL had not planned onany quality assurance testing to ensure exact copying of software� but theFDA insisted on it� The FDA further requested assurances that rigoroustesting would become a standard part of AECL�s software modi�cation pro�cedures�

We also expressed our concern that you did not intend to performthe protocol to future modi�cations to software� We believe that

App� A � Medical Devices� The Therac���

the rigorous testing must be performed each time a modi�cationis made in order to ensure the modi�cation does not adverselya�ect the safety of the system�

AECL was also asked to draw up an installation test plan to ensure thatboth hardware and software changes perform as designed when installed�

AECL submitted CAP Revision � and supporting documentation on De�cember ��� ����� They changed the CAP to have dose malfunctions suspendtreatment and included a plan for meaningful error messages and highlighteddose error messages� They also expanded their diagrams of software modi��cations and expanded their test plan to cover hardware and software�

�� Yakima Valley Memorial Hospital� January ���

On Saturday� January ��� ����� the second patient of the day was to betreated for a carcinoma� This patient was to receive two �lm veri�cationexposures of and rads plus a ���rad photon treatment �for a total exposureof �� rads��

Film was placed under the patient and rads were administered� Afterthe machine paused to open the collimator jaws further� the second exposureof rads was administered� The machine paused again�

The operator entered the treatment room to remove the �lm and verifythe patient�s precise position� He used the hand control in the treatmentroom to rotate the turntable to the �eld light position� which allowed himto check the alignment of the machine with respect to the patient�s body inorder to verify proper beam position� He then either pressed the set buttonon the hand control or left the room and typed a set command at the consoleto return the turntable to the proper position for treatment� there is someconfusion as to exactly what transpired� When he left the room� he forgot toremove the �lm from underneath the patient� The console displayed �beamready�� and the operator hit the�B key to turn the beam on�

The beam came on� but the console displayed no dose or dose rate� After�ve or six seconds� the unit shut down with a pause and displayed a message�The message �may have disappeared quickly�� the operator was unclear onthis point� However� since the machine merely paused� he was able to pushthe�P key to proceed with treatment�

The machine paused again� this time displaying flatness on the reason

App� A � Medical Devices� The Therac���

line� The operator heard the patient say something over the intercom� butcould not understand him� He went into the room to speak with the pa�tient� who reported �feeling a burning sensation� in the chest� The consoledisplayed only the total dose of the two �lm exposures �� rads� and nothingmore�

Later in the day� the patient developed a skin burn over the entire treat�ment area� Four days later� the redness developed a striped pattern matchingthe slots in the blocking tray� The striped pattern was similar to the burn ayear earlier at this same hospital� which had �rst been ascribed to a heatingpad and later o�cially labeled by the hospital as �cause unknown��

AECL began an investigation� and users were told to con�rm the turntableposition visually before turning on the beam� All tests run by the AECLengineers indicated that the machine was working perfectly� From the in�formation that had been gathered to that point� it was suspected that theelectron beam had come on when the turntable was in the �eld light position�But the investigators could not reproduce the fault condition�

On the following Thursday� AECL sent in an engineer from Ottawa toinvestigate� The hospital physicist had� in the meantime� run some testshimself� He placed a �lm in the Therac�s beam and then ran two exposuresof X�ray parameters with the turntable in �eld light position� The �lmappeared to match the �lm that was left �by mistake� under the patientduring the accident�

After a week of checking the hardware� AECL determined that the �incor�rect machine operation was probably not caused by hardware alone�� Afterchecking the software� AECL engineers discovered a �aw �described below�that could explain the erroneous behavior� The coding problems explainingthis accident are completely di�erent from those associated with the Tyleraccidents�

Preliminary dose measurements by AECL indicated that the dose deliv�ered under these conditions�that is� when the turntable is in the �eld lightposition�is on the order of ���� to ����� rads� After two attempts� thepatient could have received ����� to ������ instead of the �� rads prescribed�AECL again called users on January �� �nine days after the accident� andgave them detailed instructions on how to avoid this problem� In an FDAinternal report on the accident� the AECL quality assurance manager in�vestigating the problem is quoted as saying that the software and hardwarechanges to be retro�tted following the Tyler accident nine months earlier

App� A � Medical Devices� The Therac��� �

�but which had not yet been installed� would have prevented the Yakimaaccident�

The patient died in April from complications related to the overdose� Hehad a terminal form of cancer� but a lawsuit was initiated by his survivorsalleging that he died sooner than he would have and endured unnecessarypain and su�ering due to the radiation overdose� The suit� like all the others�was settled out of court�

����� The Yakima Software Bug

The software problem for the second Yakima accident is fairly well�establishedand di�erent from that implicated in the Tyler accidents� There is no way todetermine what particular software design errors were related to the Kenne�stone� Hamilton� and �rst Yakima accidents� Given the unsafe programmingpractices exhibited in the code� unknown race conditions or errors could havebeen responsible for them� There is speculation� however� that the Hamil�ton accident was the same as this second Yakima overdose� In a report ofa conference call on January ��� ����� between the AECL quality assurancemanager and Ed Miller of the FDA discussing the Yakima accident� Millernotes

This situation probably occurred in the Hamilton� Ontario acci�dent a couple of years ago� It was not discovered at that timeand the cause was attributed to intermittent interlock failure�The subsequent recall of the multiple microswitch logic networkdid not really solve the problem�

The second Yakima accident was again attributed to a type of race condi�tion in the software � this one allowed the device to be activated in an errorsetting �a �failure� of a software interlock�� The Tyler accidents were relatedto problems in the data�entry routines that allowed the code to proceed toSet Up Test before the full prescription had been entered and acted upon�The Yakima accident involved problems encountered later in the logic afterthe treatment monitor Treat reaches Set Up Test�

The Therac����s �eld light feature allows very precise positioning of thepatient for treatment� The operator can control the machine right at thetreatment site using a small hand control that o�ers certain limited functionsfor patient setup� including setting gantry� collimator� and table motions�

App� A � Medical Devices� The Therac��� �

Set Up Test

During SetIncrement Class 3 on each cycle

Check F$mal

If F$mal=0 system is consistentthen set Tphase=2 for Set Up Done

Hkeper

Lmtchk

If Class3=0

If Class3 is not 0

Chkcol

If upper collimatorinconsistent with treatmentthen set bit 9 of F$mal

Treat

Tphase

Then do not enter Chkcol

Then enter Chkcol

0 1

3

2

F$mal

Class3

Figure � The Yakima software �aw�

Normally� the operator enters all the prescription data at the console�outside the treatment room� before the �nal setup of all machine parame�ters is completed in the treatment room� This gives rise to an unverified

condition at the console� The operator then completes patient setup in thetreatment room� and all relevant parameters now verify� The console dis�plays a message to press set button while the turntable is in the �eldlight position� The operator now presses the set button on the hand controlor types �set� at the console� That should set the collimator to the properposition for treatment�

In the software� after the prescription is entered and veri�ed by the Da�tent routine� the control variable Tphase is changed so that the Set Up Testroutine is entered �Figure �� Every pass through the Set Up Test rou�

App� A � Medical Devices� The Therac��� �

tine increments the upper collimator position check� a shared variable calledClass� If Class is nonzero� there is an inconsistency and treatment shouldnot proceed� A zero value for Class indicates that the relevant parametersare consistent with treatment� and the software does not inhibit the beam�

After setting the Class variable� Set Up Test next checks for any mal�functions in the system by checking another shared variable �set by a routinethat actually handles the interlock checking� called F mal to see if it has anonzero value� A nonzero value in F mal indicates that the machine is notready for treatment� and the Set Up Test subroutine is rescheduled� WhenF mal is zero �indicating that everything is ready for treatment�� the SetUp Test subroutine sets the Tphase variable equal to �� which results innext scheduling the Set Up Done subroutine and the treatment is allowed tocontinue�

The actual interlock checking is performed by a concurrent Housekeepertask �Hkeper�� The upper collimator position check is performed by a subrou�tine of Hkeper called Lmtchk �analog�to�digital limit checking�� Lmtchk �rstchecks the Class variable� If Class contains a non�zero value� Lmtchk callsthe Check Collimator �Chkcol� subroutine� If Class contains zero� Chkcolis bypassed and the upper collimator position check is not performed� TheChkcol subroutine sets or resets bit � of the F mal shared variable� depend�ing on the position of the upper collimator�which in turn is checked by theSet Up Test subroutine of Treat to decide whether to reschedule itself or toproceed to Set Up Done�

During machine setup� Set Up Test will be executed several hundred timesbecause it reschedules itself waiting for other events to occur� In the code�the Class variable is incremented by one in each pass through Set Up Test�Since the Class variable is one byte� it can only contain a maximum valueof ��� decimal� Thus� on every ���th pass through the Set Up Test code�the variable will over�ow and have a zero value� That means that on every���th pass through Set Up Test� the upper collimator will not be checkedand an upper collimator fault will not be detected�

The overexposure occurred when the operator hit the �set� button atthe precise moment that Class rolled over to zero� Thus� Chkcol was notexecuted and F mal was not set to indicate that the upper collimator wasstill in the �eld�light position� The software turned on the full �� MeVwithout the target in place and without scanning� A highly concentratedelectron beam resulted� which was scattered and de�ected by the stainless

App� A � Medical Devices� The Therac��� �

steel mirror that was in the path�The technical ��x� implemented for this particular software �aw is de�

scribed by AECL as simple� the program is changed so that the Classvariable is set to some �xed nonzero value each time through Set Up Testinstead of being incremented�

����� Manufacturer Government and User Response

On February � ����� after interaction with the FDA and others� includingthe user�s group� AECL announced to its customers

�� A new software release to correct both the Tyler and Yakima softwareproblems

�� A hardware single�pulse shutdown circuit� A turntable potentiometer to independently monitor turntable position � A hardware turntable interlock circuit

The second item� a hardware single�pulse shutdown circuit� essentiallyacts as a hardware interlock to prevent overdosing by detecting an unsafelevel of radiation and halting beam output after one pulse of high energy andcurrent� This interlock e�ectively provides an independent way to protectagainst a wide range of potential hardware failures and software errors� Thethird item� a turntable potentiometer� was the safety device recommendedby several groups after the Hamilton accident�

After the second Yakima accident� the FDA became concerned that theuse of the Therac��� during the CAP process� even with AECL�s interim op�erating instructions� involved too much risk to patients� The FDA concludedthat the accidents demonstrated that the software alone could not be reliedupon to assure safe operation of the machine� In a February ��� ����� inter�nal FDA memorandum� the Director of the Division of Radiological Productswrote�

It is impossible for CDRH to �nd all potential failure modes andconditions of the software� AECL has indicated the �simple soft�ware �x� will correct the turntable position problem displayed atYakima� We have not yet had the opportunity to evaluate thatmodi�cation� Even if it does� based upon past history� I am not

App� A � Medical Devices� The Therac��� �

convinced that there are not other software glitches that couldresult in serious injury�

� � �We are in the position of saying that the proposed CAP canreasonably be expected to correct the de�ciencies for which theywere developed �Tyler�� We cannot say that we are reasonablesic� con�dent about the safety of the entire system to prevent orminimize exposure from other fault conditions�

On February �� ����� Ed Miller of the FDA called Pavel Dvorak ofCanada�s Health and Welfare to advise him that the FDA would recom�mend that all Therac���s be shutdown until permanent modi�cations couldbe made� According to Miller�s notes on the phone call� Dvorak agreed andindicated that Health and Welfare would coordinate their actions with theFDA�

AECL responded on April � with an update on the Therac CAP statusand a schedule of the nine action items pressed by the users at a user�s groupmeeting in March� This unique and highly productive meeting provided anunusual opportunity to involve the users in the CAP evaluation process� Itbrought together all concerned parties in one place and at one time so that acourse of action could be decided upon and approved as quickly as possible�The attendees included representatives from

� The manufacturer �AECL�� All users� including their technical and legal sta�s� The FDA and the Canadian Bureau of Radiation and Medical Devices� the Canadian Atomic Energy Control Board� the Province of Ontario� the Radiation Regulations Committee of the Canadian Association of

Physicists

According to Gordon Symonds� from the Canadian BRMD� this meeting wasvery important to the resolution of the problems� since the regulators� users�and manufacturer arrived at a consensus in one day�

At this second user�s meeting� the participants carefully reviewed all thesix known major Therac��� accidents to that date and discussed the elementsof the CAP along with possible additional modi�cations� They came upwith a prioritized list of modi�cations they wanted included in the CAP and

App� A � Medical Devices� The Therac��� �

expressed concerns about the lack of independent evaluation of the softwareand the lack of a hardcopy audit trail to assist in diagnosing faults�

The AECL representative� who was the quality assurance manager� re�sponded that tests had been done on the CAP changes� but that the testswere not documented and that independent evaluation of the software �mightnot be possible�� He claimed that two outside experts had reviewed the soft�ware� but he could not provide their names� In response to user requests fora hard copy audit trail and access to source code� he explained that memorylimitations would not permit including such options and that source codewould not be made available to users�

On May �� AECL issued CAP Revision as a result of the FDA commentsand the user�s meeting input� The FDA response on May �� approved theCAP subject to submission of the �nal test plan results and an independentsafety analysis� distribution of the draft revised manual to customers� andcompletion of the CAP by June �� ����� The FDA concluded by rating thisa Class I recall� a recall in which there is a reasonable probability that theuse of� or exposure to� a violative product will cause serious adverse healthconsequences or death ���

AECL sent more supporting documentation to the FDA on June �� �����including the CAP test plan� a draft operator�s manual� and the draft ofthe new safety analysis� This time the analysis included the software in thefault trees but used a �generic failure rate� of ���� for software events� Thisnumber was justi�ed as being based on the historical performance of theTherac��� software� The �nal report on the safety analysis states that manyof the fault trees had a computer malfunction as a causative event� and theoutcome for quanti�cation was therefore dependent on the failure rate chosenfor the software� Assuming that all software errors are equally likely seemsrather strange�

A close inspection of the code was also conducted during this safety anal�ysis to �obtain more information on which to base decisions�� An outsideconsultant performed the inspection� which included a detailed examinationof the implementation of each function� a search for coding errors� and aqualitative assessment of the software�s reliability� No information is pro�vided in the �nal safety report about whether any particular methodologyor tools were used in the software inspection or whether someone just readthe code looking for errors�

App� A � Medical Devices� The Therac��� �

AECL planned a �fth revision of the CAP to include the testing and �nalsafety analysis results� Referring to the test plan at this� the �nal stage ofthe CAP process� an FDA reviewer said�

Amazingly� the test data presented to show that the softwarechanges to handle the edit problems in the Therac��� are appro�priate prove the exact opposite result� A review of the data tablein the test results indicates that the �nal beam type and energy�edit change� has no e�ect on the initial beam type and energy�I can only assume that either the �x is not right or the data wasentered incorrectly� The manufacturer should be admonished forthis error� Where is the QC Quality Control� review for the testprogram� AECL must� ��� clarify this situation� ��� change thetest protocol to prevent this type of error from occurring� and ��set up appropriate QC control on data review�

A further FDA memo indicated�

The AECL quality assurance manager� could not give an expla�nation and will check into the circumstances� He subsequentlycalled back and veri�ed that the technician completed the formincorrectly� Correct operation was witnessed by himself and oth�ers� They will repeat and send us the correct data sheet�

At the American Association of Physicists in Medicine meeting in July����� a third user�s meeting was held� The AECL representative describedthe status of the latest CAP and explained that the FDA had given verbalapproval and that he expected full implementation by the end of August �����He went on to review and comment on the prioritized concerns of the lastmeeting� Three of the user�requested hardware changes had been includedin the CAP� Changes to tape load error messages and checksums on the loaddata would wait until after the CAP was done� Software documentation wasdescribed as a lower priority task that needed de�nition and would not beavailable to the FDA in any form for over a year�

On July �� ����� AECL sent a letter to all users to update them onthe FDA�s verbal approval of the CAP and to delineate how AECL wouldproceed� Finally� on July ��� ����� AECL issued the �nal and �fth CAPrevision� The major features of the �nal CAP are these�

App� A � Medical Devices� The Therac��� �

� All interruptions related to the dosimetry system will go to a treatmentsuspend� not a treatment pause� Operators will not be allowed torestart the machine without reentering all parameters�

� A software single�pulse shutdown will be added�� An independent hardware single�pulse shutdown will be added�� Monitoring logic for turntable position will be improved to ensure that

the turntable is in one of the three legal positions�� A potentiometer will be added to the turntable� The output is used to

monitor exact turntable location and provide a visible position signalto the operator�

� Interlocking with the ����degree bending magnet will be added to en�sure that the target and beam �attener are in position if the X�raymode is selected�

� Beam�on will be prevented if the turntable is in the �eld light or anyintermediate position�

� Cryptic malfunction messages will be replaced with meaningful mes�sages and highlighted dose�rate messages�

� Editing keys will be limited to cursor up� backspace� and return� Allother keys will be inoperative�

� A motion�enable footswitch �a type of deadman switch� will be added�The operator will be required to hold this switch closed during move�ment of certain parts of the machine to prevent unwanted motions whenthe operator is not in control�

� Twenty three other changes will be made to the software to improve itsoperation and reliability� including disabling of unused keys� changingthe operation of the set and reset commands� preventing copying of thecontrol program on site� changing the way various detected hardwarefaults are handled� eliminating errors in the software that were detectedduring the review process� adding several additional software interlocks�disallowing changes in the service mode while a treatment is in progress�and adding meaningful error messages�

� The known software problems associated with the Tyler and Yakimaaccidents will be �xed�

� The manuals will be �xed to re�ect the changes�

Figure � shows a typical Therac��� installation after the CAP changeswere made�

App� A � Medical Devices� The Therac���

Beamon/off light

Motionpower switch

Treatment table

Intercom

Therac-25 unit

Turntableposition monitor

Control

Printer

Display

TV Monitor

footswitch

RoomEmergencySwitches

DoorInterlockSwitch

Motion enableterminal

console

TV camera

Figure �� A typical Therac��� facility after the �nal CAP�

App� A � Medical Devices� The Therac���

Ed Miller� the director of the Division of Standards Enforcement� Centerfor Devices and Radiological Health at the FDA� wrote in �����

FDA has performed extensive review of the Therac��� softwareand hardware safety systems� We cannot say with absolute cer�tainty that all software problems that might result in improperdose have been found and eliminated� However� we are con�dentthat the hardware and software safety features recently addedwill prevent future catastrophic consequences of failure�

No Therac��� accidents have been reported since the �nal corrective ac�tion plan was implemented�

� Causal Factors

Many lessons can be learned from this series of accidents� A few are consid�ered here�

Overcon�dence in Software� A common mistake in engineering� in thiscase and in many others� is to put too much con�dence in software� Thereseems to be a feeling among nonsoftware professionals that software will notor cannot fail� which leads to complacency and overreliance on computerfunctions�

A related tendency among engineers is to ignore software� The �rst safetyanalysis on the Therac��� did not include software�although nearly fullresponsibility for safety rested on it� When problems started occurring� itwas assumed that hardware had caused them� and the investigation lookedonly at the hardware�

Confusing Reliability with Safety� This software was highly reliable� Itworked tens of thousands of times before overdosing anyone� and occurrencesof erroneous behavior were few and far between� AECL assumed that theirsoftware was safe because it was reliable� and this led to complacency�

Lack of Defensive Design� The software did not contain self�checks orother error�detection and error�handling features that would have detected

App� A � Medical Devices� The Therac��� �

the inconsistencies and coding errors� Audit trails were limited because ofa lack of memory� However� today larger memories are available and audittrails and other design techniques must be given high priority in makingtradeo� decisions�

Patient reactions were the only real indications of the seriousness of theproblems with the Therac���� there were no independent checks that the ma�chine and its software were operating correctly� Such veri�cation cannot beassigned to operators without providing them with some means of detectingerrors� The Therac��� software �lied� to the operators� and the machine itselfwas not capable of detecting that a massive overdose had occurred� The ionchambers on the Therac��� could not handle the high density of ionizationfrom the unscanned electron beam at high beam current� they thus becamesaturated and gave an indication of a low dosage� Engineers need to designfor the worst case�

Failure to Eliminate Root Causes� One of the lessons to be learnedfrom the Therac��� experiences is that focusing on particular software designerrors is not the way to make a system safe� Virtually all complex softwarecan be made to behave in an unexpected fashion under some conditions�There will always be another software bug� Just as engineers would notrely on a design with a hardware single point of failure that could lead tocatastrophe� they should not do so if that single point of failure is software�

The Therac��� contained the same software error implicated in the Tylerdeaths� but this machine included hardware interlocks that mitigated theconsequences of the error� Protection against software errors can and shouldbe built into both the system and the software itself� We cannot eliminateall software errors� but we can often protect against their worst e�ects� andwe can recognize their likelihood in our decision making�

One of the serious mistakes that led to the multiple Therac��� accidentswas the tendency to believe that the cause of an accident had been deter�mined �e�g�� a microswitch failure in the case of Hamilton� without adequateevidence to come to this conclusion and without looking at all possible con�tributing factors� Without a thorough investigation� it is not possible todetermine whether a sensor provided the wrong information� the softwareprovided an incorrect command� or the actuator had a transient failure anddid the wrong thing on its own� In the case of the Hamilton accident� a

App� A � Medical Devices� The Therac��� �

transient microswitch failure was assumed to be the cause even though theengineers were unable to reproduce the failure or to �nd anything wrong withthe microswitch�

In general� it is a mistake to patch just one causal factor �such as thesoftware� and assume that future accidents will be eliminated� Accidentsare unlikely to occur in exactly the same way again� If we patch only thesymptoms and ignore the deeper underlying causes� or if we �x only thespeci�c cause of one accident� we are unlikely to have much e�ect on futureaccidents� The series of accidents involving the Therac��� is a good exampleof exactly this problem� Fixing each individual software �aw as it was founddid not solve the safety problems of the device�

Complacency� Often it takes an accident to alert people to the dangersinvolved in technology� A medical physicist wrote about the Therac��� acci�dents�

In the past decade or two� the medical accelerator �industry�has become perhaps a little complacent about safety� We haveassumed that the manufacturers have all kinds of safety designexperience since they�ve been in the business a long time� Weknow that there are many safety codes� guides� and regulations toguide them and we have been reassured by the hitherto excellentrecord of these machines� Except for a few incidents in the �����s�e�g�� at Hammersmith� Hamburg� the use of medical acceleratorshas been remarkably free of serious radiation accidents until now�Perhaps� though we have been spoiled by this success ���

This problem seems to be common in all �elds�

Unrealistic Risk Assessments� The �rst hazard analyses initially ig�nored software� and then they treated it super�cially by assuming that allsoftware errors were equally likely� The probabilistic risk assessments gener�ated undue con�dence in the machine and in the results of the risk assessmentthemselves� When the �rst Yakima accident was reported to AECL� the com�pany did not investigate� Their evidence for their belief that the radiationburn could not have been caused by their machine included a probabilisticrisk assessment showing that safety had increased by �ve orders of magnitudeas a result of the microswitch �x�

App� A � Medical Devices� The Therac��� �

The belief that safety had been increased by such a large amount seemshard to justify� Perhaps it was based on the probability of failure of themicroswitch �typically ����� and�ed with the other interlocks� The problemwith all such analyses is that they typically make many independence as�sumptions and exclude aspects of the problem�in this case� software�thatare di�cult to quantify but which may have a larger impact on safety thanthe quanti�able factors that are included�

Inadequate Investigation or Followup on Accident Reports� Everycompany building safety�critical systems should have audit trails and incidentanalysis procedures that are applied whenever any hint of a problem is foundthat might lead to an accident� The �rst phone call by Tim Still shouldhave led to an extensive investigation of the events at Kennestone� Certainly�learning about the �rst lawsuit should have triggered an immediate response�

Inadequate Software Engineering Practices� Some basic software en�gineering principles that apparently were violated in the case of the Therac���include the following�

� Software speci�cations and documentation should not be an afterthought�� Rigorous software quality assurance practices and standards should be

established�� Designs should be kept simple and dangerous coding practices avoided�� Ways to detect errors and and get information about them� such as

software audit trails� should be designed into the software from thebeginning�

� The software should be subjected to extensive testing and formal anal�ysis at the module and software level� system testing alone is not ade�quate� Regression testing should be performed on all software changes�

� Computer displays and the presentation of information to the opera�tors� such as error messages� along with user manuals and other docu�mentation need to be carefully designed�

The manufacturer said that the hardware and software were �tested andexercised separately or together over many years�� In his deposition forone of the lawsuits� the quality assurance manager explained that testingwas done in two parts� A �small amount� of software testing was done on

App� A � Medical Devices� The Therac��� �

a simulator� but most of the testing was done as a system� It appears thatunit and software testing was minimal� with most of the e�ort directed at theintegrated system test� At a Therac��� user�s meeting� the same man statedthat the Therac��� software was tested for ����� hours� Under questioningby the users� he clari�ed this as meaning ����� hours of use�� The FDAdi�culty in getting an adequate test plan out of the company and the lackof regression testing are evidence that testing was not done well�

The design is unnecessarily complex for such critical software� It isuntestable in the sense that the design ensured that the known errors �theremay very well be more that have just not been found� would most likelynot have been found using standard testing and veri�cation techniques� Thisdoes not mean that software testing is not important� only that softwaremust be designed to be testable and that simple designs may prevent errorsin the �rst place�

Software Reuse� Important lessons about software reuse can be found inthese accidents� A naive assumption is often made that reusing softwareor using commercial o��the�shelf software will increase safety because thesoftware will have been exercised extensively� Reusing software modules doesnot guarantee safety in the new system to which they are transferred andsometimes leads to awkward and dangerous designs� Safety is a quality ofthe system in which the software is used� it is not a quality of the softwareitself� Rewriting the entire software in order to get a clean and simple designmay be safer in many cases�

Safe versus Friendly User Interfaces� Making the machine as easy aspossible to use may con�ict with safety goals� Certainly� the user interfacedesign left much to be desired� but eliminating multiple data entry and as�suming that operators would check the values carefully before pressing thereturn key was unrealistic�

User and Government Oversight and Standards� Once the FDA gotinvolved in the Therac���� their response was impressive� especially con�sidering how little experience they had with similar problems in computer�controlled medical devices� Since the Therac��� events� the FDA has movedto improve the reporting system and to augment their procedures and guide�

App� A � Medical Devices� The Therac��� �

lines to include software� The input and pressure from the user group wasalso important in getting the machine �xed and provides an important lessonto users in other industries�

References

�� C�A� Bowsher� Medical device recalls� Examination of selected cases�Technical Report GAO Report GAO�PEMD������ U�S� Government Ac�counting Organization� October �����

�� C�A� Bowsher� Medical devices� The public health at risk� TechnicalReport GAO Report GAO�T�PEMD������ U�S� Government AccountingOrganization� �����

� M� Kivel� editor� Radiological Health Bulletin� volume XX��� Centerfor Devices and Radiological Health� Food and Drug Administration�Rockville� Maryland� December �����

� Nancy G� Leveson and Clark S� Turner� An investigation of the Therac���accidents� IEEE Computer� ��������! �� July ����

�� Ed Miller� The Therac��� experience� In Conference of State RadiationControl Program Directors� �����

�� J�A� Rawlinson� Report on the Therac���� In OCTRF�OCI PhysicistsMeeting� Kingston� Ontario� May �����

�� R� Saltos� Man killed by accident with medical radiation� Boston Globe�June �� �����