failsafe: supporting product quality with knowledge-based systems

E.W. Stein*, D.K. Miscikowski

The Pennsylvania State University, Great Valley School of Graduate Professional Studies, Department of Management Science and Information Systems, Management Division, 30 East Swedesford Road, Malvern, PA 19355, USA

* Corresponding author. Tel.: +1-610-648-3256. E-mail address: [email protected] (E.W. Stein)

    Abstract

This article describes the design, development, implementation and impact of an expert system used in Quality Assurance in the food industry. The knowledge base of the system was developed by a non-programmer in a high-level production rule language. The introduction of the system led to improved decision-making at the plant level and improved reporting to meet internal and external quality requirements. The system also contributed to team-based learning. The system is likely to save one plant over US$100,000 due to decreased consumer complaints. © 1999 Elsevier Science Ltd. All rights reserved.

    Keywords: Quality assurance; Root cause analysis; Knowledge management; Decision support system; Expert system

    1. Introduction

Knowledge-based systems and expert systems can be used in a variety of settings and industries for knowledge management. In this case, we describe the use of an expert system in the food industry. Functionally, the expert system serves the organization in Quality Assurance (QA). In the sections that follow, we describe the context of the case, justify the need for the system, discuss the design, development, and implementation of the system known as failsafe within a major food company, and document the impact of the system on the organization. We close with some reflections about how knowledge-based systems can contribute to team decision-making and organizational learning.

    2. Context and problem description

    2.1. Regulatory climate

The Food and Drug Administration (FDA) is the primary federal organization that regulates the manufacturing of food in the United States. The key enforcement act is the Food, Drug, and Cosmetic (FD&C) Act of 1938 (Food, 1938). Two important sections of the FD&C Act are sections 402(a)(3) and 402(a)(4), which state that a food shall be deemed adulterated if it consists of any putrid substance or was prepared or packed under conditions which contaminated the food with filth that may have rendered it injurious to health. Additionally, the FDA publishes Good Manufacturing Practices (GMPs) in the Code of Federal Regulations (1993), which further define the FD&C Act. The GMPs state the legal criteria under which food manufacturers may produce food. Adequacy of food safety systems is a key component of the GMP standards and thus a priority for all food manufacturers.

    2.2. Case background

The company in this study is a large US food manufacturer. Since the late 1980s, this company has been through a series of structural redesigns to remain competitive in the US food industry. The reason for this is simple: customers and consumers are more demanding. Customers (i.e. supermarket chains) expect exemplary service, and consumers expect high quality at competitive prices. To remain a food industry leader, this company realized it must develop and implement new methods to increase efficiency and ensure product quality. In response, this organization implemented two key initiatives: Efficient Consumer Response (ECR) and Supply Chain Management (SCM). Both initiatives require manufacturing and QA groups to rapidly identify substandard product prior to shipment.

ECR is a method of controlling customer inventory. Using this procedure, the manufacturer adjusts production to meet the inventory needs of the customer and allows the customer to reduce inventory costs. ECR allows the manufacturer to schedule and produce products just in time, which also reduces costs. SCM is a method of controlling manufacturing costs in three areas: ingredient supply, conversion to finished product, and finished product delivery.


SCM essentially synchronizes production with product demand. Both ECR and SCM require that production be handled more efficiently and that production data be delivered to decision makers in real time. In both cases, there is a greater need to make decisions concerning product quality and to conduct risk analyses quickly.

    2.3. Quality initiatives

To meet the challenges posed above, the company's QA leadership implemented new QA methods and systems at its production plants to help analyze data quickly and accurately, and to ensure product quality (Marsisli, 1995; Brandt, 1996; Russell, 1996). The holistic method implemented at the facility profiled in this case was Quality at Time of Pack (QTOP). The QTOP method helps operators identify, quantify, monitor, and control in-process control points. In contrast with traditional quality systems that inspect finished product, QTOP helps prevent the production of substandard products through the use of in-process control points.

Prior to the QTOP initiative, two fundamental steps were taken. First, the organization was restructured. Trained quality personnel were transferred from the QA Department to the Business Units at the plant. This reorganization produced a tighter coupling between QA and the production personnel. Another key element of the restructuring was an expansion of the locus of responsibility of line leaders and plant operators. Prior to the change, quality decisions were made by QA personnel only. With the change, operators and front line leaders (FLLs) were asked to make quality decisions based on in-process data. This change represented a cultural as well as functional shift for the organization.

Second, a large-scale production information system was implemented to help analyze QTOP in-process data. This system, known as the Manufacturing Quality System (MQS), is an information system that allows users to access plant data in real time. The graphical representations and statistical analyses provided by the system help operators and FLLs make better production decisions.

However, despite the availability of these methods and systems, discontinuous events resulting in out-of-specification product, and the decision processes surrounding such incidents, were not addressed. When incidents occur that cause work slow-downs or stoppages, the cause of the incident must be analyzed and action taken. Historically, most incidents were analyzed by QA department personnel prior to taking action. The problem with this solution is that QA relies on reports from the shop floor to make decisions. It is therefore essential that the plant operators and front line supervisors supply QA with accurate event information. Getting quality information about production and processing events requires training in Root Cause Analysis (RCA), which is not widely known. Further, operators and FLLs were reluctant to collect data and make decisions involving finished product since it was typically outside their areas of expertise and responsibility.

There was an obvious decision gap between the training of operators and front line supervisors in traditional problem solving tools (brainstorming, cause and effect diagrams, Pareto analysis, force field analysis, flow charting, histograms, control charts and process capability studies) and the actual usage of such tools on the plant floor. A bridge was needed to facilitate the decision making process on the plant floor. What could this organization do to better support its quality initiatives?

    2.4. Proposed solution

To deal with this problem, senior management approved the development of an expert system known as failsafe to help plant operators and FLLs undertake RCA of incidents that occur on the production line. The operator or leader would use the expert system to help him or her document, identify, and classify production events, and to isolate and dispose of product that is below standard. The documentation process requires the collection of key data regarding the incident under review, such as the product type, time, personnel on duty, line numbers, etc. The exact nature of the event is also identified. Next, an event is classified as critical or routine. A critical event is defined as one that results in raw material that places the consumer at risk, whether from a microbiological, chemical, or physical hazard. A routine event results in an out-of-specification product that does not place the consumer at risk. Based upon the classification, the system would recommend the quantity of product to be placed in isolation. Finally, the system would recommend material disposition based upon current federal regulations concerning food production. At the conclusion of the consultation, the system would provide a report of the details of the incident, its conclusions, and a summary report of the findings for management.

    3. Justification for the system

    3.1. Domain characteristics

The primary function of the failsafe system is to help conduct a Root Cause Analysis of an incident. RCA helps managers monitor, identify, measure, analyze, and ameliorate problems.

"Root cause is that most basic reason for an undesirable condition or problem… Root causes are usually defined in terms of specific or systematic factors… Root Cause Analysis refers to the process of identifying these causal factors" (Wilson and Dell, 1993).

This problem area meets several key domain requirements for expert systems (Harmon, 1990; Ignizio, 1991):


- well defined;
- complex but structured;
- primarily cognitive;
- recognized experts.

Since RCA is a well defined procedure, it lends itself to being translated into production rules. At the company, recognized experts in RCA were known and accessible. In addition, select groups were trained in RCA principles and decision analysis for application to the plant floor. However, owing to its relative complexity, RCA was not used on a routine basis, which is a common problem in all industries. A knowledge-based system was an ideal way to bridge the gap between the "book knowledge" provided through training materials and courses and the contingencies of daily operations.
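The article does not reproduce the rule language itself, but the idea of expressing RCA classification knowledge as production rules can be sketched briefly. The sketch below is illustrative only: Python stands in for the high-level rule language described in Section 4, and the attribute names and rule contents are our own assumptions, not the actual failsafe rule base.

```python
# Illustrative sketch of RCA knowledge as production rules (hypothetical
# attributes and conclusions; not the published failsafe rules).

RULES = [
    # IF the contamination source is a physical hazard THEN the event is critical
    ({"contamination_source": "foreign material"}, ("event_class", "critical")),
    # IF only a sensory attribute is out of specification THEN the event is routine
    ({"contamination_source": "sensory"}, ("event_class", "routine")),
]

def apply_rules(facts):
    """Fire every rule whose conditions are all satisfied by the known facts."""
    for conditions, (attribute, value) in RULES:
        if all(facts.get(k) == v for k, v in conditions.items()):
            facts[attribute] = value
    return facts

print(apply_rules({"contamination_source": "foreign material"}))
# {'contamination_source': 'foreign material', 'event_class': 'critical'}
```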

    3.2. Management climate

The attitudes and philosophies of an organization's management can greatly impact the success of any project. This organization had several factors in its favor. First, management fostered an attitude that employees should seek solutions to problems and initiate change. Second, the organization's line operators were computer literate and comfortable using personal computers. Third, management was supportive of the expert system concept. The impetus behind this project was the QA Manager, who had been exposed to expert system technology. His boss, the plant manager, was very supportive of the effort. Liebowitz (1992) notes that an awareness of expert systems among managers is critical for the success of expert system projects. Finally, the project was in alignment with the company goal of improving product quality to better compete in the marketplace. The expert system project was likely to lead to improved decision-making in terms of consistency, speed and quality, since these are typical benefits provided by expert systems.

3.3. Economic feasibility

This project was evaluated in terms of real costs and tangible and intangible benefits in order to gain support from management (Turban, 1992).

3.4. Intangible benefits

Expert systems provide a number of intangible benefits, including improved decision-making, decision consistency, speed, etc. To quantify these benefits, we conducted a utility assessment of an expert system implementation versus current operations and the option of simply adding more people to the line. The benefit categories were:

- decision quality;
- decision consistency;
- decision speed;
- increased production;
- job satisfaction;
- competitive advantage.

Table 1 shows the results of a weighted linear additive model assessment for the failsafe project. The three alternatives were to implement an expert system, continue with current operations, or hire more people.

The highest weights were given to decision quality and decision consistency since these were the key objectives of the system. However, other factors such as increased production, job satisfaction, and contribution to competitive advantage were also considered. In all categories but one (job satisfaction), failsafe scored the highest among the three alternatives. Overall, failsafe was rated as providing over twice as much value (2.9) across all dimensions as current operations (1.3). The HR solution rated second overall (1.8). failsafe was therefore the clear winner among the three alternative courses of action. Further, it was believed that failsafe would help personnel anticipate problems before they arose, thus resulting in organizational learning, although this effect was not assessed at the start of the project (we discuss this aspect later in this article).
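The weighted totals reported above are simple sums of weight × score over the benefit categories. A minimal sketch of that arithmetic, using the weights and 1–3 scores from Table 1 (shown below):

```python
# Weighted linear additive scoring: total value = sum over categories of
# (category weight x alternative score). Weights and scores are taken from Table 1.

weights = {"decision quality": 0.3, "decision consistency": 0.3,
           "decision speed": 0.1, "increased production": 0.1,
           "job satisfaction": 0.1, "competitive advantage": 0.1}

scores = {  # 3 = best, 1 = worst
    "expert system":     {"decision quality": 3, "decision consistency": 3,
                          "decision speed": 3, "increased production": 3,
                          "job satisfaction": 2, "competitive advantage": 3},
    "current operation": {"decision quality": 1, "decision consistency": 1,
                          "decision speed": 2, "increased production": 2,
                          "job satisfaction": 1, "competitive advantage": 2},
    "HR solution":       {"decision quality": 2, "decision consistency": 2,
                          "decision speed": 1, "increased production": 1,
                          "job satisfaction": 3, "competitive advantage": 1},
}

for alternative, s in scores.items():
    total = sum(weights[c] * s[c] for c in weights)
    print(f"{alternative}: {total:.1f}")
# expert system: 2.9, current operation: 1.3, HR solution: 1.8
```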

Table 1
Linear additive model assessment of failsafe

Benefit (+)             Weight (0–1)   Expert system score (1–3)   V1 = Σwx   Current operation score   V2 = Σwx   HR solution score   V3 = Σwx
Decision quality        0.3            3                           0.9        1                         0.3        2                   0.6
Decision consistency    0.3            3                           0.9        1                         0.3        2                   0.6
Decision speed          0.1            3                           0.3        2                         0.2        1                   0.1
Increased production    0.1            3                           0.3        2                         0.2        1                   0.1
Job satisfaction        0.1            2                           0.2        1                         0.1        3                   0.3
Competitive advantage   0.1            3                           0.3        2                         0.2        1                   0.1
Total benefit           Σ = 1.0        17                          2.9        9                         1.3        10                  1.8

3.5. Project costs

Software project costs typically include the personnel time to develop a customized application, development and runtime software licenses, and training costs. In this case, licenses for the expert system development software and the runtime software totalled about US$15,000, plus annual maintenance fees. Developing the knowledge base took about 80 h of time, which was spent during off hours by a non-programmer and therefore was not included in the calculation (assuming a billable rate of US$90 per hour results in a market-equivalent cost of US$7,200; developing such a system from scratch using a conventional language such as C++, Visual Basic, or Pascal was estimated to cost several times this amount, approximately US$50,000–100,000). Training four people in the use of the development tool was US$2,500 for a two-day course.

3.6. Tangible benefits

It was assumed that the failsafe expert system would reduce the processing costs of documenting, classifying, isolating, and disposing of out-of-specification product. The saving would come in a few forms: (i) reduce the time needed to fully document an incident; (ii) reduce the amount of management time needed to determine the severity of an event; (iii) reduce the scope of the hold, resulting in the disposal of less product or raw material; (iv) reduce the time to prepare reports documenting the incidents. By conservative account, it was estimated that failsafe would result in a 10–15% reduction in event detection costs. This amount was estimated by using historical cost data. The cost of events is tracked on a monthly basis using a Product Unfavorable Disposition score card. The plant score card quantifies the cost of finished product defects and identifies the proportion produced by each business unit. The costs included costs of natural ingredients and labor necessary to finalize the material disposition. Event determination is typically made by the QA personnel. Table 2 provides a historical review of the annual costs of unfavorable cases from one specific facility from 1991–1996.

The average number of cases lost to critical and routine events was 36,277 per year, resulting in an average cost of US$231,130 per year. Assuming a 15% reduction in these costs yields savings of US$34,660 per year, which amounts to US$104,000 over 3 years (see Table 3).

Table 2
Annual costs of unfavorable dispositions, 1991–1996

Year      Units (dispositioned as unfavorable)   Cost, US$ (ingredients, materials and labor)
1991      58,779                                 360,095
1992      31,375                                 191,356
1993      31,480                                 189,608
1994      49,739                                 308,538
1995      29,589                                 205,538
1996      16,700                                 131,647
Average   36,277                                 231,130

Table 3
Estimated savings with failsafe

Year    Project costs, US$   Estimated cost reduction, US$   Net savings, US$
1997    17,500               34,660                          17,160
1998    2,500                34,660                          32,160
1999    2,500                34,660                          32,160
Total   22,500               104,000                         81,480
NPV                                                          60,385

The net savings for the project were estimated at US$81,480 using the hard costs of the failsafe system in terms of software, training, and maintenance. The net present value of the project was positive (US$60,385 - 22,500 = 37,885). The return on investment for the project was estimated to be about 40%.
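These figures can be reproduced from Tables 2 and 3. The sketch below shows the arithmetic; note that the discount rate is our assumption (the article does not state the rate used), chosen because roughly 15% per year reproduces the reported NPV of US$60,385.

```python
# Reproducing the Table 3 economics. The 15% discount rate is an assumption,
# not stated in the article; it approximately reproduces the published NPV.

annual_saving = 34660                                   # ~15% of the 1991-1996 average cost
project_costs = {1997: 17500, 1998: 2500, 1999: 2500}   # Table 3

net_savings = {year: annual_saving - cost for year, cost in project_costs.items()}
print(sum(net_savings.values()))                        # 81,480 over three years

rate = 0.15                                             # assumed discount rate
npv = sum(v / (1 + rate) ** t for t, v in enumerate(net_savings.values(), start=1))
print(round(npv))                                       # ~60,385
```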

    4. System description

    4.1. Overall function

The failsafe expert system helps line operators and leaders to conduct an RCA of a production incident either as it is occurring or within 24 h. The first step of the system is to collect data about the event. Several questions are asked of the user pertaining to the product, the time, and other event data. Next, the system identifies and classifies the severity of the event. There are two possible outcomes here: routine or critical events. A critical event is defined as one that leads to product that places the consumer at risk. A routine event is defined as one that leads to product that is out-of-specification but does not place the consumer at risk. Once this determination is made, the system recommends the quantity of product, if any, to be placed in isolation. For example, the system may recommend that fifty units of product be isolated. If the operator has already isolated the substandard product, failsafe asks the user for his or her confidence on a scale of 0–100 that the material was fully isolated and that the risks to the consumer were eliminated. Finally, the system recommends what to do with the isolated product based on federal regulations regarding food production. Based on the severity of the classification, the product will be either released, reworked, donated, or destroyed. The system produces several reports at the conclusion of the session. The main report provides all the information regarding the event, its classification, isolation, and disposition. A transcript of the interaction between the user and the system is also logged. The completed transcript is reviewed by the QA manager for accuracy and content. If additional information is required, the QA manager will contact the individuals who completed the RCA. The final report (if classified as a critical event) requires the QA manager to report the final disposition of the isolated material.
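To make the consultation flow concrete, the following is a minimal sketch of the classify → isolate → dispose sequence described above, written in Python rather than the actual rule language. The category names, isolation multiplier, and disposition wording are illustrative assumptions, not values from the failsafe knowledge base.

```python
# Illustrative sketch of a failsafe-style consultation (hypothetical values):
# collect event data, classify the event, recommend an isolation quantity,
# and recommend a disposition.

CRITICAL_HAZARDS = {"microbiological", "chemical", "physical"}

def classify(hazard_type):
    """A critical event places the consumer at risk; a routine event does not."""
    return "critical" if hazard_type in CRITICAL_HAZARDS else "routine"

def recommend_isolation(event_class, units_at_risk):
    # Critical events get a wider isolation scope to favor consumer safety.
    return units_at_risk * (3 if event_class == "critical" else 1)

def recommend_disposition(event_class):
    return "destroy or rework per GMP review" if event_class == "critical" \
        else "release or rework"

event = {"product": "item A", "line": 2, "hazard_type": "physical", "units_at_risk": 50}
event_class = classify(event["hazard_type"])
report = {
    "classification": event_class,
    "units_to_isolate": recommend_isolation(event_class, event["units_at_risk"]),
    "disposition": recommend_disposition(event_class),
}
print(report)  # {'classification': 'critical', 'units_to_isolate': 150, ...}
```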

4.2. System features and architecture

4.2.1. Key components

The key components of the system are a knowledge base and the runtime software that conducts the question-answer session as driven by the inference engine.¹



The failsafe knowledge base was written by a non-programmer in a scripting language developed by one of the authors. The scripting language features production rules as its primary representation. One advantage of the scripting language is its language extensions, which provide the ability to issue reports, control navigation, conduct variable tests and comparisons, send files, and other features common to Windows-based software programs but not typical of expert systems per se.

Once the file was written, it was run from a shared drive using an expert system shell developed by one of the authors. The shell acts like a browser that contains an inference engine and an easy-to-use interface that displays queries, on-screen text, external files (e.g., text, images), end-user reports, and information about its reasoning process. Reports can be saved, printed or emailed from the interface. The shell's inference engine executes both backward and forward chaining.


    Fig. 1. Process event flowchart to regain normal operating conditions.

¹ Please contact one of the authors for more information on the software.

The interface is so easy to use that virtually no training is necessary. One of the authors tested this assumption by randomly selecting a person in the audience at a presentation; that person was able to run the system without prior instruction.

Backward chaining is the default for the system, but in the absence of information about goals (as contained in the rule base), the system will shift into a forward chaining mode. The expert system also has the capability to send and retrieve data from external programs and to display external files called by the knowledge base during a consultation.
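As a rough illustration of the two reasoning modes (the shell's actual implementation is not described in the article), a toy goal-driven resolver with a data-driven fallback can be sketched as follows; the rules and facts are hypothetical.

```python
# Toy illustration of backward chaining with a forward-chaining fallback.
# Each rule is (conclusion, [premises]); facts are atoms known to be true.
# Rules and facts are hypothetical, not taken from the failsafe knowledge base.

RULES = [
    ("critical_event", ["physical_hazard"]),
    ("physical_hazard", ["metal_detected"]),
]
FACTS = {"metal_detected"}

def backward(goal, facts):
    """Goal-driven: try to prove the goal from the facts and rules."""
    if goal in facts:
        return True
    return any(all(backward(p, facts) for p in premises)
               for conclusion, premises in RULES if conclusion == goal)

def forward(facts):
    """Data-driven: fire rules until no new conclusions can be added."""
    derived, changed = set(facts), True
    while changed:
        changed = False
        for conclusion, premises in RULES:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived

def consult(facts, goal=None):
    # Backward chaining when a goal is known; otherwise fall back to forward chaining.
    return backward(goal, facts) if goal is not None else forward(facts)

print(consult(FACTS, goal="critical_event"))  # True
print(consult(FACTS))  # {'metal_detected', 'physical_hazard', 'critical_event'}
```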

One of the key advantages of the system is that it lends itself to a distributed architecture in a client-server environment. For example, a knowledge base stored in Chicago can be activated by someone in Texas using a locally installed shell. The knowledge base files that contain all the key business logic take up only 10–50 kbytes of memory. These structural features have several advantages. By separating the business logic from the processing, maintenance is simplified and costs are reduced. Changes can be implemented much more easily. Since the files are small, transmission and sharing are easy using email and other electronic means.

    5. System development

    5.1. Sources of knowledge and development process

failsafe is based on both written and human sources of knowledge and expertise. The primary written knowledge source for the failsafe system was the Code of Federal Regulations, title 21, section 110, which describes GMPs for the food industry. The GMPs state regulations for food manufacturers on how to eliminate adulterated food from the production process. The domain expert for the project (see below) interpreted the GMP guidelines for incorporation into the failsafe rule base.

The primary human source of expertise was the QA manager for the facility. Prior to the development of failsafe, the QA manager had developed quality decision trees and matrix classifications that were used as guides by the QA department and the business units for event classification. A decision tree flowchart is provided in Fig. 1, which describes the analytical decision-making process around event classification.

In addition, the QA manager developed a matrix that was used to classify events as routine or critical. This knowledge is illustrated in Table 4.

The QA manager used the decision trees and matrices to help map the knowledge domain prior to coding the failsafe expert system.

Once all areas of knowledge were adequately mapped, the QA manager coded the knowledge base. It should be pointed out that the QA manager was not a programmer, nor did he have experience with expert systems at the beginning of the project. However, the QA manager learned how to develop expert systems in a graduate class in an MBA program at Penn State (Penn State Great Valley, School of Graduate Professional Studies, Malvern, PA), and developed a working prototype of failsafe within five weeks (see the next section for a description of the prototype).

With the approval of the failsafe prototype, the QA manager worked with three other quality experts to develop the final system. The quality organization at the plant includes a QA leader located in each of the three business units, for a total of three QA leaders. The QA manager worked with the three QA leaders to refine and test the failsafe knowledge base. Each of the QA leaders received training in expert systems development in a two-day intensive course provided by one of the authors.

    5.2. The knowledge base

The problem addressed by the failsafe expert system is the combination of forward-driven data collection and classification. Classification-type problems require that objects be sorted into one of several defined categories based on multiple characteristics (Ignizio, 1991). In this case the primary determination is the severity of the event (routine or critical) based on federal guidelines. Action steps are then specified based on this determination.

Table 4
Event classification matrix

Location    Routine category   Critical category 1       Critical category 2
Process A   Sensory            Biological                Regulatory
            1. Flavor          1. Microbiological        1. Weight control
            2. Texture         2. Pathogens              2. Labeling
            3. Odor                                      3. Nutritional
                                                         4. Formal reqs.
                               Chemical
                               1. Cross contamination
                               Physical
                               1. Foreign material
                               2. Extraneous material

The first cut of the failsafe system resulted in a knowledge base of 29 rules. The prototype contained all the elements of the final version, but restricted the documentation of the event to only a few parameters and applied to only one of the business units. The first step in the process is to collect basic event data. Next, the system identifies sources of contamination (if they exist) by category: e.g., metal, foreign material, regulatory, or sensory. Next, the system classifies the event as routine or critical. If the event was critical and the line was not stopped, the user is asked to explain why the process was not stopped per plant QA procedures. Conversely, if the event was routine and the line was stopped, the user is asked to explain why the process was stopped, since plant policy does not require stopping the process. The system then recommends the number of finished product units to isolate based upon the classification. Finally, the system asks the user to input his or her confidence on a 0–100 scale that the substandard product was fully isolated and the consumer was protected. Based on the responses provided by the line operator or supervisor, total confidence is calculated as a geometric mean of the percent confidence that the material was isolated and no product was released to the consumer.
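In symbols (our notation, not the article's), the combined confidence is

$$C_{\mathrm{total}} = \sqrt{C_{\mathrm{isolated}} \times C_{\mathrm{no\ release}}}$$

where both inputs are the 0–100 percent-confidence responses supplied during the session; for example, responses of 90 and 80 combine to roughly 85.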

After extensive testing, evaluation, and modification, the completed knowledge base was constructed by the QA manager with input from the business units. The complete failsafe expert system knowledge base has over 120 rules, 100 objects and over 15 goals.

    6. Evaluation and testing

    6.1. Verification and validation

Verification and validation are two important aspects of the development process (Wright, 1992). Verification describes how closely the final system matches the design specifications. Validation evaluates the system with respect to its intended use. Verification and validation, as well as field testing, are required to minimize Type 1 and Type 2 errors (see below).

6.1.1. Analysis and performance testing

The performance of the failsafe system was analyzed repeatedly as the system was developed. Beginning with the prototype system, verification and validation checks were performed. First, the knowledge base was examined for isolated rules, subsumed rules, conflicting rules, circular rules, redundant rules, as well as rules with incorrect syntax. Thorough inspection of the rule base for each development cycle revealed no rules with these errors. Providing combinations of test responses did not generate errors, nor did trial sessions generate unreferenced attribute values. The session trail log and consultation results displayed no error messages. The logic of the fired rules was comprehensible and convincing. After the rules of the knowledge base were verified for common errors, the system was field-tested by four end users. Minor design changes were made to the system based on the results of these tests.
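Two of the static checks mentioned above (redundant rules and conflicting rules) can be illustrated with a short sketch; the rule representation is a simplification of our own, not the verification tooling actually used on the failsafe knowledge base.

```python
# Simplified illustration of two rule-base checks: redundant (duplicate) rules
# and conflicting rules (same conditions, different conclusions for the same
# attribute). Hypothetical rules; not the actual failsafe verification tooling.

from collections import defaultdict

# Each rule: (frozenset of (attribute, value) conditions, (attribute, value) conclusion)
rules = [
    (frozenset({("hazard", "metal")}), ("event_class", "critical")),
    (frozenset({("hazard", "metal")}), ("event_class", "critical")),   # redundant
    (frozenset({("hazard", "flavor")}), ("event_class", "routine")),
    (frozenset({("hazard", "flavor")}), ("event_class", "critical")),  # conflicting
]

by_conditions = defaultdict(list)
for conditions, conclusion in rules:
    by_conditions[conditions].append(conclusion)

for conditions, conclusions in by_conditions.items():
    if len(conclusions) != len(set(conclusions)):
        print("redundant rules for", dict(conditions))
    for attribute in {a for a, _ in conclusions}:
        if len({v for a, v in conclusions if a == attribute}) > 1:
            print("conflicting rules for", dict(conditions))
```

A fuller check would also look for circular and subsumed rules, as described above.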

The problem domain allows for 3680 possible outcomes based on 46 subcategories of possible event classes, which are the key identifiers for the system. If the programming for each subcategory works, the system as a whole works by providing accurate classification and confidence calculations. As part of the verification of the coding, each of the 46 subcategories was evaluated. No errors were found in the subsections; therefore, it was concluded that the knowledge base contained no logical errors.

Validation was performed to ensure that the final system presented correct solutions. Performance validation of the failsafe expert system was conducted using past events at the facility. Several cases were submitted to failsafe for evaluation, and the recommended actions were found to be consistent with what would have been recommended by QA. The failsafe expert system classified all events as routine or critical in a way that was consistent with the event classification matrix (refer to Table 4).

    6.2. Error analysis

Two types of errors are associated with this system (see Table 5).

Table 5
Types of event detection errors

Event                          Decision
                               Classify as routine    Classify as critical
Routine = TRUE (a)             True                   Type I error (b)
Routine = FALSE (critical)     Type II error (c)      True

(a) The null hypothesis is that the event is routine unless otherwise identified as critical.
(b) Type I error (producer's risk): the hypothesis is true and a routine event is classified as critical. The result is additional product dispositioned as unfavorable to ensure consumer safety.
(c) Type II error (consumer's risk): the hypothesis is false and a critical event is classified as routine. The result is that product that poses some risk is possibly released to the consumer.

The worst type of error, Type 2, occurs if an event is classified as routine when it is critical. This poses the greatest threat to the consumer. To limit the possibility of this type of error, the failsafe logic was designed with a conservative bias; i.e. the system was intentionally designed to accept more critical cases than routine cases. In addition, once the determination is made that an event is critical, the system is biased to recommend that more product be isolated from the line to increase the probability that the material was fully isolated. Any event classified as critical results in an increase in the scope of units isolated (as opposed to a routine event) to ensure consumer safety. This increase in the magnitude of the scope reduces the probability of Type 2 errors (but increases Type 1 errors). In conjunction with the increased scope, a confidence factor is used to estimate the likelihood that substandard product has been isolated from the process. This estimate is provided by the line supervisor. If this value is low, the scope of the isolation is increased even more.

A Type 1 error occurs if an event is mistakenly classified as critical when it is routine. The consequences of this type of error are: (1) more product is isolated and disposed of, thereby incurring higher production costs; (2) more paperwork is generated by the plant in order to document critical events; (3) the plant reports more severe events than actually exist, which may have a bearing on its evaluation. In general, Type 1 errors in this case increase overall production costs. Despite this drawback, however, the decision was made to allow more Type 1 errors in order to protect the consumer. It was assumed that if the system caught more critical events, this would result in fewer consumer complaints overall, which would compensate for the costs of the Type 1 errors. The system would also reduce liabilities associated with critical events, given its biases (see Section 6.3).

    6.3. Risk management

All decision-making processes, human or otherwise, are prone to error. Quality professionals make decisions based upon risk assessments where the key component is consumer safety. failsafe was designed to do the same thing at the same or higher levels of quality. In either case, there is a risk of exposure. A company can reduce claims of liability and negligence by exercising due diligence in training and through the use of preventive measures (Monippallil, 1992). The failsafe program was designed to work in accordance with the federal GMPs, which identify production incidents that lead to product posing concerns to consumers. The system was intentionally biased to avoid Type 2 errors (favoring consumer protection) while incurring Type 1 errors (higher production costs). Development of the knowledge base using key QA department personnel helped ensure that failsafe-supported decisions were of high quality. failsafe was designed such that real risks were placed at a high level of concern and unrealistic risks were placed at a lower level of concern (Newsom, 1990; Silverman, 1996). Line supervisors were thoroughly trained and tested in the use of the expert system prior to its use.

    7. Implementation

    7.1. failsafe timetable

failsafe was developed and implemented over a 1.5-year period. It should be noted that failsafe was not a scheduled IS project, nor was it worked on continuously. In this sense, it truly was a "skunkworks" project, owing its early life to the interest and dedication of the QA manager. Table 6 provides a rough chronology of the life of failsafe at this specific facility.

In Phase 1, the QA Manager attended a graduate course at Penn State and learned how to develop expert systems. The initial 29-rule prototype was generated during this time, and a report was prepared that justified the system and documented specifications for a complete design. The prototype was structured for one of the business units at the plant. The report and prototype were demonstrated to senior plant management. Once they were approved, the QA manager was authorized to continue work on an expanded version of failsafe. Other members of the QA team were exposed to expert systems technology during a two-day training seminar provided by one of the authors. The failsafe structure was reviewed by the team and modifications were made. The QA manager completed development of the system part-time over a two-month period. The near-complete system was demonstrated at a meeting of all QA managers in the US and Canada at the corporate offices. Response to the system from other plant managers and QA managers was favorable. In early 1998, failsafe was introduced into the first business unit. Based on its success there, it was introduced to the second and third business units at the plant facility. By May 1998, all business units at this facility were using the failsafe system to report line stoppages and critical events.

Table 6
failsafe development and implementation chronology (1997–1998)

Phase   Date                       Description
1       January–March 1997         Prototype development (40 h)
2       April–September 1997       Review and approval by plant management
3       October 1997               Two-day training of development team
4       November–December 1997     Completion of knowledge base (40 h); demonstration of complete system at the corporate division meeting
5       March 1998                 First business unit on line
6       April 1998                 Second BU on line
7       May 1998                   Third BU on line

7.2. Training and preparation

There are fourteen FLLs and QA Sanitation Leaders at the facility using the failsafe system. To facilitate a smooth introduction (Pigford and Baur, 1990) into the plant, the following steps were taken. First, the QA Manager gave a short seminar on QA principles. Next, the leaders were shown how to use the software. Because the interface was so easy to use, this took on average 10–15 min. Once the leaders were comfortable with the system, they were asked to run through several test examples with QA personnel. After this stage was completed, failsafe was installed in the plant. QA continued to monitor the results of the training and implementation.

    7.3. System requirements

failsafe required only a modest investment in hardware and software, which increased its probability of success (Moser, 1987). Four licenses of the 16-bit version of the expert system shell were installed on four existing PCs located in the plant operations area for each of the business units. All PCs ran Windows 3.1 via the plant network, and were either 486 or Pentium-class machines. At the start of a session, the failsafe knowledge base was loaded from a shared drive on the network. User reports and session trails were either printed or saved directly to the shared drive (see also Section 8). Four licenses of the scripting language development tool were installed on machines available in the QA department. Only QA personnel were allowed to make modifications to the knowledge base. Because it was decided that the QA department would maintain the software, failsafe imposed few resource requirements on the plant Information Service (IS) department.

    8. Results and organizational impacts

    8.1. Usage and reporting costs

Since its installation in the spring of 1998, failsafe has been used to detect several product-related events. For example, from 4/22/98 to 7/14/98 failsafe documented and evaluated 46 events (about five per week), 17 of which were critical and 29 of which were routine. Each failsafe session produced a full report of the incident (see sample in the Appendix) for management and corporate. Discussions with the QA manager revealed that corporate is pleased with the new format and the depth of information now reported compared to the earlier manual process. The time to prepare each report has dropped from 1 h to about 10 min. This represents a savings of 250 events per year × 50 min ≈ 208 h, and 208 h × US$60 per hour ≈ US$12,480 per year in labor costs. Further, the quality of the information received from the line is dramatically improved. Whereas QA used to receive a 1–3 sentence description of the event, it now receives a 2–3 page report fully documenting the incident.

8.2. Costs of event detection

Initially it was hoped that failsafe would significantly reduce costs associated with substandard product, on the order of 10–15% (refer to Table 2). The cost of event detection and handling in 1997 was US$126,038, and the estimate for 1998 is US$121,576. The 1998 estimate is significantly below the 1991–1996 average of US$231,130 and slightly below the value for 1997 (about 5%), but it is too early to tell whether this change is a significant reduction in hold dollars; more post-implementation data is needed.²


Fig. 2. Consumer contacts (1997–1998).

² This study does not control for other mediating factors, although no other significant changes were made to QA during the introduction of failsafe.

In the long run, it may turn out that operators and front line supervisors are more vigilant due to the failsafe program. If more events are handled conservatively, more product will be isolated, thereby resulting in the same or slightly increased costs.

    8.3. Change in consumer complaints

8.3.1. Number of complaints

A review of the top two consumer complaints (i.e. contacts) for six quarters (1997–1998) reveals a dramatic decrease in consumer contacts (see Fig. 2) following the implementation of failsafe (implemented in process 1 in the first quarter of 1998). Across three of the four processes, the decrease in the average number of consumer contacts was in excess of 60% from 1997 compared to the first two quarters of 1998. Consumer contacts decreased from an average of 315 in 1997 to 143 for the first two quarters of 1998, a decrease of 55%.

8.3.2. Cost of complaints

We calculated the cost of complaints using the following method. We used an industry model that estimates costs based on the number of complaints (Table 7). The following assumptions are made in the model:

- For every consumer that complains (C), there are 50 non-complainers (NC).
- Of the fifty non-complainers, 46% will not purchase the product again.
- Of the complainers, 14% will not purchase the product again.
- Contacts for all of 1998 were estimated by simply doubling the total number of contacts from the first two quarters.

We totalled lost sales as the sum of sales lost to complainers and non-complainers. We assumed that two purchases were lost for each consumer. Using unit costs of US$1.00 and US$1.50, total savings were estimated at US$187,503 - 103,019 = 84,484. This figure, due to improved shop floor decisions resulting from the implementation of failsafe, is a significant savings for the company. Combined with the other savings, the system produces an attractive investment.

Table 7
Cost of complaints, 1997–1998

             C      NC = C × 50   NC × 46%   C × 14%   Total    Units   Unit US$   Lost US$
1997
Process 1    1258   62,900        28,934     176       29,110   2       1.00       58,220
Process 2a   801    40,050        18,423     112       18,535   2       1.50       55,605
Process 2b   536    26,800        12,328     75        12,403   2       1.50       37,209
Process 3    788    39,400        18,124     110       18,234   2       1.00       36,469
Total                                                                              187,503

1998
Process 1    572    28,600        13,156     80        13,236   2       1.00       26,472
Process 2a   358    17,900        8,234      50        8,284    2       1.50       24,852
Process 2b   542    27,100        12,466     76        12,542   2       1.50       37,626
Process 3    304    15,200        6,992      43        7,035    2       1.00       14,069
Total                                                                              103,019
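The model behind Table 7 can be written compactly. The sketch below reproduces the 1997 Process 1 figure using the assumptions listed above; the function and variable names are ours.

```python
# Complaint-cost model from Table 7: each complainer implies 50 non-complainers;
# 46% of non-complainers and 14% of complainers stop buying; each lost consumer
# represents two lost purchases at the product's unit cost.

def lost_sales(complaints, unit_cost, nc_per_c=50, nc_loss=0.46,
               c_loss=0.14, purchases_lost=2):
    non_complainers = complaints * nc_per_c
    lost_consumers = nc_loss * non_complainers + c_loss * complaints
    return lost_consumers * purchases_lost * unit_cost

# Process 1, 1997: 1,258 complaints at US$1.00 per unit
print(round(lost_sales(1258, 1.00)))   # ~58,220, matching Table 7
```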

    8.4. Individual and organizational learning

Stein and Vandenbosch (1996) observe that the success or failure of an advanced information system is not always tied to system outcome; i.e. a system can even fail, yet an organization can benefit. The key variables here are:

- project outcomes;
- organizational learning outcomes;
- organizational performance outcomes.

For example, they observe that in the case of Kensey Nash (Malvern, PA), a small bio-tech firm, the expert system did itself out of a job because data collected by the program was used to redesign the manufacturing process (Stein and Evans, 1995; Stein and Vandenbosch, 1996). In this case, the project was only partially successful in its long-term implementation (it was discontinued due to the redesign). However, the project yielded benefits that were independent of the development of the expert system; i.e. the organization engaged in Type 1 and Type 2 learning and achieved a higher level of performance. We can use this framework to analyze the failsafe system in a similar way (Table 8).

Table 8
Analysis of failsafe in terms of outcomes

Outcomes                            Development           Post-implementation
Project outcomes                    Success               Success
Organizational learning outcomes    Codification (C),     Learning 1 (OL1),
                                    Learning 1 (OL1)      Learning 2 (OL2)?
Organizational performance          No change             Positive (+)

According to the framework, we find that the project was successful because it was successfully designed and implemented.

The knowledge base continues to be refined to accommodate employee use (and potential misuse), to improve reporting capabilities, and to reflect changes in work processes and procedures. While the original purpose of the project was to codify and apply existing regulations and RCA techniques, the organization used the development of failsafe as a learning opportunity. Line supervisors and operations teams learned to better detect and handle production events through the use of the system, which we consider to be partially responsible for the resultant decrease in consumer complaints. It remains to be seen whether the organization can develop the capacity to engage in Type 2 organizational learning as a result of the implementation of the system. To engage in Type 2 organizational learning (i.e. change existing procedures and methods), the organization will have to analyze its performance in light of the failsafe system implementation. Such efforts will require the collection and analysis of failsafe usage data over several quarters. We envision that future versions of the software will collect these types of data routinely. Further, we see application of the expert system technology for the analysis of other events that disrupt the production process. For instance, a module could be built to help analyze packaging events that lower outputs. Another module could be built to analyze staffing events. These knowledge modules could be made available throughout the plant via the network, thus creating a scalable and integrated solution for a host of production and quality problems.

    9. Conclusions

failsafe represented an effective and low-cost way to apply expert systems technology to QA in the food industry. The knowledge base was built and maintained by a non-programmer from the QA area for a fraction of the cost required to build a traditional system. The system provided a number of savings and benefits to the company. Real cost savings were recorded in the time saved in reporting and in a reduction in the cost of materials lost due to critical and routine events. The benefits included improved documentation of events and improved decision-making consistency, speed, and quality. The company also sees a reduction in consumer complaints as a result of the use of the system. The system has served as a catalyst for organizational learning and improved team decision-making.

    References

Brandt, L. (1996). Eye on QA: demands on QA labs are greater than ever. On-line monitoring and outsourcing helps labs cope. Food Formulating, November/December, 35–38.

Code of Federal Regulations, part 110 (1993). Current good manufacturing practice in manufacturing, packing, or holding human food, 188–198.

Food, Drug, and Cosmetic Act of 1938 (amended 1971).

Harmon, P., & Sawyer, B. (1990). Creating expert systems. New York: Wiley.

Ignizio, J. P. (1991). Introduction to expert systems. New York: McGraw-Hill.

Liebowitz, J. (1992). How to succeed in expert systems without really trying. In E. Turban & J. Liebowitz (Eds.), Managing expert systems (pp. 324). Harrisburg, PA: Idea Group Publishing.

Marsisli, R. (1995). Improving quality control through computerization. Food Product Design, December, 17.

Monippallil, M. (1992). The application of strict liability to defective expert systems. In E. Turban & J. Liebowitz (Eds.), Managing expert systems (pp. 211). Harrisburg, PA: Idea Group Publishing.

Moser, J. (1987). Management expert systems (MES): a framework for development and implementation. Information Processing and Management, 17–23.

Newsom, R. L. (1990). The risk/benefit concept as applied to food. Institute of Food Technologists.

Pigford, D., & Baur, G. (1990). Expert systems for business. Boston, MA: Boyd & Fraser.

Russell, M. (1996). Information exchange: is your plant wired? Food Engineering, January, 50–56.

Silverman, R. S. (1996). Legal implications of risk assessments in food processing operations. Food Technology, December, 65–67.

Stein, E. W., & Evans, D. G. (1995). ANCHOR INSPECTOR: intended and unintended effects of a graphics-oriented expert system. Expert Systems with Applications, 9(2), 103–113.

Stein, E. W., & Vandenbosch, B. (1996). Organizational learning during advanced system development: opportunities and obstacles. Journal of Management Information Systems, 13(2), 115–136.

Turban, E. (1992). Why expert systems succeed and why they fail. In E. Turban & J. Liebowitz (Eds.), Managing expert systems. Harrisburg, PA: Idea Group Publishing.

Wilson, P., Dell, L., et al. (1993). Root cause analysis. Milwaukee, WI: ASQC Quality Press.

Wright, G. (1992). Expert systems verification and validation. In E. Turban & J. Liebowitz (Eds.), Managing expert systems (pp. 300). Harrisburg, PA: Idea Group Publishing.



    Appendix
