
Artificial Intelligence Review 6, 217-226, 1992. © 1992 Kluwer Academic Publishers. Printed in the Netherlands.

Competence and responsibility in intelligent systems

TONY MORGAN

SD-Scicon, Pembroke House, Pembroke Broadway, Camberley, Surrey GU15 3XD, U.K.

Abstract. The capabilities of Artificial Intelligence have increased dramatically over the last decade. We can now contemplate the use of automation in tasks which were previously considered the exclusive domain of human professionals. Such a possibility raises new legal issues. This paper argues the need for finding methods of assessing the competence of such systems in order to assign responsibility for their actions.

Key Words: competence, responsibility, intelligent systems

1. INTRODUCTION

The last decade has seen a dramatic increase in the capabilities of artificial intelligence (AI). Early work in the 1950s concentrated on AI as a means of understanding human mental capabilities. While this is still an important area of research, much recent activity has been stimulated by the desire to use AI techniques, regardless of whether or not they are an accurate representation of the equivalent human processes. This engineering viewpoint has produced a number of successful operational systems, mostly of the 'expert system' variety. Current developments are coupling second-generation expert systems with other techniques such as genetic algorithms, deep knowledge, and neural networks. To avoid a tedious technical debate about the precise definition of terms, this paper treats all of the newer technologies together under the banner of 'Intelligent Systems'. Although using a wide variety of techniques and implementation strategies, the common feature shared by all intelligent systems is a deliberate attempt to emulate some aspect of human performance.

The systems which are likely to enter service during the next decade will raise some new legal issues. There are two reasons for this. Firstly, the capabilities of intelligent systems now extend into areas which have previously resisted automation. Just as previous waves of computing have had their impact on clerical and manual labour, the newer technologies address 'professional' areas (including the legal profession itself). Tasks which have always been considered essentially human activities are now candidates for automation, at least in part. A particular example, used throughout this paper for illustration, is the domain of air traffic control (ATC). While automated support has long been a feature of ATC, the controller's job itself is beyond the competence of the existing data processing techniques. However, continual increases in traffic demand cannot
be met by simply adding more controllers: a law of diminishing returns sets in because the controllers have to spend more time in liaison with each other. Most authorities are now examining the use of intelligent systems to take over some of the functions currently performed by controllers.

The second issue raised by intelligent systems is the lack of hard experience in their use. The older technology of data processing has been in commercial use since the 1950s, and the consolidated experience of thirty years is now firmly ingrained in existing practice. This stretches beyond software issues alone, to cover aspects such as the way in which contracts are let for the supply of automation facilities. The new element introduced by intelligent systems is the capability to deal with information rather than just data. Several of the underlying assumptions in (say) the procurement cycle for new software are nullified by differences in technology. For example, the 'specify-build-test' model of software development, which underlies most software contracts, is a poor fit to some types of intelligent system which may be trained, rather than programmed, to fulfil their functions.

2. THE QUESTION OF COMPETENCE

In order to examine the ways in which intelligent systems could fail, we must first understand what could be considered acceptable in such systems. This relates primarily to the idea of competence. We have an unquestioning expectation that human professionals will be competent in their chosen area. The understanding of the need for competence is reflected in our willingness to provide corresponding rewards; either individually (for example, through professional fees) or by acceptance of a special standing for a group providing a less individual service (such as air traffic controllers). In some way, we would like to be sure that any intelligent system possesses a similar competence before entrusting it with our business, or even with our life. Some insight can be gained by examining the characteristics of human competence. We can briefly consider the dimensions along which we might wish to assess the performance of a human professional.

• Accuracy. Producing results which are 'correct', judged by some objective standard. This may consist of making an accurate diagnosis in a complex situation, or choosing the right course of action from a large number of possibilities.

• Consistency. Avoiding variations in accuracy against other factors: for example, tiring quickly, or being easily distracted.

• Flexibility. Coping with new or unexpected situations, shifts in the type of tasks performed, or changes in the balance of the workload.

• Presentation. Producing results in a suitable form. For example, in a written report this would correspond to a good structure, clear writing style, attention to grammar, punctuation, and so on.

• Coherence. Performing tasks by an explicable method or path, so that any results are reproducible and can be audited should the need arise.

• Timeliness. Completing tasks within specified time constraints.

• Availability. Being ready to take on new or additional tasks, as and when needed.

These are the kinds of criteria which may pass through our mind in choosing professional advisors: doctors, solicitors, and so on. Of course, we often have no direct evidence about these characteristics and have to make a choice based on indirect evidence. More will be said later on the subject of evaluation.

It seems that very similar criteria can be used to assess both humans and intelligent systems, with the appropriate adjustments. For example, in humans the factors relevant to 'availability' might be absenteeism, frequent or prolonged sickness, timekeeping, etc., whereas for an intelligent system they could be mechanical and electrical reliability, and the time needed to repair or replace faulty units.
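As an illustration of these 'appropriate adjustments', the sketch below (not from the original paper; the figures, field names, and the steady-state availability formula MTBF / (MTBF + MTTR) are assumptions introduced purely for illustration) shows how the same 'availability' criterion might be scored for a machine from reliability and repair-time data rather than from attendance records.

```python
# Illustrative sketch only: the criteria names come from the paper, but the
# indicator values, field names, and the availability formula used here
# (steady-state availability = MTBF / (MTBF + MTTR)) are assumptions.

from dataclasses import dataclass

@dataclass
class MachineAvailability:
    mtbf_hours: float   # mean time between failures (assumed figure)
    mttr_hours: float   # mean time to repair or replace a faulty unit

    def score(self) -> float:
        """Fraction of time the system is ready to take on tasks."""
        return self.mtbf_hours / (self.mtbf_hours + self.mttr_hours)

# The seven dimensions of competence discussed above, held as a simple
# profile so that each can be assessed by whatever evidence suits the
# candidate, human or machine.
COMPETENCE_CRITERIA = [
    "accuracy", "consistency", "flexibility", "presentation",
    "coherence", "timeliness", "availability",
]

if __name__ == "__main__":
    unit = MachineAvailability(mtbf_hours=2000.0, mttr_hours=4.0)
    print(f"availability = {unit.score():.4f}")  # ~0.998 for these assumed figures
```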

Of course, other attributes are also used in assessing the appropriateness of intelligent systems, such as initial cost, potential for upgrade, and so on, but these relate more to questions of cost-effectiveness than to the competence of the system, and are therefore outside the scope of the present paper.

3. INTELLIGENT SYSTEMS IN USE

The introduction of intelligent systems will naturally be cautious. We are unlikely to see an immediate replacement of skilled humans by automation. A more likely scenario is the initial appearance of 'automated assistants', capable of relieving a professional of repetitive routine tasks which can actually occupy a significant portion of the available time. For example, in the air traffic domain a controller will have a number of tasks such as continually monitoring the paths of aircraft in the airspace for which he or she is responsible. The assessment criteria used in monitoring are straightforward: conformance to a pre-defined flight plan, and maintenance of correct vertical and horizontal separation between pairs of aircraft are typical. The difficulties for a human controller normally arise in times of stress, due to traffic overload or some unusual condition in the controlled airspace. There is a possibility that the competence of the controller could suffer, particularly the attribute of consistency. One could imagine that a degree of automation could be beneficial to the service here, since an automated assistant would not suffer from fatigue or emotion, as a human might.
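A minimal sketch of the monitoring task just described may make the point concrete. It is not an ATC implementation: the separation minima, data structures, and positions below are assumptions chosen only to illustrate the kind of pairwise check an automated assistant might perform before drawing an anomaly to the controller's attention.

```python
# Sketch only: separation minima and track data are assumed for illustration.
import itertools
import math
from dataclasses import dataclass

VERTICAL_MIN_FT = 1000.0   # assumed vertical separation minimum
HORIZONTAL_MIN_NM = 5.0    # assumed horizontal separation minimum

@dataclass
class Track:
    callsign: str
    x_nm: float        # position east of a reference point, nautical miles
    y_nm: float        # position north of a reference point, nautical miles
    altitude_ft: float

def separation_conflicts(tracks):
    """Return pairs of aircraft that violate both separation minima."""
    conflicts = []
    for a, b in itertools.combinations(tracks, 2):
        horizontal = math.hypot(a.x_nm - b.x_nm, a.y_nm - b.y_nm)
        vertical = abs(a.altitude_ft - b.altitude_ft)
        if horizontal < HORIZONTAL_MIN_NM and vertical < VERTICAL_MIN_FT:
            conflicts.append((a.callsign, b.callsign))
    return conflicts

if __name__ == "__main__":
    tracks = [
        Track("ABC123", 10.0, 12.0, 33000.0),
        Track("DEF456", 12.0, 14.0, 33400.0),   # too close on both axes
        Track("GHI789", 40.0, -5.0, 35000.0),
    ]
    for pair in separation_conflicts(tracks):
        print("attention:", pair)   # flagged for the human controller
```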

An important distinction between the 'assistant' approach and full automation is the degree of responsibility for decision making. In the assistant role, an intelligent system could filter the available information and call important aspects to the attention of the human. For example, the routine passage of an aircraft through a portion of airspace may require no action unless it threatens to violate some constraint, such as separation. The competence of the assistant in this case may relate to detecting anomalies in a dynamically changing set of data. Responsibility for taking decisions would still rest with the human controller. By implication, the controller would also be responsible for ensuring that the right information was used in coming to a decision, including information provided by the assistant. This emphasises the importance of the 'coherence' property as part of the competence of the assistant.

In situations of this sort, the immediate responsibility seems clear. It is the human who makes the decisions and is therefore responsible for any outcome. Any support system is simply an instrument, used by the professional as an aid to decision making. Increasing the sophistication of the instrument does not change its nature. It is much the same position as an X-ray machine used by a doctor, or the radar displays currently in use in ATC centres. Adding a greater degree of intelligence should improve the quality of the information provided, but would not change the fundamental position of the human. Responsibility is accepted by those who have to rely on machine-processed data (as is the case at present in ATC) on the assumption that the underlying automation systems meet certain standards. Assuring the quality of the more conventional data processing systems is not a trivial task at present, and will be even more difficult for future generations of intelligent systems.

The situation becomes considerably worse if a system is expected to make major decisions, such as issuing instructions to an aircraft to avoid a potential collision. If a human is not provided with the same information as the system, nor given an opportunity to inject conclusions, the responsibility shifts to the agency responsible for the operation of the automated system. In the ATC example this would probably be the Civil Aviation Authority of the country responsible for the airspace concerned. While it is true that the same authority would be ultimately responsible for any human controllers, the practicalities of the situation are significantly different. Humans practice their skills after a period of training and assessment, and are allocated greater responsibility after a period of demonstrable competence. There is no expectation that a human can be 'programmed' with all of the necessary attributes in advance. Policies and procedures are currently built around human capabilities, and do not provide a good framework for autonomous intelligent systems.

Intelligent systems are more than just better computer systems, and their introduction is likely to have widespread ramifications throughout an organisation. As an example, assume that it is feasible to build a system capable of autonomous operation in a domain such as ATC. Since humans would no longer be present to make operational decisions, the burden of responsibility for safety would shift to those procuring and installing the system. In many organisations, staff performing these functions have expertise in contractual and administration matters, not in operations. They might therefore find great difficulty in ensuring that the specification of such a system was correct, and in determining that the system was indeed performing as expected. Clearly, such a state of affairs is unlikely to be allowed to occur, but it points to the need to re-assess the way in which such systems may be acquired and used.


4. THE PROCUREMENT CYCLE

The procurement of most significant systems (such as those used for ATC) is based on the practice evolved for early data processing systems. Assuming that the need for the system has been recognised, and that suitable budgetary provision within the agency has been made, the typical sequence of events is as follows.

(a) A specification is produced, describing the capabilities required of the system. This stage usually involves the users of the system (such as controllers in the ATC case), although often the involvement is in the form of advice provided to the specification writers, rather than actual authorship.

(b) The specification is used as the basis for an invitation to tender (ITT) sent to several potential contractors. Contractors are normally asked to submit costed proposals within a given time limit. The ITT is normally framed to elicit competitive offers from several contractors.

(c) The proposals are evaluated against the specified requirement and the available budget. Different agencies have different policies: one extreme is 'lowest compliant bid wins', others prefer to take into account the cost-effectiveness of the proposal.

(d) A contract for supply is negotiated with the selected contractor. This provides an opportunity to 'fine tune' the contractor's proposed solution to the needs of the agency, as well as detailing the precise conditions of contract. The most common arrangement is for the agency to pay a fixed price for delivery of a system meeting the agreed specification.

(e) The contractor implements the system on the agreed basis. Again, different agencies have different policies on their involvement at this stage. The policy of the contractor is generally to remain as remote from the agency as possible. This results from the normal fixed-price basis, which discourages any deviation from the original specification.

(f) The system is tested against a schedule derived from the agreed specification. The agency accepts delivery of the system on successful completion of a series of tests designed to show that the system does indeed meet the specification.

After completion of this sequence there may be a training and familiarisation phase before the system goes 'live'. In any event, the system is normally in the charge of the agency for introduction into service. Any modifications or upgrades are usually handled through the same mechanism as detailed above, although often on a smaller scale, and sometimes without the competitive element.

Several difficulties have been found with this sequence of events in practice. One of these arises from the combination of the relative inflexibility of the process and the long timescales involved. A period of several years may elapse between the original specification and the delivery of the system. During this time, the requirement may well shift, to the extent that the system originally specified is unworkable by the time it is delivered. Production of the original specification may itself be no mean feat, particularly if the system involves some
of the more human capabilities described above. Ensuring that the intention of the specification is covered by the tests applied to the system is another difficult area.

While this procurement model can work for data processing systems that can be described in terms of input-output relationships, it has been found difficult to apply to even fairly simple AI systems. Other approaches to development have shown greater promise, and practices such as 'prototyping' are now becoming more common. However, a particularly important barrier remains to be overcome before more intelligent systems can be deployed: a means of assessing the competence of such systems.

5. DETERMINING SYSTEM BEHAVIOURS

A pressing reason for assessing the competence of systems lies in the need to assign responsibility. However sophisticated the automation, and however many levels of computing are involved, there will eventually be a human (individual or group) who takes responsibility for the actions of the system. An American colleague summed up the situation neatly when he said: "If things go wrong, there has to be someone to sue!"

The introduction of intelligent systems will therefore be constrained not only by the technology itself, but by our capacity to deal with its implications. This is one reason why the early introduction of the technology is likely to be limited to 'assistant' systems. Such applications leave the original human professional in the same role, and concentrate on supporting, rather than usurping, his or her functions. It is very clear in this case that the lawyer, or doctor, or air traffic controller is the person taking responsibility. What is less clear is whether these individuals will be happy to accept the responsibility if it depends, in part, on the advice of a system which they find hard to trust. Concern is already being expressed in the USA about future ATC automation on just these grounds.

In principle, the assessment of competence in data processing systems seems straightforward. A set of inputs will give rise to a set of outputs, and these can be checked. In practice, a truly rigorous analysis of anything but relatively simple systems is beyond current technology. Several methods of proving program correctness have been proposed, but they tend to be extremely laborious and are very expensive to carry out. Also, the program can only be proved correct relative to its specification. This does not necessarily guarantee fitness for purpose, since the specification itself may contain errors. While much hope is currently pinned on formal methods as a means of assuring the performance of conventional software systems, the prospects for rapid progress appear to be limited.

The picture is even more complicated for intelligent systems. These are notoriously difficult to specify in advance -- some would even argue that the very idea of specification shows a fundamentally flawed approach to development. This makes it hard to see how to accept such systems in the conventional sense, since it is not clear what standard their performance can be assessed against. There are also several pitfalls for the unwary. For example, one cannot infer that a system which can solve a complex problem will also be able to solve a simpler version of the same problem. This situation can arise in systems which achieve their goals through some form of pattern matching, because of over-specialisation of the patterns.

Many of the difficulties concerning intelligent systems seem to result from the merging of two different views. On the one hand we want these systems to have all of the best attributes of human competence; on the other, we also want to see them in pure machine terms, capable of expression in logic formulae. There is no evidence that it is possible to express human capabilities in a strict mathematical form. Such circumstantial evidence as does exist seems to indicate the opposite. Without wishing to be drawn into a 'can machines think' argument, we can look at a human as an existence proof that certain types of reasoning engine are feasible. One possible way of assuring ourselves of the competence of intelligent systems is therefore to apply to a machine the same sorts of tests and standards that we apply to humans.

6. MEASURING COMPETENCE

In real life, professional capabilities are often sought to protect against the possibility of loss or damage to person or property. Safety-critical situations such as ATC are typical in this respect. If such functions are to be automated, in whole or in part, then some prior evidence of competence will be required. This may impose a requirement for certification of such systems as 'fit to practice' before entering service. We can use human experience as a guide to assessment methods by looking at the ways in which humans are assessed.

General capabilities can be tested by means of examinations. The examinations are normally moderated so that they are of a known and widely accepted standard. Questions are framed to test a particular skill or section of knowledge in a constrained situation. Although the constraints (time limits, use of aids) are known in advance by the examinees, the questions are usually not. The questions are not intended to present real life operational tasks to the examinees, but to probe more general capabilities. Typical examples of the kind of capability tested include: the assimilation of given data relevant to a particular domain, the imposition of structure on data, and the performance of a computation which combines given information in the correct manner. A variant of the examination is the direct testing or evaluation of a candidate, often by interview. This can be very subjective, and is usually aimed at assessing areas of competence such as flexibility and availability rather than 'performance' aspects, such as accuracy and consistency.

A second kind of assessment, which is rather more directed towards operational tasks, can be based on 'experience'. This shows previous accomplishment in some related field, implying that the capabilities are transferable to a new situation. Proven competence of this kind often plays a major part in selecting candidates for employment.


The third main category is 'on-the-job' assessment, either for a trial period or on a continuous basis. This provides the most direct evidence of a candidate's competence. The degree to which live conditions are used for assessment depends on the application. A novice will normally only be trusted with simple and strictly limited tasks, typically under supervision. In the case of systems which have a learning capability (including humans!), this also provides a method of 'programming' them with approved procedures, methods, and practices.

In general it seems possible to apply some of these methods for testing automated systems. These have the additional possibility of direct inspection of the knowledge contained in the system, which is not available in the human counterpart. Validation could be eased if a recognisable model of the knowledge contained in the system could be identified. Separation of knowledge (model) and reasoning (program) would allow the model to be tested independently. This still leaves the problem of assessing the correctness of the reasoning engine, but with the complexity reduced to that of a comparably sized data processing system.
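The separation argued for here can be sketched in miniature. Assuming a simple rule-based formulation (the rules, facts, and names below are invented for illustration and are not drawn from the paper or from any real system), the knowledge is held as declarative data that can be inspected and validated on its own, while the reasoning component is a small, generic program.

```python
# Sketch of knowledge/reasoning separation; rule and fact names are invented.

# Knowledge model: each rule is (conditions, conclusion), held as pure data
# so it can be reviewed and tested independently of any particular engine.
RULES = [
    ({"loss_of_separation_predicted", "same_flight_level"}, "issue_level_change"),
    ({"loss_of_separation_predicted", "different_flight_level"}, "monitor_only"),
    ({"deviation_from_flight_plan"}, "query_pilot"),
]

def forward_chain(facts, rules):
    """Generic reasoning engine: repeatedly apply any rule whose conditions hold."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# The knowledge model can be exercised on test cases without touching the engine's internals:
assert "issue_level_change" in forward_chain(
    {"loss_of_separation_predicted", "same_flight_level"}, RULES
)
```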

'Examinations' for machines have only been used on a small scale, mainly in assessing performance through benchmark tests. These are normally carried out by measuring the time taken to complete a defined set of tasks, typically to compare similar software packages (such as compilers) or the relative speeds of different processing systems. A crucial difference between these tests and human examinations is the extent of advance knowledge. In the human case, the specific tests (exam questions) are not known in advance, but can be selected to cover a given syllabus. There is no reason in principle why a similar policy could not be applied to intelligent systems. However, the idea of acceptance of a system depending on passing a series of unknown tests is unlikely to be attractive to contractors in the current type of fixed-price contractual situations.
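One way such an 'examination' might be organised is sketched below. The syllabus, the sampling of unseen questions, the pass mark, and the time limit are all assumptions made for illustration; the candidate is simply whatever callable interface the system under test happens to expose.

```python
# Sketch of an examination harness; all parameters are assumed for illustration.
import random
import time

def examine(candidate, syllabus, n_questions=20, pass_mark=0.9, time_limit_s=1.0):
    """Draw questions the candidate has not seen in advance; score accuracy
    and timeliness, two of the competence dimensions discussed earlier."""
    questions = random.sample(syllabus, min(n_questions, len(syllabus)))
    correct = 0
    for inputs, expected in questions:
        start = time.monotonic()
        answer = candidate(inputs)
        elapsed = time.monotonic() - start
        if answer == expected and elapsed <= time_limit_s:
            correct += 1
    score = correct / len(questions)
    return score, score >= pass_mark

# Example use with a trivial candidate and a toy syllabus of (input, answer) pairs.
toy_syllabus = [((a, b), a + b) for a in range(10) for b in range(10)]
score, passed = examine(lambda pair: pair[0] + pair[1], toy_syllabus)
print(f"score={score:.2f}, passed={passed}")
```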

At present, assessing the previous experience of an intelligent system offers little potential. We are currently unable to build truly intelligent systems in a single domain, let alone those which can be re-used in some kind of 'job transfer'.

The in-service assessment of systems shows greater promise. It should be feasible to constrain the scope of an 'apprentice' system to prevent any major mishap. Coupled with some adaptive capability, the degree of trust placed in the system and the problem scope could be gradually extended. While this approach differs radically from the use of formal methods, it is precisely the way in which we accept human performance in areas demanding a high degree of competence. Taking ATC as the example again, controllers are subject to continual assessment and supervision. Their seniority reflects their competence, as judged by their peers. This emphasises the certainty that humans will be involved in such systems for the foreseeable future. Where else can we find the peer group for assessing competence? Without human involvement at some point we have an infinite regress of machines being supervised by ever more intelligent machines.
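The 'apprentice' arrangement might be gated along the lines sketched below. This is a sketch under stated assumptions: the task categories, trust levels, and the rule that anything outside the approved scope is referred to a human supervisor are introduced here for illustration, not taken from the paper.

```python
# Sketch of scope-gating for an 'apprentice' system; categories and levels are assumed.

APPROVED_SCOPE = {
    1: {"routine_monitoring"},                          # novice: watch and report only
    2: {"routine_monitoring", "advisory_suggestions"},  # may propose actions
    3: {"routine_monitoring", "advisory_suggestions",
        "routine_clearances"},                          # may act, still supervised
}

def handle_task(task_type, trust_level, act, refer_to_human):
    """Let the apprentice act only within its currently approved scope;
    anything else goes to the human supervisor, whose judgements would in
    turn be the basis for extending (or reducing) the trust level."""
    if task_type in APPROVED_SCOPE.get(trust_level, set()):
        return act(task_type)
    return refer_to_human(task_type)

# Example: at trust level 1, a request to issue a clearance is referred upward.
result = handle_task(
    "routine_clearances", 1,
    act=lambda t: f"apprentice handled {t}",
    refer_to_human=lambda t: f"referred {t} to controller",
)
print(result)
```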

The use of assessment of this kind changes the balance of the design of intelligent systems. For example, the coherence property discussed earlier will become a key feature in building confidence in system performance. There does seem to be a trade-off between scope and flexibility on the one hand, and simplicity and provability on the other. Increasing the scope and/or the flexibility of a system will almost certainly reduce simplicity and will make the task of determining the limits of the system more difficult. This hints at a possible evolution path for intelligent systems: concentration on producing a multiplicity of systems, each of which is specific to a single purpose. This allows limitation of scope without also limiting competence, as a means of controlling complexity and hence increasing the practicality of implementation.

7. WHERE DOES THE RESPONSIBILITY LIE?

The need to assign responsibility was pointed out earlier in this paper. This is true whether the system is fully autonomous or is used in the role of an assistant to a professional. There appear to be three major areas in which responsibility could be located.

The first is in the creation of the software. Of course, this is in itself a skilled professional task, and has also attracted attempts at automation. Leaving this aside for the moment, the two human activities most directly connected with the final competence of the system are the software writing and testing. From a legal point of view, responsibility in both cases would lie with the organisation contracted to supply the software. This puts the onus on the employer to select, train, and assess staff who are carrying out these functions. An employee who failed to observe the necessary standards may find their career blighted by failure, but would be unlikely to be the direct object of litigation. A rather different category is the use of software tools. If a software contractor uses a tool (such as a compiler) in good faith, it may be that a failure attributed to shortcomings in a tool would be the responsibility of the tool supplier. Here it would depend a great deal on the circumstances of the particular case; for example, if the contractor had been directed to use a particular compiler by his client.

A second area of responsibility lies in the procurement cycle. In current practice, the competence of the system is largely governed by the specification. Errors here are clearly the responsibility of the originating agency. Less obvious is the case of ambiguity which leaves the specification open to differing interpretations. This is also partly related to the definition of standards, either those written by the procuring agency if it has a role as regulatory body, or selected from available public standards as being appropriate to the contractual situation. In addition to defining the scope required of the system, the specification and standards will also imply the extent and nature of the testing required to determine 'fitness to practice'.

The third area of responsibility lies with the users of the system. This applies particularly to systems used in the role of an assistant. The user of the software is responsible for assuring that it is used within the limits of its competence. The
user is also responsible for the conditions of use. For example, use of software on hardware for which it was not designed may produce some unexpected interactions. Looking ahead to the potential use of fully autonomous systems, the users will no longer have the expertise to carry out the same degree of supervision exercised by a professional user. In this case we can anticipate that these functions would be transferred to some regulatory body, who would then take responsibility for assuring professional standards.

8. CONCLUSIONS

Increases in the capabilities of technology and in the urgency of many requirements will increase the pressure to use intelligent systems in many applications requiring high standards. At present, the methods of assessing the competence of such systems are derived mainly from an earlier generation of software, which posed rather different problems. It appears to be impractical to expect a significant contribution from formal methods in the foreseeable future, and the assurance of total rigour remains a very distant aim. However, perhaps we should instead consider that current systems are already operated by reasoning engines which have no formal proof of correctness. That these engines are made of flesh and blood rather than silicon need not obscure the fact that tests of competence and clear lines of responsibility have been shown to be practical.

In order to transfer this experience to future intelligent systems, we will need to change a number of currently accepted practices. An example is the procurement cycle, which imposes an unsuitable method of development on systems in order to simplify a contractual position.

The most likely prospect for change is through the introduction of increasing numbers of intelligent systems used in an assistant role. In addition to providing a worthwhile service, they will also provide both a demand for resolution of the issues raised here, and the means for testing new ideas against practical experience.