
Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0

Alma Whitten
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
[email protected]

J. D. Tygar¹
EECS and SIMS
University of California
Berkeley, CA 94720
[email protected]

¹ Also at Computer Science Department, Carnegie Mellon University (on leave).

Abstract

User errors cause or contribute to most computer security failures, yet user interfaces for security still tend to be clumsy, confusing, or near-nonexistent. Is this simply due to a failure to apply standard user interface design techniques to security? We argue that, on the contrary, effective security requires a different usability standard, and that it will not be achieved through the user interface design techniques appropriate to other types of consumer software.

To test this hypothesis, we performed a case study of a security program which does have a good user interface by general standards: PGP 5.0. Our case study used a cognitive walkthrough analysis together with a laboratory user test to evaluate whether PGP 5.0 can be successfully used by cryptography novices to achieve effective electronic mail security. The analysis found a number of user interface design flaws that may contribute to security failures, and the user test demonstrated that when our test participants were given 90 minutes in which to sign and encrypt a message using PGP 5.0, the majority of them were unable to do so successfully.

We conclude that PGP 5.0 is not usable enough to provide effective security for most computer users, despite its attractive graphical user interface, supporting our hypothesis that user interface design for effective security remains an open problem. We close with a brief description of our continuing work on the development and application of user interface design principles and techniques for security.

1 Introduction

Security mechanisms are only effective when used correctly. Strong cryptography, provably correct protocols, and bug-free code will not provide security if the people who use the software forget to click on the encrypt button when they need privacy, give up on a communication protocol because they are too confused about which cryptographic keys they need to use, or accidentally configure their access control mechanisms to make their private data world-readable. Problems such as these are already quite serious: at least one researcher [2] has claimed that configuration errors are the probable cause of more than 90% of all computer security failures. Since average citizens are now increasingly encouraged to make use of networked computers for private transactions, the need to make security manageable for even untrained users has become critical [4, 9].

This is inescapably a user interface design problem. Legal remedies, increased automation, and user training provide only limited solutions. Individual users may not have the resources to pursue an attacker legally, and may not even realize that an attack took place. Automation may work for securing a communications channel, but not for setting access control policy when a user wants to share some files and not others. Employees can be required to attend training sessions, but home computer users cannot.

Why, then, is there such a lack of good user interface design for security? Are existing general user interface design principles adequate for security? To answer these questions, we must first understand what kind of usability security requires in order to be effective. In this paper, we offer a specific definition of usability for security, and identify several significant properties of security as a problem domain for user interface design. The design priorities required to achieve usable security, and the challenges posed by the properties we discuss, are significantly different from those of general consumer software. We therefore suspect that making security usable will require the development of domain-specific user interface design principles and techniques.

To investigate further, we looked to existing software to find a program that was representative of the best current user interface design for security, an exemplar of general user interface design as applied to security software. By performing a detailed case study of the usability of such a program, focusing on the impact of usability issues on the effectiveness of the security the program provides, we were able to get valuable results on several fronts. First, our case study serves as a test of our hypothesis that user interface design standards appropriate for general consumer software are not sufficient for security. Second, good usability evaluation for security is itself something of an open problem, and our case study discusses and demonstrates the evaluation techniques that we found to be most appropriate. Third, our case study provides real data on which to base our priorities and insights for research into better user interface design solutions, both for the specific program in question and for the domain of security in general.

We chose PGP 5.0² [5, 14] as the best candidate subject for our case study. Its user interface appears to be reasonably well designed by general consumer software standards, and its marketing literature [13] indicates that effort was put into the design, stating that the “significantly improved graphical user interface makes complex mathematical cryptography accessible for novice computer users.” Furthermore, since public key management is an important component of many security systems being proposed and developed today, the problem of how to make the functionality in PGP usable enough to be effective is widely relevant.

We began by deriving a specific usability standard for PGP from our general usability standard for security. In evaluating PGP 5.0’s usability against that standard, we chose to employ two separate evaluation methods: a direct analysis technique called cognitive walkthrough [17], and a laboratory user test [15]. The two methods have complementary strengths and weaknesses. User testing produces more objective results, but is necessarily limited in scope; direct analysis can consider a wider range of possibilities and factors, but is inherently subjective. The sum of the two methods produces a more exhaustive evaluation than either could alone.

² At the time of this writing, PGP 6.0 has recently been released. Some points raised in our case study may not apply to this newer version; however, this does not significantly diminish the value of PGP 5.0 as a subject for usability analysis. Also, our evaluation was performed using the Apple Macintosh version, but the user interface issues we address are not specific to a particular operating system and are equally applicable to UNIX or Windows security software.

We present a point-by-point discussion of the results of our direct analysis, followed by a brief description of our user test’s purpose, design, and participants, and then a compact discussion of the user test results. A more detailed presentation of this material, including user test transcript summaries, may be found in [18].

Based on the results of our evaluation, we conclude that PGP 5.0’s user interface does not come even reasonably close to achieving our usability standard – it does not make public key encryption of electronic mail manageable for average computer users. This, along with much of the detail from our evaluation results, supports our hypothesis that security-specific user interface design principles and techniques are needed. In our continuing work, we are using our usability standard for security, the observations made in our direct analysis, and the detailed findings from our user test as a basis from which to develop and apply appropriate design principles and techniques.

2 Understanding the problem

2.1 Defining usability for security

Usability necessarily has different meanings in different contexts. For some, efficiency may be a priority; for others, learnability; for still others, flexibility. In a security context, our priorities must be whatever is needed in order for the security to be used effectively. We capture that set of priorities in the definition below.

Definition: Security software is usable if the people who are expected to use it:

1. are reliably made aware of the security tasks they need to perform;

2. are able to figure out how to successfully perform those tasks;

3. don’t make dangerous errors; and

4. are sufficiently comfortable with the interface to continue using it.

2.2 Problematic properties of security

Security has some inherent properties that make it a difficult problem domain for user interface design. Design strategies for creating usable security will need to take these properties explicitly into account, and generalized user interface design does not do so. We describe five such properties here; it is possible that there are others that we have not yet identified.

1. The unmotivated user property

Security is usually a secondary goal. People do not generally sit down at their computers wanting to manage their security; rather, they want to send email, browse web pages, or download software, and they want security in place to protect them while they do those things. It is easy for people to put off learning about security, or to optimistically assume that their security is working, while they focus on their primary goals. Designers of user interfaces for security should not assume that users will be motivated to read manuals or to go looking for security controls that are designed to be unobtrusive. Furthermore, if security is too difficult or annoying, users may give up on it altogether.

2. The abstraction property

Computer security management often involves security policies, which are systems of abstract rules for deciding whether to grant accesses to resources. The creation and management of such rules is an activity that programmers take for granted, but which may be alien and unintuitive to many members of the wider user population. User interface design for security will need to take this into account.

3. The lack of feedback property

The need to prevent dangerous errors makes it imperative to provide good feedback to the user, but providing good feedback for security management is a difficult problem. The state of a security configuration is usually complex, and attempts to summarize it are not adequate. Furthermore, the correct security configuration is the one which does what the user “really wants”, and since only the user knows what that is, it is hard for security software to perform much useful error checking.

4. The barn door property

The proverb about the futility of locking the barn door after the horse is gone is descriptive of an important property of computer security: once a secret has been left accidentally unprotected, even for a short time, there is no way to be sure that it has not already been read by an attacker. Because of this, user interface design for security needs to place a very high priority on making sure users understand their security well enough to keep from making potentially high-cost mistakes.

5. The weakest link property

It is well known that the security of a networked computer is only as strong as its weakest component. If a cracker can exploit a single error, the game is up. This means that users need to be guided to attend to all aspects of their security, not left to proceed through random exploration as they might with a word processor or a spreadsheet.

2.3 A usability standard for PGP

People who use email to communicate over the Internet need security software that allows them to do so with privacy and authentication. The documentation and marketing literature for PGP presents it as a tool intended for that use by this large, diverse group of people, the majority of whom are not computer professionals. Referring back to our general definition of usability for security, we derived the following question on which to focus our evaluation:

If an average user of email feels the need for privacy and authentication, and acquires PGP with that purpose in mind, will PGP’s current design allow that person to realize what needs to be done, figure out how to do it, and avoid dangerous errors, without becoming so frustrated that he or she decides to give up on using PGP after all?

Stating the question in more detail, we want to know whether that person will, at minimum:

- understand that privacy is achieved by encryption, and figure out how to encrypt email and how to decrypt email received from other people;

- understand that authentication is achieved through digital signatures, and figure out how to sign email and how to verify signatures on email from other people;

- understand that in order to sign email and allow other people to send them encrypted email a key pair must be generated, and figure out how to do so;

- understand that in order to allow other people to verify their signature and to send them encrypted email, they must publish their public key, and figure out some way to do so;

- understand that in order to verify signatures on email from other people and send encrypted email to other people, they must acquire those people’s public keys, and figure out some way to do so;

- manage to avoid such dangerous errors as accidentally failing to encrypt, trusting the wrong public keys, failing to back up their private keys, and forgetting their pass phrases; and

- be able to succeed at all of the above within a few hours of reasonably motivated effort.

This is a minimal list of items that are essential to correct use of PGP. It does not include such important tasks as having other people sign the public key, signing other people’s public keys, revoking the public key and publicizing the revocation, or evaluating the authenticity of a public key based on accompanying signatures and making use of PGP’s built-in mechanisms for such evaluation.
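For readers who want the task list above in concrete terms, the following is a minimal sketch of the underlying public key workflow, written with Python’s cryptography package rather than PGP itself; the library, key sizes, and padding choices are our own assumptions, not a description of PGP 5.0. It generates a key pair, serializes the public half for publication, and then signs with the sender’s private key while encrypting with the recipient’s public key — exactly the asymmetry that the walkthrough below finds PGP 5.0’s interface failing to convey.

    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes, serialization

    # 1. Generate a key pair.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # 2. "Publish" the public key: serialize it so it can be sent to a key
    #    server or mailed to correspondents.
    published_pem = public_key.public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )

    # 3. Acquire a correspondent's public key (generated here as a stand-in
    #    for a key fetched from a key server).
    recipient_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    recipient_public = recipient_private.public_key()

    message = b"Proposed itinerary for the candidate"

    # 4. Sign with the sender's PRIVATE key ...
    signature = private_key.sign(
        message,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )

    # 5. ... and encrypt with the recipient's PUBLIC key.
    ciphertext = recipient_public.encrypt(
        message,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )

    # 6. The recipient decrypts with their own private key and verifies the
    #    signature with the sender's published public key.
    plaintext = recipient_private.decrypt(
        ciphertext,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    sender_public = serialization.load_pem_public_key(published_pem)
    sender_public.verify(            # raises InvalidSignature if tampered with
        signature,
        plaintext,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )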

3 Evaluation methods

We chose to evaluate PGP’s usability through two methods: an informal cognitive walkthrough [17] in which we reviewed PGP’s user interface directly and noted aspects of its design that failed to meet the usability standard described in Section 2.3; and a user test [15] performed in a laboratory with test participants selected to be reasonably representative of the general population of email users. The strengths and weaknesses inherent in each of the two methods made them useful in quite different ways, and it was more realistic for us to view them as complementary evaluation strategies [7] than to attempt to use the laboratory test to directly verify the points raised by the cognitive walkthrough.

Cognitive walkthrough is a usability evaluation technique modeled after the software engineering practice of code walkthroughs. To perform a cognitive walkthrough, the evaluators step through the use of the software as if they were novice users, attempting to mentally simulate what they think the novices’ understanding of the software would be at each point, and looking for probable errors and areas of confusion. As an evaluation tool, cognitive walkthrough tends to focus on the learnability of the user interface (as opposed to, say, the efficiency), and as such it is an appropriate tool for evaluating the usability of security.

Although our analysis is most accurately described as a cognitive walkthrough, it also incorporated aspects of another technique, heuristic evaluation [11]. In this technique, the user interface is evaluated against a specific list of high-priority usability principles; our list of principles consists of our definition of usability for security as given in Section 2.1 and its restatement specifically for PGP in Section 2.3. Heuristic evaluation is ideally performed by people who are “double experts,” highly familiar with both the application domain and with usability techniques and requirements (including an understanding of the skills, mindset and background of the people who are expected to use the software). Our evaluation draws on our experience as security researchers and on additional background in training and tutoring novice computer users, as well as in theater, anthropology and psychology.

Some of the same properties that make the design of usable security a difficult and specialized problem also make testing the usability of security a challenging task. To conduct a user test, we must ask the participants to use the software to perform some task that will include the use of the security. If, however, we prompt them to perform a security task directly, when in real life they might have had no awareness of that task, then we have failed to test whether the software is designed well enough to give them that awareness when they need it. Furthermore, to test whether they are able to figure out how to use the security when they want it, we must make sure that the test scenario gives them some secret that they consider worth protecting, comparable to the value we expect them to place on their own secrets in the real world. Designing tests that take these requirements adequately into account is something that must be done carefully, and with the exception of some work on testing the effectiveness of warning labels [19], we have found little existing material on user testing that addresses similar concerns.

4 Cognitive walkthrough

Since this paper is intended for a security audience, and is subject to space limitations, we present the results of our cognitive walkthrough in summary form, focusing on the points which are most relevant to security risks.

4.1 Visual metaphors

The metaphor of keys is built into cryptologic terminology, and PGP’s user interface relies heavily on graphical depictions of keys and locks. The PGPTools display, shown in Figure 1, offers four buttons to the user, representing four operations: Encrypt, Sign, Encrypt & Sign, and Decrypt/Verify, plus a fifth button for invoking the PGPKeys application. The graphical labels on these buttons indicate the encryption operation with an icon of a sealed envelope that has a metal loop on top to make it look like a closed padlock, and, for the decryption operation, an icon of an open envelope with a key inserted at the bottom. Even for a novice user, these appear to be straightforward visual metaphors that help make the use of keys to encrypt and decrypt into an intuitive concept.

Still more helpful, however, would be an extension of the metaphor to distinguish between public keys for encryption and private keys for decryption; normal locks use the same key to lock and unlock, and the key metaphor will lead people to expect the same for encryption and decryption if it is not visually clarified in some way. Faulty intuition in this case may lead them to assume that they can always decrypt anything they have encrypted, an assumption which may have upsetting consequences. Different icons for public and private keys, perhaps drawn to indicate that they fit together like puzzle pieces, might be an improvement.

Signatures are another metaphor built into cryptologic terminology, but the icon of the blue quill pen that is used to indicate signing is problematic. People who are not familiar with cryptography probably know that quills are used for signing, and will recognize that the picture indicates the signature operation, but what they also need to understand is that they are using their private keys to generate signatures. The quill pen icon, which has nothing key-like about it, will not help them understand this and may even lead them to think that, along with the key objects that they use to encrypt, they also have quill pen objects that they use to sign. Quill pen icons encountered elsewhere in the program may be taken to be those objects, rather than the signatures that they are actually intended to represent. A better icon design might keep the quill pen to represent signing, but modify it to show a private key as the nib of the pen, and use some entirely different icon for signatures, perhaps something that looks more like a bit of inked handwriting and incorporates a keyhole shape.

Signature verification is not represented visually, which is a shame since it would be easy for people to overlook it altogether. The single button for Decrypt/Verify, labeled with an icon that only evokes decryption, could easily lead people to think that “verify” just means “verify that the decryption occurred correctly.” Perhaps an icon that showed a private key unlocking the envelope and a public key unlocking the signature inside could suggest a much more accurate model to the user, while still remaining simple enough to serve as a button label.

4.2 Different key types

Originally, PGP used the popular RSA algorithm for encryption and signing. PGP 5.0 uses the Diffie-Hellman/DSS algorithms. The RSA and Diffie-Hellman/DSS algorithms use correspondingly different types of keys. The makers of PGP would prefer to see all the users of their software switch to use of Diffie-Hellman/DSS, but have designed PGP 5.0 to be backward compatible and handle existing RSA keys when necessary. The lack of forward compatibility, however, can be a problem: if a file is encrypted for several recipients, some of whom have RSA keys and some of whom have Diffie-Hellman/DSS keys, the recipients who have RSA keys will not be able to decrypt it unless they have upgraded to PGP 5.0; similarly, those recipients will not be able to verify signatures created with Diffie-Hellman/DSS without a software upgrade.
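The mechanics behind encrypting one message for several recipients are worth making concrete. OpenPGP-style tools encrypt the message body once with a symmetric session key and then wrap that session key separately for each recipient’s public key. The sketch below, again in Python with the cryptography package, is a hedged illustration of that pattern only — it is not PGP’s message format, and the key sizes and recipient count are assumptions. It shows why every recipient’s software must understand the formats involved: as the text notes, a recipient running an older RSA-only version of PGP cannot process a message produced in the newer format, even though the message was nominally encrypted to them as well.

    import os
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.hazmat.primitives import hashes

    # Encrypt the body once with a random symmetric session key.
    session_key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)
    body = AESGCM(session_key).encrypt(nonce, b"campaign plan update", None)

    # Wrap the same session key separately for each recipient's public key.
    recipients = [rsa.generate_private_key(public_exponent=65537, key_size=2048)
                  for _ in range(3)]   # stand-ins for keys fetched from a key server
    wrapped_keys = [
        r.public_key().encrypt(
            session_key,
            padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                         algorithm=hashes.SHA256(), label=None),
        )
        for r in recipients
    ]

    # Each recipient unwraps their own copy of the session key and then
    # decrypts the shared body; all of them must be able to parse the
    # overall message format for this to work.
    for r, wrapped in zip(recipients, wrapped_keys):
        key = r.decrypt(wrapped,
                        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                                     algorithm=hashes.SHA256(), label=None))
        assert AESGCM(key).decrypt(nonce, body, None) == b"campaign plan update"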

PGP 5.0 alerts its users to this compatibility issue in two ways. First, it uses different icons to depict the different key types: a blue key with an old fashioned shape for RSA keys, and a brass key with a more modern shape for Diffie-Hellman/DSS keys, as shown in Figure 2. Second, when users attempt to encrypt documents using mixed key types, a warning message is displayed to tell them that recipients who have earlier versions of PGP may not be able to decrypt it.

Figure 1

Unfortunately, information about the meaning of the blue and brass key icons is difficult to find, requiring users either to go looking through the 132-page manual, or to figure it out based on the presence of other key type data. Furthermore, other than the warning message encountered during encryption, explanation of why the different key types are significant (in particular, the risk of forward compatibility problems) is given only in the manual. Double-clicking on a key pops up a Key Properties window, which would be a good place to provide a short message about the meaning of the blue or brass key icon and the significance of the corresponding key type.

It is most important for the user to pay attention to the key types when choosing a key for message encryption, since that is when mixed key types can cause compatibility problems. However, PGP’s dialog box (see Figure 3) presents the user with the metaphor of choosing people (recipients) to receive the message, rather than keys to encrypt the message with. This is not a good design choice, not only because the human head icons obscure the key type information, but also because people may have multiple keys, and it is counterintuitive for the dialog to display multiple versions of a person rather than the multiple keys that person owns.

4.3 Key server

Key servers are publicly accessible (via the Internet) databases in which anyone can publish a public key joined to a name. PGP is set to access a key server at MIT by default, but there are others available, most of which are kept up to date as mirrors of each other. PGP offers three key server operations to the user under the Keys pull-down menu shown in Figure 4: Get Selected Key, Send Selected Key, and Find New Keys. The first two of those simply connect to the key server and perform the operation. The third asks the user to type in a name or email address to search for, connects to the key server and performs the search, and then tells the user how many keys were returned as a result, asking whether or not to add them to the user’s key ring.

The first problem we find with this presentation of the key server is that users may not realize it exists, since there is no representation of it in the top level of the PGPKeys display. Putting the key server operations under a Key Server pull-down menu would be a better design choice, especially since it is worthwhile to encourage the user to make a mental distinction between operations that access remote machines and those that are purely local. We also think that it should be made clearer that a remote machine is being accessed, and that the identity of the remote machine should be displayed. Often the “connecting…receiving data…closing connection” series of status messages that PGP displayed flashed by almost too quickly to be read.

At present, PGPKeys keeps no records of key server accesses. There is nothing to show whether a key has been sent to a key server, or when a key was fetched or last updated, and from which key server the key was fetched or updated. This is information that might be useful to the user for key management and for verifying that key server operations were completed successfully. Adding this record keeping to the information displayed in the Key Properties window would improve PGP.

Figure 2

Key revocation, in which a certificate is published to announce that a previously published public key should no longer be considered valid, generally implies the use of the key server to publicize the revocation. PGP’s key revocation operation does not send the resulting revocation certificate to the key server, which is probably as it should be, but there is a risk that some users will assume that it does do so, and fail to take that action themselves. A warning that the created revocation certificate has not yet been publicized would be appropriate.

4.4 Key management policy

PGP maintains two ratings for each public key in a PGP key ring. These ratings may be assigned by the user or derived automatically. The first of these ratings is validity, which is meant to indicate how sure the user is that the key is safe to encrypt with (i.e., that it does belong to the person whose name it is labeled with). A key may be labeled as completely valid, marginally valid, or invalid. Keys that the user generates are always completely valid. The second of these ratings is trust, which indicates how much faith the user has in the key (and implicitly, the owner of the key) as a certifier of other keys. Similarly, a key may be labeled as completely trusted, marginally trusted, or untrusted, and the user's own keys are always completely trusted.

What the user may not realize, unless they read the manual very carefully, is that there is a policy built into PGP that automatically sets the validity rating of a key based on whether it has been signed by a certain number of sufficiently trusted keys. This is dangerous. There is nothing to prevent users from innocently assigning their own interpretations to those ratings and setting them accordingly (especially since “validity” and “trust” have different colloquial meanings), and it is certainly possible that some people might make mental use of the validity rating while disregarding and perhaps incautiously modifying the trust ratings. PGP's ability to automatically derive validity ratings can be useful, but the fact that PGP is doing so needs to be made obvious to the user.
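To make the kind of automatic policy described above concrete, here is an illustrative Python sketch of a validity-derivation rule. The thresholds used (one completely trusted signer, or two marginally trusted signers) follow the commonly cited classic PGP defaults and are an assumption on our part, not a reading of PGP 5.0’s implementation; the point is only that a key’s validity rating can change silently as signatures and trust assignments change, without the user ever setting it.

    # Illustrative sketch of an automatic validity-derivation rule of the kind
    # described above; the thresholds are assumed, not taken from PGP 5.0.
    COMPLETE, MARGINAL, NONE = "complete", "marginal", "none"

    def derived_validity(signer_trust_levels):
        """Map the trust levels of a key's signers to a validity rating."""
        complete = sum(1 for t in signer_trust_levels if t == COMPLETE)
        marginal = sum(1 for t in signer_trust_levels if t == MARGINAL)
        if complete >= 1 or marginal >= 2:
            return "completely valid"
        if marginal == 1:
            return "marginally valid"
        return "invalid"

    # A key signed by one marginally trusted introducer:
    print(derived_validity([MARGINAL]))            # marginally valid
    # The same key after a second marginally trusted signature appears:
    print(derived_validity([MARGINAL, MARGINAL]))  # completely valid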

Figure 3

4.5 Irreversible actions

Some user errors are reversible, even if they require some time and effort to reconstruct the desired state. The ones we list below, however, are not, and potentially have unpleasant consequences for the user, who might lose valuable data.

Accidentally deleting the private key

A public key, if deleted, can usually be gotten again from a key server or from its owner. A private key, if deleted and not backed up somewhere, is gone for good, and anything encrypted with its corresponding public key will never be able to be decrypted, nor will the user ever be able to make a revocation certificate for that public key. PGP responds to any attempt to delete a key with the question “Do you really want to delete these items?” This is fine for a public key, but attempts to delete a private key should be met with a warning about the possible consequences.

Accidentally publicizing a key

Information can only be added to a key server, not removed. A user who is experimenting with PGP may end up generating a number of key pairs that are permanently added to the key server, without realizing that these are permanent entries. It is true that the effect of this can be partially addressed by revoking the keys later (or waiting for them to expire), but this is not a satisfactory solution. First, even if a key is revoked or expired, it remains on the key server. Second, the notions of revocation and expiration are relatively sophisticated concepts; concepts that are likely to be unfamiliar to a novice user. For example, as discussed above, the user may accidentally lose the ability to generate a revocation certificate for a key. This is particularly likely for a user who was experimenting with PGP and generating a variety of test keys that they intend to delete. One way to address this problem would be to warn the user when he sends a key to a server that the information being sent will be a permanent addition.

Accidentally revoking a key

Once the user revokes a public key, the only way to undo the revocation is to restore the key ring from a backup copy. PGP’s warning message for the revocation operation asks “Are you sure you want to revoke this key? Once distributed, others will be unable to encrypt data to this key.” This message doesn’t warn the user that, even if no distribution has taken place, a previous backup of the key ring will be needed if the user wants to undo the revocation. Also, it may contribute to the misconception that revoking the key automatically distributes the revocation.

Forgetting the pass phrase

PGP suggests that the user make a backup revocation certificate, so that if the pass phrase is lost, at least the user can still use that certificate to revoke the public key. We agree that this is a useful thing to do, but we also believe that only expert users of PGP will understand what this means and how to go about doing so (under PGP’s current design, this requires the user to create a backup of the key ring, revoke the public key, create another backup of the key ring that has the revoked key, and then restore the key ring from the original backup).

Figure 4

Failing to back up the key rings

We see two problems with the way the mechanism for backing up the key rings is presented. First, the user is not reminded to back up the key rings until he or she exits PGPKeys; it would be better to remind as soon as keys are generated, so as not to risk losing them to a system crash. Second, although the reminder message tells the user that it is important to back up the keys to some medium other than the main hard drive, the dialog box for backing up presents the main PGP folder as a default backup location. Since most users will just click the “Okay” button and accept the default, this is not a good design.

4.6 Consistency

When PGP is in the process of encrypting or signing a file, it presents the user with a status message that says it is currently “encoding.” It would be better to say “encrypting” or “signing”, since seeing terms that explicitly match the operations being performed helps to create a clear mental model for the user, and introducing a third term may confuse the user into thinking there is a third operation taking place. We recognize that the use of the term “encoding” here may simply be a programming error and not a design choice per se, but we think this is something that should be caught by usability-oriented product testing.

4.7 Too much information

In previous implementations of PGP, the supporting functions for key management (creating key rings, collecting other people’s keys, constructing a “web of trust”) tended to overshadow PGP’s simpler primary functions, signing and encryption. PGP 5.0 separates these functions into two applications: PGPKeys for key management, and PGPTools for signing and encryption. This cleans up what was previously a rather jumbled collection of primary and supporting functions, and gives the user a nice simple interface to the primary functions. We believe, however, that the PGPKeys application still presents the user with far too much information to make sense of, and that it needs to do a better job of distinguishing between basic, intermediate, and advanced levels of key management activity so as not to overwhelm its users.

Currently, the PGPKeys display (see Figure 2) always shows the following information for each key on the user’s key ring: owner’s name, validity, trust level, creation date, and size. The key type is also indicated by the choice of icon, and the user can toggle the display of the signatures on each key. This is a lot of information, and there is nothing to help the user figure out which parts of the display are the most important to pay attention to. We think that this will cause users to fail to recognize data that is immediately relevant, such as the key type; that it will increase the chances that they will assign wrong interpretations to some of the data, such as trust and validity; and that it will add to making users feel overwhelmed and uncertain that they are managing their security successfully.

We believe that, realistically, the vast majority of PGP’s users will be moving from sending all of their email in plain text to using simple encryption when they email something sensitive, and that they will be inclined to trust all the keys they acquire, because they are looking for protection against eavesdroppers and not against the sort of attack that would try to trick them into using false keys. A better design of PGPKeys would have an initial display configuration that concentrated on giving the user the correct model of the relationship between public and private keys, the significance of key types, and a clear understanding of the functions for acquiring and distributing keys. Removing the validity, trust level, creation date and size from the display would free up screen area for this, and would help the user focus on understanding the basic model well. Some security experts may find the downplaying of this information alarming, but the goal here is to enable users who are inexperienced with cryptography to understand and begin to use the basics, and to prevent confusion or frustration that might lead them to use PGP incorrectly or not at all.

A smaller set of more experienced users will probably care more about the trustworthiness of their keys; perhaps these users do have reason to believe that the contents of their email is valuable enough to be the target of a more sophisticated, planned attack, or perhaps they really do need to authenticate a digital signature as coming from a known real world entity. These users will need the information given by the signatures on each key. They may find the validity and trust labels useful for recording their assessments of those signatures, or they may prefer to glance at the actual signatures each time. It would be worthwhile to allow users to add the validity and trust labels to the display if they want to, and to provide easily accessible help for users who are transitioning to this more sophisticated level of use. But this would only make sense if the automatic derivation of validity by PGP’s built-in policy were turned off for these users, for the reasons discussed in Section 4.4.

Key size is really only relevant to those who actually fear a cryptographic attack, and could certainly be left as information for the Key Properties dialog, as could the creation date. Users who are sophisticated enough to make intelligent use of that information are certainly sophisticated enough to go looking for it.

5 User test

5.1 Purpose

Our user test was designed to evaluate whether PGP 5.0 meets the specific usability standard described in Section 2.3. We gave our participants a test scenario that was both plausible and appropriately motivating, and then avoided interfering with their attempts to carry out the security tasks that we gave them.

5.2 Description

5.2.1 Test design

Our test scenario was that the participant had volunteered to help with a political campaign and had been given the job of campaign coordinator (the party affiliation and campaign issues were left to the participant’s imagination, so as not to offend anyone). The participant’s task was to send out campaign plan updates to the other members of the campaign team by email, using PGP for privacy and authentication. Since presumably volunteering for a political campaign implies a personal investment in the campaign’s success, we hoped that the participants would be appropriately motivated to protect the secrecy of their messages.

Since PGP does not handle email itself, it was necessary to provide the participants with an email handling program to use. We chose to give them Eudora, since that would allow us to also evaluate the success of the Eudora plug-in that is included with PGP. Since we were not interested in testing the usability of Eudora (aside from the PGP plug-in), we gave the participants a brief Eudora tutorial before starting the test, and intervened with assistance during the test if a participant got stuck on something that had nothing to do with PGP.

After we briefed the participants on the test scenario and tutored them on the use of Eudora, they were given an initial task description which provided them with a secret message (a proposed itinerary for the candidate), the names and email addresses of the campaign manager and four other campaign team members, and a request to please send the secret message to the five team members in a signed and encrypted email. In order to complete this task, a participant had to generate a key pair, get the team members’ public keys, make their own public key available to the team members, type the (short) secret message into an email, sign the email using their private key, encrypt the email using the five team members’ public keys, and send the result. In addition, we designed the test so that one of the team members had an RSA key while the others all had Diffie-Hellman/DSS keys, so that if a participant encrypted one copy of the message for all five team members (which was the expected interpretation of the task), they would encounter the mixed key types warning message. Participants were told that after accomplishing that initial task, they should wait to receive email from the campaign team members and follow any instructions they gave.

Each of the five campaign team members was represented by a dummy email account and a key pair which were accessible to the test monitor through a networked laptop. The campaign manager’s private key was used to sign each of the team members’ public keys, including her own, and all five of the signed public keys were placed on the default key server at MIT, so that they could be retrieved by participant requests.

Under certain circumstances, the test monitor posed as a member of the campaign team and sent email to the participant from the appropriate dummy account. These circumstances were:

1. The participant sent email to that team member asking a question about how to do something. In that case, the test monitor sent the minimally informative reply consistent with the test scenario, i.e. the minimal answer that wouldn’t make that team member seem hostile or ignorant beyond the bounds of plausibility³.

2. The participant sent the secret in a plaintext email. The test monitor then sent email posing as the campaign manager, telling the participant what happened, stressing the importance of using encryption to protect the secrets, and asking the participant to try sending an encrypted test email before going any further. If the participant succeeded in doing so, the test monitor (posing as the campaign manager) then sent an updated secret to the participant in encrypted email and the test proceeded as from the beginning.

3. The participant sent email encrypted with the wrong key. The test monitor then sent email posing as one of the team members who had received the email, telling the participant that the team member was unable to decrypt the email and asking whether the participant had used that team member’s key to encrypt.

4. The participant sent email to a team member asking for that team member’s key. The test monitor then posed as that team member and sent the requested key in email.

5. The participant succeeded in carrying out the initial task. They were then sent a signed, encrypted email from the test monitor, posing as the campaign manager, with a change for the secret message, in order to test whether they could decrypt and read it successfully. If at that point they had not done so on their own, they received email prompting them to remember to back up their key rings and to make a backup revocation certificate, to see if they were able to perform those tasks. If they had not sent a separately encrypted version of the message to the team member with the RSA key, they also received email from the test monitor posing as that team member and complaining that he couldn’t decrypt the email message.

6. The participant sent email telling the team member with the RSA key that he should generate a new key or should upgrade his copy of PGP. In that case the test monitor continued sending email as that team member, saying that he couldn’t or didn’t want to do those things and asking the participant to please try to find a way to encrypt a copy that he could decrypt.

Each test session lasted for 90 minutes, from the point at which the participant was given the initial task description to the point when the test monitor stopped the session. Manuals for both PGP and Eudora were provided, along with a formatted floppy disk, and participants were told to use them as much as they liked.

³ This aspect of the test may trouble the reader in that different test participants were able to extract different amounts of information by asking questions in email, thus leading to test results that are not as standardized as we might like. However, this is in some sense realistic; PGP is being tested here as a utility for secure communication, and people who use it for that purpose will be likely to ask each other for help with the software as part of that communication. We point out also that the purpose of our test is to locate extreme usability problems, not to compare the performance of one set of participants against another, and that while inaccurately improved performance by a few participants might cause us to fail to identify some usability problems, it certainly would not lead us to identify a problem where none exists.

5.2.2 Participants

The user test was run with twelve different participants, all of whom were experienced users of email, and none of whom could describe the difference between public and private key cryptography prior to the test sessions. The participants all had attended at least some college, and some had graduate degrees. Their ages ranged from 20 to 49, and their professions were diversely distributed, including graphic artists, programmers, a medical student, administrators and a writer. More detailed information about participant selection and demographics is available in [18].

5.3 Results

We summarize the most significant results we observed from the test sessions, again focusing on the usability standard for PGP that we gave in Section 2.3. Detailed transcripts of the test sessions are available in [18].

Avoiding dangerous errors

Three of the twelve test participants (P4, P9, and P11) accidentally emailed the secret to the team members without encryption. Two of the three (P9 and P11) realized immediately that they had done so, but P4 appeared to believe that the security was supposed to be transparent to him and that the encryption had taken place. In all three cases the error occurred while the participants were trying to figure out the system by exploring.

One participant (P12) forgot her pass phrase during the course of the test session and had to generate a new key pair. Participants tended to choose pass phrases that could have been standard passwords, eight to ten characters long and without spaces.

Figuring out how to encrypt with any key

One of the twelve participants (P4) was unable to figure out how to encrypt at all. He kept attempting to find a way to “turn on” encryption, and at one point believed that he had done so by modifying the settings in the Preferences dialog in PGPKeys. Another of the twelve (P2) took more than 30 minutes⁴ to figure out how to encrypt, and the method he finally found required a reconfiguration of PGP (to make it display the PGPMenu inside Eudora). Another (P3) spent 25 minutes sending repeated test messages to the team members to see if she had succeeded in encrypting them (without success), and finally succeeded only after being prompted to use the PGP Plug-In buttons.

⁴ This is measured as time the participant spent working on the specific task of encrypting a message, and does not include time spent working on getting keys, generating keys, or otherwise exploring PGP and Eudora.

Figuring out the correct key to encrypt with

Among the eleven participants who figured out how to encrypt, failure to understand the public key model was widespread. Seven participants (P1, P2, P7, P8, P9, P10 and P11) used only their own public keys to encrypt email to the team members. Of those seven, only P8 and P10 eventually succeeded in sending correctly encrypted email to the team members before the end of the 90 minute test session (P9 figured out that she needed to use the campaign manager’s public key, but then sent email to the entire team encrypted only with that key), and they did so only after they had received fairly explicit email prompting from the test monitor posing as the team members. P1, P7 and P11 appeared to develop an understanding that they needed the team members’ public keys (for P1 and P11, this was also after they had received prompting email), but still did not succeed at correctly encrypting email. P2 never appeared to understand what was wrong, even after twice receiving feedback that the team members could not decrypt his email.

Another of the eleven (P5) so completely misunderstood the model that he generated key pairs for each team member rather than for himself, and then attempted to send the secret in an email encrypted with the five public keys he had generated. Even after receiving feedback that the team members were unable to decrypt his email, he did not manage to recover from this error.

Decrypting an email message

Five participants (P6, P8, P9, P10 and P12) received encrypted email from a team member (after successfully sending encrypted email and publicizing their public keys). P10 tried for 25 minutes but was unable to figure out how to decrypt the email. P9 mistook the encrypted message block for a key, and emailed the team member who sent it to ask if that was the case; after the test monitor sent a reply from the team member saying that no key had been sent and that the block was just the message, she was then able to decrypt it successfully. P6 had some initial difficulty viewing the results after decryption, but recovered successfully within 10 minutes. P8 and P12 were able to decrypt without any problems.

Publishing the public key

Ten of the twelve participants were able to successfully make their public keys available to the team members; the other two (P4 and P5) had so much difficulty with earlier tasks that they never addressed key distribution. Of those ten, five (P1, P2, P3, P6 and P7) sent their keys to the key server, three (P8, P9 and P10) emailed their keys to the team members, and P11 and P12 did both. P3, P9 and P10 publicized their keys only after being prompted to do so by email from the test monitor posing as the campaign manager.

The primary difficulty that participants appeared to experience when attempting to publish their keys involved the iconic representation of their key pairs in PGPKeys. P1, P11 and P12 all expressed confusion about which icons represented their public keys and which their private keys, and were disturbed by the fact that they could only select the key pair icon as an indivisible unit; they feared that if they then sent their selection to the key server, they would be accidentally publishing their private keys. Also, P7 tried and failed to email her public key to the team members; she was confused by the directive to “paste her key into the desired area” of the message, thinking that it referred to some area specifically demarcated for that purpose that she was unable to find.

Getting other people’s public keys

Eight of the twelve participants (P1, P3, P6, P8, P9, P10, P11 and P12) successfully got the team members’ public keys; all of the eight used the key server to do so. Five of the eight (P3, P8, P9, P10 and P11) received some degree of email prompting before they did so. Of the four who did not succeed, P2 and P4 never seemed aware that they needed to get the team members’ keys, P5 was so confused about the model that he generated keys for the team members instead, and P7 spent 15 minutes trying to figure out how to get the keys but ultimately failed.

P7 gave up on using the key server after one failed attempt in which she tried to retrieve the campaign manager’s public key but got nothing back (perhaps due to mis-typing the name). P1 spent 25 minutes trying and failing to import a key from an email message; he copied the key to the clipboard but then kept trying to decrypt it rather than import it. P12 also had difficulty trying to import a key from an email message: the key was one she already had in her key ring, and when her copy and paste of the key failed to have any effect on the PGPKeys display, she assumed that her attempt had failed and kept trying. Eventually she became so confused that she began trying to decrypt the key instead.

Handling the mixed key types problem

Four participants (P6, P8, P10 and P12) eventually managed to send correctly encrypted email to the team members (P3 sent a correctly encrypted email to the campaign manager, but not to the whole team). P6 sent an individually encrypted message to each team member to begin with, so the mixed key types problem did not arise for him. The other three received a reply email from the test monitor posing as the team member with an RSA key, complaining that he was unable to decrypt their email.

P8 successfully employed the solution of sending that team member an email encrypted only with his own key. P10 explained the cause of the problem correctly in an email to that team member, but didn’t manage to offer a solution. P12 half understood, initially believing that the problem was due to the fact that her own key pair was Diffie-Hellman/DSS, and attempting to generate herself an RSA key pair as a solution. When she found herself unable to do that, she then decided that maybe the problem was just that she had a corrupt copy of that team member’s public key, and began trying in various ways to get a good copy of it. She was still trying to do so at the end of the test session.

Signing an email message

All the participants who were able to send an encrypted email message were also able to sign the message (although in the case of P5, he signed using key pairs that he had generated for other people). It was unclear whether they assigned much significance to doing so, beyond the fact that it had been requested as part of the task description.

Verifying a signature on an email message

Again, all the participants who were able to decrypt an email message were by default also verifying the signature on the message, since the only decryption operation available to them includes verification. Whether they were aware that they were doing so, or paid any attention to the verification result message, is not something we were able to determine from this test.

Creating a backup revocation certificate

We would have liked to know whether the participants were aware of the good reasons to make a backup revocation certificate and were able to figure out how to do so successfully. Regrettably, this was very difficult to test for. We settled for direct prompting to make a backup revocation certificate, for participants who managed to successfully send encrypted email and decrypt a reply (P6, P8 and P12).

In response to this prompting, P6 generated a test key pair and then revoked it, without sending either the key pair or its revocation to the key server. He appeared to think he had successfully completed the task. P8 backed up her key rings, revoked her key, then sent email to the campaign manager saying she didn’t know what to do next. P12 ignored the prompt, focusing on another task.

Deciding whether to trust keys from the key server

Of the eight participants who got the team members’ public keys, only three (P1, P6, and P11) expressed some concern over whether they should trust the keys. P1’s worry was expressed in the last five minutes of his test session, so he never got beyond that point. P6 noted aloud that the team members’ keys were all signed by the campaign manager’s key, and took that as evidence that they could be trusted. P11 expressed great distress over not knowing whether or not she should trust the keys, and got no further in the remaining ten minutes of her test session. None of the three made use of the validity and trust labeling provided by PGPKeys.

6 Conclusions

6.1 Failure of standard interface design

The results seen in our case study support our hypothesis that the standard model of user interface design, represented here by PGP 5.0, is not sufficient to make computer security usable for people who are not already knowledgeable in that area. Our twelve test participants were generally educated and experienced at using email, yet only one third of them were able to use PGP 5.0 to correctly sign and encrypt an email message when given 90 minutes in which to do so. Furthermore, one quarter of them accidentally exposed the secret they were meant to protect in the process, by sending it in email they thought they had encrypted but had not.

In Section 2.1, we defined usability for security in terms of four necessary qualities, which translate directly to design priorities. PGP 5.0's user interface fails to enable effective security where it is not designed in accordance with those priorities: test participants did not understand the public key model well enough to know that they must get public keys for people they wish to send secure email to; many who knew that they needed to get a key or to encrypt still had substantial difficulties in figuring out how to do so; some erroneously sent secrets in plaintext, thinking that they had encrypted; and many expressed frustration and unhappiness with the experience of trying to use PGP 5.0, to the point where it is unlikely that they would have continued to use it in the real world.

All this failure is despite the fact that PGP 5.0 is attractive, with basic operations neatly represented by buttons with labels and icons, and pull-down menus for the rest, and despite the fact that it is simple to use for those who already understand the basic models of public key cryptography and digital signature-based trust. Designing security that is usable enough to be effective for those who don't already understand it must thus require something more.

6.2 Usability evaluation for security

Since usable security requires user interface design priorities that are not the same as those of general consumer software, it likewise requires usability evaluation methods that are appropriate to testing whether those priorities have been sufficiently achieved. Standard usability evaluation methods, simplistically applied, may treat security functions as if they were primary rather than secondary goals for the user, leading to faulty conclusions. A body of public work on usability evaluation in a security context would be extremely valuable, and will almost certainly have to come from research sources, since software developers are not eager to make public the usability flaws they find in their own products.

In our own work, which has focused on personal computer users who have little initial understanding of security, we have assigned a high value to learnability, and thus have found cognitive walkthrough to be a natural evaluation technique. Other techniques may be more appropriate for corporate or military users, but are likely to need similar adaptation to the priorities appropriate for security. In designing appropriate user tests, it may be valuable to look to other fields in which there is an established liability for consumer safety; such fields are more likely to have a body of research on how best to establish whether product designs successfully promote safe modes of use.

6.3 Toward better design strategies

The detailed findings in our case study suggest several design strategies for more usable security, which we are pursuing in our ongoing work. To begin with, it is clear that there is a need to communicate an accurate conceptual model of the security to the user as quickly as possible. The smaller and simpler that conceptual model is, the more plausible it will be that we can succeed in doing so. We thus are investigating pragmatic ways of paring down security functionality to that which is truly necessary and appropriate to the needs of a given demographic, without sacrificing the integrity of the security offered to the user.

After a minimal yet valid conceptual model of the security has been established, it must be communicated to the user, more quickly and effectively than has been necessary for conceptual models of other types of software. We are investigating several strategies for accomplishing this, including the possibility of carefully crafting interface metaphors to match security functionality at a more demanding level of accuracy.

In addition, we are looking to current research in educational software for ideas on how best to guide users through learning to manage their security. We do not believe that home users can be made to cooperate with extensive tutorials, but we are investigating gentler methods for providing users with the right guidance at the right time, including how best to make use of warning messages, wizards, and other interactive tools.

7 Related work

We have found very little published research to date on the problem of usability for security. Of what does exist, the most prominent example is the Adage project [12, 20], which is described as a system designed to handle authorization policies for distributed applications and groups. Usability was a major design goal in Adage, but it is intended for use by professional system administrators who already possess a high level of expertise, and as such it does not address the problems posed in making security effectively usable by a more general population. Work has also been done on the related issue of usability for safety-critical systems [10], like those which control aircraft or manufacturing plants, but we may hope that, unlike the users of personal computer security, users of those systems will be carefully selected and trained.

Ross Anderson discusses the effects of user non-compliance on security in [1], and Don Davis analyzes the unrealistic expectations that public-key based security systems often place on users in [3].

Beyond that, we know of only one paper on usability testing of a database authentication routine [8], and some brief discussion of the security and privacy issues inherent in computer supported collaborative work [16]. John Howard's thesis [6] provides interesting analyses of the security incidents reported to CERT between 1989 and 1995, but focuses more on the types of attacks than on the causes of the vulnerabilities that those attacks exploited, and represents only incidents experienced by entities sophisticated enough to report them to CERT.

(CERT is the Computer Emergency Response Team, formed by the Defense Advanced Research Projects Agency and located at Carnegie Mellon University.)

Acknowledgements

We thank Robert Kraut for helpful advice on the design of our user test. This publication was supported in part by Contract No. 102590-98-C-3513 from the United States Postal Service. The contents of this publication are solely the responsibility of the authors.

References

1. Ross Anderson. Why Cryptosystems Fail. In Communications of the ACM, 37(11), 1994.

2. Matt Bishop. UNIX Security: Threats and Solutions. Presentation to SHARE 86.0, March 1996. At http://seclab.cs.ucdavis.edu/~bishop/scriv/1996-share86.pdf.

3. Don Davis. Compliance Defects in Public-Key Cryptography. In Proceedings of the 6th USENIX Security Symposium, 1996.

4. The Economist. The End of Privacy. May 1, 1999, pages 21-23.

5. Simson Garfinkel. PGP: Pretty Good Privacy. O'Reilly and Associates, 1995.

6. John D. Howard. An Analysis of Security Incidents on the Internet 1989-1995. Carnegie Mellon University Ph.D. thesis, 1997.

7. John, B. E., & Mashyna, M. M. (1997). Evaluating a Multimedia Authoring Tool with Cognitive Walkthrough and Think-Aloud User Studies. In Journal of the American Society of Information Science, 48(9).

8. Clare-Marie Karat. Iterative Usability Testing of a Security Application. In Proceedings of the Human Factors Society 33rd Annual Meeting, 1989.

9. Stephen Kent. Security. In More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. National Academy Press, Washington, D.C., 1997.

10. Nancy G. Leveson. Safeware: System Safety and Computers. Addison-Wesley Publishing Company, 1995.

11. Jakob Nielsen. Heuristic Evaluation. In Usability Inspection Methods, John Wiley & Sons, Inc., 1994.

12. The Open Group Research Institute. Adage System Overview. Published on the web in July 1998 at http://www.osf.org/www/adage/relatedwork.htm

13. Pretty Good Privacy, Inc. PGP 5.0 Features and Benefits. Published in 1997 at http://pgp.com/products/PGP50-fab.cgi

14. Pretty Good Privacy, Inc. User's Guide for PGP for Personal Privacy, Version 5.0 for the Mac OS. Packaged with software, 1997.

15. Jeffrey Rubin. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. Wiley, 1994.

16. HongHai Shen and Prasun Dewan. Access Control for Collaborative Environments. In Proceedings of CSCW '92.

17. Cathleen Wharton, John Rieman, Clayton Lewis and Peter Polson. The Cognitive Walkthrough Method: A Practitioner's Guide. In Usability Inspection Methods, John Wiley & Sons, Inc., 1994.

18. Alma Whitten and J. D. Tygar. Usability of Security: A Case Study. Carnegie Mellon University School of Computer Science Technical Report CMU-CS-98-155, December 1998. ftp://reports-archive.adm.cs.cmu.edu/1998/CMU-CS-98-155.ps

19. Wogalter, M. S., & Young, S. L. (1994). Enhancing warning compliance through alternative product label designs. Applied Ergonomics, 25, 53-57.

20. Mary Ellen Zurko and Richard T. Simon. User-Centered Security. New Security Paradigms Workshop, 1996.