Post on 21-Dec-2015
Heuristic Evaluation
John Kelleher
2
What do you want for your product?
Good quality? Inexpensive? Quick to get to the market?
Good, cheap, quick: pick any two.
- Old engineer’s saying
3
Outline
Discount usability engineering
Heuristic evaluation
Heuristics
How to perform an HE
HE vs. user testing
How well does HE work
4
Discount Usability Engineering
Cheap
  no special labs or equipment needed
  the more careful you are, the better it gets
Fast
  on the order of 1 day to apply
  standard usability testing may take weeks
Easy to use
  can be taught in 2-4 hours
5
Expert Evaluation
Advantages
  Strongly diagnostic
  Overview of whole interface
  Few resources needed (except for experts)
  Cheap
  High potential return: detects significant problems
Disadvantages
  Relies on role playing, which can be restricting
  Subject to bias
  Problems locating experts
  Cannot capture real user behaviour
6
Heuristic Evaluation
Developed by Jakob Nielsen (www.useit.com)
Helps find usability problems in a UI design
Small set (3-5) of evaluators examine UI
  independently check for compliance with usability principles (“heuristics”)
  different evaluators will find different problems
Can perform on working UI or on sketches
7
Heuristic Evaluation (cont.)
Evaluator goes through UI several times
  inspects various dialogue elements
  compares with list of usability principles
  considers any additional principles or results that come to mind
Usability principles
  Nielsen’s “heuristics”
  supplementary list of category-specific heuristics
    competitive analysis & user testing of existing products
Use violations to redesign/fix problems
8
Heuristics (original)
H1-1: Simple and natural dialog
H1-2: Speak the users’ language
H1-3: Minimize users’ memory load
H1-4: Consistency
H1-5: Feedback
H1-6: Clearly marked exits
H1-7: Shortcuts
H1-8: Precise and constructive error messages
H1-9: Prevent errors
H1-10: Help and documentation
9
Phases of Heuristic Evaluation
1) Pre-evaluation training
  give evaluators needed domain knowledge and information on the scenario
2) Evaluation
  individuals evaluate and then aggregate results
3) Severity rating
  determine how severe each problem is (priority)
4) Debriefing
  discuss the outcome with design team
10
How to Perform Evaluation
Design may be verbal description, paper mock-up, working prototype, or running system. [When evaluating paper mock-ups, pay special attention to missing dialogue elements!]
Optionally provide evaluators with some domain-specific training.
Each evaluator works alone (~1-2 hours).
Interface examined in two passes: first pass focuses on general flow, second on individual dialogue elements.
Notes taken either by evaluator or evaluation manager.
Independent findings are aggregated.
Severity ratings are assigned first individually and are then aggregated.
Group debriefing session to suggest possible redesigns.
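The aggregation of independent findings can be sketched in a few lines of Python. This is only an illustrative sketch: the evaluator names, the problem identifiers, and the `aggregate_findings` helper are hypothetical and are not part of the original slides.

```python
from collections import defaultdict


def aggregate_findings(findings_by_evaluator):
    """Merge independent findings into one list of unique problems.

    findings_by_evaluator maps an evaluator name to the set of problem
    identifiers that evaluator reported while working alone.
    Returns a dict mapping each problem to the sorted list of evaluators
    who reported it.
    """
    found_by = defaultdict(list)
    for evaluator, problems in findings_by_evaluator.items():
        for problem in problems:
            found_by[problem].append(evaluator)
    return {problem: sorted(names) for problem, names in found_by.items()}


# Hypothetical data: three evaluators with overlapping but different findings.
findings = {
    "evaluator_a": {"no-cancel-button", "jargon-label"},
    "evaluator_b": {"jargon-label", "late-error-message"},
    "evaluator_c": {"late-error-message", "inconsistent-fonts"},
}

merged = aggregate_findings(findings)
for problem in sorted(merged):
    print(f"{problem}: reported by {len(merged[problem])} evaluator(s)")
```

No single evaluator in this made-up data reports more than two problems, yet the merged list contains four, which is the point of having each evaluator work alone before aggregating.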
11
Severity Rating
Used to allocate resources to fix problems
Estimates of need for more usability efforts
Combination of
  frequency
  impact
  number of affected users
Should be calculated after all evaluations are in
Should be done independently by all judges
12
Severity Ratings (cont.)
0 - don’t agree that this is a usability problem
1 - cosmetic problem
2 - minor usability problem
3 - major usability problem; important to fix
4 - usability catastrophe; imperative to fix
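A minimal sketch of how independent severity ratings on this 0-4 scale might be combined into a priority list. The slides do not say how the ratings are aggregated, so taking the mean across judges is an assumption here, and the judge names, problem identifiers, and `aggregate_severity` helper are hypothetical.

```python
from statistics import mean

# The 0-4 scale from the slide.
SEVERITY_LABELS = {
    0: "not a usability problem",
    1: "cosmetic problem",
    2: "minor usability problem",
    3: "major usability problem",
    4: "usability catastrophe",
}


def aggregate_severity(ratings_by_judge):
    """Return the mean severity per problem (mean is an assumed choice).

    ratings_by_judge maps a judge name to a dict of problem -> rating (0-4),
    where each judge rated the problems independently.
    """
    collected = {}
    for ratings in ratings_by_judge.values():
        for problem, rating in ratings.items():
            if rating not in SEVERITY_LABELS:
                raise ValueError(f"rating must be 0-4, got {rating!r}")
            collected.setdefault(problem, []).append(rating)
    return {problem: mean(values) for problem, values in collected.items()}


# Hypothetical ratings from three judges.
ratings = {
    "judge_1": {"late-error-message": 4, "jargon-label": 2, "inconsistent-fonts": 1},
    "judge_2": {"late-error-message": 3, "jargon-label": 2, "inconsistent-fonts": 0},
    "judge_3": {"late-error-message": 4, "jargon-label": 3, "inconsistent-fonts": 1},
}

mean_severity = aggregate_severity(ratings)
ranked = sorted(mean_severity.items(), key=lambda item: item[1], reverse=True)
for problem, severity in ranked:
    print(f"{severity:.2f}  {problem}")
```

Sorting by the combined rating gives the order in which problems would be fixed first; a median or a maximum could be used instead of the mean if the team prefers.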
13
How Many Problems Found?
Evaluation Name   Number of Evaluators   Total Known Problems   Average Problems Found per Evaluator (% of known)
Teledata          37                     52                     51%
Mantel            77                     30                     38%
Savings           34                     48                     26%
Transport         34                     34                     20%
Four heuristic evaluations were conducted by “usability novices” (Nielsen93, UE)
14
Aggregated Evaluations
Individual evaluators found relatively few problems.
Aggregating the evaluations of several individuals produced much better results:

Aggregate size   1     2     3     5     10
Teledata         51%   71%   81%   90%   97%
Mantel           38%   52%   60%   70%   83%
Savings          26%   41%   50%   63%   78%
Transport        20%   33%   42%   55%   71%
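One way to read these figures is to compare them with a deliberately simple model, sketched below. The model is an assumption added for illustration, not something stated in the slides: if every evaluator found each problem independently with the same probability p (the single-evaluator rate), n evaluators together would find a proportion 1 - (1 - p)^n. The `predicted_coverage` helper is hypothetical; the observed values are the ones reported for the four evaluations.

```python
def predicted_coverage(single_rate, n_evaluators):
    """Proportion of problems found by n evaluators under the assumed
    model: independent evaluators, every problem equally easy to find."""
    return 1.0 - (1.0 - single_rate) ** n_evaluators


# Observed proportions from the slide, keyed by aggregate size.
observed = {
    "Teledata": {1: 0.51, 2: 0.71, 3: 0.81, 5: 0.90, 10: 0.97},
    "Mantel": {1: 0.38, 2: 0.52, 3: 0.60, 5: 0.70, 10: 0.83},
    "Savings": {1: 0.26, 2: 0.41, 3: 0.50, 5: 0.63, 10: 0.78},
    "Transport": {1: 0.20, 2: 0.33, 3: 0.42, 5: 0.55, 10: 0.71},
}

for name, by_size in observed.items():
    single = by_size[1]
    for size, actual in by_size.items():
        predicted = predicted_coverage(single, size)
        print(f"{name:10s} n={size:2d}  observed={actual:.0%}  model={predicted:.0%}")
```

For every aggregate size above one, the simple model predicts more coverage than was observed. That is what one would expect if some problems are much harder to find than others, so that additional evaluators tend to rediscover the easy problems.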
15
Aggregated Evaluations
[Figure: average proportion of usability problems found by aggregates of size 1 to 30]
16
Debriefing
Conduct with evaluators, observers, and development team members
Discuss general characteristics of UI
Suggest potential improvements to address major usability problems
Make it a brainstorming session
  little criticism until end of session
17
Examples
Can’t copy info from one window to another
  violates “Minimize the users’ memory load” (H1-3)
  fix: allow copying
Typography uses mix of upper/lower case formats and fonts
  violates “Consistency and standards” (H2-4)
  slows users down
  probably wouldn’t be found by user testing
  fix: pick a single format for entire interface
18
HE vs. User Testing
HE is much faster
  1-2 hours each evaluator vs. days-weeks
HE doesn’t require interpreting user’s actions
User testing is far more accurate (by def.)
  takes into account actual users and tasks
  HE may miss problems & find “false positives”
Good to alternate between HE and user testing
  find different problems
  don’t waste participants
19
Results of Using HE
Discount: benefit-cost ratio of 48 [Nielsen94]
  cost was $10,500 for benefit of $500,000
  value of each problem ~$15K (Nielsen & Landauer)
  how might we calculate this value?
    in-house productivity; open market sales
Correlation between severity & finding w/ HE
Single evaluator achieves poor results
  only finds 35% of usability problems
  5 evaluators find ~75% of usability problems
  why not more evaluators? 10? 20?
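The benefit-cost ratio quoted above follows directly from the two dollar figures. A short sketch of the arithmetic, where `benefit_cost_ratio` is a hypothetical helper name:

```python
def benefit_cost_ratio(benefit, cost):
    """Ratio of estimated benefit to evaluation cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost


# Figures quoted on the slide: $500,000 benefit for a $10,500 cost.
ratio = benefit_cost_ratio(500_000, 10_500)
print(f"benefit-cost ratio: {ratio:.1f}")  # about 47.6, which rounds to 48
```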
20
Number of Evaluators vs. Problems Found
21
Cost vs. Benefit
22
Experience of Evaluators
Experience of evaluators influences results.
Study of one interface, the Banking System, a touch tone “voice response” telephone banking system, by 3 groups of evaluators:
  31 “novice” evaluators: computer science students with no formal knowledge of UI or usability (no usability expertise).
  19 “regular” specialists: people with UI and usability experience, but no expertise in voice-response systems (usability expertise).
  14 “double” specialists: people with expertise both in usability and in telephone-operated interfaces (usability and domain expertise).
Task: transfer $1000 from savings account to checking account.
23
Sample Banking System Dialogue
[First there is a short dialogue in which the user is identified by entering an identification number and access code – this is not part of the evaluation exercise]
1) S: Enter one for account information, three for transfers between your own accounts, . . .
2) U: 3# {the user interrupts the system}
3) S: Enter account to transfer from.
4) U: 1234567890# {savings account number}
5) S: Enter account to transfer to.
6) U: # {an abbreviation for the checking account}
7) S: Enter amount in cents.
8) U: 100000#
9) S: From account number twelve thirtyfour fiftysix seventyeight ninety to account number primary account, a transfer of one thousand dollars is to be made. Press one to confirm, zero to cancel.
10) U: 1#
11) S: You do not have access to this function.
24
Major Usability Problems
No.  Problem                                                   Novice  Regular  Double
1    Error message appears too late.                           68%     84%      100%
2    Do not require dollar amount to be entered in cents.      68%     74%      79%
3    The error message is not precise.                         55%     63%      64%
4    The error message is not constructive.                    6%      11%      21%
5    Replace term "primary account" with "checking account".   10%     47%      43%
6    Let users choose account from a menu.                     16%     32%      43%
7    Only require a # where it is necessary.                   3%      32%      71%
8    Give feedback as name of chosen account.                  6%      26%      64%
     Average for major problems                                29%     46%      61%

Proportion of novice, specialist, and double specialist usability evaluators finding problems in the Banking System. Results from Nielsen [1992].
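The "average for major problems" row can be reproduced from the eight per-problem figures. The sketch below does the arithmetic; the `column_averages` helper and the dictionary layout are illustrative choices, while the percentages are the ones reported for major problems 1 to 8.

```python
# Percent of (novice, regular, double) evaluators who found each major problem.
major_problems = {
    1: (68, 84, 100),
    2: (68, 74, 79),
    3: (55, 63, 64),
    4: (6, 11, 21),
    5: (10, 47, 43),
    6: (16, 32, 43),
    7: (3, 32, 71),
    8: (6, 26, 64),
}


def column_averages(rows):
    """Average each evaluator group's column, rounded to whole percent."""
    columns = zip(*rows.values())
    return tuple(round(sum(column) / len(column)) for column in columns)


novice, regular, double = column_averages(major_problems)
print(f"novice={novice}%  regular={regular}%  double={double}%")
```

The three averages rise with expertise, with double specialists finding roughly twice the share of major problems that novices do.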
25
Minor Usability Problems
No.  Problem                                                   Novice  Regular  Double
9    Read menu item description before action number.          3%      11%      71%
10   Avoid gap in menu numbers between 1 and 3.                42%     42%      79%
11   Provide earlier feedback.                                 42%     63%      71%
12   Replace use of 1/0 for accept/reject with #/*.            6%      21%      43%
13   Remove the field label "number" when no number is given.  10%     32%      36%
14   Change prompt "account" to "account number".              6%      37%      36%
15   Read numbers one digit at a time.                         6%      47%      79%
16   Use "press" consistently and avoid "enter".               0%      32%      57%
     Average for minor problems                                15%     36%      59%
26
Results
Average proportion of usability problems found by aggregates of novice evaluators, regular specialists, and double specialists. Results from Nielsen [1992].
27
Heuristic Evaluation Test
The following figure illustrates a checkout screen for an online store. We describe ten usability violations. Each violation is labelled with a number on the figure.
For each problem, suggest a solution.
28
Heuristic Evaluation Test
29
[Figure: checkout screen for an online store, with the ten violations labelled 1 to 10]
30
10 Heuristic Violations
1. H2-1 Visibility of System Status
  Problem: The UI only says that you are in stage 3; it does not tell the user how many more stages are left.
  Solution: Indicate that the user is in Stage 3 of 6, or provide a timeline along the top of the page that steps the user through their transaction as they progress.
2. H2-2 Match between system and the real world
  Problem: The term “Wagon” does not match the user’s conceptual model of shopping.
  Solution: Change the term “Wagon” to “Cart” or “Basket”.
31
10 Heuristic Violations (contd)
3. H2-8 Aesthetic and minimalist design
  Problem: The “news from the net” section has nothing to do with the user’s transaction. This information is distracting and can lead the user to leave the site to explore a news story without completing their transaction.
  Solution: Remove this section. This kind of information can be provided after the transaction is completed.
4. H2-9 Help users recognize, diagnose, and recover from errors
  Problem: The message tells the user that the form has errors, but it doesn’t tell them which fields have errors. The user could create more errors by changing fields that were originally correct.
  Solution: Mark the fields that need to be changed. Move the error message to the top of the page and highlight the fields that the user needs to fix.
32
10 Heuristic Violations (contd)
5. H2-4 Consistency and standards
  Problem: The ‘Modify’ and ‘Change’ buttons seem to have the same functionality, so they should be labeled the same. If they do have distinct functions, they should be labeled more clearly and moved so that they do not mislead the user.
  Solution: Change the labels on the buttons to ‘Change Item’.
6. H2-3 User control and freedom
  Problem: The user is given only one choice, which is to proceed to the next page. There is no option to cancel or go back.
  Solution: A cancel button and a back button should be implemented, giving the user more control over the process.
33
10 Heuristic Violations (contd)
7. H2-2 Match between system and the real world
  Problem: ‘Transmit’ is not a common term; it is a technical term for sending a form to be processed.
  Solution: Change the term to something clearer, like ‘Submit’.
8. H2-6 Recognition rather than recall
  Problem: To insert an item, the user has to recall the item number. This is too much for the user to remember, especially if there is no correlation between the code and the item.
  Solution: Provide a link for the user to continue shopping. This will allow the user to go back to the initial page and search and browse for items they might want to add to their cart.
34
10 Heuristic Violations (contd)
9. H2-4 Consistency and standards
  Problem: The text is in blue and underlined, signaling to the user that the text is a hyperlink, which it probably isn’t.
  Solution: Change the color and the underlining. Ideally this section should not even be on this page.
10. H2-5 Error prevention
  Problem: The fields for phone numbers are not fixed in length, so users may enter invalid data here.
  Solution: To prevent users from accidentally entering incorrect data, set widths for the text fields so that a format is provided, or provide an example of how the entry should be made.
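The fixes suggested for violations 4 and 10 can be illustrated with a small validation routine that reports errors per field and shows an example of the expected format. This is a minimal sketch under stated assumptions: the field names, the `NNN-NNN-NNNN` phone format, the example number, and the `validate_checkout` helper are all hypothetical and are not taken from the checkout screen in the exercise.

```python
import re

# Assumed phone format for illustration only: three groups of digits.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")
PHONE_EXAMPLE = "555-123-4567"


def validate_checkout(form):
    """Return a dict mapping each invalid field to a constructive message.

    An empty dict means the form is valid. Reporting errors per field lets
    the page highlight exactly the fields that need fixing.
    """
    errors = {}
    if not form.get("name", "").strip():
        errors["name"] = "Enter the name for the order."
    phone = form.get("phone", "").strip()
    if not PHONE_PATTERN.fullmatch(phone):
        # Show the expected format instead of a generic "invalid input".
        errors["phone"] = f"Enter the phone number as {PHONE_EXAMPLE}."
    return errors


good = validate_checkout({"name": "Ada", "phone": "555-123-4567"})
bad = validate_checkout({"name": "", "phone": "5551234"})
print(good)
print(bad)
```

Because the result is keyed by field name, correct fields are never flagged, which avoids the risk of users changing entries that were already right.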
35
Summary
Heuristic evaluation is a discount method
Single evaluator finds only small subset of potential problems.
Have evaluators go through the UI twice.
Ask them to see if it complies with heuristics
  note where it doesn’t and say why
Combine the findings from 3 to 5 evaluators
Have evaluators independently rate severity
Discuss problems with design team
Alternate with user testing
May miss domain-specific problems