Do You Really Need to Test with Only 5 Users?
TRANSCRIPT
ROBERT A. VIRZI
PHOTO CREDIT: https://www.researchgate.net/profile/Robert_Virzi
Technical Lead at GTE Laboratories Inc (Verizon)
Beginning 1981-1994
1981: Alphonse Chapanis
“Observing about five to six users reveals most of the problems in a usability test”
1982: Dr. James R. (Jim) Lewis
“Suggested Binomial Distribution to model the sample size needed to find usability problems.”
1990-92: Robert Virzi
“Five users is enough to find the majority of usability problems.”
http://www.measuringu.com/blog/five-history.php
Clarifications (2006 onward)
2006: Carl Turner, Jim Lewis and Jakob Nielsen
Reviews the criticisms of the sample-size formulas and shows how the formulas can and should be legitimately used.
2006: Jim Lewis
Provides a detailed history of how we find sample sizes using "mostly math, not magic."
2010: Jeff Sauro (MeasuringU)
"Why You Only Need To Test With Five Users"
http://www.measuringu.com/blog/five-history.php
Initial Motivation
Jakob Nielsen (1990), “Evaluating the thinking-aloud technique for use by computer scientists.”
PHOTO CREDIT: Security and Usability by Simson Garfinkel, Lorrie Faith Cranor
Experiment 1: Voice Mail System Manual
Experiment method
Binomial Distribution Formula
$1 - (1 - p)^n$: probability that a given problem is detected at least once by n subjects
p: probability of detecting a given problem (per subject)
n: sample size
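The formula can be evaluated directly. A minimal Python sketch, assuming (as the binomial model does) that every subject has the same, independent probability p of hitting a given problem; `p_detect` is a hypothetical helper name, and p = 0.31 is only an illustrative value (it is the average detection rate often quoted from Nielsen and Landauer, 1993).

```python
def p_detect(p: float, n: int) -> float:
    """Probability that a problem with per-subject detection probability p
    is observed at least once among n independent test subjects."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # (1 - p) ** n is the chance that all n subjects miss the problem.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative value: p = 0.31, tested with 5 subjects.
    print(round(p_detect(0.31, 5), 2))  # 0.84
```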
OUTPUT
• Diminishing returns: later subjects are less likely than earlier ones to uncover new usability problems.
• The more severe a problem is, the more likely it is to be uncovered within the first few subjects.
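The diminishing-returns point follows from the same binomial model: subject k is the first to reveal a problem only if the k - 1 subjects before them all missed it. The sketch below assumes a single illustrative per-subject probability p = 0.31 and independent subjects, and prints how much each additional subject adds; `first_found_by` is a hypothetical helper name.

```python
def first_found_by(p: float, k: int) -> float:
    """Probability that subject number k (1-based) is the first to reveal
    a problem with per-subject detection probability p."""
    # The first k - 1 subjects all miss the problem, then subject k hits it.
    return p * (1.0 - p) ** (k - 1)


if __name__ == "__main__":
    p = 0.31  # illustrative per-subject detection probability
    cumulative = 0.0
    for k in range(1, 11):
        gain = first_found_by(p, k)
        cumulative += gain
        print(f"subject {k:2d}: +{gain:.3f}  cumulative {cumulative:.3f}")
```

Each increment is (1 - p) times the previous one, so the running total always equals 1 - (1 - p)^k and approaches 1 ever more slowly.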
What did we learn?
• About 80% of usability problems are found with 5 subjects
• The more severe a problem, the earlier it tends to be uncovered
Experiment 3: “Can experts make judgments of problem severity without access to frequency information?”
OUTPUT
As problem severity increases, the likelihood that the problem will be detected within the first few subjects also increases.
Overall Discussion
1. The first 5 users find 85% of problems in a usability test
2. Additional subjects are less and less likely to reveal new information
3. Severe problems are more likely to be detected by the first few users
4. Experts can judge problems’ severity without access to frequency information
Is it that simple?
• About 80% of usability problems are found with 5 subjects
• But the problems you can expect to find with 5 subjects are mainly those that affect about 32% or more of users
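The 32% figure appears to follow from solving the binomial formula for p with n = 5. A minimal sketch, assuming the target detection likelihood is the 85% quoted in the Overall Discussion (with 80% the threshold comes out closer to 28%); `min_detectable_p` is a hypothetical helper name.

```python
def min_detectable_p(n: int, target: float) -> float:
    """Smallest per-user problem probability p for which n subjects detect
    the problem with likelihood `target`; solves 1 - (1 - p) ** n = target."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    return 1.0 - (1.0 - target) ** (1.0 / n)


if __name__ == "__main__":
    # Assumption: 85% target likelihood with 5 subjects.
    print(f"{min_detectable_p(5, 0.85):.2f}")  # 0.32
    # With an 80% target the threshold is somewhat lower.
    print(f"{min_detectable_p(5, 0.80):.2f}")  # 0.28
```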
Calculating Sample Size
How many subjects are required to identify any problem experienced by 10% or more of the population at the 90% confidence level?
$1 - (1 - p)^n$
p: probability of detecting a given problem (per subject)
n: sample size
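Solving the formula for n turns it into a sample-size rule. A short derivation, assuming 0 < p < 1 and writing P for the desired likelihood of seeing the problem at least once (P is a symbol introduced here, not from the slides):

```latex
\begin{aligned}
1 - (1 - p)^n \ge P
  &\iff (1 - p)^n \le 1 - P \\
  &\iff n \ln(1 - p) \le \ln(1 - P) \\
  &\iff n \ge \frac{\ln(1 - P)}{\ln(1 - p)} \qquad \text{(dividing by } \ln(1 - p) < 0 \text{ flips the inequality)}
\end{aligned}
```

With p = 0.10 and P = 0.90 this gives n ≥ ln(0.10)/ln(0.90) ≈ 21.85, which rounds up to 22 subjects.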
Determine # of Subjects
22 users are needed to have a 90% likelihood of detecting a problem that is experienced by 10% of people.
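The 22-user figure can be checked in code. A minimal sketch that rounds up the closed form n ≥ ln(1 - P)/ln(1 - p) and then verifies the result against the binomial formula to guard against floating-point rounding; `sample_size` and `target` are hypothetical names, and the model assumes every subject has the same, independent chance p of encountering the problem.

```python
import math


def sample_size(p: float, target: float) -> int:
    """Smallest n with 1 - (1 - p) ** n >= target: the number of subjects
    needed to see a problem affecting a proportion p of users at least once
    with likelihood `target`."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    # Nudge n if floating-point error put it one off the true boundary.
    while 1.0 - (1.0 - p) ** n < target:
        n += 1
    while n > 1 and 1.0 - (1.0 - p) ** (n - 1) >= target:
        n -= 1
    return n


if __name__ == "__main__":
    print(sample_size(0.10, 0.90))  # 22
    print(sample_size(0.31, 0.80))  # 5
```

Because the required n grows roughly as 1/p for small p, halving the problem frequency you care about roughly doubles the number of subjects.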
Reading List
• A Brief History of the Magic Number 5 in Usability Testing. http://www.measuringu.com/blog/five-history.php
• Al-Awar, J., Chapanis, A., and Ford, R. (1981). Tutorials for the first-time computer user. IEEE Transactions on Professional Communication, 24, 30-37.
• Lewis, J. R. (1982). Testing Small System Customer Setup. In Proceedings of the Human Factors Society 26th Annual Meeting (pp. 718-720). Santa Monica, CA: HFES.
• Virzi, R. A. (1992). Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34, 457-471.
• Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability problems. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp.206-213). Amsterdam: ACM.
• Lewis, J. R. (1993). Problem discovery in usability studies: A model based on the binomial probability formula. In Proceedings of the Fifth International Conference on Human-Computer Interaction (pp. 666-671). Orlando, FL: Elsevier.
• Lewis, J. R. (1994). Sample sizes for usability studies: Additional considerations. Human Factors, 36, 368-378.
• Caulton, D. A. (2001). Relaxing the homogeneity assumption in usability testing. Behaviour & Information Technology, 20, 1-7.
• Spool, J., & Schroeder, W. (2001). Testing web sites: Five users is nowhere near enough. In CHI '01 Extended Abstracts on Human Factors in Computing Systems, March 31-April 05, Seattle, Washington.
• Perfetti, C., & Landesman, L. (2001). Eight is not enough. Retrieved July 15, 2010 from
• Turner, C. W., Lewis, J. R., & Nielsen, J. (2002). UPA Panel: How many users is enough? Determining usability test sample size.
• Wixon, D. (2003). Evaluating usability methods: Why the current literature fails the practitioner. interactions, 10(4), July + August.
• Lewis, J. R. (2001). Evaluation of procedures for adjusting problem-discovery rates estimated from small samples. International Journal of Human-Computer Interaction, 13, 445-479.
• Hertzum, M. & Jacobsen, N. J. (2003 – corrected version, original published in 2001). The evaluator effect: A chilling fact about usability evaluation methods. International Journal of Human-Computer Interaction, 15, 183-204.
• Woolrych, A., & Cockton, G. (2001). Why and when five test users aren't enough. In J. Vanderdonckt, A. Blandford, & A. Derycke (Eds.), Proceedings of IHM-HCI 2001 Conference, Vol. 2 (pp. 105-108). Toulouse, France: Cépadèus Éditions.
• Bevan, N., Barnum, C., Cockton, G., Nielsen, J., Spool, J., & Wixon, D. (2003). The "magic number 5": Is it enough for web testing? In CHI '03 Extended Abstracts on Human Factors in Computing Systems (Ft. Lauderdale, Florida, USA, April 05-10, 2003) (pp. 698-699). New York, NY: ACM.
• Turner, C. W., Lewis, J. R., & Nielsen, J. (2006). Determining usability test sample size. In W. Karwowski (ed.), International Encyclopedia of Ergonomics and Human Factors (pp. 3084-3088). Boca Raton, FL: CRC Press.
• Lewis, J. R. (2006). Sample sizes for usability tests: Mostly math, not magic. interactions, 13(6), 29-33.
• Lindgaard, G., & Chattratichart, J. (2007). Usability testing: what have we overlooked?. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose, California, USA, April 28 - May 03, 2007). CHI '07. ACM, New York, NY, 1415-1424.
• Schmettow, M. (2008), "Heterogeneity in the Usability Evaluation Process," in Proceedings of the 22nd British HCI Group Annual Conference on HCI 2008: People and Computers XXII: Culture, Creativity, Interaction - Volume 1, ACM, Liverpool, UK, pp. 89-98.