Evaluation of a Remote Webcam-Based Eye Tracker
Henrik Skovsgaard, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S
Javier San Agustin, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S
Sune Alstrup Johansen, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S
John Paulin Hansen, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S
Martin Tall, Duke University, 2424 Erwin Rd. (Hock Plaza), Durham, NC 27705
ABSTRACT
In this paper we assess the performance of an open-source gaze tracker in a remote (i.e., table-mounted) setup and compare it with two commercial eye trackers. An experiment with 5 subjects showed the open-source eye tracker to have a significantly higher level of accuracy than one of the commercial systems, a Mirametrix S1, but also a higher error rate than the other commercial system, a Tobii T60. We conclude that the web-camera solution may be viable for people who need a substitute for mouse input but cannot afford a commercial system.
Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User interfaces—Evaluation/methodology
General Terms
Human factors, Experimentation, Performance, Measurement
Keywords
Gaze interaction, low-cost gaze tracking, performance evaluation, universal access
1. INTRODUCTION
Gaze tracking systems enable people with severe motor disabilities to communicate using only their eye movements. However, some of them cannot afford a commercial system, which costs between $5,000 and $30,000. While the quality of these systems has improved dramatically over the years, their price has remained more or less constant. Systems that employ low-cost, off-the-shelf hardware components are becoming increasingly popular as camera technology improves.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NGCA '11, May 26–27, 2011, Karlskrona, Sweden.
Copyright 2011 ACM 978-1-4503-0680-5/11/05 ...$10.00.
The use of off-the-shelf hardware components in gaze tracking represents a growing research field [2].
In 2004, Babcock and Pelz [1] presented a head-mounted eye tracker that uses two small cameras attached to a pair of safety glasses. Li et al. [4] extended their work and built a similar system, called openEyes, that worked in real time. Being head-mounted, both systems are affected by head movements and are thus not suitable for use in combination with a desktop computer. Although the components used in the systems described above are inexpensive, assembling the hardware requires advanced knowledge of electronics. Zielinski's Opengazer system [10], based on a remote webcam, takes a simpler hardware approach. Its gaze estimation method is not tolerant to head movements, however, and therefore the user needs to keep the head still after calibration.
Sewell and Komogortsev [7] developed a neural-network-based eye tracker able to run on a personal computer's built-in webcam under normal lighting conditions (i.e., no infrared light). The aim of their study was to employ eye tracking without any modifications to the hardware. The five participants in the study complained that the marker felt jumpy even during fixations and that it was unstable during use.
The ITU Gaze Tracker1 is open-source gaze tracking software that can be used with low-cost, off-the-shelf hardware, such as webcams and video cameras. The software tracks the pupil and one or two corneal reflections produced by infrared light sources. The first version of the system was introduced and evaluated by San Agustin et al. in [6]. The results obtained indicated that a low-cost system built with a webcam could have the same performance as expensive commercial systems. However, the system required placing the webcam very close to the user's eye, which was not comfortable. Furthermore, the camera blocked part of the user's view. Being a head-mounted system, it also required the user to sit completely still, as head movements affected the cursor position.
The second version of the system enables remote eye tracking by using a camera with a narrow field of view. The same webcam used in [6] can be employed by replacing the standard wide-angle lens with an inexpensive 16 mm zoom lens. Figure 1 shows the hardware configuration for such a remote system, with a webcam and two light sources.
1 http://www.gazegroup.org
Figure 1: Hardware configuration for the webcam-based gaze tracker.
The aim of this study is to investigate whether the performance of the remote, webcam-based ITU Gaze Tracker (costing around $100) can match the performance of two commercial gaze-tracking systems, a Tobii T60 ($25,000) and a Mirametrix S1 ($6,000), in an interaction task.
2. PERFORMANCE METRICS
2.1 Accuracy and Precision
The performance of a sensor is typically measured in accuracy and precision, where accuracy refers to the degree to which the sensor readings represent the true value of what is measured, while precision (also known as spatial resolution) refers to the extent to which successive readings of the same physical phenomenon agree in value [8].
The working copy of the COGAIN report Eye tracker accuracy terms and definitions [5] provides a set of definitions and terminology for measuring the accuracy and precision of an eye tracking system. Here, accuracy A_deg is defined as the average angular distance θ_i (measured in degrees of visual angle) between n fixation locations and the corresponding fixation targets (see Equation 1).
\[
A_{deg} = \frac{1}{n} \sum_{i=1}^{n} \theta_i \qquad (1)
\]
Spatial precision is calculated as the root mean square, RMS, of the angular distances θ_i (measured in degrees of visual angle) between successive samples (x_i, y_i) and (x_{i+1}, y_{i+1}) (Equation 2).
\[
RMS_{deg} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \theta_i^2} \qquad (2)
\]
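Equations 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration with our own function names; the sample values below are hypothetical angular distances, assumed to be already converted to degrees of visual angle:

```python
import math

def accuracy_deg(angular_errors):
    """Accuracy (Eq. 1): mean angular distance, in degrees of visual
    angle, between fixation locations and their targets."""
    return sum(angular_errors) / len(angular_errors)

def precision_rms_deg(angular_steps):
    """Precision (Eq. 2): RMS of the angular distances between
    successive gaze samples, in degrees of visual angle."""
    return math.sqrt(sum(t * t for t in angular_steps) / len(angular_steps))

# Hypothetical per-fixation errors and sample-to-sample distances:
errs = [0.8, 1.1, 0.9, 0.7]
steps = [0.1, 0.3, 0.2]
print(round(accuracy_deg(errs), 3))        # → 0.875
print(round(precision_rms_deg(steps), 3))  # → 0.216
```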
The working copy of the COGAIN report does not state how angular distances θ should be calculated. Distances on computers are typically measured in pixels, and for this experiment we used a function to map distances in pixels, Δpx, to degrees of visual angle, Δ°. Besides the distance in pixels, the physical size of a pixel S and the distance from user to screen D need to be known (Equation 3).
\[
\Delta^{\circ} = \frac{360}{\pi} \cdot \tan^{-1}\left(\frac{\Delta_{px} \cdot S}{2 \cdot D}\right) \qquad (3)
\]
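Equation 3's pixel-to-degree mapping can be sketched as follows. The function name is ours, and the example values (a pixel pitch of roughly 0.264 mm for a 17" 1280×1024 panel, viewed from 60 cm) are assumed for illustration:

```python
import math

def px_to_deg(delta_px, pixel_size_m, viewing_dist_m):
    """Eq. 3: convert an on-screen distance in pixels to degrees of
    visual angle, given pixel size S and viewing distance D (metres)."""
    return (360.0 / math.pi) * math.atan(
        delta_px * pixel_size_m / (2.0 * viewing_dist_m))

# Example: 100 px at ~0.264 mm/px, viewed from 60 cm.
print(round(px_to_deg(100, 0.000264, 0.60), 2))  # → 2.52
```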
2.2 Target Acquisition
In order to evaluate the performance of the different input devices, we followed the methodology described by the ISO 9241-9 standard for non-keyboard input devices [3]. The performance is quantified by the throughput and error rate of each device.
The calculation of throughput is based on the effective target width We and the effective distance De, which are used to calculate the effective index of difficulty IDe following Equation 4. Throughput is measured in bits per second (bps) and is calculated as the ratio between the effective index of difficulty IDe and the movement time MT (Equation 5) [9].
\[
ID_e = \log_2\left(\frac{D_e}{W_e} + 1\right), \qquad W_e = 4.133 \cdot SD_x \qquad (4)
\]
\[
Throughput = \frac{ID_e}{MT} \qquad (5)
\]
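The ISO 9241-9 quantities in Equations 4 and 5 can be computed as follows. This is a sketch with our own function names; the trial data are hypothetical, with SD_x taken over selection endpoint coordinates along the movement axis:

```python
import math
import statistics

def effective_width(selection_x):
    """Eq. 4 (right): We = 4.133 * SD of selection endpoint coordinates."""
    return 4.133 * statistics.stdev(selection_x)

def throughput_bps(effective_dist, eff_width, movement_time_s):
    """Eqs. 4-5: throughput = IDe / MT, with IDe = log2(De/We + 1)."""
    ide = math.log2(effective_dist / eff_width + 1.0)
    return ide / movement_time_s

# Hypothetical trial: endpoints scattered around a target 450 px away,
# mean movement time 1.2 s.
xs = [430, 455, 470, 445]
print(round(throughput_bps(450, effective_width(xs), 1.2), 2))  # → 2.42
```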
3. PERFORMANCE EVALUATION
3.1 Participants
A total of five participants, three male and two female, with ages ranging from 29 to 39 years (M = 34 years, SD = 4.3), volunteered to participate in the study. Three of the participants had no previous experience with gaze interaction. One of them used contact lenses.
3.2 Apparatus
The computer used was a desktop with a 2.6 GHz Intel Dual Core processor and 3 GB RAM, running Windows XP SP3. We used the 17" monitor with a resolution of 1280×1024 that comes with the Tobii T60 system. Three gaze trackers and a Logitech optical mouse (for baseline comparison) were tested as input devices. Two of the three gaze trackers were the commercial systems Tobii T60 and Mirametrix S1. The third system was the ITU Gaze Tracker using a Sandberg Nightcam 2 webcam running at 30 fps with a 16 mm lens and two Sony HVL-IRM infrared light sources; the total cost was around $100. All three gaze trackers used a 9-point calibration procedure. Figure 2 shows the experimental setup.
3.3 Design and Procedure
After calibrating the system, participants completed an accuracy test followed by a 2D target-selection task. Participants sat approximately 60 cm away from the monitor and were asked to sit as still as possible. The experiment employed a within-subjects factorial design. The target-selection task had the following independent variables and levels:
• Device (4): Mouse, Tobii T60, Mirametrix, Webcam
• Amplitude (2): 450, 900 pixels
• Target Width (2): 75, 100 pixels
The dependent variables in the study were accuracy (degrees), precision (degrees), throughput (bps) and error rate (%). Each participant completed 4 blocks of 1 trial (i.e., 4 trials) for the accuracy and precision test, and 16 blocks of 15 trials (i.e., 240 trials) for the target-selection task, where device, amplitude, and target width were fixed within blocks. The orders of input device and task were counterbalanced across users to neutralize learning effects. Participants were encouraged to take a comfortable position in front of the computer and remain as still as possible during the test. The total test session lasted approximately 15 minutes.

Figure 2: Experimental setup. The participant is conducting the test using the Mirametrix system.
Immediately after a successful calibration, participants were instructed to gaze at a randomly appearing target in a 4×4 matrix (evenly distributed, with 100 pixels to the borders of the monitor). A new target would appear when a total of 50 samples had been recorded at 30 Hz. Premature samples were avoided with a smooth animated transition between targets plus a reaction delay of 600 ms. Furthermore, samples farther than M ± 3 × SD away were considered outliers. To prevent distractions from cursor movements, we hid the cursor throughout the blocks, except, of course, in the mouse condition.
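The M ± 3 × SD outlier criterion described above can be sketched as follows. This is a minimal illustration with our own function name; the paper does not specify whether the distance is measured per axis or radially:

```python
import statistics

def remove_outliers(samples):
    """Keep only samples within mean +/- 3*SD of the mean, as in the
    accuracy test's outlier criterion (sketch, not the paper's code)."""
    m = statistics.mean(samples)
    sd = statistics.stdev(samples)
    return [s for s in samples if abs(s - m) <= 3 * sd]

# One extreme sample among 20 well-behaved ones gets dropped:
print(len(remove_outliers([1.0] * 20 + [50.0])))  # → 20
```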
Once the accuracy test was completed, the target-selection task started. Participants were presented with 15 circular targets arranged in a circle in the center of the screen. Targets were highlighted one by one, and participants were instructed to select the highlighted target as quickly and as accurately as possible. Selections were performed with the spacebar for the gaze trackers and a left-button click for the mouse condition. Activations outside the target area were regarded as misses and counted toward the error rate. Every selection ended the current trial and started the next one. Based on the amplitudes and target widths, the nominal indexes of difficulty were between 2.5 and 3.7 bits.
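As a quick check of the stated range, the nominal index of difficulty ID = log2(A/W + 1) for each amplitude × target-width combination in the design can be computed directly:

```python
import math
from itertools import product

# Nominal index of difficulty for every amplitude x width combination.
amplitudes = [450, 900]  # pixels
widths = [75, 100]       # pixels
ids = sorted(math.log2(a / w + 1) for a, w in product(amplitudes, widths))
print([round(i, 2) for i in ids])  # → [2.46, 2.81, 3.32, 3.7]
```

The minimum (2.46 bits) and maximum (3.70 bits) match the 2.5–3.7 bit range reported above.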
4. RESULTS
4.1 Accuracy and Precision
Analysis of the accuracy and precision data was performed using a one-way ANOVA, with device as the independent variable and accuracy and precision as the dependent variables. Of the 16,000 samples, 228 outliers were removed from the analysis. An LSD post-hoc test was applied after the analysis. Figure 3 shows a plot of the average accuracy and precision per device.
Mean accuracy for the mouse, Tobii, Mirametrix and webcam was 0.14°, 0.67°, 1.34° and 0.88°, respectively (left-side bars in Figure 3). The main effect of device on accuracy was statistically significant, F(3, 12) = 16.03, p < 0.001. The post-hoc test showed a significant difference between the mouse and all of the gaze trackers. Tobii performed significantly better than Mirametrix, t(4) = 3.65, p < 0.05. The webcam also performed significantly better than Mirametrix, t(4) = 4.42, p < 0.05. There was no significant difference between the webcam and Tobii, t(4) = 1.57, p > 0.05.

Figure 3: Accuracy and precision by device. Error bars show ± SD.
Mean precision for the mouse, Tobii, Mirametrix and webcam was 0.05°, 0.08°, 0.43° and 0.31°, respectively (right-side bars in Figure 3). Mauchly's test indicated that the assumption of sphericity had been violated, χ²(5) = 16.60, p < 0.01; therefore, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.47). The results show that there was no significant effect of device on precision, F(1.42, 5.67) = 4.38, p = 0.08.
4.2 Throughput and Error Rate
Analysis of the target-selection task was performed using a 4×2×2 ANOVA, with device, amplitude and target width as the independent variables and throughput and error rate as the dependent variables. An LSD post-hoc test was applied after the analysis. All data were included.
Mean throughput for the mouse, Tobii, Mirametrix and webcam was 4.00, 2.63, 2.00 and 2.31 bps, respectively (left-side bars in Figure 4). The main effect of device on throughput was statistically significant, F(3, 12) = 9.61, p < 0.01. The post-hoc test showed a significant difference between the mouse and all other devices. There was a main effect of amplitude, F(3, 12) = 10.73, p < 0.05, with short amplitudes (M = 2.83 bps) yielding a significantly higher throughput than long amplitudes (M = 2.62 bps), t(4) = 3.30, p < 0.05. No significant effect of target width was found, F(3, 12) = 2.00, p = 0.23.
Mean error rate for the mouse, Tobii, Mirametrix and webcam was 5.34%, 19.21%, 39.29% and 27.50%, respectively (right-side bars in Figure 4). The main effect of device on error rate was statistically significant, F(3, 12) = 9.71, p < 0.01. The post-hoc test showed a significant difference between the mouse and all other devices. Tobii had a significantly lower error rate than the webcam, t(4) = 4.96, p < 0.05. We found no effect of amplitude, F(3, 12) = 0.37, p = 0.58, nor of target width, F(3, 12) = 0.37, p = 0.58.
!"#"$!"$#"%!"%#"&!"&#"'!"'#"#!"
!"!(#"$"
$(#"%"
%(#"&"
&(#"'"
'(#"#"
)*+,-" .*/00" )0123-4105" 6-/723"
811*1"9
24-":;
<"
.=1*+>=?
+4":/
?,<"
.=1*+>=?+4"
811*1"924-"
Figure 4: Overall throughput and error rate by de-vice. Error bars show ± SD.
5. DISCUSSION
Our results suggest that the accuracy of the webcam-based gaze tracker (0.88°) is significantly better than the accuracy of the Mirametrix system (1.34°), while showing no significant difference to the Tobii T60 (0.67°). This indicates that the ITU Gaze Tracker can be used in software applications meant to be controlled by gaze input.
Although we did not find any significant effect of device in the precision study, the data indicate that the mouse and the Tobii system had a higher precision than the Mirametrix S1 and the webcam-based system. It must be noted that precision is calculated after the low-pass filtering that the eye trackers perform on the data samples during fixations. This filtering is done to smooth the signal and prevent a jittery cursor from annoying the user. The ITU Gaze Tracker gives users control over the level of smoothing during fixations, a feature that many commercial systems do not provide.
The results obtained in the target-selection task indicate that the webcam-based eye tracker has a performance similar to the two commercial systems in terms of throughput. The error rate of the webcam tracker was, however, significantly higher than that of the Tobii T60. Throughput values were slightly lower than in previous studies [6, 9]. This may be due to the lower control over the hardware setup in our experiment, as well as the lack of experience of novice users, who tended to be rather slow.
6. CONCLUSION
Our performance evaluation shows that a remote, webcam-based eye tracker can have a performance comparable to expensive systems. However, there are other factors crucial to the practical usefulness of an eye tracking device that have not been evaluated in this study, such as the quality of the documentation, the API, tolerance to head movements, ease of use and stability over time.
In our future work, we aim to further investigate these issues and implement new algorithms to improve performance. Specifically, we would like to explore how continuous recalibration and repositioning of the participants can improve performance over time. We would also like to test various hardware setups for the ITU Gaze Tracker (e.g., better cameras) and different algorithms for calculating the point of regard. A usability and user-experience study should also be conducted to include subjective measures of the different systems.
Finally, it is our hope that researchers, students and hobbyists will collaborate in the development of the software and contribute to making the open-source ITU Gaze Tracker a more reliable system.
7. ACKNOWLEDGEMENTS
We would like to thank EYEFACT for supporting the experiment, and the open-source community for their help with improving the ITU Gaze Tracker.
8. REFERENCES
[1] J. S. Babcock and J. B. Pelz. Building a lightweight eyetracking headgear. In Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, pages 109–114, San Antonio, Texas, 2004. ACM.
[2] J. P. Hansen, D. Hansen, and A. Johansen. Bringing gaze-based interaction back to basics. In Universal Access in HCI (UAHCI): Towards an Information Society for All, volume 3, pages 325–329, New Orleans, USA, 2001. Lawrence Erlbaum.
[3] ISO. Ergonomic requirements for office work with visual display terminals (VDTs) – Part 9: Requirements for non-keyboard input devices. International Organization for Standardization, 2000.
[4] D. Li, J. Babcock, and D. J. Parkhurst. openEyes. In Proceedings of the 2006 Symposium on Eye Tracking Research & Applications, pages 95–100, San Diego, California, 2006. ACM.
[5] F. Mulvey. Eye tracker accuracy terms and definitions – working copy. Technical report, COGAIN, 2010.
[6] J. San Agustin, H. Skovsgaard, J. P. Hansen, and D. W. Hansen. Low-cost gaze interaction: ready to deliver the promises. In Proceedings of CHI '09, pages 4453–4458, Boston, MA, USA, 2009. ACM.
[7] W. Sewell and O. Komogortsev. Real-time eye gaze tracking with an unmodified commodity webcam employing a neural network. In Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems, pages 3739–3744, New York, USA, 2010. ACM.
[8] A. D. Wilson. Sensor- and recognition-based input for interaction. In The Human-Computer Interaction Handbook, pages 177–199. Lawrence Erlbaum Associates, 2007.
[9] X. Zhang and I. S. MacKenzie. Evaluating eye tracking with ISO 9241 – Part 9. In Proceedings of the 12th International Conference on HCI: Intelligent Multimodal Interaction Environments, pages 779–788, Beijing, China, 2007. Springer.
[10] P. Zielinski. Opengazer: open-source gaze tracker for ordinary webcams. http://www.inference.phy.cam.ac.uk/opengazer/, 2010.