semantic models and corpora choice when using semantic fields to predict eye movement on web pages

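1071-5819/$ - see front matter Published by Elsevier Ltd.

doi:10.1016/j.ijhcs.2011.06.007

*Corresponding author. Fax: +1 614 688 3984.

E-mail addresses: [email protected] (B. Stone), simon.dennis@gmail.com (S. Dennis).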

Int. J. Human-Computer Studies 69 (2011) 720–740

www.elsevier.com/locate/ijhcs

Semantic models and corpora choice when using Semantic Fields to predict eye movement on web pages

Benjamin Stone a,*, Simon Dennis b

a School of Psychology, Level 4, Hughes Building, The University of Adelaide, Adelaide, SA 5005, Australia

b Department of Psychology, Ohio State University, 1827 Neil Ave, Columbus, OH 43210, USA

Received 15 January 2010; received in revised form 28 June 2011; accepted 30 June 2011

Communicated by B. Lok

Available online 5 July 2011

Abstract

Ten models are compared in their ability to predict eye-tracking data that was collected from 49 participants’ goal-oriented search tasks on a total of 1809 Web pages. Forming the basis of six of these models, three semantic models and two corpus types are compared as components for the Semantic Fields model (Stone and Dennis, 2007) that estimates the semantic salience of different areas displayed on Web pages. Latent Semantic Analysis, Sparse Nonnegative Matrix Factorization, and Vectorspace were used to generate similarity comparisons of goal and Web page text in the semantic component of the Semantic Fields model. Overall, Vectorspace was the best performing semantic model in this study. Two types of corpora or knowledge bases were used to inform the semantic models: the well-known TASA corpus and other corpora that were constructed from the Wikipedia encyclopedia. In all cases the Wikipedia corpora outperformed the TASA corpora. A non-corpus-based Semantic Fields model that incorporated word overlap performed more poorly at these tasks. Three baseline models were also included as a point of comparison to evaluate the effectiveness of the Semantic Fields models. In all cases the corpus-based Semantic Fields models outperformed the baseline models when predicting the participants’ eye-tracking data. Both final destination pages and pupil data (dilation) indicated that participants were actively performing goal-oriented search tasks.

Published by Elsevier Ltd.

Keywords: Semantic Fields model; Semantic salience; Web navigation; Semantic models; LSA; SpNMF; Vectorspace; Eye tracking; Pupil dilation

1. Introduction

The placement of information on a Web page is critical to the likelihood that it is viewed by a Web user. In print media, journalists talk about the importance of being “above the fold”. The same is true for Web pages, where information that needs to be scrolled to may not be seen (Nielsen, 2010). In this regard, visual salience models like the Semantic Fields model (Stone and Dennis, 2007) hold great promise for both Web users and developers. Given knowledge of a user’s information needs, Web pages can be adjusted to present this information more obviously and quickly to the Web user.


Research focusing on user behavior in Web page environments can generally be delineated into two main streams: display-based and semantics-based research. While both methods to some degree attempt to predict the area on a Web page that a user will focus their attention toward, they approach this task in different ways. Display-based research has focused on perceptual aspects of the Web page, and explores components such as element and menu position, color usage, font style, and animation of graphics (Grier, 2004; Ling and Van Schaik, 2002, 2004; Pan et al., 2004; Rigutti and Gerbino, 2004; McCarthy et al., 2003; Pearson and Van Schaik, 2003; Faraday, 2000, 2001). Alternatively, when attempting to predict users’ Web page navigation, semantic-based research matches Web users’ information needs to the concepts displayed within the textual content of Web pages (Blackmon et al., 2007; Fu and Pirolli, 2007; Wu and Miller, 2007; Blackmon et al., 2005; Kaur and Hornof, 2005; Kitajima et al., 2000, 2005; Cox and Young, 2004; Miller and Remington, 2004; Brumby and Howes, 2003; Chi et al., 2003; Pirolli and Fu, 2003; Blackmon et al., 2002). Overall, display-based and semantics-based research into Web users’ visual search of Web page hyperlinks has indicated that the user’s search processes are influenced by factors such as text semantics, element position, aesthetic qualities of elements, and environmental learning.1

Several researchers have highlighted the importance of combining display-based and semantic information when modeling users’ navigation through Web sites (Blackmon et al., 2002, 2005, 2007; Fu and Pirolli, 2007; Kaur and Hornof, 2005; Chi et al., 2003; Pirolli and Fu, 2003). Research that has combined display and semantic information when predicting Web users’ behavior includes the Cognitive Walkthrough for the Web (CWW, Blackmon et al., 2005), the Bloodhound Project (Chi et al., 2003) using the InfoScent™ Simulator, and the Latent Semantic Analysis-Semantic Fields model (LSASF, Stone and Dennis, 2007). In the research presented here, we extend our previous work by further validating and optimizing the Semantic Fields model of visual salience.

1.1. Display-based Web page search

1.1.1. Faraday’s hierarchical cognitive model

Faraday (2000) proposed a hierarchical cognitive model of user Web page search that combined both the characteristics of page elements and the area in which they are contained. In this model, the inherent characteristics of the page elements govern their visual salience. More specifically, Faraday (2000) suggested that elements that move will be attended to before large elements, images, color, text style, and position, respectively. Faraday also suggested that Gestalt principles govern the grouping of elements into separate areas of interest. For example, a collection of text hyperlinks could be grouped by either proximity or a similar background color. Upon initiating a goal-driven Web page search, the user is attracted to the element with the greatest visual salience (e.g., an animated Graphics Interchange Format image); she will then scan the area of interest that contains that element for target information. If the user is unsuccessful in satisfying her information need during this sequence, the element with the second greatest visual salience is located (e.g., a large heading) and its surrounding area of interest is scanned. This process is repeated until the user’s goal is satisfied or the search is aborted.

In a series of eye-tracking experiments, Grier (2004) found little support for Faraday’s model. While motion was found to attract initial user attention at levels greater than chance, there were no significant differences in participants’ initial viewing preferences when size, image, color and text style were manipulated. Grier criticized the lack of emphasis Faraday placed on position. Her research suggested that users predominantly focus initial attention on the middle of the screen, with the next most likely area of initial focus being the top-left part of the display (p. 52). In Faraday’s defense, after gaining more data from an eye-tracking study (Faraday, 2001), he revised some aspects of his model that were not examined in Grier’s research. Furthermore, Faraday indicated that the content of the Web page would have “a strong impact upon the search and scan phases” (Section 8, 2000). However, while Faraday acknowledged its relevance, neither Faraday’s nor Grier’s work examined the effect that similarity between the semantic structures (or elements) of a Web page and the user’s goals has on the Web page search process.

1 For Web site navigation and environmental learning see Pan et al. (2004).

1.1.2. Position

Rigutti and Gerbino (2004) present the WebStep model that predicted differences in user behavior when following a navigation bar option as opposed to links embedded in the content text. They suggest users will judge that embedded links lead to specific information, whereas navigation links lead to broad categories. In their model, perceived distance to the target information moderates use of both the navigation (menu) bar and embedded links. When the distance is deemed to be small, their model predicts embedded links, as opposed to navigation bar links, are more likely to be chosen by the user. However, as the perceived distance to the target information increases, the probability that a navigation bar link will be chosen by the user increases, and the probability that an embedded link will be chosen by the user decreases.

In the WebStep model, two other factors are predicted to influence the user’s decision to follow links. First, the utility of navigation bar links decreases as the navigation bar is placed lower in the visual display. Second, headings increase the probability that embedded links will be selected by the user. However, the researchers do not specify how close the headings have to be, either semantically or in proximity, to the embedded link during this process. Rigutti and Gerbino do not deny the importance of semantic content in user link selection. However, they emphasized the importance of pragmatic or display-based models, such as WebStep, when semantic content alone does not discriminate the user’s choice in link following behavior.

McCarthy et al. (2003) utilized eye-tracking to explore optimal positioning of navigation menus. When comparing left, right and top positioned navigation menus, these researchers were surprised that most user gaze time was focused at the middle of the screen (48% of glances). More interestingly, they found that participants adapted to the Web page environment. During second page views, users had increased (no inferential statistics were reported) the amount of gaze time on the region of the navigation menu regardless of its position (top, left or right of the page).


1.1.3. Learning

The notion of users adapting to, or learning, the structure of Web page environments is also supported by the research of Pan et al. (2004). In a study that compared users’ eye movements over four types of Web sites (shopping, search, news, and business), results indicated a main effect for page order and an interaction effect between page order and Web site type on the dependent variables mean fixation duration, gazing time and saccade (occurrence) rate. The dependent variable “mean fixation duration” was interpreted as a measure of task difficulty (Fitts et al., 1950), while gazing time and saccade rate were both interpreted as being inversely related to task difficulty (Nakayama et al., 2002).

Pan et al. had difficulties when interpreting their results, with apparent “contradictory conclusions” in the ocular measures that they used (p. 5). It should be noted that Nakayama et al.’s tasks are quite different from those used in Pan et al.’s study. In Nakayama et al.’s study, saccade occurrence rate (the number of saccades per second) is measured while the participant visually follows an intermittently displayed target on a Cathode Ray Tube (CRT) monitor, and concurrently answers mathematical questions. In this situation, saccades could be prompted by search for the next target after the question has been answered. This is an occurrence that might happen less often in more difficult conditions that have longer calculation times, and thereby restrict the opportunity for users to perform after-task scanning. Moreover, this reduced opportunity to scan would also produce a negative relationship between task difficulty and saccade rate. Given the nature of Nakayama et al.’s task, it is not apparent that saccade rate can be interpreted in the same way when applied to the context of a user’s Web page search.

In Pan et al.’s study, it is interesting to note that saccade (occurrence) rate only decreased from first page to second page viewed in the search Web site condition. This could reflect the ordered structure of search engine output, which may reduce the need for user saccades when compared to the output of news, business, or shopping Web sites. At the very least, it signifies a need for researchers interested in the cognitive modeling of user search behavior to also consider the specific visual and semantic characteristics of the Web page, along with the user’s environmental learning during navigation of a Web site.

1.1.4. Link color and format

Ling and Van Schaik (2002) found that link color affected both visual search accuracy and speed in Web-page-like displays. When searching for targets displayed in the default blue hyperlink on a white background, participants’ results were less accurate than when the target link was presented in other formats such as black link text on yellow backgrounds or black link text on white backgrounds. However, visual search times were generally faster when default colors were used. This may indicate that these participants were able to find and identify links more quickly when in default colors but, in a “knee-jerk” like reaction, would choose to follow a link even if the content was wrong.

In a later study, Ling and Van Schaik (2004) manipulated font style of the hyperlinks presented in both navigation bars and the context areas of Web-page-like displays. Hyperlinks were presented in the default blue while other text was presented in black. Hyperlink font style was manipulated between normal, bold, underlined, and bold-underlined conditions. These researchers report that, when hyperlink font style was manipulated in the navigation bar, there were no statistically significant differences in the participants’ ability to correctly identify target hyperlinks. However, in the context area, the bold style was less accurately identified by participants when compared to the results of hyperlinks presented in either plain or bold-underlined styles. Ling and Van Schaik (2004) concluded that Web page designers need to treat navigation bar and context area hyperlinks differently, ensuring that the latter are “made more salient” to the Web page user (p. 919).

One problem with the experiments conducted by Ling and Van Schaik is that the experimental stimuli lacked ecological validity. The Web-page-like displays were screen dumps (images) of Web pages that were displayed in Internet Explorer, but lacked basic functionality. That is, users could not mouse click on hyperlinks and traverse to new Web pages. Instead, responses were recorded with button presses, “p” if the search item was present and “a” if the search item was absent from the display. Van Schaik notes the possibility that this display method changes the user’s task, and that “automatic attention responses” Web users may display in their visual search behavior of Web pages may not be revealed under these simulated conditions (Pearson and Van Schaik, 2003, p. 345). Moreover, both authors propose that visual search processes of this type need to be evaluated “within the contexts of more realistic tasks” (Ling and Van Schaik, 2004).

1.1.5. Section summary

The display-based research reported above can be delineated into three broad Web page characteristics that influence a user’s visual search of Web page hyperlinks. First, position has been shown to affect element salience and the likelihood of Web users’ hyperlink following behavior. Second, aesthetic qualities of elements such as color and style both attract the user’s attention and group hyperlinks into regions. Third, environmental learning also appears to affect how Web users selectively focus attention during Web page navigation. Furthermore, eye-tracking has proved to be a useful tool in several research studies that investigate visual search for information in Web page environments.

1.2. Semantic-based Web page search

Semantic-based approaches to modeling user search behavior in Web pages emphasize the importance of hyperlink content and its similarity to user search goals. This similarity is often referred to as the hyperlink’s utility, “information scent”, or the perceived distance of the hypertext link from the user’s search goal. Two methods commonly employed for assigning a hyperlink’s utility are subjective assessment and automated semantic systems. Regardless of which of these methods is used to assign utility to the individual hyperlinks on a Web page, researchers contend that links with high utility will attract users’ attention. Furthermore, if the link utility is above a set threshold (contingent on other predetermined rules), then link clicking behavior will ensue.

2 Note, a second assessment with lowered threshold is not performed on the hyperlinks for child pages.

Subjective assessment by human raters requires panels of judges to estimate the perceived degree of similarity between Web element text (e.g., a hyperlink) and a user’s goal. The benefits of this method include a consistent degree of accuracy. For example, human raters can infer semantic similarity based on relatively little information provided by either the Web element’s text or the user’s goal. However, subjective assessments also have inherent problems. Making these ratings is cumbersome and compromises any automation of Web page analysis. Furthermore, there may be idiosyncratic biases held by the raters (e.g., high IQ or greater technical vocabulary) which may not also be held by the everyday user.

Automated semantic systems can be divided into three categories: statistical, taxonomical, and hybrid (Kaur and Hornof, 2005). There are numerous examples of statistical systems: the vector space model (Salton et al., 1975); Latent Semantic Analysis (LSA, Landauer et al., 2007); the Topic model (Topics, Griffiths and Steyvers, 2002; Blei et al., 2003); Sparse Nonnegative Matrix Factorization (SpNMF, Xu et al., 2003); the Constructed Semantics Model (CSM, Kwantes, 2005); Hyperspace Analogue to Language (HAL, Burgess and Lund, 1997); Point-Wise Mutual Information using Information Retrieval (PMIIR, Turney, 2001); and Non-Latent Similarity (NLS, Cai et al., 2004). Statistical systems or semantic models measure the probability of word co-occurrences from analysis of large collections of text documents. Alternatively, taxonomical systems, such as WordNet (Miller, 1995), create hierarchical databases in which terms subsume related terms. For example, fruit would be a broader category containing oranges and apples. Hierarchical systems can also take into account word relations such as synonymy, hypernymy, and meronymy. Lastly, hybrid systems such as res (Resnik, 1999) use both statistical and hierarchical methods to measure the relation between terms.

1.2.1. Subjective assessment of semantic similarity and Web page navigation

The Method for Evaluating Site Architectures (MESA) is a model of Web users’ goal-oriented navigation and search times (Wu and Miller, 2007; Miller and Remington, 2004). Before MESA is used to predict Web site navigation, researchers first provide human assessments of utility for all Web page hyperlinks when compared to a given search goal. Using these human ratings of utility, MESA then conducts a serial assessment of the hyperlinks on the first or “index” page of a Web site, using an “opportunistic” strategy to follow the first hyperlink that exceeds a predetermined utility threshold. If no hyperlink on the index page exceeds this threshold, the threshold is then lowered and the hyperlinks are re-examined in the same manner. If still none of the hyperlinks meet this reduced criterion, then navigation is terminated. Alternatively, if a suitable hyperlink is found, it is followed and the process is repeated on the “child” page. On the child page, if either the target information is not found, or there are no hyperlinks that exceed the threshold, then a “back” operation is performed returning the model to the last “parent” page.2
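The navigation strategy described above can be illustrated with a minimal Python sketch. It is not the MESA implementation: the site structure, the function name `mesa_navigate`, the utility values, and both thresholds are hypothetical, and the sketch assumes that after a “back” operation the scan resumes with the next hyperlink on the parent page and that links already followed are not followed again.

```python
from typing import Dict, List, Tuple

# Hypothetical site structure: page name -> ordered list of (utility, child page).
Site = Dict[str, List[Tuple[float, str]]]


def mesa_navigate(
    site: Site,
    index_page: str,
    target: str,
    threshold: float,
    lowered_threshold: float,
) -> Tuple[bool, List[str]]:
    """Simulate a MESA-like search; return (target found, pages visited in order)."""
    trail: List[str] = []
    followed = {index_page}

    def search(page: str, cutoffs: List[float]) -> bool:
        trail.append(page)
        if page == target:
            return True
        for cutoff in cutoffs:
            # Serial scan in page order: follow each unvisited link above the cutoff.
            for utility, child in site.get(page, []):
                if utility > cutoff and child not in followed:
                    followed.add(child)
                    # Child pages are assessed with the original threshold only.
                    if search(child, [threshold]):
                        return True
                    trail.append(page)  # "back" operation to the parent page
        return False

    # Only the index page gets a second pass with the lowered threshold.
    found = search(index_page, [threshold, lowered_threshold])
    return found, trail


# Illustrative human-rated utilities (assumed values, not data from the study).
example_site: Site = {
    "index": [(0.2, "about"), (0.7, "products"), (0.9, "support")],
    "products": [(0.3, "catalog")],
    "support": [(0.8, "faq")],
}
found, trail = mesa_navigate(
    example_site, "index", "faq", threshold=0.5, lowered_threshold=0.1
)
print(found, trail)
```

In this example the model first follows “products” because it is the first link above the threshold, returns to the index page when nothing on that page qualifies, and only then follows “support”, which illustrates why an opportunistic strategy can visit pages that a best-first strategy would skip.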

MESA is based on three principles: limited capacity, simplicity, and rationality. Like humans, MESA is “limited” in the amount of information it can retain, and therefore only assesses one hyperlink at a time. MESA’s evaluation of hyperlinks is “simplistic”; for example, it evaluates all hyperlinks for the same amount of time regardless of their utility. The MESA model assumes that human users will exhibit “rational” behavior within the limitations of the first two principles outlined above. While MESA has successfully been used to model Web users’ navigation and search times, its lack of automated hyperlink utility assessment makes the MESA model difficult for researchers to implement on any scale. In more recent work, these researchers have reported preliminary findings from a small (N = 2) eye-tracking study that suggests easy search goals are associated with shorter fixation times. Wu and Miller (2007) propose that top-down knowledge of the search goal may be facilitating hyperlink processing and this is reflected by the participants’ shorter fixations on these Web elements.

In an eye-tracking study, Brumby and Howes (2003) reported evidence that both past experience and utility of the surrounding menu options affected participants’ goal-directed menu item search patterns. When goal items were situated amongst “very bad” or lower relevance distracters, participants made fewer eye fixations on menu items than when “moderate” or more relevant distracter items were used. Also, when an initial menu display was deemed to have been difficult, participants were observed to make more eye fixations in the next presented menu display, when compared to trials in which the initial menu had been deemed easy. Brumby and Howes (2003) also conclude that Web users do not always access all of the menu options presented to them, and that users fixate on smaller subsets of menu items prior to selection of an item.

In a reanalysis of Brumby and Howes’s (2003) data, Cox and Young (2004) propose a rational model of visual search through menu items. They suggest that menu item choice is dependent on the “estimated relevance” or utility of previously viewed items in that menu (p. 87). When surrounded by low utility distracters, high utility items that appear early in the menu’s structure are likely to be immediately chosen, terminating the search process. However, if these items are presented toward the end of the menu structure, the model predicts visual search may not terminate with the high utility item on the “first pass”, but instead the menu items will be reviewed, before possible termination of search and selection of the high utility item.

1.2.2. Statistical semantic systems and modeling of Web user hyperlink following behavior

Grounded in the evolutionary-based theory of Information Foraging (Pirolli and Card, 1999), the Bloodhound Project’s InfoScent™ Simulator models Web user behavior by accessing the “proximal scent” (akin to “information scent”) of Web page links. Information scent is described as “the subjective sense of value and cost of accessing a page based on perceptual cues” (Chi et al., 2001, p. 490). The Web User Flow by Information Scent (WUFIS) algorithm accesses un-normalized proximal scent of each hyperlink from proximal cues (e.g., hyperlink link text, surrounding text, page position), user information goals, and a Term Frequency by Inverse Document Frequency (TF-IDF) weighted matrix of all terms and documents contained in the Web site structure. These un-normalized values are then normalized so that all hyperlinks on a Web page sum to a value of one. When proximal scent cannot be calculated because the hyperlink is an image, distal scent is used instead. Distal scent is calculated in a similar fashion to proximal scent, except textual information from the “linked to” page is combined with proximal cues in the algorithm. Chi et al. (2003) found that their modeling of Web users’ link following behavior was very successful one-third of the time and moderately successful at other times.
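The weighting and normalization steps can be sketched in a few lines of Python. This is only an illustration of TF-IDF weighting followed by normalization to a sum of one, not the WUFIS algorithm itself, whose exact computation is not given here. As simplifying assumptions, each hyperlink’s proximal cue text is treated as one “document”, raw scent is the sum of the TF-IDF weights of the goal terms, and a uniform distribution is returned when no cue matches the goal; the function names and example texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by term frequency times log inverse document frequency."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return weights


def normalized_scent(
    goal_terms: List[str], link_weights: List[Dict[str, float]]
) -> List[float]:
    """Sum goal-term weights per link, then scale so all links sum to one."""
    raw = [sum(w.get(term, 0.0) for term in goal_terms) for w in link_weights]
    total = sum(raw)
    if total == 0:
        # Assumption: with no matching cues, every link is equally likely.
        return [1.0 / len(raw)] * len(raw)
    return [value / total for value in raw]


# Hypothetical proximal cue text for three hyperlinks on one page.
links = [
    ["printer", "drivers", "download"],
    ["contact", "support"],
    ["printer", "manuals"],
]
goal = ["printer", "drivers"]
scent = normalized_scent(goal, tfidf_weights(links))
print(scent)
```

Because “drivers” occurs in only one link it receives a larger inverse document frequency than “printer”, so the first link receives most of the normalized scent and the second link, which shares no term with the goal, receives none.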

Other research from the Palo Alto Research Center (PARC) has offered related models of users’ interaction with Web page environments. Pirolli and Fu’s Scent-based Navigation and Information Foraging in the ACT architecture (SNIF-ACT) model combines Information Foraging Theory with the ACT-R model of cognition (Fu and Pirolli, 2007; Pirolli and Fu, 2003). Apart from its integration of ACT-R theory, SNIF-ACT also differs from its sibling, the Bloodhound Project, in its use of a Bayesian-based model of “spreading activation” (Pirolli, 1997) to access information scent (or link utility). SNIF-ACT’s primary document corpus or knowledge base, the Tipster database, is augmented by the AltaVista search engine. AltaVista is used to expand the knowledge base of the SNIF-ACT model to incorporate “popular” terms that are not included in the Tipster database, such as movie titles and characters. Pirolli and Fu (2003) report that SNIF-ACT’s modeling of user hyperlink clicking behavior exceeds what could be expected by chance. However, SNIF-ACT differs from the Bloodhound Project by primarily investigating and predicting when the user will leave a Web site, as information scent diminishes, in favor of pursuing other Web sites that may have more fruitful information sources.

The Cognitive Walkthrough for the Web (CWW) usability evaluation method is based on the Comprehension-based Linked model of Deliberate Search (CoLiDeS, Kitajima et al., 2000, 2005) for Web site navigation (Blackmon et al., 2002, 2005, 2007). Using CoLiDeS, CWW approaches the problem of modeling Web users’ link following behavior in a somewhat similar fashion to the Bloodhound Project. Like the Bloodhound Project, some aspects of the screen’s display are taken into consideration, with screen areas being grouped into regions (navigation bar, tab menu, main content area, and so on). Also, like the Bloodhound Project, the semantic content of each Web page is evaluated statistically against the Web user’s target goals. However, instead of using the WUFIS algorithm, the CWW and CoLiDeS use LSA to compare semantic content. Furthermore, like SNIF-ACT, the CWW does not limit the content of its statistical semantic analysis to the documents in the Web site, but runs the LSA algorithm on a large corpus of documents that is considered to represent the user’s knowledge base. Once a Web page has been segmented, the model generates a description of each section. These descriptions are then compared with the user’s goals using LSA and the knowledge base. The section description that has the highest similarity to these user goals will be selected for further analysis. Link texts in the selected section are then evaluated against the Web user’s goal using LSA and the knowledge base. After this evaluation, the model then follows the hyperlink with the highest utility if its value is above a predetermined threshold.
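The two-stage selection just described (first a section, then a link within it) can be sketched as follows. This is a hedged illustration rather than the CWW or CoLiDeS implementation: a bag-of-words cosine is used as a stand-in for LSA similarity, and the function names, page regions, goal text, and threshold values are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine; a simple stand-in for an LSA similarity estimate."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def choose_link(
    goal: str,
    regions: List[Dict],
    threshold: float,
    similarity: Callable[[str, str], float] = cosine_similarity,
) -> Optional[str]:
    """Pick the most goal-similar region, then its best link if above threshold."""
    best_region = max(regions, key=lambda r: similarity(goal, r["description"]))
    if not best_region["links"]:
        return None
    best_link = max(best_region["links"], key=lambda link: similarity(goal, link))
    return best_link if similarity(goal, best_link) > threshold else None


# Hypothetical segmented page: each region has a description and its link texts.
page_regions = [
    {"description": "site navigation menu home contact",
     "links": ["home", "contact us"]},
    {"description": "library opening hours and borrowing books",
     "links": ["opening hours", "borrowing books", "staff list"]},
]
chosen = choose_link("library opening hours", page_regions, threshold=0.5)
print(chosen)
```

Passing the similarity function as a parameter keeps the selection logic independent of the semantic model, so an LSA-based estimate could be substituted without changing `choose_link`.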

1.2.3. Section summary

Like the display-based studies reviewed above, semantic-based research has found both element position (within a menu) and environmental learning to be factors affecting Web users’ hyperlink following behavior. Furthermore, users’ Web navigation has been modeled with moderate success using automated statistical semantic-based systems that take some characteristics of Web page design into consideration.

1.3. Combining display-based and semantic-based information

A review of both the display-based and semantics-based research into Web users’ visual search and hyperlink clicking has indicated that the user’s search processes are influenced by text semantics, element position, aesthetic qualities of elements, and environmental learning. As is described above, semantics-based researchers have, to varying degrees, started to incorporate characteristics of the Web page display into their models. Moreover, several researchers have highlighted the importance of this combined approach to modeling users’ navigation through Web sites (Blackmon et al., 2002, 2005, 2007; Fu and Pirolli, 2007; Kaur and Hornof, 2005; Chi et al., 2003; Pirolli and Fu, 2003).

Fig. 1. Semantic Fields heat map of goal-oriented visual salience. Areas of greater estimated goal-oriented information salience have darker colors in this heat map. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

1.4. Introducing the Semantic Fields model

The Semantic Fields model attempts to estimate the visual salience of the semantic elements on a Web page relative to a Web user’s information goal. It is based on the assumption that each element displayed on a Web page (such as hyperlinks, content text, and images) will influence how the Web page elements surrounding it are perceived and what value a Web page user will assign to them. The Semantic Fields model has two components: semantic and display. The semantic component provides estimates of similarity between a Web user’s goal and the text contained in each of the elements3 on a Web page. After the semantic component is calculated by the Semantic Fields model, the display component distributes and aggregates these estimates of semantic similarity over a Web page to provide estimates of the goal-specific visual salience. To illustrate, an example of the Semantic Fields estimates of goal-oriented visual salience of a Web page is given in Fig. 1.

3 A textual element is defined as any element (tag) on a Web page that contains text. For example, hyperlink text, heading (H1) text, paragraph text, etc. If an element is nested in another element, then text is only taken from the child element. In its current form, the Semantic Fields model does not evaluate text contained within image files. Instead, in cases where a Web page designer has included descriptive text in the image tag’s “alt” parameter, this semantic description of the image has been included in the Semantic Fields model’s calculations. In future, Optical Character Recognition may be used by the Semantic Fields model to extract the text contained in image files.

1.4.1. Semantic component

Similar to the CWW, LSA was initially chosen to inform the semantic component of the Semantic Fields model (Stone and Dennis, 2007). This decision was motivated by the successful implementation of LSA in tasks that require accurate estimates of the similarity between texts such as essay grading (Foltz et al., 1999). Furthermore, LSA has been shown to reflect human knowledge in a variety of ways. LSA measures correlate highly with humans’ scores on standard vocabulary and subject matter tests; mimic human word sorting and category judgments; simulate word–word and passage–word lexical priming data; and accurately estimate passage coherence (Landauer et al., 2007).

In the first version of the Semantic Fields model, the well-known Touchstone Applied Science Associates (TASA) corpus was selected to provide a generic knowledge base for LSA. The TASA corpus has been constructed to represent the reading material that is covered by American students up to their first year of college (Dennis, 2007). Therefore, it was thought that this knowledge base would be appropriate for use in modeling goal-oriented Web site searches performed by university staff and students in later experiments.

4 Both the Vectorspace model and LSA are described more fully in Section 2.5 of this paper.


1.4.2. Display component: combining the semantic component with knowledge of the display

One rationale underlying the concept of the Semantic Fields model is that the information displayed on one region of a Web page has an influence which goes beyond its immediate location. Moreover, this influence will decay as the point of focus is moved away from the information source. Using a decay function, LSA estimates of goal similarity are distributed and summed over each pixel position for all of the textual elements contained on a Web page (see Eq. (1)). Combining the semantic information (L) and its location with the distance ($d_i(x, y)$) from a given point (x, y) using a decay function enables the production of visual salience heat maps for Web pages (see Fig. 1).

$$SF(x, y) = \sum_{i} L_i \, e^{-\lambda d_i(x, y)} \tag{1}$$
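As a rough illustration of Eq. (1), the following Python sketch sums decayed similarity estimates over a small pixel grid. It is not the IETracker or SEMMOD implementation. The function name `semantic_field`, the treatment of each textual element as a single point at its centre, the use of Euclidean distance for $d_i(x, y)$, and the value of the decay parameter are all assumptions made for this example.

```python
import numpy as np

def semantic_field(elements, width, height, decay):
    """Return a (height, width) array with SF(x, y) = sum_i L_i * exp(-decay * d_i(x, y)).

    elements: iterable of (cx, cy, similarity) tuples. Treating each element
    as one point at its centre is an assumption of this sketch.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    field = np.zeros((height, width), dtype=float)
    for cx, cy, similarity in elements:
        distance = np.hypot(xs - cx, ys - cy)  # Euclidean d_i(x, y), assumed
        field += similarity * np.exp(-decay * distance)
    return field

# Hypothetical page: two neighbouring menu links and one isolated link,
# all with the same goal similarity.
elements = [(10, 10, 0.6), (10, 14, 0.6), (50, 40, 0.6)]
field = semantic_field(elements, width=64, height=48, decay=0.1)

# The array is indexed [y, x]; the grouped links accumulate more salience.
grouped = field[10, 10]
isolated = field[40, 50]
```

In this toy layout the two neighbouring links reinforce one another, so the value at the grouped link exceeds the value at the isolated link even though all three elements have the same similarity to the goal.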

A group of hyperlink elements that are close in spatial proximity may be recognized by the Web page user as a navigation menu. It is not that an individual hyperlink is recognized as a menu; rather, as Faraday (2000) suggests, it is a hyperlink’s proximity and similarity in appearance to the set of hyperlinks around it that forms their overall Gestalt, or structure, into a navigation menu. The Semantic Fields model captures this spatial relation between Web page elements: textual elements that are closely positioned on a Web page, such as hyperlinks in a navigation menu, accumulate more utility or salience than items placed further apart.

Brumby and Howes (2003) report that Web users do not always access all of the menu options presented to them. Instead, the visual salience of a menu item is related to its goal-related utility and to the perceived utility of neighboring menu items. Similarly, the semantic component of the Semantic Fields model estimates the goal-related utility of individual menu items. The display component of the Semantic Fields model then distributes these estimates of goal-related utility among neighboring menu items.

Some researchers have suggested that Web users’ reading patterns produce a general F-pattern (Nielsen, 2006). However, it is unlikely that this type of model can accommodate navigation menus that are not located to the left of the display. In contrast, the identification of menus by the Semantic Fields model is entirely governed by a Web page’s element structure. Therefore, like the Web users identified by McCarthy et al. (2003), the Semantic Fields model is able to accommodate varying menu locations.

As can be seen in Fig. 1, varying degrees of visual salience have been estimated for content areas of the Web page. Salience varies because each element in the content (or non-menu) areas of the Web page is assessed by the semantic component of the Semantic Fields model as holding a different degree of similarity to the Web user’s goal. As has already been suggested, this salience also depends on the utility of the neighboring elements. Similar to the WebStep model proposed by Rigutti and Gerbino (2004), if goal-related embedded hyperlinks or other textual elements have neighboring elements that contain goal-related text (such as headings), then this will also increase the visual salience of that area.

1.4.3. Semantic Fields in comparison with the major models of Web site navigation

The primary focus of models like SNIF-ACT, MESA, the Bloodhound Project’s InfoScent™ Simulator and CoLiDeS is to predict users’ link-clicking behavior. To achieve this goal, these models estimate the utility of individual hyperlinks to the users’ goals, ultimately following (clicking) the hyperlink with the greatest utility. In contrast, the Semantic Fields model is a model of visual salience. To this end, the Semantic Fields model attempts to estimate the likelihood that users will focus their visual attention on various positions on a Web page.

While CoLiDeS models a different user behavior, the process by which it filters prospective hyperlinks bears some resemblance to the Semantic Fields model. As mentioned above, CoLiDeS initially looks at the overall utility of Web page “regions”, summing the utility over individual hyperlinks within each region on a Web page. The region with the greatest overall utility is then selected for further exploration by the CoLiDeS model. Similarly, the Semantic Fields model assesses the collective utility of groups of hyperlinks (and other textual Web page elements) when assessing visual salience. Kitajima and colleagues have also incorporated eye-tracking into their research (Habuchi et al., 2006, 2008; Namatame and Kitajima, 2008). However, they have not directly tested the hypothesis that the visual salience of Web page regions will be related to their semantic similarity to user goals. It is likely that, if CoLiDeS were to be used to estimate visual salience on a Web page, Kitajima and colleagues would continue to segment a Web page into researcher-defined regions. In contrast, the generation of regions by the Semantic Fields model is completely automated, and built through knowledge of the relative positions of a Web page’s textual elements. This feature of the Semantic Fields model makes it both simpler and more cost effective to implement than models like CoLiDeS.

All of these models except for MESA use automated methods to assess the similarity between a textual representation of the Web users’ goals and the text contained in a Web page’s hyperlinks. While SNIF-ACT uses a Bayesian model to assess hyperlink utility, both CoLiDeS and InfoScent™ use semantic models that are also incorporated into the Semantic Fields model. For InfoScent™ this mechanism is the TF-IDF weighting scheme that is commonly used in the Vectorspace model, whereas CoLiDeS uses LSA when assessing these textual similarities.4

The semantic components employed by each of these models use a corpus, or set of documents, as a knowledge domain. This knowledge domain is a representation of the background information needed to make similarity estimates on pairs of textual units.5 As mentioned above, both CoLiDeS and SNIF-ACT use external corpora that act as this knowledge base for their semantic analysis. InfoScent™ differs from these models in that it does not use an external corpus; instead, InfoScent™ assesses textual similarity using a term by document matrix that it creates from the text contained on the Web site’s pages. Most of the Semantic Fields models tested here combine these approaches by adding the Web pages as documents to the external corpora that inform the semantic models.

1.5. Focus of this paper

Initially, the TASA corpus was used as a knowledge base for the LSA component of the LSA-SF model (Stone and Dennis, 2007). However, recent research has suggested that a better fit with the knowledge domains required to model human similarity judgments in document comparison tasks may be achieved using document sets retrieved from the online encyclopedia Wikipedia (Gabrilovich and Markovitch, 2007; Stone et al., 2011). Also, while LSA is certainly the best studied statistical semantic model, Stone et al. (2011) found that other models, such as the Vector Space model (Vectorspace, Salton et al., 1975) and Sparse Nonnegative Matrix Factorization (SpNMF, Xu et al., 2003; Shashua and Hazan, 2005), outperformed LSA when estimating human judgments of paragraph similarity. Based on these findings, in this paper we present a comparison of three semantic models (LSA, SpNMF, and Vectorspace) and two types of knowledge base (TASA and Wikipedia) when used as components in the generation of Semantic Fields.6 Furthermore, the performance of these revised Semantic Fields models is assessed on the eye movement dataset presented in Stone and Dennis (2007).

2. Method

2.1. Participants

Eye movement data generated by 49 university participants (27 males and 22 females) on three Web sites was recorded during nine goal-oriented search tasks. Participants were recruited either from a first-year pool, volunteering in exchange for partial course credit, or from other members of Adelaide University, who were paid $30 for their time. Participant ages ranged from 16 to 57 (M = 22 y 3 m, SD = 1 y). There was a positive skew in the sample’s age distribution, with 93% of participants younger than 31 years old.

All participants had both previous computer and Internet experience. Self-reported years of Internet experience ranged from 4 to 17 years (M = 8 y 8 m, SD = 4 m), with self-reported frequency of Internet use ranging between 2 and 75 h per week (M = 14 h 14 min, SD = 1 h 52 min). Based on these self-reports, the group appeared on average to be very experienced users of the Internet.

5 Textual units are defined here as words, sentences, paragraphs, and documents; see Stone et al. (2011) for a more detailed description of the semantic models used here and these textual units.

6 These components will be described more fully in Section 2.

2.2. Apparatus

2.2.1. Behavioral recording equipment and software

The IETracker program was developed to record both participants’ behavior during Web site search tasks and Web page display characteristics. These observations are accomplished by programmatically controlling, and integrating, Microsoft Internet Explorer (Version 6) and the ViewPoint EyeTracker PC-60 QuickClamp System. User and site specific data collected during this exploratory experiment included eye-tracking; hyperlink clicking; and Web page composition (location, semantics, images, color, style, and size of Web page elements). All data was then stored in a PostgreSQL database for later analysis.

2.2.2. Web sites

Three Web sites were chosen from the Internet:

www.missionaustralia.com.au (Mission Australia),
www.greencorps.com.au (Green Corps Australia; see footnote 7),
www.whitelion.asn.au (White Lion Australia).

Static versions of these sites8 were pre-fetched in December 2005 to avoid changes created by Web site updates. These Web sites are all similar in the type of service they provide, in that they all offer services to disadvantaged members of the community. The Web sites were chosen because they were sufficiently complex that searching for information on these sites would be a non-trivial task for participants.

2.3. Eye-tracking data

This study uses the same eye-tracking data that was collected for Stone and Dennis (2007). However, in the current research, this data has been modified slightly. The original Stone and Dennis (2007) study contained eye-tracking data recorded on 1842 Web pages; in this study, data from only 1809 of those Web pages are used. It was found that 33 page views included in the original dataset needed to be removed for one of two reasons. Some pages were omitted because data for all calibration points had not been recorded. This stemmed from participant head movements and overt glances during the calibration procedure. Also, several Web pages had been designed to catch user clicks on PDF files. These “catch-pages” contained only one textual element, which informed the participants that they had either found their goal or had not and should click the “back” button. These pages have been removed because they were not part of the original Web sites, and their simple one-element construction with black text on a white background probably favored the Semantic Fields model.

7 The Green Corps Australia URL is no longer used; the Australian Government has changed both the Web site and its URL, which can be viewed here: http://bit.ly/greencorps.

8 The static versions of these Web sites can be found here: http://bit.ly/sf3websites.

The other modification was to the calibration of the eye-tracking points. Calibration data was recorded for nine focal points in between the presentation of Web pages during the experiment. This calibration data was stored and applied to the search task eye-tracking data later, during data analysis. Therefore, it was possible to use different algorithms when adjusting the search task eye-tracking data with the calibration data. In the original 2007 paper, several weighting schemes were employed that varied from moving the eye-tracking data toward the nearest individual calibration point to re-calibrating data using the average offsets recorded over all nine calibration points. To make this calibration process more transparent to the reader in the current analysis, we have chosen to use only the latter method. That is, in the current study, search task eye-tracking points were adjusted using the average offsets to correct for participants’ body movements during data collection.
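The average-offset adjustment described above might be sketched as follows. The paper does not give the exact computation, so this Python example assumes that the offset is the mean difference between recorded gaze positions and the known positions of the nine calibration targets, and that it is subtracted from every search task eye-point. The function names and the sample coordinates are hypothetical.

```python
import numpy as np

def average_offset(targets, recorded):
    """Mean (dx, dy) between recorded gaze and the known calibration targets."""
    return np.mean(np.asarray(recorded, float) - np.asarray(targets, float), axis=0)

def recalibrate(eye_points, offset):
    """Shift raw eye-tracking points by the average calibration offset."""
    return np.asarray(eye_points, float) - offset

# Hypothetical nine-point grid with a constant drift of (+5, -3) pixels.
targets = [(x, y) for y in (100, 500, 900) for x in (100, 640, 1180)]
recorded = [(x + 5, y - 3) for x, y in targets]

offset = average_offset(targets, recorded)
corrected = recalibrate([(305.0, 197.0)], offset)
```

With a constant drift, the single average offset recovers the intended gaze position exactly; with drift that differs across the screen it would only correct the mean error.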

2.4. Procedure

Participants were required to search for the following three target pieces of information on each of the three Web sites.

Mission Australia:

1. Who is currently the Chief Operating Officer of Mission Australia?
2. You are interested in working for Mission Australia. Search their Web site for the current job vacancies available at Mission Australia.
3. You are currently researching homelessness in young people and have heard that Mission Australia has recently published a report called “The voices of homeless young Australians”. Search the Mission Australia Web site for this report into youth homelessness.

Green Corps:

1. You want to know more about Green Corps management. Find out who is the National Program Manager of Green Corps.
2. Find what environmental and heritage benefits are contributed by Green Corps.
3. Find the online Expression of Interest form to apply to become a Green Corps Partner Agency.

White Lion:

1. Find out who is the current President of White Lion.
2. You are interested in becoming a mentor for young people. Find out how to become one of White Lions mentors.
3. You are interested in financial viability of White Lion as a business. Find out which Government Departments are supporters of the White Lion organization.

9 The SEMMOD semantic models package was used to incorporate the Vectorspace model, Latent Semantic Analysis, and Sparse Nonnegative Matrix Factorization into the Semantic Fields model. The SEMMOD semantic models package is released under the GNU License and can be found here: http://mall.psy.ohio-state.edu/wiki/index.php/Semantic_Models_Package_%28SEMMOD%29.
Each task was read aloud to the participants twice before they commenced their search. Moreover, they were asked to signal the experimenter (with a hand gesture) at any time they wanted the search task repeated aloud. A three by three Latin square design was used to control for order effects in the display of each Web site. Also, a nested three by three Latin square design was used for the same purpose to guide the presentation order of each of the target tasks.

After an initial calibration procedure, using the Viewpoint eye-tracking software, participants were given their search task in the manner described above and commenced their search. Given the physical structure of the Viewpoint Eye-Tracker (which uses a mounted camera with forehead and chin rests) and some participants’ predisposition to fidget, it was necessary to perform additional eye-tracking calibration during the search tasks. This additional calibration required the participants to focus on targets in nine separate regions of the monitor. These targets were automatically displayed on the screen by the IETracker program after each hyperlink clicked during the participant’s search task. For example, if a participant clicked through four pages in their search for the target information, then four extra calibration procedures were performed during this search task, each performed after leaving the page that was clicked on and before the next page was displayed to the participant.

Finally, participants were instructed that when they believed they had found the target information, they should click on the “HOME” icon (an image of a house on Internet Explorer’s menu bar). This was followed by one more round of automated calibration. On average, it took participants three page views to find their goal on the White Lion Web site, four page views on the Green Corps Web site, and five page views on the Mission Australia Web site.

2.5. Semantic Fields models

Four semantic models were incorporated into the Semantic Fields model9: word overlap, the Vectorspace model (Salton et al., 1975), Latent Semantic Analysis (Landauer et al., 2007), and Sparse Nonnegative Matrix Factorization (Xu et al., 2003).



2.5.1. Word overlap

Simple word overlap was used as a baseline in this research, and also because of its solid performance at paragraph level comparisons (Stone et al., 2011). Term frequencies are calculated for each document, and similarities are then measured as cosines (see Eq. (2)) of the resulting document vectors:

$$\cos\theta = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert} \tag{2}$$
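A minimal Python sketch of the word overlap measure in Eq. (2) is given below. Lower-casing and splitting on whitespace are assumptions made for this example; the tokenization actually used in the study is not described here, and the two sample texts are only illustrative.

```python
from collections import Counter
import math

def term_frequencies(text):
    """Count terms after lower-casing and whitespace splitting (assumed tokenizer)."""
    return Counter(text.lower().split())

def cosine(v1, v2):
    """Cosine between two sparse term-frequency vectors, as in Eq. (2)."""
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

goal = term_frequencies("current job vacancies")
link = term_frequencies("job vacancies at mission australia")
similarity = cosine(goal, link)  # two shared terms: 2 / sqrt(3 * 5)
```

Because the measure depends only on shared surface forms, two texts with no words in common receive a similarity of zero regardless of how related their meanings are.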

2.5.2. First steps toward a more complex semantic model: term by document matrix and log-entropy weighting

A set of documents, or corpus, can be broken down into a term by document matrix, with each row representing a term, each column a document, and each cell a count of the number of times a term appeared in a document. While similarity estimates could be made on the raw term vectors contained in this term by document matrix, the matrix values are generally weighted to counter the effects of such things as high frequency words. To this end, all of the more complex semantic models (Vectorspace, LSA and SpNMF) used in this research perform log-entropy weighting on the initial term by document matrices.

When used with LSA, researchers report that log-entropy weighting provides better results than other weighting schemes such as raw term frequency (Dumais, 1991) and inverse document frequency (Lee et al., 2005). Using log-entropy weighting, the weight of a term is the product of its local and global weighting: $w_{(i,j)} = \mathrm{local} \times \mathrm{global}$. Local weighting of a term is performed by taking the log of the term frequency ($tf_{(i,j)}$) plus one for a term $i$ within a document $j$ (see Eq. (3)). This local weighting identifies higher frequency or important words within documents. Global weighting of terms is performed by using the entropy of a term to reduce the impact of high frequency words that appear in many documents in a corpus. The entropy is defined in Eq. (4), where $gf_i$ is the total number of times $i$ appears in the corpus and $n$ is the number of documents:

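$$\mathrm{local} = \log_2\left(tf_{(i,j)} + 1\right) \tag{3}$$

$$\mathrm{global} = 1 + \frac{\sum_{j} p_{(i,j)} \log_2\left(p_{(i,j)}\right)}{\log_2 n} \tag{4}$$

where

$$p_{(i,j)} = \frac{tf_{(i,j)}}{gf_i}$$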
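Eqs. (3) and (4) can be applied to a small term by document matrix as in the Python sketch below. It is an illustration rather than the SEMMOD code. The function name `log_entropy` is hypothetical, the local weight is read as $\log_2(tf + 1)$, and the term $0 \log_2 0$ is taken to be zero, which is an assumption of this example. At least two documents are required so that $\log_2 n$ is nonzero.

```python
import numpy as np

def log_entropy(counts):
    """Apply log-entropy weighting to a terms x documents count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]                      # n in Eq. (4); must exceed 1
    local = np.log2(counts + 1.0)                 # Eq. (3)
    gf = counts.sum(axis=1, keepdims=True)        # gf_i: corpus frequency of term i
    p = np.divide(counts, gf, out=np.zeros_like(counts), where=gf > 0)
    plogp = np.zeros_like(p)
    present = p > 0
    plogp[present] = p[present] * np.log2(p[present])  # 0 * log 0 treated as 0
    global_weight = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log2(n_docs)  # Eq. (4)
    return local * global_weight

# Hypothetical corpus of four documents: term 0 appears once in every
# document, term 1 appears four times in the first document only.
counts = [[1, 1, 1, 1],
          [4, 0, 0, 0]]
weighted = log_entropy(counts)
```

A term spread evenly over all documents receives a global weight of zero, while a term concentrated in a single document keeps its full local weight.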

2.5.3. The Vectorspace model

The Vectorspace model (Salton et al., 1975) assumes that terms can be represented by the set of documents in which they appear. Two terms will be similar to the extent that their document sets overlap. To construct a representation of a document, the vectors corresponding to the unique terms are multiplied by the log of the frequency within the document and divided by their entropy across documents and then added. Similarities are measured as the cosines between the resultant vectors for different documents.
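The construction of a document vector described above could look roughly like the following Python sketch. The function `document_vector`, the use of $\log_2(tf + 1)$ as the local weight, and the passing of precomputed per-term global weights are assumptions for illustration; the tiny identity matrix of term vectors is a made-up corpus in which each term occurs in exactly one document.

```python
import numpy as np

def document_vector(term_counts, term_vectors, global_weights):
    """Weighted sum of term vectors for one document.

    term_counts:    length-T raw counts of each term in the document.
    term_vectors:   T x D array; row t is term t's profile over corpus documents.
    global_weights: length-T entropy-based weights (assumed precomputed).
    """
    local = np.log2(np.asarray(term_counts, float) + 1.0)
    return (local * np.asarray(global_weights, float)) @ np.asarray(term_vectors, float)

def cosine(a, b):
    """Cosine between two dense vectors; zero if either has no length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical corpus: three terms, each appearing in a different document.
term_vectors = np.eye(3)
global_weights = np.ones(3)

doc_a = document_vector([1, 1, 0], term_vectors, global_weights)
doc_b = document_vector([1, 0, 1], term_vectors, global_weights)
similarity = cosine(doc_a, doc_b)
```

Here the two documents share one of their two terms, so their vectors overlap in one dimension and the cosine is one half.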

2.5.4. Latent Semantic Analysis (LSA)

LSA (Landauer et al., 2007) starts with the same representation as the Vectorspace model: a term by document matrix with log-entropy weighting. In order to reduce the contribution of noise to similarity ratings, however, the raw matrix is subjected to singular value decomposition (SVD). SVD decomposes the original matrix into a term by factor matrix, a diagonal matrix of singular values and a factor by document matrix. Typically, only a small number of factors (e.g., 300) are retained. To derive a vector representation of a novel document, term vectors are weighted, multiplied by the square root of the singular value vector and then added. As with the Vectorspace model, the cosine is used to determine similarity.
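The following Python sketch illustrates the SVD step and the construction of novel document vectors on a toy matrix. It uses NumPy's `numpy.linalg.svd` rather than the SEMMOD package, and the function names, the four-term matrix and the choice of two factors are assumptions made for the example.

```python
import numpy as np

def lsa_factors(weighted, k):
    """Truncated SVD of a weighted terms x documents matrix: keep k factors."""
    u, s, _ = np.linalg.svd(np.asarray(weighted, float), full_matrices=False)
    return u[:, :k], s[:k]

def fold_in(term_weights, u_k, s_k):
    """Novel document vector: weighted term vectors, scaled by the square
    root of the singular values, then summed."""
    return (np.asarray(term_weights, float) @ u_k) * np.sqrt(s_k)

def cosine(a, b):
    """Cosine between two dense vectors; zero if either has no length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical weighted matrix: terms 0 and 1 co-occur in documents 0 and 1,
# terms 2 and 3 co-occur in document 2.
weighted = [[1, 1, 0],
            [1, 1, 0],
            [0, 0, 1],
            [0, 0, 1]]
u_k, s_k = lsa_factors(weighted, k=2)

only_term_0 = fold_in([1, 0, 0, 0], u_k, s_k)
only_term_1 = fold_in([0, 1, 0, 0], u_k, s_k)
only_term_2 = fold_in([0, 0, 1, 0], u_k, s_k)

same_topic = cosine(only_term_0, only_term_1)
different_topic = cosine(only_term_0, only_term_2)
```

The first two novel documents share no words, so word overlap would score them at zero, yet they are placed on the same factor because their terms co-occur in the corpus.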

2.5.5. Sparse Nonnegative Matrix Factorization (SpNMF)

Nonnegative Matrix Factorization (Xu et al., 2003) is a technique similar to LSA, which in this context creates a matrix factorization of the log-entropy weighted term by document matrix. This factorization involves just two matrices, a term by factor matrix and a factor by document matrix, and is constrained to contain only nonnegative values. Nonnegative matrix factorization has been shown to create meaningful word representations using small document sets, so in order to make it possible to apply it to large collections we implemented the sparse tensor method proposed by Shashua and Hazan (2005). As in LSA, log-entropy weighted term vectors were added to generate novel document vectors and the cosine was used as a measure of similarity.
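To make the two-matrix factorization concrete, the Python sketch below uses the standard multiplicative update rules for nonnegative matrix factorization (Lee and Seung). This is a swapped-in technique for illustration only: it is not the sparse tensor method of Shashua and Hazan (2005) used in this research, and the function name, iteration count, random initialization and toy matrix are assumptions.

```python
import numpy as np

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative terms x documents matrix v into w (terms x k)
    and h (k x documents) using multiplicative updates."""
    v = np.asarray(v, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], k)) + eps
    h = rng.random((k, v.shape[1])) + eps
    for _ in range(iterations):
        # Each update multiplies by a nonnegative ratio, so w and h stay nonnegative.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Hypothetical weighted matrix with two groups of co-occurring terms.
v = np.array([[2.0, 2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 3.0],
              [0.0, 0.0, 1.0, 1.0]])

w, h = nmf(v, k=2)
error_after_many = np.linalg.norm(v - w @ h)

w_one, h_one = nmf(v, k=2, iterations=1)
error_after_one = np.linalg.norm(v - w_one @ h_one)
```

The rows of `w` play the role of the term vectors that are weighted and added to form novel document vectors, and the reconstruction error shrinks as the updates are repeated.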

2.6. Corpora

2.6.1. TASA

The Touchstone Applied Science Associates (TASA) corpus was constructed to represent the reading material that is covered by American students up to their first year of college. The 35,471 documents contained in the TASA corpus range over nine content areas: language arts and literature; social science; science and math; fine arts; home economics and related fields; trade, service and technical fields; health, safety and related fields; business and related fields; and popular fiction and nonfiction (Budiu et al., 2007).

2.6.2. Wikipedia

Wikipedia was used as a generic set of documents from which smaller targeted sub-spaces could be sampled and compiled. Wikipedia is maintained by the general public, and has become the largest and most frequently revised or updated encyclopedia in the world. Critics have questioned the accuracy of the articles contained in Wikipedia; however, research conducted by Giles (2005) did not find significant differences in accuracy between science-based articles in Wikipedia and similar articles contained in the Encyclopedia Britannica. In total, 2.8 million Wikipedia entries were collected and are current to March 2007. However, this document number was reduced to 1.57 million after the removal of incomplete articles contained in the original corpus. Incomplete articles were identified for removal if they contained phrases like “help Wikipedia expanding” or “incomplete stub”.

Table 1
Descriptive statistics for the TASA-WEB and WIKI-WEB corpora created for each Web site: Mission Australia (MA), Green Corps (GC) and White Lion (WL).

| Web site | Corpus | Documents | Words | Terms |
|---|---|---|---|---|
| MA | TASA-WEB | 35,528 | 5,178,386 | 54,731 |
| MA | WIKI-WEB | 1057 | 4,381,764 | 27,069 |
| GC | TASA-WEB | 35,499 | 5,173,086 | 54,591 |
| GC | WIKI-WEB | 1028 | 1,708,686 | 45,330 |
| WL | TASA-WEB | 35,494 | 5,172,844 | 54,629 |
| WL | WIKI-WEB | 1023 | 3,988,760 | 32,671 |

11 Note: Green Corps and White Lion Web sites’ textual data was not used when creating corpora for the Mission Australia data.

To enable the creation of sub-space corpora, Lucene10 (a high performance text search engine) was used to index each document in the Wikipedia corpus. Lucene allows the user to retrieve documents based on customized Boolean queries. These queries can include wild-card operators like “the star” (*) to retrieve multiple results from word stems. Like the more well known search engine Google, the documents returned by Lucene are ordered by their relevance to a query.

The three search tasks given to the participants for each Web site were enumerated above in Section 2.4. Based on keywords selected from these tasks, the following Lucene queries were constructed to retrieve the 1000 document sub-spaces from the Wikipedia corpus for each Web site.

Mission Australia: ("mission Australia") OR ("chief operating officer") OR ("employment") OR ("homeless*")

Green Corps: ("green corps") OR ("national" AND "program" AND "manager") OR ("environment*" AND "benefit*") OR ("heritage" AND "benefit*") OR ("partner" AND "agency")

White Lion: ("white lion") OR ("company president") OR ("organi?ation* president") OR ("mentor*" AND "young" AND "people") OR ("government department*" AND "support")

2.6.3. Appending Web pages as documents: the creation of TASA-WEB and WIKI-WEB corpora

The inclusion of stimulus documents in a semantic model’s knowledge base (or corpus) has proved to be an effective method of corpus improvement (Stone et al., 2011; Graesser et al., 2000). Performance gains obtained in this way most likely result from context being given to novel stimulus terms that were not contained in the original corpus. To illustrate this point, the TASA corpus may not contain the phrase “Green Corps”, but may contain the words “contribute” and “community”. By appending documents (i.e., the text contained in the Green Corps Web site) to the TASA corpus, the semantic model can more accurately make associations between these terms. Based on this finding from Stone et al. (2011), when creating corpora for each of the three Web sites in this study, the textual content of each of the Web pages viewed by participants on that Web site was appended to the TASA and Wikipedia corpora. For example, on the Mission Australia Web site, 57 unique Web pages were viewed by participants during the experiment. The textual content from these Web pages was therefore used to construct a mini-corpus of 57 documents, which was then appended to both the TASA corpus and the Wikipedia sub-space for the Mission Australia corpora.11 Highlighting this appended data, the naming conventions TASA-WEB and WIKI-WEB are used throughout this paper. Descriptive statistics for all of the corpora used in this research are displayed in Table 1.

10 PyLucene is a Python extension that allows access to the Java version of Lucene: http://pylucene.osafoundation.org/.

2.7. Baseline models to estimate eye-position

Three alternative models were designed as baselines to assess the success of the Semantic Fields models when predicting participants’ eye-positions while they are engaged in goal-oriented search tasks on the three Web sites. These baseline models are the Flat Model, the Non-Flat Model, and No-Model.

2.7.1. Flat Model

The Flat Model is the simplest model: it assumes that the eye-position is equally likely to be focused on any pixel contained on the Web page. Given the total number of eye-points (EP), and the total number of pixels that an eye-point (p) could be located in (1280 × 1024), for each page (i) that is viewed (V) by participants, the Flat Model calculates the log-likelihood of the eye-points on any given Web page as $LL_{Webpage}$ (see Eq. (5)). The sum of these Web page log-likelihoods ($LL_{FLAT}$) gives the fit of the Flat Model to all eye-points recorded for participants.

$$LL_{FLAT} = \sum_{i \in V} LL_{Webpage_i}$$

$$LL_{Webpage} = \sum_{p \in EP} \log\left(\frac{1}{1280 \times 1024}\right) \tag{5}$$
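Eq. (5) reduces to a count of eye-points multiplied by a constant, as the short Python sketch below shows. The base of the logarithm is not stated in the text, so the natural log is assumed here, and the function names and the eye-point counts are hypothetical.

```python
import math

SCREEN_PIXELS = 1280 * 1024

def flat_page_ll(n_eye_points, n_pixels=SCREEN_PIXELS):
    """Log-likelihood of one page's eye-points under a uniform pixel distribution."""
    return n_eye_points * math.log(1.0 / n_pixels)

def flat_ll(eye_points_per_page):
    """Sum the page log-likelihoods over all viewed pages (Eq. (5))."""
    return sum(flat_page_ll(n) for n in eye_points_per_page)

# Hypothetical example: two page views with 2 and 3 recorded eye-points.
total = flat_ll([2, 3])
```

Because every pixel is equally likely, the Flat Model's fit depends only on how many eye-points were recorded, not on where they fell.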

2.7.2. Non-Flat Model

The Non-Flat Model is similar to the Flat Model, with the exception that it gives more weight to the probability estimates for those eye-points found in textual elements.12 The Non-Flat Model is displayed in Eq. (6), where for each Web page (i) that is viewed (V) by participants, N is the number of pixels in text elements, M is the number of pixels outside text elements, A is the number of eye-points in text elements and B is the number of eye-points outside text elements. Furthermore, w is the optimized weighting of the probability of an eye-point being in a text element (see Eq. (6)). The maximized log-likelihood (ML) over all Web pages viewed by participants occurred at a maximum likelihood estimate (MLE) of w = 3.41 for this sample. Therefore, participants were 3.41 times more likely to focus their eyes on the textual elements on a Web page than on other areas. In accordance with this finding, the MLE has been used to calculate Non-Flat Model log-likelihoods, assigning a greater weighting to eye-points recorded in these textual elements than in the non-textual elements:

$$ML_{NONFLAT} = \sum_{i \in V} ML_{Webpage_i}$$

$$ML_{Webpage} = A \log\left(\frac{w}{wN + M}\right) + B \log\left(\frac{1}{wN + M}\right) \tag{6}$$

Fig. 2. Textual Web page elements are highlighted in red; images that have “ALT” or descriptive text are included. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

12 An example of text elements (including images that have a textual description) on a Web page is shaded in red in Fig. 2.

13 No-Model holds the semantic model coefficient constant at 1.0, so the value it returns for a pixel can still be thought of as a Semantic Field Value.
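The Non-Flat Model log-likelihood in Eq. (6), and a simple way of locating the value of w that maximizes it, can be sketched in Python as follows. The optimization procedure used in the study is not described, so the ternary search over log(w) is an assumption of this example, as are the function names, the search bounds and the single hypothetical page. For one page the maximum can also be found analytically at w = AM / (BN), which gives a check on the search.

```python
import math

def nonflat_ll(pages, w):
    """Eq. (6) summed over pages; each page is a tuple (A, B, N, M)."""
    total = 0.0
    for a, b, n, m in pages:
        denom = w * n + m
        total += a * math.log(w / denom) + b * math.log(1.0 / denom)
    return total

def fit_w(pages, lo=1e-3, hi=1e3, iterations=200):
    """Ternary search on log(w); the log-likelihood is concave in log(w)."""
    left, right = math.log(lo), math.log(hi)
    for _ in range(iterations):
        third = (right - left) / 3.0
        m1, m2 = left + third, right - third
        if nonflat_ll(pages, math.exp(m1)) < nonflat_ll(pages, math.exp(m2)):
            left = m1
        else:
            right = m2
    return math.exp((left + right) / 2.0)

# Hypothetical page: 30 eye-points on 1000 text pixels, 10 on 9000 other pixels.
pages = [(30, 10, 1000, 9000)]
w_hat = fit_w(pages)  # analytic optimum for one page: 30 * 9000 / (10 * 1000)
```

Setting w to 1 recovers a uniform distribution over the N + M pixels counted for the page, so any improvement in log-likelihood at the fitted w reflects the extra attention given to text.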

2.7.3. No-Model

The No-Model condition has been created to test the theory that the Semantic Fields model is driven only by the structure of the Web page, and that the semantic models do not add to the Semantic Fields model’s capacity to predict participants’ eye-positions. It takes the same parameters as the Semantic Fields model; however, the semantic model coefficient is kept constant at one ($L_i = 1.0$) in the calculation of the Semantic Fields (see Eq. (1)).

2.7.4. Calculating the log-likelihood for Semantic Fields and No-Model conditions

The log-likelihoods for the Semantic Fields models and No-Model are calculated in the same fashion. The Semantic Field Value13 ($SF(x, y)$, see Eq. (1)) for each eye-point (p) is divided by the summed total of the Semantic Field Values for all pixels contained on that Web page ($SF_{Webpage}$). This process calculates the probability that a single eye-point is viewed. Then, the log of these probabilities is calculated and summed over all eye-points (EP) on a Web page viewed by a participant ($LL_{Webpage}$). This process of summing the log-likelihoods is repeated for each Web page (i) that was viewed (V) by participants to calculate the overall log-likelihood ($LL_{SF}$) for both the Semantic Fields and No-Model conditions (see Eq. (7)):

$$LL_{SF} = \sum_{i \in V} LL_{Webpage_i}$$

$$LL_{Webpage} = \sum_{p \in EP} \log\left(\frac{SF_p}{SF_{Webpage}}\right) \tag{7}$$
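Eq. (7) can be sketched in Python as below, assuming that the Semantic Field Values for a page are held in a two-dimensional array indexed [y, x] and that eye-points are integer pixel coordinates. The function names and the tiny 4 by 5 "pages" are hypothetical, and the natural log is assumed.

```python
import numpy as np

def sf_page_ll(field, eye_points):
    """Sum of log(SF_p / SF_Webpage) over the eye-points on one page."""
    field = np.asarray(field, dtype=float)
    page_total = field.sum()  # SF_Webpage: total over all pixels on the page
    return float(sum(np.log(field[y, x] / page_total) for x, y in eye_points))

def sf_ll(pages):
    """Eq. (7): sum page log-likelihoods over (field, eye_points) pairs."""
    return sum(sf_page_ll(field, points) for field, points in pages)

# Hypothetical 4 x 5 pages: a uniform field, and one with a salient pixel at (0, 0).
uniform = np.ones((4, 5))
peaked = np.ones((4, 5))
peaked[0, 0] = 10.0

ll_uniform = sf_page_ll(uniform, [(0, 0), (1, 1), (2, 3)])
ll_hit = sf_page_ll(peaked, [(0, 0)])
ll_flat_hit = sf_page_ll(uniform, [(0, 0)])
total = sf_ll([(uniform, [(0, 0)]), (peaked, [(0, 0)])])
```

A uniform field reproduces the Flat Model's per-pixel probability, and an eye-point that lands on a salient pixel receives a higher log-likelihood than it would under the uniform field. A pixel with a Semantic Field Value of zero would give a log-likelihood of negative infinity, which this sketch does not guard against.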

3. Results

3.1. Did the participants complete their tasks successfully?

When designing the nine tasks, the researchers had identified specific destination Web pages that matched the information required by each task. Table 2 displays the percentage of search completions where participants finished on the same Web page that was chosen by the experimenter. Overall, there was a 92.93% overlap between the expectations of the experimenter and the final landing pages chosen by participants. However, only about half of the participants (53.06%) completed the second task on the Green Corps Web site to the expectations of the researchers.

Table 2
Percentage of overlap between the expected search completion Web page chosen by the experimenter and those chosen by the 49 participants.

| Web site | Task 1 | Task 2 | Task 3 | Average |
|---|---|---|---|---|
| Mission Australia | 100.00 | 95.92 | 95.92 | 97.28 |
| Green Corps | 95.65 | 53.06 | 97.92 | 82.21 |
| White Lion | 100.00 | 100.00 | 97.92 | 99.31 |
| Total | | | | 92.93 |

14 As mentioned previously, the calibration display (set on a gray background) was presented prior to each Web page viewed by participants.

15 It is not possible to verify this in our data, as a time stamp was not generated for the mouse click action.

It seems likely that finding the environmental and heritage benefits contributed by Green Corps was more open to subjective judgments of completion by participants than the other tasks. On reviewing the data, it appears that simply finding the words “environment” and “heritage” on the same page may have prompted some participants to end their search. Although all tasks had been constructed with a specific end page in mind, on reflection the other eight tasks appear to have more specific goals, for example, finding the name of a committee member or a specific file on the Web sites. While Green Corps Task 2 is a limitation of this study, over all tasks the vast majority of participants found the target Web pages chosen by the experimenters.

3.2. Were the participants paying attention?

It is difficult to understand the psychophysiological connection between pupil dilation and cognitive load. However, researchers have been reporting this phenomenon for nearly a century now. The German psychiatrist Bumke, in his 1911 review of the relevant German literature, proposed that “in general every active intellectual process … produces pupil enlargement” (as cited by Hess, 1972, p. 492). A fairly consistent pattern of findings has been reported in the literature, with recordings of participants’ pupil sizes increasing during more difficult tasks when compared to those recorded during easier tasks. This effect is often reported when participants are engaged in mathematical problem solving of varying difficulty (Hess and Polt, 1964; Schaeffer et al., 1968; Boersma et al., 1970; Ahern and Beatty, 1979; Steinhauer et al., 2000; Stone et al., 2004).

Luminance can affect the reliability of pupil dilation data; therefore, a preparatory study was performed to assess whether task-evoked pupil dilations could be reliably detected under experimental conditions similar to those used in this study. Stone et al. (2004) found that, using the same eye-tracker, CRT monitor, room and lighting conditions, participants’ pupil sizes were larger whilst engaged in more difficult mathematical subtraction tasks when compared to their pupil size during easier adding tasks. Moreover, this pattern was consistently found when participants undertaking these tasks were directed to focus on 25 evenly spaced targets that were positioned inside the cells of a five-by-five grid on the CRT monitor. This study indicated that greater cognitive load could be detected using the pupil width measure across a wide range of focal points on the CRT monitor.

Fig. 3 displays participants’ pupil width data both overall and on each Web site in relation to the time spent searching each Web page. Using eye data collected on all three Web sites, pupil width z-scores were calculated individually for each participant. This standardization of participants’ pupil widths was used to control for individual differences in pupil size. Furthermore, participants’ reading speed, cognitive processing speed and time spent on any given page vary greatly; therefore, to control for these variations, time spent on the page has been divided into deciles, as opposed to presenting it in seconds, which would have confounded these individual differences in participants’ performance.

It is interesting to note that in nearly all cases participants’ pupil width increased over the time spent on each Web page. The only deviation from this trend is shown on the White Lion Web site, where pupil size decreases in the first 20% of time spent on the Web pages. Greater luminance causes the participants’ pupils to constrict. Therefore, it is likely that the decrease in participants’ pupil size at the beginning of each White Lion page can be attributed to the brighter pages on the White Lion Web site when compared to the lower luminance calibration display.14 This conclusion is also supported by the lower z-scores recorded on the White Lion Web site when compared to either the Green Corps (darkest) or Mission Australia (mid range) Web sites.

Assuming pupil dilation is a measure of the participants’ cognitive load, larger standardized values will reflect greater cognitive processing by the participant. In Fig. 3, it is clearly shown that as participants spend more time on each Web page, the width of the pupil increases. This suggests that as participants spent more time on each page, their cognitive load also increased. It is likely that these cognitive load increases are related to the decision processes undertaken by the participants as they assess an appropriate navigation path to complete their search goal (e.g., what information is important on the Web page, and whether to click on to a more relevant page).

After the ninth decile pupil width decreases in all cases. This probably reflects the time period after which participants have either found the target or have clicked on a link15 to pursue and are waiting for the browser to load the

Fig. 3. Standardized pupil width during participants’ fixations while they were performing goal-oriented Web page navigation. Time spent searching each page is delineated into deciles. [Panels: All Web sites, Mission Australia, Green Corps, White Lion; x-axis: Percentage of Time on Page; y-axis: Standardized Pupil Width (mean).]


next page. In both of these cases the next page participants are taken to is the calibration page. Moreover, in both scenarios one would expect a reduction in participants’ cognitive load during this time when compared to that produced by participants during time spent reading and assessing page content for information relevant to search goals.
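The two normalization steps described in this section, per-participant standardization of pupil width and division of time on page into deciles, can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' analysis code: the function names and the `(participant_id, pupil_width)` input layout are hypothetical, and the population standard deviation is used because the paper does not say which estimator was applied.

```python
from statistics import mean, pstdev


def zscore_by_participant(samples):
    """Standardize pupil widths separately for each participant.

    samples: list of (participant_id, pupil_width) tuples (hypothetical layout).
    Returns z-scores in the same order as the input.
    """
    widths = {}
    for pid, width in samples:
        widths.setdefault(pid, []).append(width)
    # Assumption: population SD; the source does not specify the estimator.
    stats = {pid: (mean(w), pstdev(w)) for pid, w in widths.items()}
    scores = []
    for pid, width in samples:
        mu, sd = stats[pid]
        scores.append((width - mu) / sd if sd > 0 else 0.0)
    return scores


def decile_of(time_on_page, page_duration):
    """Map an elapsed time within a page view to a decile from 1 to 10."""
    if page_duration <= 0:
        raise ValueError("page_duration must be positive")
    clamped = min(max(time_on_page, 0.0), page_duration)
    return min(int(10 * clamped / page_duration) + 1, 10)
```

Expressing time as a decile of each page view, rather than in seconds, puts fast and slow readers on a common scale, which is the motivation given in the text.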

16 In the case of negative BIC values, higher scores are closer to zero.

3.3. Ten models compared using the Bayesian Information Criterion

The ten models are compared in this section using the Bayesian Information Criterion (BIC; Schwarz, 1978). These models include seven Semantic Fields (SF) models. One of these SF models, word overlap, does not use a knowledge base; evaluations are made only on the co-occurrences between words in Web page element text and the textual description of user goals. The other six SF models are more complex and use backgrounding documents: half of these SF models used the TASA-WEB corpus, while the other half used WIKI-WEB corpora as a knowledge base. Furthermore, three semantic models (LSA, SpNMF, and Vectorspace) were used to compare goal text to element text in these six SF models. So, for the six more complex SF models, there is a two (corpus) by three (semantic model) experimental design. The other three models compared here are the Flat, Non-Flat, and No-Model conditions.

The BIC is an appropriate method for selecting the best model for the eye-tracking data, because it adjusts for the number of parameters going into the models. Examining Eq. (8) shows that when the number of model parameters (k) is zero, everything after the minus sign becomes zero, and comparing BIC scores in these cases is equivalent to comparing the models’ log-likelihoods (LL). No parameters are set in the log-likelihood calculations for most of these models, with the exception of the maximized log-likelihood equation for the Non-Flat Model, which has one parameter (w). Therefore, in most cases, the BIC scores can be viewed as a comparison of log-likelihoods. Furthermore, higher BIC scores16 indicate a better model of the data.

$$\mathrm{BIC} = LL - 0.5 \times k \times \log(N) \qquad (8)$$
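Eq. (8) is simple enough to express directly. The sketch below is illustrative only: the function name is hypothetical, and the natural logarithm is assumed because that is the usual convention for the BIC. The value of N is the number of data points reported in this section (810,556).

```python
import math


def bic(log_likelihood, k, n):
    """BIC in the form of Eq. (8): LL - 0.5 * k * log(N); higher is better."""
    return log_likelihood - 0.5 * k * math.log(n)  # natural log assumed


N = 810_556  # total number of data points reported in the text

# With k = 0 the penalty vanishes and BIC equals the log-likelihood.
# With k = 1 (as for the Non-Flat model) the penalty is 0.5 * log(N).
penalty_one_parameter = 0.5 * 1 * math.log(N)
```

Under these assumptions a single parameter costs fewer than seven log-likelihood units, which is small next to the BIC differences between models reported in this section.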

The results displayed in Table 3 are presented in descending order of their BIC scores. By Jeffreys’ criterion, a ratio in likelihoods of 1:100 is decisive. This criterion corresponds to a difference in log-likelihoods of 4.6. The reader will notice that the differences between the

Table 3
Comparison of Bayesian Information Criterion (BIC) statistics calculated from log-likelihoods generated for all 10 models.

| Corpora | Model | BIC | Difference |
| --- | --- | --- | --- |
| WIKI-WEB | Vectorspace | −10,983,963 | 0 |
| WIKI-WEB | SpNMF | −10,989,551 | −5588 |
| WIKI-WEB | LSA | −10,990,633 | −6670 |
| TASA-WEB | Vectorspace | −10,993,390 | −9427 |
| TASA-WEB | SpNMF | −11,001,041 | −17,078 |
| TASA-WEB | LSA | −11,001,286 | −17,323 |
| – | No-Model | −11,005,111 | −21,148 |
| – | Overlap | −11,094,311 | −110,348 |
| – | Non-Flat | −11,287,790 | −303,827 |
| – | Flat | −11,417,562 | −443,599 |


BIC scores here are large by comparison; this is driven by the fact that there are a total of 810,556 data points.
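The decisiveness threshold and the Difference column of Table 3 can be checked with a few lines of arithmetic. The sketch below uses a subset of the BIC values from Table 3; the dictionary layout and helper name are hypothetical, and the threshold assumes natural logarithms, so that a 1:100 likelihood ratio corresponds to log(100) ≈ 4.6.

```python
import math

# Jeffreys' decisive threshold on the log scale (natural log assumed).
DECISIVE = math.log(100)

# Subset of BIC scores taken from Table 3, keyed by (corpus, model).
bic_scores = {
    ("WIKI-WEB", "Vectorspace"): -10_983_963,
    ("WIKI-WEB", "SpNMF"): -10_989_551,
    ("WIKI-WEB", "LSA"): -10_990_633,
    ("TASA-WEB", "Vectorspace"): -10_993_390,
    ("TASA-WEB", "SpNMF"): -11_001_041,
    ("TASA-WEB", "LSA"): -11_001_286,
    (None, "No-Model"): -11_005_111,
}

best = max(bic_scores.values())
differences = {model: score - best for model, score in bic_scores.items()}


def is_decisive(score_a, score_b):
    """True when two scores differ by more than the decisive threshold."""
    return abs(score_a - score_b) > DECISIVE
```

Even the smallest gap among these models, between the two best WIKI-WEB models, exceeds the threshold by three orders of magnitude.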

There were three interesting trends displayed in the results. First, BICs were higher for the six corpus-based Semantic Fields models than for the other models; therefore, the Semantic Fields models provided a better fit for the eye-tracking data than the baseline models. Moreover, as would be expected, the simple Flat Model performed the worst, followed by the Non-Flat Model, and then the No-Model; the latter two baseline models were expected to perform better as they contain information about the structure of the Web page display. Second, corpus choice appeared to affect SF model performance. The SF models using the WIKI-WEB corpora outperformed those using the TASA-WEB corpora in all instances. Third, the semantic models that were used in the SF models produced a main effect. Using either corpus as a knowledge base, Vectorspace was consistently the best performing model, followed by SpNMF and then LSA.

The poor performance of the Semantic Fields model that used the word overlap model can be explained in part by the way the Semantic Fields model calculates “heat”. Using the word overlap method, when there is no overlap between the search goal terms and the words contained in the Web page element, the semantic value (L) of Eq. (1) is equal to zero. Furthermore, when L is equal to zero, there is no spreading activation for that element. When the performance of the Semantic Fields model that uses word overlap is compared against the No-Model condition (which keeps L constant at 1.0), it is apparent that this aspect of the overlap model hurts the performance of the Semantic Fields model (Table 3). While it has not been tested here, it is possible that a Semantic Fields model that uses an augmented word overlap value (e.g., 0.1 + L) for the semantic component would produce better results.
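The zero-overlap problem and the suggested augmentation can be illustrated with a toy overlap measure. The sketch below is an assumption-laden illustration, not the paper's implementation: the exact overlap measure and Eq. (1) are not reproduced here, so the proportion-of-goal-terms normalization, the whitespace tokenization, and both function names are hypothetical. Only the offset of 0.1 comes from the text.

```python
def word_overlap(goal_text, element_text):
    """Proportion of distinct goal words that also occur in the element text.

    Hypothetical normalization; returns 0.0 when no words are shared.
    """
    goal_words = set(goal_text.lower().split())
    element_words = set(element_text.lower().split())
    if not goal_words:
        return 0.0
    return len(goal_words & element_words) / len(goal_words)


def augmented_overlap(goal_text, element_text, offset=0.1):
    """Overlap value shifted by a small constant, as in the 0.1 + L suggestion."""
    return offset + word_overlap(goal_text, element_text)


goal = "current job vacancies"
print(word_overlap(goal, "employment opportunities"))       # no shared words
print(augmented_overlap(goal, "employment opportunities"))  # small but non-zero
```

With the offset, an element that shares no words with the goal still receives a small semantic value, so its location can contribute to the field instead of being ignored.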

Therefore, given the way the Semantic Fields model calculates “heat”, it is unsurprising that the Semantic Fields model that uses the word overlap model has performed badly. Unlike the corpus-based semantic models, as mentioned above, the word overlap model will only find similarity when there is an overlap of words between textual elements on a Web page and the textual description of a task. In contrast, models such as Vectorspace, LSA and SpNMF have a greater pool of words and their associates contained in the corpus to draw on when making these similarity estimates. Finding even a small degree of similarity, based on an associated word that has appeared in the same corpus document (i.e., not the exact matching of words required by the word overlap model), will generate SF model “heat” from its source location within the structure of a Web page.
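A small example shows how a corpus-based comparison can find similarity without any shared words. The sketch below is a simplified vector space comparison under loudly stated assumptions: the three-document corpus is invented, terms are represented by raw counts across documents with no weighting scheme, and all function names are hypothetical. It is not the Vectorspace implementation used in the study.

```python
import math
from collections import Counter

# Invented toy corpus standing in for a knowledge base.
corpus = [
    "job vacancies and employment opportunities",
    "employment careers and staff positions",
    "heritage and environment projects",
]


def term_vectors(docs):
    """Represent each term by its raw counts across the corpus documents."""
    vectors = {}
    for j, doc in enumerate(docs):
        for word, count in Counter(doc.lower().split()).items():
            vectors.setdefault(word, [0] * len(docs))[j] = count
    return vectors


def text_vector(text, vectors, n_docs):
    """Sum the term vectors of the words in a text; unknown words are skipped."""
    total = [0] * n_docs
    for word in text.lower().split():
        for j, value in enumerate(vectors.get(word, [0] * n_docs)):
            total[j] += value
    return total


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = term_vectors(corpus)
goal = text_vector("job vacancies", vectors, len(corpus))
related = text_vector("employment careers", vectors, len(corpus))
unrelated = text_vector("heritage projects", vectors, len(corpus))
similarity_related = cosine(goal, related)
similarity_unrelated = cosine(goal, unrelated)
```

Here “job vacancies” and “employment careers” share no words, yet their similarity is positive because “employment” appears in the same corpus document as the goal terms. A word overlap measure would return zero for the same pair.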

3.4. How well does the VEC-SF model using the WIKI-WEB corpus predict the eye data?

As reported above, the Semantic Fields model that used Vectorspace with the WIKI-WEB corpus provided the best performance when estimating the location of participants’ eye-points during Web navigation. To provide the reader with a visual illustration of the Semantic Fields generated using both Vectorspace and the WIKI-WEB corpus, examples of Semantic Field maps for some of the target pages on each Web site are displayed in Figs. A1–A3 in Appendix A. Fig. 4 displays the mean SF values using Vectorspace with the WIKI-WEB corpus for each page viewed by participants. Following the same line of reasoning that was outlined in Section 3.2, time spent on page has been divided into deciles to avoid possible confounds produced by individual differences in participants’ information processing rates.

During complex tasks that require reading, such as goal-oriented Web page search, the location of participants’ eye fixations is generally an indicator of goal-specific attentional processes (Rayner, 1998, p. 375). To accomplish their task, participants engaged in visual search must seek out locations of goal-relevant information on the display. If the Semantic Fields model is able to capture some of the variability in participants’ eye movements, then one would expect that Semantic Field values located at participants’ eye focal points will increase as participants spend more time viewing a page. On average, participants spent 14.633 s on a Web page. On all three Web sites, it appears that after an initial orienting phase, participants’ eye movements move toward areas of greater SF values in the first 20% (approximately 2.927 s) of their time spent on a Web page. Paired sample t-tests indicated that this increase in mean Semantic Field value displayed between the first and second deciles was significant for all Web sites (see Table 4).

As mentioned in the pupil section, it is possible that the sharp drop off in the last 10% of time spent on the page (approximately 1.4 s) captures eye movement between the click of a link and the browser moving onto the next page. Comparing the time leading up to this drop, but after the initial rise in Semantic Field value (first two deciles), highlights some interesting differences between the distributions of Semantic Field values on these Web sites. On Mission Australia, between deciles two and nine, time spent on Web page significantly predicted Semantic Field

Fig. 4. Semantic Field values (Vectorspace with the WIKI-WEB corpus) calculated for participant eye-points during goal-oriented Web page navigation. Time spent searching each page is delineated into deciles. [Panels: Mission Australia, Green Corps, White Lion; y-axis: Semantic Field Value (mean).]

Table 4
Paired sample t-tests between mean Semantic Field values calculated in the first and second deciles of time spent on a Web page. μ1 and σ1 correspond to the mean and SD for the first decile, while μ2 and σ2 correspond to the mean and SD for the second decile. Results are shown for each of the Web sites: Mission Australia (MA), Green Corps (GC) and White Lion (WL).

| Web site | μ1 | σ1 | μ2 | σ2 | t(48) | p |
| --- | --- | --- | --- | --- | --- | --- |
| MA | 0.907 | 0.110 | 1.017 | 0.108 | 7.392 | <0.05 |
| GC | 1.273 | 0.153 | 1.454 | 0.173 | 6.477 | <0.05 |
| WL | 0.512 | 0.104 | 0.570 | 0.126 | 4.097 | <0.05 |
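Table 4 reports paired-sample t-tests with 48 degrees of freedom, which corresponds to one pair of decile means for each of the 49 participants. A minimal sketch of that test statistic is given below. The function name and the example numbers are hypothetical, and the raw per-participant values behind Table 4 are not available here, so the table itself cannot be reproduced from this code.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-sample t statistic and degrees of freedom for second - first."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1


# Made-up decile means for four participants, for illustration only.
decile_one = [1.0, 2.0, 3.0, 4.0]
decile_two = [1.5, 2.7, 3.4, 4.9]
t_value, df = paired_t(decile_one, decile_two)
```

Because the test works on within-participant differences, between-participant variation in overall Semantic Field level does not inflate the error term.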

Fig. 5. Average Task Duration (in s) with standard error bars for the Mission Australia (MA), Green Corps (GC), and White Lion (WL) Web sites. [x-axis: Web Sites; y-axis: Average time spent on each task (sec.).]

17 The average time between deciles two and nine for all Web sites.


value scores, β = 0.022, t(390) = 8.193, p < 0.05. Time spent on page also explained a significant proportion of the variance in Semantic Field values during this epoch, R² = 0.145, F(1,390) = 67.12, p < 0.05. One explanation for this positive linear relationship is that participants are not immediately finding a good match to their task goals on the Mission Australia Web pages, and are forced to hunt around the page before picking up on a stronger information scent. In contrast, this pattern is not displayed during observations collected between deciles two and nine on either the Green Corps or White Lion Web sites. Instead, for these two Web sites, we fail to reject the null hypothesis that the slope of the regression line for these data is flat. A flat level of Semantic Field value during this approximately 11.706 s17 could indicate that the participants are reading information related to their goals and deciding whether to continue searching (maybe by following a link) or terminate their search. Therefore, in comparison to the other Web sites, Mission Australia appears to be more difficult for users to navigate.

These interpretations illustrate how the Semantic Fields model could be used to provide instructive feedback to designers about the usability of their Web sites. Indeed, the observation made using the Semantic Fields values above (that the Mission Australia Web site is more difficult for the users to navigate) is supported by the average times taken to complete the tasks for each Web site. Mission Australia tasks took users the longest to complete, followed by Green Corps tasks, and then White Lion tasks (see Fig. 5). These task durations correspond to the average number of page views it took participants to complete the tasks. As mentioned in Section 2.4, these were five page views on the Mission Australia Web site, four page views on the Green Corps Web site, and three page views on the White Lion Web site.
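The regression of Semantic Field value on time spent on page (deciles two to nine) has the form of an ordinary least-squares fit with one predictor. The sketch below is a plain-Python illustration with invented data, not the authors' analysis; the function name and the returned keys are hypothetical. It also demonstrates a standard identity for single-predictor regression, F = t², which is consistent with the statistics reported in this section (8.193² ≈ 67.12).

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y on a single predictor x, with t, F and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    df = n - 2  # residual degrees of freedom
    t_slope = slope / math.sqrt(ss_res / df / sxx)
    f_stat = (ss_tot - ss_res) / (ss_res / df)
    return {"slope": slope, "t": t_slope, "F": f_stat,
            "r2": 1 - ss_res / ss_tot, "df": df}


# Invented mean Semantic Field values for deciles two to nine.
deciles = [2, 3, 4, 5, 6, 7, 8, 9]
sf_values = [1.0, 1.1, 1.05, 1.2, 1.15, 1.3, 1.25, 1.4]
fit = simple_ols(deciles, sf_values)
```

The reported residual degrees of freedom (390) would be consistent with 392 observations, for example 49 participants by eight deciles, although the text does not state this explicitly.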


3.5. Are the participants skimming?

Several researchers have reported that users skim Web pages for target information (Blackmon et al., 2003; Spool et al., 1998; Nielsen, 1997). Therefore, it is possible that the Vectorspace model’s better estimates of similarity between the search goal and the text displayed on a Web page are a reflection of the Web users’ skim reading or scanning for information. More specifically, simpler models like Vectorspace rely more heavily on the co-occurrence of words to make their similarity estimate, when compared to models such as LSA and SpNMF that use sophisticated dimensionality reduction techniques that attempt to extract deeper underlying themes or concepts that manifest from a knowledge base. It may be that this emphasis on the co-occurrence of terms better approximates the skim reading process. However, it is likely that the last page viewed by participants will be read more thoroughly, because based on the content of these Web pages, the participants have decided to terminate their search. It is likely that this deeper comprehension is also modeled when LSA is successfully used to grade essays (Foltz et al., 1999). This raises the question: are the Semantic Fields models that use LSA the best models for the data on the last page viewed by participants?

The BIC scores for the Semantic Fields model log-likelihoods on the last pages viewed by participants during each search task are presented in Table 5. As predicted, the best Semantic Field model of the eye-tracking data uses LSA in the semantic component. Again there is a clear delineation between corpora, with the WIKI-WEB corpora providing the best knowledge base for the semantic models. It is also interesting to note that LSA did not perform as well as the other semantic models using the TASA-WEB corpus, which is consistent with the idea that TASA-WEB does not contain the information LSA needs to generate similarity estimates that reflect deeper comprehension.

These results again indicate that there is a clear advantage to using the domain specific WIKI-WEB corpora. It intuitively makes sense that a corpus that is targeted toward a domain of knowledge is more likely to produce interpretable results than a mostly generic corpus such as

Table 5
Comparison of Bayesian Information Criterion (BIC) statistics calculated from log-likelihoods generated from the last pages viewed by participants for each search task. The Semantic Fields models examined used Vectorspace, LSA and SpNMF, and either the WIKI-WEB or TASA-WEB corpora.

| Corpora | Model | BIC | Difference |
| --- | --- | --- | --- |
| WIKI-WEB | LSA | −2,203,232 | 0 |
| WIKI-WEB | Vectorspace | −2,204,173 | −941 |
| WIKI-WEB | SpNMF | −2,204,952 | −1720 |
| TASA-WEB | Vectorspace | −2,231,348 | −28,116 |
| TASA-WEB | SpNMF | −2,239,611 | −36,379 |
| TASA-WEB | LSA | −2,242,067 | −38,835 |

TASA-WEB when comparing a model’s ability to extract semantic representations from that domain. Furthermore, this finding is consistent with other research that used semantic models to assess paragraph similarity, in which domain specific corpora provided a better representation of the knowledge required to make similarity judgments than generic corpora (Stone et al., 2011).

The reader will have observed that the semantic models do not retain a constant ordering (see Table 5) as they did when all of the Web pages were included in the modeling set (see Table 3). As noted above, it is likely that the targeted WIKI-WEB corpora are providing a better representation of the knowledge domain than the TASA-WEB corpora. This better representation of the knowledge domain is likely to be particularly important on these final pages, where the user is making the decision to terminate a search task because the page contents are the most similar (of all viewed Web pages) to their search goal. Therefore, on these final pages, it may not be fair to expect consistent performance from a semantic model when its underlying knowledge base (i.e., TASA-WEB) is found to be inadequate.

4. Discussion

In this paper we have evaluated the Semantic Fields models’ ability to estimate the location of 49 participants’ eye movements during goal-oriented search tasks on three Web sites. It was found that all corpus-based Semantic Fields models performed better than other models that depended solely on display characteristics. Particularly encouraging was the finding that the Semantic Fields models outperformed the No-Model condition. The No-Model condition is essentially the Semantic Fields model with textual similarity estimates held constant at one. Incorporating the same decay function as the Semantic Fields model, the No-Model condition produces heat solely based on the location of Web page elements. That the Semantic Fields model outperforms the No-Model condition indicates that the semantic component of the Semantic Fields model is providing more to the fit of this model to the eye-tracking data than can be produced by the display component alone. While the Semantic Fields models provided the best fit to the human eye-tracking data in this study, there were some interesting performance differences between each of the corpus-based Semantic Fields models. These performance differences were introduced by the manipulation of both corpora and semantic models used by the Semantic Fields model.

LSA has been the focus of much of the statistical lexical semantics research in recent years. That LSA has successfully been used to grade essays (Foltz et al., 1999) is a testament to the overall usefulness of this model. It was therefore surprising that, when considered over all Web pages, a much simpler model like Vectorspace would consistently outperform more complex models such as LSA and SpNMF when generating similarity comparisons of text in this study. It is possible that Vectorspace’s good performance is driven by a better match to processes like keyword matching (or word co-occurrence) that underlie skimming. While it is conceivable that to a large degree the participants are skim reading to locate their search goals, it is less likely that they could use this strategy to successfully terminate their goal-oriented searches. Consistent with this idea, the Semantic Field model that incorporated LSA using the WIKI-WEB corpus was the best model for the last pages viewed by participants on each search task. This is an interesting finding because it suggests that different semantic models could possibly be used to estimate data collected while people are engaged in different reading strategies. Furthermore, a model like LSA is particularly interesting from this point of view, because when left at full dimensionality LSA is the Vectorspace model, and thus could potentially be used to model either type of reading strategy.
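The remark that LSA at full dimensionality is the Vectorspace model can be made concrete with a short derivation. The sketch below assumes the usual formulation of LSA as a truncated singular value decomposition of a (possibly weighted) term-by-document matrix, with term vectors taken as the rows of $U_k \Sigma_k$. The symbols $X$, $U$, $\Sigma$, $V$, $T_k$, $r$ and $k$ are notation introduced here for illustration and do not appear in the paper.

```latex
\begin{aligned}
X &= U \Sigma V^{\top}, \qquad U^{\top} U = V^{\top} V = I_r, \quad r = \operatorname{rank}(X) \\
T_k &= U_k \Sigma_k \qquad \text{(term vectors with } k \le r \text{ dimensions retained)} \\
T_r T_r^{\top} &= U \Sigma \Sigma^{\top} U^{\top}
              = U \Sigma V^{\top} V \Sigma^{\top} U^{\top}
              = X X^{\top}
\end{aligned}
```

When all $r$ dimensions are kept, every inner product between term vectors equals the corresponding inner product between rows of $X$, so vector lengths and cosines agree as well. Under these assumptions the two models give identical similarity estimates at $k = r$ and differ only once dimensions are discarded ($k < r$).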

Drawing targeted corpora for the semantic models from the larger corpus of Wikipedia provided better knowledge-bases for the semantic models than a more traditional corpus like TASA. The TASA corpus has been hand-picked to broadly represent the expected general knowledge of a first-year American college student (Dennis, 2007). However, the findings in this study indicate that for some semantic models (Vectorspace, LSA and SpNMF), semi-automated corpora generation using Wikipedia provides a better base to compare the similarity of textual information. That said, the generation of Wikipedia sub-spaces in this research was based on very simple Boolean queries. Greater focus on the formulation of these Lucene queries may increase the performance of semantic models when calculating text similarity and thereby conceivably produce better estimates of eye-tracking data by the Semantic Fields model.

The pupil width measure indicated that participants’ cognitive load increased as they spent more time on each Web page. Overall, most participants ended their goal-oriented search on the same pages that were initially identified by the experimenters when constructing the search tasks. Taken together, these findings support the notion that participants were actively searching out their target goals during this experiment. Given that participants were predominantly “on-task” during this experiment, it was also pleasing to find that the corpus-based Semantic Fields models were able to outperform the other models under these circumstances.

5. Conclusions

Both Web page semantics and display characteristics determine the success with which a user will be able to find information on a Web page. The Semantic Fields model incorporates both of these characteristics, and was found to provide better estimates of participants’ eye movements during goal-oriented search than could be generated by solely display-based models. Choices of both the semantic model and knowledge base affected the performance of the semantic component that is used by the Semantic Fields model. Contrary to expectations, a relatively simple semantic model, Vectorspace, outperformed more complex semantic models that employ dimensionality reduction. Also, better approximations to the knowledge required to successfully estimate textual similarity were produced by targeted corpora drawn from Wikipedia when compared to those found using the more generic TASA corpus. Overall, the Semantic Fields model that used both Vectorspace and a targeted corpus drawn from Wikipedia was found to be the best performing model when estimating participants’ eye movements during goal-oriented search tasks in this study.

Appendix A. Examples of Web site target pages with Semantic Fields maps generated using Vectorspace and WIKI-WEB

See Figs. A1–A3.

Fig. A1. Mission Australia, “You are interested in working for Mission Australia. Search their Web site for the current job vacancies available at Mission Australia.”

Fig. A2. Green Corps, “Find what environmental and heritage benefits are contributed by Green Corps.”

Fig. A3. White Lion, “Find out who is the current President of White Lion.”

References

Ahern, S., Beatty, J., 1979. Pupillary responses during information processing varying with scholastic test scores. Science 205 (4412), 1289–1292.

Blackmon, M.H., Kitajima, M., Polson, P.G., 2003. Repairing usability problems identified by the Cognitive Walkthrough for the web. In: Cockton, G., Korhonen, P. (Eds.), CHI’03: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, pp. 497–504.

Blackmon, M.H., Kitajima, M., Polson, P.G., 2005. Tool for accurately predicting website navigation problems, non-problems, problem severity, and effectiveness of repairs. In: van der Veer, G., Gale, C. (Eds.), CHI’05: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, pp. 31–40.

Blackmon, M.H., Mandalia, D.R., Polson, P.G., Kitajima, M., 2007. Automating usability evaluation: Cognitive Walkthrough for the Web puts LSA to work on real-world HCI design problems. In: Landauer, T.K., McNamara, D.S., Dennis, S., Kintsch, W. (Eds.), Handbook of Latent Semantic Analysis. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 345–375.

Blackmon, M.H., Polson, P.G., Kitajima, M., Lewis, C., 2002. Cognitive Walkthrough for the Web. In: Wixon, D., Terveen, L. (Eds.), CHI’02: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, pp. 463–470.

Blei, D., Ng, A.Y., Jordan, M., 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (4–5), 993–1022.

Boersma, F., Wilton, K., Barham, R., Muir, W., 1970. Effects of arithmetic problem difficulty on pupillary dilation in normals and educable retardates. Journal of Experimental Child Psychology 9 (2), 142–155.

Brumby, D.P., Howes, A., 2003. Interdependence and past experience in menu choice assessment. In: Alterman, R., Kirsh, D. (Eds.), Proceedings of the 25th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, Mahwah, NJ, p. 1320.

Budiu, R., Royer, C., Pirolli, P., 2007. Modeling information scent: a comparison of LSA, PMI and GLSA similarity measures on common tests and corpora. In: Proceedings of the 8th Annual Conference of the Recherche d’Information Assistee par Ordinateur (RIAO). Centre des Hautes Etudes Internationales d’Informatique Documentaire, Pittsburgh, PA.

Burgess, C., Lund, K., 1997. Modelling parsing constraints with high-dimensional context space. Language and Cognitive Processes 12 (2–3), 177–210.

Cai, Z., McNamara, D.S., Louwerse, M.M., Hu, X., Rowe, M., Graesser, A.C., 2004. NLS: a non-latent similarity algorithm. In: Forbus, D.G., Regier, T. (Eds.), Proceedings of the 26th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 180–185.

Chi, E.H., Pirolli, P., Chen, K., Pitkow, J., 2001. Using information scent to model user information needs and actions and the Web. In: Beaudouin-Lafon, M., Jacob, R.J.K. (Eds.), CHI’01: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, pp. 490–497.

Chi, E.H., Rosien, A., Supattanasiri, G., Williams, A., Royer, C., Chow, C., Robles, E., Dalal, B., Chen, J., Cousins, S., 2003. The bloodhound project: automating discovery of web usability issues using the InfoScent p simulator. In: Cockton, G., Korhonen, P. (Eds.), CHI’03: Proceedings of the SIGCHI Conference on Human Factors in


Cox, A., Young, R.M., 2004. A rational model of the effect of

information scent on the exploration of menus. In: Lovett, M.C.,

Schunn, C.D., Lebiere, C., Munro, P. (Eds.), Proceedings of the Sixth

International Conference on Cognitive Modeling. Lawrence Erlbaum

Associates, Mahwah, NJ, pp. 82–87.

Dennis, S., 2007. How to use the LSA web site. In: Landauer, T.K.,

McNamara, D.S., Dennis, S., Kintsch, W. (Eds.), Handbook of

Latent Semantic Analysis. Lawrence Erlbaum Associates, Mahwah,

NJ, pp. 57–70.

Dumais, S.T., 1991. Improving the retrieval of information from external

sources. Behavior Research Methods, Instruments, & Computers 23

(2), 229–236.

Faraday, P., 2000. Visually critiquing web pages. In: Proceedings of the 6th

Conference on Human Factors and the Web. Available at /http://

www.tri.sbc.com/hfweb/faraday/faraday.htmS. Accessed October 20,

2006.

Faraday, P., 2001. Attending to Web pages. In: CHI’01: CHI’01 Extended

Abstracts on Human Factors in Computing Systems. ACM, New

York, NY, USA, pp. 159–160.

Fitts, P.M., Jones, R.E., Milton, J.L., 1950. Eye movement of aircraft

pilots during instrument-landing approaches. Aeronautical Engineer-

ing Review 9 (2), 24–29.

Foltz, P.W., Laham, D., Landauer, T.K., 1999. The intelligent essay

assessor: applications to educational technology. Interactive Multi-

media Electronic Journal of Computer-Enhanced Learning 1 (2).

Fu, W.-T., Pirolli, P., 2007. SNIF-ACT: a cognitive model of user

navigation on the World Wide Web. Human–Computer Interaction

22 (4), 355–412.

Gabrilovich, E., Markovitch, S., 2007. Computing semantic relatedness

using Wikipedia-based explicit semantic analysis. In: Veloso, M.M.

(Ed.), Proceedings of the 20th International Joint Conference on

Artificial Intelligence. AAAI Press, Menlo Park, CA, pp. 1606–1611.

Giles, J., 2005. Internet encyclopaedias go head to head. Nature 438

(7070), 900–901.

Graesser, A.C., Wiemer-Hastings, P., Wiemer-Hastings, K., Harter, D.,

2000. The Tutoring Research Group, 2000. Using Latent Semantic

Analysis to evaluate the contributions of students in AutoTutor.

Interactive Learning Environments 8 (2), 129–147.

Grier, R.A., 2004. Visual attention and web design. Ph.D. Thesis,

University of Cincinnati, Department of Psychology. Available at

/http://etd.ohiolink.edu/view.cgi?ucin1092767744S. Accessed Octo-

ber 20, 2006.

Griffiths, T.L., Steyvers, M., 2002. A probabilistic approach to semantic

representation. In: Gray, W.D., Schunn, C.D. (Eds.), Proceedings of

the 24th Annual Conference of the Cognitive Science Society.

Lawrence Erlbaum Associates, Mahwah, NJ, pp. 381–386.

Habuchi, Y., Kitajima, M., Takeuchi, H., 2008. Comparison of eye

movements in searching for easy-to-find and hard-to-find information

in a hierarchically organized information structure. In: Raiha, K.-J.,

Duchowski, A.T. (Eds.), ETRA 2008 Proceedings of the 2008:

Symposium on Eye Tracking Research & Applications. ACM, New

York, NY, pp. 131–134.

Habuchi, Y., Takeuchi, H., Kitajima, M., 2006. The influence of web

browsing experience on web-viewing behavior. In: Raiha, K.-J.,

Duchowski, A.T. (Eds.), ETRA 2006: Proceedings of the 2006

Symposium on Eye Tracking Research & Applications. ACM, New

York, NY, pp. 47.

Hess, E.H., 1972. Pupillometrics: a method of studying mental, emotional,

and sensory processes. In: Greenfield, N.S., Sternbach, R.A. (Eds.),

http://www.tri.sbc.com/hfweb/faraday/faraday.htm

http://www.tri.sbc.com/hfweb/faraday/faraday.htm

http://etd.ohiolink.edu/view.cgi?ucin1092767744


Handbook of Psychophysiology. Holt, Rinehart and Winston, New

York, pp. 491–531.

Hess, E.H., Polt, J.M., 1964. Pupil size in relation to mental activity

during simple problem-solving. Science 143 (3611), 1190–1192.

Kaur, I., Hornof, A.J., 2005. A comparison of LSA, WordNet and PMI-IR

for predicting user click behavior. In: van der Veer, G., Gale, C. (Eds.),

CHI ’05: Proceedings of the SIGCHI Conference on Human Factors in


Kitajima, M., Blackmon, M.H., Polson, P.G., 2000. A comprehension-

based model of Web navigation and its applications to Web usability

analysis. In: McDonald, S., Waern, Y., Cockton, G. (Eds.), People

and Computers XIV: Usability or Else! Proceedings of HCI 2000.

Springer, New York, pp. 357–373.

Kitajima, M., Blackmon, M.H., Polson, P.G., 2005. Cognitive

architecture for website design and usability evaluation: comprehension and

information scent in performing by exploration. In: HCI International

2005, vol. 4, Theories, Models and Processes in HCI.

Kwantes, P.J., 2005. Using context to build semantics. Psychonomic

Bulletin and Review 12 (4), 703–710.

Landauer, T.K., McNamara, D.S., Dennis, S., Kintsch, W., 2007. Handbook

of Latent Semantic Analysis. Lawrence Erlbaum Associates, Mahwah, NJ.

Lee, M.D., Pincombe, B.M., Welsh, M.B., 2005. An empirical evaluation
of models of text document similarity. In: Bara, B.G., Barsalou, L.W.,

Bucciarelli, M. (Eds.), Proceedings of the 27th Annual Conference of

the Cognitive Science Society. Lawrence Erlbaum Associates,

Mahwah, NJ, pp. 1254–1259.

Ling, J., Van Schaik, P., 2002. The effect of text and background color on

visual searches of web pages. Displays 23 (5), 223–230.

Ling, J., Van Schaik, P., 2004. The effect of link format and screen

location on visual search of web pages. Ergonomics 47 (8), 907–921.

McCarthy, J., Sasse, M.A., Riegelsberger, J., 2003. Could I have the menu

please? An eye tracking study of design conventions. In: O’Neill, E.,

Palanque, P., Johnson, P. (Eds.), People and Computers XVII:

Designing for Society. Proceedings of HCI 2003. Springer-Verlag,

London, UK, pp. 401–414.

Miller, C.S., Remington, R.W., 2004. Modeling information navigation:

implications for information architecture. Human–Computer

Interaction 19 (3), 225–271.

Miller, G.A., 1995. WordNet: a lexical database for English.

Communications of the ACM 38 (11), 39–41.

Nakayama, M., Takahashi, K., Shimizu, Y., 2002. The act of task

difficulty and eye-movement frequency for the ‘oculo-motor indices’.

In: Duchowski, A.T., Vertegaal, R., Senders, J.W. (Eds.), ETRA’02:

Proceedings of the 2002 Symposium on Eye Tracking Research &

Applications. ACM, New York, NY, pp. 37–42.

Namatame, M., Kitajima, M., 2008. Suitable representations of

hyperlinks for deaf persons: an eye-tracking study. In: Harper, S., Barreto,

A. (Eds.), Assets’08: Proceedings of the 10th International ACM

SIGACCESS Conference on Computers and Accessibility. ACM,

New York, NY, USA, pp. 247–248.

Nielsen, J., 1997. Be succinct! (writing for the Web). Nielsen’s Alertbox

for March 15, 1997.

Nielsen, J., 2006. F-shaped pattern for reading web content. Nielsen’s

Alertbox for April 17, 2006.

Nielsen, J., 2010. Scrolling and attention. Nielsen’s Alertbox for

March 22, 2010.

Pan, B., Hembrooke, H.A., Gay, G.K., Granka, L.A., Feusner, M.K.,

Newman, J.K., 2004. The determinants of web page viewing behavior:

an eye-tracking study. In: Duchowski, A.T., Vertegaal, R. (Eds.),

ETRA 2004: Proceedings of the Eye Tracking Research Application

Symposium. ACM, New York, NY, USA, pp. 147–154.

Pearson, R., Van Schaik, P., 2003. The effect of spatial layout and link

colour in web pages on performance in a visual search task and an

interactive search task. International Journal of Human–Computer

Studies 59 (3), 327–353.

Pirolli, P., 1997. Computational models of information scent-following in

a very large browsable text collection. In: Pemberton, S. (Ed.),

CHI’97: Proceedings of the SIGCHI Conference on Human Factors

in Computing Systems. ACM, New York, NY, pp. 3–10.

Pirolli, P., Card, S.K., 1999. Information foraging. Psychological Review

106 (4), 643–675.

Pirolli, P., Fu, W.-T., 2003. SNIF-ACT: a model of information foraging

on the World Wide Web. In: Brusilovsky, P., Corbett, A., de Rosis, F.

(Eds.), User Modeling 2003: 9th International Conference on User

Modelling. Springer-Verlag, Berlin, Germany, pp. 45–54.

Rayner, K., 1998. Eye movements in reading and information processing:

20 years of research. Psychological Bulletin 124 (3), 372–422.

Resnik, P., 1999. Semantic similarity in a taxonomy: an information based

measure and its application to problems of ambiguity in natural

language. Journal of Artificial Intelligence Research 11, 95–113.

Rigutti, S., Gerbino, W., 2004. Navigating within a web site: the WebStep

Model. In: Lovett, M.C., Schunn, C.D., Lebiere, C., Munro, P. (Eds.),

Proceedings of the Sixth International Conference on Cognitive

Modeling. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 378–379.

Salton, G., Wong, A., Yang, C.S., 1975. A vector space model for

automatic indexing. Communications of the ACM 18 (11), 613–620.

Schaeffer, T., Ferguson, J.B., Klein, J.A., Raeson, E.B., 1968. Pupillary

responses during mental activities. Psychonomic Science 12 (4), 137–138.

Schwarz, G., 1978. Estimating the dimension of a model. Annals of

Statistics 6 (2), 461–464.

Shashua, A., Hazan, T., 2005. Non-negative tensor factorization with

applications to statistics and computer vision. In: De Raedt, L.,

Wrobel, S. (Eds.), Proceedings of the 22nd International Conference

on Machine Learning. ACM Press, New York, NY, pp. 792–799.

Spool, J.M., Schroeder, W., Scanlon, T., Snyder, C., 1998. Web sites that

work: designing with your eyes open. In: Karat, C.M., Lund, A.,

Coutaz, J., Karat, J. (Eds.), CHI’98: CHI 98 Conference Summary on

Human Factors in Computing Systems. ACM, New York, NY, USA,

pp. 147–148.

Steinhauer, S.R., Condray, R., Kasparek, A., 2000. Cognitive modulation

of midbrain function: task-induced reduction of the pupillary light

reflex. International Journal of Psychophysiology 39 (1), 21–30.

Stone, B., Dennis, S., 2007. Using LSA Semantic Fields to predict eye

movement on web pages. In: McNamara, D.S., Trafton, J.G. (Eds.),

Proceedings of the 29th Annual Conference of the Cognitive Science Society.

Lawrence Erlbaum Associates, Mahwah, NJ, pp. 665–670.

Stone, B., Dennis, S., Kwantes, P.J., 2011. Comparing methods for

paragraph similarity analysis. Topics in Cognitive Science 3 (1), 92–122.

Stone, B., Lee, M., Dennis, S., Nettelbeck, T., 2004. Pupil size and mental

load. 1st Adelaide Mental Life Conference. Available at

http://www.psychology.adelaide.edu.au/cognition/aml/. Accessed April 2, 2009.

Turney, P., 2001. Mining the web for synonyms: PMI versus LSA on

TOEFL. In: De Raedt, L., Flach, P.A. (Eds.), ECML 2001: 12th

European Conference on Machine Learning. Springer, Berlin,

Germany, pp. 491–502.

Wu, S.-C., Miller, C.S., 2007. Preliminary evidence for top-down and

bottom-up processes in web search navigation. In: Rosson, M.B.,

Gilmore, D.J. (Eds.), CHI’07: CHI’07 Extended Abstracts on Human

Factors in Computing Systems, pp. 2765–2770.

Xu, W., Liu, X., Gong, Y., 2003. Document clustering based on non-

negative matrix factorization. In: Callan, J., Hawking, D., Smeaton,

A., Clarke, C. (Eds.), Proceedings of the 26th Annual International

ACM SIGIR Conference on Research and Development in Information

Retrieval (SIGIR ’03). ACM Press, New York, NY, pp. 267–273.
