

Search-Logger: Analyzing Exploratory Search Tasks

Georg Singer, Ulrich Norbisrath, Eero Vainikko, Hannu Kikkas

Institute of Computer Science, University of Tartu, J. Liivi 2, Tartu, Estonia

{FirstName.LastName}@ut.ee

Dirk Lewandowski
Hamburg University of Applied Sciences
Hamburg, Germany
[email protected]

ABSTRACT
In this paper, we focus on a specific class of search cases: exploratory search tasks. To describe and quantify their complexity, we present a new methodology and corresponding tools to evaluate user behavior when carrying out exploratory search tasks. These tools consist of a client called Search-Logger, a server-side database with a front-end, and an analysis environment. The client is a plug-in for the Firefox web browser. The assembly of the Search-Logger tools can be used to carry out user studies for search tasks independent of a laboratory environment. It collects implicit user information by logging a number of significant user events. Explicit information is gathered via user feedback in the form of questionnaires before and after each search task. We also present the results of a pilot user study. Some of our main observations are: When carrying out exploratory search tasks, classic search engines are mainly used as an entrance point to the web. Subsequently users work with several search systems in parallel, they have multiple browser tabs open, and they frequently use the clipboard to memorize, analyze and synthesize potentially useful data and information. Exploratory search tasks typically consist of various sessions and can span from hours up to weeks.

1. INTRODUCTION
"The ultimate search engine would basically understand everything in the world, and it would always give you the right thing. And we're a long, long ways from that." – Larry Page (Google founder, in an interview with BusinessWeek: http://www.businessweek.com/magazine/content/04_18/b3881010_mz001.htm)

Search engines like Google, Bing and Yahoo have become the means for searching information on the Internet, supported by other information search portals like Wikipedia and Ask.com. Although search engines are excellent in document retrieval, this strength also imposes a limitation.


As they are optimized for document retrieval, their support is less optimal when it comes to exploratory search tasks. Those are usually described as open-ended, abstract and poorly defined information needs with a multifaceted character [18, 29]. Exploratory search tasks are accompanied by ambiguity, discovery and uncertainty. When performing exploratory search tasks and "users lack the knowledge or contextual awareness to formulate queries or to navigate complex information spaces, or the search task requires browsing and exploration, system indexing of available information is inadequate" [26]. Such exploratory search tasks fulfill needs like learning, investigating or decision making. Usually exploratory search tasks require a high amount of interaction. A report by Microsoft [25] states that during exploratory searches only 1 in 4 queries is successful. In addition, complex queries have a 38% share and yield "a terrible satisfaction". Search engines also quickly reach their limits when the information seeker is entering a new domain [15]. Along with the increasing popularity of search engines, the areas of their application have grown from simple look-up to rather complex information seeking needs. Look-up searches follow the "query and response" retrieval paradigm [27]. Each time users enter a query, they get a list of possibly relevant results. Look-up searches are among the most basic types of search tasks. Usually they happen in the context of question answering and fact finding. Typically they are needed to answer who, when and where questions. They are not the means to answer why, what and how questions, which can be classified as exploratory search tasks. To cope with those information needs, where the available technologies do not directly produce a solution by query-answer alone, users have adopted an exploratory-search-like behavior (multiple queries, following links selectively, exploring the retrieved document space interactively). The more complex and exploratory a task-based information need becomes, the less appropriate a search engine appears as a single means to fulfill the task [27].

Search engine quality measurement initiatives have widely contributed to enhancing search engine quality for look-up searches, but the same is not yet true for exploratory search tasks [27]. The evaluation methodologies focus predominantly on the search system itself, not on the search process that humans need to follow in order to fulfill their search need. Only implicit information is gathered during the classic search engine quality measurement experiments, while explicit user feedback is seldom collected [17]. As White et al. stated [27], it is important to also integrate the behavior of users into the evaluation of exploratory search systems that have expanded beyond simple look-up.


Although search systems are supporting exploratory search tasks better today, their evaluation is still limited to those systems that rely on minimal human-machine interaction [15].

In this paper, we provide a method and a set of tools that allow researchers to carry out evaluation experiments for exploratory search tasks. Our approach tries to close the gap between evaluation methods purely focused on technical aspects and expensive laboratory methods. In contrast to classic search engine quality measurement experiments with their focus on non-user aspects, we will investigate how search engines and other information sources like Wikipedia [10], the quality of results, and usability are perceived by the information seeker when carrying out exploratory search tasks. Therefore we will look at the whole search task, not only at the individual query (for an explanation of the concepts task, query and session please refer to Section 3).

2. RELATED WORK
This section gives an overview of the scientific work in the areas of classical search engine performance measurement and user search experience research. Both fields are strongly connected when it comes to search process quality measurement as an integrated approach (machine and user). As B. Jansen stated, there is a tension between information searching and information retrieval. Despite their partly contradictory constructs, a trend towards convergence of both can be noticed [12]. Retrieval measures to assess the performance and quality of information retrieval systems have been used for more than 50 years. A set of new web-specific measures has been developed, yet those are still limited in giving a comprehensive quality indication of present search services [23]. Lewandowski and Hochstotter [17] proposed a search engine quality measurement framework that reflects both the more system-centric approach and the user-centric approach by taking index quality, quality of the results, quality of search features and search engine usability into consideration. Their main point of reasoning is that for measuring the user experience, empirical studies are indispensable. One type of research method to capture the user experience is the laboratory experiment: a user test group is assembled and the experiment is carried out in a laboratory environment. During these experiments sample sizes are typically small and therefore not very representative. A complementary approach is log file analysis. Although the shortcoming of small sample sizes does not exist in log file analysis, there is another one instead: log files are anonymous data files that lack any sort of additional demographic information about the user like gender, age or profession [9]. In addition, log files only include data gathered from a specific web site; e.g. when considering a search engine, one can collect data on all the interactions with the search engine, but not on the interactions taking place after the user leaves the search engine and examines the results. An approach that is positioned in between log file analysis and surveys is experience sampling via a browser plug-in. The plug-in is installed locally and logs the behavior of the user (in search engines, as well as on other websites visited) but can also be used to gather explicit user feedback by integrating questionnaires in a structured way. Fox et al. did experiments with a browser plug-in in 2005 [7]. They also added some automated questions to their experiments to collect user feedback. With this approach they got explicit feedback about search engine performance on the query level as well as on the session level.

From the session log files, Fox et al. gathered implicit measures like time spent on a page and time to first click. Each of the approaches of classical search engine evaluation covers some necessary aspects of an integrated search engine quality measurement framework, but none covers all. The shortcomings are mostly due to too little focus on the user. Therefore we also screen the research on evaluating exploratory search systems.

Exploratory search covers a different class of activities than classic information retrieval (IR). The latter mainly follows the query-response paradigm and only considers a query and its resulting documents. Exploratory search comprises more complex activities like investigating, evaluating, comparing, and synthesizing, where new information is sought in a defined conceptual area. As exploratory search systems (ESSs) strongly rely on human interaction, the evaluation of ESSs is an even more complex task [14]. Although an adequate set of measures has been found for search engine evaluation, the same is still missing for ESSs. In IR the Cranfield methodology [4] was used to objectively compare IR systems. It was also later used by the Text Retrieval Conference (TREC) as the main paradigm for its activities of large-scale evaluation of retrieval technologies [24]. While TREC is good at measuring technical aspects like underlying algorithms, it does not sufficiently take the human interaction factor into account. Researchers are still struggling to integrate the user into the measuring process (as in the TREC Interactive Track [6]). The main issue so far is the repeatability and comparability of experiments at different sites. Compared to classical IR (where relevance is the main performance measure), covering as many aspects of a topic as possible is equally important as relevance in an exploratory search context [21]. As exploratory search sessions can span days and weeks, long-term studies are indispensable [14]. Sample sizes in ESS evaluation studies are usually small. This limits the generalizability of the findings. Measures as in classic IR are not applicable to ESSs. Researchers have suggested developing special measures for ESSs [1, 19], as the users cannot be neglected due to their high involvement in the search process. A workshop held by Ryen White et al. in 2006 [28] produced the following measures as appropriate for evaluating exploratory search systems: engagement and enjoyment, information novelty, task success, task time, and learning and cognition. As a consequence, our combined approach aims to cover both the technical side and the user experience aspects.

3. IMPLEMENTATION
Search-Logger is an experimentation environment especially designed to carry out exploratory search task experiments. During the development phase our goal was to optimize the process (machine plus user interaction) as a whole. Measuring exploratory search tasks is a more complicated endeavor than measuring search at the query level [14]. We have created Search-Logger's architecture around the definition of a search task as an open-ended, abstract and poorly defined information need with a multifaceted character [18].

Such an exploratory search task typically consists of various sessions. A session is "a series of interactions by the user toward addressing a single information need" [13]. A query is a string of terms typed into a search engine, for which a list of documents is retrieved. As mentioned, the major part of search engine quality research is focused on the query level, and this is a limitation that we want to resolve with Search-Logger.


[Figure 1: Architecture. The browser plug-in monitors browsing on the Internet and interacts with the PHP front-end of the database; the analyzer reads the database.]

[Figure 2: Activity diagram of the plug-in. Depending on whether the demographics form has been completed, whether search cases are left for completion, and whether a search case has been started, the plug-in displays the demographics form, the pre search case feedback form, the post search case feedback form, or a success message.]

As proposed by White et al. in 2006 [28], we will use the measures learning and understanding, task success and task time in our studies to extend the scope from IR-related search needs to broader exploratory search tasks. Our main parameters will be search task complexity and its impact on the search process performance.

Architecture. The architecture of the Search-Logger framework is shown in Figure 1. Similar to the approaches by Fox et al. [7] and Lemur's Toolbar [5], Search-Logger consists of a browser plug-in for Firefox, a remote log storage database and an analysis environment. After activation, the plug-in monitors actions accessing the Internet in the browser and sends these to the PHP front-end of the database. Initial information for starting new search tasks is also read from the database via this front-end. The analyzer component accesses the database directly to allow the evaluation of the search results. The coarse behavior of the plug-in component is visualized in Figure 2. Depending on whether Search-Logger has been used before, whether the demographics form has been filled in, and which search cases have been started and finished, it displays different forms and monitors the ongoing search task. It fulfills the following three main tasks: (i) deliver the precompiled search tasks to the users, (ii) gather implicit information about the search process by logging various browser events as outlined in the next paragraph, (iii) gather explicit user feedback via standardized questionnaires supplied before and after each search task. With this approach we manage to log the search process on the search task level. Each logged event is ear-marked for a certain search task. Therefore the task performance can be analyzed and evaluated.
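To make the plug-in-to-front-end interaction more concrete, the following is a minimal sketch (not taken from the Search-Logger source) of how a single logged browser event could be shipped to a server-side logging endpoint. The endpoint URL, field names, and the postEvent helper are illustrative assumptions; the actual plug-in and wire format may differ.

```typescript
// Minimal sketch of shipping one logged event to the server-side front-end.
// The endpoint URL and field names are hypothetical.
interface LoggedEvent {
  userId: string;     // anonymous participant identifier
  taskToken: string;  // token of the currently active search case
  type: string;       // e.g. "query_entered", "link_clicked", "tab_opened"
  location?: string;  // URL involved in the event, if any
  timestamp: string;  // ISO 8601 time of the event
}

async function postEvent(event: LoggedEvent): Promise<void> {
  // Each event becomes one row in the server-side log database.
  await fetch("https://example.org/search-logger/log.php", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
}

// Example: a query submitted while search case "case-05" is active.
postEvent({
  userId: "participant-10",
  taskToken: "case-05",
  type: "query_entered",
  location: "http://www.google.ee/search?q=college+tuition",
  timestamp: new Date().toISOString(),
}).catch(console.error);
```

The property that matters for the approach described above is that every event carries the token of the search case it belongs to, so the analyzer can later group and evaluate events per task rather than per query.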

Data. With the Search-Logger experimentation framework we can automatically log implicit user data and also gather explicit user feedback at the same time.

Implicit user data about the search process is logged on all sorts of standard user events like links clicked, queries entered, tabs opened and closed, bookmarks added and deleted, and clipboard events, as illustrated in Table 1.

Type of event | Explanation
Search task started (a) | Log entry for start of search task
Search task stopped (b) | Log entry for end of search task
Search experiment started | Start of search experiment
Search experiment stopped | End of search experiment
Web page visited | User visited a web page
Query entered | User entered a query
Link clicked | User followed / clicked on a link
Tab opened | User opened a tab
Tab closed | User closed a tab
Bookmark added | User created a bookmark
Bookmark deleted | User deleted a bookmark
Text copied | Copy/paste event
Search logger paused | Pausing the Search-Logger extension
Introducing new user | Extension started for the first time
Demographics displayed (c) | Demographics form displayed
Demographics submitted (d) | Demographics form submitted
Pre search task form submitted (e) | Pre search task feedback form submitted
Post search task form submitted (f) | Post search task feedback form submitted

Table 1: User events logged by Search-Logger

In terms of implicit logging, all comparable tools log similar events and therefore we will not go into every detail about that here. We focus on the implementation of the exploratory search task evaluation support instead. When the proband selects the first search case, an entry consisting of time and date is logged into the database. This also marks the start of the experiment. Until this search task is finished by the user, each log entry carries the token of this case and is identified accordingly. Next, the proband can start to carry out the search task. The user can always pause the experiment. This will also pause the Search-Logger until the user resumes the experiment. These pause and start events are logged and allow the creation of realistic search scenarios that can span days and weeks.
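As a rough illustration of the task-level bookkeeping described above, the sketch below tracks which search case is active and whether the logger is paused, so that every implicit event can be stamped with the right task token. The class, method and event names are our own assumptions rather than the actual Search-Logger implementation.

```typescript
// Illustrative task-lifecycle bookkeeping: each log entry carries the token of
// the search case that is currently active, and pause events are logged so
// that net search time can later be computed per task.
// Usage: const logger = new TaskLogger((type, token) => console.log(type, token));
class TaskLogger {
  private activeToken: string | null = null;
  private paused = false;

  constructor(private log: (type: string, token: string) => void) {}

  startCase(token: string): void {
    this.activeToken = token;
    this.log("search_task_started", token);   // event (a) in Table 1
  }

  pause(): void {
    if (this.activeToken && !this.paused) {
      this.paused = true;
      this.log("search_logger_paused", this.activeToken);
    }
  }

  resume(): void {
    if (this.activeToken && this.paused) {
      this.paused = false;
      this.log("search_logger_resumed", this.activeToken); // resume event, assumed name
    }
  }

  finishCase(): void {
    if (this.activeToken) {
      this.log("search_task_stopped", this.activeToken); // event (b) in Table 1
      this.activeToken = null;
    }
  }

  // Implicit events (queries, clicks, tabs, clipboard) are only recorded while
  // a case is active and the logger is not paused.
  record(type: string): void {
    if (this.activeToken && !this.paused) {
      this.log(type, this.activeToken);
    }
  }
}
```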

Explicit data is gathered by asking the probands to fill in questionnaires at the beginning of the experiment and before and after each search task. Those questionnaires can be designed freely and will most often contain a set of standardized questions together with free-form fields. Gathering demographic information about the user at the beginning of the experiment enables us to classify the users according to gender, Internet usage patterns and previous search experience. The explicit user feedback gathered before and after each case is aimed towards gathering information regarding task success, learning, cognition and enjoyment as the main measures for evaluating exploratory search systems [28], as outlined in Section 2. Table 1 lists the events that are logged by Search-Logger. Especially the task-defining events (a)-(f) are unique to Search-Logger's task-based measuring approach.


Figure 3: Dialogue window

Figure 4: Screenshot of questionnaire

This functionality is not built into any of the comparable tools discussed in the Related tools subsection.

User Interfaces
The users interact with Search-Logger via a little icon that appears in the status bar of the browser at the bottom right. A click on this icon opens a window as illustrated in Figure 3. In this window the user can start, stop and pause search tasks. Depending on the chosen state, the icon in the browser status bar either blinks if in logging mode or shows a green pause symbol otherwise.

The questionnaires, as illustrated in Figure 4, are implemented as HTML pages and can be edited with a standard HTML editor.

Related tools
We have identified several other tools designed mainly for the same purpose of logging user events. The first tool to look at is called "Wrapper" by Bernard J. Jansen [11]. The Wrapper tool was developed for the purpose of logging events of all applications used by an information seeker (including applications like Microsoft Office or email clients). As Wrapper is a desktop application it can log any event, i.e. what programs are started, which processes are running or what was copied to the clipboard. It was designed with a focus on evaluating exploratory search systems.

Study sample | 10 participants
Gender | 6 men, 4 women
Age | 24-36 years
Backgrounds | academic staff (5), high school teachers (2), students (2), sales reps (1)

Table 2: Sample description

Although this broad logging functionality is an advantage, the software does not have the possibility to supply precompiled search cases to a group of probands and does not collect explicit user feedback. Wrapper also does not have built-in functionality to relate a set of logs to a certain search task. The concept of a search task does not exist in this approach. Another tool is a browser plug-in that was created by Fox et al. in 2005 [7]. Fox's approach was implemented as an add-on for Internet Explorer. It was the first tool to gather explicit as well as implicit information during searches at the same time. It evaluates the query level and gathers explicit feedback after each query. Fox's approach also came with a sophisticated analysis environment for the logged data. Unfortunately Fox's IE add-on is not publicly available. A third tool is Lemur's Query Log Toolbar [5], which is a toolkit implemented as a Firefox plug-in and an Internet Explorer plug-in. It logs implicit data on the query level. Logging exploratory search tasks is not implemented. Further tools we will not discuss here are: the HCI Browser [2], the Curious Browser [3], WebTracker [22] and WebLogger [20]. Some of those are either discontinued, or do not fulfill our requirements enough to be included in our comparison. All tools mentioned have some features in common with Search-Logger, described above (like the logging of user-triggered events), while some aspects are not covered at all. The main and foremost difference comes from the different purposes the tools were designed for. Most of the tools were developed for evaluating the query level, while Search-Logger was developed purely for evaluating the exploratory search task level. None of the tools has the built-in functionality to have a precompiled set of exploratory search tasks carried out by a test group in a non-laboratory environment without any time constraints. We hypothesize that realistic user study results for exploratory search experiments will only be possible if probands do not have any time constraints and can search in a way and in a surrounding they are used to. Only then will new phenomena like social search (e.g. asking peers on Facebook) show up in the observations.

4. EVALUATION
We carried out a pilot study to gauge the effectiveness of the Search-Logger framework. We compiled an experiment consisting of seven search tasks. The non look-up tasks were designed in accordance with the rules about designing exploratory search tasks for user studies as stated in a paper by Kules and Capra [16]. The plug-in was distributed to a convenience sample of 10 people as illustrated in Table 2.

They could install the plug-in wherever they wanted and were free to work on the experiment whenever they chose. They were not tied to any location, narrow time frame or system. The study was conducted over a 4-week period. Of the seven tasks, three were intended to be look-up tasks and four were exploratory search tasks of increasing complexity.


Due to many wedding checklists being directly retrievable with one query, we reclassified task (iv) as a look-up task. The tasks were as follows:

(i) "Please find the date when Mozart was born."
(ii) "What is the corresponding value of 20.000 Estonian Kroons in USD?"
(iii) "When was Penicillin invented and by whom?"
(iv) "How do you plan a budget but still nice wedding; write down the 20 most important points to consider."
(v) "Your daughter Anna is graduating from high-school and wants to study abroad in the field of either political sciences or architecture. You can support her with 2000 Euro per month. Which universities can you recommend that are affordable and will offer Anna the best career perspectives after graduating? Please compile a list of 10 universities that you can recommend."
(vi) "Mr Johnson and his family are planning a trip to Paris. Please write a summary of the following: What cultural events are there (for example in August 2010)? What sightseeing to do? Cover also hotels, flights, traveling to Paris, weather."
(vii) "Barbara has been offered a well paid position in Kabul, Afghanistan. She wants to know how risky it is to live in Kabul at the moment. Compile enough information to have a good indication of the risk. Please compile at least half a page of compelling facts."

Due to the qualitative nature of this study and the small sample size we will only present rough numbers and interesting qualitative results. The longest experiment was 15 days and the longest net search time (when the search logger was recording) was 3.4 hours. The shortest net search time was 1 hour and 8 minutes. All users visited a maximum of 2 web pages and needed a maximum of 3 queries for each of the look-up tasks, and performed the search with a maximum of 2 additional tabs open in their browsers. Each of the tasks could be fulfilled successfully with search engines within a little over three minutes. The explicit user feedback (gathered before and after the tasks) showed no inconsistencies (apart from a slight underestimation of the effort of task (iii)). This reflects that users are able to judge the search effort for simple tasks quite well. As expected, this combination of implicit and explicit feedback confirms that search engines are effective and efficient tools for look-up tasks.

The log for tasks (v), (vi) and (vii) tells a different story. Per task, users visited a maximum of 157 web pages and submitted a maximum of 28 queries to search engines. In addition, users added a maximum of two bookmarks and opened and closed up to 54 tabs per search task. 88% of searches in search engines were performed in Google, 9% in Bing and the rest in local search engines. More experimental data can be retrieved from our webpage http://www.search-logger.com/tiki-read_article.php?articleId=1 upon registration.

It is clearly visible that search engines are the means of choice to start a search, as also observed in [29]. After being directed to a potentially interesting site, users spend most of the time analyzing, synthesizing and gathering information away from search engine territory. We also compared the time users spent searching with search engines to the time they spent searching with other information sources. For each of the complex tasks, less than ten percent of the time was spent on entering and rephrasing queries and residing on the search engine results pages.
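The time comparison above can be approximated from the event log with a simple aggregation. The sketch below, with assumed field names and a naive dwell-time definition (the gap between consecutive logged page visits), illustrates the idea; it is not our exact analysis code.

```typescript
// Rough sketch: estimate the share of a task's net time spent on search engine
// pages versus other pages, from a time-ordered list of page-visit events.
interface VisitEvent {
  url: string;        // page the user landed on
  timestamp: number;  // milliseconds since epoch
}

const SEARCH_ENGINE_HOSTS = new Set(["www.google.ee", "www.google.com", "www.bing.com"]);

function serpTimeShare(events: VisitEvent[]): number {
  let serpMs = 0;
  let totalMs = 0;
  for (let i = 0; i < events.length - 1; i++) {
    const dwell = events[i + 1].timestamp - events[i].timestamp; // naive dwell time
    totalMs += dwell;
    let host = "";
    try {
      host = new URL(events[i].url).hostname;
    } catch {
      // about:blank and similar entries count as non-search-engine time
    }
    if (SEARCH_ENGINE_HOSTS.has(host)) serpMs += dwell;
  }
  return totalMs === 0 ? 0 : serpMs / totalMs;
}
```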

[Figure 5: Switch to social search in Facebook. Log excerpt for user 10 (2010-08-11, around 11:37): the user copies the text college "political sciences" tuition to the clipboard, submits two Google queries for college "political sciences" tuition (comparison), and then views a series of pages on facebook.com.]

In terms of explicit feedback, we noticed that users tend to underestimate the time needed for exploratory search tasks and overestimate the power of search engines. Most users strongly agreed with "I think I can find the desired information with search engines only" before the tasks and corrected their judgement after the task. One user spent almost 3 hours on case (v) and out of frustration navigated to Facebook (as shown in Figure 5) to shortcut his search and ask for recommendations from friends. Overall, users found the Search-Logger "easy to use" and liked the freedom to carry out the experiment in a non-laboratory environment. Two users commented that they would maybe search a bit differently (with more engagement) if they really depended on the reliability of the information. We must therefore assume that the data does not fully reflect perfectly realistic user behavior.

Six users out of ten finished all search tasks, one user finished 6 tasks, and three users only finished the first 3 look-up tasks.

5. CONCLUSIONS AND FUTURE WORK
In this paper, we have presented an experimentation toolkit called Search-Logger for the analysis of exploratory search tasks. We introduced the difference between search queries, search sessions, and search tasks. For the proposed experimentation toolkit, we have developed a Firefox plug-in and a database installer and have described them and their features in this paper. Furthermore, we have shown the results of a first pilot study carried out with the Search-Logger. The main discoveries are that new phenomena like using various search systems in parallel over an extended period of time and social search [8] (e.g. in Facebook) are poorly covered by classic search engine quality experiments. Especially when search tasks become complex, users obviously have a tendency to rather ask an expert or a peer than spend too much time searching themselves. With this experiment we could also show that the Search-Logger approach enables us to gauge the main measures for exploratory search systems, namely engagement and enjoyment, task success and task time, as suggested by R. White [28]. We have lined up several field tests (with bigger, statistically relevant sample sizes). The results of these experiments will be published in several follow-up papers. Using the demographic information, we hope to find relations between search tasks and specific users to make suggestions for improvements for targeted domain-specific search.

6. REFERENCES
[1] P. Borlund and P. Ingwersen. Measures of relative relevance and ranked half-life: performance indicators for interactive IR. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 324–331. ACM, New York, NY, USA, 1998.
[2] R. Capra. HCI Browser: A tool for studying web search behavior. Available from: http://uiir-2009.dfki.de/papers/uiir2009_submission_18.pdf [cited September 10, 2010].
[3] M. Claypool, P. Le, M. Wased, and D. Brown. Implicit interest indicators. In Proceedings of the 6th international conference on Intelligent user interfaces, page 40, 2001.
[4] C. W. Cleverdon, J. Mills, and E. M. Keen. An inquiry in testing of information retrieval systems (2 vols.). Cranfield, UK: Aslib Cranfield Research Project, College of Aeronautics, 1966.
[5] P. Clough and B. Berendt. Report on the TrebleCLEF query log analysis workshop 2009. In ACM SIGIR Forum, volume 43, pages 71–77, 2009.
[6] S. T. Dumais and N. J. Belkin. The TREC Interactive Track: Putting the user into search. In TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.
[7] S. Fox, K. Karnawat, M. Mydland, S. Dumais, and T. White. Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS), 23(2):147–168, 2005.
[8] J. Freyne, R. Farzan, P. Brusilovsky, B. Smyth, and M. Coyle. Collecting community wisdom: integrating social search and social navigation. In Proceedings of the 12th international conference on Intelligent user interfaces, pages 52–61, Honolulu, Hawaii, USA, 2007. ACM. doi:10.1145/1216295.1216312.
[9] C. Grimes, D. Tang, and D. M. Russell. Query logs alone are not enough. In Workshop on Query Log Analysis at WWW, 2007.
[10] M. Hu, E. Lim, A. Sun, H. W. Lauw, and B. Vuong. On improving Wikipedia search using article quality. In Proceedings of the 9th annual ACM international workshop on Web information and data management, pages 145–152, Lisbon, Portugal, 2007. ACM. doi:10.1145/1316902.1316926.
[11] B. J. Jansen, R. Ramadoss, M. Zhang, and N. Zang. Wrapper: An application for evaluating exploratory searching outside of the lab. EESS 2006, page 14.
[12] B. J. Jansen and S. Y. Rieh. The seventeen theoretical constructs of information searching and information retrieval. Journal of the American Society for Information Science and Technology, 2010. doi:10.1002/asi.21358.
[13] B. J. Jansen, A. Spink, C. Blakely, and S. Koshman. Defining a session on web search engines. Journal of the American Society for Information Science and Technology, 58(6):862–871, 2007. doi:10.1002/asi.20564.
[14] D. Kelly, S. Dumais, and J. Pedersen. Evaluation challenges and directions for information seeking support systems. IEEE Computer, 42(3), 2009.
[15] W. Kraaij and W. Post. Task based evaluation of exploratory search systems. EESS 2006, page 24.
[16] B. Kules and R. Capra. Designing exploratory search tasks for user studies of information seeking support systems. In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pages 419–420, Austin, TX, USA, 2009. ACM. doi:10.1145/1555400.1555492.
[17] D. Lewandowski and N. Hochstotter. Web searching: A quality measurement perspective. Web Search, Information Science and Knowledge Management, 14, 2008.
[18] G. Marchionini. Exploratory search: from finding to understanding. Communications of the ACM, 49(4):41–46, 2006.
[19] H. O'Brien. Defining and Measuring Engagement in User Experiences with Technology. Unpublished doctoral dissertation, Dalhousie University, Halifax, Canada, 2008.
[20] R. W. Reeder, P. Pirolli, and S. K. Card. WebEyeMapper and WebLogger: Tools for analyzing eye tracking data collected in web-use studies. In CHI'01 extended abstracts on Human factors in computing systems, page 20, 2001.
[21] D. Tunkelang. Precision AND recall. IEEE Computer, 42(3), 2009.
[22] D. Turnbull. WebTracker: A tool for understanding web use. Unpublished report, 1998. Retrieved May 18, 2009.
[23] L. Vaughan. New measurements for search engine evaluation proposed and tested. Information Processing and Management, 40(4):677–691, 2004.
[24] E. M. Voorhees and D. K. Harman. TREC: Experiment and evaluation in information retrieval. MIT Press, 2005.
[25] S. Weitz. Search isn't search. Microsoft company report, Microsoft Corporation, SMX 2009, Munich.
[26] R. W. White, B. Kules, and S. M. Drucker. Supporting exploratory search. Introduction to the special issue. Communications of the ACM, 49(4):36–39, 2006.
[27] R. W. White, G. Marchionini, and G. Muresan. Evaluating exploratory search systems: Introduction to special topic issue of Information Processing and Management. Information Processing & Management, 44(2):433–436, Mar. 2008. doi:10.1016/j.ipm.2007.09.011.
[28] R. W. White, G. Muresan, and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating exploratory search systems. In ACM SIGIR Forum, volume 40, pages 52–60. ACM, New York, NY, USA, 2006.
[29] R. W. White and R. A. Roth. Exploratory search: Beyond the Query-Response paradigm. Synthesis Lectures on Information Concepts, Retrieval, and Services, 1(1):1–98, 2009.
