enhanced browsing in digital libraries: three new approaches to browsing in greenstone

15
Int J Digit Libr (2004) 4: 283–297 / Digital Object Identifier (DOI) 10.1007/s00799-004-0088-6 Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone Dana McKay, Preeti Shukla, Rachael Hunt, Sally Jo Cunningham Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton, New Zealand e-mail: {dana, ps11, rmh14, sallyjo}@cs.waikato.ac.nz Published online: 20 October 2004 – Springer-Verlag 2004 Abstract. Browsing is part of the information-seeking process, used when information needs are ill-defined or unspecific. Browsing and searching are often interleaved during information seeking to accommodate changing awareness of information needs. Digital libraries often do not support browsing. Described here are three brows- ing systems created for the Greenstone digital library software. Keywords: Browsing – Information seeking – Searching – Self-organizing maps – Thesaurus 1 Introduction Browsing is a vital part of the information-seeking pro- cess, allowing information seekers to meet ill-defined in- formation needs and find new information [26, 29]. De- spite the importance of browsing in information seek- ing, however, it is poorly supported in many information systems [27]. The Greenstone digital library software [1] created by the New Zealand Digital Library research group [2] is an example of software that does support browsing, though to a limited extent; the limitations of Greenstone’s browsing facilities are discussed in Sect. 3. Greenstone is used by numerous organizations worldwide to manage and present collections of documents. The aim of this work is to present possibilities for adding greater flexibility to Greenstone’s browsing facili- ties in order to provide better support for the information- seeking process. When considering ways to make brows- ing easier and more effective, a number of potential directions presented themselves: we could provide bet- ter facilities for browsing through search results or for browsing through an entire collection, we could present conventional list displays or experiment with 2D or 3D visualizations, and we could exploit subject metadata (if available) or use learning techniques to extract subject or similarity information. The first browsing approach presented allows users to specify parameters such as the metadata by which they wish to browse and the maximum number of documents on a page (Sect. 4); this supports browsing list displays of search results. The second approach uses a self-organizing map (SOM) to create a browsable, searchable represen- tation of clusters of documents within a given collection (Sect. 5) – where the SOM uses machine learning to create similarity measures between documents and pro- vides a layered, 2D visualization of the entire collection. The third approach incorporates a subject thesaurus into the search interface to support users in query refinement and in exploration of the subject vocabulary and struc- ture of the document collection (Sect. 6). All three of these enhancements allow the user to easily move be- tween searching and browsing activities – a significant requirement for an effective digital library interface, as this supports common information seeking behaviour [5, 28, 40]. We adopted a formative evaluation approach [31] in developing these three browsing facilities – that is, prospective users were involved in each stage of their de- sign and development, and feedback from these users/ testers guided the creation and refinement of the imple- mented interface. Each browsing facility was assessed for “learnability”: the extent to which a user can get started working with the system without first undergoing train- ing [31]. A high proportion of information seekers are perpetual novices with the (potentially many) informa- tion sources that they consult [8], and so a system that can be immediately useful will be more likely to see fu- ture use. Assessment of learnability includes evaluation of predictability – the ability of users to predict system reactions [15]. The findings of the user studies, and their

Upload: dana-mckay

Post on 14-Jul-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

Int J Digit Libr (2004) 4: 283–297 / Digital Object Identifier (DOI) 10.1007/s00799-004-0088-6

Enhancedbrowsing in digital libraries: three newapproachesto browsing inGreenstone

Dana McKay, Preeti Shukla, Rachael Hunt, Sally Jo Cunningham

Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton, New Zealande-mail: {dana, ps11, rmh14, sallyjo}@cs.waikato.ac.nzPublished online: 20 October 2004 – Springer-Verlag 2004

Abstract. Browsing is part of the information-seekingprocess, used when information needs are ill-defined orunspecific. Browsing and searching are often interleavedduring information seeking to accommodate changingawareness of information needs. Digital libraries often donot support browsing. Described here are three brows-ing systems created for the Greenstone digital librarysoftware.

Keywords: Browsing – Information seeking – Searching– Self-organizing maps – Thesaurus

1 Introduction

Browsing is a vital part of the information-seeking pro-cess, allowing information seekers to meet ill-defined in-formation needs and find new information [26, 29]. De-spite the importance of browsing in information seek-ing, however, it is poorly supported in many informationsystems [27]. The Greenstone digital library software [1]created by the New Zealand Digital Library researchgroup [2] is an example of software that does supportbrowsing, though to a limited extent; the limitations ofGreenstone’s browsing facilities are discussed in Sect. 3.Greenstone is used by numerous organizations worldwideto manage and present collections of documents.The aim of this work is to present possibilities for

adding greater flexibility to Greenstone’s browsing facili-ties in order to provide better support for the information-seeking process. When considering ways to make brows-ing easier and more effective, a number of potentialdirections presented themselves: we could provide bet-ter facilities for browsing through search results or forbrowsing through an entire collection, we could presentconventional list displays or experiment with 2D or 3D

visualizations, and we could exploit subject metadata (ifavailable) or use learning techniques to extract subject orsimilarity information.The first browsing approach presented allows users to

specify parameters such as the metadata by which theywish to browse and the maximum number of documentson a page (Sect. 4); this supports browsing list displays ofsearch results. The second approach uses a self-organizingmap (SOM) to create a browsable, searchable represen-tation of clusters of documents within a given collection(Sect. 5) – where the SOM uses machine learning tocreate similarity measures between documents and pro-vides a layered, 2D visualization of the entire collection.The third approach incorporates a subject thesaurus intothe search interface to support users in query refinementand in exploration of the subject vocabulary and struc-ture of the document collection (Sect. 6). All three ofthese enhancements allow the user to easily move be-tween searching and browsing activities – a significantrequirement for an effective digital library interface, asthis supports common information seeking behaviour [5,28, 40].We adopted a formative evaluation approach [31]

in developing these three browsing facilities – that is,prospective users were involved in each stage of their de-sign and development, and feedback from these users/testers guided the creation and refinement of the imple-mented interface. Each browsing facility was assessed for“learnability”: the extent to which a user can get startedworking with the system without first undergoing train-ing [31]. A high proportion of information seekers areperpetual novices with the (potentially many) informa-tion sources that they consult [8], and so a system thatcan be immediately useful will be more likely to see fu-ture use. Assessment of learnability includes evaluationof predictability – the ability of users to predict systemreactions [15]. The findings of the user studies, and their

Page 2: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

284 D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

impacts on the design and development of their respectivebrowsing facilities, are summarized with the descriptionsof the three browsing interfaces.The formative evaluation development process is em-

ployed to improve the usability of a system or interface,where “usability” refers to the ease with which a per-son can learn to interact with and then use a system orcomponent of a system [21]. As is argued in Sect. 2, the in-clusion of additional browsing features in Greenstone willincrease support for a common information-seeking tech-nique. The Technology Acceptance Model (TAM) [13]predicts that user acceptance rates (and hence actual us-age) of a system will be higher if the system is perceivedto be easy to use (to have a greater degree of usability,brought about through the use of formative evaluation)and perceived to be useful (by adding new, useful featuresto Greenstone); the intent, then, is to enhance Green-stone so as to encourage a greater user acceptance of it.Section 2 of this paper discusses the information-

seeking process: examining what browsing is, why it isimportant and how information systems can best supportit. Section 3 describes Greenstone’s current browsing ca-pabilities and considers their weak points. Sections 4–6present the three new browsing facilities, and Sect. 7draws some conclusions about this work.

2 Human information seeking

The information-seeking process begins with the concep-tion of a need for information, and (if successful) endswith the satisfaction of the information seeker that theyhave the information they require [26] (though in realityusers often “satisfice” [3], or simply give up [34]).Searching may be most of information retrieval , but

it is not all of information seeking [20]. To be as use-ful as possible, digital library interfaces should supportall stages of the information-seeking process (for ex-ample see [7, 26, 34]), not just searching. Despite researchhaving long emphasized that browsing is a fundamen-tal information-seeking activity (Bates described this in1989 [6]), many systems still do not support it [27, 38].Moreover, the information-seeking process is not neces-sarily linear; users generally seek information in an itera-tive manner, switching back and forth between stages [5,28, 40], particularly between searching and browsing.Browsing can be supported by many different fa-

cilities, including semantic browsing using such toolsas self organizing maps [10] or phrases [36], metadata-based browsing like the Greenstone classifier system [4],and subject categorization (for example the Library ofCongress classification scheme). Browsing may be withina document (for example leafing through a book) or be-tween documents (for example wandering the libraryshelves). Browsing may occur for a number of reasons,including evaluation of an information source, informa-tion discovery and clarification of an information prob-

lem [10, 28, 38]. A common definition of browsing is anexploratory information-seeking strategy relying heavilyon serendipity and being used to meet an ill-defined infor-mation need [6, 10, 33].One way conventional libraries support browsing is

through subject classification of documents. However,physical libraries cannot rearrange the shelves at whim tomeet the needs of the user (say if they wanted to browseby author and changed suddenly to title). Electronic in-formation systems (such as digital libraries) have the op-portunity to “rearrange the shelves”.For a system to support browsing effectively, it must

be flexible so as to allow the user to modify their infor-mation need and information-seeking strategy at will.It should support browsing for any number of rea-sons, including those mentioned above. For optimuminformation-seeking effectiveness, interleaving of brows-ing and searching should ideally be simple [12, 22].Browsing is easily shown to be a vital part of the

information-seeking process and very effective when com-bined with searching. Information systems need to recog-nize this importance and support browsing in ways thatwill allow users to become effective information seekers.

3 Greenstone and browsing

Greenstone is a complete digital library management sys-tem, handling everything from collection building to col-lection presentation via a Web browser. It facilitates full-text and metadata searching as well as various kinds ofbrowsing [1, 4]. Greenstone is designed to allow collec-tions to be built fully automatically (i.e. to not requirethe manual processing of source documents) and to beserved by inexpensive machines over a slow Internet con-nection [43]. Greenstone is largely stateless, not keepinginformation about what users do from one action to thenext, so as to help reduce server load. Section 3.1 dis-cusses the current browsing facilities available in Green-stone, and Sect. 3.2 explains why these facilities inade-quately meet information seekers’ needs.

3.1 Greenstone’s current browsing system

Greenstone’s current browsing system is known as the“classifier” system: documents are placed into classesat collection build time according to their metadata,and browsing structures are pre-built ready for loading.Greenstone supports a number of different types of clas-sifier, each suited to a specific kind of metadata. Eachclassifier displays information in its own way.There are five main types of classifier currently im-

plemented in Greenstone: the list, the alphabetic classi-fier, the hierarchic classifier, the date classifier [4] and thephrase-based classifier Phind [36].The list classifier is the simplest of the classifiers; it

merely sorts metadata alphabetically and presents docu-ments in a single long list (Fig. 1a).

Page 3: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone 285

Fig. 1a–e.The Greenstone classifiers. a A list classifier of “How to” metadata. b An alphabetic classifier, viewedby title. The section being viewed is “K–L”. c A hierarchic classifier. The classification being viewed “02.04” istwo levels deep. d A date classifier. Note the months down the side of the page. e The “Phind” classifier. Theword “forest” is being browsed

Page 4: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

286 D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

The alphabetic classifier also sorts documents alpha-betically, but the document list is then divided intoclasses according to initial letter, and the classes are dis-played across the top of the page (Fig. 1b). If the classesare smaller than a pre-set size, the classifier will mergethem (e.g. “K–L” in Fig. 1b). There is no upper limit onthe number of documents in a class.The hierarchic classifier deals with numerical hierar-

chies – documents are assigned a number indicating theirposition in the hierarchy (much like the Dewey decimalsystem), and the user views the hierarchies by progressivedrill-down clicking (Fig. 1c).The date classifier is very much like the alphabetic

classifier, though it uses the year as a basic unit (as op-posed to initial letter) and displays month informationdown the side of the hierarchy (Fig. 1d).The Phind classifier is not based on traditional meta-

data. Instead, it creates an index of phrases when thecollection is built and allows the user to browse by en-tering a single word or phrase and drilling down throughphrases to documents (Fig. 1e).

3.2 Problems with browsing using the classifiers

The classifier system in Greenstone does not supportusers as well as it might. The failings are in two majorareas – the fact that users cannot combine searching andbrowsing, and in the rigidity of the system.As discussed in Sect. 2, users locate information most

effectively when they can switch easily between searchingand browsing. Search results in Greenstone are currentlyalways displayed in lists, and if a search is not ranked,these lists are unsorted. Classifiers present all the docu-ments in a collection that have the classification meta-data; there is no way to search a classifier. Thus the cog-nitive cost of switching between searching and browsing ishigh, reducing information-seeking effectiveness.The rigidity of the classifier system is built-in – each

classifier uses static, pre-built browsing structures, thusallowing collections to be presented only in one pre-determined manner without input from the user. To illus-trate how this can become a problem, imagine a collectionwith a number of distinct documents with the same ti-tle and different authors. Users cannot specify that theywould also like to see the author metadata when brows-ing, much less insist upon the documents being sorted byauthor. Another example of how the rigidity of the classi-fication system is detrimental to the information seeker’sexperience is the size of the groups displayed. Users arebetter able to navigate and evaluate options if they do nothave to scroll [32]; yet in medium-sized collections (say1000 documents) users may have to scroll through threescreens of a single classification, and there is no way forusers to specify the largest number of documents theywish to see on a page.The Phind classifier solves the rigidity problem,

but allows browsing of only a single kind of metadata

(phrases), and still does not allow collections to be fil-tered by search terms – and thus does not entirely solvethe browsing problem.Greenstone supports browsing in a limited way: non-

searchable, static metadata classifiers. While this ap-proach goes some way towards supporting browsing, ithinders users in their information seeking by not allow-ing them the flexibility necessary for truly effective infor-mation seeking. Moreover, Greenstone has strong goalsrelated to usability, utility and simplicity of collection cre-ation. The work described in Sects. 4–6 is an attempt toovercome the failings in Greenstone while still taking intoaccount its goals of simple collection provision on inex-pensive hardware.

4 A new search-based browsingsystem for Greenstone

This first browsing enhancement to Greenstone focuseson an inter-document metadata-based browsing system.This approach was chosen because Greenstone is aboutpresenting collections of documents (rather than singledocuments), and Greenstone already has an effective se-mantic browsing system in Phind [36]. The user capabil-ities the new system was to support were defined withthe failings of the current system and the research onhuman information-seeking behaviour in mind. These ca-pabilities are as follows: users must be able to combinesearching and browsing, users must be able to choose themetadata by which they browse, users should be able tobrowse by more than one kind of metadata at a time, andusers should be able to restrict the amount of informa-tion on any one screen. The guiding principle is to give theuser the richest possible browsing experience. This brows-ing system is described in Sect. 4.1, and its evaluation isdiscussed in Sect. 4.2.

4.1 The search-based browsing systemand its implementation

This new browsing scheme is designed to provide a richbrowsing experience without being too taxing on the user– allowing users flexibility in specifying how they wish tobrowse, while providing a simple interface with sensibledefaults. This also involved enhancing the existing brows-ing system to handle alphanumeric and date metadata.One of the major advantages of this system over the

existing classifier system is that searching and browsingcan be combined easily. The search offered by this inter-face is functionally identical to the ordinary Greenstonesearch, but results are presented in a browsing struc-ture defined by the user. To avoid the loss of the usefulranking information provided by Greenstone’s underly-ing search technology MG [44], where the search is an“any words” search, rank information is displayed nextto the document metadata, similar to search engine re-sults (Fig. 2b). If the user does not enter any search

Page 5: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone 287

Fig. 2a,b. The new search-based browsing system. a Browsing of a whole collection by two piecesof metadata. b Browsing of ranked search results by title metadata. Note the ranking

information on the right and the second level of the hierarchy

terms, then the user can browse the whole collection(Fig. 2a).The mechanism for specifying how documents are to

be browsed must allow great flexibility, but it also mustto be simple enough to use without training. To that end,the user is presented with familiar Web-based controlsand simple language to determine their browsing prefer-ences. They may browse one or two kinds of metadata ata time (the lists of metadata available for browsing are re-quested from the collection and inserted into the interfacewhen the browse page is displayed), and they may specifyhow many documents they wish to see on a page.Because Greenstone is stateless, and combining all

this information to form a browsing structure is compu-tationally expensive, the browsing structures are createdonly once, and the classes not being currently viewed arehidden in the page using dynamic HTML and JavaScript.This means that the user will get an instantaneous re-sponse when switching between the classes in a browsingstructure.Browsing more than one kind of metadata at a time

allows the user to view distinguishing metadata wherethe primary browsing metadata values occur more thanonce (for example many books with the same author butdifferent titles, when browsing first by author). It also al-lows the user to sort the duplicates by the second piece ofmetadata (so sorting the books by author, and then sort-ing the books with the same author by title). Both piecesof metadata are displayed for each document, even wheredocuments may only have one of the two pieces (Fig. 2a).Documents without the first piece of classification meta-data are slotted into the browsing structure under thelabel “no metadata available”.

The system accommodates the “number-of-docu-ments-per-page” option by creating a two-level hierarchy,with basic divisions (for example initial letter in the firstlevel of the hierarchy, and the second level of the hierarchydivided up so as to provide the required number of docu-ments in each class). The classes at this level are labelledsuch that they are distinct from neighbouring classes (forexample if the first document in one class is called “teakchests” and the first document in the next class is called“teas of the world”, the classes will be labelled “teak-”and “teas-” respectively). See Fig. 2b for an example ofsuch a browsing situation.Users rarely change the defaults on information-

seeking interfaces; therefore, while this interface offersa lot of flexibility, it must also have sensible defaults [24].By default, the entire collection is displayed for brows-ing by title metadata; this default is chosen with a viewto giving the user a good overview of the collection. Thedefault second piece of metadata by which to browse isdetermined by whatever metadata the collection has – itis hard to tell automatically what will be useful for anygiven collection, so the default second piece of metadatais the first detected piece of metadata that is not the ti-tle. The default number of documents per page is 20; thisis approximately one screen full, so as to avoid wastedscreen real estate, but also to lessen the need for scrolling.Searching defaults to “any” words, as this is less likelyto give a “no match” result and therefore less likely tofrustrate the user [24].The new browsing system has been designed to offer

flexibility in a very simple manner. The major advantagesit has over Greenstone’s existing classifier system are thecombination of searching and browsing and the ability to

Page 6: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

288 D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

interactively change browsing structures to meet chang-ing information needs.

4.2 Evaluation

There were three main components to the evaluation ofthis system: a technical evaluation (Sect. 4.2.1), a userstudy of the way users would like to browse (Sect. 4.2.2)and a user study testing the predictability of the system(Sect. 4.2.3).

4.2.1 Technical evaluation

Systems must be technically sound to be useful. Thereare two aspects of the new browsing system that can bemeaningfully evaluated: scalability and time constraints.The new browsing system attempts to address the scal-

ability issues faced by the old system (i.e. browsing listspotentially growing very long inmedium large collections)by introducing a second level to the browsing hierarchy.Consider a collection with 26000 documents, with the ini-tial words of titles evenly distributed through the alpha-bet; the title browsing interface of the old system woulddisplay 1000 documents in a long list, for each letter of thealphabet. The new system would divide these classes of1000 documents up into smaller classes of documents, con-taining, say, 50 documents each (determined by the “max-imumnumber of documents per page” setting on the inter-face). Thismeans therewill be 20 subclasses across the topof the page under the top-level classes, which is still rea-sonably usable. Of course, it is possible with this systemto have somany documents that it also becomes unusable,but this upper bound on a manageable collection size ismuch larger than in the old system.The time constraints on the new system are more wor-

rying. Users hate waiting for Web pages to load [24, 33],so load time has a large impact on the usability of a page.Unfortunately, because the new system produces pagesthat contain entire browsing structures, the pages arevery large. The browsing interface itself is 6.78KB andeach document is 0.04KB in the browsing structure. Thismeans that over a 56 kbps modem the interface will take1 s to load, and each document in the browsing structurewill add about 0.05 s; with a large collection this adds upvery quickly. A collection of 1073 documents (browsed bytitle and coverage) was shown to take 66 s to load overa 56-kbps modem connection, precluding this interfacefrom being used over a low-speed connection. However,for a high-speed connection or a local collection, this timedrops to under 1 s, and once the page is loaded the entirebrowsing hierarchy is available instantaneously. When wecompare this to the classifier system, the total load timeover a 56-kbps connection would be 88 s for title metadataonly. However, this time is in smaller chunks as the userloads each part of the hierarchy, and thus the wait timeis more palatable to the user (each individual page wouldtake about 3 s to load over a 56-kbps connection).

A transaction log analysis of 42 collections in the NewZealand Digital Library [2] from 21 June to 19 Decem-ber 2001 shows that approximately 9.5% of all actionsare browsing with the classifier system and that 37% ofthe time when users looked at one part of a classifier(say the “A” section of a title classifier) their next ac-tion was to look at another part of the same classifier(say the “B” section). This has an associated time costunder the old system, but under the new system it isinstantaneous.

4.2.2 How users would like to browse

The first user study was designed to examine how, givenfree rein, users would like to browse a collection of docu-ments in an online information system.The first study selected ten participants from the sum-

mer population of the Department of Computer Scienceat the University of Waikato. Participants included tu-tors, lecturing staff, and graduate and undergraduate stu-dents. All the participants had some research experience(the undergraduate students were employed in summerresearch positions at the time of the study), though notnecessarily in the digital library or information-seekingfields.Each of these users completed a questionnaire about

on- and offline browsing and their experience with it. Allof the users had used online search engines, and mostfelt that they “usually” found what they were looking forwith these tools. Only two of the participants had morethan basic experience with online browsing tools, andmany commented that they “didn’t like them” (thoughone user who had a lot of experience with them liked thema lot and commented they were good for “filtering outcrap sites”).After completing this questionnaire, users were given

a set of 30 index cards, each of which represented a docu-ment randomly chosen from the Women’s History Col-lection in the New Zealand Digital Library. These indexcards each showed four types of metadata, as recorded inthe online collection (see Fig. 3 below). Participants wereasked to arrange the documents the way they would wishto browse them if they were looking for information onmedieval women.When participants had arranged the cards to their

satisfaction, the arrangement of the cards was manuallyrecorded, and participants were interviewed about their

Fig. 3. An example of an index card used in studiesof the search-based browsing system

Page 7: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone 289

browsing structure (that is, a retrospective verbal proto-col was applied [41]) and their expectations of an onlineinformation-seeking system and what it would allow themto do.Of ten study participants, six created a purely meta-

data-based browsing structure, two created purely se-mantic browsing structures, and the remaining two cre-ated structures that were primarily metadata driven buthad some semantic elements (semantic in this context isused to mean “based on meaning inherent in the meta-data, rather than on the metadata itself”).The semantic structures were not reproducible by ma-

chine; one merely sorted the cards into two piles of “use-ful” documents and “not useful” documents, and theother was based on over-broad and vague categories suchas “law and politics” and “sociology”.The metadata-driven structures were based variously

on coveragemetadata and title metadata. Only one struc-ture was purely alphabetical based on title; the remainderused a combination of the two kinds of metadata.When questioned about what they would expect from

an online browsing system, three users said that theywould expect a search facility as well as browsing sup-port (in keeping with the users’ abundant experience withonline search systems and lack of experience with on-line browsing systems). Three participants commentedthat “an information system should be able to presentmore than one view of the documents”, “I would liketo be able to sort by different metadata if I wantedto” and “you can’t rearrange the shelves in a library,but I would expect an information system to be ableto arrange the documents according to all kinds ofmetadata”.In sum, the users predominantly wanted metadata-

based browsing systems, wanted to be able to com-bine search and browsing, and wanted to “rearrange theshelves” to browse by different kinds of metadata. Eachof these functionalities is something that search-basedbrowsing offers over and above the conventional Green-stone browsing experience.

Fig. 4. The interface picture shown to participants in an experiment deter-mining the predictability of the search-based browsing interface

4.2.3 Predictability of the search-basedbrowsing system

The second user study was designed to ensure the us-ability of the search-based browsing system. It has beenshown in research that users do not use help systemswhen using digital libraries, and that if something istricky to learn, it will be rapidly abandoned [25]. Withthis in mind, it was important that the search-basedbrowsing system produce results that the users would ex-pect, given the interface.There were eight participants in this study, selected

from the same pool of people as for the first study. The ex-periment used the same set of 30 index cards described inSect. 4.2.2. In this experiment, however, users were showna picture of the interface for the search-based browsingsystem (Fig. 4) and asked to arrange the documents theway they believed the system would arrange them. Thelayout they generated was manually recorded, and thenthe participants were asked to explain what about the in-terface led them to believe the systemwould create a simi-lar structure (a retrospective verbal protocol [41]).Seven of the eight participants sorted the documents

much in the way the system would, with a few minordifferences (for example sorting punctuation before titlesbeginning with “a” and commenting that “the systemshouldn’t do it this way but it probably does” – it does-n’t). The eighth participant grouped the documents byfirst letter in the title and then sorted the groups by cov-erage metadata. When asked what about the interfacemade him believe documents would be sorted this way, hereplied, “This is not how I believe the interface would sortthe documents; this is how I would like it to sort the docu-ments – in reality the interface would sort the documentsalphabetically”.All eight participants in the second study were able

to identify how the interface would sort the documents,either according to the experiment protocol or verballyafterwards. This implies the search-based browsing sys-tem behaves in a way users find readily explicable, and

Page 8: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

290 D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

thus that users will not need to consult online help to un-derstand the interface.Evaluating the new browsing system both technically

and with user studies shows that it has only one majorflaw: the amount of time a browse page may take to load(and even that is ameliorated by the fact that it thenprovides better performance than the standard brows-ing interface 37% of the time). The new system handleslarge numbers of documents better than the old system,is readily comprehensible, and allows users the flexibilitythey want in a browsing system.

5 A self-organizing-map-based browsing facility

This section describes a layered, 2D interface for browsinga digital library collection, based on a self-organizing-map (SOM) representation of documents. A SOM is anunsupervised machine learning algorithm used to clusterand visualize high-dimensional data sets [10]. It is partic-ularly attractive for inclusion in digital library systemslacking human-assigned subject classification metadatabecause it creates an automated grouping of “like” docu-ments and presents this clustering in a hierarchic, maprepresentation. At present, it is often difficult to quicklyget an overview of a digital library information space orto visualize the relationships between documents. Themap representation is intended to support browsing withan easily understandable interface that loads quickly onstandard Web browsers. Of course, a user should not befaced with an either/or choice between keyword search-ing and browsing; people often employ both strategieswhen engaged in information seeking, for example in thecommon “berry-picking” behaviour [6]. The implemen-tation of the SOM-enhanced Greenstone interface allowsthe user to move easily between keyword searching andSOM-based browsing.

5.1 Implementation and interface

As noted in Sect. 3, the Greenstone user interface hassignificant limitations: searching and browsing are con-ducted in separate screens, and it is difficult to movebetween the two activities. Additionally, subject brows-ing is based on metadata supplied by the collectioncreator/maintainer; there is no built-in mechanism forautomatically clustering “like” documents together, andso subject browsing is not supported in the absence ofsubject metadata.The SOMLib digital library system [14] uses the

GHSOM algorithm [37] to create a hierarchical, content-based organization of document collections to aid brows-ing. SOMLib provides two visualizations of this informa-tion: a text-based two-dimensional grid, and a graphics-based presentation of documents as “books” on shelves.SOMLib offers support for building browsing interfacesbut has no effective search facilities.

The browsing facility described in this section bringsthe two technologies, Greenstone and SOMLib, together.SOMLib is used to cluster the documents. The mapsfor a collection are then passed, together with the docu-ments themselves, to Greenstone for collection indexingand search interface construction. The maps are linked tothe “browse” button on the Greenstone toolbar by replac-ing the Greenstone browsing functions with instructionsto display the appropriate map file. A Greenstone “plug-in” automatically creates document metadata describingwhich lower-level map a particular document is in – andso each document is linked to a map that contains all theother documents that it has been clustered with.Figure 5 presents a top-level map for a sample col-

lection, and Fig. 6 illustrates the display of a lower-levelmap. Context information is given at the top of eachmap,describing which level of the hierarchy is represented (the“layer”) and how many documents are reachable “below”this map. Navigation up and down the layers is accom-plished through either the “Up/Down one level” links orby the “View all documents” link.Each cell in the 2D grid represents either a single

document or a group of documents. The contents of eachcell are described by a list of automatically extractedkeywords that are common to the documents clusteredtogether in the cell. The most heavily weighted of thesekeywords is displayed bolded at the top of the cell; forgreater readability, the remaining keywords are listed inalphabetic order.The map can be reached through the keyword search

facility of Greenstone; search result lists include a “Viewall related documents” link with each document. Clickingon this link leads to the map containing that document,with the original document highlighted within its grid.The user now has the opportunity to continue browsing

Fig. 5. Portion of a top-level collection map

Page 9: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone 291

Fig. 6. A bottom layer map

using the map interface, go back to the linear search list,or view selected document(s) through the grid. The pres-ence of this option is key to interleaving browsing andsearching.The SOM clustering is visualized using an enhanced

version of SOMLib’s grid interface rather than the graph-ical bookshelf display. The grid interface was appealingbecause of its greater efficiency – the user is not faced withthe annoying time lags common with graphics-intensivevisualizations. The text-based interface also has the ad-vantage of using only HTML in its display, so that theuser is not required to download plugins or applets. Ad-ditionally, there is no experimental evidence to show thata detailed 3D presentation improves the effectiveness ofbrowsing – and given that the 3D bookshelf view is moredifficult for a reader to scan than the primarily text-basedgrid display, the 3D display is likely to make browsingless, rather than more, efficient.The original SOMLib grid was tailored to the task

of browsing document collections to improve its read-ability and aesthetic appeal: the maps are now centredon the page, cell borders and spacing have been alteredto provide greater definition between cells, colours arenow used consistently and indicate depth within the map,keywords are sorted alphabetically, SOMLib parametervalues have been removed from the display and largerfont sizes and bolding are used for drawing the eye tocell labels. Navigation has been improved by providingan option to view all documents mapped to a cell with-out forcing the user to navigate down to the lowest levelsof the hierarchy and by adding a count of the total num-ber of documents accessible through each unit below theunit’s label (to give a sense of the size of the cluster yet tobe explored).

Note that this map is of the entire collection contentsand not of the results of a search. Creating a SOM ofsearch results would take varying – and possibly unac-ceptably long – amounts of time depending on the num-ber of hits returned by a search, and users dislike variableresponse times [39]. Users of search engines and digitallibraries also rarely look beyond the first page or two ofsearch results [24] and prefer ranked listings of results;there is scant evidence that visualizations of search re-sult distributions are a desired feature in a digital library.For these reasons, we focused on providing a visualiza-tion of the collection as a whole. The ability to movefrom a search result to its associated SOM cell supportsserendipity in browsing, as the SOM brings to the user’sattention documents that are similar to the initial searchhit but that may not share the search terms entered bythe user.

5.2 Usability evaluation

Two user studies were conducted: the first to investigatewhether the SOM map display is likely to fit comfortablywith the way potential users would wish to organize docu-ments (Sect. 5.2.1) and the second to provide usabilityfeedback on the SOM interface (Sect. 5.2.2).

5.2.1 Acceptability of the SOM visualization

The first user study was a card-sorting exercise similarto the study described in Sect. 4.2.2 – participants sorteddocument surrogates into clusters. Here, however, thefocus was on spatial layout of the groups and whetherthe SOM collection visualization reflects the way peoplegroup documents spatially.Twenty participants were recruited from the local

computer science students and faculty. A set of 20 lam-inated cards was prepared, each containing a summaryof a science article downloaded from the Google direc-tory (http://directory.google.com). Each summaryincluded the article’s title, a set of keywords taken fromthe document and an introductory (summative) para-

Fig. 7. A sample card used in the SOM card-sorting study

Page 10: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

292 D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

graph from the document (Fig. 7). The cards were num-bered randomly, and the stack of cards received by eachparticipant was ordered in exactly the same way. Thecard number was used at the end of each session to recordwhere the participant placed each of the cards.Each participant was requested to imagine that they

had just finished gathering documents of personal inter-est from different fields of science, represented by the 20cards, and now they want to organize the documents so asto make them easy to work with when they are handlednext. Participants spent half an hour, on average, group-ing the cards.Upon completion of the task, a photo was taken of the

final arrangement chosen for the cards. Each participantwas then asked to explain the document grouping thatthey had created (a retrospective verbal protocol [41]).All 20 participants placed the cards in “the most con-

venient place to reach” (participant Q). Sixteen of theparticipants chose to place their cards in horizontally ar-ranged groups, even when not constrained by the area inwhich they were to place the cards. Two of the partici-pants (B and N) were constrained by the area in whichthey were performing the task and so opted for a morevertical layout. The remaining two participants (I and S)preferred a vertical layout for the cards.All participants reported that they had attempted

to group the cards “by major topic” (participant J),where this subject was self-perceived (sample picturesof participants groupings are shown in Fig. 8). Fourteenparticipants grouped the cards with the “more general[card] on top, then move to more depth about the top-ic” (participant C), starting with “big things, then it getssmaller” (participant K). That is, they liked going froma “larger perspective then down to the more specific”(participant M). Participant B remarked that they pre-ferred “putting the documents in the order that I woulduse them”, but that they had “no sub-sorting unless thegroups are quite big”. Size of group was a consideration in

Fig. 8. Sample photos of final card layouts for participants in the SOM card-sorting study

constructing the clusters: participant F stated that theyhad only “broken down [groups] into sub-groups if [they]got more than 5 cards” in a group, and two participantsstated that they “don’t like having single items in a group(participant B) because they “don’t like seeing one thingby itself” (participant F).Eight participants chose a spatial layout that at-

tempted to show a relationship between different groupsby placing “similar kind[s] of things together” (partic-ipant S) and using “the gaps in between things” to“represent relativity” (participant H). Participant Q cor-roborated that sometimes, wherever possible, “the spacesshow how related the documents or groups are” and that“related documents in a topic are arranged vertically”.Participant O attempted to “establish indirect links be-tween items” because “forming relationships may helpremembering information”. The remaining participantspreferred to have their cards “plonked randomly” (par-ticipant N) wherever they found room for them. Once thecards were arranged, the minimal amount of informationthat each participant wanted to see in each card was thetitle, or if only the top card of a group could be seen, thenall of the information on that card.In summary, the layout employed by all of the par-

ticipants consisted of discrete clusters, and most of theparticipants arranged these clusters in a predominantlyhorizontal display. Some of the participants attempted touse the space between the various clusters to represent re-lationships between the clusters, where the larger the gapbetween two clusters, the less the clusters were seen to berelated. Most of the participants also tried to create anordering of the cards within each cluster. All of the par-ticipants placed the groups within easy sight and reach.None of the participants employed a linear layout, andonly one participant used an alphabetic ordering as thepredominant arrangement.Consider the way results of the SOM are visualized in

a two-dimensional, map-like layout. Related documents

Page 11: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone 293

are grouped together, and related groups are presented inadjacent cells. The participants organized their cards sim-ilarly. Most participants employed a predominantly ho-rizontal spatial layout; the SOM display parameters wereset to expand more horizontally than vertically to accom-modate this observation. Most of the participants createda hierarchy within groups either by stacking the cards ina general-to-specific structure or by creating sub-groupsordered in a similar general-to-specificmanner. This hier-archical structure is accommodated by the SOM layers.Some of the participants commented that they do not liketraversing through more than approximately three levelsin a hierarchy. Taking this into account, the number oflayers in the SOM display has been limited to three.

5.2.2 Usability study of SOM display

A usability study was conducted that focused on hownew users interpret the map display. Given that digitallibrary users generally resist reading interface instruc-tions or help files, users must be able to quickly deducethe meaning of this relatively novel interface. As recom-mended by [30], this was a small-scale usability study,with five participants. (Using more than five participantsin an initial usability study tends to be wasteful, as thesame problems are identified over and over again.) Theparticipants were chosen from the pool of participantswho took part in the study described in Sect. 5.2.1. Theparticipants were each given two tasks. In the first, theywere asked to examine the SOM display and to give theiropinion as to the functions and meaning of the variousscreen elements and to state what they believed wouldhappen if they clicked on the various links and buttonson the page. Written notes were taken of their responses(a concurrent verbal protocol [41]). In the second task,participants were asked to use the SOM-based brows-ing interface to find a set of documents and to “thinkaloud” as they performed the task (concurrent verbal pro-tocol [41]). After this second task was completed, the par-ticipants were de-briefed about their opinions of the SOMdisplay.The participants pointed out shortcomings in the la-

bels and keywords generated by the SOM: participantsfound them to be very “unhelpful and confusing” (partic-ipant iii) because “they are the same in different units”(participant iv) and did not “match the documents theyreferenced” (participant i). All participants were con-fused when the SOM generated the same label for dif-ferent units. Participant i added that “the labels are notdescriptive enough”, and participant iii concurred thatthey want the labels to be an “accurate phrase that sum-marizes the document”. These comments point to anopen problem in SOM research: the difficulty of automat-ically generating human-comprehensible labels for SOMdocument clusters [10].The participants were not able to establish whether

there was a relationship between units that were placed

in close proximity to each other – mainly because the la-bels of neighbouring cells did not make this relationshipclear. The participants therefore decided that the unitswere placed randomly because units with “similar labelsare not placed together” (participant i).All participants could understand what the different

coloured units meant; participant i stated that they “un-derstood the set-up very quickly because [the interface]has organized everything for me”. Apart from dissatisfac-tion with the keyword labels, all of the participants foundthe other parts of the map interface easy to understand.Some expressed concern that they “did not immediatelyunderstand what ‘layer’ or ‘level’ meant” (participant v),but it was “easy to make out after exploring the links”(participant iv).All the participants commented that they wanted

more information about where they were in the hierar-chy to be indicated within each layer of the map; thisis an opportunity for future development of the inter-face. All of the participants liked the fact that they wereallowed to “view all documents” that can be accessedthrough each unit with one click. They suggested thatthis option should be included with the layer 1 map aswell.The results of the usability study indicate that,

though participants had difficulty interpreting the au-tomatically generated keyword labels, the SOM-basedinterface provided them with a “good overview” (partic-ipant iv) of the documents in the collection. The partic-ipants also felt that the interface was “easy to follow”(participant v) “once you played around with it a bit”(participant i). While the relationships between clusterswere not generally understood – that is that neighbour-ing clusters are more similar than distant clusters – it wasclear that documents within a cluster were in some waysimilar. All participants liked the fact that the interfacedisplayed quickly and that it allowed them to get a quickoverview of the topics represented.

6 Adding a thesaurus toGreenstone searching/browsing

One of the issues that users face when creating andrefining queries is the “vocabulary problem”, which oc-curs when the terms selected by users do not matchthe terms used in the document collection or its meta-data [8, 18]. One useful tool for query constructionis a subject thesaurus describing the vocabulary ofthe collection’s domain and the relationships (typic-ally broader than BT , narrower than NT and relatedto RT ) among the thesaurus terms. In practice, how-ever, thesauri have been little used in searching; subjectthesauri are generally not readily available in digital for-mat or, if they are available online, are not integratedinto the searching/browsing facilities of the documentcollection.

Page 12: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

294 D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

6.1 Implementation and interface

A prototype system was developed integrating theAGROVOC [17] subject thesaurus with a Greenstonesearching interface. AGROVOC is a multilingual, con-trolled vocabulary thesaurus developed by the FAO tocover concepts and terminology in agriculture, forestry,fisheries, food, and related domains. It was chosen forexperimentation because it is of a size large enough tobe challenging for the user (over 16 500 descriptors) andits structure is the common broader/narrower/related-term organization, so it can be used without sacrificinggenerality of approach in interface design. Additionally,the researchers had access to both a digital version ofAGROVOC and a document collection compiled by theFAO in the subject area of the thesaurus. Note thata generic thesaurus such as WordNet could also be in-corporated into Greenstone using the same interface de-scribed in this section; the appeal of AGROVOC is that itwill allow us in the future to investigate subject thesaurususe in searching.The Greenstone search interface is augmented by

suggesting potentially useful thesaurus terms throughthe search page (Fig. 9) and by adding a thesaurus tabto support exploration of the thesaurus structure itself(Fig. 10).In Fig. 9, the user has used a term matched by only

one document in the collection and may wish to consideradditional terms from the thesaurus – presented in the“thesaurus quicklist”. Ticking a quicklist term adds itto the terms in the search box. The quicklist terms arein a broader/narrower/related-termrelationship with the

Fig. 9. The thesaurus-enhanced Greenstone search interface

Fig. 10. Searching and browsing the thesaurus

query. If more than one query term is present, then thequicklist contains up to three thesaurus terms for eachquery term; this limits the total number of thesaurusterms appearing on the search page so as to avoid over-whelming the user with alternatives and to preserve theprimacy of the search and hitlist on this page.Thesaurus terms presented out of context of the the-

saurus structure may be less useful to some users thanif they were viewed imbedded in the thesaurus. Further,viewing and searching the thesaurus directly can be use-ful in familiarizing a user with the subject domain of thedocument collection. To support exploration of the the-saurus, users can browse it directly through the thesaurustab (Fig. 10). “Child” nodes in the hierarchy can be hid-den or expanded in the conventional manner by clickingon the parent term’s +/− icon. The user can quickly fo-cus on a specific portion of the thesaurus through “Searchwithin thesaurus for related terms”.

6.2 Usability evaluation

The thesaurus/search engine interface for Greenstone un-derwent a total of four usability studies before, after andduring its design and implementation.

6.2.1 Establishing guidelines for the initial design

An initial user study provided an indication of preferredinteraction style when searching using the thesaurus. Theuser study evaluated five thesaurus/search engine pro-totype interfaces representing the range of interface op-tions suggested by a review of the research literature andcommercial products. Interfaces I and II provide a single

Page 13: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone 295

window to access both the thesaurus and a search en-gine. In interfaces III, IV and V, the search engine andthesaurus are treated independently in separate windows.Interface IV uses an interactive, graphic representation ofthe thesaurus, interface V presents the thesaurus as a hi-erarchy, and interfaces I, II and III use an alphabetic listdisplay of thesaurus terms. In interfaces III, IV and V theuser manually searches the thesaurus for terms of inter-est; the thesaurus is searched automatically in interface IIwhen the user enters a search term, and in interface I thesystem unobtrusively suggests additional search termsfrom the thesaurus.Eight computer science students were recruited as

participants for the study. Each participant performeda search on three (randomly selected) interface proto-types and was instructed to “think aloud” while searching(concurrent verbal protocol [41]). Participants’ commentswere recorded on video and their screens datalogged. Oncompletion of the session the participants were encour-aged to summarize their reactions to the three interfaces(retrospective verbal protocol [41]).A common theme from the participants was a dislike

of slow interfaces (interfaces I and II elicited this reactionmost strongly). Slow response time frustrated, distractedand sometimes distressed the participants. When partici-pant 6 was using interface II, for example, it did not seemto respond to his (repeated) clicks. He quickly became ag-itated, and soon decided that he’d had enough of thatinterface.Participants reacted negatively to large amounts of in-

formation displayed on a single page, confirming earlierresearch indicating that users feel uncomfortable whenconfronted with a “busy” page [19, 42]. Reading and ex-tracting useful information becomes a daunting chore,with one user aptly pointing out that the information hewas searching for was approximately 1 cm, on a page witha scrollbar that was 3 mm.Users do not like switching between independent win-

dows and applications when searching [35]. Participantsused the thesaurus less frequently when it was in a sepa-rate window and found the multiple independent windowinterfaces awkward to use. As an example, participant 6spent as much time rearranging two windows to fit com-fortably in the screen as he did searching the collection.One participant confirmed that “you’re more inclined touse it if it’s on the same screen . . . ”.Participants preferred that the thesaurus act semi-

automatically (that is, that it suggested search terms)rather than automatically inserting thesaurus terms intothe search or forcing the user to manually search thethesaurus for terms of interest. The semi-automated the-saurus is “sort of like when you spell things wrong inGoogle, it says ‘did you mean?’”; participant 4 notedthat “ . . . it was good bringing up synonyms wheneverI started a new search . . . ”.The visual thesaurus (interface IV) was entertaining

– participants spent time playing and having fun with it.

While they enjoyed using it, they felt that in the longterm the novelty would wear off and that it would becomeannoying: “ . . . the visual aspect helped a little, but theproliferation of concepts was too much . . . ”; “the factthat the thesaurus was in a different window and it wasjust so darn cool distracted me quite a bit . . . ”.Participants wished to see thesaurus term relation-

ship indications (that is BT, NT, RT) when browsing thethesaurus. They appreciated that they could take in therelationship between a series of terms at a glance.The study did not provide any indication of whether

users felt comfortable opening documents in the same orin new windows. As the goal of this research was to imple-ment a thesaurus searching/browsing interface in Green-stone, the Greenstone default was followed by openinglinks in the same window.

6.2.2 Testing initial design concepts

Having elicited an initial set of interface guidelines, thenext stage in the development process was iterative rede-velopment of the interface design [31]. The results fromthe user study of Sect. 6.2.2 were used to design four vi-sually distinct paper prototypes. These paper prototypeswere evaluated by three local experts in HCI and digitallibraries, in sessions totalling 5 h. Expert evaluation haseconomic advantages over other methods for detecting us-ability problems (for example full-scale usability tests).Nielsen points out that “single” experts are likely to findover 40% of usability problems, but “double experts” –those with “expertise in both usability in general andthe kind of interface being evaluated” – are likely to finda higher proportion of the problems [31]. On the basis ofthis analysis one of the paper prototypes was selected forfurther development in a low-fidelity, screen-based pro-totype; this interface prototype was subjected to furtherevaluation by the three local HCI/DL experts, and ontheir suggestions minor alterations were made to the in-terface’s appearance and functionality.At this point, six students in computer science were re-

cruited to participate in a usability study of the modified,low-fidelity prototype. Participants were asked to “thinkaloud” as they performed a walk-through of the proto-type (concurrent verbal protocol [41]); after completingthis walk-through they were queried about the system’sexpected reaction to their actions. Written notes weretaken of the think-aloud sessions and the de-briefings(retrospective verbal protocol [41]).Participants recognized the “search” portion of the

interface and (correctly) felt confident that it would re-turn a list of documents. The high level of familiaritythey felt seemed to give participants a confidence boostwhen it came to dealing with novel search elements suchas the list of alternative search terms suggested by thethesaurus. This assurance vanished when the participantsencountered the thesaurus portion of the interface. Whenqueried about the thesaurus, participants gave tentative

Page 14: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

296 D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

guesses involving synonyms, antonyms and related docu-ments. This uncertainty is likely due to a lack of familiar-ity with subject thesauri.One participant brought up the issue of ambiguous-

term entry (that is entering unrelated terms such as “pig”and “speaker” in a query). Unrelated-term entry is akinto entering ambiguous terms [45]. The brute force methodfor dealing with ambiguous-term entry would be to sim-ply display a list of all results found for all terms entered,but this runs the risk of information overload [42]. Al-ternatively, the system could display the results of am-biguous entry in “categories”. User studies performedby Dumais et al. [16] show that category interfaces aremore effective than ranked list displays; this approachwas adopted in the refined prototype design.

6.2.3 Expert evaluationof the implemented prototype

The design that emerged from the low-fidelity testing ofSect. 6.2.2 was implemented, and an expert evaluationwas conducted to discover the usability problems of its in-terface. Two local faculty members with extensive know-ledge of HCI and experience in the design and evaluationof searching interfaces performed the evaluation. Writ-ten notes were taken of their analysis. It was noted byboth experts that the thesaurus can return a substantialnumber of terms, possibly causing information overload.Two solutions they proposed included the elimination ofduplicate terms and terms that are not present in anydocuments in the collection, asking, “ . . . is it useful tohave them?”Interestingly, during the course of the expert evalua-

tions, an expert queried whether or not the thesaurus hi-erarchy should be search oriented or not, whether search-ing in the thesaurus should help users find documents (byshowing document lists, for example) or concentrate onshowing relationships between terms. How the thesauruswas going to be used was important, but he felt that it wasmore important to focus upon document-oriented users(i.e. those who use the system to help them improve theirsearch and find useful documents).

6.2.4 Usability evaluation of the refined browser

The final user study conducted investigated the pre-dictability and learnability of the implemented system, asmodified from suggestions raised by the expert evaluation(Sect. 6.2.3). It was also important to see how novice users(i.e. those who do not necessarily use a thesaurus) reactedto the presence of the thesaurus descriptor key (“BT”,“NT”, etc.). Five local computer science students were re-cruited as participants. Participants viewed (but did notinteract with) the thesaurus searching/browsing screensand were then asked to predict the system’s response topotential user actions (i.e. “What do you think will hap-pen if you click there?”). Subjects were then asked to

interact with the system while “thinking aloud” (concur-rent verbal protocol [41]). Written notes were taken ofeach session.This study indicated a high level of user uncertainty

when dealing with selecting quicklist terms – possiblyindicating a need for some more immediate documenta-tion regarding this feature. High uncertainty levels whendealing with the thesaurus were present in this study, al-though users did note that they could easily get a feel forhow to use it (“ . . . the quicklist stuff isn’t really obvi-ous, but once you start playing with it, you can quicklysee reactions and understand how it works . . . ”).Overall, the system received positive reactions from

both experts and users alike regarding its learnability andpredictability. The interface received positive commentsabout its fast reaction time, something that is vitally im-portant to any Web-based service [23].

7 Conclusions

Three enhancements have been created to the Green-stone software, designed with human information-seekingneeds and the failures of the old system in mind. Thesenew systems allow users to combine searching and brows-ing, in keeping with both the literature on information-seeking behaviour and the user experiments carried outas part of the work done for this investigation. Theseenhancements allow the user more flexibility in deter-mining the way in which they browse, which is also inkeeping with experimental results and information seek-ing literature: users can browse by more than one typeof metadata and select the metadata by which they wishto browse (Sect. 4); in collections lacking subject classifi-cation, metadata users can browse clusters of documentsgrouped automatically into a SOM structure (Sect. 5),and if a subject thesaurus is available for a collection,users can receive support from the thesaurus in query re-finement and search and browse the thesaurus itself tofamiliarize themselves with the subject vocabulary andstructure (Sect. 6). These new browsing mechanisms area vast improvement over the old Greenstone classifier sys-tem, making information seeking easier.

References

1. The Greenstone Software (2003)http://www.greenstone.org [accessed 6 March 2003]

2. The New Zealand Digital Library (2003)http://www.nzdl.org [accessed 6 March 2003]

3. Agosto D (2002) Bounded rationality and satisficing in youngpeople’s Web-Based decision making. J Am Soc Inf Sci Tech-nol 53(1):16–27

4. Bainbridge D, Mckay D, Witten I (2001) The Greenstone De-veloper’s Guide. Department of Computer Science, Universityof Waikato. http://www.greenstone.org [accessed 6 March2003]

5. Baldonado MQ (2000) Wang. A user-centered interface for in-formation exploration in a heterogeneous digital library. J AmSoc Inf Sci Technol 51(3):297–310

Page 15: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone

D. McKay et al.: Enhanced browsing in digital libraries: three new approaches to browsing in Greenstone 297

6. Bates MJ (1989) The design of browsing and berrypick-ing techniques for the online search interface. Online Rev13(5):407–424

7. Beaulieu M (2000) Interaction in information searching andretrieval. J Document 56(4)431–439

8. Borgman C (1996) Why are online catalogs still hard to use?J Am Soc Inf Sci Technol 47(7):493–503

9. Chang S-J, Rice RE (1993) Browsing: a multidimensionalframework. Annu Rev Inf Sci Technol 28:231–276

10. Chen H, Houston AL, Sewell RR, Schatz BR (1998) Internetbrowsing and searching: user evaluations of category map andconcept space techniques. J Am Soc Inf Sci Technol 49(7):582–603

11. Crabtree A, Twidale MB, O’Brien J, Nichols DM (1997) Talk-ing in the library: implications for the design of digital li-braries. In: Proc. 2nd ACM international conference on digitallibraries, Philadelphia, pp 221–228

12. Cunningham SJ, Knowles C, Reeves N (2001) An ethno-graphic study of technical support workers: why we did-n’t build a tech support digital library. In: Proc. 1st jointACM/IEEE-CS conference on digital libraries, Roanoke, VA,pp 189–198

13. Davis FD (1989) Perceived usefulness, perceived ease ofuse, and user acceptance of information technology. MIS Q13(3):319–340

14. Dittenbach M, Merkl D, Rauber A (2000) Using growing hier-archical self-organizing maps for document classification. In:Proc. 8th European symposium on artificial neural networks.Bruges, Belgium, 2000.http://www.dice.ucl.ac.be/Proceedings/esann/esannpdf/es2000-39.pdf [accessed 25 July 2004]

15. Dix A, Finaly J, Abowd G, Beale R (1997) Human-computerinteraction, 2nd edn. Prentice-Hall, Glasgow, p 163

16. Dumais S, Cutrell E, Chen H (2001) Optimizing search byshowing results in context. CHI 2001, pp 277–284

17. Food and Agricultural Organization of the United Na-tions (1998) AGROVOC: Multilingual agricultural thesaurus.http://www.fao.org/agrovoc/

18. Furnas G, Landauer T, Gomez L, Dumais S (1987) The vo-cabulary problem in human-system communication. CommunACM 30(11):964–971

19. Hancock-Beauliu M (1993) A comparative transaction log an-alysis of browsing and search formulation in online catalogues.Program 27(3):269–280

20. Henninger S, Belkin N (1995) Interface issues and interactionstrategies for information retrieval systems. In: CHI Compan-ion ’95, Denver, CO, pp 401–402

21. Institute of Electrical and Electronics Engineers (1990) IEEEstandard computer dictionary: a compilation of IEEE stan-dard computer glossaries. IEEE Press, New York

22. Jacso P (1999) Savvy searching starts with browsing. OnlineCD-ROM Rev 23(3):169–172

23. Johnson J (2000) GUI bloopers: don’ts and do’s for softwaredevelopers and Web designers. Morgan Kaufmann, San Fran-cisco

24. Jones S, Cunningham SJ, McNab R (1998) An analysis of us-age of a digital library. In: Proc. 2nd European conference onresearch and advanced technology for digital libraries, Herak-lion, Greece, pp 261–277

25. Knepshield P, Savage A, Belkin N (1999) Interaction in infor-mation retrieval: trends over time. J Am Soc Inf Sci Technol50(12):1067–1082

26. Kuhlthau C (1991) Inside the search process: informationseeking from the user’s perspective. J Am Soc Inf Sci Technol42(5):361–371

27. Lawrence S, Lee GC (1998) Context and page analysis for im-proved Web search. IEEE Internet Comput 2(4):38–46

28. Lazar J, Bessiere K, Ceaparu I, Shneiderman B (2003) Help!I’m lost: user frustration in Web navigation. IT Soc 3(1):18–26

29. Marchionini G (1995) Information seeking in electronic envi-ronments. Cambridge University Press, New York

30. Nielsen J, Landauer TK (1993) A mathematical model of thefinding of usability problems. In: Proc. ACM INTERCHI’93,Amsterdam, pp 206–213

31. Nielsen J (1994) Usability engineering. Morgan Kaufmann,San Francisco

32. Nielsen J (1997) The changes in Web usability since 1994.Alertbox 1 December.http://www.useit.com/alertbox/9712a.html[accessed March 6 2003]

33. Nielsen J (1999) The top ten new mistakes of Webpage design.Alertbox 30 May.http://www.useit.com/alertbox/990350.html[accessed March 6 2003]

34. Nordlie R (1999) “User Revealment” – a comparison of initialqueries and ensuing question development in online searchingand human reference interactions. In: Proc. 22nd annual inter-national ACM SIGIR conference on research and developmentin information retrieval, Berkeley, CA, pp 11–18

35. North C, Shneiderman B (1997) A taxonomy of multiple win-dow coordinations. Department of Computer Science Tehc-nical Report #CS-TR-3854, University of Maryland, CollegePark, MD

36. Paynter G, Witten I, Cunningham SJ, Buchanan G (2000)Scalable browsing for large collections: a case study. In: Proc.5th ACM conference on digital libraries, San Antonio, TX,pp 215–218

37. Rauber A, Merkl D (1999) The SOMLib digital library sys-tem. In: Proc. 3rd European conference on research and ad-vanced technology for digital libraries, Paris, pp 323–342

38. Salampasis M, Tait J, Bloor C (1998) Evaluation of infor-mation seeking performance in hypermedia digital libraries.Interact Comput 10:269–284

39. Shneiderman B (1984) Response time and display rate inhuman performance with computers. ACM Comput Surv16(3):265–285

40. Spink A, Bateman J, Jansen BJ (1999) Searching the Web:a survey of excite users. Internet Res Electron Netw Appl Pol-icy 9(2):117–128

41. Todd P, Benbasat I (1987) Process tracing methods in deci-sion support systems research: exploring the black box. MIS Q11(4):492–512

42. Theng YL, Jones M, Thimbleby H (1995) Reducing informa-tion overload: a comparative study of hypertext systems. IEEDigest 95/223: Colloquium on information overload, London,6/1–6/5

43. Witten I, McNab R (1994) The New Zealand Digital Library:collections and experience. Electron Libr 15(6):495–504

44. Witten I, Moffat A, Bell T (1994) Managing gigabytes: com-pressing and indexing documents and images. Van NostrandReinhold, New York

45. Xu J, Croft W (2000) Improving the effectiveness of informa-tion retrieval with local context analysis. ACM Trans Inf Syst18(1):79–112