exploratory search: from finding to understandingricci/isr/papers/p41-marchionini.pdf ·...

6
COMMUNICATIONS OF THE ACM April 2006/Vol. 49, No. 4 41 EXPLORATORY SEARCH: FROM FINDING TO UNDERSTANDING rom the earliest days of computers, search has been a fundamental application that has driven research and development. For example, a paper published in the inaugural year of the IBM Journal 36 years ago out- lined challenges of text retrieval that continue to the present [4]. Today’s data storage and retrieval applications range from database systems that manage the bulk of the world’s structured data to Web search engines that provide access to petabytes of text and multimedia data. As computers have become consumer products and the Internet has become a mass medium, searching the Web has become a daily activity for everyone from children to research scientists. By Gary Marchionini F Research tools critical for exploratory search success involve the creation of new interfaces that move the process beyond predictable fact retrieval.

Upload: others

Post on 25-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Exploratory search: from finding to understandingricci/ISR/papers/p41-marchionini.pdf · EXPLORATORY SEARCH: FROM FINDING TO UNDERSTANDING rom the earliest days of computers, search

COMMUNICATIONS OF THE ACM April 2006/Vol. 49, No. 4 41

EXPLORATORY SEARCH:

FROM FINDING TOUNDERSTANDING

rom the earliest days of computers, search has been afundamental application that has driven research anddevelopment. For example, a paper published in theinaugural year of the IBM Journal 36 years ago out-lined challenges of text retrieval that continue to thepresent [4]. Today’s data storage and retrievalapplications range from database systems thatmanage the bulk of the world’s structured datato Web search engines that provide access topetabytes of text and multimedia data. As

computers have become consumer products and theInternet has become a mass medium, searching theWeb has become a daily activity for everyone fromchildren to research scientists.

By Gary Marchionini

FResearch tools critical for exploratory search success involve the creation of new interfaces that move the process beyond predictable fact retrieval.

Page 2: Exploratory search: from finding to understandingricci/ISR/papers/p41-marchionini.pdf · EXPLORATORY SEARCH: FROM FINDING TO UNDERSTANDING rom the earliest days of computers, search

42 April 2006/Vol. 49, No. 4 COMMUNICATIONS OF THE ACM

As people demand more of Web services, shortqueries typed into search boxes are not robust enoughto meet all of their demands. In studies of early hyper-text systems, we distinguished analytical search strate-gies that depend on a carefully planned series ofqueries posed with precise syntax from browsingstrategies that depend on on-the-fly selections [7].The Web has legitimized browsing strategies thatdepend on selection, navigation, and trial-and-errortactics, which in turn facilitate increasing expectationsto use the Web as a source for learning andexploratory discovery. This overall trend toward moreactive engagement in the search process leads theresearch and develop-ment community tocombine work inhuman-computer inter-action (HCI) and infor-mation retrieval (IR).This article distinguishesexploratory search thatblends querying andbrowsing strategies fromretrieval that is bestserved by analyticalstrategies, and illustratesinteractive IR practicesand trends with examples from two user interfacesthat support the full range of strategies.

Exploratory search. Search is a fundamental lifeactivity. All organisms seek sustenance and propaga-tion and Maslow’s classic hierarchy of needs theorypredicts that once people fulfill basic physiologicalneeds, we seek to fulfill social and psychological needsto belong and to know our world. These higher-levelneeds are often informational and this in turnexplains why information resources and communica-tion facilities are so sophisticated in developed soci-eties.

Ahierarchy of information needs mayalso be defined that ranges from basic facts that guideshort-term actions (for example, the predicted chancefor rain today to decide whether to bring an umbrella)to networks of related concepts that help us under-stand phenomena or execute complex activities (forexample, the relationships between bond prices andstock prices to manage a retirement portfolio) to com-plex networks of tacit and explicit knowledge thataccretes as expertise over a lifetime (for example, the

most promising paths of investigation for the sea-soned scholar or designer). For these respective layersof information needs, we can define kinds of infor-mation-seeking activities, each with associated strate-gies and tactics that might be supported withcomputational tools.

Figure 1 depicts three kinds of search activities thatwe label lookup, learn, and investigate; and highlightsexploratory search as especially pertinent to the learnand investigate activities.1 These activities are repre-sented as overlapping clouds because people mayengage in multiple kinds of search in parallel, andsome activities may be embedded in others; for exam-

ple, lookup activities areoften embedded in learnor investigate activities.The searcher views theseactivities as tasks, so weuse “task” in the followingdiscussion.

Lookup is the mostbasic kind of search taskand has been the focus ofdevelopment for databasemanagement systems andmuch of what Web searchengines support. Lookuptasks return discrete andwell-structured objects

such as numbers, names, short statements, or specificfiles of text or other media. Database managementsystems support fast and accurate data lookups inbusiness and industry; in journalism, lookups arerelated to questions of who, when, and where asopposed to what, how, and why questions. Inlibraries, lookups have been called “known item”searches to distinguish them from subject or topicalsearches.

Most people think of lookup searches as “factretrieval” or “question answering.” In general, lookuptasks are suited to analytical search strategies thatbegin with carefully specified queries and yield preciseresults with minimal need for result set examinationand item comparison. Clearly, lookup tasks have beenamong the most successful applications of computersand remain an active area of research and develop-ment. However, as the Web has become the informa-tion resource of first choice for information seekers,people expect it to serve other kinds of informationneeds and search engines must strive to provide ser-vices beyond lookup.

March fig 1 (4/06)- 26.5 picas

Figure 1. Search Activities.

March fig 1 (4/06) - 19.5 picas

InvestigateLearnLookup

Exploratory Search

Fact retrievalKnown item searchNavigationTransactionVerificationQuestion answering

Knowledge acquisitionComprehension/InterpretationComparisonAggregation/IntegrationSocialize

AccretionAnalysisExclusion/NegationSynthesisEvaluationDiscoveryPlanning/ForecastingTransformation

InvestigateLearnLookup

Exploratory Search

Fact retrievalKnown item searchNavigationTransactionVerificationQuestion answering

Knowledge acquisitionComprehension/InterpretationComparisonAggregation/IntegrationSocialize

AccretionAnalysisExclusion/NegationSynthesisEvaluationDiscoveryPlanning/ForecastingTransformation

Figure 1. Search activities.

1There are many important theoretical models of information search, for example,Saracevic summarizes Belkin’s and Ingrewsen’s in his stratified model [9].

Page 3: Exploratory search: from finding to understandingricci/ISR/papers/p41-marchionini.pdf · EXPLORATORY SEARCH: FROM FINDING TO UNDERSTANDING rom the earliest days of computers, search

COMMUNICATIONS OF THE ACM April 2006/Vol. 49, No. 4 43

Searching to learn is increasingly viable as more pri-mary materials go online. Learning searches involvemultiple iterations and return sets of objects thatrequire cognitive processing and interpretation. Theseobjects may be instantiated in various media (graphs,or maps, texts, videos) and often require the informa-tion seeker to spend time scanning/viewing, compar-ing, and making qualitative judgments. Note that“learning” here is used in its general sense of develop-ing new knowledge and thus includes self-directedlife-long learning and professional learning as well asthe usual directed learning in schools. Using termi-nology from Bloom’s taxonomy of educational objec-tives, searches that support learning aim to achieve:knowledge acquisition, comprehension of concepts orskills, interpretation of ideas, and comparisons oraggregations of data and concepts.

Another important kind of search that falls underthe learn search activity is social searching where peo-ple aim to find communities of interest or discovernew friends in social network systems (for example,www.friendster.com). Although the motivations maybe distinct from other learning search examples, theexploratory strategies for locating, comparing, andassessing results are similar. Much of the search timein learning search tasks is devoted to examining andcomparing results and reformulating queries to dis-cover the boundaries of meaning for key concepts.Learning search tasks are best suited to combinationsof browsing and analytical strategies, with lookupsearches embedded to get one into the correct neigh-borhood for exploratory browsing.

Searches that support investigation involvemultiple iterations that take place over perhaps verylong periods of time and may return results that arecritically assessed before being integrated into per-sonal and professional knowledge bases. Investigativesearches aim to achieve Bloom’s highest-level objec-tives such as analysis, synthesis, and evaluation andrequire substantial extant knowledge. Such searchesoften include explicit evaluative annotation that alsobecomes part of the search results. Investigativesearching may be done to support planning and fore-casting, or to transform existing data into new data orknowledge. In addition to finding new information,investigative searches may seek to discover gaps inknowledge (for example, “negative search” [1]) so thatnew research can begin or dead-end alleys can beavoided. Investigative searches also include alerting

service profiles that are periodically and automaticallyexecuted.

Serendipitous browsing that is done to stimulateanalogical thinking is another kind of investigativesearch. Investigative searching is more concernedwith recall (maximizing the number of possibly rele-vant objects that are retrieved) than precision (mini-mizing the number of possibly irrelevant objects thatare retrieved) and thus not well supported by today’sWeb search engines that are highly tuned toward pre-cision in the first page of results. This explains why somany specialized search services are emerging to aug-ment general search engines. Because experts typicallyknow which information resources to use, they canformulate precise analytical queries but requiresophisticated browsing services that also provideannotation and result manipulation tools.

These distinctions among different types of searchactivities suggest that lookup searches lend themselvesto formalized turn-taking where the informationseeker poses a query and the system does the retrievaland returns results. Thus, the human and system taketurns in retrieving the best result. However, learningand investigative searching require strong human par-ticipation in a more continuous and exploratoryprocess.

To support the full range of search activities, the IRcommunity is turning increasingly to CHI develop-ments to discover ways to bring humans more activelyinto the search process. Rather than viewing thesearch problem as matching queries and documentsfor the purpose of ranking, interactive IR views thesearch problem from the vantage of an active humanwith information needs, information skills, powerfuldigital library resources situated in global and locallyconnected communities—all of which evolve overtime. The digital library resources are assumed toinclude dynamic contents such as other humans, sen-sors, and computational tools. In this view, the searchsystem designer aims to bring people more directlyinto the search process through highly interactive userinterfaces that continuously engage human controlover the information seeking process. Although this isan ambitious design goal, we are beginning to seesome progress in systems that are the forerunners tothe exploratory search engines that will evolve in theyears ahead.

TOWARD EXPLORATORY SEARCH SYSTEMS

Menus in restaurants serve the needs of both man-agement and diners. From the system point of view,menus scope the kinds of products and servicesavailable and thus optimize performance; and fromthe patron’s point of view they simplify selection and

Page 4: Exploratory search: from finding to understandingricci/ISR/papers/p41-marchionini.pdf · EXPLORATORY SEARCH: FROM FINDING TO UNDERSTANDING rom the earliest days of computers, search

specification of gastronomical needs. In the com-puter industry, menus were the first kind of alterna-tive to command systems and remain an importantinteraction style for selection and browsing.Expandable hierarchical file structures are special-ized menus that serve as the mainstay of personalcomputing, cell phone, and PDAinterfaces.

Hypertext links in texts werecalled “embedded menus” by Shnei-derman [10] and current Web direc-tory structures (for example, OpenDirectory) represent sophisticatedmenu structures for finding infor-mation on Web pages. In the data-base realm, query-by-example(QBE) interfaces were early alterna-tives to formal language interfacesand QBE-like systems remain theprimary method for supportingnon-textual queries in multimediasystems. These interface designexperiences demonstrate the efficacyof selection as a form of query spec-ification, and inspire link navigation as a primary userinterface interaction style in the Web environment.

There is also substantial evidence in theIR literature that relevance feedback—asking informa-tion seekers to make relevance judgments aboutreturned objects and then executing a revised querybased on those judgments—is a powerful way toimprove retrieval. However, practice shows that peopleare often unwilling to take the added step to providefeedback when the search paradigm is the classic turn-taking model. To engage people more fully in thesearch process and put them in continuous control,researchers are devising highly interactive user inter-faces. Shneiderman and his colleagues created“dynamic query” interfaces [10] that use mouse actions

such as slider adjustments and brushing techniques topose queries and client-side processing to immediatelyupdate displays to engage information seekers in thesearch process. A number of prototypes (for example,Dynamic Home Finder, SpotFire, TreeMaps) havecome from these lines of research and development.

These techniques are especiallygood for exploration where high-

level overviews of a collection and rapid previews ofobjects help people to understand data structures andinfer relationships among concepts.

Other researchers have investigated these highlyinteractive interaction styles. Hearst and her col-leagues created a series of interfaces that tightly cou-ple queries to results, ranging from TileBars for textsearching [2] to Flamenco (see the sidebar in this sec-tion), a series of interfaces that provides hierarchical,faceted metadata as entry points for exploration andselection. Hearst and Pederson [3], and others (forexample, [11]) have used clustering of search resultsto make search more interactive, as represented bycurrent Web search alternatives such as Clusty(clusty.com) that aim to provide groups of results thatcan be used to further search. Fox et al., schraefel etal., and Cutrell and Dumais offer other examples in

44 April 2006/Vol. 49, No. 4 COMMUNICATIONS OF THE ACM

Figure 2. Open Videopreview display for aspecific video.

Exploratory search makes us all pioneers and adventurers in a new world of information riches awaiting discoveryalong with new pitfalls and costs.

Page 5: Exploratory search: from finding to understandingricci/ISR/papers/p41-marchionini.pdf · EXPLORATORY SEARCH: FROM FINDING TO UNDERSTANDING rom the earliest days of computers, search

this section of blending HCI and IR to supportexploratory search. Our work at the University ofNorth Carolina parallels these efforts and two exam-ple search systems that support exploratory search areillustrated here.

OPEN VIDEO EXAMPLES

The Open Video Digital Library (www.open-video.org) aims to give people agile views of digitalvideo files [6]. The Web-based interface provides anumber of alternative ways to slice and dice the videocorpus so that people can see what is in the collection(overview) and determine greater details about avideo segment (preview) before downloading it.There are different kinds of surrogates provided,including textual and visual representations and sev-eral layers of detail and alternative display options togive people good control. The user interface wasdesigned to optimize agile exploration before down-loading while allowing standard text-based search.

A number of user studies were conducted to deter-mine which surrogates are effective and what parametersto use as defaults. This interface has proven to be quiteeffective over the past few years as thousands of usersaccess the corpus each month to find videos for educa-tional and research purposes. The home page provides atypical search form but also partitions the video collec-tion in various ways so that people can select a specificpartition to explore. Result set pages provide alternatives

for what is displayed (formats andlevel of text and visual detail) and howthe results are ordered (relevance, title,duration, date, popularity).

Figure 2 shows a preview for avideo with textual metadata and upto three kinds of visual surrogate(storyboard, fast forward, excerpt).The searcher may get more detailsby selecting the visual surrogate ordownload a video file in a format oftheir choice. The Open Video searchsystem is meant to put people incontrol and support exploration aswell as lookup. Our transaction logsindicate that half of the searchesconducted begin with keywordstrategies (analytical strategies) andthe remainder begin with partitionselection (browsing strategies).

As part of our efforts to develop highlyinteractive UIs that support exploratory search forgovernment statistical Web sites, we developed a gen-eral-purpose interface called the Relation Browser(RB) that can be applied to a variety of data sets [5].The RB aims to facilitate exploration of the relation-ships between (among) different data facets, displayalternative partitions of the database with mouseactions, and serve as an alternative to existing searchand navigation tools. RB provides searchers with asmall number of facets such as topic, time, space, ordata format; each of which is limited to a small num-ber of attributes that will fit on the screen, simplemouse-brushing capabilities to explore relationshipsamong the facets and attributes; and immediateresults displays that dynamically change as brushingcontinues. Figure 3 illustrates how the RB works for adatabase such as the Open Video DL. Panel 3a depictsa portion of the RB at startup with the mouse posi-tioned over the Educational category in the genrefacet. The number of videos in the library in each ofthe facet-categories is immediately shown along witha set of bars that show the distribution visually. Thus,simply moving the mouse partitions the full corpus

COMMUNICATIONS OF THE ACM April 2006/Vol. 49, No. 4 45

Figure 3. (a) Relation browser interface forOpen Video Library with mouse over the education facet; (b) Relation browser displayafter educational and Spanish selected, mouseover fourth title.

(a)

(b)

Page 6: Exploratory search: from finding to understandingricci/ISR/papers/p41-marchionini.pdf · EXPLORATORY SEARCH: FROM FINDING TO UNDERSTANDING rom the earliest days of computers, search

into a view of the educational items. Clicking themouse freezes this partition and allows continuedbrowsing or retrieval of the partition from the server.

Panel 3b shows a portion of the display after theuser has selected the Spanish language categorywithin the educational partition and then clicked onthe Search button. The display shows the number ofitems in each facet-category for the 41 videos in theresult set in the upper panel and the titles, keywords,and producing agent for the videos in the bottompanel with additional metadata available on mouse-over. These items are hot linked to the Open VideoDL. String search within the results fields is also sup-ported and all results panel and query panel displaysare coordinated to update in parallel when any mouseor keyboard action is executed.

The RB has been instantiated for dozens of data-bases, including several U.S. federal statistical agencyWeb sites. RB was designed to facilitate explorationand is less direct for simple lookup tasks than forexploratory tasks. Our user studies have demonstratedits efficacy when compared to standard Web-basedretrieval. To support the dynamics, the metadata andquery results must be available on the client side, thuslimiting scalability to databases of roughly tens ofthousands of items. We see this specialized kind ofinterface as an augmentation of today’s powerfullookup engines. The RB could be used as a tool forexploring very large databases where the results are notindividual items but subcollections or portals. Alterna-tively, the RB may be used after a standard Web searchhas been executed to investigate the result set if on-the-fly automatic classification is used.

CONCLUSION

It is clear that better tools to support exploratorysearching are needed. Oblinger and Oblinger [8]argue the “Net generation” (those who learned toread after the Web) are qualitatively different intheir informational behaviors and expectations; theymultitask and expect their informational resourcesto be electronic and dynamic. The Net generationwill expect to be able to use Web resources to con-duct lookup, learning, and investigative tasks withfluid user interfaces.

As people spend more time online, not only willthey increase their expectations about informationtools and content, but there are more opportunitiesfor mining their behavior patterns and applyingadversarial computing that tries to take advantage ofsystem and user behaviors. Exploratory search makesus all pioneers and adventurers in a new world ofinformation riches awaiting discovery along with newpitfalls and costs.

Today, executing a query in a Web search enginenot only returns results but targets the searcher formany kinds of presumably related opportunities andservices. Exploratory search will exacerbate this trendas more user interaction data will be available for min-ing and analysis. One implication of consideringgood Web design that supports exploratory searchtogether with client-side applications, like the RB, isto provide people with ways to trade off personalbehavior data for added value services. Those who donot want their information behaviors to be mined canchoose to use more client-side exploration tools, onlysending requests for database partitions to the server.

Regardless of where the exploration takes place, itis clear that more computational resources will bedevoted to exploratory search and the next searchengine behemoths will be the ones that provide easyto apply exploratory search tools that help informa-tion seekers get beyond finding to understanding anduse of information resources.

References1. Garfield, E. When is a negative search result positive? Essays of an Infor-

mation Scientist 1 (Aug. 12, 1970), 117–118.2. Hearst, M. TileBars: Visualization of term distribution information in

full text information access. In Proceedings of the ACM SIGCHI Con-ference on Human Factors in Computing Systems (Denver, CO, 1995).

3. Hearst, M. and Pedersen, P. Reexamining the cluster hypothesis: Scat-ter/Gather on retrieval results. In Proceedings of 19th Annual Interna-tional ACM/SIGIR Conference (Zurich, 1996).

4. Luhn, H.P. A statistical approach to mechanized encoding and search-ing of literary information IBM J. of R&D 1, 4 (1957), 309–317.

5. Marchionini, G. and Brunk, B. Toward a general relation browser: AGUI for information architects. Journal of Digital Information 4, 1(2003); jodi.ecs.soton.ac.uk/Articles/v04/i01/Marchionini/.

6. Marchionini, G. and Geisler, G. The open video digital library. dLibMag. 8, 12 (2002); www.dlib.org/dlib/december02/marchionini/12marchionini.html.

7. Marchionini, G. and Shneiderman, B. Finding facts vs. browsingknowledge in hypertext systems. Computer 2, 11 (Nov. 1988), 70–80.

8. Oblinger, D. and Oblinger, J. Is it age or IT: First steps toward under-standing the Net generation. Educating the net generation. Educause(2005); www.educause.edu/educatingthenetgen.

9. Saracevic, T. The stratified model of information retrieval interaction:Extension and applications. In Proceedings of the American Society forInformation Science 34 (1997), 313–327.

10. Shneiderman, B. and Plaisant, C. Designing the User Interface 4th Ed.Person/Addison-Wesley, Boston, MA, 2005.

11. Zamir, O. and Etzioni, O. Grouper: A dynamic clustering interface toWeb search results. In Proceedings of WWW8, (Toronto, Canada,1999).

Gary Marchionini ([email protected]) is the Cary C. BoshamerProfessor in the School of Information and Library Science at the University of North Carolina at Chapel Hill.

This work was supported by NSF grants IIS 0099638 and EIA 0131824.

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. To copy otherwise, to republish, to post on servers or to redistributeto lists, requires prior specific permission and/or a fee.

© 2006 ACM 0001-0782/06/0400 $5.00

c

46 April 2006/Vol. 49, No. 4 COMMUNICATIONS OF THE ACM