SIMS 296a-3: UI Background, Marti Hearst, Fall ’98
Marti Hearst, UCB SIMS, Fall 98

Interface Topics Today
(Other topics will be covered later)
- Supporting the Dynamic, Continuing Process of Search
- Search Starting Points

Standard Model
Assumptions:
- Maximizing precision and recall simultaneously
- The information need remains static
- The value is in the resulting document set
[Diagram labels: User’s Information Need, text input, Query, Parse, Pre-process, Index, Collections, Rank or Match, Query Reformulation]
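
The first assumption refers to precision and recall. As a minimal sketch, assuming the usual set-based definitions (precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved), both can be computed for a single query as follows. The function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set-based definitions."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: 4 documents retrieved, 3 documents relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)
```

Retrieving more documents tends to raise recall and lower precision, which is why the standard model treats maximizing both at once as the goal.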

“Berry-Picking” as an Information Seeking Strategy (Bates 90)
Standard IR model
- The information need remains the same throughout the search session.
- Goal is to produce a perfect set of relevant docs.
Berry-picking model
- The query is continually shifting.
- Users may move through a variety of sources.
- New information may yield new ideas and new directions.
- The value of search is in the bits and pieces picked up along the way.

A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 90)
[Figure: successive query states Q0, Q1, Q2, Q3, Q4, Q5]

Implications
- Interfaces should make it easy to store intermediate results
- Interfaces should make it easy to follow trails with unanticipated results
- Difficulties with evaluation
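
One way to picture the first two implications is a session record that keeps every query and every result saved along the way. The sketch below is only an illustration: `Step`, `SearchTrail`, and the document ids are hypothetical names, not part of any system mentioned in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    query: str
    kept: list = field(default_factory=list)  # results the user chose to save


@dataclass
class SearchTrail:
    steps: list = field(default_factory=list)

    def ask(self, query):
        """Record a new (possibly shifted) query as the next step."""
        self.steps.append(Step(query))

    def keep(self, doc_id):
        """Save an intermediate result under the current query."""
        self.steps[-1].kept.append(doc_id)

    def basket(self):
        """Everything picked up along the way, in order, without repeats."""
        seen, out = set(), []
        for step in self.steps:
            for doc_id in step.kept:
                if doc_id not in seen:
                    seen.add(doc_id)
                    out.append(doc_id)
        return out


trail = SearchTrail()
trail.ask("star")
trail.keep("d12")
trail.ask("star formation")  # the query shifts after seeing d12
trail.keep("d40")
trail.keep("d12")            # saved again; basket() lists it once
print(trail.basket())
```

Because every step is retained, an interface built on such a record could let the user return to any earlier query rather than only the latest one.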

Supporting the Information Seeking Process
Two recent similar approaches that focus on supporting the process:
- SketchTrieve (Hendry & Harper 97)
- DLITE (Cousins 97)

Informal Interface
- Informal does not mean less useful
- Show how the search is
  - unfolding or evolving
  - expanding or contracting
- Prompt the user to
  - reformulate and abandon plans
  - backtrack to points of task deferral
  - make side-by-side comparisons
  - define and discuss problems

SketchTrieve: An Informal Interface (Hendry & Harper 96, 97)
- A “spreadsheet” for information access
- Make use of layout, space, and locality
  - comprehension and explanation
  - search planning
- A data-flow notation for information seeking
  - link sources to queries
  - link both to retrieved documents
  - align results in space for comparison

SketchTrieve: Connecting Results with Next Query

DLITE (Cousins 97)
- Drag and Drop interface
- Reify queries, sources, retrieval results
- Animation to keep track of activity

Starting Points for Search
Faced with a prompt or an empty entry form … how to start?
- Lists of sources
- Overviews
  - Clusters
  - Category Hierarchies/Subject Codes
  - Co-citation Links
- Examples
- Automatic source selection

List of Sources
- Have to guess based on the name
- Requires prior exposure/experience

Overviews in the User Interface
- Unsupervised Groupings
  - Clustering
  - Kohonen Feature Maps
- Supervised Categories
  - Yahoo!
  - Superbook
  - HiBrowse
  - Cat-a-Cone
- Combinations
  - DynaCat
  - SONIA

Text Clustering
- Finds overall similarities among groups of documents
- Finds overall similarities among groups of tokens
- Picks out some themes, ignores others

Text Clustering
Clustering is “The art of finding groups in data.” -- Kaufman and Rousseeuw
[Figure: plot with axes Term 1 and Term 2]
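
Document/Document Matrix
$$
\begin{array}{c|cccc}
       & D_1    & D_2    & \cdots & D_n    \\
\hline
D_1    & d_{11} & d_{12} & \cdots & d_{1n} \\
D_2    & d_{21} & d_{22} & \cdots & d_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
D_n    & d_{n1} & d_{n2} & \cdots & d_{nn}
\end{array}
$$
$d_{ij}$ = similarity of $D_i$ to $D_j$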
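
The slide does not say which similarity measure fills the matrix. As a sketch, assuming raw term counts and cosine similarity (a common choice, but an assumption here), the entries d_ij can be computed like this. The three toy documents are invented for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (dict-like)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy documents (hypothetical), one string of space-separated terms each.
docs = ["star galaxy telescope", "star film actor", "galaxy telescope orbit"]
vectors = [Counter(d.split()) for d in docs]

n = len(vectors)
# sim[i][j] plays the role of d_ij: similarity of document i to document j.
sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

for row in sim:
    print(["%.2f" % x for x in row])
```

With a symmetric measure such as cosine, the matrix is symmetric and its diagonal is 1, so only the entries above the diagonal need to be computed and stored.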

K-Means Clustering
1. Create a pair-wise similarity measure
2. Find K centers using agglomerative clustering
   - take a small sample
   - group bottom up until K groups found
3. Assign each document to nearest center, forming new clusters
4. Repeat 3 as necessary
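
The four steps can be sketched in plain Python. Several details are assumptions rather than anything the slide specifies: documents are represented as numeric tuples, Euclidean distance stands in for the pair-wise measure, groups are merged by centroid distance, and centers are recomputed as cluster means between passes (the usual k-means update). All function and parameter names are illustrative.

```python
import math
import random


def dist(a, b):
    """Step 1 (assumed measure): Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def agglomerative_centers(sample, k):
    """Step 2: merge the two closest groups, bottom up, until k remain."""
    groups = [[p] for p in sample]
    while len(groups) > k:
        pairs = ((i, j) for i in range(len(groups))
                 for j in range(i + 1, len(groups)))
        i, j = min(pairs, key=lambda ij: dist(centroid(groups[ij[0]]),
                                              centroid(groups[ij[1]])))
        merged = groups.pop(j)  # j > i, so index i is still valid
        groups[i].extend(merged)
    return [centroid(g) for g in groups]


def assign(points, centers):
    """Step 3: put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, sample_size=6, rounds=10, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = agglomerative_centers(sample, k)
    clusters = assign(points, centers)
    for _ in range(rounds):  # Step 4: repeat until the centers stop moving
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
        clusters = assign(points, centers)
    return clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = k_means(points, 2)
print(clusters)
```

Seeding from a small sample keeps the bottom-up grouping cheap, since the all-pairs merging in step 2 only ever touches `sample_size` points rather than the whole collection.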

The Cluster Hypothesis
“Closely associated documents tend to be relevant to the same requests.”
van Rijsbergen 1979
“… I would claim that document clustering can lead to more effective retrieval than linear search [which] ignores the relationships that exist between documents.”
van Rijsbergen 1979

Clustering as Categorization
“In a traditional library environment … the items are classified first into subject areas, and a search is restricted to items within a few chosen subject classes. The same device can also be used … [to construct] groups of related documents and confining the search to certain groups only.”
Salton 71

Clustering as Categorization
“… In experiments we often want to vary the cluster representatives at search time. … Of course, were we to design an operational classification, the cluster representatives would be constructed once and for all at cluster time.”
van Rijsbergen 79

Scatter/Gather
Cutting, Pedersen, Tukey & Karger 92, 93
Hearst & Pedersen 95
- Cluster sets of documents into general “themes”, like a table of contents
- Display the contents of the clusters by showing topical terms and typical titles
- User chooses subsets of the clusters and re-clusters the documents within
- Resulting new groups have different “themes”
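
The display and gather steps can be illustrated with a small sketch. This is not the published Scatter/Gather algorithm: here "topical terms" are simply the most frequent words in a cluster and the "typical title" is the document that uses those words most, both simplifying assumptions. Function names and the toy clusters are hypothetical.

```python
from collections import Counter


def topical_terms(cluster, n=3):
    """Most frequent words in a cluster of (title, text) pairs."""
    counts = Counter(word for _, text in cluster for word in text.split())
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def typical_title(cluster, terms):
    """Title of the document that uses the topical terms most often."""
    def score(doc):
        words = doc[1].split()
        return sum(words.count(t) for t in terms)
    return max(cluster, key=score)[0]


def gather(clusters, chosen):
    """Pool the documents of the clusters the user selected."""
    return [doc for i in chosen for doc in clusters[i]]


# Toy clusters (hypothetical) for a query on "star".
clusters = [
    [("Stellar evolution", "star star galaxy nebula"),
     ("Galaxies", "galaxy star orbit")],
    [("Film stars", "star film actor film"),
     ("Television", "film tv actor")],
]

for cluster in clusters:
    terms = topical_terms(cluster, 2)
    print(len(cluster), terms, typical_title(cluster, terms))

pool = gather(clusters, [0])  # the user picks the first cluster
```

The pooled documents would then be passed back to whatever clustering routine produced the original groups, which is what yields the new, finer-grained themes.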

S/G Example: query on “star”
Encyclopedia text
- 14 sports
- 8 symbols
- 47 film, tv
- 68 film, tv (p)
- 7 music
- 97 astrophysics
- 67 astronomy (p)
- 12 stellar phenomena
- 10 flora/fauna
- 49 galaxies, stars
- 29 constellations
- 7 miscellaneous
Clustering and re-clustering is entirely automated

Two Queries: Two Clusterings
AUTO, CAR, ELECTRIC
- 8 control drive accident …
- 25 battery california technology …
- 48 import j. rate honda toyota …
- 16 export international unit japan
- 3 service employee automatic …
AUTO, CAR, SAFETY
- 6 control inventory integrate …
- 10 investigation washington …
- 12 study fuel death bag air …
- 61 sale domestic truck import …
- 11 japan export defect unite …
The main differences are the clusters that are central to the query

Publication History of Scatter/Gather
- 1991: Patents Filed
- SIGIR 92: Initial Algorithm Introduced
- SIGIR 93: Optimizations Presented
- AAAIFS 95: Examples of Use on Retrieval Results
- TREC 95: Use in Interactive Track Experiments
- CHI 96: Experiments providing evidence that users learn collection structure
- SIGIR 96: Evidence that clustering can improve ranking for TREC-like scenario
(Publication timing may lag significantly behind when the work was done)

Another use of clustering
- Use clustering to map the entire huge multidimensional document space into a huge number of small clusters.
- “Project” these onto a 2D graphical representation:

Clustering Multi-Dimensional Document Space
(image from Wise et al 95)

Concept “Landscapes”
[Figure: map regions labeled Pharmacology, Anatomy, Legal, Disease, Hospitals]
Built using Kohonen Feature Maps (Xia Lin, H.C. Chen)

Visualization of Clusters
- Huge 2D maps may be an inappropriate focus for information retrieval
  - Can’t see what documents are about
  - Documents forced into one position in semantic space
  - Space is difficult to use for IR purposes
  - Hard to view titles
- Perhaps more suited for pattern discovery
  - problem: often only one view on the space

Using Clustering in Document Ranking
- Cluster entire collection
- Find cluster centroid that best matches the query
- This has been explored extensively
  - it is expensive
  - it doesn’t work well
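
The centroid-matching step can be sketched as follows, assuming term-count vectors, a centroid defined as the mean vector of a cluster, and cosine similarity between query and centroid. These choices, the function names, and the toy clusters are assumptions for illustration; the slide does not fix any of them.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-weight vectors (dict-like)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean term-count vector of a cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    return {term: count / len(vectors) for term, count in total.items()}


def best_cluster(query, clusters):
    """Index of the cluster whose centroid is most similar to the query."""
    q = Counter(query.split())
    scores = [cosine(q, centroid(c)) for c in clusters]
    return max(range(len(clusters)), key=scores.__getitem__)


# Toy clusters (hypothetical), each a list of term-count vectors.
clusters = [
    [Counter("battery electric car".split()),
     Counter("electric battery charge".split())],
    [Counter("airbag safety crash".split()),
     Counter("safety car recall".split())],
]

print(best_cluster("car safety", clusters))
print(best_cluster("electric battery", clusters))
```

Only the documents in the winning cluster are then considered, so a relevant document that was placed in a different cluster can never be retrieved.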

Using Clustering in Interfaces
- Alternative (scatter/gather):
  - cluster top-ranked documents
  - show cluster summaries to user
- Seems useful
  - experiments show relevant docs tend to end up in the same cluster
  - users seem able to interpret and use the cluster summaries some of the time
- More computationally feasible

Clustering
- Advantages:
  - Sometimes discover meaningful themes
  - Data-driven, so reflect emphases present in the collection of documents
  - Can differentiate heterogeneous collections
  - Domain independent
- Disadvantages:
  - Variability in quality of results
  - Only one view on documents’ themes
  - Not good at differentiating homogeneous collections
  - Require interpretation
  - May mis-match users’ interests

Incorporating Categories into the Interface
- Yahoo is the standard method
- Problems:
  - Hard to search, meant to be navigated.
  - Only one category per document (usually)

Integrated Browsing & Search
- Search for category labels
- Browse category labels
- Search within document collection
- Browse resulting documents in book

Example: MeSH and MedLine
- MeSH Category Hierarchy
  - ~18,000 labels
  - manually assigned
  - ~8 labels/article on average
  - avg depth: 4.5, max depth 9
- Top Level Categories:
  anatomy     diagnosis   related disc
  animals     psych       technology
  disease     biology     humanities
  drugs       physics
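
To make the depth and labels-per-article figures concrete, here is a small sketch of a category hierarchy stored as child-to-parent links, with several labels per article. The labels below are a made-up miniature, not the real MeSH tree, and counting top-level categories as depth 1 is an assumption.

```python
# Hypothetical miniature hierarchy: each label maps to its parent (None = top level).
parent = {
    "disease": None,
    "infection": "disease",
    "viral infection": "infection",
    "anatomy": None,
    "heart": "anatomy",
}

# Hypothetical articles, each manually assigned several labels.
articles = {
    "a1": ["viral infection", "heart"],
    "a2": ["infection"],
}


def depth(label):
    """Number of levels from the top of the hierarchy (top level = 1)."""
    d = 1
    while parent[label] is not None:
        label = parent[label]
        d += 1
    return d


def is_under(label, category):
    """True if label is the category itself or one of its descendants."""
    while label is not None:
        if label == category:
            return True
        label = parent[label]
    return False


def articles_under(category):
    """Articles with at least one label at or below the category."""
    return sorted(a for a, labels in articles.items()
                  if any(is_under(label, category) for label in labels))


depths = [depth(label) for label in parent]
avg_depth = sum(depths) / len(depths)
max_depth = max(depths)
labels_per_article = sum(len(v) for v in articles.values()) / len(articles)

print(avg_depth, max_depth, labels_per_article)
print(articles_under("disease"), articles_under("anatomy"))
```

Because an article carries several labels, the same article shows up under more than one top-level category, which is one reason large category sets are hard to present in an interface.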

Large Category Sets
- Problems for User Interfaces
  - Too many categories to browse
  - Too many docs per category
  - Docs belong to multiple categories
  - Need to integrate search
  - Need to show the documents
- We’ll discuss this more next week.

Category Labels
- Advantages:
  - Interpretable
  - Capture summary information
  - Describe multiple facets of content
  - Domain dependent, and so descriptive
- Disadvantages:
  - Do not scale well (for organizing documents)
  - Domain dependent, so costly to acquire
  - May mis-match users’ interests

Other Starting Points Approaches
- Co-citation Links
- Examples, Guided Tours