facets berkeley
TRANSCRIPT
-
8/8/2019 Facets Berkeley
Semi-Automated Creation of Facet Hierarchies
Marti Hearst, School of Information, UC Berkeley
Joint work with Dr. Emilia Stoica
-
Marti Hearst, Taxonomy Bootcamp 06
Outline
Faceted Metadata Definition
Advantages
Flamenco: Search Interface Design using Faceted Metadata
Castanet: (Semi) Automated Tool for Creation of Category Systems
Comparison to State-of-the-Art Alternatives
Conclusions
-
Focus: Search and Navigation of Large Collections
Image Collections
E-Government Sites
Shopping Sites
Digital Libraries
-
Problems with Site Search
Study by Vividence in 2001 on 69 sites
70% eCommerce
31% Service
21% Content
2% Community
Poorly organized search results
Frustration and wasted time
Poor information architecture
Confusion
Dead ends
"Back and forthing"
Forced to search
-
What we want to Achieve
Integrate browsing and searching seamlessly
Support exploration and learning
Avoid dead-ends, pogoing, and lostness
-
Main Idea
Use hierarchical faceted metadata
Design the interface to:
Allow flexible navigation
Provide previews of next steps
Organize results in a meaningful way
Support both expanding and refining the search
-
The Problem With Hierarchy
Most things can be classified in more than one way.
Most organizational systems do not handle this well.
Example: Animal Classification
Animals: otter, penguin, robin, salmon, wolf, cobra, bat, seal
[Figure: the same animals grouped three different ways, by Skin Covering, by Locomotion, and by Diet]
-
The Problem with Hierarchy
Inflexible
Forces the user to start with a particular category
What if I don't know the animal's diet, but the interface makes me start with that category?
Wasteful
Have to repeat combinations of categories
Makes for extra clicking and extra coding
Difficult to modify
To add a new category type, must duplicate it everywhere or change things everywhere
-
The Problem With Hierarchy
[Figure: a single hierarchy rooted at "start" whose levels combine skin covering (fur, scales, feathers), locomotion (swim, fly, run, slither), and diet (fish, rodents, insects). The lower-level labels are repeated under every branch; leaves include salmon, bat, robin, wolf.]
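To make the "wasteful" point concrete, here is a small sketch that counts how many branches a single fixed hierarchy needs compared with the number of labels a faceted scheme stores. The label lists come from the slide; the comparison itself is an illustration, not a measurement from the talk.

```python
from itertools import product

skin = ["fur", "scales", "feathers"]
locomotion = ["swim", "fly", "run", "slither"]
diet = ["fish", "rodents", "insects"]

# One fixed hierarchy: every combination of labels becomes its own branch.
single_hierarchy_leaves = list(product(skin, locomotion, diet))

# Faceted scheme: each label is stored once, inside its own facet.
facet_labels = skin + locomotion + diet

print(len(single_hierarchy_leaves))  # 36 branches
print(len(facet_labels))             # 10 labels
```

Adding a fourth classification with n labels multiplies the first number by n but only adds n to the second.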
-
The Idea of Facets
Facets are a way of labeling data
A kind of metadata (data about data)
Can be thought of as properties of items
Facets vs. Categories
Items are placed INTO a category system
Multiple facet labels are ASSIGNED TO items
-
The Idea of Facets
Create INDEPENDENT categories (facets)
Each facet has labels (sometimes arranged in a hierarchy)
Assign labels from the facets to every item
Example: recipe collection
Course: Main Course
Cooking Method: Stir-fry
Cuisine: Thai
Ingredient: Bell Pepper, Curry, Chicken
-
The Idea of Facets
Break out all the important concepts into their own facets
Sometimes the facets are hierarchical
Assign labels to items from any level of the hierarchy
Preparation Method
Fry
Saute
Boil
Bake
Broil
Freeze
Desserts
Cakes
Cookies
Dairy
Ice Cream
Sorbet
Flan
Fruits
Cherries
Berries
Blueberries
Strawberries
Bananas
Pineapple
-
Using Facets
Now there are multiple ways to get to each item
Preparation Method
Fry
Saute
Boil
Bake
Broil
Freeze
Desserts
Cakes
Cookies
Dairy
Ice Cream
Sherbet
Flan
Fruits
Cherries
Berries
Blueberries
Strawberries
Bananas
Pineapple
Fruit > Pineapple
Dessert > Cake
Preparation > Bake
Dessert > Dairy > Sherbet
Fruit > Berries > Strawberries
Preparation > Freeze
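A minimal sketch of "multiple ways to get to each item": each item carries several facet paths, and a selection matches if it is a prefix of any of them. The two sets of paths are the ones listed above; the item names ("item A", "item B") and the helper functions are hypothetical, since the slide does not name the recipes.

```python
# Each item carries several facet paths; a path runs from the facet root to a label.
items = {
    "item A": [("Fruit", "Pineapple"), ("Dessert", "Cake"), ("Preparation", "Bake")],
    "item B": [("Dessert", "Dairy", "Sherbet"),
               ("Fruit", "Berries", "Strawberries"),
               ("Preparation", "Freeze")],
}

def matches(paths, selection):
    """True if any of the item's paths starts with the selected path."""
    return any(path[:len(selection)] == selection for path in paths)

def select(items, selection):
    """Names of all items reachable through the selected facet path."""
    return sorted(name for name, paths in items.items() if matches(paths, selection))

print(select(items, ("Dessert",)))             # reachable via the Dessert facet
print(select(items, ("Fruit", "Berries")))     # reachable via Fruit > Berries
print(select(items, ("Preparation", "Bake")))  # reachable via Preparation > Bake
```

Selecting a higher-level label such as Dessert returns items labeled anywhere beneath it, which is why labels can be assigned at any level of the hierarchy.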
-
Example: Nobel Prize Winners Collection (Before and After Facets)
-
Only One Way to View Laureates
-
First, Choose Prize Type
-
Next, view the list!
The user must first choose an award type (literature), then browse through the laureates in chronological order.
No choice is given to, say, organize by year and then award, or by country, then decade, then award, etc.
-
Flamenco Interface: Using Hierarchical Faceted Metadata
-
Opening View
Select literature from PRIZE facet
-
Group results by YEAR facet
-
Select 1920s from YEAR facet
-
Current query is PRIZE > literature AND YEAR: 1920s. Now remove PRIZE > literature
-
Now Group By YEAR > 1920s
-
Hierarchy Traversal: Group By YEAR > 1920s, and drill down to 1921
-
Select an individual item
-
Use Endgame to expand out
-
Use Endgame to expand out
-
Or use "More like this" to find similar items
-
Start a new search using keyword California
-
Note that category structure remains after the keyword search
-
The query is now a keyword ANDed with a facet subhierarchy
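The walkthrough above treats a query as a conjunction of facet selections, optionally ANDed with a keyword, where constraints can be added or removed independently. A minimal sketch of that idea follows; the `Query` class, the `run` helper, and the three records are hypothetical and are not taken from Flamenco or from the Nobel collection.

```python
from dataclasses import dataclass, field

@dataclass
class Query:
    """A conjunction of facet selections plus an optional keyword."""
    facets: dict = field(default_factory=dict)  # facet name -> selected path prefix
    keyword: str = ""

    def matches(self, item):
        for facet, prefix in self.facets.items():
            path = item["facets"].get(facet, ())
            if path[:len(prefix)] != prefix:
                return False
        # An empty keyword matches every item.
        return self.keyword.lower() in item["text"].lower()

def run(query, items):
    return [item["name"] for item in items if query.matches(item)]

# Made-up records, for illustration only.
items = [
    {"name": "A", "text": "Laureate A, born in California",
     "facets": {"PRIZE": ("literature",), "YEAR": ("1920s", "1921")}},
    {"name": "B", "text": "Laureate B",
     "facets": {"PRIZE": ("physics",), "YEAR": ("1920s", "1923")}},
    {"name": "C", "text": "Laureate C, worked in California",
     "facets": {"PRIZE": ("literature",), "YEAR": ("1930s", "1932")}},
]

query = Query(facets={"PRIZE": ("literature",), "YEAR": ("1920s",)})
narrow = run(query, items)        # PRIZE > literature AND YEAR > 1920s

del query.facets["PRIZE"]         # remove one constraint to expand the result set
wider = run(query, items)

query.keyword = "california"      # keyword ANDed with the remaining facet selection
with_keyword = run(query, items)
```

Because each constraint is stored separately, removing PRIZE widens the results without discarding the YEAR selection.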
-
Using Facets
The system only shows the labels that correspond to the current set of items
Start with all items and all facets
The user then selects a label within a facet
This reduces the set of items (only those that have been assigned to the subcategory label are displayed)
This also eliminates some subcategories from the view.
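The behavior described above can be sketched as follows: label counts are recomputed from the current item set after every selection, so labels with no remaining items simply disappear. The records and the function names `facet_counts` and `refine` are hypothetical, and the facets are kept flat for brevity.

```python
from collections import Counter

def facet_counts(items, facet):
    """Count how many of the current items carry each label of one facet."""
    return Counter(item[facet] for item in items if facet in item)

def refine(items, facet, label):
    """Keep only the items that were assigned the selected label."""
    return [item for item in items if item.get(facet) == label]

# Made-up recipe records, for illustration only.
all_items = [
    {"name": "r1", "Cuisine": "Thai", "Course": "Main Course"},
    {"name": "r2", "Cuisine": "Thai", "Course": "Dessert"},
    {"name": "r3", "Cuisine": "Italian", "Course": "Main Course"},
]

before = facet_counts(all_items, "Course")     # counts over the whole collection
current = refine(all_items, "Cuisine", "Italian")
after = facet_counts(current, "Course")        # "Dessert" is no longer offered
```

Since only labels with a nonzero count are shown, every click the interface offers leads to at least one item.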
-
Advantages of Facets
Can't end up with empty result sets (except with keyword search)
Helps avoid feelings of being lost.
Easier to explore the collection.
Helps users infer what kinds of things are in the collection.
Evokes a feeling of "browsing the shelves"
Is preferred over standard search for collection browsing in usability studies. (Interface must be designed properly)
-
Advantages of Facets
Seamless to add new facets and subcategories
Seamless to add new items.
Helps with "categorization wars"
Don't have to agree exactly where to place something
Interaction can be implemented using a standard relational database.
May be easier for automatic categorization
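As an illustration of the relational-database point, the sketch below stores facet labels in one table and computes preview counts with a self-join. The schema, table names, and rows are assumptions made for this example; they are not Flamenco's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_label (item_id INTEGER, facet TEXT, label TEXT);
    INSERT INTO item VALUES (1, 'recipe 1'), (2, 'recipe 2'), (3, 'recipe 3');
    INSERT INTO item_label VALUES
        (1, 'Cuisine', 'Thai'),    (1, 'Course', 'Main Course'),
        (2, 'Cuisine', 'Thai'),    (2, 'Course', 'Dessert'),
        (3, 'Cuisine', 'Italian'), (3, 'Course', 'Main Course');
""")

# Preview counts for the Course facet among items already restricted to Cuisine = Thai.
rows = conn.execute("""
    SELECT l.label, COUNT(*) AS n
    FROM item_label AS l
    JOIN item_label AS sel ON sel.item_id = l.item_id
    WHERE sel.facet = 'Cuisine' AND sel.label = 'Thai'
      AND l.facet = 'Course'
    GROUP BY l.label
    ORDER BY l.label
""").fetchall()
print(rows)
```

Adding a new facet in this layout means inserting rows, not altering tables, which fits the "seamless to add new facets" point above.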
-
Information previews
Use the metadata to show where to go next
More flexible than canned hyperlinks
Less complex than full search
Help users see and return to previous steps
Reduces mental work
Recognition over recall
Suggests alternatives
More clicks are OK only if (J. Spool):
The "scent" of the target does not weaken
Users feel they are going towards, rather than away from, their target.
-
Facets vs. Hierarchy
Early Flamenco studies compared allowing multiple hierarchical facets vs. just one facet.
Multiple facets were preferred and more successful.
-
Limitation of Facets
Do not naturally capture MAIN THEMES
Facets do not show RELATIONS explicitly
Example photo labels: Aquamarine, Red, Orange (colors); Door, Doorway, Wall (objects)
Which color is associated with which object?
Photo by J. Hearst, jhearst.typepad.com
-
Terminology Clarification
Facets vs. Attributes
Facets are shown independently in the interface
Attributes are just associated with individual items
E.g., ID number, Source, Affiliation
However, can always convert an attribute to a facet
Facets vs. Labels
Labels are the names used within facets
These are organized into subhierarchies
Synonyms
There should be alternate names for the category labels
Currently (in Flamenco) this is done with subcategories
E.g., Deer has subcategories stag, fawn, doe
-
Usability Study Results
-
Flamenco Usability Studies
Usability studies done on 3 collections:
Recipes (epicurious): 13,000 items
Architecture Images: 40,000 items
Fine Arts Images: 35,000 items
Conclusions:
Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
Very positive results, in contrast with studies on earlier iterations.
-
Most Recent Usability Study
Participants & Collection
32 Art History Students
~35,000 images from SF Fine Arts Museum
Study Design
Within-subjects
Each participant sees both interfaces
Balanced in terms of order and tasks
Participants assess each interface after use
Afterwards they compare them directly
Data recorded in behavior logs, server logs, paper surveys; one or two experienced testers at each trial.
Used 9-point Likert scales.
Session took about 1.5 hours; pay was $15/hour
-
Post-Interface Assessments
All significant at p
-
Post-Test Comparison
                                            Baseline  Faceted
Which Interface Preferable For:
Find images of roses                           15        16
Find all works from a given period              2        30
Find pictures by 2 artists in same media        1        29
Overall Assessment:
More useful for your tasks                      4        28
Easiest to use                                  8        23
Most flexible                                   6        24
More likely to result in dead ends             28         3
Helped you learn more                           1        31
Overall preference                              2        29
-
How to Create Facet Hierarchies?
Our Approach: Castanet
-
Example: Recipes (3500 docs)
-
Castanet Output (shown in Flamenco)
-
Castanet Output (shown in Flamenco)
-
Castanet Output (shown in Flamenco)
-
Castanet Output (shown in Flamenco)
-
Castanet Output (shown in Flamenco)
-
Our Approach: Leverage the structure of WordNet
-
Our Approach
Leverage the structure of WordNet
[Pipeline diagram: Documents -> Select terms -> Get hypernym paths (from WordNet) -> Build tree -> Compress tree -> Divide into facets]
-
1. Select Terms
Select well distributed terms from the collection
Example terms: red, blue
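The slide does not say how "well distributed" is measured. One plausible reading, shown here purely as an assumption, is document frequency: keep terms that occur in several documents but not in nearly all of them. The function name, the thresholds, and the toy documents are hypothetical.

```python
from collections import Counter

def select_terms(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms found in at least min_df documents but not in almost every one."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # count each term once per document
    limit = max_df_ratio * len(docs)
    return sorted(term for term, n in df.items() if min_df <= n <= limit)

docs = [
    "a red door",
    "a blue door",
    "a red wall",
    "a blue wall and a green lamp",
]
selected = select_terms(docs)
print(selected)
```

Here "a" is dropped for appearing everywhere, and "green", "lamp", and "and" are dropped for appearing only once.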
-
2. Get Hypernym Path
red: abstraction > property > visual property > color > chromatic color > red, redness > red
blue: abstraction > property > visual property > color > chromatic color > blue, blueness > blue
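A minimal sketch of walking up hypernym links to obtain a path. To stay self-contained it uses a hand-coded parent map copied from this slide in place of a real WordNet lookup; the names `HYPERNYM` and `hypernym_path` are hypothetical.

```python
# Parent links copied from the slide; a real run would query WordNet instead.
HYPERNYM = {
    "red": "red, redness",
    "red, redness": "chromatic color",
    "blue": "blue, blueness",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": "visual property",
    "visual property": "property",
    "property": "abstraction",
}

def hypernym_path(term):
    """Return the path from the most general node down to the term."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return list(reversed(path))

print(" > ".join(hypernym_path("red")))
print(" > ".join(hypernym_path("blue")))
```

A single-parent map cannot represent a word with several senses or several paths, which is the ambiguity the Disambiguation step deals with.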
-
3. Build Tree
Before: two separate hypernym paths
red: abstraction > property > visual property > color > chromatic color > red, redness > red
blue: abstraction > property > visual property > color > chromatic color > blue, blueness > blue
After: one tree in which the shared prefix is merged
abstraction > property > visual property > color > chromatic color
chromatic color > red, redness > red
chromatic color > blue, blueness > blue
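Merging paths that share a prefix is the standard trie construction. The sketch below uses nested dictionaries; the function name `build_tree` is hypothetical, and this is an illustration of the idea, not Castanet's code.

```python
def build_tree(paths):
    """Merge root-to-leaf paths into one nested-dict tree (a trie over labels)."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            node = node.setdefault(label, {})  # reuse the node if the prefix already exists
    return tree

paths = [
    ["abstraction", "property", "visual property", "color",
     "chromatic color", "red, redness", "red"],
    ["abstraction", "property", "visual property", "color",
     "chromatic color", "blue, blueness", "blue"],
]
tree = build_tree(paths)
shared = tree["abstraction"]["property"]["visual property"]["color"]["chromatic color"]
print(sorted(shared))  # the two branches below the shared prefix
```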
-
4. Compress Tree
Before: color > chromatic color > {red, redness > red; blue, blueness > blue; green, greenness > green}
After: color > chromatic color > {red, blue, green}
-
4. Compress Tree (cont.)
Before: color > chromatic color > {red, blue, green}
After: color > {red, blue, green}
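The slides show the effect of compression but not the exact rules. The sketch below uses two rules inferred from the before/after figures, and both are assumptions: (A) a non-root node with exactly one child is replaced by that child, and (B) a child whose label contains its parent's label is removed and its children promoted. Function names are hypothetical, and label collisions after promotion are not handled.

```python
def drop_single_child_parents(children):
    """Rule A (assumed): replace any node that has exactly one child by that child."""
    out = {}
    for label, kids in children.items():
        kids = drop_single_child_parents(kids)
        if len(kids) == 1:
            (label, kids), = kids.items()
        out[label] = kids
    return out

def drop_redundant_children(parent, children):
    """Rule B (assumed): remove a child whose label contains the parent's label."""
    out = {}
    for child, kids in children.items():
        if parent in child:
            out.update(drop_redundant_children(parent, kids))  # promote grandchildren
        else:
            out[child] = drop_redundant_children(child, kids)
    return out

# Subtree rooted at "color", as drawn on the slides; the root itself is never removed.
under_color = {
    "chromatic color": {
        "red, redness": {"red": {}},
        "blue, blueness": {"blue": {}},
        "green, greenness": {"green": {}},
    }
}
step1 = drop_single_child_parents(under_color)
step2 = drop_redundant_children("color", step1)
```

With these inputs, `step1` matches the first "After" tree and `step2` matches the second.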
-
5. Divide into Facets
-
Disambiguation
Ambiguity in:
Word senses
Paths up the hypernym tree
Sense 1 for word "tuna"
organism, being => plant, flora => vascular plant => succulent => cactus => tuna
Sense 2 for word "tuna"
organism, being => fish => food fish => tuna
organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna
2 paths for same word (sense 1 vs. sense 2)
2 paths for same sense (both under sense 2)
-
How to Select the Right Senses and Paths?
First: build core tree
(1) Create paths for words with only one sense
(2) Use Domains
WordNet has 212 Domains: medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
Automatically scan the collection to see which domains apply
The user selects which of the suggested domains to use or may add their own
Paths for terms that match the selected domains are added to the core tree
Then: add remaining terms to the core tree.
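The two-stage selection above can be sketched as follows. The data layout (term mapped to a list of domain and path pairs), the function name, the domain tags, and the shortened paths are all hypothetical, and taking the first in-domain candidate is an assumption; the slide does not say how ties are resolved or how the remaining terms are attached.

```python
def build_core_paths(senses, selected_domains):
    """Return (core, remaining): paths chosen for the core tree, and terms left for later."""
    core, remaining = {}, []
    for term, candidates in senses.items():
        if len(candidates) == 1:
            core[term] = candidates[0][1]          # (1) only one sense: no ambiguity
            continue
        in_domain = [path for domain, path in candidates if domain in selected_domains]
        if in_domain:
            core[term] = in_domain[0]              # (2) sense matches a selected domain
        else:
            remaining.append(term)                 # resolved after the core tree exists
    return core, remaining

# Made-up domain tags and abbreviated paths, for illustration only.
senses = {
    "flan": [(None, ("food", "dessert", "flan"))],
    "tuna": [("plants", ("organism", "plant", "cactus", "tuna")),
             ("food", ("organism", "fish", "food fish", "tuna"))],
    "bass": [("music", ("sound", "bass")),
             ("animals", ("organism", "fish", "bass"))],
}
core, remaining = build_core_paths(senses, {"food"})
```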
-
Castanet Evaluation
-
Castanet Evaluation
This is a tool for information architects, so people of this type did the evaluation
We compared output on:
Recipes
Biomedical journal titles
We compared to two state-of-the-art algorithms:
LDA (Blei et al. 04)
Subsumption (Sanderson & Croft 99)
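For readers unfamiliar with the subsumption baseline, here is a minimal sketch of its co-occurrence test as it is usually summarized: term x subsumes term y when x appears in most documents that contain y, but not the reverse. The 0.8 threshold is the value commonly quoted for that work and should be treated as a tunable assumption; the function name and document sets are hypothetical.

```python
def subsumes(docs_x, docs_y, threshold=0.8):
    """True if term x subsumes term y, given the sets of documents containing each."""
    both = len(docs_x & docs_y)
    p_x_given_y = both / len(docs_y)   # how often x appears where y appears
    p_y_given_x = both / len(docs_x)   # how often y appears where x appears
    return p_x_given_y >= threshold and p_y_given_x < 1.0

# Made-up document ids, for illustration only.
dessert_docs = {1, 2, 3, 4, 5}
cake_docs = {1, 2, 3}

print(subsumes(dessert_docs, cake_docs))  # "dessert" as a parent of "cake"
print(subsumes(cake_docs, dessert_docs))  # the reverse direction
```

Unlike the WordNet-based approach, this test relies only on co-occurrence statistics within the collection.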
-
Subsumption Output (shown in Flamenco)
-
Subsumption Output (shown in Flamenco)
-
Subsumption Output (shown in Flamenco)
-
Subsumption Output (shown in Flamenco)
-
LDA Output (shown in Flamenco)
-
LDA Output (shown in Flamenco)
-
LDA Output (shown in Flamenco)
-
Evaluation Method
Information architects assessed the category systems
For each of 2 systems' output:
Examined and commented on top-level
Examined and commented on two sub-levels
Then commented on overall properties:
Meaningful?
Systematic?
Likely to use in your work?
-
Evaluation Results
Results on recipes collection for "Would you use this system in your work?"
"Yes in some cases" or "yes definitely":
Pine (Castanet): 29/34
Oak (LDA): 0/18
Birch (Subsumption): 6/16
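The denominators differ across systems, so the counts are easier to compare as percentages. A quick calculation from the figures above (the variable names are arbitrary):

```python
# (yes responses, total responses) per system, as reported on the slide.
votes = {
    "Pine (Castanet)": (29, 34),
    "Oak (LDA)": (0, 18),
    "Birch (Subsumption)": (6, 16),
}
rates = {name: round(100 * yes / total, 1) for name, (yes, total) in votes.items()}
for name, pct in rates.items():
    print(f"{name}: {pct}%")
```

That works out to roughly 85% for Castanet, 0% for LDA, and 37.5% for Subsumption.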
Results on quality of categories:
-
Opportunities for Tagging
New opportunity: tagging, folksonomies (flickr, del.icio.us)
People are creating facets in a decentralized manner
They are assigning multiple facets to items
This is done on a massive scale
This leads naturally to meaningful associations
-
Conclusions
Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.
Midway in complexity between simple hierarchies and deep knowledge representation.
Currently in use on e-commerce sites; spreading to other domains
Systems are needed to help create faceted metadata structures
Our WordNet-based algorithm, while not perfect, seems like it will be a useful tool for Information Architects.
-
For more information:
flamenco.berkeley.edu
Thank you!
Marti Hearst & Emilia Stoica