Psychology of Category Structure Facets vs. Hierarchies SIMS 202 Profs. Hearst & Larson UC Berkeley SIMS Fall 2000


Psychology of Category Structure

Facets vs. Hierarchies

SIMS 202

Profs. Hearst & Larson

UC Berkeley SIMS

Fall 2000

Last Time

Symbols and Language

Lexical Relations

Major Lexical Relations

WordNet classifies lexical relations
– Synonymy, Polysemy, Metonymy, Hyponymy/Hyperonymy, Meronymy, Antonymy

These are important properties of words

Which of these apply to concepts?

Today

Psychology of Categorization

How to combine attributes to categorize information
– Subject Headings vs. Descriptors
– Hierarchies vs. Facets

Psychology of Categorization

Category Structure

Defining Category Membership
– Necessary and Sufficient Conditions
– Properties of Categorization
» Characteristic Features
» Centrality/Typicality
» Basic Level Categories

Defining Category Membership

Necessary and Sufficient Conditions:
– Every condition must be met.
– No other conditions can be required.
» Example: A prime number: An integer divisible only by itself and 1.
» Example: mother: A woman who has given birth to a child.

Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.
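A category defined by necessary and sufficient conditions behaves like a yes/no test: an item is a member exactly when every condition holds. The Python sketch below illustrates this with the prime number example. It assumes the usual mathematical convention that primes are integers greater than 1, which the dictionary wording does not state explicitly; the function name `is_prime` is chosen here for illustration.

```python
def is_prime(n: int) -> bool:
    """Return True only when n meets every condition of the definition."""
    # Assumed convention: primes are integers greater than 1.
    if n < 2:
        return False
    divisor = 2
    # Any divisor between 2 and sqrt(n) disqualifies n.
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


# Membership is all-or-nothing: the test returns True or False, never a degree.
members = [n for n in range(1, 20) if is_prime(n)]
print(members)
```

A predicate of this kind has no way to say that one member is a "better" example than another, which is the property the following slides call into question for human categories.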

Can category membership be defined?

What are the necessary and sufficient conditions for something to be a game?

Definition of Game

Famous example by Wittgenstein
– Classic categories assume clear boundaries defined by common properties (necessary and sufficient conditions)

Counterexample: “Game”
– No common properties shared by all games
» card games, ball games, Olympic games, children’s games
» not all involve competition: ring-around-the-rosie
» not all involve skill: dice games
» not all involve luck: chess
– No fixed boundary; can be extended to new games
» video games

Alternative: Concepts related by Family Resemblances

Properties of Categorization

Family Resemblance
– Members of a category may be related to one another without all members having any property in common.
» Instead, they may share a large subset of traits.
» Some attributes are more likely given that others have been seen.
– Example: feathers, wings, twittering, ...
» Likely to be a bird, but not all features apply to “emu”
» Unlikely to see an association with “barks”

Properties of Categorization

Centrality
– Example: Prime Numbers
» Definition: An integer greater than 1 divisible only by itself and 1
» Examples: 2, 3, 5, 7, 11, 13, 17, …
– A very clear-cut category. Or is it?
» Can one number be “more prime” than another?
– Centrality: some members of a category may be “better examples” than others.
» Example: robins vs. chickens vs. emus

Properties of Categorization

Characteristic Features
– Perceived degree of category membership has to do with which features define the category.
– Members usually do not have ALL the necessary features, but have some subset.
– Those members that have more of the central features are seen as more central members.
– People have conceptions of typical members.

Testing for Centrality/Typicality

Ask a series of questions, compare how long it takes people to answer.
– True or false:
» An apple is a fruit.
» A plum is a fruit.
» A coconut is a fruit.
» An olive is a fruit.
» A tomato is a fruit.
– Rosch and Mervis:
» The more features a fruit shares with the other fruits, the more typical a member of the class it is.
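The feature-sharing idea attributed to Rosch and Mervis can be sketched as a simple count. In the Python sketch below, the feature lists are invented purely for illustration (they are not data from any study), and `family_resemblance` is a hypothetical helper name. For each fruit it counts how many of its features are shared with each of the other fruits.

```python
# Hypothetical feature lists, invented only to illustrate the scoring idea.
FRUIT_FEATURES = {
    "apple": {"sweet", "grows on trees", "eaten raw", "juicy", "eaten as dessert"},
    "plum": {"sweet", "grows on trees", "eaten raw", "juicy", "has a pit"},
    "coconut": {"sweet", "grows on trees", "hard shell"},
    "olive": {"grows on trees", "has a pit", "salty"},
    "tomato": {"eaten raw", "juicy", "used in salads"},
}


def family_resemblance(member, features_by_member):
    """Sum, over all other members, the number of features shared with `member`."""
    own = features_by_member[member]
    return sum(
        len(own & other_features)
        for other, other_features in features_by_member.items()
        if other != member
    )


scores = {name: family_resemblance(name, FRUIT_FEATURES) for name in FRUIT_FEATURES}

# Higher score = shares more features with the rest of the category.
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, scores[name])
```

With these toy lists, apple and plum score well above olive and tomato, which matches the intuition that the first two are more typical fruits. The ranking depends entirely on which features are listed, so it should be read as an illustration of the mechanism, not as a result.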

Characteristic Features

– Is a cat on a mat a cat?
– Is a dead cat a cat?
– Is a photo of a cat a cat?
– Is a cat with three legs a cat?
– Is a cat that barks a cat?
– Is a cat with a dog’s brain a cat?
– Is a cat with every cell replaced by a dog’s cells a cat?

Properties of Categorization

Basic-level Categories:
– Categories are organized into a hierarchy from the most general to the most specific, but the level that is most cognitively basic is “in the middle” of the hierarchy.

Basic-level Primacy:
– Basic-level categories are functionally primary with respect to factors including ease of cognitive processing (learning, reasoning, recognition, etc.).

Basic Level Categories

Brown 1958, 1965; Berlin et al. 1972, 1973

Folk biology:
– unique beginner: plant, animal
– life form: tree, bush, flower
– generic name: pine, oak, maple, elm
– specific name: Ponderosa pine, white pine
– varietal name: western Ponderosa pine

No overlap between levels

Level 3 (generic name) is basic
– corresponds to genus

Characteristics of Basic-level Categories

Language
– People name things more readily at basic level.
– Name learned earliest in childhood.
– Languages have simpler names at basic level.
– Sounds like the “real name”.
– Name used more frequently.
» Strange to call a dime a coin or a metal object
– Names used in neutral context.
» There’s a dog on the porch.
» There’s a terrier on the porch.

Characteristics of Basic-level Categories

Concepts
– Things perceived more holistically at the basic level (rather than by parts).
– People interact with basic and more specific levels similarly.
– Things are remembered more readily at basic level.
– Folk biological categories correspond accurately to scientific biological categories only at the basic level.

Three Psychologically Primary Levels

SUPERORDINATE   animal    furniture
BASIC LEVEL     dog       chair
SUBORDINATE     terrier   rocker

Children take longer to learn superordinate

Superordinate not associated with mental images or motor actions

How are these levels related to
– Hyponymy
– Hyperonymy

Categories vs. Words

Necessary and Sufficient conditions for Mother?

» mother(A,B) -> female(A), gave-birth-to(A,B), same-species(A,B), …

What about:
» Birth mother vs. adoptive mother
» Rearing role vs. biological role
» Surrogate mother
» Cloning

Need to distinguish between the word used and the underlying concept(s) it stands for.

Summary
– Processes of categorization underlie many of the issues having to do with information organization
– Categorization is messier than our computer systems would like
– Human categories have graded membership, consisting of family resemblances.
» Family resemblance is expressed in part by which subset of features is shared
» It is also determined by underlying understandings of the world that do not get represented in most systems
– Basic level categories, as well as subordinate and superordinate categories, seem to be cognitively real.

Hierarchical vs. Faceted (Subject Heading vs. Descriptor) Category Systems

Controlled Vocabulary
(The following slides follow Bates 88)

Start with the text of the document

Attempt to “control” or regularize:
– The concepts expressed within
» mutually exclusive
» exhaustive
– The language used to express those concepts
» limit the normal linguistic variations
» regulate word order and structure of phrases
» reduce the number of synonyms or near-synonyms

Also, provide cross-references between concepts and their expression.

Classification Schemes

Classify possible concepts.

Goals:

– Completely distinct conceptual categories (mutually exclusive)

– Complete coverage of conceptual categories (exhaustive)

Assigning Headings vs. Descriptors

Subject headings
– assign one (or a few) complex heading(s) to the document

Descriptors
– Mix and match

How would we describe recipes using each technique?

Subject Heading vs. Descriptor

WILSONLINE
– Athletes
– Athletes -- Health & Hygiene
– Athletes -- Nutrition
– Athletes -- Physical Exams
– …
– Athletics
– Athletics -- Administration
– Athletics -- Equipment -- Catalogs
– …
– Sports -- Accidents and injuries
– Sports -- Accidents and injuries -- prevention

ERIC
– Athletes
– Athletic Coaches
– Athletic Equipment
– Athletic Fields
– Athletics
– …
– Sports psychology
– Sportsmanship

Subject Headings vs. Descriptors

Subject headings
– Describe the contents of an entire document
– Designed to be looked up in an alphabetical index
» Look up document under its heading
– Few (1-5) headings per document

Descriptors
– Describe one concept within a document
– Designed to be used in Boolean searching
» Combine to describe the desired document
– Many (5-25) descriptors per document
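Because each descriptor names a single concept, retrieval works by combining descriptors with Boolean operators. The Python sketch below is a minimal illustration: the three documents and their assignments are hypothetical, while the descriptor terms are taken from the ERIC list above. The function names `search_all` and `search_any` are chosen here for illustration.

```python
# Hypothetical documents; descriptor terms come from the ERIC list above.
DOCUMENTS = {
    "doc1": {"Athletes", "Athletic Coaches", "Sports psychology"},
    "doc2": {"Athletic Equipment", "Athletic Fields"},
    "doc3": {"Athletes", "Athletic Equipment", "Sportsmanship"},
}


def search_all(*terms):
    """Boolean AND: documents indexed under every requested descriptor."""
    wanted = set(terms)
    return sorted(doc for doc, descs in DOCUMENTS.items() if wanted <= descs)


def search_any(*terms):
    """Boolean OR: documents indexed under at least one requested descriptor."""
    wanted = set(terms)
    return sorted(doc for doc, descs in DOCUMENTS.items() if wanted & descs)


print(search_all("Athletes", "Athletic Equipment"))
print(search_any("Athletic Fields", "Sportsmanship"))
```

A subject-heading system would instead look up one precombined heading, such as "Athletics -- Equipment -- Catalogs", in an alphabetical index; the combination is made by the indexer in advance, not by the searcher at query time.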

Assigning Headings vs. Descriptors

How would we create a cookbook using each technique?

Hierarchical Classification

– Each category is successively broken down into smaller and smaller subdivisions

– No item occurs in more than one subdivision

– Each level is divided out by a “character of division”, also known as a feature.
» Example: distinguish Literature based on Language, Genre, Time Period
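One way to picture a hierarchy built from successive characters of division is as a nested structure with one level per feature. The Python sketch below assumes the feature values used in the literature example elsewhere in these slides (three languages, three genres, four periods); `build_hierarchy` and `leaf_paths` are hypothetical helper names.

```python
# Characters of division, applied in this fixed order.
# Values are assumed from the literature example in these slides.
DIVISIONS = [
    ("Language", ["English", "French", "Spanish"]),
    ("Genre", ["Prose", "Poetry", "Drama"]),
    ("Period", ["16th", "17th", "18th", "19th"]),
]


def build_hierarchy(divisions):
    """Nest one level per character of division, in the given order."""
    if not divisions:
        return {}
    _, values = divisions[0]
    return {value: build_hierarchy(divisions[1:]) for value in values}


def leaf_paths(tree, prefix=()):
    """Yield the path from the root to every lowest-level subdivision."""
    for name, subtree in tree.items():
        path = prefix + (name,)
        if subtree:
            yield from leaf_paths(subtree, path)
        else:
            yield path


tree = build_hierarchy(DIVISIONS)
leaves = list(leaf_paths(tree))
print(len(leaves))
print(leaves[0])
```

Every leaf is reached by exactly one path, which is the "no item occurs in more than one subdivision" property. The order of the divisions is fixed when the hierarchy is built, so grouping all 19th-century works together would require visiting many separate branches.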

Hierarchical Classification

[Tree diagram] Literature
– Level 1 (Language): English, French, Spanish
– Level 2 (Genre), under each language: Prose, Poetry, Drama
– Level 3 (Time Period), under each genre: 16th, 17th, 18th, 19th

Labeled Categories for Hierarchical Classification

LITERATURE
– 100 English Literature
» 110 English Prose
   111 English Prose 16th Century
   112 English Prose 17th Century
   113 English Prose 18th Century
   ...
» 120 English Poetry
   121 English Poetry 16th Century
   122 English Poetry 17th Century
   ...
» 130 English Drama
   131 English Drama 16th Century
   …
– 200 French Literature

Faceted Classification

Create a separate, free-standing list for each characteristic of division (feature).

Combine features to create a classification.

Faceted Classification along with Labeled Categories

A Language
– a English
– b French
– c Spanish

B Genre
– a Prose
– b Poetry
– c Drama

C Period
– a 16th Century
– b 17th Century
– c 18th Century
– d 19th Century

Aa English Literature
AaBa English Prose
AaBaCa English Prose 16th Century
AbBbCd French Poetry 19th Century
BcCd Drama 19th Century
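The facet notation can be read mechanically: each pair of characters names a facet (capital letter) and a value within it (lowercase letter). The Python sketch below decodes such codes using the three facet lists from the slide; the function name `decode` and the dictionary layout are assumptions made for illustration.

```python
# Facet tables as given on the slide.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation):
    """Split a code such as 'AaBaCa' into facet name -> value pairs."""
    if len(notation) % 2 != 0:
        raise ValueError("notation must be a sequence of facet/value pairs")
    result = {}
    for i in range(0, len(notation), 2):
        facet_key, value_key = notation[i], notation[i + 1]
        facet_name, values = FACETS[facet_key]
        result[facet_name] = values[value_key]
    return result


print(decode("AaBaCa"))
print(decode("AbBbCd"))
```

Any facet can be omitted, so a code may specify genre and period without a language. This is the flexibility that a fixed hierarchy lacks: the same three short lists cover every combination without enumerating each one in advance.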

Important Question: How to use both types of classification structures?

How to look through them?

How to use them in search?