Thesaurus Building
Martin Doerr
Center for Cultural Informatics, Institute of Computer Science
Foundation for Research and Technology - Hellas
Athens, June 17, 2013
Thesauri
FORTH-ICS June 17, 2013
Overview
• Motivation and Definitions
• Words, Terms and Concepts
• Knowledge Organisation Systems
• Thesaurus structure
• Thesaurus construction
• Examples
Motivation
The simple idea of standardising expressions of classification for better communication results in encyclopedias and knowledge bases, and touches on language engineering and cognitive science. It has become a major issue for electronic communication and information access.
Questions:
• Is this item well characterised by this term?
• Would every expert expect to find this object under this same term?
• If not, would such terms be variants of the same concept?
• Is there a unique answer to “what is this”?
• Does this database contain descriptions of things falling under this term?
Historically: Roget’s Thesaurus, to assist writers in finding better words…
Words, Terms, Concepts
Words: Constituents of natural languages. Categorical meaning, in contrast to “proper names”. Multiple senses depend on context (example: “order”).
Term: Constituent of expert language. A word with a specific (categorical) meaning, either defined in a scientific document or common to an expert group and discipline (example: “hepatitis A”).
Concept: A class or set of items grouped together on the basis of some implicit or explicit criterion or rule. The criterion can be unconscious or even innate! (Example: “δημόσιος υπάλληλος”, Greek for “civil servant”.)
A concept is not a term and not a language element!
Functions of Terminology
Unambiguous scientific expression: Use in expert discussions, expert opinions (diagnoses!) and scientific publications. Defined in disciplinary dictionaries.
Research: Defined ad hoc to discriminate items in a research project (archaeology!). Conclude from form to function, from form to provenance, etc.
Data search: Find all items (publications, objects etc.) possibly relevant to my research question.
Unfortunately, each function needs a different approach!
From Words to Concepts
Terms are created by:
• selecting or inventing a word, often a compound (“black-figure pottery”)
• fixing an expert group (“classical archaeologists”)
• fixing a scientific context (“antique Greek vases”)
A term alone makes no sense (“registration”).
A concept is detected as the sense of a term, as one sense of a word, or in the use of words in a text, by analysing context-specific use (written definitions, interviews, dialogues). A concept may be created for the first time by expressing/writing definitions.
A concept is formally identified by assigning an identifier to a description (“definition”) sufficient to clarify its meaning and disambiguate it from other concepts.
From Words to Concepts
Understanding comes from disambiguating the concepts (senses) behind words (terms) in a context. This can happen unconsciously, consciously by context analysis, or by asking for clarification (dialogue).
Databases and database records are contexts; therefore humans can understand a word in a data field.
Computers do not understand senses; therefore machines cannot relate (retrieve) records by common sense. Therefore senses must be identified to machines as entities and be related to terms.
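The last point can be sketched in Python, assuming a toy in-memory lexicon. Each sense of “order” (the example word from the definitions) becomes a concept entity with its own identifier; the identifiers, definitions and cue words are hypothetical, and a naive cue-word overlap stands in for real context analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str           # hypothetical identifier scheme
    definition: str
    cue_words: frozenset      # hypothetical context words that signal this sense


# One word, several senses: each sense becomes a separate, identified concept.
LEXICON = {
    "order": [
        Concept("c:order-command", "an authoritative instruction",
                frozenset({"military", "command", "officer"})),
        Concept("c:order-purchase", "a request to supply goods",
                frozenset({"customer", "supplier", "invoice"})),
        Concept("c:order-taxonomic", "a rank in biological classification",
                frozenset({"species", "family", "taxonomy"})),
    ],
}


def candidate_senses(word):
    """Everything a bare word may refer to: the machine's view without context."""
    return LEXICON.get(word.lower(), [])


def disambiguate(word, context):
    """Return the concept id whose cue words overlap most with the context, or None."""
    context_words = {w.lower() for w in context}
    scored = [(len(c.cue_words & context_words), c) for c in candidate_senses(word)]
    best_score, best = max(scored, default=(0, None), key=lambda pair: pair[0])
    return best.concept_id if best_score > 0 else None


# Records store concept ids, so retrieval no longer depends on the word used.
records = {"rec-17": "c:order-purchase"}
sense = disambiguate("order", ["the", "customer", "sent", "an", "invoice"])
print(sense, [r for r, cid in records.items() if cid == sense])
```

Once records carry identifiers, a search for the purchase sense of “order” cannot return records about taxonomic orders, whatever words the cataloguer used.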
Classification
Concepts are many, words are few:
• Concepts: LCSH has 500,000 concepts, covering only general subjects; there are millions in our mind; UMLS has over 5,000,000 concepts.
• Words: some 60,000 in our mind, some 400,000 in a language, some 30,000 in a typical dictionary.
• A typical thesaurus: 10,000 to 100,000 concepts.
• One word may have some dozens of meanings (referred concepts).
• One concept may be referred to by several words or terms.
Terms are noun phrases, composed of words
Concepts are used to classify things in texts and database records, either by referring to words, terms or concept identifiers.
Purpose of Classification 1
Organise a universe of discourse by concepts:
• for cognition and comprehension
• for recognition of discriminant attributes and attribute distribution
• for generalisation of observations
• for inferences from evidence to cause
Characteristics: exclusive, avoiding “mixed forms”, prototypical, selective on reality.
Communication of a conceptualisation:
• presentation of a domain of discourse
• help for the exploration of a topic
Characteristics: descriptive, rich, detailed, fuzzy, “cautious”, incomplete.
Purpose of Classification 2
Determination of items in an automated communication process:
• widely agreed-on naming for kinds of objects we share in a cultural space, e.g. artefact, kris (Malay)
• analogous or constructive classification of kinds of objects from outside our cultural space with terms from our space, e.g. knife, dagger = puukko (Finnish)
• information seeking by constraining attribute values, e.g. weapons, 18th century, South-East Asia
Characteristics: surrogate role; poor, binary, standardised, comprehensive; recall-oriented rather than detailed.
For electronic communication, prescribe a few mandatory high-level terms, and also refer to all good expert terms in data records.
From here on, we only talk about this function.
Knowledge Organisation Systems
For electronic communication: organise terms, concepts and their relationships into digital (machine-readable) dictionaries for human comprehension, such that machines can make inferences humans would approve.
Such inferences are:
• identity (get all cats by “cat”)
• generalisation (get “cats” by “felines”)
• related terms (get Heraklion by “Candia”, get “bridge construction” by “bridges”, get Heraklion by “Crete”)
We call these Knowledge Organisation Systems (KOS), e.g. LCSH, AAT, GeoNames, term lists, ULAN…
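The three kinds of inference can be sketched over a toy KOS in Python. The concept ids, labels, links and records below are hypothetical; broader links are followed downwards transitively for generalisation, while related links are followed one step, in both directions.

```python
from collections import defaultdict

# Hypothetical toy KOS: concept id -> labels, plus broader and related links.
LABELS = {
    "cat": {"cat", "cats"},
    "feline": {"feline", "felines"},
    "heraklion": {"Heraklion"},
    "crete": {"Crete"},
    "bridge": {"bridge", "bridges"},
    "bridge-construction": {"bridge construction"},
}
BROADER = {"cat": {"feline"}}
RELATED = {"heraklion": {"crete"}, "bridge": {"bridge-construction"}}

# Hypothetical records, indexed by concept id rather than by string.
RECORDS = {"r1": {"cat"}, "r2": {"heraklion"}, "r3": {"bridge-construction"}}


def concepts_for(label):
    """Identity: resolve a search string to the concepts carrying that label."""
    return {cid for cid, labels in LABELS.items() if label in labels}


def with_narrower(concept_ids):
    """Generalisation: a concept also retrieves everything it (transitively) subsumes."""
    narrower = defaultdict(set)
    for child, parents in BROADER.items():
        for parent in parents:
            narrower[parent].add(child)
    result, stack = set(), list(concept_ids)
    while stack:
        cid = stack.pop()
        if cid not in result:
            result.add(cid)
            stack.extend(narrower[cid])
    return result


def with_related(concept_ids):
    """Related terms: add directly associated concepts, read in both directions."""
    result = set(concept_ids)
    for cid in concept_ids:
        result |= RELATED.get(cid, set())
        result |= {a for a, targets in RELATED.items() if cid in targets}
    return result


def search(label, use_related=False):
    concepts = with_narrower(concepts_for(label))
    if use_related:
        concepts = with_related(concepts)
    return sorted(rid for rid, indexed in RECORDS.items() if indexed & concepts)
```

With this data, `search("felines")` finds the record indexed only with “cat”, and `search("Crete", use_related=True)` finds the Heraklion record. Keeping related-term expansion optional reflects that it trades precision for recall.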
Kinds of KOS
A dictionary is a listing of words and phrases giving information such as spelling, morphology and part of speech, senses, definitions, usage, origin, and equivalents in other languages (bi- or multilingual dictionary).
A controlled vocabulary is a limited list of terms to be used in a database field. Only an authority may add terms.
Authority files are lists of persons (authors) or places (also gazetteers) together with recommended (controlled) names.
A classification system is a structure that organizes concepts into a (mono) hierarchy in order to partition some material following a sequence of decision criteria.
An ontology “is a logical theory… …accounting for the intended meaning of a formal vocabulary, i.e. its ontological commitment to a particular conceptualization of the world. The intended models of a logical language using such a vocabulary are constrained by its ontological commitment. An ontology indirectly reflects this commitment (and the underlying conceptualization) by approximating these intended models.”
We use “ontology” only to formally describe the meaning of information structures
A thesaurus is a controlled vocabulary of categorical terms related to concepts, and with semantic relationships between concepts.
A monolingual thesaurus has terms from one expert group or community.
A multilingual thesaurus relates terms and concepts from two or more expert groups or communities (see next slide)
Multilingual thesauri
Translated thesauri: Each concept is optimally interpreted in words of another or multiple languages, to allow speakers of those languages to understand it better.
Correlated thesauri: Multiple thesauri with terms and concepts from the respective groups, and a set of concept-based mappings between the different thesauri of that aggregate, in order to process queries across different terminologies.
Interlingua: Concepts are created by fusing each cluster of similar concepts from different social groups into a new concept. One term from each user group is attached to the new concept as the identifier to be used by this group. The interlingua provides the sharing of concepts between social groups, e.g. as a legal basis used by the European Commission, like the EBTI. Note that the interlingua may not contain any of the original concepts of any user group; it contains a set of compromises to remove interpretational differences. Its concepts may again be translated and correlated to other thesauri.
Multilingual Thesauri Merged
[Figure: the English Heritage Thesaurus (English vocabulary) and the Merimee Thesaurus (French vocabulary), connected by linguistic translation to an interlingua and by interthesaurus correlations marked +/-.]
Thesaurus Structure
Nodes and links:
• Nodes for concepts and terms. Nodes are reference objects with accepted identity.
• Links for semantic relations, concept-concept and concept-term. Links express opinions and constitute the thesaurus.
Three dimensions to specialize links:
• By meaning, e.g. synonymy: who used this expression for that concept, when and in which context...
• By version: when introduced, when withdrawn.
• By opinion, e.g.: who says that this concept is subordinate to that one...
Two dominant standards: ISO 2788 / ISO 5964 and SKOS.
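A link that carries all three dimensions can be sketched as a small Python record. The field names, the concept ids and the two “editors” are hypothetical; the point is that the thesaurus one sees is a view selected by opinion (who asserts the link) and by version (on which day it was valid).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Link:
    source: str                # concept id
    relation: str              # meaning dimension, e.g. "BTG", "RT", "UF"
    target: str                # concept id, or a term string for UF/ALT
    introduced: date           # version dimension
    withdrawn: Optional[date]  # None while the link is still valid
    asserted_by: str           # opinion dimension: who states this link
    context: str = ""          # e.g. the expert group that uses an expression

    def valid_on(self, day):
        return self.introduced <= day and (self.withdrawn is None or day < self.withdrawn)


def view(links, authority, day):
    """The thesaurus as one authority states it on a given day."""
    return [link for link in links if link.asserted_by == authority and link.valid_on(day)]


# Hypothetical data: two editors disagree about one broader concept.
LINKS = [
    Link("c:chalice", "BTG", "c:drinking-vessel", date(2000, 1, 1), None, "editor-A"),
    Link("c:chalice", "BTG", "c:liturgical-object", date(2000, 1, 1), date(2010, 1, 1), "editor-A"),
    Link("c:chalice", "BTG", "c:liturgical-object", date(2005, 1, 1), None, "editor-B"),
    Link("c:chalice", "UF", "calix", date(2003, 1, 1), None, "editor-A", "Latin liturgical texts"),
]
```

Withdrawn links are kept rather than deleted, so earlier states of the thesaurus, and the disagreement between editors, remain reconstructible.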
ISO 2788-1986
Standard about the methodology, entities and relationships of a thesaurus, but not the format.
Entities:
• thesaurus
• preferred term
• non-preferred term
• compound term
• node label (facet indicator)
• facets
Does not yet clearly distinguish concepts and terms.
The Getty Research Institute uses the term “descriptor” for representing concepts.
SKOS
Simple Knowledge Organization System (SKOS): It provides a model for expressing the basic structure and content of multilingual concept schemes such as thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
It is the first widely accepted encoding format in RDF. It introduces persistent concept identifiers.
It tends to be abused for place names (gazetteers) and person lists (particulars)…
Thesaurus Concepts (SIS-TMS)
[Figure: SIS-TMS class diagram; arrows denote generalisation (isA). Classes: ThesaurusNotion, ThesaurusExpression, ThesaurusConcept, Term, Preferred Term, Non-Preferred Term, UsedForTerm, AlternativeTerm, Descriptor, TopTerm (concept), HierarchyTerm (concept), NodeLabel, ObsoleteDescriptor, ObsoleteTerm.]
Logical Thesaurus Structure
[Figure: metamodel levels M1_Class, S_Class and Token; a ThesaurusNotionType hierarchy with Facet, ObjectFacet, TopTerm, HierarchyTerm and Descriptor; example instances: the facet Object Genres, the hierarchy Single Built Works and the descriptor fortifications (BT Single Built Works, belongs to <single built works>); link kinds semantic part and functional part; arrows labelled “In”.]
Thesaurus Structure: Concept Record
Intrathesaurus relations (ISO 2788)
Hierarchical relations (from Concept/Descriptor to Concept/Descriptor):
• BT (Broader Term)
• BTP (Broader Term Partitive): actually a kind of RT
• BTG (Broader Term Generic): the actual BT (IsA)
Associative relations (from Concept/Descriptor to Concept/Descriptor):
• RT (Related Term): the “world of ontologies”
Equivalence relations (from Concept/Descriptor to Term/Language):
• ALT (Alternative Term)
• UF (Used For Term), often extended by group/language
Now all thesauri also use a concept identifier (possibly a Linked Open Data identifier).
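A concept record with these relations can be sketched as a Python dataclass. Field names, identifiers and sample concepts are hypothetical. Following the view stated on this slide that BTP is “actually a kind of RT”, the sketch expands queries transitively only along BTG, and treats BTP like a one-step association.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptRecord:
    concept_id: str                          # persistent id, e.g. a Linked Open Data URI
    descriptor: str                          # preferred term
    btg: set = field(default_factory=set)    # broader generic (isA) concept ids
    btp: set = field(default_factory=set)    # broader partitive (part-of) concept ids
    rt: set = field(default_factory=set)     # related concept ids
    alt: set = field(default_factory=set)    # alternative descriptors (strings)
    uf: dict = field(default_factory=dict)   # used-for term -> group or language


def generic_ancestors(records, concept_id):
    """Follow BTG only: isA is transitive, so query expansion may use the whole chain."""
    seen, stack = set(), list(records[concept_id].btg)
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(records[cid].btg)
    return seen


def associated(records, concept_id):
    """BTP is handled like RT: one direct association, never expanded transitively."""
    record = records[concept_id]
    return record.rt | record.btp


RECORDS = {r.concept_id: r for r in [
    ConceptRecord("c:vessel", "vessels"),
    ConceptRecord("c:drinking-vessel", "drinking vessels", btg={"c:vessel"}),
    ConceptRecord("c:chalice", "chalices", btg={"c:drinking-vessel"},
                  alt={"chalice"}, uf={"calix": "la"}),
    ConceptRecord("c:house", "houses"),
    ConceptRecord("c:roof", "roofs", btp={"c:house"}),
]}
```

A roof is not a kind of house, so an isA query for houses should not silently return roofs. Keeping BTP out of the transitive chain preserves that distinction.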
Thesaurus Structure: Linking Concepts
Interthesaurus relations (ISO 5964):
• partial equivalence (SKOS: broader equivalence, is subset of; narrower equivalence, is superset of)
• exact equivalence (same set as)
• inexact equivalence (overlaps with): good for FTR (free-text retrieval) only
• single-to-multiple equivalence (future!)
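These relations can be read extensionally, treating each concept as the set of items it classifies. The following Python sketch assumes such item sets are known, which in practice they rarely are; the concept ids and item numbers are hypothetical. It derives the relation type and then rewrites a query from one thesaurus into the other, never using inexact mappings.

```python
def classify(source_items, target_items):
    """Interthesaurus relation of a source concept to a target concept, read extensionally."""
    s, t = set(source_items), set(target_items)
    if not s & t:
        return None
    if s == t:
        return "exact"                 # same set as (skos:exactMatch)
    if s < t:
        return "broader equivalence"   # source is subset of target (skos:broadMatch)
    if s > t:
        return "narrower equivalence"  # source is superset of target (skos:narrowMatch)
    return "inexact"                   # overlaps with: good for free-text retrieval only


def translate(source_items, target_concepts, complete=False):
    """Rewrite a query concept as target-thesaurus concepts.

    Default: only targets inside the source set (no false hits, may miss items).
    complete=True: also targets that contain the source (better recall, some false hits).
    Inexact targets are never used.
    """
    allowed = {"exact", "narrower equivalence"}
    if complete:
        allowed.add("broader equivalence")
    return sorted(cid for cid, items in target_concepts.items()
                  if classify(source_items, items) in allowed)


# Hypothetical extensions: item ids classified by each concept.
SOURCE = set(range(1, 7))   # e.g. "drinking vessels" in thesaurus A
TARGET = {
    "b:cups": {1, 2},
    "b:drinking-vessels": set(range(1, 7)),
    "b:tableware": set(range(1, 11)),
    "b:glassware": {5, 6, 7},
}
```

Here `translate(SOURCE, TARGET)` yields the cups and drinking-vessels concepts; adding `complete=True` also admits tableware. Glassware, which only overlaps, is left to free-text retrieval.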
Thesaurus Structure
Hierarchical relationships (AAT definition)
Broader and narrower (parent/child) relationships between concepts. Hierarchical relationships are generally either whole/part or genus/species; in the AAT, most hierarchical relationships are genus/species (e.g., chalice is a type of drinking vessel). Relationships may be polyhierarchical, meaning that each child may be linked to multiple parents.
Broader term (BT). Also called a broader context. A vocabulary record to which another record or multiple records are subordinate in a hierarchy. In thesauri, the relationship indicator for this type of term is BT. Variations on the notation include BTG (broader term generic), BTP (broader term partitive), BTI (broader term instance), BT1 (broader term level 1), BT2 (broader term level 2), etc.
Narrower term (NT). Also called narrower context. A record to which another record or multiple records are superordinate in a hierarchy (for example, Brewster chair is a narrower term to armchair). In thesauri, the relationship indicator for this type of term is NT. Variations on the notation include NTG (narrower term generic), NTP (narrower term partitive), NTI (narrower term instance), NT1 (narrower term level 1), NT2 (narrower term level 2), etc.
Do not use BT1, BT2, BTI. NT must always be the inverse of BT. Do not use BT for BTP!
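The first two rules can be checked mechanically. This Python sketch assumes links arrive as (source, indicator, target) triples; extending the ban to NT1, NT2 and NTI by analogy is an assumption. The third rule, not using BT for BTP, needs editorial judgement and is not checked.

```python
# Inverse pairs that must always occur together.
INVERSE = {"BT": "NT", "NT": "BT", "BTG": "NTG", "NTG": "BTG", "BTP": "NTP", "NTP": "BTP"}
# Level and instance variants ruled out above (NT forms added by analogy: an assumption).
FORBIDDEN = {"BT1", "BT2", "BTI", "NT1", "NT2", "NTI"}


def check_hierarchy(links):
    """links: iterable of (source, indicator, target) triples; returns a list of problems."""
    links = set(links)
    problems = []
    for source, rel, target in sorted(links):
        if rel in FORBIDDEN:
            problems.append(f"{source} {rel} {target}: indicator not allowed")
        elif rel in INVERSE and (target, INVERSE[rel], source) not in links:
            problems.append(f"{source} {rel} {target}: missing {target} {INVERSE[rel]} {source}")
    return problems


problems = check_hierarchy([
    ("armchair", "NTG", "Brewster chair"),
    ("Brewster chair", "BTG", "armchair"),
    ("chalice", "BTG", "drinking vessel"),
    ("Brewster chair", "BT1", "chair"),
])
for p in problems:
    print(p)
```

The Brewster chair pair passes because both directions are present. The chalice link is reported because its NTG inverse is missing, and the BT1 link is reported as a forbidden indicator.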
Thesaurus Structure
Associative relationships (AAT definition)
The relationships between concepts that are closely related conceptually, but the relationship is not hierarchical because it is not whole/part or genus/species.
Related term (RT). A concept that is associatively (not hierarchically) linked to another concept in a thesaurus. In thesauri, the relationship indicator for this type of term is RT.
We encourage defining specializations of RT.
Thesaurus Structure
Equivalence relationships (AAT definition)
The relationships between synonymous terms or names that refer to the same concept, typically distinguishing preferred terms (descriptors) and non-preferred terms (variants, or ALTs and UFs).
Alternate descriptor (ALT). A variant form of a descriptor available for use; usually a singular form or a different part of speech than the descriptor (for example, lithograph is an alternate descriptor for the plural descriptor, lithographs). The relationship indicator for this type of term is ALT.
Used for term. Also called a UF. In thesaurus jargon, a term that is not a descriptor and not an alternate descriptor. If the thesaurus is being used as an authority, a used for term is not authorized for indexing. Used for terms typically comprise spelling or grammatical variants of the descriptor or have true synonymity with the descriptor.
These are now “labels” in SKOS, concept-to-string links.
Thesaurus structure
Scope note (AAT definition): A note that describes how the term should be used within the context of the AAT, and provides descriptive information about the concept or expands upon information recorded in other fields. The Scope Note in AAT is analogous to the Descriptive Note in ULAN and TGN.
Example Thesaurus Record
Carmine (lake)
Scope Note, SN:
A generic name for two closely related organic red lakes that are obtained from scale insects, cochineal and kermes. Neither pigment is permanent enough for use in fine art because they discolor in sunlight. They were replaced first by madder and alizarin, then later by synthetic organic red colors.
Broader Terms, BT:
colorant (material), lake (pigment)
Alternative Terms, ALT:
carmine lake
Related Terms, RT:
cochineal (colorant), kermes (colorant)
Used For, UF: carmine lake, carmin (lake), Karmesin lake, new red lake, Kugel lake, Parisian lake, Munich lake, Venetian lake, Karmin (Lack)
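The record above can be sketched as SKOS in N-Triples, using only the Python standard library. The base namespace and local identifiers are hypothetical. ALT and UF are both mapped to `skos:altLabel`, and the UF variants are left without language tags because their languages are not stated. Mapping UF to `skos:hiddenLabel` instead would be a reasonable choice if variants should not be displayed.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"   # hypothetical namespace for concept URIs


def literal(text, lang=None):
    """N-Triples literal with the mandatory escapes; optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def iri(value):
    return f"<{value}>"


def record_to_ntriples(rec):
    s = iri(BASE + rec["id"])
    triples = [
        (s, iri(RDF_TYPE), iri(SKOS + "Concept")),
        (s, iri(SKOS + "prefLabel"), literal(rec["pref"], "en")),
        (s, iri(SKOS + "scopeNote"), literal(rec["scope_note"], "en")),
    ]
    # ALT and UF both become skos:altLabel; dict.fromkeys drops duplicates, keeping order.
    for label in dict.fromkeys(rec["alt"] + rec["uf"]):
        triples.append((s, iri(SKOS + "altLabel"), literal(label)))
    for broader in rec["broader"]:
        triples.append((s, iri(SKOS + "broader"), iri(BASE + broader)))
    for related in rec["related"]:
        triples.append((s, iri(SKOS + "related"), iri(BASE + related)))
    return "\n".join(f"{a} {b} {c} ." for a, b, c in triples)


CARMINE = {
    "id": "carmine-lake",   # hypothetical local identifiers, here and below
    "pref": "carmine (lake)",
    "scope_note": ("A generic name for two closely related organic red lakes "
                   "that are obtained from scale insects, cochineal and kermes."),
    "broader": ["colorant-material", "lake-pigment"],
    "related": ["cochineal-colorant", "kermes-colorant"],
    "alt": ["carmine lake"],
    "uf": ["carmine lake", "carmin (lake)", "Karmesin lake", "new red lake", "Kugel lake",
           "Parisian lake", "Munich lake", "Venetian lake", "Karmin (Lack)"],
}

print(record_to_ntriples(CARMINE))
```

Because “carmine lake” appears in both ALT and UF, it is emitted only once as an alternative label.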
AAT term record
[Figure: AAT term record]
Thesaurus Construction: Global Knowledge and Isolated Sources
Most thesauri are small, agreed on by a few experts, integrated into one local database, seen from a specific view, and in one language. Some thesauri cover large “general” subjects and fail in specialisation. Scientists and scholars share systems of global concepts: thesauri should be organised by domains.
Examples of different scope and scale:
• general-purpose authorities, high-level: AAT, LCSH, RAMEAU, SWD
• specialised vocabularies: Beazley, SHIC, ACM
Use CIDOC CRM for global concepts. Relate your concepts to as many thesauri as possible via persistent identifiers. Make sure the identity of a concept is preserved after updates.
Thesaurus Construction
Distinguish use cases:
• thesauri for keyword search in free text (not my talk today)
• thesauri to fill in database (metadata) fields
The process:
• define a purpose/function
• engineer terms from existing vocabularies, dictionaries, interviews
• engineer concepts from terms, term use, interviews
• relate concepts and terms
• write concept records
It is a collaborative problem:
• manage information for common reference, expressions of opinion, agreement, disagreement
• think of long-term maintenance: only a curated KOS can be used.
Thesaurus Construction
Define a purpose, for example (from D. Soergel):
• a classification of diseases for diagnosis
• a classification of medical procedures for insurance billing
• a classification of medical outcomes to assist with treatment evaluation
• a classification of commodities for customs
• a classification of educational objectives for instructional development
• a classification of occupations for matching job applicants with job openings and for pay scales
• a classification of skills for employee task assignments
In cultural heritage, think of research questions or preservation functions.
Engineering Terms
Words and terms depend on social group and context: natural language, dialect, scientific language, slang.
Σπίνος (Greek common name) - Fringilla coelebs (scientific name) - chaffinch (English); σκυλάκι (folk name, literally “little dog”) - ορχεοειδές (learned term, “orchid-like”),….
They can be traditional, missing, phrases, “coined”, ad hoc: γιαταγάνι (yataghan), kalathoi, gilded chairs, the Web, let’s call it...
They appear in different grammatical forms or combination rules:
• pre-coordinated: “rugs, Persian”, “Persian rugs”
• post-coordinated: “Persia + rug”
Use “coined terms” if necessary. Use “post-coordination” (software will do it).
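Post-coordination leaves the combining to the search software. In this minimal Python sketch, the index of records and their descriptors is hypothetical. Each record carries single descriptors, and a query combines them with Boolean AND at search time.

```python
# Hypothetical index: record id -> descriptors assigned one by one at cataloguing time.
INDEX = {
    "rug-001": {"rug", "Persia", "19th century"},
    "rug-002": {"rug", "Anatolia"},
    "tile-003": {"tile", "Persia"},
}


def post_coordinate(index, *descriptors):
    """Records carrying every given descriptor: the combination happens at search time."""
    wanted = set(descriptors)
    return sorted(rid for rid, assigned in index.items() if wanted <= assigned)


print(post_coordinate(INDEX, "Persia", "rug"))
```

The thesaurus then needs only “rug” and “Persia”. A pre-coordinated vocabulary would need a separate entry such as “rugs, Persian” for every combination someone might search for.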
Engineering Terms
Controlling Synonyms
Preferred synonym: non-preferred terms
Adolescent: Teenager, Teen, Youth (young person), Pubescent
African American: Afro-American, Black
Heredity: Inheritance
Alcohol dependence: Alcoholism
Ultrasonic cardiography: Echocardiography
Concept-term relationships (terminological structure)
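Synonym control can be sketched as an entry vocabulary in Python. The grouping follows the synonym examples on this slide. Case-insensitive matching and rejecting a variant claimed by two descriptors are assumptions of this sketch. Inverting the UF lists gives the USE references a cataloguer or searcher follows from any variant to the descriptor.

```python
# Preferred synonym (descriptor) -> non-preferred synonyms, as in the examples on this slide.
USE_FOR = {
    "Adolescent": ["Teenager", "Teen", "Youth (young person)", "Pubescent"],
    "African American": ["Afro-American", "Black"],
    "Heredity": ["Inheritance"],
    "Alcohol dependence": ["Alcoholism"],
    "Ultrasonic cardiography": ["Echocardiography"],
}


def build_entry_vocabulary(use_for):
    """Invert UF lists into USE references; reject a variant claimed by two descriptors."""
    use = {}
    for descriptor, variants in use_for.items():
        for term in [descriptor, *variants]:
            key = term.casefold()
            if key in use and use[key] != descriptor:
                raise ValueError(f"{term!r} points to both {use[key]!r} and {descriptor!r}")
            use[key] = descriptor
    return use


ENTRY = build_entry_vocabulary(USE_FOR)


def index_term(term):
    """Map whatever a cataloguer or searcher types to the controlled descriptor."""
    return ENTRY.get(term.casefold())
```

`index_term("teen")` returns “Adolescent”. An unknown string returns `None`, which is a cue to add a variant or a new concept rather than to guess.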
Engineering Terms
Stepwise reduction of a set of terms
Engineering Terms
[Figure: five numbered columns (1 to 5). Step labels: morphological variants consolidated, spelling variants consolidated, synonyms consolidated, quasi-synonyms consolidated, descriptors for post-combination (ISAR system). The terms Disease, Illness, Sickness and Ailment are listed three times; the combined descriptor “Disease, illness” appears twice.]
Following the lines from right to left, the searcher finds in column 1 all the terms and spelling variants to use.
Stepwise reduction of a set of terms
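The stepwise reduction, and reading it from right to left, can be sketched in Python. The concrete mappings per step and the extra term “disorder” are hypothetical, since the figure's connecting lines did not survive. Each step maps a term to its representative in the next column, and unmapped terms pass through unchanged.

```python
# Hypothetical reduction steps, applied left to right; each maps a term to its
# representative in the next column. Unmapped terms pass through unchanged.
STEPS = [
    {"diseases": "disease", "illnesses": "illness",          # morphological variants
     "sicknesses": "sickness", "ailments": "ailment"},
    {},                                                       # spelling variants (none here)
    {"sickness": "illness"},                                  # synonyms
    {"disease": "disease, illness", "illness": "disease, illness",
     "ailment": "disease, illness"},                          # quasi-synonyms
]


def reduce_term(term, steps=STEPS):
    for step in steps:
        term = step.get(term, term)
    return term


def entry_terms(descriptor, vocabulary, steps=STEPS):
    """Read the reduction right to left: every original term that ends at this descriptor."""
    return sorted(t for t in vocabulary if reduce_term(t, steps) == descriptor)


VOCABULARY = ["disease", "diseases", "illness", "illnesses", "sickness",
              "sicknesses", "ailment", "ailments", "disorder"]
```

`entry_terms("disease, illness", VOCABULARY)` recovers the eight disease, illness, sickness and ailment forms, which are the column-1 terms a searcher should use. “disorder” is not mapped by any step, so it stays outside.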
Engineering Terms
Disambiguating homonyms:
• Administration 1 (management)
• Administration 2 (drugs)
• Läufer 1 (Sportler), English: runner (athlete)
• Läufer 2 (Teppich), English: long, narrow rug
• Läufer 3 (Schach), English: bishop (chess)
• Discharge 1 (from hospital or program), German: Entlassung
• Discharge 2 (from organization or employment), preferred synonym: Dismissal, German: Entlassung
• Discharge 3 (medical symptom), German: Absonderung, Ausfluss
• Discharge 4 (into a river), German: Ausfluss
• Discharge 5 (electrical), German: Entladung (which also means unloading)
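The Discharge senses listed here can be sketched in Python to show why the qualifier matters. Each (word, qualifier) pair becomes a separate term. The reverse lookup from German shows that translation is many-to-many: one German word can lead back to several English senses.

```python
from collections import defaultdict

# The Discharge senses listed above: (word, sense number, qualifier, German equivalents).
DISCHARGE = [
    ("Discharge", 1, "from hospital or program", ["Entlassung"]),
    ("Discharge", 2, "from organization or employment", ["Entlassung"]),
    ("Discharge", 3, "medical symptom", ["Absonderung", "Ausfluss"]),
    ("Discharge", 4, "into a river", ["Ausfluss"]),
    ("Discharge", 5, "electrical", ["Entladung"]),
]


def label(word, number, qualifier):
    """The qualifier turns each homonym into a distinct, unambiguous term."""
    return f"{word} {number} ({qualifier})"


def german_index(senses):
    """Reverse lookup: one German word may lead back to several English senses."""
    index = defaultdict(list)
    for word, number, qualifier, german in senses:
        for g in german:
            index[g].append(label(word, number, qualifier))
    return dict(index)


INDEX = german_index(DISCHARGE)
print(INDEX["Ausfluss"])
```

Without the qualifiers, “Ausfluss” would translate to one ambiguous “Discharge”. With them, the user is offered the medical and the river sense explicitly.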
Classifying by Term: A case
E.g. searching for comparative studies:
How do I spell it? Ushabti, ushabty, ushebti, shawtaby?
Will it be written the same everywhere?
Should I call it “grave goods” (AAT), “burial figurines”, “dolls”, “afterlife helpers”, “personality surrogate”, “burial ritual”?
And what about “χαρώνειο, δανάκη” (Charon’s obol, danake)?
Should I call it “toll”, “cheap coin”, “afterlife helper”, “corpse equipment”, “burial gift”, “burial rites”?
Would “grave goods” be distinctive enough?
Using Classification for Querying
How to find the characteristic term itself?
How to discover related literature?
Relevant abstractions are not standardized.
How to compile statistics even about the same item?
The same items can be referred to in a thousand ways.
How to do comparative studies by features?
Implicit features are not declared; explicit features need systematic documentation.
Thesauri and Classification: A Case of a Term
Analyzing a term: What is an ushebti, what is a shawabty?
What did it mean, and when?
What was it made for?
How was it made?
Where was it used?
Ideas, concepts, rather than words
Multiple aspects of interest!
Concepts
A concept is a class or set of entities which are grouped together on the basis of some criterion or rule:
1. Inner representation: the personal comprehension, which cannot be communicated.
2. A set of entities characterised by explicit properties (rules): “objective”; allows reasoning about analogous objects from other cultures/domains. BUT: how to characterise the properties? Find discriminative attributes (what is an elephant?); deal with non-verbal characteristics (aquarelle etc.). This is often difficult, misleading or impossible.
Concepts
3. The “words of mentalese”: the common language of the human mind, the basis for communication in foreign languages; completely unknown.
4. A set of entities characterised by common agreement: depends on a social group (must be noted!!); covers everything people can recognise and agree on (implicit mentalese); does not allow for reasoning about analogous objects; also called a “primitive concept”. This is what we need most (eventually plus rules).
Thesauri and Classification: Concepts
Concepts are relative:
• to scope: fuzzy bounds, e.g. knife, weapon, seat; outer bounds for retrieval, inner bounds for science, and, if negated, inner bounds for retrieval…
• to purpose: weapon, friend, stone building, school house, neoclassical building. There are essential classes (related to the reason for existence): construction-related, morphological, functional, contextual.
Concepts are related:
• by nature: coffin - container, coffin - funerary object, bathtub - container
• in polyhierarchies of genus-species, also called isA, generalization or subclass-superclass (this also provides a notion of similarity)
• associatively: bridge - bridge construction, house - roof
Engineering Concepts
Concepts can be:
• natural and explicit: there is a term for it in some language
• natural and implicit (hidden): there is no word, e.g. English “parts & accessories”, “too” translated to Greek; terms need to be invented (“coined term”)
• new: like “the Web”
• compounds: “blue rugs”, “19th century Persian rugs” (open problem)
Natural concepts are the best, but often others are needed. Natural concepts are often contextually overloaded (sword, ushebti), typically need contextual redefinition to become precise (AAT “knife”), or need to be combined with other terms.
In particular, generic concepts often lack a term!
Engineering Concepts
Quality problem: Is classification reliable?
Completeness, at least partial?
• Do things not classified under one term really not belong to this term?
• Can at least partial sets of data be identified that are completely classified with respect to term x?
• Can I find things that may belong to term x under term x?
Classification for retrieval must be “inclusive” and complete for a given collection.
Engineering Concepts
Objects in particular can be seen under different aspects, e.g. school house, all-wooden building, 18th-century American style.
Characteristic aspects: functional, morphological, constructive, contextual.
The aspect needs to be made explicit (open problem).
Interesting problem: repurposing resources for other aspects.
Concept Definition
By “scope note”: a statement that clarifies the meaning and usage of a term within the thesaurus:
• definition by properties, occurrence, similarities
• definition of scope: limitations and distinctions
• guidance of users to similar, overlapping, associated concepts
• context of usage, purpose, view
• origin and history of the term and concept
• reference to literature (“literature warrant”)
• examples
Often the scope note only recalls a certain meaning we share, and restricts it. Examples are most helpful as reminders!
Thesauri and Classification: Concept Definition
Assisted by example: a particular instance (e.g. Mona Lisa for “painting”).
Optically: by graphics, drawings, images of models.
Assisted by semantic placement:
• generalizations / specializations
• associations to other concepts = co-occurrence in certain contexts, producer-process-product relations etc.
• synonyms, similar concepts, translations
S. R. Ranganathan
Three cognitive “planes”: idea plane, verbal plane, notational plane. Confusing them hinders analysis and problem solution: missing terms for existing ideas (concepts are many, words are few) and notational limitations inhibit work on the idea plane.
The invention of “facets”:
• priority of the idea plane (= concept, not term)
• conceptual structures are multidimensional
• shelving of books is no argument; a taxonomy is not an index
Colon Classification is a system of library classification developed by S. R. Ranganathan between 1925 and 1965. It uses five primary categories, or facets, to further specify the sorting of a publication. Collectively, they are called PMEST.
Thesauri and Classification: A “Facet” can be...
Grammatical element of an indexing expression:
e.g. subdivision by period, geography, genre (MARC)
Fundamental category, major facet, basic facet:
Ranganathan: Personality, Matter, Energy, Space, Time
CIDOC CRM: Period, Physical Entity, Conceptual Object, Actor, Place, Time-Span, Type, Material, Language
AAT: Objects, Agents, Activities, Styles and Periods, Materials, Physical Attributes, Associated Concepts.
=> Used to form compound terms and descriptive expressions
CIDOC CRM / AAT mapping
[Figure: CIDOC CRM classes mapped to AAT facets.
CIDOC CRM classes: E1 CRM Entity, E2 Temporal Entity, E4 Period, E5 Event, E18 Physical Thing, E22 Man-Made Object, E28 Conceptual Object, E41 Appellation, E55 Type, E57 Material, E70 Thing, E73 Information Object, E77 Persistent Item.
AAT facets (with sample hierarchies): Activities (Disciplines, Events, Functions, …); Agents (Organizations, People); Materials (Materials); Objects (Components, Containers, Costume, …); Physical Attributes (Attributes & Properties, Color, …); Styles & Periods (Styles & Periods); Associated Concepts (Associated Concepts).]
Thesauri and Classification: About Facets
Aspects of analysis, “minor facets”: what Ranganathan meant.
E.g. MDA archaeological thesaurus:
• armour by construction: scale armour
• armour by form: cuirass
• armour by function: parade armour
A striking example of the explicit use of aspects: SHIC (Social History and Industrial Classification), a “pure”, homogeneous thesaurus of human activities used by British museums to classify artefacts!
Use minor facets to clarify the kind of discriminating criteria in a concept definition.
Thesauri and Classification: Minor Facets in the AAT
The “Objects” facet (1998 edition) contains:
• about 1,640 facet indicators
• about 600 of them with explicit criteria (“by form” etc.)
• using 150 criteria
Frequency of explicit criteria: form 35%, function 30%, placement 15%, construction 15%, social context 5%…
Conclusion:
• Minor facets need not be idiosyncratic.
• Facet criteria form hierarchies under fundamental categories.
Example of three overlapping facets
[Figure: the terms objects, weapons, swords, sword-like objects, cutting and thrusting weapons, foils (swords), fencing swords and wooden swords, arranged by the criteria sword-like, cutting and thrusting, fighting and hunting, fencing and wooden; the arrows distinguish term specialization from criteria assignment.]
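Overlapping facets can be sketched in Python by giving each concept one value per facet criterion. The facet names “form”, “function” and “material” are assumptions read off the example. The criterion values are taken from it, but their assignment to individual concepts is hypothetical. A concept then appears under several facet groupings at once, and queries can combine criteria freely.

```python
from collections import defaultdict

# Hypothetical facet criteria per concept; a concept need not have a value in every facet.
CRITERIA = {
    "swords": {"form": "sword-like", "function": "cutting and thrusting"},
    "foils (swords)": {"form": "sword-like", "function": "fencing"},
    "fencing swords": {"form": "sword-like", "function": "fencing"},
    "wooden swords": {"form": "sword-like", "material": "wooden"},
}


def find(**wanted):
    """Concepts matching every requested facet value; other facets stay unconstrained."""
    return sorted(c for c, crit in CRITERIA.items()
                  if all(crit.get(facet) == value for facet, value in wanted.items()))


def node_label_groups(facet):
    """Group concepts under one facet, like a '<swords by function>' node label."""
    groups = defaultdict(list)
    for concept, crit in sorted(CRITERIA.items()):
        if facet in crit:
            groups[crit[facet]].append(concept)
    return dict(groups)
```

`find(form="sword-like", function="fencing")` returns the foils and fencing swords. `node_label_groups("function")` produces the kind of grouping a “by function” node label presents in a hierarchy display.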
Thesaurus structure
Uses of facet analysis and hierarchy:
• Help to organize the concept space and establish relationships.
• Discover concepts, especially general concepts spanning several disciplines.
• Assist the user in analyzing and clarifying a search problem: elicit the facets involved; present the hierarchical structure within each facet.
• Facilitate the search for general concepts such as Inflammation or Dependence (which occurs in the context of medicine, psychology and social relations).
• Hierarchic query term expansion.
These functions are useful in both controlled-vocabulary and free-text searching.
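Hierarchic query term expansion for free-text searching can be sketched in Python. The thesaurus fragment around “inflammation” is hypothetical, and the quoted-phrase OR syntax is only illustrative, not any particular search engine's syntax. The expansion walks narrower descriptors level by level, optionally up to a depth limit. It skips concepts it has already reached, which can happen in a polyhierarchy.

```python
# Hypothetical fragment: descriptor -> (entry terms, narrower descriptors).
THESAURUS = {
    "inflammation": (["inflammatory response"], ["arthritis", "hepatitis"]),
    "arthritis": ([], ["rheumatoid arthritis"]),
    "hepatitis": ([], ["hepatitis A"]),
    "rheumatoid arthritis": (["RA"], []),
    "hepatitis A": ([], []),
}


def expand(descriptor, depth=None):
    """Descriptor, its entry terms and narrower descriptors, level by level up to `depth`."""
    terms, seen = [], set()
    frontier, level = [descriptor], 0
    while frontier and (depth is None or level <= depth):
        next_frontier = []
        for d in frontier:
            if d in seen:          # polyhierarchy: a concept may be reached twice
                continue
            seen.add(d)
            entry, narrower = THESAURUS.get(d, ([], []))
            terms += [d, *entry]
            next_frontier += narrower
        frontier, level = next_frontier, level + 1
    return terms


def free_text_query(descriptor, depth=None):
    """OR-ed phrase query for a full-text engine; the syntax is illustrative only."""
    return " OR ".join(f'"{t}"' for t in expand(descriptor, depth))
```

`free_text_query("hepatitis")` gives `"hepatitis" OR "hepatitis A"`. The depth limit lets a user trade recall for precision when a general concept such as inflammation has a large subtree.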
Thesaurus structure
Concept discovery through facet analysis and hierarchy building: through facet analysis and hierarchy building, the lexicographer often discovers concepts that are needed in searching or that enhance the logic of the concept hierarchy; he then needs to create terms for these concepts.
Consider:
• train station, bus station, harbor, airport. Common semantic component: traffic station.
• gin, whiskey, cherry brandy, tequila, etc. Common semantic component: distinct distilled spirits (counterpart of the already lexicalized neutral distilled spirits).
• transactional analysis, dream analysis, insight therapy, Gestalt therapy, reality therapy, cognitive therapy. Umbrella concept for structuring the hierarchy and for retrieval: analytic psychotherapy (methods that seek to assist patients in a personality reconstruction through insight into their inner selves).
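This discovery step can be caricatured in Python, under a strong simplifying assumption: each concept is described by a hand-assigned set of semantic features. The feature names below are hypothetical. The shared features of the siblings form the candidate umbrella concept. If no lexicalized concept has exactly that feature set, a term must be coined.

```python
# Hypothetical semantic features of lexicalized sibling concepts.
FEATURES = {
    "train station": frozenset({"place", "traffic", "boarding", "rail"}),
    "bus station": frozenset({"place", "traffic", "boarding", "road"}),
    "harbor": frozenset({"place", "traffic", "boarding", "water"}),
    "airport": frozenset({"place", "traffic", "boarding", "air"}),
}


def common_component(concepts, features):
    """Features shared by all given concepts: the candidate umbrella concept."""
    return frozenset.intersection(*(features[c] for c in concepts))


def needs_coined_term(component, lexicalized):
    """True if no existing concept is defined by exactly this feature set."""
    return all(feats != component for feats in lexicalized.values())


component = common_component(list(FEATURES), FEATURES)
print(sorted(component), needs_coined_term(component, FEATURES))
```

Once “traffic station” is added with exactly that feature set, the check no longer asks for a new term. Real concept discovery relies on the lexicographer's analysis, not on such feature lists.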
Examples, AAT: Art and Architecture Thesaurus (AAT) Top-level Facets (1)
Associated Concepts: contains abstract concepts and phenomena that relate to the study and execution of a wide range of human thought and activity, including architecture and art in all media, as well as related disciplines. Also covered here are theoretical and critical concerns, ideologies, attitudes, and social or cultural movements (e.g., beauty, balance, connoisseurship, metaphor, freedom, socialism).
Physical Attributes: This facet concerns the perceptible or measurable characteristics of materials and artifacts as well as features of materials and artifacts that are not separable as components. Included are characteristics such as size and shape, chemical properties of materials, qualities of texture and hardness, and features such as surface ornament and color (e.g., strapwork, borders, round, waterlogged, brittleness).
Styles and Periods: This facet provides commonly accepted terms for stylistic groupings and distinct chronological periods that are relevant to art, architecture, and the decorative arts (e.g., French, Louis XIV, Xia, Black-figure, Abstract Expressionist).
Agents: The Agents facet contains terms for designations of people, groups of people, and organizations identified by occupation or activity, by physical or mental characteristics, or by social role or condition (e.g., printmakers, landscape architects, corporations, religious orders). Animals and plants are also gradually being added to the Living Organisms hierarchy of this facet.
Examples, AAT: Art and Architecture Thesaurus (AAT) Top-level Facets (2)
Activities: encompasses areas of endeavor, physical and mental actions, discrete occurrences, systematic sequences of actions, methods employed toward a certain end, and processes occurring in materials or objects. Activities may range from branches of learning and professional fields to specific life events, from mentally executed tasks to processes performed on or with materials and objects, from single physical actions to complex games.
Materials: deals with physical substances, whether naturally or synthetically derived. These range from specific materials to types of materials designed by their function, such as colorants, and from raw materials to those that have been formed or processed into products that are used in fabricating structures or objects.
Objects: This is the largest of all the AAT facets. It encompasses those discrete tangible or visible things that are inanimate and produced by human endeavor; that is, that are either fabricated or given form by human activity. These range, in physical form, from built works to images and written documents. They range in purpose from utilitarian to the aesthetic. Also included are landscape features that provide the context for the built environment.
Brand Names: A recently added facet that allows additions from the conservation community, particularly where a material or process does not have a generic name.
Examples, AAT: Art and Architecture Thesaurus (AAT) Top-level Facets (3)
Examples, AAT
Examples, CRISATEL
[Figure: top-level facets and the Materials hierarchy.
Top-level facets: Materials; Evidence of technique, mark and trace; Diagnostic examination; Alteration; Intervention.
Guide terms under Materials: <materials by function>, <materials by composition>, <materials by origin>, <materials by form before use>.
Terms: painting material, framing material, conservation restoration material, supporting material, coating material, surface preparation material, inserted material, pasting material, paint layer material, binding media, colorant, pigment, dye; organic material, inorganic material, compound material; plant origin material, mineral origin material, synthetic material, animal origin material; solid material; liquid, paste or soluble material.]
Examples, CRISATEL
[Figure: colorant hierarchy.
colorant: pigment, dye.
Pigments: black pigment, animal origin pigment, lake pigment, red pigment, yellow pigment, inorganic pigment, blue pigment, white pigment, brown pigment, violet pigment, inert pigment, mineral origin pigment, green pigment, synthetic pigment, organic pigment, plant origin pigment.
Narrower terms shown: carmine, ultramarine blue, natural ultramarine blue, artificial ultramarine blue, smalt, indigo.]
Examples, CRISATEL
[Figure: top-level facets and hierarchies.
Top-level facets: Materials; Evidence of technique, mark and trace; Diagnostic examination; Alteration; Intervention.
Hierarchies: painting technique, framing technique, coating technique, support manufacturing technique, trace or mark, painting portion or component.
Guide terms: <paint layer application by visual effect>, <painting techniques by method>, <painting techniques by binding media>, <painting techniques by binding media application method>.
Terms: painting technique without binding media; painting technique with application of mixed binding media and pigment; painting technique with application of binding media before pigment; painting technique with application of binding media after pigment; wax painting; oil painting; tempera; watercolor; synthetic medium painting.]
Examples, Polemon
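[Figure: top-level facets and hierarchies; labels translated from Greek.]
Materials (Υλικά), Activities (Δραστηριότητες), Agents (Δράστες), Styles and Periods (Τεχνοτροπίες και περίοδοι), Physical Attributes (Φυσικά Χαρακτηριστικά), Movable Objects (Κινητά), Associated Concepts (Σχετιζόμενες Έννοιες), Place (Τόπος)
Protection (Προστασία), Person - Specialty (Άτομο - Ειδικότητα), Organisations (Οργανισμοί), Materials (Υλικά), Monument Types (Είδη Μνημείου), Place Names (Τοπωνύμια), Place (Τόπος), Historical Periods (Ιστορικοί περίοδοι), Morphology, Style (Μορφολογία τεχνοτροπία)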