Formal Concept Analysis
Erik Kropat
University of the Bundeswehr Munich Institute for Theoretical Computer Science,
Mathematics and Operations Research
Neubiberg, Germany
Summer School
“Achievements and Applications of Contemporary Informatics,
Mathematics and Physics” (AACIMP 2011)
August 8-20, 2011, Kiev, Ukraine
Formal Concept Analysis studies how objects can be grouped hierarchically
according to their common attributes.
Tree of Life
Source: Tree of Life Web Project http://tolweb.org/tree/
Formal Concept Analysis
www.arthursclipart.org
Formal Concept Analysis
What is a “concept” ?
A concept is a cognitive unit of meaning or a unit of knowledge.
Concept: Bird
properties: feathered, winged, bipedal, warm-blooded, egg-laying, vertebrate
objects: blackbird, sparrow, raven, …
Formal Concept Analysis
• . . . is a powerful tool for data analysis, information retrieval,
and knowledge discovery in large databases.
• . . . is a conceptual clustering method,
which simultaneously clusters objects and their properties.
• . . . can mathematically represent, identify and analyze
conceptual structures.
Example
(Figure: the objects disk, triangle, cube and cylinder, clustered by color (red, yellow, green) and by dimension (2-dim, 3-dim).)
Formal Concept Analysis
• . . . models concepts as units of thought, consisting of two parts:
− extension = objects belonging to the concept
− intension = attributes common to all those objects.
• . . . is an exploratory data analysis technique for discovering new knowledge.
• . . . can be used for efficiently computing association rules
applied in decision support systems.
• . . . can extract and visualize hierarchies !!!
Formal Concept Analysis
Goal: Derive automatically an ontology from a – very large – collection of objects and their properties or features.
Example: Target Marketing
Set of objects: customers ⇒ clusters of objects
Set of attributes: age, sex, income level, spending habits, … ⇒ clusters of attributes
The clusters of objects and the clusters of attributes correspond one-for-one.
⇒ predict customer purchase decisions / recommend products to customers
⇒ Sensitive advertisement
Formal Contexts
Example: Classification of plants and animals
Objects: Dog, Cat, Reed, Water lily, Oak, Carp, Potato
Attributes: Animal, Plant, lives on land, lives in water
Formal Concept Analysis
Example: Classification of plants and animals
(Objects in rows, attributes in columns)
           | Animal | Plant | Lives on land | Lives in water
Dog        | x      |       | x             |
Cat        | x      |       | x             |
Oak        |        | x     | x             |
Potato     |        | x     | x             |
Carp       | x      |       |               | x
Water lily |        | x     |               | x
Reed       |        | x     | x             | x
Question:
Does object g have the attribute m (yes / no)?
⇒ Binary relation
A formal context can be represented by a cross table (bit matrix).
Formal Context
A formal context (G, M, I) consists of
a set G of objects,
a set M of attributes and
a binary relation I ⊂ G × M.
Does object g have the attribute m (yes / no)?
A formal context describes the relation between
objects and attributes.
Notation
• g I m means: “object g has attribute m”.
Example: (a) dog I animal
(b) carp I lives in water
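The definition can be written down directly in Python. This is a minimal sketch, assuming the cross table of the plant/animal example is stored row by row; the names ROWS and has_attribute are illustrative and do not come from the slides.

```python
# Formal context (G, M, I) of the plant/animal example.
# ROWS holds one row of the cross table per object: object -> attributes marked "x".
ROWS = {
    "Dog": {"Animal", "Lives on land"},
    "Cat": {"Animal", "Lives on land"},
    "Oak": {"Plant", "Lives on land"},
    "Potato": {"Plant", "Lives on land"},
    "Carp": {"Animal", "Lives in water"},
    "Water lily": {"Plant", "Lives in water"},
    "Reed": {"Plant", "Lives on land", "Lives in water"},
}

G = set(ROWS)                          # set of objects
M = set().union(*ROWS.values())        # set of attributes
# Binary relation I as a set of (object, attribute) pairs, a subset of G x M.
I = {(g, m) for g, attrs in ROWS.items() for m in attrs}


def has_attribute(g, m):
    """Return True iff g I m, i.e. object g has attribute m."""
    return (g, m) in I


print(has_attribute("Dog", "Animal"))           # True
print(has_attribute("Carp", "Lives in water"))  # True
print(has_attribute("Carp", "Lives on land"))   # False
```

Storing I as a set of pairs keeps the code close to the notation g I m; a bit matrix would serve equally well for large contexts.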
Derivation Operators
The Derivation Operators (Type I)
A ⊂ G selection of objects.
Question: Which attributes from M are common to all these objects?
Set of common attributes of the objects in A:
A′ := A↑ := { m ∈ M | g I m for all g ∈ A }
A ⊂ G | A′ ⊂ M
{Dog, Cat} | {Animal, lives on land}
{Oak, Potato} | {Plant, lives on land}
The Derivation Operators (Type II)
B ⊂ M a set of attributes.
Question: Which objects have all the attributes from B?
Set of objects that have all the attributes from B:
B′ := B↓ := { g ∈ G | g I m for all m ∈ B }
B ⊂ M | B′ ⊂ G
{Plant, lives on land} | {Oak, Potato, Reed}
{Animal, lives in water} | {Carp}
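Both derivation operators translate almost literally into set comprehensions. The following is a sketch under the assumption that the context is stored as a dictionary of rows; the function names up and down are chosen here to mirror the arrows A↑ and B↓.

```python
# Cross table of the plant/animal example: object -> set of its attributes.
ROWS = {
    "Dog": {"Animal", "Lives on land"},
    "Cat": {"Animal", "Lives on land"},
    "Oak": {"Plant", "Lives on land"},
    "Potato": {"Plant", "Lives on land"},
    "Carp": {"Animal", "Lives in water"},
    "Water lily": {"Plant", "Lives in water"},
    "Reed": {"Plant", "Lives on land", "Lives in water"},
}
G = set(ROWS)
M = set().union(*ROWS.values())


def up(A):
    """Type I: A' = all attributes common to every object in A."""
    return {m for m in M if all(m in ROWS[g] for g in A)}


def down(B):
    """Type II: B' = all objects that have every attribute in B."""
    return {g for g in G if all(m in ROWS[g] for m in B)}


print(sorted(up({"Dog", "Cat"})))                  # ['Animal', 'Lives on land']
print(sorted(up({"Oak", "Potato"})))               # ['Lives on land', 'Plant']
print(sorted(down({"Plant", "Lives on land"})))    # ['Oak', 'Potato', 'Reed']
print(sorted(down({"Animal", "Lives in water"})))  # ['Carp']
```

Note that up(set()) returns M and down(set()) returns G, because the condition "for all" holds trivially on the empty set.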
Derivation Operators - Facts
Let (G, M, I) be a formal context.
A, A1, A2 ⊂ G sets of objects. B, B1, B2 ⊂ M sets of attributes.
1) A1 ⊂ A2 ⇒ A2′ ⊂ A1′
1′) B1 ⊂ B2 ⇒ B2′ ⊂ B1′
2) A ⊂ A′′
2′) B ⊂ B′′
3) A′ = A′′′
3′) B′ = B′′′
4) A ⊂ B′ ⇔ B ⊂ A′ ⇔ A × B ⊂ I
The derivation operators constitute a Galois connection between the power sets P(G) and P(M).
Reading of 1): If a selection of objects is enlarged, then the attributes which are common to all objects of the larger selection are among the common attributes of the smaller selection.
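Because the example context is tiny, the facts can be checked by brute force over all subsets of G and M. The sketch below is only a sanity check on one context, not a proof; the helper names (powerset, rectangle_in_I) are illustrative.

```python
from itertools import chain, combinations

ROWS = {
    "Dog": {"Animal", "Lives on land"},
    "Cat": {"Animal", "Lives on land"},
    "Oak": {"Plant", "Lives on land"},
    "Potato": {"Plant", "Lives on land"},
    "Carp": {"Animal", "Lives in water"},
    "Water lily": {"Plant", "Lives in water"},
    "Reed": {"Plant", "Lives on land", "Lives in water"},
}
G = set(ROWS)
M = set().union(*ROWS.values())


def up(A):
    return {m for m in M if all(m in ROWS[g] for g in A)}


def down(B):
    return {g for g in G if all(m in ROWS[g] for m in B)}


def powerset(s):
    """All subsets of s as a list of sets."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [set(c) for c in subsets]


def rectangle_in_I(A, B):
    """True iff A x B is contained in I (every object in A has every attribute in B)."""
    return all(m in ROWS[g] for g in A for m in B)


PG, PM = powerset(G), powerset(M)

# 1) and 1'): the operators reverse inclusions.
fact1 = all(up(A2) <= up(A1) for A1 in PG for A2 in PG if A1 <= A2)
fact1d = all(down(B2) <= down(B1) for B1 in PM for B2 in PM if B1 <= B2)
# 2) and 2'): every set is contained in its double derivation.
fact2 = all(A <= down(up(A)) for A in PG) and all(B <= up(down(B)) for B in PM)
# 3) and 3'): deriving three times equals deriving once.
fact3 = (all(up(A) == up(down(up(A))) for A in PG)
         and all(down(B) == down(up(down(B))) for B in PM))
# 4): the three conditions are equivalent.
fact4 = all((A <= down(B)) == (B <= up(A)) == rectangle_in_I(A, B)
            for A in PG for B in PM)

print(fact1, fact1d, fact2, fact3, fact4)  # True True True True True
```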
Formal Concepts
Formal Concepts
Formal Context: Defines a relation between objects and attributes.
Real World: Objects are characterized by particular attributes.
Object
Attributes
Formal Concepts
Let (G, M, I) be a formal context, where A ⊂ G and B ⊂ M.
(A, B) is a formal concept of (G, M, I), iff
A′ = B and B′ = A.
The set A is called the extent and the set B is called the intent
of the formal concept (A, B).
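The two conditions of the definition can be tested mechanically. A small sketch, assuming the plant/animal context from the earlier slides; is_formal_concept is a hypothetical helper name.

```python
ROWS = {
    "Dog": {"Animal", "Lives on land"},
    "Cat": {"Animal", "Lives on land"},
    "Oak": {"Plant", "Lives on land"},
    "Potato": {"Plant", "Lives on land"},
    "Carp": {"Animal", "Lives in water"},
    "Water lily": {"Plant", "Lives in water"},
    "Reed": {"Plant", "Lives on land", "Lives in water"},
}
G = set(ROWS)
M = set().union(*ROWS.values())


def up(A):
    return {m for m in M if all(m in ROWS[g] for g in A)}


def down(B):
    return {g for g in G if all(m in ROWS[g] for m in B)}


def is_formal_concept(A, B):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return up(A) == set(B) and down(B) == set(A)


print(is_formal_concept({"Dog", "Cat"}, {"Animal", "Lives on land"}))           # True
print(is_formal_concept({"Oak", "Potato", "Reed"}, {"Plant", "Lives on land"}))  # True
print(is_formal_concept({"Oak", "Potato"}, {"Plant", "Lives on land"}))          # False
```

The last pair fails because Reed also has both attributes, so B′ is larger than A.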
Formal Concepts
• Extent A and intent B of a formal concept (A, B) correspond to each other by the binary relation I of the underlying formal context.
• The description of a formal concept is redundant, because each of the two parts determines the other.
Duality: Extent (objects) ⇔ Intent (attributes)
How can we find “formal concepts”?
( {Dog, Cat}, {Animal, lives on land} )
(Cross table as above; the cells of the rows Dog, Cat and the columns Animal, Lives on land form a filled rectangle.)
A formal concept (A, B) corresponds to a
filled rectangular subtable
with row set A and column set B.
Each of the two parts determines the other!
Exercise
Determine the sets of objects A and the set of attributes B
such that the pair (A, B) represents a formal concept.
(a) A = {oak, potato, reed}, B = ?
(b) A = ?, B = {animal, lives in water}
How can we find “formal concepts”?
A formal concept (A, B) corresponds to a
filled rectangular subtable
with row set A and column set B.
( {Dog, Cat}, {Animal, lives on land} )
( {Oak, Potato, Reed}, {Plant, lives on land} )
( {Carp}, {Animal, lives in water} )
How can we find “formal concepts”?
A formal concept (A, B) corresponds to a
filled rectangular subtable
with row set A and column set B.
Question: Is the following pair a formal concept?
( {Oak, Potato}, {Plant, lives on land} )
Answer: No. The subtable is filled, but {Plant, lives on land}′ = {Oak, Potato, Reed} ≠ {Oak, Potato}.
There exist filled rectangular subtables that do not determine formal concepts.
Only the maximal filled rectangles are formal concepts.
Lemma
Each formal concept (A, B) of a formal context (G,M,I)
has the form (A′′, A′) for some subset A ⊂ G and the form (B′, B′′) for some subset B ⊂ M.
Conversely, all such pairs are formal concepts.
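The lemma gives a recipe for generating a concept from any set of objects or attributes. The following sketch applies it to the plant/animal context; the function names concept_from_objects and concept_from_attributes are assumptions made for illustration.

```python
ROWS = {
    "Dog": {"Animal", "Lives on land"},
    "Cat": {"Animal", "Lives on land"},
    "Oak": {"Plant", "Lives on land"},
    "Potato": {"Plant", "Lives on land"},
    "Carp": {"Animal", "Lives in water"},
    "Water lily": {"Plant", "Lives in water"},
    "Reed": {"Plant", "Lives on land", "Lives in water"},
}
G = set(ROWS)
M = set().union(*ROWS.values())


def up(A):
    return {m for m in M if all(m in ROWS[g] for g in A)}


def down(B):
    return {g for g in G if all(m in ROWS[g] for m in B)}


def concept_from_objects(A):
    """Return the formal concept (A'', A') generated by the object set A."""
    intent = up(A)
    return down(intent), intent


def concept_from_attributes(B):
    """Return the formal concept (B', B'') generated by the attribute set B."""
    extent = down(B)
    return extent, up(extent)


# {Oak, Potato} is not an extent; its closure adds Reed.
extent, intent = concept_from_objects({"Oak", "Potato"})
print(sorted(extent), sorted(intent))
# ['Oak', 'Potato', 'Reed'] ['Lives on land', 'Plant']

# {Lives on land, Lives in water} is not an intent; its closure adds Plant.
extent, intent = concept_from_attributes({"Lives on land", "Lives in water"})
print(sorted(extent), sorted(intent))
# ['Reed'] ['Lives in water', 'Lives on land', 'Plant']
```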
Compute all formal concepts
Computing all Formal Concepts
Observations
• (A′′, A′) is a formal concept for every A ⊂ G.
• A ⊂ G is an extent ⇔ A = A′′.
B ⊂ M is an intent ⇔ B = B′′.
• The intersection of arbitrarily many extents is an extent.
The intersection of arbitrarily many intents is an intent.
Algorithm for Computing all Formal Concepts
A) Determine all Concept Extents
1. Initialize a list of concept extents.
Write for each attribute m ∈ M the extent {m}′ to the list.
2. For any two sets in the list, compute their intersection.
If the result is a set that is not yet in the list, then extend the list by this set.
With the extended list, continue to build all pairwise intersections.
Extend the list by the set G.
⇒ The list contains all concept extents.
B) Determine all Concept Intents
3. Compute intents
For every concept extent A in the list compute the corresponding intent A′ to obtain a list of all formal concepts (A, A′).
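Steps 1 to 3 can be sketched in a few lines of Python. This is a direct, unoptimized transcription for small contexts such as the plant/animal example; the function names all_extents and all_concepts are illustrative.

```python
ROWS = {
    "Dog": {"Animal", "Lives on land"},
    "Cat": {"Animal", "Lives on land"},
    "Oak": {"Plant", "Lives on land"},
    "Potato": {"Plant", "Lives on land"},
    "Carp": {"Animal", "Lives in water"},
    "Water lily": {"Plant", "Lives in water"},
    "Reed": {"Plant", "Lives on land", "Lives in water"},
}
G = set(ROWS)
M = set().union(*ROWS.values())


def up(A):
    return {m for m in M if all(m in ROWS[g] for g in A)}


def down(B):
    return {g for g in G if all(m in ROWS[g] for m in B)}


def all_extents():
    # Step 1: start with the attribute extents {m}'.
    extents = {frozenset(down({m})) for m in M}
    # Step 2: add pairwise intersections until nothing new appears.
    changed = True
    while changed:
        changed = False
        for X in list(extents):
            for Y in list(extents):
                Z = X & Y
                if Z not in extents:
                    extents.add(Z)
                    changed = True
    # Finally extend the list by the set G.
    extents.add(frozenset(G))
    return extents


def all_concepts():
    # Step 3: pair every extent A with its intent A'.
    return [(A, frozenset(up(A))) for A in all_extents()]


concepts = all_concepts()
print(len(concepts))  # 11
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

The repeated pairwise intersection is quadratic in the number of extents per pass, which is fine for hand-sized examples but not for very large contexts.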
Exercise
Compute the formal concepts of the following formal context.
Exercise
1. Initialize a list of concept extents.
Write for each attribute m ∈ M the extent {m}’ to the list.
Item | Extent {m}′ | Attribute m ∈ M
e1 | {Dog, Cat, Carp} | {Animal}
e2 | {Oak, Potato, Water lily, Reed} | {Plant}
e3 | {Dog, Cat, Oak, Potato, Reed} | {Lives on land}
e4 | {Carp, Water lily, Reed} | {Lives in water}
Exercise
2. For any two sets in the list, compute their intersection.
- If the result is a set that is not yet in the list, then extend the list by this set.
- With the extended list, continue to build all pairwise intersections.
- Extend the list by the set G.
Item | Extent | Defined by
e1 | {Dog, Cat, Carp} | {Animal}
e2 | {Oak, Potato, Water lily, Reed} | {Plant}
e3 | {Dog, Cat, Oak, Potato, Reed} | {Lives on land}
e4 | {Carp, Water lily, Reed} | {Lives in water}
e5 | ∅ | e1 ∩ e2
e6 | {Dog, Cat} | e1 ∩ e3
e7 | {Carp} | e1 ∩ e4
e8 | {Oak, Potato, Reed} | e2 ∩ e3
e9 | {Water lily, Reed} | e2 ∩ e4
e10 | {Reed} | e3 ∩ e4
e11 | {Dog, Cat, Oak, Potato, Carp, Water lily, Reed} | G
Exercise
3. Determine intents
For every concept extent A in the list compute the corresponding intent A′ to obtain a list of all formal concepts (A, A′).
Item | Extent A | Intent A′
e1 | {Dog, Cat, Carp} | {Animal}
e2 | {Oak, Potato, Water lily, Reed} | {Plant}
e3 | {Dog, Cat, Oak, Potato, Reed} | {Lives on land}
e4 | {Carp, Water lily, Reed} | {Lives in water}
e5 | ∅ | M
e6 | {Dog, Cat} | {Animal, lives on land}
e7 | {Carp} | {Animal, lives in water}
e8 | {Oak, Potato, Reed} | {Plant, lives on land}
e9 | {Water lily, Reed} | {Plant, lives in water}
e10 | {Reed} | {Plant, lives on land, lives in water}
e11 | {Dog, Cat, Oak, Potato, Carp, Water lily, Reed} | ∅
Conceptual Hierarchies and Concept Lattices
Is there a relation between the formal concepts?
super-concept: ( {Dog, Cat, Carp}, {Animal} )
sub-concepts: ( {Dog, Cat}, {Animal, lives on land} ) and ( {Carp}, {Animal, lives in water} )
sub-concept ≤ super-concept
Idea: Order concepts in a sub-concept / super-concept hierarchy.
The extent of the sub-concept is a subset of the extent of the super-concept.
The intent of the super-concept is a subset of the intent of the sub-concept.
Conceptual Hierarchy
Let (A1, B1) and (A2, B2) be formal concepts of (G, M, I).
(A1, B1) is a sub-concept of (A2, B2) :⇔ A1 ⊂ A2 [⇔ B2 ⊂ B1].
• (A2, B2) is then a super-concept of (A1, B1).
• Notation: (A1, B1) ≤ (A2, B2)
Example: ( {Dog, Cat}, {Animal, lives on land} ) ≤ ( {Dog, Cat, Carp}, {Animal} )
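The order relation reduces to a subset test on extents. A brief sketch using three concepts taken from the example; the tuple representation (extent, intent) and the name is_subconcept are choices made here for illustration.

```python
# Three formal concepts of the plant/animal context as (extent, intent) pairs.
animal = (frozenset({"Dog", "Cat", "Carp"}), frozenset({"Animal"}))
land_animal = (frozenset({"Dog", "Cat"}), frozenset({"Animal", "Lives on land"}))
water_animal = (frozenset({"Carp"}), frozenset({"Animal", "Lives in water"}))


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


def intents_reversed(c1, c2):
    """Equivalent test on the intents: B2 is a subset of B1."""
    return c2[1] <= c1[1]


print(is_subconcept(land_animal, animal))         # True
print(intents_reversed(land_animal, animal))      # True
print(is_subconcept(water_animal, animal))        # True
# The two sub-concepts are incomparable with each other.
print(is_subconcept(land_animal, water_animal))   # False
print(is_subconcept(water_animal, land_animal))   # False
```

The incomparable pair shows why the hierarchy is only a partial order and not a total one.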
Conceptual Hierarchy
• The set of all formal concepts of (G, M, I), ordered by ≤,
is called the concept lattice of the formal context (G, M, I)
and is denoted by B(G, M, I).
Theorem
The concept lattice of a formal context is a partially ordered set.
Conceptual Hierarchy
⇒ We can draw figures that indicate intricate relationships!!
We need a notion of neighborhood
Conceptual Hierarchy
Let P be a set and ≤ a binary relation on P.
The pair (P, ≤) is a partially ordered set, iff for all x, y, z ∈ P:
1) x ≤ x (reflexive)
2) x ≤ y and x ≠ y ⇒ ¬ (y ≤ x) (antisymmetric)
3) x ≤ y and y ≤ z ⇒ x ≤ z (transitive)
Let (A1, B1) and (A2, B2) be formal concepts of the context (G, M, I).
(A1, B1) is a proper sub-concept of (A2, B2) [ (A1, B1) < (A2, B2) ] :⇔ (A1, B1) ≤ (A2, B2) and (A1, B1) ≠ (A2, B2).
Conceptual Hierarchy
Examples: In the following examples (A1, B1) is a proper sub-concept of (A2, B2).
(a) (A1, B1) < (A2, B2), with no formal concept in between
(b) (A1, B1) < (A, B) < (A2, B2)
Question: What is the difference between (a) and (b)?
Answer: In (a) the concept (A1, B1) is a lower neighbor of (A2, B2).
In (b) the concept (A1, B1) is not a lower neighbor of (A2, B2).
Conceptual Hierarchy
Proper sub-concepts can be used to define a notion of neighborhood.
Let (A1, B1) and (A2, B2) be formal concepts of the context (G, M, I),
and let (A1, B1) be a proper sub-concept of (A2, B2).
(A1, B1) is a lower neighbor of (A2, B2) [ (A1, B1) ≺ (A2, B2) ],
if no formal concept (A, B) exists with (A1, B1) < (A, B) < (A2, B2).
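The lower-neighbor relation determines which circles are joined by a line in the diagram. The sketch below computes it for the eleven extents e1 to e11 of the plant/animal exercise; it compares extents only, which suffices because the order on concepts is the subset order on extents. The name lower_neighbors is illustrative.

```python
# Extents e1..e10 of the plant/animal context, as listed in the exercise table.
EXTENTS = {
    "e1": frozenset({"Dog", "Cat", "Carp"}),
    "e2": frozenset({"Oak", "Potato", "Water lily", "Reed"}),
    "e3": frozenset({"Dog", "Cat", "Oak", "Potato", "Reed"}),
    "e4": frozenset({"Carp", "Water lily", "Reed"}),
    "e5": frozenset(),
    "e6": frozenset({"Dog", "Cat"}),
    "e7": frozenset({"Carp"}),
    "e8": frozenset({"Oak", "Potato", "Reed"}),
    "e9": frozenset({"Water lily", "Reed"}),
    "e10": frozenset({"Reed"}),
}
# e11 is the full object set G.
EXTENTS["e11"] = frozenset().union(*EXTENTS.values())


def lower_neighbors(name):
    """Names of the concepts directly below the given concept."""
    top = EXTENTS[name]
    # Proper sub-concepts: extent strictly contained in the extent of `name`.
    below = [n for n, e in EXTENTS.items() if e < top]
    # Keep those with no other proper sub-concept strictly in between.
    return sorted(n for n in below
                  if not any(EXTENTS[n] < EXTENTS[k] for k in below))


print(lower_neighbors("e11"))  # ['e1', 'e2', 'e3', 'e4']
print(lower_neighbors("e2"))   # ['e8', 'e9']
print(lower_neighbors("e10"))  # ['e5']
print(lower_neighbors("e5"))   # []
```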
Drawing Concept Lattices
• Draw formal concepts
Draw a small circle for every formal concept.
A circle for a concept is always positioned higher than the circles of its proper sub-concepts.
• Draw lines
Connect each formal concept (circle) with the circles of its lower neighbors.
• Label with attribute names
Attach each attribute m to the circle representing the concept ( {m}′, {m}′′ ).
• Label with object names
Attach each object g to the circle representing the concept ( {g}′′, {g}′ ).
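The two labeling rules can be evaluated directly with the derivation operators. A sketch for the plant/animal context follows; attribute_concept and object_concept are hypothetical helper names for the concepts ( {m}′, {m}′′ ) and ( {g}′′, {g}′ ).

```python
ROWS = {
    "Dog": {"Animal", "Lives on land"},
    "Cat": {"Animal", "Lives on land"},
    "Oak": {"Plant", "Lives on land"},
    "Potato": {"Plant", "Lives on land"},
    "Carp": {"Animal", "Lives in water"},
    "Water lily": {"Plant", "Lives in water"},
    "Reed": {"Plant", "Lives on land", "Lives in water"},
}
G = set(ROWS)
M = set().union(*ROWS.values())


def up(A):
    return {m for m in M if all(m in ROWS[g] for g in A)}


def down(B):
    return {g for g in G if all(m in ROWS[g] for m in B)}


def attribute_concept(m):
    """Concept ({m}', {m}'') whose circle carries the attribute label m."""
    extent = down({m})
    return extent, up(extent)


def object_concept(g):
    """Concept ({g}'', {g}') whose circle carries the object label g."""
    intent = up({g})
    return down(intent), intent


for m in sorted(M):
    print("attribute", m, "-> extent", sorted(attribute_concept(m)[0]))
for g in sorted(G):
    print("object", g, "-> extent", sorted(object_concept(g)[0]))
```

For example, Reed is attached to the concept with extent {Reed}, while Water lily is attached to the concept with extent {Water lily, Reed}.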
Exercise
Compute the concept lattice of the following formal context.
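Drawing Concept Lattices
(Figure: concept lattice of the plant/animal context, with the following nodes.)
e11: G (top)
e1: animal
e2: plant
e3: terrestrial
e4: aquatic
e6: land animal (objects: dog, cat)
e7: water animal (object: carp)
e8: terrestrial plants (objects: oak, potato)
e9: water plant (object: water lily)
e10: plants, on land & in water (object: reed)
e5: ∅ (bottom)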
Exercise
Compute the formal concepts of the following formal context:
(Objects in rows, attributes in columns)
        | Gas giant | Terrestrial | Moon | Habitable zone
Earth   |           | x           | x    | x
Jupiter | x         |             | x    |
Mercury |           | x           |      |
Mars    |           | x           | x    |
Exercise
1. Initialize a list of concept extents.
Write for each attribute m ∈ M the extent {m}’ to the list.
Item | Extent {m}′ | Attribute m ∈ M
e1 | {jupiter} | {gas giant}
e2 | {earth, mercury, mars} | {terrestrial}
e3 | {earth, jupiter, mars} | {moon}
e4 | {earth} | {habitable zone}
Exercise
2. For any two sets in the list, compute their intersection.
If the result is a set that is not yet in the list, then extend the list by this set.
With the extended list, continue to build all pairwise intersections.
Extend the list by the set G.
Item | Extent | Defined by
e1 | {jupiter} | {gas giant}
e2 | {earth, mercury, mars} | {terrestrial}
e3 | {earth, jupiter, mars} | {moon}
e4 | {earth} | {habitable zone}
e5 | ∅ | e1 ∩ e2
e6 | {earth, mars} | e2 ∩ e3
e7 | {earth, jupiter, mercury, mars} | G
Exercise
3. Determine intents
For every concept extent A in the list compute the corresponding intent A′ to obtain a list of all formal concepts (A, A′).
Item | Extent | Intent
e1 | {jupiter} | {gas giant, moon}
e2 | {earth, mercury, mars} | {terrestrial}
e3 | {earth, jupiter, mars} | {moon}
e4 | {earth} | {terrestrial, moon, habitable zone}
e5 | ∅ | M
e6 | {earth, mars} | {terrestrial, moon}
e7 | {earth, jupiter, mercury, mars} | ∅
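The result of the exercise can be cross-checked with a different technique than the intersection algorithm of the slides: by the lemma, every concept has the form (B′, B′′), so enumerating all subsets B ⊂ M finds all concepts. This brute-force sketch is only feasible for small attribute sets; the spelling "habitable zone" and all helper names are assumptions of this example.

```python
from itertools import chain, combinations

# Planet context: object -> set of its attributes.
ROWS = {
    "earth": {"terrestrial", "moon", "habitable zone"},
    "jupiter": {"gas giant", "moon"},
    "mercury": {"terrestrial"},
    "mars": {"terrestrial", "moon"},
}
G = set(ROWS)
M = set().union(*ROWS.values())


def up(A):
    return {m for m in M if all(m in ROWS[g] for g in A)}


def down(B):
    return {g for g in G if all(m in ROWS[g] for m in B)}


def powerset(s):
    items = sorted(s)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))


# Collect (B', B'') for every attribute subset B; duplicates collapse in the set.
concepts = set()
for B in powerset(M):
    extent = down(set(B))
    concepts.add((frozenset(extent), frozenset(up(extent))))

print(len(concepts))  # 7
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

The seven pairs agree with the items e1 to e7 of the table.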
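Exercise
Concept Lattice
(Figure: concept lattice of the planet context, with the following nodes.)
e7: extent G = {earth, jupiter, mercury, mars}, intent ∅ (top)
e2: extent {earth, mercury, mars}, intent {terrestrial}
e3: extent {earth, jupiter, mars}, intent {moon}
e6: extent {earth, mars}, intent {terrestrial, moon}
e1: extent {jupiter}, intent {gas giant, moon}
e4: extent {earth}, intent {terrestrial, moon, habitable zone}
e5: extent ∅, intent M (bottom)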
Applications
Applications
• Web information retrieval
→ How can web search results retrieved by search engines be conceptualized and represented in a human-oriented form?
• Partner selection for interfirm collaborations
→ Identification of structural similarities between potential partners according to the characteristics of the prospective partner firms.
• Information systems for IT security management
→ Identification of security-sensitive operations performed by a server.
• Data warehousing and database analysis
→ Controlling the trade of stocks and shares.
Bioinformatics
Biclustering / co-clustering:
Simultaneous clustering of the rows and columns of a matrix.
(Figures: Verducci J S et al. Physiol. Genomics 2006;25:355-363, ©2006 by American Physiological Society)
Summary
• Formal concept analysis provides methods for the automatic derivation of ontologies from very large collections of objects and their attributes.
• It reveals unknown, hidden and meaningful connections between groups of objects and groups of attributes.
• The methods are supported by algebra, lattice theory and order theory.
• Visualization techniques are available.
• There are strong connections to co-clustering (bi-clustering) methods (important tools in DNA-microarray analysis).
Literature
• Bernhard Ganter, Gerd Stumme, Rudolf Wille (eds.)
Formal Concept Analysis. Foundations and Applications.
Springer, 2005.
• Claudio Carpineto, Giovanni Romano
Concept Data Analysis: Theory and Applications.
Wiley, 2004.
Software
www.fcahome.org.uk/fcasoftware.html
Thank you very much!