Interval pattern structures: an introduction
DESCRIPTION
This is a simple tutorial on "interval pattern structures", a conceptual structure that can be derived from numerical data within Formal Concept Analysis.
TRANSCRIPT
Building Concept Lattices from Numerical Data with Pattern Structures
A tutorial
Mehdi Kaytoue and Amedeo Napoli
November 9th, 2010
Context
Formal Concept Analysis
  Works on binary relations
  Classifies objects w.r.t. the attributes they have in common, within formal concepts (extent, intent)
  Ordering the concepts gives a mathematical structure
Ganter & Wille, Springer mathematical foundations 99
Concept lattice: useful for many tasks
  Simultaneous classification of objects and their attributes
  Information organization
  Knowledge discovery in databases (closed itemsets, association rules)
  Information retrieval
  ...
Valtchev & al., ICFCA 04 – Wille, JETAI 02
Problem and proposition
What to do when facing numerical data?
Transform the data into binary data, a general problem
  Conceptual scaling (binarization)
  Important choices have to be made
  Loss of information and of links between objects
Can binarization be avoided? Can a similarity relation between values be considered instead?
Outline
1 Formal Concept Analysis
2 Pattern structures and intervals
3 Introducing a similarity relation
4 Conclusion
Formal context
Given by (G, M, I) with
  G a set of objects
  M a set of attributes
  I a binary relation between objects and attributes: (g, m) ∈ I means that "object g owns attribute m"
Represented by a binary table
     m1  m2  m3
g1   ×       ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
G = {g1, ..., g5}
M = {m1, m2, m3}
(g1, m3) ∈ I
Galois connection
Two derivation operators form a Galois connection
The first gives the set of attributes common to all objects of a set A ⊆ G
A′ = {m ∈ M | ∀g ∈ A : (g, m) ∈ I}
The second gives the set of objects owning all attributes of a set B ⊆ M
B′ = {g ∈ G | ∀m ∈ B : (g, m) ∈ I}
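As a minimal sketch, the two derivation operators can be written in Python as follows. The names `CONTEXT`, `intent` and `extent` are illustrative, not from the tutorial. The crosses of the table are an assumption reconstructed from the examples given on the slides (g1: m1, m3; g2: m1, m2; g3 and g4: m2, m3; g5: all three).

```python
# Formal context (G, M, I) stored as a mapping: object -> set of attributes it owns.
# The incidence below is an assumption reconstructed from the tutorial's examples.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
OBJECTS = set(CONTEXT)
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : attributes owned by every object of A (all of M when A is empty)."""
    common = set(ATTRIBUTES)
    for g in objects:
        common &= CONTEXT[g]
    return common


def extent(attributes):
    """B' : objects owning every attribute of B."""
    return {g for g in OBJECTS if set(attributes) <= CONTEXT[g]}


print(sorted(intent({"g1"})))         # ['m1', 'm3']
print(sorted(extent({"m1", "m3"})))   # ['g1', 'g5']
```

Applying `intent` after `extent` (or the reverse) yields a closure, which is what the definition of a formal concept relies on.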
Formal concepts
Given by a pair (A, B) with A ⊆ G, B ⊆ M, A′ = B and B′ = A
  A is the concept extent
  B is the concept intent
Illustration
{g1}′ = {m1,m3}
{m1,m3}′ = {g1,g5}
{g1,g5}′ = {m1,m3}
     m1  m2  m3
g1   ×       ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
({g1,g5}, {m1,m3}) is a formal concept
Concept lattice
Ordering relation on concepts
(A1,B1) ≤ (A2,B2)⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)
({g1,g5}, {m1,m3}) ≤ ({g1,g2,g5}, {m1})
Concept lattices have interesting properties
  Maximality
  Specialization/generalization hierarchy
  Synthetic representation of the data without loss of information
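The definitions of formal concept and of the ordering can be illustrated with a brute-force sketch: close every subset of objects and collect the resulting (extent, intent) pairs. This is only reasonable for tiny contexts and is not the algorithm used by the authors. All names are illustrative, and the incidence relation is an assumption reconstructed from the examples on the slides.

```python
from itertools import combinations

# Illustrative context; the crosses are an assumption reconstructed from the slides.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = frozenset({"m1", "m2", "m3"})


def intent(objects):
    """A' : attributes shared by all objects of A."""
    common = set(ATTRIBUTES)
    for g in objects:
        common &= CONTEXT[g]
    return frozenset(common)


def extent(attributes):
    """B' : objects owning all attributes of B."""
    return frozenset(g for g in CONTEXT if attributes <= CONTEXT[g])


def all_concepts():
    """Brute force: (A'', A') is a concept for every subset A of objects."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset)
            concepts.add((extent(b), b))
    return concepts


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is included in A2."""
    return c1[0] <= c2[0]


CONCEPTS = all_concepts()
for a, b in sorted(CONCEPTS, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(a), sorted(b))
```

Under the assumed table, the pair ({g1, g5}, {m1, m3}) appears in the output and lies below ({g1, g2, g5}, {m1}).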
Problem
How to build a lattice from numerical data?
How to group "similar" objects within concepts?
     m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5
All this while avoiding discretization and its associated problems (choice of thresholds, size of the binary table, information loss, computability, interpretation)
First elements
How to order object descriptions
Classical case
Lattice of attribute sets (2^M, ⊆). With N, O ∈ 2^M, one has
N ⊆ O ⇐⇒ N ∩ O = N
For example, with M = {a, b}
{a} ⊆ {a, b} ⇐⇒ {a} ∩ {a, b} = {a}
Pattern case
∩ has the properties of a meet ⊓ in a semi-lattice
It acts as a "similarity operator": it returns a description capturing what its arguments have in common
{a, b} ∩ {a, d} = {a}
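The equivalence between set inclusion and the meet can be checked with a few lines of Python. This is only a sketch: the function name `is_below` and the three-attribute set `M` are illustrative choices, not part of the tutorial.

```python
from itertools import combinations


def is_below(n, o):
    """N ⊆ O tested through the meet only: N ∩ O = N."""
    return (n & o) == n


# Hypothetical attribute set used to check the equivalence exhaustively.
M = {"a", "b", "d"}
subsets = [
    frozenset(c)
    for r in range(len(M) + 1)
    for c in combinations(sorted(M), r)
]

# The meet-based test agrees with Python's built-in subset test on all pairs.
agrees = all(is_below(n, o) == (n <= o) for n in subsets for o in subsets)
print(agrees)

# The meet as a "similarity operator": what two descriptions have in common.
common = frozenset({"a", "b"}) & frozenset({"a", "d"})
print(sorted(common))
```

The point of the rewriting is that the order is recovered from the meet alone, so any operation with the properties of a meet can replace ∩.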
Pattern structures
Given by (G, (D, ⊓), δ)
  G a set of objects
  (D, ⊓) a meet-semi-lattice of object descriptions, called patterns
  δ a mapping associating with each object g ∈ G its description δ(g) ∈ D
Patterns from (D, ⊓) are ordered by
c ⊑ d ⇐⇒ c ⊓ d = c, ∀c, d ∈ D
A Galois connection between (2^G, ⊆) and (D, ⊑) gives rise to a (pattern) concept lattice
Existing FCA algorithms (based on closure computation) can easily be adapted
Ganter & Kuznetsov, ICCS 01
Intervals are patterns
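Let [a1, b1] and [a2, b2] be two intervals
Their meet is the smallest interval containing both
[a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]
[4, 4] ⊓ [5, 5] = [4, 5]
Their order is given by
[a1, b1] ⊑ [a2, b2] ⇐⇒ [a1, b1] ⊓ [a2, b2] = [a1, b1]
[4, 5] ⊑ [5, 5] ⇐⇒ [4, 5] ⊓ [5, 5] = [4, 5]
Note that ⊑ is the reverse of interval inclusion: [4, 5] ⊑ [5, 5] although [5, 5] ⊆ [4, 5]
This gives a semi-lattice (D, ⊓), or equivalently an ordered set (D, ⊑)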
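A minimal Python sketch of the interval meet and of the induced order, assuming intervals are represented as `(low, high)` tuples. The names `meet` and `leq` are illustrative.

```python
from typing import Tuple

Interval = Tuple[float, float]   # (lower bound, upper bound)


def meet(x: Interval, y: Interval) -> Interval:
    """Smallest interval containing both x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def leq(x: Interval, y: Interval) -> bool:
    """x ⊑ y iff x ⊓ y = x, i.e. y is contained in x."""
    return meet(x, y) == x


print(meet((4, 4), (5, 5)))   # (4, 5)
print(leq((4, 5), (5, 5)))    # True
print(leq((5, 5), (4, 5)))    # False
```

The last line shows the direction of the order: the larger interval is the smaller pattern.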
Interval vectors are patterns
Given the following two interval vectors
e = 〈[a_i, b_i]〉 and f = 〈[c_i, d_i]〉, with i ∈ [1, p]
Their meet is computed component-wise
e ⊓ f = 〈[a_i, b_i] ⊓ [c_i, d_i]〉, with i ∈ [1, p]
〈[4, 4], [3, 4]〉 ⊓ 〈[2, 3], [2, 6]〉 = 〈[2, 4], [2, 6]〉
Their order is given by
e ⊑ f ⇔ [a_i, b_i] ⊑ [c_i, d_i], ∀i ∈ [1, p]
〈[2, 4], [2, 6]〉 ⊑ 〈[4, 4], [3, 4]〉 because [2, 4] ⊑ [4, 4] and [2, 6] ⊑ [3, 4]
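The component-wise definitions can be sketched in Python as below, assuming an interval vector is a tuple of `(low, high)` tuples. The function names are illustrative.

```python
def meet_interval(x, y):
    """Smallest interval containing both x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def meet_vector(e, f):
    """Component-wise meet of two interval vectors of the same length."""
    if len(e) != len(f):
        raise ValueError("interval vectors must have the same dimension")
    return tuple(meet_interval(x, y) for x, y in zip(e, f))


def leq_vector(e, f):
    """e ⊑ f iff e ⊓ f = e, i.e. every component of f lies inside that of e."""
    return meet_vector(e, f) == tuple(e)


e = ((4, 4), (3, 4))
f = ((2, 3), (2, 6))
print(meet_vector(e, f))                        # ((2, 4), (2, 6))
print(leq_vector(((2, 4), (2, 6)), e))          # True
```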
Galois connection
Two operators
The first gives the description representing the similarity of a set of objects
A^□ = ⨅_{g∈A} δ(g) for A ⊆ G
The second gives the maximal set of objects sharing a given description
d^□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)
Ganter & Kuznetsov, ICCS 01
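Both operators only need the mapping δ and the meet, so they can be sketched generically in Python. The names `describe` and `matching_objects` and the toy descriptions are hypothetical; the example uses attribute sets with ∩ as the meet, i.e. the classical case seen as a pattern structure.

```python
from functools import reduce


def describe(objects, delta, meet):
    """A^□: meet of the descriptions of all objects in A (A must be non-empty)."""
    return reduce(meet, (delta[g] for g in objects))


def matching_objects(d, delta, meet):
    """d^□: all objects g with d ⊑ δ(g), where d ⊑ e is tested as d ⊓ e = d."""
    return {g for g in delta if meet(d, delta[g]) == d}


# Hypothetical toy data: descriptions are attribute sets, the meet is intersection.
delta = {
    "g1": frozenset({"a", "b"}),
    "g2": frozenset({"a", "d"}),
    "g3": frozenset({"b", "d"}),
}
set_meet = frozenset.intersection

d = describe({"g1", "g2"}, delta, set_meet)
print(sorted(d))                                     # ['a']
print(sorted(matching_objects(d, delta, set_meet)))  # ['g1', 'g2']
```

Swapping `set_meet` for an interval meet gives the numerical case without changing the two operators.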
Numerical data are pattern structures
     m1  m2  m3
g1    5   7   6
g2    6   8   4
g3    4   8   5
g4    4   9   8
g5    5   8   5
{g1, g2}^□ = ⨅_{g∈{g1,g2}} δ(g)
= δ(g1) ⊓ δ(g2)
= 〈[5, 5], [7, 7], [6, 6]〉 ⊓ 〈[6, 6], [8, 8], [4, 4]〉
= 〈[5, 5] ⊓ [6, 6], [7, 7] ⊓ [8, 8], [6, 6] ⊓ [4, 4]〉
= 〈[5, 6], [7, 8], [4, 6]〉
〈[5, 6], [7, 8], [4, 6]〉^□ = {g ∈ G | 〈[5, 6], [7, 8], [4, 6]〉 ⊑ δ(g)}
= {g1, g2, g5}
({g1, g2, g5}, 〈[5, 6], [7, 8], [4, 6]〉) is a concept
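The worked example can be reproduced with a short Python sketch. The numerical table is the one from the tutorial. The function names and the representation of patterns as tuples of `(low, high)` pairs are illustrative choices.

```python
from functools import reduce

# Numerical table from the tutorial: object -> values for (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def delta(g):
    """Description of an object: each value v becomes the interval [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(e, f):
    """Component-wise smallest enclosing intervals of two interval vectors."""
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(e, f))


def description(objects):
    """A^□ for a non-empty set of objects A."""
    return reduce(meet, (delta(g) for g in objects))


def extent(d):
    """d^□: objects g such that d ⊑ δ(g), tested as d ⊓ δ(g) = d."""
    return {g for g in DATA if meet(d, delta(g)) == d}


d = description({"g1", "g2"})
print(d)                   # ((5, 6), (7, 8), (4, 6))
print(sorted(extent(d)))   # ['g1', 'g2', 'g5']
```

Since `description(extent(d))` gives back `d`, the pair is closed on both sides, which is what makes it a concept.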
Semantics in R^|M|
Patterns are |M|-dimensional hyperrectangles
Ordering of patterns corresponds to rectangle inclusion (c ⊑ d exactly when the rectangle of d is contained in that of c)
General problem
Lowest concepts: few objects, small intervals
Highest concepts: many objects, large intervals
Overwhelming: a single concept for each interval of values
Introducing a similarity relation between objects
How to group objects having similar values within the same concept?
A simple similarity relation
a ≃_θ b ⇔ |a − b| ≤ θ
Examples
2 ≃_2 4 and 2 ≄_3 7
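The similarity relation is a one-line predicate. A minimal Python sketch, with an illustrative function name:

```python
def similar(a, b, theta):
    """a ≃_θ b iff |a - b| <= theta."""
    return abs(a - b) <= theta


print(similar(2, 4, 2))   # True:  |2 - 4| = 2 <= 2
print(similar(2, 7, 3))   # False: |2 - 7| = 5 > 3
```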
The meet : a similarity operator
Given two objects g and h
  their descriptions are δ(g) and δ(h), respectively
  the similarity between g and h is represented by {g, h}^□ = δ(g) ⊓ δ(h)
For any arbitrary set of objects
  all objects are similar (since ⊓ can always be computed)
  their level of similarity depends on the level of the meet of their descriptions in the semi-lattice
How to take into account a similarity relation w.r.t. a distance?
Towards a similarity between objects
Introduce an element ∗ ∈ (D,u) denoting dissimilarity
c ⊓ d ≠ ∗ ⇐⇒ c and d are similar
c ⊓ d = ∗ ⇐⇒ c and d are not similar
For intervals, the meet is constrained by a threshold θ
[a, b] ⊓_θ [c, d] = [min(a, c), max(b, d)] if max(b, d) − min(a, c) ≤ θ
[a, b] ⊓_θ [c, d] = ∗ otherwise
with θ = 0.2
In effect, we just "cut" the semi-lattice
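A minimal Python sketch of the thresholded meet. It makes two assumptions that are not stated on the slide: the dissimilarity element ∗ is represented by `None`, and ∗ is absorbing (the meet of ∗ with anything is ∗). The name `meet_theta` is illustrative.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]   # (lower bound, upper bound)


def meet_theta(
    x: Optional[Interval], y: Optional[Interval], theta: float
) -> Optional[Interval]:
    """Meet constrained by theta; None stands for the dissimilarity element ∗."""
    if x is None or y is None:
        # Assumption: ∗ is absorbing, so ∗ ⊓ d = ∗ for every d.
        return None
    low, high = min(x[0], y[0]), max(x[1], y[1])
    # The enclosing interval is kept only if its length does not exceed theta.
    return (low, high) if high - low <= theta else None


print(meet_theta((4, 4), (5, 5), 1))   # (4, 5)
print(meet_theta((4, 4), (6, 6), 1))   # None
```

With θ = 1, the values 4 and 5 of attribute m1 are merged into [4, 5], while 4 and 6 are declared dissimilar.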
Going further?
≃_θ is not a transitive relation: it is a tolerance relation (reflexive and symmetric, but not necessarily transitive)
Projecting each pattern d ∈ D, i.e. ψ(d) ⊑ d
  For each dimension, replace each value with a larger interval
  Starting from a value d and its attribute domain
    "Dilation": ball of patterns of radius θ (similarity)
    "Erosion": delete pairs of values violating the similarity (maximality)
    Compute the meet of the remaining values
Projection can be computed as a preprocessing step
Each projected pattern determines an equivalence class of similar values: it reduces the number of concepts.
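The slides do not spell out the projection procedure, so the following Python sketch does not implement it. It only illustrates two facts the procedure builds on: ≃_θ is not transitive, and the values of an attribute domain can be grouped into maximal sets of pairwise similar values. The function names are hypothetical, and the domain {4, 5, 6} is the set of values of m1 in the numerical table.

```python
def similar(a, b, theta):
    """a ≃_θ b iff |a - b| <= theta."""
    return abs(a - b) <= theta


def maximal_similar_groups(values, theta):
    """Maximal sets of pairwise similar values in a one-dimensional domain.

    On a line, a set is pairwise similar iff max - min <= theta, so every such
    group is a window of consecutive sorted values. Windows contained in the
    previous window are skipped, which keeps only the maximal ones.
    """
    domain = sorted(set(values))
    groups = []
    end = 0
    last_end = -1
    for start in range(len(domain)):
        end = max(end, start)
        # Extend the window to the right while it stays within theta.
        while end + 1 < len(domain) and domain[end + 1] - domain[start] <= theta:
            end += 1
        if end > last_end:
            groups.append(domain[start:end + 1])
            last_end = end
    return groups


# Non-transitivity with theta = 1: 4 ≃ 5 and 5 ≃ 6, but 4 and 6 are not similar.
print(similar(4, 5, 1), similar(5, 6, 1), similar(4, 6, 1))

for theta in (0, 1, 2):
    print(theta, maximal_similar_groups([4, 5, 6], theta))
```

Increasing θ merges groups, which is consistent with the statement that projections reduce the number of concepts.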
Projecting for changing lattice granularity
FIGURE: Classical case (no projection)
FIGURE: With θ = 0
FIGURE: With θ = 1
FIGURE: With θ = 2
Conclusion
In a few words
  Pattern structures for numerical data
  Introducing a similarity relation
Other works
  Links between binarization and projection of patterns
  Algorithms for interval pattern structures
  Interval data
  Mining closed interval patterns and their generators
  Enhancing information fusion with pattern structures
  Mining bi-sets in numerical data
Applications
  Gene expression data analysis
  Farmer practices evaluation
  Recommendation systems (MovieLens)
Some references
M. Kaytoue, S. Duplessis, S. O. Kuznetsov, and A. Napoli. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. In Information Sciences, Special Issue: Lattices, 2010.
M. Kaytoue, Z. Assaghir, A. Napoli, and S. O. Kuznetsov. Embedding tolerance relations in Formal Concept Analysis for classifying numerical data. In 19th Conference on Information and Knowledge Management (CIKM), 2010.
Z. Assaghir, M. Kaytoue, A. Napoli, and H. Prade. Managing Information Fusion with Formal Concept Analysis. In Modeling Decisions for Artificial Intelligence, 6th International Conference (MDAI), 2010.
B. Ganter and S. O. Kuznetsov. Pattern Structures and Their Projections. In International Conference on Conceptual Structures, LNCS (2120), Springer, 2001.
M. Kaytoue, S. Duplessis, S. O. Kuznetsov, and A. Napoli. Two FCA-Based Methods for Mining Gene Expression Data. In Formal Concept Analysis, LNCS (5548), Springer, pages 251–266, 2009.
M. Kaytoue, Z. Assaghir, N. Messai, and A. Napoli. Two Complementary Classification Methods for Designing a Concept Lattice from Interval Data. In Foundations of Information and Knowledge Systems, LNCS (5956), Springer, pages 345–362, 2010.
M. Kaytoue, S. Duplessis, and A. Napoli. Toward the Discovery of Itemsets with Significant Variations in Gene Expression Matrices. In Studies in Classification, Data Analysis, and Knowledge Organization, Springer, 2010.