DESCRIPTIVE GRANULARITY
Building Foundations of Data Mining
In Memory of my Professors: Zdzislaw Pawlak,
Helena Rasiowa and Roman Sikorski
Anita Wasilewska
Computer Science Department
Stony Brook University
Stony Brook, NY
1
Part 1: INTRODUCTION
2
We all have scientific history;
All problems we work on have history;
It is important to trace history
of problems we work on;
We all build scientific history;
The future belongs to us,
and so does the past.
3
We all have scientific history;
Here is my LATEST history (of building Foundations of Data Mining)
1995-1998 I supervised the PhD Thesis of
Ernestina Menasalvas, now Professor and a Vice-Rector of Madrid Polytechnic.
We (with some others) went from building models for concrete
implementations (1996-2002) to
developing a general language for Foundations of Data Mining (2002-2004) to
building a general foundational model for Data Mining (2005- ).
4
It has been a slow process, but finally a community and specialized
conferences developed, and books started to appear:
Foundations and Novel Approaches in Data
Mining, T.Y. Lin, S. Ohsuga, C. J. Liau,
and X. Hu, editors, Springer 2006,
Data Mining: Foundations and Practice, Tsau
Young Lin, Ying Xie, Anita Wasilewska,
Churn-Jung Liau, editors, Studies in Com-
putational Intelligence (SCI)118, Springer-
Verlag 2008,
and a field Foundations of Data Mining was
created.
We all build the scientific history and it takes
TIME and patience to do so.
5
Our work in Data Mining Foundations matured and finally we were
invited by T.Y. Lin to write a 20-page entry about our research in
the Encyclopedia of Complexity and System Science published by
Springer in 2008.
The Encyclopedia is Springer's latest prestigious initiative, with a
Board of Editors including, among others, Ahmed Zewail, Nobel Prize
in Chemistry, Thomas Schelling, Nobel Prize in Economics, Richard E.
Stearns, 1993 Turing Award, Pierre-Louis Lions, 1994 Fields Medal,
and Lotfi Zadeh, IEEE Medal of Honor.
All entries were by invitation only and the in-
clusion of our work shows the recognition
of the need for foundational studies in
newly developing domains.
6
All problems we work on have history
Short History of Foundational Studies
The origins of Foundational Studies can be
traced back to David Hilbert, a German
mathematician, recognized as one of the
most influential and universal mathemati-
cians of the 19th and early 20th centuries.
7
Hilbert Problems: In 1900, at the Paris conference of the
International Congress of Mathematicians, he proposed 23 problems
for the coming century.
Several of them turned out to be very influ-
ential for 20th century mathematics and
later Computer Science.
Of the cleanly-formulated Hilbert problems,
TEN problems: 3, 7, 10, 11, 13, 14, 17, 19,
20, and 21 have solutions that are ac-
cepted by consensus.
8
TWO Problems: 1, 2 are FOUNDATIONAL
Problems; 1, concerning the Continuum Hypothesis,
was solved by Cohen in 1963, and 2,
concerning the Consistency of Arithmetic, was
solved by Godel and Gentzen in 1936.
FIVE Problems: 5, 9, 15, 18, and 22 have
partial solutions,
FOUR problems: 4, 6, 16, and 23 are too
loosely formulated ever to be described as solved.
TWO Problems: 8 (which includes the Riemann
Hypothesis and the Goldbach conjecture)
and 12 are still OPEN, both
being in number theory.
9
The Riemann hypothesis was proposed by Bernhard Riemann (1859).
It is a conjecture about the distribution of the
zeros of the Riemann zeta function which
states that all non-trivial zeros have real
part 1/2.
The Riemann hypothesis implies results about
the distribution of prime numbers that are
in some ways as good as possible.
Along with suitable generalizations, it is con-
sidered by some mathematicians to be the
most important unresolved problem in pure
mathematics.
10
Pierre Deligne proved in 1973 an analogue of the
Riemann Hypothesis for zeta functions of
varieties defined over finite fields.
The full version of the hypothesis remains un-
solved, although
computer calculations have shown that the
first 10 trillion zeros lie on the critical line.
11
Goldbach’s conjecture (1742) is one of the
oldest unsolved problems in number theory
and in all of mathematics. It states:
Every even integer greater than 2 can
be expressed as the sum of two primes
For example:
4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5,
10 = 7 + 3 or 5 + 5, 12 = 5 + 7, 14 = ...
T. Oliveira e Silva is running a distributed computer search that
has verified the conjecture for n ≤ 1.609 × 10^18 and some higher
small ranges up to 4 × 10^18.
12
Hilbert Program
Hilbert proposed, in 1920, a research project
that became known as Hilbert's Program.
1. He wanted mathematics to be formulated
on a solid and complete logical founda-
tion.
2. He believed that in principle this could be
done, by showing that all of mathematics
follows from a correctly-chosen finite sys-
tem of axioms and that some such axiom
system is provably consistent.
3. He also believed that one can have such
a system in which proofs of theorems can
be deduced automatically from the way
the theorems are built.
13
In 1931 Kurt Godel showed that Hilbert's grand plan, points 1. and 2.,
was impossible as stated.
Godel proved, in what is now called Godel's Incompleteness Theorem,
that any consistent formal system comprehensive enough to include
at least arithmetic cannot demonstrate its own consistency by way
of its own axioms.
In 1933-34 Gerhard Gentzen gave a positive answer to 3. in the case
of classical propositional logic, and a partially positive answer
in the case of (semi-decidable) predicate logic.
Nevertheless Hilbert's and Godel's work led to the development of
recursion theory and then mathematical logic and foundations
of mathematics as autonomous disciplines.
14
Gentzen’s work led to the development of Proof
Theory and Automated Theorem Prov-
ing as separate Mathematics and Computer
Science domains.
Godel inspired the work of Alonzo Church and
Alan Turing that became the basis for
theoretical computer science and also
led to the further development of a unique
phenomenon called the Polish School of
Mathematics and later to the creation of
Foundational Studies in Computer Science.
15
Personal History: my Master's Thesis in Computer Science
(under Pawlak and Rasiowa) consisted of a solution of
Gentzen's conjecture for Modal S4 and S5 Logics, and
consequently I also developed the world's first
theorem prover for S4 Modal Logic in
1967.
As a result I spent the first 15 years of my
scientific life (before coming to the USA)
working in Proof Theory for non-classical
logics, formulated (as a pure mathematician)
a General Theory of Gentzen Type For-
malizations and established various re-
sults about connections and relationships
between certain Classes of Logics, For-
mal Languages and Theory of Programs
(as computer scientist).
16
Polish School of Mathematics
The term Polish School of Mathematics refers
to groups of mathematicians of the 1920’s
and 1930’s working on common subjects.
The main two groups were situated in War-
saw and Lvov (now Lviv, the biggest city
in Western Ukraine).
We hence talk more specifically about the Warsaw
and Lvov Schools of Mathematics,
and additionally of the Warsaw-Lvov School
of Logic working in Warsaw.
17
Any list of important twentieth century mathematicians
contains Polish names with a frequency
out of proportion to the size of the
country.
Poland was partitioned by Russia, Prussia,
and Austria and was under foreign domination
from 1795 until the end of World War I.
What was to become known as the Polish
School of Mathematics was possible be-
cause it was carefully planned, agreed
upon, and executed.
18
Independent Poland was created in 1918 and
the University of Warsaw re-opened with
Janiszewski, Mazurkiewicz, and Sierpin-
ski as professors of mathematics.
They chose logic, set theory, point-set topol-
ogy, and real functions as the area of
concentration.
The journal Fundamenta Mathematicae was
founded in 1920 and is still in print.
It was the first specialized mathematical
journal in the world.
19
The choice of title was deliberate to reflect
that all areas published there were to be
connected with foundational studies.
It should be remembered that at the time
these areas had not yet received full
acceptance by the mathematical commu-
nity.
The choice reflected both insight and courage.
20
The notable mathematicians of the Warsaw
and Lvov Schools of Mathematics were,
among others, Stefan Banach, Stanislaw
Ulam and, after the war, Roman
Sikorski.
Stefan Banach was a self-taught mathematics
prodigy and the founder of modern
functional analysis.
Mathematical concepts named after Banach
include the Banach-Tarski paradox, Hahn-Banach
theorem, Banach-Steinhaus theorem,
Banach-Mazur game and Banach spaces.
21
Stanislaw Ulam emigrated to America just before
the war and became an American
mathematician of Polish-Jewish origin.
He participated in the Manhattan Project
and originated the Teller-Ulam design of
thermonuclear weapons.
He also invented nuclear pulse propulsion and
developed a number of mathematical tools
in number theory, set theory, ergodic the-
ory and algebraic topology.
22
Roman Sikorski's reputation was established by
his outstanding results in Boolean algebras,
functional analysis, theory of distributions,
measure theory, general topology, descriptive
set theory, and in Algebraic Mathematical
Logic (in collaboration with
Rasiowa).
In axiomatic set theory, the Rasiowa-Sikorski
Lemma is one of the most fundamental
facts used in the technique of forcing.
23
The notable logicians of the Lvov-Warsaw
School of Logic were:
Alfred Tarski - from 1942 in Berkeley and
founder of the American School of Foundations
of Mathematics,
Jan Lukasiewicz, Andrzej Mostowski, and
after the second world war Helena Ra-
siowa.
24
Helena Rasiowa became, in 1977, the founder
of Fundamenta Informaticae, the first journal
in the world specialized in foundations of
computer science.
The choice of the title Fundamenta Infor-
maticae was again deliberate.
It reflected not only the subject, but also
stressed that the new research area being
developed in Warsaw was a direct continuation
of the tradition of the Foundational
Studies of the Polish School of Mathematics.
25
Part 2:
DESCRIPTIVE GRANULARITY
A Model for Data Mining
26
We present here a formal syntax and seman-
tics for a notion of a descriptive granu-
larity.
We do so in terms of three abstract models:
Descriptive, Semantic, and Granular.
The Descriptive model formalizes the syntactic
concepts and properties of the data mining,
or learning, process.
The Semantic model formalizes its semantic
properties.
The Granular model establishes a relationship
between the Descriptive and Semantic models
in terms of a formal satisfaction relation.
27
Data Mining - Informal Definition
One of the main goals of Data Mining is to
provide comprehensible descriptions of
information extracted from databases.
We are hence interested in building models
for descriptive data mining, i.e. data
mining whose main goal is to produce
a set of descriptions in a language easily
comprehensible to the user.
28
The descriptions come in different forms.
In case of classification problems it might be
a set of characteristic or discriminant rules;
it might be a decision tree or a neural
network with a fixed set of weights.
In case of association analysis it is a set of
associations (frequent item sets), or asso-
ciation rules with accuracy parameters.
In case of cluster analysis it is a set of clus-
ters, each of which has its own description
and a cluster name.
29
In case of approximate classification by the
Rough Set analysis it is usually a set of dis-
criminant or characteristic rules (with or
without accuracy parameters) or a set of
decision tables.
Data Mining results are usually presented to
the user in their descriptive, i.e. syntac-
tic form as it is the most natural form of
communication.
But the Data Mining process is deeply
semantical in its nature.
We hence build our Granular Model on two
levels: syntactic and semantic.
30
SYNTAX
We understand by syntax, or syntactical concepts, simple relations
among symbols and expressions of formal symbolic languages.
A symbolic language is a pair
L = (A, E),
where A is an alphabet and E is the set of expressions of L.
The expressions of formal languages, even if created with a specific
meaning in mind, do not themselves carry any meaning; they are just
finite sequences of certain symbols.
The meaning is assigned to them by establishing a proper semantics.
31
SEMANTICS
Semantics for a given symbolic language L
assigns a specific interpretation in some
domain to all symbols and expressions
of the language.
It also involves related ideas such as truth
and model. They are called semantical
concepts to distinguish them from the syn-
tactical ones.
32
MODEL
The word model is used in many situations
and has many meanings but they all reflect
some parts, if not all, of its following formal
meaning.
A structure M, also called an interpretation,
is a model for a set E0 ⊆ E of expressions
of a formal language L if and only if every
expression E ∈ E0 is true in M.
33
All our Models are abstract structures that
allow us to formalize some general properties
of the Data Mining process and address
the semantics-syntax duality inherent in
any Data Mining process.
Moreover, they allow us to provide a formal
definition of a generalization and of Data
Mining as the process of information
generalization.
34
The notion of generalization is defined in
terms of granularity of steps of the pro-
cess.
Data is represented in the model in the form of
Knowledge Systems.
Each Knowledge System has a granularity
associated with it and the process changes,
or not, its granularity.
Granularity is crucial for defining some
notions and components of the model, hence
the name Granular Model.
35
Granular Model
Granular Model is a system
GM = ( SM, DM, |= ) where:
• SM is a Semantic Model;
• DM is a Descriptive Model;
• |= ⊆ P(U) × E is called a satisfaction
relation, where U is the universe of SM
and E is the set of descriptions defined
by the DM.
Satisfaction |= establishes a truth relationship
between the semantic model and the
descriptive model.
36
Motivation for the Semantic Model definition.
The first step in any data mining procedure is to
drop the key attribute.
This step introduces similarities
in the database, as records no longer have their
unique identification.
The input into the data mining process is
hence always a data table obtained from
the target data by removal of the key
attribute.
We call it a target data table.
37
As the next step we represent, following the Rough
Set model, our target data table as Pawlak's
Information System with the universe U,
by adding a new, non-attribute column for
the record names, i.e. objects of U. We
take this set U as the universe of our model
SM.
Why Information system?
We want to model Data Mining as a process
of generalization.
In order to model this process we first have
to define what it means, from the semantic
point of view, that one stage of the
process is more general than another.
38
The idea behind it is very simple. It is the
same as saying that (a + b)^2 = a^2 + 2ab + b^2
is a more general formula than the formula
(2 + 3)^2 = 2^2 + 2 · 2 · 3 + 3^2.
This means that one description (formula)
is more general than another if it
describes more objects.
From the semantic point of view, it means that the
data mining process consists of putting
objects (records) into sets of objects.
From the syntactic point of view, the data mining
process consists of building descriptions
(in terms of (attribute, value) pairs) of
these sets of objects, with
some extra parameters if needed.
39
To model a situation that allows us to talk
about descriptions of sets of records (ob-
jects) we extend the notion of Pawlak’s
model of information system to our notion
of Knowledge System.
The universe of a knowledge system con-
tains some subsets of U , i.e. elements of
P(U).
For example a target data table (after pre-
processing) and the corresponding repre-
sentation by Pawlak’s information system,
and a knowledge system with universe
U of granularity one are as follows.
40
Target Data Table T0

a1     a2     a3
small  small  medium
medium small  medium
small  small  medium
big    small  small
medium medium big
small  small  medium
big    small  small
medium medium big
small  small  medium
big    small  medium
medium medium small
small  small  medium
big    small  big
medium medium small
Target Information System I0

U    a1     a2     a3
x1   small  small  medium
x2   medium small  medium
x3   small  small  medium
x4   big    small  small
x5   medium medium big
x6   small  small  medium
x7   big    small  small
x8   medium medium big
x9   small  small  medium
x10  big    small  medium
x11  medium medium small
x12  small  small  medium
x13  big    small  big
x14  medium medium small
41
The Knowledge System of granularity one (all
objects are one-element sets) corresponding
to the target table T0 is as follows.
Target Knowledge System K0
P1(U)  a1     a2     a3
{x1}   small  small  medium
{x2}   medium small  medium
{x3}   small  small  medium
{x4}   big    small  small
{x5}   medium medium big
{x6}   small  small  medium
{x7}   big    small  small
{x8}   medium medium big
{x9}   small  small  medium
{x10}  big    small  medium
{x11}  medium medium small
{x12}  small  small  medium
{x13}  big    small  big
{x14}  medium medium small
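The construction of K0 from the target table can be sketched in a few lines of Python (a minimal illustration; the names INFO and K0 are mine, not from the text):

```python
# Illustrative sketch of I0 and K0 (names INFO, K0 are mine).
# The information function f of I0 maps each object and attribute to a
# value; here a dict from object name to its (a1, a2, a3) tuple plays
# that role.
INFO = {
    "x1":  ("small",  "small",  "medium"),
    "x2":  ("medium", "small",  "medium"),
    "x3":  ("small",  "small",  "medium"),
    "x4":  ("big",    "small",  "small"),
    "x5":  ("medium", "medium", "big"),
    "x6":  ("small",  "small",  "medium"),
    "x7":  ("big",    "small",  "small"),
    "x8":  ("medium", "medium", "big"),
    "x9":  ("small",  "small",  "medium"),
    "x10": ("big",    "small",  "medium"),
    "x11": ("medium", "medium", "small"),
    "x12": ("small",  "small",  "medium"),
    "x13": ("big",    "small",  "big"),
    "x14": ("medium", "medium", "small"),
}

# K0: the same information, but the universe consists of singleton
# granules {x}, so the knowledge function agrees with f on
# one-element sets.
K0 = {frozenset([x]): values for x, values in INFO.items()}
```

Re-keying the rows by frozensets lets larger granules (sets of several objects) be used as keys later in exactly the same way.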
42
Assume now that we have applied some algorithm
ALG1 and it has returned the following
set
D = {D1, D2, ..., D7}
of descriptions.
D1 : (a1 = s) ∩ (a2 = s) ∩ (a3 = m),
D2 : (a1 = m) ∩ (a2 = s) ∩ (a3 = m),
D3 : (a1 = m) ∩ (a2 = m) ∩ (a3 = b),
D4 : (a1 = m) ∩ (a2 = m) ∩ (a3 = s),
D5 : (a1 = b) ∩ (a2 = s) ∩ (a3 = s),
D6 : (a1 = b) ∩ (a2 = s) ∩ (a3 = m),
D7 : (a1 = b) ∩ (a2 = s) ∩ (a3 = b).
43
Questions
Q1 How well does this set of descriptions describe
our original data, i.e. how accurate is the
algorithm ALG1 we have used to find them?
Q2 How accurate is the knowledge we have
thus obtained from our data?
The answer is formulated in terms of the tar-
get information system with the universe
U , and the sets S(D) defined (after Pawlak)
for any description D ∈ D as follows.
S(D) = {x ∈ U : D}.
We call S(D) the truth set for D.
44
Intuitively, the sets
S(D) = {x ∈ U : D}
contain all records (i.e. their identifiers)
with the same description, given in terms
of (attribute, value) pairs.
The descriptions do not need to use all attributes
of the target data, as is often
the case, and one of the ultimate goals of data
mining is to find descriptions with as few
attributes as possible.
45
In association analysis the descriptions can rep-
resent the frequent item sets.
For example, for a frequent three-itemset
D = i1i2i3, the truth set S(D) represents
all transactions that contain items i1, i2, i3.
In general, descriptions come in different forms,
depending on the data mining goal and
application.
We formally define a general form of descriptions
as a part of the Descriptive Model.
46
For the target data and descriptions Di ∈ D
presented in the above examples the sets
S(Di) are as follows.
S1 = S(D1) = {x ∈ U : D1} = {x1, x3, x6, x9, x12},
S2 = S(D2) = {x ∈ U : D2} = {x2},
S3 = S(D3) = {x ∈ U : D3} = {x5, x8},
S4 = S(D4) = {x ∈ U : D4} = {x11, x14},
S5 = S(D5) = {x ∈ U : D5} = {x4, x7},
S6 = S(D6) = {x ∈ U : D6} = {x10},
S7 = S(D7) = {x ∈ U : D7} = {x13}.
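The truth-set computation above can be sketched as follows (a minimal Python illustration; INFO and truth_set are my names, and attribute values are abbreviated s/m/b as in the descriptions):

```python
# Truth sets S(D) = {x in U : x satisfies D} for the example data.
# INFO and truth_set are illustrative names; values abbreviated s/m/b.
INFO = {
    "x1": ("s", "s", "m"), "x2": ("m", "s", "m"), "x3": ("s", "s", "m"),
    "x4": ("b", "s", "s"), "x5": ("m", "m", "b"), "x6": ("s", "s", "m"),
    "x7": ("b", "s", "s"), "x8": ("m", "m", "b"), "x9": ("s", "s", "m"),
    "x10": ("b", "s", "m"), "x11": ("m", "m", "s"), "x12": ("s", "s", "m"),
    "x13": ("b", "s", "b"), "x14": ("m", "m", "s"),
}

def truth_set(a1, a2, a3):
    """All objects whose row matches (a1 = ..) and (a2 = ..) and (a3 = ..)."""
    return {x for x, row in INFO.items() if row == (a1, a2, a3)}

S1 = truth_set("s", "s", "m")  # D1 -> {x1, x3, x6, x9, x12}
S5 = truth_set("b", "s", "s")  # D5 -> {x4, x7}
```

The same one-pass scan over the information system yields every S(Di) in the list above.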
47
We represent our results in a form of a Knowl-
edge System as follows.
Resulting Knowledge System K1

P(U)                   a1 a2 a3
{x1, x3, x6, x9, x12}  s  s  m
{x2}                   m  s  m
{x5, x8}               m  m  b
{x11, x14}             m  m  s
{x4, x7}               b  s  s
{x10}                  b  s  m
{x13}                  b  s  b

or equivalently, naming the granules Si:

P(U) a1 a2 a3
S1   s  s  m
S2   m  s  m
S3   m  m  b
S4   m  m  s
S5   b  s  s
S6   b  s  m
S7   b  s  b
48
The representation of data mining results in
the form of a knowledge system allows us to
define how good the knowledge obtained
by a given algorithm is.
In our case the knowledge obtained describes
100% of our target data, as
S1 ∪ S2 ∪ ... ∪ S7 = {x1, x2, ..., x14} = U.
Observe that the sets S1, ..., S7 are also disjoint
and non-empty, i.e. they form a partition
of the universe U.
We define such knowledge as exact.
49
Moreover, we can see that the resulting system
K1 is more general than the input
data K0, because its granularity is higher
than the granularity of K0.
Definition: The granularity of a knowledge system
is the maximum of the cardinalities of its
granules, i.e. elements of its universe.
The granularity of all Target Knowledge Sys-
tems is one.
The granularity of K1 is
max{|S1|, ..., |S7|} = max{5, 1, 2, 2, 2, 1, 1} = 5.
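The granularity computation follows directly from the definition, as this short sketch shows (illustrative Python; the granule list copies K1 from the example):

```python
# Granularity of a knowledge system: the maximum cardinality of its
# granules. K1_GRANULES is the granule universe of K1 from the example.
K1_GRANULES = [
    frozenset({"x1", "x3", "x6", "x9", "x12"}),  # S1
    frozenset({"x2"}),                           # S2
    frozenset({"x5", "x8"}),                     # S3
    frozenset({"x11", "x14"}),                   # S4
    frozenset({"x4", "x7"}),                     # S5
    frozenset({"x10"}),                          # S6
    frozenset({"x13"}),                          # S7
]

def granularity(granules):
    """max{|S| : S in the granule universe}."""
    return max(len(s) for s in granules)

# granularity(K1_GRANULES) -> max{5, 1, 2, 2, 2, 1, 1} = 5
```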
50
Now assume that we have applied to our target data T0 (represented by K0) another algorithm ALG2, and it returned two descriptions D1, D2 under the condition that we need only descriptions of length 2 and with frequency ≥ 30%. The descriptions are:
D1 : (a1 = s) ∩ (a2 = s),
D2 : (a2 = s) ∩ (a3 = m).
Now we evaluate:
S1 = S(D1) = {x1, x3, x6, x9, x12},
S2 = S(D2) = {x1, x2, x3, x6, x9, x10, x12}.
51
Incorporating the algorithm parameters imposed by ALG2 into our Knowledge System we obtain the following table.
Resulting Knowledge System K2
P(U) a1 a2 a3 # of attr frequency
S1   s  s  -  2         36%
S2   -  s  m  2         50%
The sets S1, S2 do not form a partition of the universe U, as S1 ∩ S2 ≠ ∅ and moreover S1 ∪ S2 ≠ U.
The knowledge obtained by the algorithm ALG2
is hence not exact.
It describes only 50% of the target data, and what is described is described subject to certain (frequency) conditions.
Of course K2 is more general than K0.
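The frequency and coverage figures for K2 can be recomputed from the truth sets (a sketch; the names are illustrative):

```python
# Frequencies reported for K2 and the overall coverage of the target
# data, recomputed from the truth sets of D1 and D2.
U_SIZE = 14
S1 = {"x1", "x3", "x6", "x9", "x12"}               # (a1 = s) and (a2 = s)
S2 = {"x1", "x2", "x3", "x6", "x9", "x10", "x12"}  # (a2 = s) and (a3 = m)

freq1 = len(S1) / U_SIZE          # 5/14, about 36%
freq2 = len(S2) / U_SIZE          # 7/14 = 50%
coverage = len(S1 | S2) / U_SIZE  # also 50%, since here S1 ⊆ S2
```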
52
The algorithm ALG2 generalized the target
data, even if in an incomplete way.
The formal definitions of Information System,
Knowledge and Target Knowledge Systems,
and their granularity and exactness are as
follows.
53
Knowledge System is an extension of the fol-
lowing notion of Pawlak’s information sys-
tem.
Information System is a system
I = (U, A, VA, f),
where U ≠ ∅ is called the set of objects,
A ≠ ∅, VA ≠ ∅ are called the set of
attributes and the set of values of attributes,
respectively,
f is called an information function and
f : U × A −→ VA.
54
A knowledge system based on the informa-
tion system
I = (U, A, VA, f)
is a system
KI = (P(U), A, E, VA, VE, g)
where
E is a finite set of knowledge attributes (k-
attributes) such that A ∩ E = ∅.
VE is a finite set of values of k-attributes.
55
g is a partial function, called the knowledge
information function (k-function),
g : P(U) × (A ∪ E) −→ (VA ∪ VE)
such that
(i) g | (⋃x∈U {x} × A) = f
(ii) ∀S ∈ P(U) ∀a ∈ A ((S, a) ∈ dom(g) ⇒ g(S, a) ∈ VA)
(iii) ∀S ∈ P(U) ∀e ∈ E ((S, e) ∈ dom(g) ⇒ g(S, e) ∈ VE)
56
We use the above notion of knowledge system
to define the granules of the universe
and the granularity of the system, and hence,
later, the granularity of the data mining
process.
Granule: Any set S ∈ P(U) i.e. S ⊆ U is
called a granule of U .
Granularity of S: The cardinality |S| of S is
called a granularity of S.
Granule Universe: The set
GrK = {S ∈ P(U) : ∃b ∈ (E ∪ A) ((S, b) ∈ dom(g))}
is called the granule universe of KI.
Granularity of K: The number grK = max{|S| : S ∈ GrK} is called the granularity of K.
57
A knowledge system K = (P(U), A, E, VA, VE, g)
is called exact if and only if all its granules
GrK form a partition of the universe U .
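The exactness condition, i.e. that the granules partition U, can be checked mechanically, as in this sketch (is_exact and the granule lists are my illustrative names, populated from the running example):

```python
from itertools import combinations

def is_exact(granules, universe):
    """True iff the granules are non-empty, pairwise disjoint, and cover U."""
    nonempty = all(len(s) > 0 for s in granules)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(granules, 2))
    covers = set().union(*granules) == set(universe)
    return nonempty and disjoint and covers

U = {f"x{i}" for i in range(1, 15)}
K1_GRANULES = [frozenset({"x1", "x3", "x6", "x9", "x12"}), frozenset({"x2"}),
               frozenset({"x5", "x8"}), frozenset({"x11", "x14"}),
               frozenset({"x4", "x7"}), frozenset({"x10"}), frozenset({"x13"})]
K2_GRANULES = [frozenset({"x1", "x3", "x6", "x9", "x12"}),
               frozenset({"x1", "x2", "x3", "x6", "x9", "x10", "x12"})]

# K1's granules partition U (exact); K2's overlap and do not cover U.
```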
Operators: In our Model we represent data
mining algorithms as certain operators.
For example our ALG1 is represented in the
semantic model by an operator p1 acting
on some subset of a set K of knowledge
systems, such that
p1(K0) = K1.
ALG2 is represented in the model by an operator
p2, also acting on some (possibly different)
subset of the set K of knowledge
systems, such that
p2(K0) = K2.
58
We put all the above observations into a for-
mal notion of a semantic model.
Semantic Model is a system
SM = (P(U), K, G),
where:
• U ≠ ∅ is the universe;
• K ≠ ∅ is a set of knowledge systems,
called also data mining process states;
• G ≠ ∅ is the set of operators;
• Each operator p ∈ G is a partial function
on the set of all data mining process
states, i.e. p : K −→ K.
59
The semantic model is always built for
a given application.
The target data is represented first in the form of
the target information system with the universe
U, and then in the form of the target
knowledge system K0, as we showed in our
examples.
60
The semantic model based on our examples
is as follows.
SM = (P(U), K, G), where:
• U = {x1, x2, ...x14};
• K = {K0, K1, K2};
• G = {p1, p2};
• Each pi ∈ G (i = 1, 2) is a partial
function pi : K −→ K, such that
p1(K0) = K1, p2(K0) = K2.
61
Data Mining as Generalization
We model data mining as a process of gen-
eralization in terms of the generalization
relation based on a notion of granularity
and generalization operators.
Definition: A relation ⪯ ⊆ K × K is called a
generalization relation if the following
condition holds for any K, K′ ∈ K:
K ⪯ K′ if and only if grK ≤ grK′,
where grK denotes the granularity of K.
62
Observe that for K0, K1, K2 from our examples
grK0 = 1 ≤ 5 = grK1 ≤ 7 = grK2, and
the system K2 is the most general.
But at the same time K1 is exact and K2 is
not exact, so we have a trade-off between
exactness and generality.
Definition: an operator g ∈ G is called a generalization
operator if for any K, K′ ∈ K
such that g(K) = K′, we have that
K ⪯ K′.
Observe that both operators p1, p2 in our ex-
ample are generalization operators.
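The relation and the check on the two example operators can be sketched as follows (illustrative names; the granularities are the ones computed in the example):

```python
# Generalization relation: K ⪯ K' iff grK <= grK'.
# Granularities from the example: grK0 = 1, grK1 = 5, grK2 = 7.
GR = {"K0": 1, "K1": 5, "K2": 7}

def generalizes(k, k_prime):
    """K ⪯ K' holds iff the granularity does not decrease."""
    return GR[k] <= GR[k_prime]

# p1 : K0 -> K1 and p2 : K0 -> K2 are generalization operators:
# neither decreases granularity.
```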
63
Data Mining Operators G
In the data mining process, preprocessing and
data mining proper are disjoint, mutually
exclusive categories.
The preprocessing is an integral and very im-
portant stage of the data mining process
and needs as careful analysis as the data
mining proper.
Our framework allows us to distinguish two disjoint
classes of operators: the preprocessing
operators Gprep and the data mining proper
operators Gdm, and we put
64
We also provide detailed formal definitions,
their motivation, and a discussion of these
two classes.
Data Mining and preprocessing operators define
different kinds of generalizations.
The model presented in our examples didn’t
include the preprocessing stage; it used the
data mining proper operators only.
65
The main idea behind the concept of the
operator is to capture not only the fact
that data mining techniques generalize the
data but also to categorize existing meth-
ods.
We define within our model three classes of
data mining operators: classification Gclass,
clustering Gclust, and association Gassoc.
We don’t include in our analysis purely sta-
tistical methods like regression, etc...
66
We prove the following theorem.
Theorem Let Gclass,Gclust and Gassoc be the
sets of all classification, clustering, and as-
sociation operators, respectively.
The following conditions hold.
(1) Gclass ≠ Gclust ≠ Gassoc,
(2) Gassoc ∩ Gclass = ∅,
(3) Gassoc ∩ Gclust = ∅.
67
Data Mining Process
Definition Any sequence
K1, K2, ..., Kn (n ≥ 1)
of data mining states is called a data preprocessing
process if there is a preprocessing
operator G ∈ Gprep such that
G(Ki) = Ki+1, i = 1, 2, ..., n − 1.
Definition Any sequence
K1, K2, ..., Kn (n ≥ 1)
of data mining states is called a data mining
proper process if there is a data
mining proper operator G ∈ Gdm such
that
G(Ki) = Ki+1, i = 1, 2, ..., n − 1.
68
The data mining process consists of the pre-
processing process (that might be empty)
and the data mining proper process.
We know that the sets Gprep and Gdm are disjoint.
This justifies the following definition.
Definition A data mining process is any
sequence
K1, K2, ..., Kn (n ≥ 1)
of data mining states such that
K1, ..., Ki (0 ≤ i ≤ n)
is a preprocessing process and
Ki+1, ..., Kn
is a data mining proper process.
69
Granular Model
Syntax-Semantics Duality of Data Mining
Granular Model is a system
GM = ( SM, DM, |= ) where:
• SM is a Semantic Model;
• DM is a Descriptive Model;
• |= ⊆ P(U) × E is called a satisfaction
relation, where U is the universe of SM
and E is the set of descriptions defined
by the DM.
Satisfaction |= establishes relationship between
the semantic model and the descriptive model.
70
Descriptive Model
With any Semantic Model SM = (P(U), K, G) we associate its descriptive counterpart, defined below.
A Descriptive Model is a system
DM = ( L, E,DK ),
where:
L = ( A, E ) is called a descriptive lan-
guage;
A is a countably infinite set called the alpha-
bet;
E ≠ ∅ and E ⊆ A∗ is the set of descriptive
expressions of L;
71
DK ≠ ∅ and DK ⊆ P(E) is a set of descriptions of knowledge states.
As in the case of the semantic model, we build the descriptive model for a given application.
We define here only a general form of the model.
We assume however that, whatever the application, the descriptions are always built in terms of attributes and values of the attributes, some logical connectives, some predicates, and some extra parameters if needed.
The commonly used descriptions have the form (a = v) to denote that the attribute a has a value v, but one might also use, as is often done, a predicate form a(v) or a(x, v) instead.
72
For example, a neural network with its nodes
and weights can be seen as a formal de-
scription (in an appropriate descriptive lan-
guage), and the knowledge states would
represent changes in parameters during the
neural network training process.
The model we build here is a model for what
we call descriptive data mining, i.e. data
mining for which the goal of the data
mining process is to produce a set of descriptions
in a language easily comprehensible
to the user.
For that purpose, in the model we identify the
decision tree constructed by the Decision
Tree classification algorithm with the
set of discriminant rules obtained from the
tree.
73
Granular Model is a system
GM = ( SM, DM, |= ) where:
• SM is a Semantic Model;
• DM is a Descriptive Model;
• |= ⊆ P(U) × E is called a satisfaction
relation, where U is the universe of SM
and E is the set of descriptions defined
by the DM.
Satisfaction |= establishes a relationship between
the semantic model and the descriptive model.
We define the Satisfaction |= component of
the Granular Model GM in the following
stages.
Stage 1 For each K ∈ K, we define its own
descriptive language LK = ( AK, EK ).
74
Stage 2 For each K ∈ K and descriptive expression
F ∈ EK, we define what it means
that F is satisfied in K; i.e. we define
a satisfaction relation |=K.
Stage 3 For each K ∈ K and descriptive expression
F ∈ EK, we define what it means
that F is true in K, i.e. |=K F.
Stage 4 We use the satisfaction relation |=K
to define, for each K ∈ K, the set DK ⊆
P(EK) of descriptions of its own knowledge.
Stage 5 We use the languages LK to define
the descriptive language L.
Stage 6 We use the descriptive expressions
EK of LK to define the set E of descriptive
expressions of L.
Stage 7 We use the satisfaction relations |=K
to define the satisfaction relation |= of
the Granular Model GM.
75
Part 3: TRACING THE
HISTORY
Mathematics Genealogy Project
genealogy.math.ndsu.nodak.edu
76
We all have a history
We are all mathematicians
Mission Statement of the Mathematics Ge-
nealogy Project defines a mathematician
as follows.
” ... Throughout this project when we use
the word ”mathematics” or ”mathemati-
cian” we mean that word in a very inclu-
sive sense. Thus, all relevant data from
statistics, computer science, or operations
research is welcome....”
The Computer Science classification within the
project is: Mathematics Subject Classification:
68 Computer Science.
77
The Genealogy Project solicits information from
all schools that participate in the development
of research-level mathematics and
from all individuals who may know the desired
information. This includes Computer Science
as well.
For them, and for history, we are all mathematicians.
78
Below are some links (sequences of connected
people) for a computer scientist.
Any two people in the sequence are listed in
order PhD student, Adviser.
If a person has more than one adviser, each adviser
is preceded by a number; i.e.
adviser 1 is listed as 1. adviser Name,
adviser 2 is listed as 2. adviser Name, etc.
79
A mathematician would say:
For any element A of the sequence, if A
has n > 1 advisers, then for any
1 ≤ k ≤ n, adviser k is listed as k. Name
of adviser k,
and the number in front of the name is
omitted otherwise.
Link to Nicolaus Copernicus
(Mikolaj Kopernik)
He has 1598 descendants
Anita Wasilewska, Ph.D. Warsaw University,
1975, Poland, Helena Rasiowa, Ph.D. War-
saw University,1950, Andrzej Mostowski,
Ph.D. Warsaw University, 1938, 2. Alfred
Tarski, Ph.D. Warsaw University, 1924,
Stanislaw Lesniewski, Ph.D. University of
Lvov, 1912, Kazimierz Twardowski, Ph.D.
Universitat Wien, 1891, Franz Clemens
Brentano, Ph.D. Eberhard Karls Universi-
tat, Tubingen 1862, 2. Friedrich Adolf
Trendelenburg, Dr. phil. Universitat Leipzig,
1826, 1. Georg Ludwig Konig, Artium
Liberalium Magister, Georg August Univer-
sitat, Gottingen, 1790, Christian Heyne,
Magister Juris, Universitat Leipzig, 1752,
1. Johann August Bach, Magister philosophiae,
Universitat Leipzig, 1744, 1.Christian Kust-
ner, Magister philosophiae, Universitat Leipzig,
1742, Johann Ernesti, Magister philosophiae,
Universitat Leipzig, 1730, Johann Gesner,
Magister artium, Friedrich Schiller Univer-
sitat Jena, 1715, Johann Buddeus, Magis-
ter artium, Martin Luther Universitat, Halle
Wittenberg, 1687, Michael Walther, Jr.,
Magister artium, Theol. Dr., Martin Luther
Universitat, Halle Wittenberg, 1661, 1687,
2.Johann Quenstedt, Magister artium, Theol.
Dr., Universitat Helmstedt, Martin Luther
Universitat, Halle Wittenberg, 1643, 1644,
Christoph Notnagel, Magister artium, Mar-
tin Luther Universitat, Halle Wittenberg,
1630, Ambrosius Rhodius, Magister artium,
Medicinae Dr., Martin Luther Universitat,
Halle Wittenberg, 1600, 1610,
1.Melchior Jostel, Magister artium, Medici-
nae Dr., Martin Luther Universitat, Halle
Wittenberg, 1583, 1600, 1.Valentin Otto,
Magister artium, Martin Luther Universi-
tat, Halle Wittenberg, 1570, Georg Joachim
Rheticus, Magister artium, Martin Luther
Universitat, Halle Wittenberg 1535,
2. Nicolaus Copernicus, Juris utriusque,
Doctor, Uniwersytet Jagiellonski (Cra-
cow Jagellonian University), Universita
di Bologna, Universita degli Studi di
Ferrara, Universita di Padova, 1499,
Poland-Italy,
2.Domenico Novara da Ferrara, Universita di
Firenze, 1483, 1. Johannes Regiomon-
tanus, Magister artium, Universitat Leipzig,
Universitat Wien, 1457,
Georg von Peuerbach, Magister artium, Uni-
versitat Wien, 1440, Johannes von Gmunden,
Magister artium, Universitat Wien, 1406,
Heinrich von Langenstein, Magister artium,
Theol. Dr., Universite de Paris, 1363,
1375, unknown.
Heinrich von Langenstein, 1375, is my
"oldest" ancestor.
THERE ARE 3 more lines of ancestry; also
interesting, if not so illustrious. Here they
are.
Link to Gottfried Leibniz
(54209 descendants),
Immanuel Kant
( 2176 descendants), and
Desiderius Erasmus of Rotterdam
(57416 descendants)
Anita Wasilewska, Ph.D. Warsaw University,
1975, Poland, Helena Rasiowa, Ph.D. War-
saw University, 1950, Andrzej Mostowski,
Ph.D. Warsaw University, 1938, 2. Alfred
Tarski, Ph.D. Warsaw University, 1924,
Stanislaw Lesniewski, Ph.D. University of
Lvov, 1912, Kazimierz Twardowski, Ph.D.
Universitat Wien, 1891, Franz Clemens
Brentano, Ph.D. Eberhard Karls Univer-
sitat, Tubingen 1862, 2. Friedrich Adolf
Trendelenburg, Dr. Phil. Universitat Leipzig,
1826, 2. Karl Reinhold, PhD.,
Immanuel Kant, Ph.D. Universitat Konigs-
berg 1770,
Martin Knutzen, Dr. Phil. Universitat Konigs-
berg, 1732, Christian von Wolff, Dr. phil.,
Universitat Leipzig, 1700,
2. Gottfried Leibniz, Dr. jur. Universitat
Altdorf, 1666,
2. Christiaan Huygens, Artium Liberalium
Magister, Jurisutriusque Doctor, Universiteit
Leiden, Universite d’Angers, 1647, 1655,
Frans van Schooten, Jr., Artium Liberal-
ium Magister, Universiteit Leiden, 1635,
Jacobus Golius, Artium Liberalium Magis-
ter, Philosophiae Doctor Universiteit Lei-
den, 1612, 1621, 1. Willebrord (Snel van
Royen) Snellius, Artium Liberalium Magis-
ter, Universiteit Leiden, 1607, 2. Rudolph
86
(Snel van Royen) Snellius, Artium Liberalium
Magister, Universitat zu Koln, Ruprecht
Karls Universitat Heidelberg, 1572,
1. Valentine Naibod, Magister Artium,
Martin Luther Universitat, Halle Wittenberg,
Universitat Erfurt, Erasmus Reinhold,
Magister Artium, Martin Luther Universitat,
Halle Wittenberg, 1535, Jakob Milich,
Liberalium Artium Magister, Med. Dr.,
Albert Ludwigs Universitat Freiburg,
Breisgau, Universitat Wien, 1520, 1524,
Desiderius Erasmus Roterodamus (sometimes
known as Desiderius Erasmus of Rotterdam),
University of Paris, Theologiae
Baccalaureus, College de Montaigu, 1497,
Jan Standonck, Magister Artium, Theol. Dr.,
College Sainte-Barbe, College de Montaigu,
1474, 1490, unknown
Link to Pierre-Simon Laplace
(50295 descendants) and
Jean Le Rond d'Alembert
Anita Wasilewska, Ph.D. Warsaw University,
1975, Poland, Helena Rasiowa, Ph.D. Warsaw
University, 1950, Andrzej Mostowski, Ph.D.
Warsaw University, 1938, 1. Kazimierz
Kuratowski, Ph.D. Warsaw University, 1921,
1. Stefan Mazurkiewicz, Ph.D. University of
Lvov, 1913, Waclaw Sierpinski, Ph.D.
Uniwersytet Jagiellonski, 1906,
1. Stanislaw Zaremba, Ph.D. Universite
Paris IV-Sorbonne, 1889, Gaston Darboux,
Ph.D. Ecole Normale Superieure, Paris,
1866, Michel Chasles, Ph.D. Ecole
Polytechnique, 1814, Simeon Poisson, Ph.D.
Ecole Polytechnique, 1800, 2. Pierre-Simon
Laplace, Ph.D., Jean Le Rond d'Alembert,
unknown
Link to Emile Borel
(2506 descendants),
Leonhard Euler
(52555 descendants)
Anita Wasilewska, Ph.D. Warsaw University,
1975, Poland, Helena Rasiowa, Ph.D. War-
saw University, 1950, Andrzej Mostowski,
Ph.D. Warsaw University, 1938, 2. Zyg-
munt Janiszewski, Ph.D. Ecole Normale
Superieure Paris, 1911, Henri Lebesgue,
Ph.D. Universite Henri Poincare Nancy 1,
1902, Emile Borel, Ph.D. Ecole Normale
Superieure, Paris, 1893, Gaston Darboux,
Ph.D. Ecole Normale Superieure, Paris, 1866,
Michel Chasles, Ph.D., Ecole Polytechnique,
1814, Simeon Poisson, Ph.D. Ecole Poly-
technique, 1800,
1. Joseph Lagrange, no degree, student of
Leonhard Euler, Ph.D. Universitat Basel,
1726, 1. Johann Bernoulli, Dr. med.
Universitat Basel, 1694, Jacob Bernoulli,
Dr. hab. Sci. Universitat Basel, 1684,
Gottfried Wilhelm Leibniz, Dr. jur.
Universitat Altdorf, 1666, 1. Erhard
Weigel, Ph.D. Universitat Leipzig, 1650,
unknown.
Link to Andrei Markov
(4824 descendants), and
Pafnuty Chebyshev (5964 descendants)
Anita Wasilewska, Ph.D. Warsaw University,
1975, Poland, Helena Rasiowa, Ph.D. War-
saw University, 1950, Andrzej Mostowski,
Ph.D. Warsaw University, 1938, 1. Kaz-
imierz Kuratowski, Ph.D. Warsaw Uni-
versity,1921, 1. Stefan Mazurkiewicz,
Ph.D. University of Lvov, 1913, Waclaw
Sierpinski, Ph.D. Uniwersytet Jagiellonski,
1906, 2. Georgy Fedoseevich Voronoy,
Ph.D. University of St. Petersburg, 1896,
Andrei Markov, Ph.D. University of St.
Petersburg, 1884, Pafnuty Chebyshev,
Ph.D. University of St. Petersburg, 1849,
Nikolai Dmitrievich Brashman, Ph.D. Moscow
State University, 1834, Joseph Johann von
Littrow, Ph.D., unknown
MY PhD COUSINS include
Kurt Goedel
Alan Turing
Alonzo Church
Roman Sikorski
Zdzislaw Pawlak
and many others... I am sure some of them
are in this room!
In the Stony Brook CS Department I have
traced 10 of them.
WE ALL ARE A BIG SCIENTIFIC
FAMILY!