Linguistic summaries on relational databases
Miroslav Hudec
University of Economics in Bratislava,
Department of Applied Informatics
FSTA, 2014
Relational knowledge from a data set
Most municipalities with high altitude have small pollution?
Validity of the rule: v ∈ [0, 1]
If-then rules: if population density is high, then waste production is high?
Linguistic summary - introduction
Q is a linguistic quantifier, X = {x} is a universe of discourse and P(x) is a predicate representing the summarizer S:
$$Qx\,P(x)$$
Q entities in database are (have) S
The truth value of a summary is called its validity and takes values from the [0, 1] interval.
Linguistic summary - elementary
Q entities in database are (have) S
$$T(Qx\,P(x)) = \mu_Q\left(\frac{1}{n}\sum_{i=1}^{n}\mu_P(x_i)\right)$$
where n is the cardinality of the database (number of entities), $\frac{1}{n}\sum_{i=1}^{n}\mu_P(x_i)$ is the proportion of objects in the database that satisfy P(x), and μ_Q is the quantifier.
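The elementary formula can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the quantifier `most` follows the piecewise definition on the Quantifier slide (Kacprzyk and Zadrożny, 2009), while the function name `elementary_validity` and the five matching degrees are hypothetical.

```python
from typing import Callable, Sequence


def most(y: float) -> float:
    """Quantifier 'most' (Kacprzyk and Zadrożny, 2009): 0 up to 0.3, linear to 1 at 0.8."""
    if y <= 0.3:
        return 0.0
    if y >= 0.8:
        return 1.0
    return 2.0 * y - 0.6


def elementary_validity(mu_p: Sequence[float], quantifier: Callable[[float], float]) -> float:
    """Validity of 'Q entities in database are (have) S'.

    mu_p holds the matching degrees mu_P(x_i) of all n entities.
    """
    n = len(mu_p)
    if n == 0:
        raise ValueError("the database is empty")
    proportion = sum(mu_p) / n  # (1/n) * sum_i mu_P(x_i)
    return quantifier(proportion)


# Hypothetical matching degrees of five municipalities to the summarizer S
degrees = [1.0, 0.8, 0.6, 0.0, 1.0]
print(elementary_validity(degrees, most))  # proportion 0.68, validity about 0.76
```

The quantifier is passed in as a function, so other quantifiers can be swapped in without touching the aggregation.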
Linguistic summary - extended
Q R objects in database are (have) S
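$$T(Qx\,P(x)) = \mu_Q\left(\frac{\sum_{i=1}^{n} t\big(\mu_S(x_i), \mu_R(x_i)\big)}{\sum_{i=1}^{n} \mu_R(x_i)}\right)$$
where the ratio inside μ_Q is the proportion of R objects in the database that satisfy S, t is a t-norm and μ_Q is the quantifier.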
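A minimal Python sketch of the extended summary, assuming the minimum t-norm by default (the product t-norm is shown as an alternative). The names `extended_validity`, `minimum` and `product_tnorm` and the sample degrees are hypothetical; raising an error when no entity belongs to R is an assumption, since the ratio is then undefined.

```python
from typing import Callable, Sequence

Quantifier = Callable[[float], float]
TNorm = Callable[[float, float], float]


def most(y: float) -> float:
    """Quantifier 'most' (Kacprzyk and Zadrożny, 2009)."""
    if y <= 0.3:
        return 0.0
    if y >= 0.8:
        return 1.0
    return 2.0 * y - 0.6


def minimum(a: float, b: float) -> float:
    """Minimum t-norm."""
    return min(a, b)


def product_tnorm(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def extended_validity(mu_s: Sequence[float], mu_r: Sequence[float],
                      quantifier: Quantifier, t_norm: TNorm = minimum) -> float:
    """Validity of 'Q R objects in database are (have) S'."""
    if len(mu_s) != len(mu_r):
        raise ValueError("mu_s and mu_r must cover the same entities")
    r_count = sum(mu_r)  # fuzzy cardinality of the R objects
    if r_count == 0:
        raise ValueError("no entity belongs to R; the proportion is undefined")
    both = sum(t_norm(s, r) for s, r in zip(mu_s, mu_r))  # R objects that are also S
    return quantifier(both / r_count)


# Hypothetical degrees for four entities
mu_r = [1.0, 1.0, 0.5, 0.0]
mu_s = [1.0, 0.0, 1.0, 1.0]
print(extended_validity(mu_s, mu_r, most))                  # proportion 0.6, validity about 0.6
print(extended_validity(mu_s, mu_r, most, product_tnorm))   # same result for these crisp-ish degrees
```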
Linguistic summary - graph
Q R objects in database are (have) S
Issues
Summarizer
Let Dmin and Dmax be the lowest and the highest domain values of attribute A, i.e. Dom(A) = [Dmin, Dmax], and let L and H be the lowest and the highest values in the current content of the database, respectively. In practice, [L, H] ⊆ [Dmin, Dmax], and linguistic summaries should take this into account.
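Family of summarizers
[Figure: fuzzy sets small, medium and high for the variable F(X) over [L, H], with membership μP(xi) from 0 to 1 and breakpoints A, B, C, D; segment lengths (H − L)/4 and (H − L)/8 are marked on the axis.]
The uniform domain covering method (Tudorie, 2008)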
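A minimal Python sketch of building the summarizer family from the current content of the database. The exact breakpoints are an assumption: they rescale the values the deck gives for the quantifier family on [0, 1] (A = 0.25, B = 0.375, C = 0.625, D = 0.75, i.e. steps of 1/4 and 1/8) to [L, H], and the trapezoidal shapes are read from the figure. The function name and the sample values are hypothetical.

```python
from typing import Callable, Dict

Membership = Callable[[float], float]


def uniform_domain_covering(lo: float, hi: float) -> Dict[str, Membership]:
    """Fuzzy sets small, medium, high over [lo, hi] = [L, H].

    Assumed breakpoints: A = L + (H-L)/4, B = A + (H-L)/8,
    C = B + (H-L)/4, D = C + (H-L)/8.
    """
    if hi <= lo:
        raise ValueError("H must be greater than L")
    w = hi - lo
    a = lo + w / 4
    b = a + w / 8
    c = b + w / 4
    d = c + w / 8

    def small(x: float) -> float:
        if x <= a:
            return 1.0
        if x >= b:
            return 0.0
        return (b - x) / (b - a)

    def medium(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)

    def high(x: float) -> float:
        if x <= c:
            return 0.0
        if x >= d:
            return 1.0
        return (x - c) / (d - c)

    return {"small": small, "medium": medium, "high": high}


# L and H come from the current database content, not from Dom(A).
values = [0.0, 30.0, 50.0, 70.0, 100.0]  # hypothetical attribute values
sets = uniform_domain_covering(min(values), max(values))
print([round(sets["small"](v), 3) for v in values])  # [1.0, 0.6, 0.0, 0.0, 0.0]
```

Taking L and H from `min` and `max` of the stored values reflects the point that [L, H] is usually narrower than [Dmin, Dmax].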
Quantifier
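Quantifier most might be given as (Kacprzyk and Zadrożny, 2009):
$$\mu_Q(y) = \begin{cases} 0, & y \le 0.3 \\ 2y - 0.6, & 0.3 < y < 0.8 \\ 1, & y \ge 0.8 \end{cases}$$
For a regular non-decreasing quantifier (e.g. most), the membership function should meet the following properties:
$$x < y \Rightarrow \mu_Q(x) \le \mu_Q(y), \qquad \mu_Q(0) = 0, \quad \mu_Q(1) = 1$$
[Figure: membership function µ(Q) of the quantifier most on the [0, 1] interval, with axis marks m and n.]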
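The quantifier most and the regularity conditions can be checked numerically with a short Python sketch. The helper `looks_regular_nondecreasing` is hypothetical, and it only tests the conditions on a finite grid of points, so it is a sanity check rather than a proof.

```python
from typing import Callable


def most(y: float) -> float:
    """Quantifier 'most' (Kacprzyk and Zadrożny, 2009)."""
    if y <= 0.3:
        return 0.0
    if y >= 0.8:
        return 1.0
    return 2.0 * y - 0.6


def looks_regular_nondecreasing(mu_q: Callable[[float], float], steps: int = 1000) -> bool:
    """Grid check of mu_Q(0) = 0, mu_Q(1) = 1 and x < y => mu_Q(x) <= mu_Q(y)."""
    ys = [i / steps for i in range(steps + 1)]
    values = [mu_q(y) for y in ys]
    non_decreasing = all(v1 <= v2 for v1, v2 in zip(values, values[1:]))
    return non_decreasing and values[0] == 0.0 and values[-1] == 1.0


print(looks_regular_nondecreasing(most))             # True
print(looks_regular_nondecreasing(lambda y: 1 - y))  # False: decreasing, mu(0) = 1
```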
Example
| Linguistic summary (rule) | Validity |
|---|---|
| Most municipalities having high population density have high production of waste | 0.662 |
| Most municipalities having medium population density have medium production of waste | 0 |
| Most municipalities having small population density have small production of waste | 1 |

Rules:
- if population density is small then production of waste is small with cf = 1;
- if population density is high then production of waste is high with cf = 0.662.

The summary with validity 0 yields no rule; the validities of the other two summaries are carried over as certainty factors (cf).
Family of quantifiers
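[Figure: family of quantifiers few, about half and most, with membership μQ(y) over y ∈ [0, 1] and breakpoints A_Q, B_Q, C_Q, D_Q; segment lengths 1/4 and 1/8 are marked on the axis.]
Uniform domain covering method on the [0, 1] interval: A_Q = 0.25, B_Q = 0.375, C_Q = 0.625, D_Q = 0.75.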
Comparison of quantifiers
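[Figure: membership functions μQ(y) over y ∈ [0, 1] in two panels. Panel "Quantifiers most (Kacprzyk and Zadrożny, 2009) and few", with axis marks 0.2, 0.3, 0.7, 0.8. Panel "Quantifiers most, about half and few (our approach)", with axis marks 0.25, 0.375, 0.625, 0.75, 1.]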
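The two families can be compared directly with a Python sketch. It assumes the quantifiers obtained by uniform domain covering are trapezoids with the breakpoints A_Q = 0.25, B_Q = 0.375, C_Q = 0.625, D_Q = 0.75 stated above (few decreasing, about half in the middle, most increasing); the function names are hypothetical.

```python
A_Q, B_Q, C_Q, D_Q = 0.25, 0.375, 0.625, 0.75  # uniform domain covering on [0, 1]


def few(y: float) -> float:
    if y <= A_Q:
        return 1.0
    if y >= B_Q:
        return 0.0
    return (B_Q - y) / (B_Q - A_Q)


def about_half(y: float) -> float:
    if y <= A_Q or y >= D_Q:
        return 0.0
    if y < B_Q:
        return (y - A_Q) / (B_Q - A_Q)
    if y <= C_Q:
        return 1.0
    return (D_Q - y) / (D_Q - C_Q)


def most_udc(y: float) -> float:
    """'most' from the uniform domain covering family (assumed trapezoid)."""
    if y <= C_Q:
        return 0.0
    if y >= D_Q:
        return 1.0
    return (y - C_Q) / (D_Q - C_Q)


def most_kz(y: float) -> float:
    """'most' after Kacprzyk and Zadrożny (2009)."""
    if y <= 0.3:
        return 0.0
    if y >= 0.8:
        return 1.0
    return 2.0 * y - 0.6


# The covering-based 'most' is stricter: it leaves room for 'about half'.
for y in (0.5, 0.7, 0.8):
    print(y, round(most_kz(y), 3), round(most_udc(y), 3))
```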
Optimization of summaries
1. The decision maker creates a particular linguistic summary or sentence of interest and evaluates its validity.
2. Relevant linguistic summaries are generated automatically (Liu, 2011):

$$\text{Find } Q, S \text{ and } R \text{ subject to } Q \in \bar{Q},\ S \in \bar{S},\ R \in \bar{R},\ v(Q, S, R) \ge \beta$$

where $\bar{Q}$ is a set of relevant quantifiers, $\bar{S}$ is a set of relevant linguistic expressions, $\bar{R}$ is a set defining the subpopulation of interest and β is the threshold value from the [0, 1] interval. Each solution produces a linguistic summary "Q* R* are S*".
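For small candidate sets, the search can be sketched as an exhaustive loop over $\bar{Q} \times \bar{S} \times \bar{R}$ that keeps every combination with v(Q, S, R) ≥ β. This brute-force sketch is an illustration, not Liu's (2011) method; it assumes the minimum t-norm, skips any R to which no entity belongs, and uses hypothetical names and degrees.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Quantifier = Callable[[float], float]


def most(y: float) -> float:
    """Quantifier 'most' (Kacprzyk and Zadrożny, 2009)."""
    if y <= 0.3:
        return 0.0
    if y >= 0.8:
        return 1.0
    return 2.0 * y - 0.6


def validity(mu_s: Sequence[float], mu_r: Sequence[float], q: Quantifier) -> float:
    """v(Q, S, R) with the minimum t-norm; mu_r must not be all zero."""
    both = sum(min(s, r) for s, r in zip(mu_s, mu_r))
    return q(both / sum(mu_r))


def generate_summaries(
    quantifiers: Dict[str, Quantifier],
    summarizers: Dict[str, Sequence[float]],
    restrictions: Dict[str, Sequence[float]],
    beta: float,
) -> List[Tuple[str, str, str, float]]:
    """Keep every (Q, R, S) with v(Q, S, R) >= beta."""
    solutions = []
    for (q_name, q), (s_name, mu_s), (r_name, mu_r) in product(
        quantifiers.items(), summarizers.items(), restrictions.items()
    ):
        if sum(mu_r) == 0:
            continue  # assumption: an empty subpopulation R is skipped
        v = validity(mu_s, mu_r, q)
        if v >= beta:
            solutions.append((q_name, r_name, s_name, v))  # "Q* R* are S*"
    return solutions


# Hypothetical degrees for four municipalities
waste = {"small waste": [1, 1, 0, 0], "high waste": [0, 0, 1, 1]}
density = {"small density": [1, 1, 0, 0], "high density": [0, 0.5, 1, 1]}
for q, r, s, v in generate_summaries({"most": most}, waste, density, beta=0.5):
    print(f"{q} municipalities with {r} have {s}: v = {v:.2f}")
```

The number of evaluations grows with |Q̄|·|S̄|·|R̄|, which is what motivates restricting the admissible pairs.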
Optimization of summaries
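$$P_c = \{(S, R) \mid (S, R) \in \bar{S} \times \bar{R}\}, \qquad P_r \subseteq P_c$$

$$\text{Find } Q, S \text{ and } R \text{ subject to } Q \in \bar{Q},\ S \in \bar{S},\ R \in \bar{R},\ (S, R) \in P_r,\ v(Q, S, R) \ge \beta$$

Example: $P_r$ = {(small, small), (small, medium), (medium, medium), (high, high)}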
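A Python sketch of how restricting the search to P_r reduces the work. P_c and the example P_r follow the slide; `restricted_search` and `fake_validity` are hypothetical, and the stub validity only stands in for v(Q, S, R) computed on real data.

```python
from itertools import product
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (S, R)

S_BAR = ["small", "medium", "high"]
R_BAR = ["small", "medium", "high"]
P_C: Set[Pair] = set(product(S_BAR, R_BAR))  # all 9 combinations
P_R: Set[Pair] = {("small", "small"), ("small", "medium"), ("medium", "medium"), ("high", "high")}
assert P_R <= P_C  # P_r is a subset of P_c


def restricted_search(quantifiers: Iterable[str], pairs: Set[Pair],
                      v: Callable[[str, str, str], float],
                      beta: float) -> List[Tuple[str, str, str, float]]:
    """Evaluate v(Q, S, R) only for (S, R) in the restricted set of pairs."""
    solutions = []
    for q in quantifiers:
        for s, r in sorted(pairs):
            value = v(q, s, r)
            if value >= beta:
                solutions.append((q, s, r, value))
    return solutions


calls = []


def fake_validity(q: str, s: str, r: str) -> float:
    """Hypothetical stand-in for v(Q, S, R); records each evaluation."""
    calls.append((q, s, r))
    return 1.0 if s == r else 0.0


found = restricted_search(["few", "about half", "most"], P_R, fake_validity, beta=0.5)
print(len(calls), "evaluations instead of", 3 * len(P_C))  # 12 evaluations instead of 27
```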
Fuzzy functional dependencies and linguistic summaries
[Figure: two panels, "Fuzzy functional dependencies" and "Linguistic summaries", each relating tuples t1, …, tn of Attribute A and Attribute B through the linguistic terms small, medium and high.]
Queries by summaries

Data on the lower hierarchical level are the basis for summaries, but only data on the higher level are revealed, ranked downward from the best to the worst. Example: Select regions where most municipalities have small altitude above sea level.

$$T(Qx_i\,P(x_i)) = \mu_Q\left(\frac{1}{N_i}\sum_{j=1}^{N_i}\mu_P(x_{ji})\right), \quad i = 1, \dots, R, \qquad \sum_{i=1}^{R} N_i = n$$

where n is the number of entities in the whole database, N_i is the number of entities in cluster i (municipalities in region i), R is the number of clusters in the database (regions), and μ_P(x_ji) is the matching degree of the j-th entity in the i-th cluster.

Advantages:
1. Sensitive data, or data that are not free of charge, remain hidden.
2. The policy maker… is interested in a general overview, not in the data.
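A Python sketch of a query by summaries: municipality-level values are aggregated per region and only the region-level validities are returned, ranked from best to worst. The summarizer `small_altitude` (1 up to 200 m, 0 from 400 m), the region labels and the altitudes are all hypothetical; the quantifier is most from the Quantifier slide.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def most(y: float) -> float:
    """Quantifier 'most' (Kacprzyk and Zadrożny, 2009)."""
    if y <= 0.3:
        return 0.0
    if y >= 0.8:
        return 1.0
    return 2.0 * y - 0.6


def small_altitude(metres: float) -> float:
    """Hypothetical summarizer: 1 up to 200 m, 0 from 400 m, linear in between."""
    if metres <= 200:
        return 1.0
    if metres >= 400:
        return 0.0
    return (400 - metres) / 200


def query_by_summaries(records: Iterable[Tuple[str, float]],
                       mu_p: Callable[[float], float],
                       quantifier: Callable[[float], float]) -> List[Tuple[str, float]]:
    """Rank clusters by the validity of 'Q entities of cluster i are (have) P'."""
    degrees: Dict[str, List[float]] = defaultdict(list)
    for cluster, value in records:
        degrees[cluster].append(mu_p(value))  # mu_P(x_ji); raw values stay hidden
    scores = {c: quantifier(sum(d) / len(d)) for c, d in degrees.items()}  # (1/N_i) * sum_j
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


municipalities = [("R1", 100), ("R1", 150), ("R1", 300),
                  ("R2", 500), ("R2", 600), ("R2", 150)]
for region, v in query_by_summaries(municipalities, small_altitude, most):
    print(region, round(v, 4))  # R1 1.0, then R2 0.0667
```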
Example
Select regions where most municipalities have small altitude above sea level

| Region | Validity of the summary |
|---|---|
| Bratislava | 1 |
| Trnava | 1 |
| Nitra | 1 |
| Trenčín | 0.7719 |
| Košice | 0.6314 |
| Banská Bystrica | 0.2116 |
| Žilina | 0 |
| Prešov | 0 |
Conclusion
The work demonstrates how we can start with a simple linguistic summary and build more complex summaries by merging knowledge from several fields: mining the parameters of summarizer membership functions from data and extending this to defining the parameters of quantifiers, optimization of summaries, and fuzzy queries. Although fuzzy set theory is already established as an adequate framework for linguistic summaries, there is still room for improvement.
Some topics for further research
• Linguistic summaries on fuzzy databases,
• Operations research task for optimising the process of rule generation,
• Full applications for practitioners,
• Fuzzy functional dependencies and linguistic summaries in data mining
Thank you for your attention