1
Combining fuzzy and statistical uncertainty: probabilistic fuzzy
systems and their applications
prof.dr.ir. Jan van den Berg - TU Delft: Faculties of TPM and EWI
- Cyber Security Academy The Hague
http://tbm.tudelft.nl/index.php?id=30084&L=1
2
Summary of the talk
•Two complementary conceptualizations of uncertainty will be discussed: statistical and fuzzy uncertainty.
•These uncertainties can be combined into one theory of probabilistic fuzzy events. Using this theory, classical fuzzy systems can be generalized to probabilistic fuzzy systems (PFS).
•PFS can be induced from both expert knowledge and data, enabling both interpretability and accuracy (although, in practice, we still have to deal with an accuracy-interpretability dilemma).
•Finally, one or two examples of PFSs we developed will be shown (next time!).
3
Agenda
• Probabilistic/statistical uncertainty
• Fuzzy uncertainty
• Fuzzy systems
• Probabilistic fuzzy theory
• Probabilistic fuzzy systems
• Applications (next time)
• Conclusions
4
Probabilistic/statistical uncertainty
•Probabilistic/statistical uncertainty: a well-known notion
•Crisp events occur with a certain probability
•These probabilities can be assessed statistically
  - e.g., frequentist approach: by repeating experiments
  - a lot of theory on unbiased estimation, ML (maximum likelihood) estimation, etc.
•In continuous outcome spaces, probability distributions are used
•Mathematical statistics: descriptive and inferential statistics (the latter draws conclusions from data using some model for the data)
5
Agenda
• Probabilistic/statistical uncertainty
• Fuzzy uncertainty
• Fuzzy systems
• Probabilistic fuzzy theory
• Probabilistic fuzzy systems
• Applications
• Conclusions
6
Crisp sets
A collection of definite, well-defined objects (elements) forming a whole, with crisp boundaries
Representations of sets:
• list of all elements
$A = \{x_1, \dots, x_n\},\ x_j \in X$
• elements with property P
$A = \{x \mid x \text{ satisfies } P\},\ x \in X$
• Venn diagram
• characteristic function $f_A: X \to \{0,1\}$, with $f_A(x) = 1$ if $x \in A$ and $f_A(x) = 0$ if $x \notin A$
[Figure: characteristic function of A = "real numbers larger than 3", taking the values 0 and 1 on X]
7
Fuzzy sets
• Sets with fuzzy, gradual boundaries (Zadeh 1965)
• A fuzzy set A in X is characterized by its membership function $\mu_A: X \to [0,1]$
A fuzzy set A is completely determined by the set of ordered pairs
$A = \{(x, \mu_A(x)) \mid x \in X\}$
X is called the domain or universe of discourse
[Figure: membership function $\mu_A(x)$ of A = "real numbers about 3", over X]
8
Fuzzy sets on discrete universes
•Fuzzy set C = “desirable city to live in”
X = {SF, Boston, LA} (discrete and non-ordered)
C = {(SF, 0.9), (Boston, 0.8), (LA, 0.6)}
•Fuzzy set A = “sensible number of children”
X = {0, 1, 2, 3, 4, 5, 6} (discrete universe)
A = {(0, .1), (1, .3), (2, .7), (3, 1), (4, .6), (5, .2), (6, .1)}
[Figure: membership grades versus number of children]
9
Fuzzy sets on continuous universes
• Fuzzy set B = “about 50 years old”
X = Set of positive real numbers (continuous)
$B = \{(x, \mu_B(x)) \mid x \in X\}$
$$\mu_B(x) = \frac{1}{1 + \left(\frac{x - 50}{10}\right)^2}$$
[Figure: membership grades of B versus age]
10
Fuzzy partition
•Fuzzy partition formed by the linguistic values “young”, “middle aged”, and “old”
•For any age: sum of membership values = 1
[Figure: membership grades of “young”, “middle aged”, and “old” versus age, from 0 to 90]
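Here is a minimal sketch of such a fuzzy partition of ‘Age’. It assumes trapezoidal membership functions with hypothetical breakpoints (25, 45, 55 and 75 years), which the slide does not specify. Adjacent sets overlap linearly, so the membership values sum to 1 at every age.

```python
import math

INF = math.inf

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 up to a, rising to 1 on [b, c], falling to 0 at d.
    Use a = b = -inf or c = d = inf for open (shoulder) sets."""
    if x < b:
        return 0.0 if x <= a else (x - a) / (b - a)
    if x <= c:
        return 1.0
    return 0.0 if x >= d else (d - x) / (d - c)

# Hypothetical breakpoints in years (not taken from the slide)
AGE_TERMS = {
    "young": (-INF, -INF, 25, 45),
    "middle aged": (25, 45, 55, 75),
    "old": (55, 75, INF, INF),
}

def memberships(age):
    """Membership degree of a crisp age in each linguistic value of 'Age'."""
    return {term: trapezoid(age, *abcd) for term, abcd in AGE_TERMS.items()}

print(memberships(35))  # {'young': 0.5, 'middle aged': 0.5, 'old': 0.0}
```

Because each falling edge is mirrored by the next set's rising edge over the same interval, the partition condition holds by construction.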
11
Operations with fuzzy sets
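Union: $\mu_C(x) = \mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x)) = \mu_A(x) \vee \mu_B(x)$
Intersection: $\mu_C(x) = \mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x)) = \mu_A(x) \wedge \mu_B(x)$
Complement: $\mu_{\bar A}(x) = 1 - \mu_A(x),\ x \in X$
Alternative union (probabilistic sum): $\mu_C(x) = \mu_{A \cup B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x)$
Alternative intersection (product): $\mu_C(x) = \mu_{A \cap B}(x) = \mu_A(x)\,\mu_B(x)$
Note the multiple definitions!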
12
Set theoretic operations, examples
[Figure: (a) fuzzy sets A and B; (b) fuzzy set "not A"; (c) fuzzy set "A OR B" (maximum); (d) fuzzy set "A AND B" (minimum)]
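The operations above can be tried on a discrete universe. This sketch reuses the fuzzy set A = “sensible number of children” from slide 8. The second set B = “large number of children” is hypothetical and is included only so that there is something to combine with A.

```python
# Fuzzy set A from slide 8: "sensible number of children" on X = {0, ..., 6}
A = {0: 0.1, 1: 0.3, 2: 0.7, 3: 1.0, 4: 0.6, 5: 0.2, 6: 0.1}
# Hypothetical fuzzy set B = "large number of children" (illustration only)
B = {0: 0.0, 1: 0.0, 2: 0.1, 3: 0.3, 4: 0.6, 5: 0.9, 6: 1.0}

def union_max(a, b):
    return {x: max(a[x], b[x]) for x in a}

def intersection_min(a, b):
    return {x: min(a[x], b[x]) for x in a}

def complement(a):
    return {x: 1.0 - a[x] for x in a}

def union_probabilistic_sum(a, b):
    return {x: a[x] + b[x] - a[x] * b[x] for x in a}

def intersection_product(a, b):
    return {x: a[x] * b[x] for x in a}

print(union_max(A, B)[4], intersection_min(A, B)[5])  # 0.6 0.2
```

The different definitions give different results: for x = 4 the max-union yields 0.6, while the probabilistic sum yields 0.84. Also, unlike crisp sets, "A AND not A" is not empty here. At x = 2, its min-based membership is about 0.3.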
13
Agenda
• Probabilistic/statistical uncertainty
• Fuzzy uncertainty
• Fuzzy systems
• Probabilistic fuzzy theory
• Probabilistic fuzzy systems
• Applications
• Conclusions
14
Fuzzy Modeling (expert-driven)
•FM can be defined based on expert knowledge:
•Many human concepts (big, long, high, very much, adequate, satisfactory, …) are defined in a (context-dependent) quantitative way, describing the state of nature
•Examples of facts about the world:
  - The president is middle-aged
  - The water supply is insufficient
  - Birth weight of Romanian children is quite low
Need for formalization: the linguistic variable
15
Linguistic variable
• A numerical variable takes numerical values:
Age = 65 (defines a crisp event)
• A linguistic variable takes linguistic values:
Age is old (defines a fuzzy event)
• A linguistic value is defined by a fuzzy set (enabling a characterization with gradual transitions):
Ex: a fuzzy partition of the linguistic variable ‘Age’ formed with the linguistic values “young”, “middle aged”, and “old”:
[Figure: membership grades of “young”, “middle aged”, and “old” versus age, from 0 to 90]
16
Expert-driven fuzzy modeling, cont.
•Experts can express their knowledge in
  - facts about the world (see above), and
  - fuzzy IF-THEN rules
•Example fuzzy rules:
  - IF Nutrition state is poor AND Birth weight is medium AND Respiratory disease is absent THEN Child mortality rate is medium
  - IF Nutrition state is medium AND Birth weight is not too low AND Respiratory disease is absent THEN Child mortality rate is rather low
•Need for a reasoning/inference mechanism
17
Fuzzy (Inference) System (FS)
• Fuzzifier (interface: from crisp to fuzzy)
• Rule base (enabling interpretability/transparency)
• Inference engine (implements fuzzy reasoning)
• Defuzzifier (interface: from fuzzy to crisp)
• Note: fuzzifier or defuzzifier may be absent
[Figure: block diagram: input → fuzzifier → inference engine, using the rule base (knowledge base) → defuzzifier → output]
18
Example Fuzzy System: Mamdani model
•Five major steps:
  1. Fuzzification
  2. Degree of fulfillment
  3. Inference
  4. Aggregation
  5. Defuzzification
•Computations according to Mamdani reasoning apply a generalized form of classical modus ponens:
Given: x is A' and if x is A, then y is B; conclude: y is B'
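The five Mamdani steps can be sketched for a hypothetical single-input model with two rules: "If income is Low then tax is Low" and "If income is High then tax is High". The linear membership functions are made up for illustration: income runs from 0 to 100 (thousands) and tax from 0 to 50. The sketch uses a singleton fuzzifier, min implication, max aggregation and centroid defuzzification on a grid. These are the classic Mamdani choices, but other operators are possible.

```python
# Hypothetical membership functions (illustration only)
def income_low(x):
    return max(0.0, min(1.0, 1.0 - x / 100.0))

def income_high(x):
    return max(0.0, min(1.0, x / 100.0))

def tax_low(y):
    return max(0.0, min(1.0, 1.0 - y / 50.0))

def tax_high(y):
    return max(0.0, min(1.0, y / 50.0))

RULES = [(income_low, tax_low), (income_high, tax_high)]  # (antecedent, consequent)
Y_GRID = [50.0 * i / 500 for i in range(501)]             # discretized output domain

def mamdani(income):
    # Steps 1-2: fuzzify the crisp input (singleton) and compute each rule's degree of
    # fulfillment; with several AND-ed antecedents one would take their min here
    dofs = [antecedent(income) for antecedent, _ in RULES]
    # Steps 3-4: clip each consequent at its degree of fulfillment (min), aggregate with max
    aggregated = [
        max(min(dof, consequent(y)) for dof, (_, consequent) in zip(dofs, RULES))
        for y in Y_GRID
    ]
    # Step 5: centroid defuzzification
    total = sum(aggregated)
    return sum(y * m for y, m in zip(Y_GRID, aggregated)) / total

print(round(mamdani(20.0), 2), round(mamdani(80.0), 2))
```

The crisp tax output grows with income, and an income of 50 gives the symmetric answer 25.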
21
Inducing a fuzzy model from data
If income is Low then tax is Low
If income is High then tax is High
22
Bias-variance dilemma (!)
Interpretability-accuracy dilemma (!)
Algorithms exist to gradually induce more and more rules from data
•Bias-variance dilemma: finding models of the right complexity
•Interpretability-accuracy dilemma: another key issue (in data mining)
23
Agenda
• Probabilistic/statistical uncertainty
• Fuzzy uncertainty
• Fuzzy systems
• Probabilistic fuzzy theory
• Probabilistic fuzzy systems
• Applications
• Conclusions
24
Probability of crisp and fuzzy events
•Crisp:
$$\Pr(A) = \int_{x \in A} p(x)\,dx = \int_X \mu_A(x)\,p(x)\,dx$$
•Fuzzy (Zadeh, 1968):
$$\Pr(A) = \int_X \mu_A(x)\,p(x)\,dx$$
Probability of a fuzzy event is the expectation of its membership function!
Conditional probability:
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{\int_X \big(\mu_A(x) \wedge \mu_B(x)\big)\,p(x)\,dx}{\int_X \mu_B(x)\,p(x)\,dx}$$
Satisfies Pr(A|A) = 1
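A small numerical check of the conditional probability of fuzzy events follows. It assumes a standard normal pdf p(x) and two hypothetical fuzzy events on the real line. It uses min as the intersection, which is what makes Pr(A|A) = 1 hold. With the product instead, Pr(A|A) = E(μ_A²)/E(μ_A), which is below 1 for any non-crisp A. The integrals use a plain midpoint rule.

```python
import math

def normal_pdf(x):
    """Standard normal density, used here as the (assumed) pdf p(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expect(f, lo=-10.0, hi=10.0, n=20_000):
    """Midpoint-rule approximation of the integral of f(x) p(x) over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) * normal_pdf(lo + (i + 0.5) * h) for i in range(n))

def prob(mu_a):
    """Pr(A) = E(mu_A(x))."""
    return expect(mu_a)

def cond_prob(mu_a, mu_b, t_norm=min):
    """Pr(A|B) = E(mu_A(x) t mu_B(x)) / E(mu_B(x)); t_norm=min is the intersection used above."""
    return expect(lambda x: t_norm(mu_a(x), mu_b(x))) / expect(mu_b)

# Hypothetical fuzzy events (illustration only)
def mu_positive(x):   # "x is positive"
    return 1.0 / (1.0 + math.exp(-x))

def mu_near_zero(x):  # "x is near zero"
    return math.exp(-x * x)

print(prob(mu_positive), cond_prob(mu_positive, mu_positive))
print(cond_prob(mu_positive, mu_positive, t_norm=lambda a, b: a * b))
```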
25
An example, continuous domain
Answering the question:
“What is the probability that a randomly selected Indian woman is tall?”
[Figure: calculating the probability that a randomly selected Indian woman is tall; curves: membership function of “tall”, pdf of length, and the resulting “fuzzy pdf”; length axis marked at 1.50, 1.68 and 2.0]
$$\Pr(A) = \int \mu_A(x)\,f(x)\,dx = E(\mu_A(x))$$
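The “tall” question can be answered numerically in two equivalent ways: by integrating μ_A(x) f(x), or by averaging μ_A over a simulated sample. The slide does not give the actual curves. Purely for illustration, this sketch assumes that “tall” rises linearly from 1.50 m to 1.68 m. It also assumes that length is normally distributed with a hypothetical mean of 1.52 m and standard deviation of 0.06 m.

```python
import math
import random

MEAN, STD = 1.52, 0.06  # hypothetical distribution of length in metres (not from the slide)

def mu_tall(x):
    """Hypothetical 'tall': 0 below 1.50 m, rising linearly to 1 at 1.68 m."""
    return min(1.0, max(0.0, (x - 1.50) / 0.18))

def pdf(x):
    z = (x - MEAN) / STD
    return math.exp(-0.5 * z * z) / (STD * math.sqrt(2.0 * math.pi))

def pr_tall_integral(lo=1.0, hi=2.0, n=10_000):
    """Pr(tall) = integral of mu_tall(x) f(x) dx, midpoint rule."""
    h = (hi - lo) / n
    return h * sum(mu_tall(lo + (i + 0.5) * h) * pdf(lo + (i + 0.5) * h) for i in range(n))

def pr_tall_sample_mean(n=100_000, seed=1):
    """Pr(tall) = E(mu_tall(X)), estimated by averaging over simulated women."""
    rng = random.Random(seed)
    return sum(mu_tall(rng.gauss(MEAN, STD)) for _ in range(n)) / n

print(round(pr_tall_integral(), 3), round(pr_tall_sample_mean(), 3))
```

Both numbers estimate the same expectation, so they agree up to Monte Carlo noise.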
26
Probabilistic fuzzy events, discrete case
• 2 discrete probabilistic fuzzy events:
$A_1 = (\mu(x_1), \mu(x_2)) = (m, 1 - n)$ and $A_2 = (\mu'(x_1), \mu'(x_2)) = (1 - m, n)$
• If $x_1$ occurs, then $A_1$ occurs with degree m and $A_2$ occurs with degree 1 − m
• Similarly, if $x_2$ occurs…
• E.g., $A_1$ means “tall” and $A_2$ means “small”, where $x_1$ and $x_2$ are two values of the variable length
• With $\Pr(x_1) = p$ and $\Pr(x_2) = 1 - p$: $\Pr(A_1) = mp + (1 - n)(1 - p)$ and $\Pr(A_2) = (1 - m)p + n(1 - p)$
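The two formulas can be checked directly. In this sketch, Pr(x_1) = p as on the slide. At each point the two membership degrees add up to 1 (m + (1 − m) at x_1, (1 − n) + n at x_2). Hence the events form a fuzzy partition, and Pr(A_1) + Pr(A_2) = 1 for any m, n, p.

```python
def event_probabilities(m, n, p):
    """Pr(A1), Pr(A2) for A1 = (m, 1 - n), A2 = (1 - m, n) on {x1, x2}, with Pr(x1) = p."""
    pr_a1 = m * p + (1 - n) * (1 - p)
    pr_a2 = (1 - m) * p + n * (1 - p)
    return pr_a1, pr_a2

print(event_probabilities(0.8, 0.9, 0.3))
```

For example, m = 0.8, n = 0.9 and p = 0.3 give Pr(A_1) ≈ 0.31 and Pr(A_2) ≈ 0.69. With m = n = 1 the events are crisp, and the result reduces to (p, 1 − p).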
27
Simple estimation of probabilities
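•Let $x_1, \dots, x_n$ be a random sample on a domain X
•The probability of a crisp event A can be estimated by
$$\widehat{\Pr}(A) = \frac{1}{n}\sum_{k=1}^{n} \mu_A(x_k)$$
•The probability of a fuzzy event $A_i$ can be estimated by
$$\widehat{\Pr}(A_i) = \frac{1}{n}\sum_{k=1}^{n} \mu_{A_i}(x_k)$$
assuming that X is well-formed (~ fuzzy partition), i.e.
$$\sum_i \mu_{A_i}(x_k) = 1 \quad \forall x_k$$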
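The estimator is simply the sample mean of the membership degrees. This sketch assumes a hypothetical triangular fuzzy partition with centers 0, 0.5 and 1 and a hypothetical uniform sample on [0, 1]. Because the partition is well-formed, the estimated probabilities sum to 1 exactly, up to rounding.

```python
import random

def tri_partition(x, centers):
    """Degrees of x in triangular fuzzy sets peaking at the sorted centers.
    The outer sets are shoulders, so the degrees sum to 1 (a fuzzy partition)."""
    degrees = [0.0] * len(centers)
    if x <= centers[0]:
        degrees[0] = 1.0
    elif x >= centers[-1]:
        degrees[-1] = 1.0
    else:
        for i in range(len(centers) - 1):
            if centers[i] <= x <= centers[i + 1]:
                w = (x - centers[i]) / (centers[i + 1] - centers[i])
                degrees[i], degrees[i + 1] = 1.0 - w, w
                break
    return degrees

def estimate_probabilities(sample, centers):
    """Pr_hat(A_i) = (1/n) sum_k mu_{A_i}(x_k) for every fuzzy set of the partition."""
    totals = [0.0] * len(centers)
    for x in sample:
        for i, degree in enumerate(tri_partition(x, centers)):
            totals[i] += degree
    return [t / len(sample) for t in totals]

rng = random.Random(0)
sample = [rng.random() for _ in range(1000)]  # hypothetical uniform sample on [0, 1]
low, medium, high = estimate_probabilities(sample, [0.0, 0.5, 1.0])
print(low, medium, high)
```

For a uniform sample, the expected values are 0.25, 0.5 and 0.25. A crisp event is the special case in which every degree is 0 or 1, so the estimate becomes a relative frequency.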
28
Deterministic and probabilistic rules
Rule $R_q$: If x is $A_q$ then y is $C_q$
Model parameters: $A_q, C_q$
Ex: If current returns are large, then future returns will be large
If current returns are large, then future returns will be large with probability $p_1$, and future returns will be small with probability $1 - p_1$
(Annotations: probabilistic uncertainty, captured by the probabilities; linguistic vagueness, captured by the fuzzy sets “large” and “small”)
29
Probabilistic fuzzy rules
Rule $R_q$: If x is $A_q$ then y is $C_{q1}$ with $\Pr(C_{q1} \mid A_q)$ and … and y is $C_{qj}$ with $\Pr(C_{qj} \mid A_q)$ and … and y is $C_{qJ}$ with $\Pr(C_{qJ} \mid A_q)$
Model parameters: $A_q$, $C_{qj}$, $\Pr(C_{qj} \mid A_q)$
30
A ‘crazy’ theoretical sidestep
•Probabilistic Fuzzy Entropy
•To be used to induce fuzzy decision trees
31
Definition of PF Entropy, discrete source
•Given a (well-formed) sample space with a fuzzy partition of fuzzy events $A_1, \dots, A_C$, defined by membership functions $\mu_1(x), \dots, \mu_C(x)$ and occurring with probabilities $\Pr(A_1), \dots, \Pr(A_C)$, the PFE is defined as:
$$H_{sf}(p, \mu) = -\sum_{c=1}^{C} \Pr(A_c)\log_2 \Pr(A_c) = -\sum_{c=1}^{C} E(\mu_c(x))\log_2 E(\mu_c(x))$$
PFE is a probabilistic type of entropy defined in a fuzzily partitioned (sample) space!
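The PFE definition takes only a few lines. In this sketch, the probabilities Pr(A_c) = E(μ_c(x)) are estimated from a sample as average membership degrees, following the simple estimator on slide 27. The rows of the membership table are assumed to sum to 1, so that the partition is well-formed.

```python
import math

def pf_entropy(probabilities):
    """H = -sum_c Pr(A_c) log2 Pr(A_c); a term with Pr(A_c) = 0 contributes 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def pf_entropy_from_memberships(memberships):
    """memberships[k][c] = mu_c(x_k); Pr(A_c) = E(mu_c(x)) is estimated by the sample mean.
    Rows are assumed to sum to 1 (a well-formed fuzzy partition)."""
    n = len(memberships)
    n_events = len(memberships[0])
    probs = [sum(row[c] for row in memberships) / n for c in range(n_events)]
    return pf_entropy(probs)

print(pf_entropy_from_memberships([[0.9, 0.1], [0.9, 0.1], [0.2, 0.8]]))
```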
32
A very special information source
Consider 2 discrete, strictly complementary statistical fuzzy events:
$A_1 = (\mu(x_1), \mu(x_2)) = (m, 1 - m)$, $A_2 = (\mu'(x_1), \mu'(x_2)) = (1 - m, m)$
33
A very special information source, cont.
Since $A_1 = (m, 1 - m)$, $A_2 = (1 - m, m)$, $\Pr(x_1) = p$, and $\Pr(x_2) = 1 - p$, it follows that:
$\Pr(A_1) = mp + (1 - m)(1 - p) = 2mp + 1 - m - p = 1 - Q$   (1)
$\Pr(A_2) = p(1 - m) + (1 - p)m = m + p - 2mp = Q$   (2)
34
Entropy of an information source generating 2 strictly complementary statistical fuzzy events
• Using the definition of PFE (slide 31) and equations (1) and (2) from the previous slide, it follows that
$$H_{sf}(m,p) = -Q\log_2 Q - (1 - Q)\log_2(1 - Q), \qquad Q(m,p) = m + p - 2mp$$
• Q (like 1 − Q) relates to the combined uncertainty of the probabilistic fuzzy events, based on their fuzziness m and their probability of occurrence p.
35
Further interpreting Q
• Q(m,p) = m + p − 2mp
• Q = 0 or 1 → no uncertainty; Q = 0.5 → highest uncertainty
• Illumination and interpretation:
  - m = p = 0 or m = p = 1 → Q = 0: two crisp events, one of which occurs with probability 1
  - m = 0, p = 1 or m = 1, p = 0 → Q = 1: same explanation!
  - p = 0.5 or m = 0.5 → Q = 0.5: two fuzzy events having equal probability, or two indistinguishable fuzzy events!
36
Interpretation of $H_{sf}(m,p)$
$H_{sf}(m,p) = -Q\log_2 Q - (1 - Q)\log_2(1 - Q)$
• PFE quantifies the combined uncertainty
• Illumination:
  - Q = 0 or 1 → no uncertainty → $H_{sf}(m,p) = 0$, in the 4 corners
  - p = 0.5 or m = 0.5 → Q = 0.5 → two fuzzy events having equal probability, or two indistinguishable fuzzy events → highest uncertainty → $H_{sf}(m,p) = 1$
  - if m = 0 or 1: classical ‘crisp’ entropy
  - if p = 0 or 1: ‘fuzzy’ entropy only
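The corner cases listed above can be reproduced by evaluating Q(m, p) and H(m, p) directly. This sketch takes 0·log2 0 as 0, the usual convention.

```python
import math

def q_value(m, p):
    """Q(m, p) = m + p - 2mp, i.e. Pr(A2) for the strictly complementary events."""
    return m + p - 2 * m * p

def entropy_mp(m, p):
    """H(m, p) = -Q log2 Q - (1 - Q) log2(1 - Q), with 0 log2 0 taken as 0."""
    q = q_value(m, p)
    return -sum(t * math.log2(t) for t in (q, 1.0 - q) if t > 0)

for m, p in [(0, 0), (1, 1), (0, 1), (0.5, 0.2), (0.9, 0.5), (1, 0.25)]:
    print(m, p, round(entropy_mp(m, p), 4))
```

The four corners give 0, and any pair with m = 0.5 or p = 0.5 gives 1. With m = 1 (crisp events) the value reduces to the classical binary entropy of p.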
37
Agenda
• Probabilistic/statistical uncertainty
• Fuzzy uncertainty
• Fuzzy systems
• Probabilistic fuzzy theory
• Probabilistic fuzzy systems
• Applications
• Conclusions
38
Probabilistic fuzzy systems, with discrete probability distribution
[Figure: rule antecedent and consequent fuzzy sets (labels: antecedent, consequent, fuzzy set)]
39
Deterministic vs. probabilistic FS
Deterministic rule: If x is A4 then y is B2
Probabilistic rule: If x is A4 then y is B1 with probability p(B1 | A4), and y is B2 with probability p(B2 | A4), and y is B3 with probability p(B3 | A4).
[Figure: input fuzzy sets A1–A5 on X and output fuzzy sets B1–B3 on Y]
40
Probabilistic fuzzy system, with continuous probability distribution
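Rule $R_q$: If x is $A_q$ then $f(y)$ is $f(y \mid A_q)$
Additive reasoning:
$$f(y \mid x) = \frac{\sum_{q=1}^{Q} \mu_{A_q}(x)\, f(y \mid A_q)}{\sum_{q=1}^{Q} \mu_{A_q}(x)} = \sum_{q=1}^{Q} \beta_q(x)\, f(y \mid A_q)$$
$$\hat y(x) = E(y \mid x) = \sum_{q=1}^{Q} \beta_q(x)\, E(y \mid A_q)$$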
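Additive reasoning turns the rule densities into a mixture weighted by the normalized firing strengths β_q(x). The sketch below assumes a hypothetical one-input model with Gaussian antecedent membership functions and Gaussian consequent densities f(y|A_q). Neither the shapes nor the numbers come from the slides. It shows that the mixture integrates to 1 and that its mean equals Σ_q β_q(x) E(y|A_q).

```python
import math

# Hypothetical one-input model with Q = 3 rules (parameters are illustrative only)
ANTECEDENTS = [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)]   # (center, width) of mu_{A_q}(x)
CONSEQUENTS = [(-1.0, 0.5), (1.0, 0.8), (3.0, 0.5)]  # (mean, std) of Gaussian f(y | A_q)

def gaussian_mf(x, center, width):
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def normal_pdf(y, mean, std):
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def firing_strengths(x):
    """beta_q(x) = mu_{A_q}(x) / sum_q' mu_{A_q'}(x)."""
    mus = [gaussian_mf(x, c, w) for c, w in ANTECEDENTS]
    total = sum(mus)
    return [mu / total for mu in mus]

def conditional_density(y, x):
    """f(y | x) = sum_q beta_q(x) f(y | A_q): a mixture of the rule densities."""
    return sum(b * normal_pdf(y, m, s) for b, (m, s) in zip(firing_strengths(x), CONSEQUENTS))

def conditional_mean(x):
    """E(y | x) = sum_q beta_q(x) E(y | A_q)."""
    return sum(b * m for b, (m, _) in zip(firing_strengths(x), CONSEQUENTS))

print(round(conditional_mean(1.3), 4))
```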
41
Probability distribution characterization
•In general, different characterizations can be used for the conditional probability density in the rule consequents
•This characterization could be an approximation with a histogram or an explicit model for density, e.g., a normal or other distribution
•In PFS, we can select a fuzzy histogram characterization
42
Histograms, classical crisp case
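• Let $x_1, \dots, x_n$ be a random sample from a univariate distribution with pdf f(x)
• Let the characteristic functions $\chi_i(x)$ (defining crisp bins/intervals $A_i$) constitute a crisp partitioning:
$$\forall x: \sum_i \chi_i(x) = 1, \qquad \forall x,\ i \ne j: \chi_i(x)\,\chi_j(x) = 0$$
• A histogram estimates f(x) (from data $x_k$) as follows:
$$\hat f(x) = \sum_i \chi_i(x)\,\frac{\hat p_i}{|A_i|}, \qquad \hat p_i = \frac{1}{n}\sum_k \chi_i(x_k), \qquad |A_i| = \int \chi_i(x)\,dx$$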
44
Crisp vs. fuzzy histogram
[Figure: pdf of a normal distribution, a crisp histogram, and a fuzzy histogram built on fuzzy sets A1–A7 (membership values), over the range −3.5 to 3.5]
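Both histogram types can be compared on a sample. The crisp version follows the histogram formula of slide 42. For the fuzzy version, this sketch assumes the natural generalization: each χ_i is replaced by a triangular membership function μ_i from a fuzzy partition, with p̂_i = (1/n) Σ_k μ_i(x_k) and |A_i| = ∫ μ_i(x) dx on the plotted range. That formula is an assumption, not taken from the slides. Both estimates integrate to 1 over the range [lo, hi]. Data outside the range is assigned to the edge bins or edge sets.

```python
import random

def tri_memberships(x, centers):
    """Degrees of x in triangular fuzzy sets peaking at the sorted centers; the edge
    sets are shoulders, so the degrees always sum to 1 (a fuzzy partition)."""
    degrees = [0.0] * len(centers)
    if x <= centers[0]:
        degrees[0] = 1.0
    elif x >= centers[-1]:
        degrees[-1] = 1.0
    else:
        for i in range(len(centers) - 1):
            if centers[i] <= x <= centers[i + 1]:
                w = (x - centers[i]) / (centers[i + 1] - centers[i])
                degrees[i], degrees[i + 1] = 1.0 - w, w
                break
    return degrees

def crisp_histogram(sample, lo, hi, bins):
    """f_hat(x) = p_hat_i / |A_i| on the bin A_i containing x (outliers go to edge bins)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        counts[min(bins - 1, max(0, int((x - lo) // width)))] += 1
    n = len(sample)

    def f_hat(x):
        if not lo <= x <= hi:
            return 0.0
        return counts[min(bins - 1, int((x - lo) // width))] / (n * width)
    return f_hat

def fuzzy_histogram(sample, lo, hi, n_sets):
    """f_hat(x) = sum_i mu_i(x) p_hat_i / |A_i|, with p_hat_i the mean of mu_i(x_k)."""
    h = (hi - lo) / (n_sets - 1)
    centers = [lo + i * h for i in range(n_sets)]
    # Area of each membership function on [lo, hi]: half-triangles at the edges
    areas = [h / 2 if i in (0, n_sets - 1) else h for i in range(n_sets)]
    p_hat = [0.0] * n_sets
    for x in sample:
        for i, d in enumerate(tri_memberships(x, centers)):
            p_hat[i] += d / len(sample)

    def f_hat(x):
        if not lo <= x <= hi:
            return 0.0
        return sum(d * p / a for d, p, a in zip(tri_memberships(x, centers), p_hat, areas))
    return f_hat

rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical standard normal data
crisp = crisp_histogram(sample, -3.5, 3.5, 14)
fuzzy = fuzzy_histogram(sample, -3.5, 3.5, 8)
print(round(crisp(0.2), 3), round(fuzzy(0.2), 3))
```

The fuzzy estimate is piecewise linear rather than piecewise constant. It therefore avoids the jumps of the crisp histogram at the bin edges.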
47
Probabilistic Mamdani systems
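Rule $R_q$: If x is $A_q$ then y is $C_1$ with $\Pr(C_1 \mid A_q)$ and … and y is $C_j$ with $\Pr(C_j \mid A_q)$ and … and y is $C_J$ with $\Pr(C_J \mid A_q)$
Reasoning:
$$\hat y(x) = E(y \mid x) = \sum_{q=1}^{Q} \sum_{j=1}^{J} \beta_q(x)\,\Pr(C_j \mid A_q)\, z_j$$
where $z_j$ is the centroid of fuzzy consequent set $C_j$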
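Probabilistic Mamdani reasoning reduces to two weighted sums once the rule parameters are known. In this sketch the firing strengths β_q(x) are normalized memberships. The conditional-probability table and the centroids z_j are hypothetical numbers chosen for illustration. It also returns the output distribution Pr(C_j | x) = Σ_q β_q(x) Pr(C_j | A_q), whose expectation over the centroids is ŷ(x).

```python
import math

def normalized_dofs(memberships):
    """beta_q(x) = mu_{A_q}(x) / sum over all rules of mu_{A_q'}(x)."""
    total = sum(memberships)
    return [m / total for m in memberships]

def output_distribution(betas, cond_probs):
    """Pr(C_j | x) = sum_q beta_q(x) Pr(C_j | A_q); cond_probs[q][j] = Pr(C_j | A_q)."""
    for row in cond_probs:
        if not math.isclose(sum(row), 1.0):
            raise ValueError("each row Pr(. | A_q) must sum to 1")
    return [sum(b * row[j] for b, row in zip(betas, cond_probs)) for j in range(len(cond_probs[0]))]

def probabilistic_mamdani_output(betas, cond_probs, centroids):
    """y_hat(x) = sum_q sum_j beta_q(x) Pr(C_j | A_q) z_j."""
    return sum(p * z for p, z in zip(output_distribution(betas, cond_probs), centroids))

# Hypothetical example: two rules, three consequent sets with centroids -1, 0, 1
betas = normalized_dofs([0.2, 0.6])
cond = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print(probabilistic_mamdani_output(betas, cond, [-1.0, 0.0, 1.0]))  # approximately 0.225
```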
49
Probabilistic TS systems
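Zero-order probabilistic Takagi-Sugeno
Rule $R_q$: If x is $A_q$ then y is $\bar y_1$ with $\Pr(\bar y_1 \mid A_q)$ and y is $\bar y_2$ with $\Pr(\bar y_2 \mid A_q)$ and … and y is $\bar y_J$ with $\Pr(\bar y_J \mid A_q)$
In essence, consequents are crisp sets centered around $\bar y_j$
Reasoning:
$$\hat y(x) = E(y \mid x) = \sum_{q=1}^{Q} \sum_{j=1}^{J} \beta_q(x)\,\Pr(\bar y_j \mid A_q)\,\bar y_j$$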
50
Relation to deterministic FSs
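•Zero-order Takagi-Sugeno system
Takagi-Sugeno reasoning:
$$y^*(x) = \sum_{q=1}^{Q} \beta_q(x)\, c_q$$
c.f. probabilistic Takagi-Sugeno reasoning:
$$\hat y(x) = E(y \mid x) = \sum_{q=1}^{Q} \sum_{j=1}^{J} \beta_q(x)\,\Pr(\bar y_j \mid A_q)\,\bar y_j$$
Select $c_q = \sum_{j=1}^{J} \Pr(\bar y_j \mid A_q)\,\bar y_j$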
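This equivalence can be checked numerically. Set each deterministic consequent c_q to the expected singleton of rule q. The ordinary zero-order TS output then matches the probabilistic TS output for any firing strengths. The numbers below are hypothetical.

```python
def ts_consequents(cond_probs, singletons):
    """c_q = sum_j Pr(y_j | A_q) * y_j: the expected singleton of rule q."""
    return [sum(p * y for p, y in zip(row, singletons)) for row in cond_probs]

def ts_output(betas, consequents):
    """Zero-order TS reasoning: y*(x) = sum_q beta_q(x) c_q."""
    return sum(b * c for b, c in zip(betas, consequents))

def probabilistic_ts_output(betas, cond_probs, singletons):
    """Probabilistic TS reasoning: sum_q sum_j beta_q(x) Pr(y_j | A_q) y_j."""
    return sum(b * p * y for b, row in zip(betas, cond_probs) for p, y in zip(row, singletons))

# Hypothetical two-rule model with singletons -1, 0 and 2
betas = [0.25, 0.75]
cond = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
singletons = [-1.0, 0.0, 2.0]
print(ts_output(betas, ts_consequents(cond, singletons)), probabilistic_ts_output(betas, cond, singletons))
```

The probabilistic model carries more information (a full output distribution per rule), but its conditional mean is reproduced by a standard TS system with these consequents.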
51
Probabilistic fuzzy systems: summary
• Essentially a fuzzy system that estimates a probability density function (p.d.f.), i.e., the fuzzy system approximates a p.d.f.
• Usually the p.d.f. is conditional on the input
• Linguistic information is coded in fuzzy rules
• Combines linguistic uncertainty with probabilistic uncertainty
• Different types of fuzzy systems can be extended to their PFS equivalent (e.g., Mamdani fuzzy systems, Takagi-Sugeno fuzzy systems)
52
PFS design
• Identifying mental world vs. observed world (van den Eijkel 1999)
• Mental world: linguistic descriptions, fuzzy conceptualization, experts’ knowledge
• Observed world: data measurements, probability density functions, optimal consequent parameters
• Optimal design given a mental world: application of conditional probability measures for fuzzy events
• Optimal design given an observed world: nonlinear optimization techniques
53
Parameter determination
• Sequential method
  - Part 1: MF parameters
  - Part 2: probability parameters
• Maximum likelihood method
  - Part 1 – Initialization: sequential method
  - Part 2 – Optimization: MF parameters and probability parameters
MF – Membership function
54
Part 1: Finding the membership function (MF) parameters
Each fuzzy set $A_j$ is defined by a membership function
E.g., Gaussian MF parameters:
  o v – center of MF
  o σ – width of MF
Sequential method: FCM (fuzzy c-means) clustering
55
MF determination
•For the first part of the sequential method, well-known techniques from fuzzy modeling can be applied:
  - fuzzy clustering in input-output product space
  - fuzzy clustering in input and output space
  - expert-driven design
  - similarity-based rule-base simplification
  - feature selection
  - heuristic approaches
  - etc.
56
Sequential method
Part 2: Finding the probability parameters - Pr(ωc|Aj)
Set the parameters Pr(ωc|Aj) equal to estimates of the conditional probabilities - conditional probability estimation
57
Estimation of probability parameters
• Conditional probabilities $\Pr(C_j \mid A_q)$ can be assessed directly by using the definition of the probability of joint events:
$$\Pr(C_j \mid A_q) = \frac{\Pr(A_q, C_j)}{\Pr(A_q)} \approx \frac{\sum_k \mu_{A_q}(x_k)\,\mu_{C_j}(y_k)}{\sum_k \mu_{A_q}(x_k)}$$
• This method does not provide maximum likelihood estimates of the probability parameters.
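The estimator can be written as a pair of weighted counts over the data. This sketch assumes the antecedent memberships μ_{A_q}(x_k) and consequent memberships μ_{C_j}(y_k) have already been computed and are passed in as lists, one row per sample. When the output sets form a fuzzy partition, each estimated row sums to 1.

```python
def estimate_conditional_probabilities(mu_x, mu_y):
    """Pr(C_j | A_q) ~ sum_k mu_Aq(x_k) mu_Cj(y_k) / sum_k mu_Aq(x_k).

    mu_x[k][q] = mu_{A_q}(x_k), antecedent memberships of sample k
    mu_y[k][j] = mu_{C_j}(y_k), consequent memberships of sample k
    Returns a Q x J list of lists."""
    n_rules, n_out = len(mu_x[0]), len(mu_y[0])
    probs = []
    for q in range(n_rules):
        denom = sum(row[q] for row in mu_x)
        if denom == 0:
            raise ValueError(f"rule {q} is never activated by the data")
        probs.append([sum(mx[q] * my[j] for mx, my in zip(mu_x, mu_y)) / denom for j in range(n_out)])
    return probs

# Hypothetical three-sample data set with two rules and two output sets
mu_x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
mu_y = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(estimate_conditional_probabilities(mu_x, mu_y))
```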
58
Maximum likelihood method
Part 2: Optimization of $v_j$, $\sigma_j$ and $\Pr(\omega_c \mid A_j)$
• Likelihood of a data set
• Minimization of the negative log-likelihood
• Optimize the parameters $v_j$, $\sigma_j$ and $\Pr(\omega_c \mid A_j)$ that minimize this error function
• Constrained optimization problem (the probability parameters $\Pr(\omega_c \mid A_j)$ must satisfy summation conditions)
59
Unconstrained minimization of $v_j$, $\sigma_j$ and $u_{jc}$
• A gradient descent optimization algorithm is used to minimize the objective function, i.e., the available classification examples are processed one by one and the parameters are updated after each sample
[Figure: the constrained optimization problem is transformed into an unconstrained optimization problem using $u_{jc}$]
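A minimal sketch of the per-sample (stochastic) gradient descent step follows. It assumes a probabilistic fuzzy classifier Pr(ω_c | x) = Σ_j β_j(x) Pr(ω_c | A_j), where β_j are normalized degrees of fulfillment. It also assumes a softmax map Pr(ω_c | A_j) = exp(u_jc) / Σ_c' exp(u_jc'), which keeps the summation constraints satisfied for any real u_jc. The slide does not state which transformation is used, so the softmax is our assumption. For brevity, the membership parameters v_j and σ_j are held fixed, and only u_jc is optimized.

```python
import math

def softmax(row):
    """Map unconstrained u_jc to probabilities Pr(w_c | A_j) that are >= 0 and sum to 1."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(betas, u):
    """Pr(w_c | x) = sum_j beta_j(x) Pr(w_c | A_j), with Pr(w_c | A_j) = softmax(u[j])[c]."""
    probs_given_rule = [softmax(row) for row in u]
    n_classes = len(u[0])
    mix = [sum(b * probs_given_rule[j][c] for j, b in enumerate(betas)) for c in range(n_classes)]
    return mix, probs_given_rule

def negative_log_likelihood(data, u):
    """data: list of (betas, label), betas = normalized degrees of fulfillment of the rules."""
    return -sum(math.log(class_probabilities(betas, u)[0][label]) for betas, label in data)

def sgd_epoch(data, u, lr):
    """One pass of per-sample gradient descent on u (membership functions kept fixed)."""
    for betas, label in data:
        mix, probs_given_rule = class_probabilities(betas, u)
        for j, b in enumerate(betas):
            p_label = probs_given_rule[j][label]
            for c in range(len(u[j])):
                indicator = 1.0 if c == label else 0.0
                # derivative of -log mix[label] with respect to u[j][c]
                grad = -b * p_label * (indicator - probs_given_rule[j][c]) / mix[label]
                u[j][c] -= lr * grad
    return u

# Hypothetical toy data: rule 1 fires mostly for class 0, rule 2 for class 1
data = [([0.9, 0.1], 0)] * 20 + [([0.2, 0.8], 1)] * 20 + [([0.9, 0.1], 1)] * 2
u = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(50):
    sgd_epoch(data, u, lr=0.1)
print([softmax(row) for row in u])
```

Optimizing v_j and σ_j as well would add chain-rule terms through β_j(x). That part is omitted here.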
60
Fuzz IEEE 2013, Hyderabad India
Experimental comparison (1)
• Use Gaussian membership functions
• The centers $c_{ql}$ are determined using fuzzy c-means clustering
• The widths are set equal to $\sigma_{ql} = \min_{q' \ne q} \lVert c_q - c_{q'} \rVert$
$$\mu_{A_q}(x) = \exp\left(-\sum_{l=1}^{d} \frac{(x_l - c_{ql})^2}{2\sigma_{ql}^2}\right)$$
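The width rule and the multivariate Gaussian membership can be sketched as follows. The cluster centers are taken as given, for example from fuzzy c-means, which is not implemented here; the three 2-D centers below are hypothetical. The sketch uses one width per rule for all dimensions and assumes the common form with 2σ² in the denominator.

```python
import math

def widths_from_centers(centers):
    """sigma_q = min over q' != q of the Euclidean distance ||c_q - c_q'||."""
    return [
        min(math.dist(c_q, c_r) for r, c_r in enumerate(centers) if r != q)
        for q, c_q in enumerate(centers)
    ]

def gaussian_membership(x, center, sigma):
    """mu_{A_q}(x) = exp(-sum_l (x_l - c_ql)^2 / (2 sigma_q^2)), one width for every dimension."""
    return math.exp(-sum((x_l - c_l) ** 2 for x_l, c_l in zip(x, center)) / (2.0 * sigma ** 2))

# Hypothetical cluster centers in a 2-D input space
centers = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
sigmas = widths_from_centers(centers)
print(sigmas, gaussian_membership((1.0, 1.0), centers[0], sigmas[0]))
```

Tying each width to the distance to the nearest neighbouring center keeps neighbouring rules overlapping without having to tune each σ by hand.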
61
Experimental comparison (2)
• Misclassification rates
• Calculated using ten-fold cross-validation
• Standard deviations reported within parentheses
Method | Wisconsin breast cancer | Wine
Sequential method | 0.261 (0.036) | 0.034 (0.048)
Maximum likelihood | 0.029 (0.021) | 0.023 (0.041)
62
Future research directions
• New estimation methods for the model parameters
• Joint estimation
• Information-theory based techniques
• Better optimization methods
• Interaction between linguistic knowledge and data-driven estimation
• Optimizing model complexity, model simplification
• Interpretability of probabilistic fuzzy models
• Linguistic descriptions of probability density functions
• Equivalence to other systems: e.g. fuzzy Markov models
• Density estimation using more complex models as rule consequents: e.g. fuzzy GARCH models
• New applications
63
Agenda
• Probabilistic/statistical uncertainty
• Fuzzy uncertainty
• Fuzzy systems
• Probabilistic fuzzy theory
• Probabilistic fuzzy systems
• Applications
• Conclusions
66
Concluding remarks
•Probabilistic fuzzy systems combine linguistic uncertainty and probabilistic uncertainty
•Very useful in applications where a probabilistic model (pdf estimation) has to be conditioned (or constrained) by linguistic information
•Good parameter estimation methods exist and the added value of these models has been demonstrated in various applications
67
Conclusions
•Fuzzy models usually show smooth non-linear behavior
•If certain measures are taken, fuzzy models are interpretable
•Fuzzy models can be induced from
  - expert knowledge (grid selection and definition of fuzzy rules): this expert-driven approach was very successful in control theory
  - a data set: the data-driven approach
•The accuracy-interpretability dilemma is prominent!
•The bias-variance dilemma is prominent!