

1

Combining fuzzy and statistical uncertainty: probabilistic fuzzy systems and their applications

prof.dr.ir. Jan van den Berg
- TUDelft: Faculties of TPM and EWI
- Cyber Security Academy The Hague

[email protected]

http://tbm.tudelft.nl/index.php?id=30084&L=1

2

Summary of the talk

•Two complementary conceptualizations of uncertainty will be discussed: statistical and fuzzy uncertainty.

•These uncertainties can be combined into one theory of probabilistic fuzzy events. Using this theory, classical fuzzy systems can be generalized to probabilistic fuzzy systems (PFS).

•PFS can be induced from both expert knowledge and data, which supports both interpretability and accuracy (although in practice an accuracy-interpretability dilemma remains).

•To finalize, one or two examples of PFSs we developed will be shown: next time!

3

Agenda

•Probabilistic/statistical uncertainty
•Fuzzy uncertainty
•Fuzzy systems
•Probabilistic fuzzy theory
•Probabilistic fuzzy systems
•Applications (next time)
•Conclusions

4

Probabilistic/statistical uncertainty

•Probabilistic/statistical uncertainty: well-known notion

•Crisp events occur with a certain probability

•These probabilities can be assessed statistically
  • E.g., frequentist approach: by repeating experiments
  • A lot of theory on unbiased estimation, maximum likelihood (ML) estimation, etc.

•In continuous outcome spaces, probability distributions are used

•Mathematical statistics: descriptive and inferential statistics (the latter on drawing conclusions from data using some model for the data)

5

Agenda

•Probabilistic/statistical uncertainty
•Fuzzy uncertainty
•Fuzzy systems
•Probabilistic fuzzy theory
•Probabilistic fuzzy systems
•Applications
•Conclusions

6

Crisp sets

Collection of definite, well-definable objects (elements) to form a whole, having crisp boundaries

Representations of sets:

• list of all elements

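$A = \{x_1, \ldots, x_n\}$, $x_j \in X$

• elements with property P

$A = \{x \mid x \text{ satisfies } P\}$, $x \in X$

• Venn diagram

• characteristic function $f_A: X \to \{0,1\}$, with $f_A(x) = 1$ if $x \in A$ and $f_A(x) = 0$ if $x \notin A$

[Figure: characteristic function of the crisp set A = real numbers larger than 3, plotted over X; it steps from 0 to 1 at x = 3.]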

7

Fuzzy sets

• Sets with fuzzy, gradual boundaries (Zadeh 1965)

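• A fuzzy set A in X is characterized by its membership function $\mu_A: X \to [0,1]$

• A fuzzy set A is completely determined by the set of ordered pairs $A = \{(x, \mu_A(x)) \mid x \in X\}$

• X is called the domain or universe of discourse

[Figure: membership function $\mu_A(x)$ of the fuzzy set A = real numbers about 3, plotted over X with grades between 0 and 1.]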

8

Fuzzy sets on discrete universes

•Fuzzy set C = “desirable city to live in”
X = {SF, Boston, LA} (discrete and non-ordered)
C = {(SF, 0.9), (Boston, 0.8), (LA, 0.6)}

•Fuzzy set A = “sensible number of children”
X = {0, 1, 2, 3, 4, 5, 6} (discrete universe)
A = {(0, 0.1), (1, 0.3), (2, 0.7), (3, 1), (4, 0.6), (5, 0.2), (6, 0.1)}

[Figure: membership grades of A plotted against the number of children.]

9

Fuzzy sets on continuous universes

• Fuzzy set B = “about 50 years old”

X = set of positive real numbers (continuous)

$B = \{(x, \mu_B(x)) \mid x \in X\}$

$\mu_B(x) = \dfrac{1}{1 + \left(\dfrac{x - 50}{10}\right)^2}$

[Figure: membership grades of B plotted against age.]
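A minimal sketch of a continuous membership function, assuming the bell-shaped form $\mu_B(x) = 1 / (1 + ((x - 50)/10)^2)$ for “about 50 years old”. The function name `mu_about_50` and the sample ages are illustrative choices, not part of the slides.

```python
def mu_about_50(age: float) -> float:
    """Membership grade of `age` in the fuzzy set B = 'about 50 years old'.

    Assumed bell-shaped form: 1 / (1 + ((age - 50) / 10)^2).
    """
    return 1.0 / (1.0 + ((age - 50.0) / 10.0) ** 2)


# The grade is 1 at age 50 and decays symmetrically on both sides.
for age in (30, 40, 50, 60, 70):
    print(age, round(mu_about_50(age), 3))
```

The grade never reaches exactly 0, which is one way to model a gradual boundary on a continuous universe.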

10

Fuzzy partition

•Fuzzy partition formed by the linguistic values “young”, “middle aged”, and “old”

•For any age: sum of membership values = 1

[Figure: membership grades of “Young”, “Middle Aged”, and “Old” plotted against age from 0 to 90.]

11

Operations with fuzzy sets

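• Union (maximum): $C = A \cup B \Leftrightarrow \mu_C(x) = \max(\mu_A(x), \mu_B(x)) = \mu_A(x) \vee \mu_B(x)$

• Intersection (minimum): $C = A \cap B \Leftrightarrow \mu_C(x) = \min(\mu_A(x), \mu_B(x)) = \mu_A(x) \wedge \mu_B(x)$

• Complement: $\bar{A} = X - A \Leftrightarrow \mu_{\bar{A}}(x) = 1 - \mu_A(x)$

• Union (probabilistic sum): $C = A \cup B \Leftrightarrow \mu_C(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x)$

• Intersection (product): $C = A \cap B \Leftrightarrow \mu_C(x) = \mu_A(x)\,\mu_B(x)$

Note the multiple definitions!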
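The alternative definitions of union and intersection can be compared on a small discrete universe. The sketch below reuses the fuzzy set “sensible number of children” from the earlier slide as `A`; the second set `B` and the helper name `combine` are hypothetical and only serve the illustration.

```python
# Fuzzy set A = "sensible number of children" (grades taken from the slide).
A = {0: 0.1, 1: 0.3, 2: 0.7, 3: 1.0, 4: 0.6, 5: 0.2, 6: 0.1}
# Hypothetical second fuzzy set on the same universe, e.g. "small family".
B = {0: 1.0, 1: 0.8, 2: 0.5, 3: 0.2, 4: 0.0, 5: 0.0, 6: 0.0}


def combine(a, b, op):
    """Apply a pointwise operator to two fuzzy sets on the same universe."""
    return {x: op(a[x], b[x]) for x in a}


union_max = combine(A, B, max)                           # A OR B (maximum)
inter_min = combine(A, B, min)                           # A AND B (minimum)
union_sum = combine(A, B, lambda s, t: s + t - s * t)    # A OR B (probabilistic sum)
inter_prod = combine(A, B, lambda s, t: s * t)           # A AND B (product)
not_A = {x: 1.0 - grade for x, grade in A.items()}       # complement of A

print(union_max[2], inter_min[2], union_sum[2], inter_prod[2], not_A[2])
```

For every element the product intersection is at most the minimum intersection, and the probabilistic sum is at least the maximum union, which is why the choice of operator matters in practice.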

12

Set theoretic operations, examples

[Figure with four panels, membership grades from 0 to 1: (a) Fuzzy Sets A and B; (b) Fuzzy Set "not A"; (c) Fuzzy Set "A OR B" (maximum); (d) Fuzzy Set "A AND B" (minimum).]

13

Agenda

•Probabilistic/statistical uncertainty
•Fuzzy uncertainty
•Fuzzy systems
•Probabilistic fuzzy theory
•Probabilistic fuzzy systems
•Applications
•Conclusions

14

Fuzzy Modeling (expert-driven)

•FM can be defined based on expert knowledge:

•Many human concepts (big, long, high, very much, adequate, satisfactory, …) are defined in a (context dependent) quantitative way, describing state of nature

•Examples of facts about the world:
  •The president is middle-aged
  •The water supply is insufficient
  •Birth weight of Romanian children is quite low

Need for formalization: linguistic variable

15

Linguistic variable

• A numerical variable takes numerical values: Age = 65 (defines a crisp event)

• A linguistic variable takes linguistic values: Age is old (defines a fuzzy event)

• A linguistic value is defined by a fuzzy set (enabling a characterization with gradual transitions)

Ex: A fuzzy partition of the linguistic variable ‘Age’ formed with linguistic values “young”, “middle aged”, and “old”:

[Figure: membership grades of “Young”, “Middle Aged”, and “Old” plotted against age from 0 to 90.]

16

Expert-driven fuzzy modeling, cont.

•Experts can express their knowledge in
  •Facts about the world (see above), and
  •Fuzzy IF-THEN rules

•Example fuzzy rules:
  •IF Nutrition state is poor AND Birth weight is medium AND Respiratory disease is absent THEN Child mortality rate is medium
  •IF Nutrition state is medium AND Birth weight is not too low AND Respiratory disease is absent THEN Child mortality rate is rather low

•Need for Reasoning/Inference mechanism

17

Fuzzy (Inference) System (FS)

•Fuzzifier (interface: from crisp to fuzzy)
•Rule base (enabling interpretability/transparency)
•Inference engine (implements fuzzy reasoning)
•Defuzzifier (interface: from fuzzy to crisp)
•Note: Fuzzifier or defuzzifier may be absent

[Figure: block diagram with the blocks Fuzzifier, Inference engine, Defuzzifier, and Rule base (knowledge base), placed between input and output.]

18

Example Fuzzy System: Mamdani model

•Five major steps:
  1. Fuzzification
  2. Degree of fulfillment
  3. Inference
  4. Aggregation
  5. Defuzzification

•Computations according to Mamdani reasoning apply a generalized form of classical modus ponens:

Given: x is A' and If x is A, then y is B. Conclude: y is B'
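The five steps can be sketched for a tiny rule base. Everything concrete here is an assumption made for illustration: the two rules (“if x is Low then y is Low”, “if x is High then y is High”), the ramp-shaped membership functions on [0, 100], singleton fuzzification of a crisp input, min for inference, max for aggregation, and centroid defuzzification on a grid.

```python
def ramp_down(v, lo=0.0, hi=100.0):
    """Hypothetical 'Low' membership: 1 at lo, falling linearly to 0 at hi."""
    return min(1.0, max(0.0, (hi - v) / (hi - lo)))


def ramp_up(v, lo=0.0, hi=100.0):
    """Hypothetical 'High' membership: complement of ramp_down."""
    return 1.0 - ramp_down(v, lo, hi)


# Each rule is (antecedent membership, consequent membership).
RULES = [(ramp_down, ramp_down), (ramp_up, ramp_up)]


def mamdani(x, n_grid=101):
    """Crisp output for crisp input x using min-max Mamdani reasoning."""
    ys = [100.0 * i / (n_grid - 1) for i in range(n_grid)]
    # Steps 1-2: singleton fuzzification, so the degree of fulfillment
    # of each rule is simply its antecedent membership at x.
    betas = [antecedent(x) for antecedent, _ in RULES]
    # Step 3 (inference): clip each consequent at its rule's degree.
    # Step 4 (aggregation): take the pointwise maximum over rules.
    aggregated = [
        max(min(beta, consequent(y)) for beta, (_, consequent) in zip(betas, RULES))
        for y in ys
    ]
    # Step 5 (defuzzification): centroid of the aggregated fuzzy set.
    return sum(y * m for y, m in zip(ys, aggregated)) / sum(aggregated)


for x in (0, 25, 50, 75, 100):
    print(x, round(mamdani(x), 2))
```

Other operator choices (for example product inference or a different defuzzifier) fit the same skeleton by swapping the `min`, `max`, or centroid lines.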

19

Mamdani reasoning - example

20

Resulting FS: a smooth non-linear mapping

21

Inducing a fuzzy model from data

If income is Low then tax is Low

If income is High then tax is High

22

Bias-variance dilemma (!) and interpretability-accuracy dilemma (!)

Algorithms exist to gradually induce more and more rules from data

•Bias-variance dilemma to find models of right complexity

•Interpretability-accuracy dilemma is another key issue (of Data Mining)

23

Agenda

•Probabilistic/statistical uncertainty
•Fuzzy uncertainty
•Fuzzy systems
•Probabilistic fuzzy theory
•Probabilistic fuzzy systems
•Applications
•Conclusions

24

Probability of crisp and fuzzy events

•Crisp: $\Pr(A) = \int_{x \in A} p(x)\,dx = \int_X \chi_A(x)\,p(x)\,dx$

•Fuzzy (Zadeh, 1968): $\Pr(A) = \int_X \mu_A(x)\,p(x)\,dx$

Probability of a fuzzy event is the expectation of its membership function!

Conditional probability: $\Pr(A \mid B) = \dfrac{\Pr(A \cap B)}{\Pr(B)} = \dfrac{\int_X \mu_A(x)\,\mu_B(x)\,p(x)\,dx}{\int_X \mu_B(x)\,p(x)\,dx}$

Satisfies P(A|A) = 1

25

An example, continuous domain

Answering the question:

“What is the probability that a randomly selected Indian woman is tall?”

$\Pr(A) = \int \mu_A(x)\,f(x)\,dx = E(\mu_A(x))$

[Figure: calculating the probability that a randomly selected Indian woman is tall. Curves: membership function “tall”, pdf of length, and the resulting “fuzzy pdf”; the horizontal axis is marked at 1.50, 1.68, and 2.0.]
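The probability of a fuzzy event, $\Pr(A) = \int \mu_A(x) f(x)\,dx$, can be approximated numerically. The sketch below is illustrative only: the normal pdf with mean 1.55 m and standard deviation 0.07 m, the ramp-shaped membership function for “tall” between 1.50 m and 1.68 m, and all function names are assumptions, not values taken from the talk.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mu_tall(x, start=1.50, full=1.68):
    """Hypothetical membership of 'tall': 0 below start, 1 above full."""
    return min(1.0, max(0.0, (x - start) / (full - start)))


def prob_fuzzy_event(mu, pdf, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of mu(x) * pdf(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += mu(x) * pdf(x)
    return total * h


def height_pdf(x):
    """Assumed pdf of length (height in metres)."""
    return normal_pdf(x, mean=1.55, sd=0.07)


p_tall_fuzzy = prob_fuzzy_event(mu_tall, height_pdf, lo=1.0, hi=2.2)
# Crisp counterpart: 'tall' means height >= 1.68 m.
p_tall_crisp = prob_fuzzy_event(lambda x: 1.0 if x >= 1.68 else 0.0, height_pdf, 1.0, 2.2)

print(round(p_tall_fuzzy, 4), round(p_tall_crisp, 4))
```

Because the fuzzy membership is at least as large as the crisp indicator everywhere, the fuzzy probability is at least as large as the crisp one under these assumptions.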

26

Probabilistic fuzzy events, discrete case

• 2 discrete probabilistic fuzzy events: $A_1 = (\mu(x_1), \mu(x_2)) = (m, 1 - n)$ and $A_2 = (\mu'(x_1), \mu'(x_2)) = (1 - m, n)$

• If $x_1$ occurs, then $A_1$ occurs with degree $m$ and $A_2$ occurs with degree $1 - m$

• Similarly, if $x_2$ occurs…

• E.g., $A_1$ means “tall” and $A_2$ means “small”, where $x_1$ and $x_2$ are two values of the x-variable length

• With $p = \Pr(x_1)$: $\Pr(A_1) = mp + (1 - n)(1 - p)$, and $\Pr(A_2) = (1 - m)p + n(1 - p)$
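A short numerical check of the two formulas, assuming $p = \Pr(x_1)$ and $1 - p = \Pr(x_2)$. The function name and the values of `m`, `n`, and `p` are hypothetical.

```python
def fuzzy_event_probs(m, n, p):
    """Return (Pr(A1), Pr(A2)) for A1 = (m, 1 - n), A2 = (1 - m, n), Pr(x1) = p."""
    pr_a1 = m * p + (1.0 - n) * (1.0 - p)
    pr_a2 = (1.0 - m) * p + n * (1.0 - p)
    return pr_a1, pr_a2


# Hypothetical numbers: x1 is 'tall' to degree 0.8, x2 is 'small' to degree 0.7.
pr_tall, pr_small = fuzzy_event_probs(m=0.8, n=0.7, p=0.4)
print(pr_tall, pr_small, pr_tall + pr_small)
```

The two probabilities sum to 1 for any `m`, `n`, and `p`, because the memberships of each outcome in $A_1$ and $A_2$ sum to 1.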

27

Simple estimation of probabilities

•Let $x_1, \ldots, x_n$ be a random sample on a domain X

•The probability of a crisp event A can be estimated by $\hat{\Pr}(A) = \dfrac{1}{n}\sum_k \chi_A(x_k)$

•The probability of a fuzzy event $A_i$ can be estimated by $\hat{\Pr}(A_i) = \dfrac{1}{n}\sum_k \mu_{A_i}(x_k)$, assuming that X is well-formed (~ fuzzy partition), i.e. $\forall x_k: \sum_i \mu_{A_i}(x_k) = 1$
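The fuzzy estimator is simply the sample mean of the membership grades. The sketch below assumes a hypothetical fuzzy partition of age into “young”, “middle aged”, and “old” with piecewise-linear membership functions, and a synthetic uniform sample; none of these numbers come from the slides.

```python
import random


def young(age):
    """Hypothetical membership: 1 up to age 20, falling to 0 at age 40."""
    return min(1.0, max(0.0, (40.0 - age) / 20.0))


def old(age):
    """Hypothetical membership: 0 up to age 50, rising to 1 at age 70."""
    return min(1.0, max(0.0, (age - 50.0) / 20.0))


def middle_aged(age):
    """Remainder, so that the three grades always sum to 1 (well-formed)."""
    return 1.0 - young(age) - old(age)


PARTITION = {"young": young, "middle aged": middle_aged, "old": old}


def estimate_probabilities(sample, memberships):
    """Estimate Pr(A_i) as the mean membership grade of the sample in A_i."""
    n = len(sample)
    return {name: sum(mu(x) for x in sample) / n for name, mu in memberships.items()}


random.seed(0)
ages = [random.uniform(0.0, 90.0) for _ in range(1000)]  # synthetic sample
estimates = estimate_probabilities(ages, PARTITION)
print(estimates, sum(estimates.values()))
```

Because the partition is well-formed, the estimated probabilities sum to 1 regardless of the sample.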

28

Deterministic and probabilistic rules

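Rule $R_q$: If $\mathbf{x}$ is $A_q$ then $y$ is $C_q$

Model parameters: $A_q$, $C_q$

Ex (deterministic rule): If current returns are large, then future returns will be large

Ex (probabilistic rule): If current returns are large, then future returns will be large with probability $\pi_1$ and future returns will be small with probability $1 - \pi_1$

In the probabilistic rule, the terms “large” and “small” carry the linguistic vagueness, while $\pi_1$ and $1 - \pi_1$ carry the probabilistic uncertainty.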

29

Probabilistic fuzzy rules

Rule $R_q$: If $\mathbf{x}$ is $A_q$ then
  $y$ is $C_{q1}$ with $\Pr(C_{q1} \mid A_q)$ and ...
  and $y$ is $C_{qj}$ with $\Pr(C_{qj} \mid A_q)$ and ...
  and $y$ is $C_{qJ}$ with $\Pr(C_{qJ} \mid A_q)$

Model parameters: $A_q$, $C_{qj}$, $\Pr(C_{qj} \mid A_q)$

30

A ‘crazy’ theoretical sidestep

•Probabilistic Fuzzy Entropy

•To be used to induce fuzzy decision trees

31

Definition of PF Entropy, discrete source

•Given a (well-formed) sample space with a fuzzy partition of fuzzy events $A_1, \ldots, A_C$ defined by membership functions $\mu_1(x), \ldots, \mu_C(x)$ and occurring with probabilities $\Pr(A_1), \ldots, \Pr(A_C)$, the PFE is defined as:

$H_{sf}(\mu, p) = -\sum_{c=1}^{C} \Pr(A_c)\,\log_2(\Pr(A_c)) = -E(\log_2(\Pr(A)))$

$\qquad\quad = -\sum_{c=1}^{C} E(\mu_c(x))\,\log_2(E(\mu_c(x))) = -E(\log_2(E(\mu(x))))$

PFE is a probabilistic type of entropy defined in a fuzzily partitioned (sample) space!

32

A very special information source

Consider 2 discrete strictly complementary statistical fuzzy events:

$A_1 = (\mu(x_1), \mu(x_2)) = (m, 1 - m)$, $A_2 = (\mu'(x_1), \mu'(x_2)) = (1 - m, m)$

33

A very special information source, cont.

Since A1 = (m, 1 - m), A2 = (1 – m, m), Pr (x1 ) = p , and Pr (x2 ) = 1 – p,

it follows that:

Pr(A1) = mp + (1 – m)(1 – p) = 2mp + 1 – m – p = 1 – Q (1)

Pr(A2) = p (1 – m) + (1 – p) m = m + p – 2mp = Q (2)

34

Entropy of information source generating 2 strictly complementary stat fuzzy events

• Using definition of PFE (sh. 31 ) and equations (1) and (2) from previous sheet, it follows that

Hsf (m,p ) = - Q log2 Q - (1 - Q ) log2 (1 - Q ) where

Q (m,p ) = m + p - 2 m p

• Q (like 1 – Q) relates to the combined uncertainty of the probabilistic fuzzy events based on their fuzziness m and the probability of occurrence p:

35

Further interpreting Q

• Q(m,p) = m + p - 2mp

• Q = 0 or 1: no uncertainty; Q = 0.5: highest uncertainty

• Illumination and interpretation:
  - m = p = 0 or m = p = 1 gives Q = 0: two crisp events, one of which occurs with probability 1
  - m = 0, p = 1 or m = 1, p = 0 gives Q = 1: same explanation!
  - p = 0.5 or m = 0.5 gives Q = 0.5: two fuzzy events having equal prob, or two non-distinguishable fuzzy events!!

36

Interpretation of Hsf (m,p)

Hsf (m,p) = - Q log2 Q - (1 - Q) log2 (1 - Q)

• PFE quantifies the combined uncertainty

• Illumination:
  - Q = 0 or 1 ~ no uncertainty ~ H(m,p) = 0, in the 4 corners of the (m,p) unit square
  - p = 0.5 or m = 0.5 ~ Q = 0.5 ~ two fuzzy events having equal prob, or two non-distinguishable fuzzy events ~ highest uncertainty ~ H(m,p) = 1
  - if m = 0 or 1: classical ‘crisp’ entropy
  - if p = 0 or 1: ‘fuzzy’ entropy only
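The special cases listed above can be verified numerically. The sketch below evaluates Q(m,p) = m + p - 2mp and the entropy - Q log2 Q - (1 - Q) log2 (1 - Q); the function names are illustrative, and terms with probability 0 are skipped, following the usual convention 0 log 0 = 0.

```python
import math


def q_value(m, p):
    """Combined uncertainty parameter Q(m, p) = m + p - 2mp."""
    return m + p - 2.0 * m * p


def pf_entropy(m, p):
    """Probabilistic fuzzy entropy of two strictly complementary fuzzy events."""
    q = q_value(m, p)
    # Skip zero-probability terms (convention: 0 * log2(0) = 0).
    return -sum(t * math.log2(t) for t in (q, 1.0 - q) if t > 0.0)


# Four corners of the (m, p) unit square: no uncertainty.
for m, p in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print((m, p), q_value(m, p), pf_entropy(m, p))

# m = 0.5 or p = 0.5: highest uncertainty, whatever the other parameter is.
print(pf_entropy(0.5, 0.2), pf_entropy(0.9, 0.5))

# m = 0: the classical binary entropy of p remains.
print(pf_entropy(0.0, 0.25))
```

With m fixed at 0 or 1 the function reduces to the binary entropy of p, and with p fixed at 0 or 1 it reduces to the same function of m, which matches the ‘crisp’ and ‘fuzzy’ limiting cases on the slide.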

37

Agenda

•Probabilistic/statistical uncertainty
•Fuzzy uncertainty
•Fuzzy systems
•Probabilistic fuzzy theory
•Probabilistic fuzzy systems
•Applications
•Conclusions

38

Probabilistic fuzzy systems, with discrete probability distribution

[Figure: rule structure with the labels “antecedent”, “consequent”, and “fuzzy set”.]

39

Deterministic vs. probabilistic FS

Deterministic: If x is A4 then y is B2

Probabilistic: If x is A4 then y is B1 with probability p(B1 | A4), and y is B2 with probability p(B2 | A4), and y is B3 with probability p(B3 | A4).

[Figure: antecedent fuzzy sets A1 to A5 on X and consequent fuzzy sets B1 to B3 on Y.]

40

Probabilistic fuzzy system, with continuous probability distribution

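Rule $R_q$: If $\mathbf{x}$ is $A_q$ then $f(y) = f(y \mid A_q)$

Additive reasoning:

$f(y \mid \mathbf{x}) = \dfrac{\sum_{q=1}^{Q} \mu_{A_q}(\mathbf{x})\, f(y \mid A_q)}{\sum_{q=1}^{Q} \mu_{A_q}(\mathbf{x})} = \sum_{q=1}^{Q} \beta_q(\mathbf{x})\, f(y \mid A_q)$

$E(y \mid \mathbf{x}) = \sum_{q=1}^{Q} \beta_q(\mathbf{x})\, E(y \mid A_q)$

where $\beta_q(\mathbf{x}) = \mu_{A_q}(\mathbf{x}) / \sum_{q'=1}^{Q} \mu_{A_{q'}}(\mathbf{x})$ is the normalized degree of fulfillment of rule $R_q$.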

41

Probability distribution characterization

•In general, different characterizations can be used for the conditional probability density in the rule consequents

•This characterization could be an approximation with a histogram or an explicit model for density, e.g., a normal or other distribution

•In PFS, we can select a fuzzy histogram characterization

42

Histograms, classical crisp case

• Let $x_1, \ldots, x_n$ be a random sample from a univariate distribution with pdf $f(x)$

• Let the characteristic functions $\chi_i(x)$ (defining crisp bins/intervals $A_i$) constitute a crisp partitioning: $\forall x: \sum_i \chi_i(x) = 1$ and $\forall i \neq j: \chi_i(x) \cdot \chi_j(x) = 0$

• A histogram estimates $f(x)$ (from data $x_k$) as follows:

$\hat{f}(x) = \sum_i \dfrac{p_i}{|A_i|}\,\chi_i(x) = \sum_i \dfrac{\frac{1}{n}\sum_k \chi_i(x_k)}{\int \chi_i(x)\,dx}\,\chi_i(x)$

43

Fuzzy histograms

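$\hat{f}(x) = \sum_i \dfrac{p_i}{|A_i|}\,\mu_i(x) = \sum_i \dfrac{\frac{1}{n}\sum_k \mu_i(x_k)}{\int \mu_i(x)\,dx}\,\mu_i(x)$

where the membership functions $\mu_i(x)$ constitute a fuzzy partitioning: $\forall x: \sum_i \mu_i(x) = 1$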
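A fuzzy histogram can be sketched in a few lines. The example assumes a uniform triangular fuzzy partition of an interval [lo, hi] and a synthetic standard-normal sample truncated to that interval; the partition shape, the sample, and all names are illustrative assumptions rather than the setup used in the talk.

```python
import random


def fuzzy_histogram(sample, lo, hi, n_sets):
    """Return a density estimate built from a triangular fuzzy partition of [lo, hi]."""
    step = (hi - lo) / (n_sets - 1)
    centres = [lo + i * step for i in range(n_sets)]

    def mu(i, x):
        # Triangular membership centred at centres[i]; neighbours overlap so
        # that the grades of any x in [lo, hi] sum to 1.
        return max(0.0, 1.0 - abs(x - centres[i]) / step)

    n = len(sample)
    # p_i: estimated probability of fuzzy bin A_i (mean membership grade).
    p = [sum(mu(i, x) for x in sample) / n for i in range(n_sets)]
    # |A_i|: integral of mu_i over [lo, hi]; the two end bins are half triangles.
    size = [step if 0 < i < n_sets - 1 else step / 2.0 for i in range(n_sets)]

    def density(x):
        return sum(p[i] / size[i] * mu(i, x) for i in range(n_sets))

    return density


random.seed(1)
raw = (random.gauss(0.0, 1.0) for _ in range(2000))
sample = [x for x in raw if -3.5 <= x <= 3.5]  # keep points inside the partition
density = fuzzy_histogram(sample, lo=-3.5, hi=3.5, n_sets=7)
print(round(density(0.0), 3), round(density(2.0), 3))
```

Unlike a crisp histogram, the resulting estimate is continuous and piecewise linear, and it still integrates to 1 over the partitioned interval.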

44

Crisp vs. fuzzy histogram

[Figure: membership values (0 to 1.2) of the fuzzy sets A1 to A7 on the interval from -3.5 to 3.5, together with the pdf of a normal distribution, its crisp histogram, and its fuzzy histogram.]

45

PFS: fuzzy histogram model

Fuzz IEEE 2013, Hyderabad India 45

46

Fuzzy histogram model


47

Probabilistic Mamdani systems

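Rule $R_q$: If $\mathbf{x}$ is $A_q$ then
  $y$ is $C_1$ with $\Pr(C_1 \mid A_q)$ and ...
  and $y$ is $C_j$ with $\Pr(C_j \mid A_q)$ and ...
  and $y$ is $C_J$ with $\Pr(C_J \mid A_q)$

Reasoning: $E(y \mid \mathbf{x}) = \sum_{q=1}^{Q}\sum_{j=1}^{J} \beta_q(\mathbf{x})\,\Pr(C_j \mid A_q)\,z_j$

where $z_j$ is the centroid of fuzzy consequent set $C_j$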
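The expected output of a probabilistic Mamdani system is a double weighted sum: over rules (weights $\beta_q$) and over consequents (weights $\Pr(C_j \mid A_q)$). In the sketch below, $\beta_q$ is taken to be the antecedent membership normalized over all rules, and the memberships, probability table, and centroids are made-up numbers.

```python
def expected_output(memberships, cond_probs, centroids):
    """E(y|x) = sum_q sum_j beta_q(x) * Pr(C_j|A_q) * z_j.

    memberships[q] : antecedent membership of the input in A_q
    cond_probs[q][j]: Pr(C_j | A_q), each row summing to 1
    centroids[j]   : centroid z_j of consequent fuzzy set C_j
    """
    total = sum(memberships)
    betas = [mu / total for mu in memberships]  # normalized degrees of fulfillment
    return sum(
        beta * sum(prob * z for prob, z in zip(row, centroids))
        for beta, row in zip(betas, cond_probs)
    )


# Hypothetical system with Q = 2 rules and J = 3 consequent sets.
mu_x = [0.25, 0.75]
cond_probs = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]]
centroids = [-1.0, 0.0, 1.0]

print(expected_output(mu_x, cond_probs, centroids))
```

Scaling all memberships by a common factor leaves the result unchanged, because only the normalized weights enter the sum.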

48

Probabilistic fuzzy output model

49

Probabilistic TS systems

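Zero-order probabilistic Takagi-Sugeno

Rule $R_q$: If $\mathbf{x}$ is $A_q$ then
  $y = y_1$ with $\Pr(y_1 \mid A_q)$ and
  $y = y_2$ with $\Pr(y_2 \mid A_q)$ and ...
  and $y = y_J$ with $\Pr(y_J \mid A_q)$

In essence, consequents are crisp sets centered around $y_j$

Reasoning: $E(y \mid \mathbf{x}) = \sum_{q=1}^{Q}\sum_{j=1}^{J} \beta_q(\mathbf{x})\,\Pr(y_j \mid A_q)\,y_j$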

50

Relation to deterministic FSs

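•Zero-order Takagi-Sugeno system

Takagi-Sugeno reasoning: $y^* = \sum_{q=1}^{Q} \beta_q(\mathbf{x})\,c_q$

c.f. $E(y \mid \mathbf{x}) = \sum_{q=1}^{Q}\sum_{j=1}^{J} \beta_q(\mathbf{x})\,\Pr(y_j \mid A_q)\,z_j$

Select $c_q = \sum_{j=1}^{J} \Pr(y_j \mid A_q)\,z_j$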
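The equivalence can be checked numerically: choosing each crisp consequent as $c_q = \sum_j \Pr(y_j \mid A_q)\,z_j$ makes the zero-order Takagi-Sugeno output equal to the conditional expectation of the probabilistic system. The weights, probabilities, and $z_j$ values below are hypothetical.

```python
def ts_output(betas, consequents):
    """Zero-order Takagi-Sugeno output: y* = sum_q beta_q * c_q."""
    return sum(beta * c for beta, c in zip(betas, consequents))


def pfs_expected_output(betas, cond_probs, z):
    """Probabilistic fuzzy system: E(y|x) = sum_q sum_j beta_q * Pr(y_j|A_q) * z_j."""
    return sum(
        beta * prob * z_j
        for beta, row in zip(betas, cond_probs)
        for prob, z_j in zip(row, z)
    )


# Hypothetical normalized rule weights, conditional probabilities, and z_j.
betas = [0.25, 0.75]
cond_probs = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]]
z = [-1.0, 0.0, 1.0]

# Crisp consequents chosen as the conditional means of each rule.
c = [sum(prob * z_j for prob, z_j in zip(row, z)) for row in cond_probs]

print(c, ts_output(betas, c), pfs_expected_output(betas, cond_probs, z))
```

The deterministic system reproduces only the conditional mean; the probabilistic system additionally retains the full conditional distribution over the consequents.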

51

Probabilistic fuzzy systems: summary

• Essentially a fuzzy system that estimates a probability density function (p.d.f.), i.e. the fuzzy system approximates a p.d.f.
• Usually the p.d.f. is conditional on the input
• Linguistic information is coded in fuzzy rules
• Combine linguistic uncertainty with probabilistic uncertainty
• Different types of fuzzy systems can be extended to the PFS equivalent (e.g. Mamdani fuzzy systems, Takagi-Sugeno fuzzy systems)

52

PFS design

• Identifying mental world vs. observed world (van den Eijkel 1999)

• Mental world: linguistic descriptions, fuzzy conceptualization, experts’ knowledge

• Observed world: data measurements, probability density functions, optimal consequent parameters

• Optimal design given a mental world: application of conditional probability measures for fuzzy events

• Optimal design given an observed world: nonlinear optimization techniques

53

Parameter determination

• Sequential method
  • Part 1: MF parameters
  • Part 2: Probability parameters

• Maximum likelihood method
  • Part 1 – Initialization: sequential method
  • Part 2 – Optimization: MF parameters and probability parameters

MF – Membership function

54

Sequential method

Part 1: Finding the membership function (MF) parameters

• Each antecedent $A_j$ is a fuzzy set defined by a membership function

• E.g. – Gaussian MF parameters:
  o v – center of MF
  o σ – width of MF

• FCM (fuzzy c-means) clustering

55

MF determination

•For the first part of the sequential method, well-known techniques from fuzzy modeling can be applied
  • Fuzzy clustering in input-output product space
  • Fuzzy clustering in input and output space
  • Expert-driven design
  • Similarity-based rule-base simplification
  • Feature selection
  • Heuristic approaches
  • Etc.

56

Sequential method


Part 2: Finding the probability parameters - Pr(ωc|Aj)

Set the parameters Pr(ωc|Aj) equal to estimates of the conditional probabilities - conditional probability estimation

57

Estimation of probability parameters

• Conditional probabilities Pr(Cj | Aq) can be assessed directly by using the definition of the probability of joint events:

$\Pr(C_j \mid A_q) = \dfrac{\Pr(A_q, C_j)}{\Pr(A_q)} \approx \dfrac{\sum_k \mu_{A_q}(\mathbf{x}_k)\,\mu_{C_j}(y_k)}{\sum_k \mu_{A_q}(\mathbf{x}_k)}$

• This method does not provide maximum likelihood estimates of the probability parameters.
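The direct estimate of the conditional probabilities is a ratio of membership sums, $\sum_k \mu_{A_q}(\mathbf{x}_k)\,\mu_{C_j}(y_k) / \sum_k \mu_{A_q}(\mathbf{x}_k)$. The sketch below assumes the membership grades of each data point have already been computed; the three data points and the function name are hypothetical.

```python
def estimate_conditional_probs(mu_a, mu_c):
    """Estimate Pr(C_j | A_q) from membership grades.

    mu_a[k][q]: membership of input x_k in antecedent set A_q
    mu_c[k][j]: membership of output y_k in consequent set C_j
    Returns table[q][j] = sum_k mu_a[k][q] * mu_c[k][j] / sum_k mu_a[k][q].
    """
    n_rules, n_consequents = len(mu_a[0]), len(mu_c[0])
    table = []
    for q in range(n_rules):
        denom = sum(row[q] for row in mu_a)
        table.append([
            sum(a[q] * c[j] for a, c in zip(mu_a, mu_c)) / denom
            for j in range(n_consequents)
        ])
    return table


# Three hypothetical data points, two antecedent sets, two consequent sets.
mu_a = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
mu_c = [[0.9, 0.1], [0.4, 0.6], [0.0, 1.0]]

for row in estimate_conditional_probs(mu_a, mu_c):
    print([round(p, 3) for p in row], round(sum(row), 3))
```

When the consequent sets form a well-formed fuzzy partition, each row of the table sums to 1, so it can be used directly as the probability parameters of a rule.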

58

Maximum likelihood method

Part 2: Optimization of vj, σj and Pr(ωc|Aj)

• Likelihood of a data set

• Minimization of the negative log-likelihood

• Optimize the parameters vj, σj and Pr(ωc|Aj) so that they minimize the error function

• Constrained optimization problem (probability parameters Pr(ωc|Aj) must satisfy summation conditions)

59

Maximum likelihood method

Unconstrained minimization of vj, σj and ujc

• The constrained optimization problem is turned into an unconstrained optimization problem by using ujc

• A gradient descent optimization algorithm is used to minimize the objective function in an online fashion: the available classification examples are processed one by one and updates are performed after each sample
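The slides do not spell out how the auxiliary parameters ujc are mapped to probabilities. One common choice, used here purely as an assumption, is a softmax transformation: any real-valued ujc then yields probabilities that are positive and sum to 1 for each rule, so the gradient descent can update ujc freely. The function name is hypothetical.

```python
import math


def probs_from_u(u_row):
    """Map unconstrained parameters u_j1..u_jC to probabilities Pr(w_c | A_j).

    Assumed softmax transformation: exp(u_jc) / sum_c' exp(u_jc').
    The maximum is subtracted first for numerical stability.
    """
    shift = max(u_row)
    exps = [math.exp(u - shift) for u in u_row]
    total = sum(exps)
    return [e / total for e in exps]


# Any real vector gives a valid probability vector.
print(probs_from_u([0.0, 0.0, 0.0]))
print(probs_from_u([2.0, -1.0, 0.5]))
```

With this reparameterization the summation conditions hold by construction, which is what allows a constrained problem to be treated as an unconstrained one.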

60


Experimental comparison (1)

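• Use Gaussian membership functions: $\mu_{A_q}(\mathbf{x}) = \exp\left(-\sum_{l=1}^{d} \dfrac{(x_l - c_{ql})^2}{\sigma_{ql}^2}\right)$

• The centers $c_{ql}$ are determined using fuzzy c-means clustering

• The widths $\sigma_{ql}$ are set equal to $\sigma_{ql} = \min_{q' \neq q} \lVert \mathbf{c}_q - \mathbf{c}_{q'} \rVert$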
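The width heuristic can be sketched as follows, assuming the cluster centers are already available (for example from fuzzy c-means, which is not implemented here) and assuming the Gaussian form exp(-Σ_l (x_l - c_ql)² / σ_ql²) with one common width per rule. The centers and function names are hypothetical.

```python
import math


def nearest_centre_widths(centres):
    """Width of each rule = Euclidean distance to the nearest other centre."""
    return [
        min(math.dist(c, other) for j, other in enumerate(centres) if j != i)
        for i, c in enumerate(centres)
    ]


def gaussian_membership(x, centre, width):
    """Assumed Gaussian MF: exp(-sum_l ((x_l - c_l) / width)^2)."""
    return math.exp(-sum(((xl - cl) / width) ** 2 for xl, cl in zip(x, centre)))


# Hypothetical cluster centres in a two-dimensional input space.
centres = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
widths = nearest_centre_widths(centres)
print(widths)

x = (1.0, 1.0)
print([round(gaussian_membership(x, c, w), 4) for c, w in zip(centres, widths)])
```

Tying the width to the nearest neighbouring center keeps adjacent membership functions overlapping without letting a single rule dominate the whole input space.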

61

Experimental comparison (2)

• Misclassification rates

• Calculated using ten-fold cross-validation
• Standard deviations reported within parentheses

                      Wisconsin breast cancer   Wine
Sequential method     0.261 (0.036)             0.034 (0.048)
Maximum likelihood    0.029 (0.021)             0.023 (0.041)

62

Future research directions

• New estimation methods for the model parameters
  • Joint estimation
  • Information-theory based techniques
  • Better optimization methods

• Interaction between linguistic knowledge and data-driven estimation
• Optimizing model complexity, model simplification
• Interpretability of probabilistic fuzzy models
• Linguistic descriptions of probability density functions
• Equivalence to other systems: e.g. fuzzy Markov models
• Density estimation using more complex models as rule consequents: e.g. fuzzy GARCH models
• New applications

63

Agenda

•Probabilistic/statistical uncertainty
•Fuzzy uncertainty
•Fuzzy systems
•Probabilistic fuzzy theory
•Probabilistic fuzzy systems
•Applications
•Conclusions

64

Applications

•Next time!

65

Agenda

•Probabilistic/statistical uncertainty
•Fuzzy uncertainty
•Fuzzy systems
•Probabilistic fuzzy theory
•Probabilistic fuzzy systems
•Applications
•Conclusions

66

Concluding remarks

•Probabilistic fuzzy systems combine linguistic uncertainty and probabilistic uncertainty

•Very useful in applications where a probabilistic model (pdf estimation) has to be conditioned (or constrained) by linguistic information

•Good parameter estimation methods exist and the added value of these models has been demonstrated in various applications

67

Conclusions

•Fuzzy models usually show smooth non-linear behavior
•If certain measures are taken, fuzzy models are interpretable

•Fuzzy models can be induced from
  • expert knowledge (grid selection and definition of fuzzy rules): this expert-driven approach was very successful in control theory
  • a data set: the data-driven approach
•The accuracy-interpretability dilemma is prominent!
•The bias-variance dilemma is prominent!
