SOFT COMPUTING TECHNIQUES FOR STATISTICAL DATABASES Miroslav Hudec INFOSTAT – Bratislava MSIS 2009


Page 1

SOFT COMPUTING TECHNIQUES FOR

STATISTICAL DATABASES

Miroslav Hudec

INFOSTAT – Bratislava

MSIS 2009

Page 2

Introduction

• Soft computing (by fuzzy logic)

• Database query (SQL - fuzzy)

  • Case study

• Data classification (usual - fuzzy)

  • Case study

• Conclusion

Page 3

Soft computing

The essential property of soft computing (SC) is that it “softens” hard computing (HC) techniques in order to cope with imprecision, ambiguity and uncertainty.

HC uses two-valued logic (e.g. an element either satisfies the criterion or does not).

Fuzzy logic, as a part of SC, uses many-valued logic (e.g. an element can partly satisfy the criterion).

Computing with words is inspired by the human capability to perform a wide variety of tasks without exact measurements and computations. (Flexible database query. Interesting for statistical IS?)

Page 4

Database queries (SQL)

select * from Table
where attribute_p > P and attribute_r < R.

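[Figure: two-valued logic. The plane of attribute_p (horizontal axis) and attribute_r (vertical axis), with the thresholds P and R bounding the selected records.]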
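To make the sharp boundary of the two-valued query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `t`, the row labels and all numeric values are hypothetical and are not taken from the slides; only the shape of the WHERE clause follows the query above.

```python
import sqlite3


def crisp_select(rows, p_threshold, r_threshold):
    """Run the two-valued query on an in-memory table and return matching names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, attribute_p REAL, attribute_r REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT name FROM t WHERE attribute_p > ? AND attribute_r < ? ORDER BY name",
        (p_threshold, r_threshold),
    )
    names = [row[0] for row in cur.fetchall()]
    conn.close()
    return names


# Hypothetical records: "b" and "c" miss one threshold by only 0.1.
rows = [("a", 120.0, 40.0), ("b", 99.9, 40.0), ("c", 150.0, 50.1)]
print(crisp_select(rows, 100.0, 50.0))  # ['a']
```

Records "b" and "c" are rejected outright although they almost satisfy the criterion, which is the behaviour the fuzzy approach is meant to soften.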

Page 5

SQL and fuzzy queries

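SQL conditions: >=, <=, =

Fuzzy conditions (many-valued logic): Big, Small, About

[Figure: membership functions of the three fuzzy sets over the attribute axis, each taking values between 0 and 1: µ(B) for Big, with parameters Ld and Lp; µ(A) for About, with parameters Ld, Lp, Lq and Lg; µ(S) for Small, with parameters Lp and Lg.]

$$\text{WHERE} \; \bigotimes_{i=1}^{n} \left( a_i \circ L_{ix} \right)$$

where $\otimes$ is either and or or, and

$$a_i \circ L_{ix} = \begin{cases} a_i > L_{id}, & a_i \text{ is Big} \\ a_i < L_{ig}, & a_i \text{ is Small} \\ a_i > L_{id} \text{ and } a_i < L_{ig}, & a_i \text{ is About} \end{cases}$$

Logical operators and, or:

Two-valued logic: 1 and 1 = 1, 0 and 1 = 0; a single function defines each of the and and or operators.

Many-valued logic: 0.7 and 0.358 = ?

minimum: $\min_{i=1,\dots,n} \mu_i(a_i)$, which gives 0.358

product: $\prod_{i=1}^{n} \mu_i(a_i)$, which gives 0.2506

For {0,1} logic, minimum and product both become the ordinary and operator.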
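The Big, Small and About fuzzy sets and the two candidate and operators can be sketched in a few lines of Python. The slide gives only the parameter names and a plot, so the piecewise-linear shapes, the function names and the assumed ordering Ld < Lp <= Lq < Lg for About are assumptions of this sketch.

```python
import math


def mu_big(x, ld, lp):
    """Big: 0 up to ld, 1 from lp on, linear in between (assumed shape)."""
    if x <= ld:
        return 0.0
    if x >= lp:
        return 1.0
    return (x - ld) / (lp - ld)


def mu_small(x, lp, lg):
    """Small: 1 up to lp, 0 from lg on, linear in between (assumed shape)."""
    if x <= lp:
        return 1.0
    if x >= lg:
        return 0.0
    return (lg - x) / (lg - lp)


def mu_about(x, ld, lp, lq, lg):
    """About: trapezoid rising on [ld, lp], flat on [lp, lq], falling on [lq, lg]."""
    return min(mu_big(x, ld, lp), mu_small(x, lq, lg))


def and_min(degrees):
    """Fuzzy and as the minimum of the membership degrees."""
    return min(degrees)


def and_product(degrees):
    """Fuzzy and as the product of the membership degrees."""
    return math.prod(degrees)


print(and_min([0.7, 0.358]))                # 0.358
print(round(and_product([0.7, 0.358]), 4))  # 0.2506
print(and_min([1, 0]), and_product([1, 1])) # 0 1
```

The last line checks the remark about {0,1} logic: on the values 0 and 1, both functions behave like the ordinary and operator.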

Page 6

Case study

select district, roads, area from T
where roads is Big and area is Small

The length of roads indicator is represented by the “Big value” fuzzy set with parameters Ld = 200 km and Lp = 300 km. The area of district attribute is described by the “Small value” fuzzy set with parameters Lp = 450 km² and Lg = 650 km².
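A rough sketch of how this query could be evaluated follows. The fuzzy set parameters are those stated above; the district names and their values are invented for illustration, the membership functions are assumed to be piecewise linear, and the minimum is used for and (the product would be the other option).

```python
def mu_big(x, ld, lp):
    """Big: 0 up to ld, 1 from lp on, linear in between (assumed shape)."""
    if x <= ld:
        return 0.0
    if x >= lp:
        return 1.0
    return (x - ld) / (lp - ld)


def mu_small(x, lp, lg):
    """Small: 1 up to lp, 0 from lg on, linear in between (assumed shape)."""
    if x <= lp:
        return 1.0
    if x >= lg:
        return 0.0
    return (lg - x) / (lg - lp)


def fuzzy_select(districts):
    """Return (district, roads, area, degree) for every partly satisfied row."""
    results = []
    for name, roads_km, area_km2 in districts:
        degree = min(
            mu_big(roads_km, 200.0, 300.0),    # roads is Big: Ld=200, Lp=300
            mu_small(area_km2, 450.0, 650.0),  # area is Small: Lp=450, Lg=650
        )
        if degree > 0.0:
            results.append((name, roads_km, area_km2, degree))
    return sorted(results, key=lambda row: row[3], reverse=True)


# Hypothetical districts, not taken from the slides.
districts = [
    ("D1", 320.0, 400.0),
    ("D2", 275.0, 500.0),
    ("D3", 150.0, 300.0),
    ("D4", 310.0, 700.0),
]
for row in fuzzy_select(districts):
    print(row)
```

Here D1 satisfies the query fully and D2 satisfies it with degree 0.75, while a crisp query with the bounds roads > 300 and area < 450 would return D1 only.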

Page 7

Solution

If plain SQL were used, this additional valuable information would remain hidden.

Page 8

Discussion

To obtain a very fine gradation with SQL, an infinite number of queries would have to be used. With fuzzy queries, one query is sufficient.
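One way to read this claim: every satisfaction level alpha in (0, 1] of a fuzzy condition corresponds to one crisp SQL condition with its own bound, so a continuous gradation needs a continuum of crisp queries. The sketch below checks that correspondence for a linear Big set; the parameter values reuse Ld = 200 and Lp = 300 from the case study, while the sample data, the function names and the linear shape are assumptions.

```python
def mu_big(x, ld, lp):
    """Big: 0 up to ld, 1 from lp on, linear in between (assumed shape)."""
    if x <= ld:
        return 0.0
    if x >= lp:
        return 1.0
    return (x - ld) / (lp - ld)


def select_by_degree(values, alpha, ld, lp):
    """Keep values whose fuzzy degree reaches alpha (one fuzzy query, then a cut)."""
    return [v for v in values if mu_big(v, ld, lp) >= alpha]


def select_by_bound(values, alpha, ld, lp):
    """Equivalent crisp condition `value >= bound` for one alpha in (0, 1]."""
    bound = ld + alpha * (lp - ld)
    return [v for v in values if v >= bound]


# Hypothetical road lengths in km.
roads = [180.0, 210.0, 240.0, 270.0, 300.0]
for alpha in (0.25, 0.5, 0.75, 1.0):
    fuzzy_cut = select_by_degree(roads, alpha, 200.0, 300.0)
    crisp = select_by_bound(roads, alpha, 200.0, 300.0)
    print(alpha, fuzzy_cut, fuzzy_cut == crisp)
```

The fuzzy query is evaluated once and yields all levels at the same time, whereas each crisp query reproduces only one of them.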

The advantages of this approach for users are as follows:
• the connection to a database (connection string) and data access (SQL command) do not have to be modified;
• users do not need to learn a new query language;
• the interface supports (quasi) natural language;
• the obtained data are presented in a similar way as from SQL, but with additional valuable information;
• users see data “around the corner” (coloured areas in the table) and can take into account data that may be of interest.

Page 9

Data classification: two-valued logic

How to solve this problem without additional calculation?

Approximate reasoning and fuzzy logic

[Figure: crisp classification of objects T1 to T4 into classes C1 to C4. Axes: Roads [km] with marks 0, 67 and 124; Snow [days] with marks 0, 30 and 60.]

Page 10

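Data classification: many-valued logic

The same GLC:

classify_into [class Cx]
select [attributes]
from [tables, views]
$$\text{WHERE} \; \bigotimes_{k=1}^{K} \bigotimes_{i=1}^{n} \left( a_i \circ L_{ix} \right)$$

[Figure: classes C1 to C4 over the indicators I1 and I2, with axis marks 25 and 35 on one indicator and 60 and 75 on the other.]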

Page 11

Case study

In this case study, municipalities are classified according to the percentage of needs for winter road maintenance.

[Figure: membership functions µ(x) of the fuzzy sets S (Small) and B (Big) for two indicators: P1, length of roads [km], with axis marks 60 and 75; P2, number of days with snow, with axis marks 25 and 35.]

This example contains the following fuzzy rules:
If Road is Small and Snow is Small Then Maintenance is Small;
If Road is Small and Snow is Big Then Maintenance is Medium;
If Road is Big and Snow is Small Then Maintenance is Medium;
If Road is Big and Snow is Big Then Maintenance is Big.

Values shown on the slide: (0.1), (0.5), (0.9).

Page 12

Case study

classify_into S
select * from Table
where roads is Small and snow is Small;

classify_into M
select * from Table
where (roads is Small and snow is Big) or (roads is Big and snow is Small);

classify_into B
select * from Table
where roads is Big and snow is Big.
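The three classification queries can be sketched as one Python function that returns a membership degree for each class. Several points are assumptions of this sketch rather than statements from the slides: the breakpoints 60 and 75 km for roads and 25 and 35 days for snow are read from the axis marks of the case study figure, the Small and Big sets are taken as piecewise linear, and is evaluated with the minimum, and or is evaluated with the maximum.

```python
def mu_small(x, full, zero):
    """Small: 1 up to `full`, 0 from `zero` on, linear in between (assumed shape)."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def mu_big(x, zero, full):
    """Big: 0 up to `zero`, 1 from `full` on, linear in between (assumed shape)."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


ROADS = (60.0, 75.0)  # km, assumed from the axis marks of indicator P1
SNOW = (25.0, 35.0)   # days, assumed from the axis marks of indicator P2


def classify(roads_km, snow_days):
    """Degrees of membership in the maintenance classes S, M and B."""
    r_small, r_big = mu_small(roads_km, *ROADS), mu_big(roads_km, *ROADS)
    s_small, s_big = mu_small(snow_days, *SNOW), mu_big(snow_days, *SNOW)
    return {
        "S": min(r_small, s_small),
        "M": max(min(r_small, s_big), min(r_big, s_small)),  # or as maximum (assumption)
        "B": min(r_big, s_big),
    }


# Hypothetical municipality close to both class boundaries.
print(classify(66.0, 33.0))
# Hypothetical municipality far from the boundaries: belongs to S only.
print(classify(50.0, 20.0))
```

A municipality near the boundaries belongs partly to several classes at once, whereas a two-valued classification would assign it to exactly one.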

Page 13

Case study

If classical classification were used, this additional valuable information would remain hidden (Softer classification between objects T1-T4).

Page 14

Implementation

[Diagram components: Knowledge base, IF-THEN rules, Fuzzy SQL, Database, Selection, Classification (classes Ci, Cj), User.]

Page 15

SQL and fuzzy approach

SQL queries are useful when a clean and exact boundary between selected and non-selected data is required (faster, with fewer calculations).

Fuzzy queries provide flexibility for the definition of query and inclusion of records that almost meet the query criterion (more operations, more information).

User decides which type of query is better for each task.

[Diagram: tools based on HC and tools based on SC, both connected to the Database.]

Page 16

Conclusion

This approach allows users of statistical information systems to use their approximate reasoning when working with data.

When users work with the usual software tools, they have to translate their many-valued logical thinking (approximate reasoning) into two-valued computer logic.

This fuzzy approach supports work with linguistic expressions on the client side, yet it does not require any modification of relational databases.

Page 17

Thank you for your attention