
Feature Grouping-Based Fuzzy-Rough Feature Selection

Richard Jensen
Neil Mac Parthaláin
Chris Cornelis

Outline

• Motivation/Feature Selection (FS)

• Rough set theory

• Fuzzy-rough feature selection

• Feature grouping

• Experimentation

The problem: too much data

• The amount of data is growing exponentially
  – Staggering 4300% annual growth in global data

• Therefore, there is a need for FS and other data reduction methods
  – Curse of dimensionality: a problem for machine learning techniques

• The complexity of the problem is vast
  – e.g. the powerset of features for FS

Feature selection

• Remove features that are:
  – Noisy
  – Irrelevant
  – Misleading

• Task: find a subset that
  – Optimises a measure of subset goodness
  – Has small/minimal cardinality

• In rough set theory, this is a search for reducts
  – Much research in this area
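
For reference, a reduct is usually defined through the dependency degree γ; this is the standard rough set definition, added here purely as background:

  \gamma_R(D) = \gamma_C(D) \quad \text{and} \quad \gamma_{R'}(D) < \gamma_C(D) \;\; \text{for every proper subset } R' \subset R

i.e. a reduct R preserves the predictive power of the full conditional feature set C with respect to the decision D, and no proper subset of R does.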

Rough set theory (RST)

• For a subset of features P

[Figure: a set X approximated via the equivalence classes [x]_P, showing its lower and upper approximations]
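
The standard Pawlak definitions behind this picture (background, not new material from the slides) are:

  \underline{P}X = \{ x \in \mathbb{U} \mid [x]_P \subseteq X \}
  \overline{P}X  = \{ x \in \mathbb{U} \mid [x]_P \cap X \neq \emptyset \}

so the lower approximation gathers the objects whose equivalence class lies wholly inside X, and the upper approximation those whose class overlaps X.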

Rough set feature selection

• By considering more features, concepts become easier to define…

Rough set theory

• Problems:
  – Rough set methods (usually) require data discretization beforehand
  – Extensions require thresholds, e.g. tolerance rough sets
  – Also no flexibility in approximations
    • E.g. objects either belong fully to the lower (or upper) approximation, or not at all

Fuzzy-rough sets

• Extends rough set theory
  – Use of fuzzy tolerance instead of crisp equivalence
  – Approximations are fuzzified
  – Collapses to traditional RST when data is crisp

• New definitions:

Fuzzy upper approximation:

Fuzzy lower approximation:
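
The formulas themselves appear not to have survived extraction; the commonly used implicator/t-norm definitions (in the style of Radzikowska and Kerre, as typically used in fuzzy-rough feature selection) are shown here for reference. The symbols I (a fuzzy implicator), T (a t-norm) and R_P (the fuzzy tolerance relation induced by the feature subset P) are the usual choices, and it is an assumption that these are the exact variants on the original slides:

  \mu_{R_P \downarrow X}(x) = \inf_{y \in \mathbb{U}} I\big(\mu_{R_P}(x, y),\, \mu_X(y)\big)    (fuzzy lower approximation)
  \mu_{R_P \uparrow X}(x) = \sup_{y \in \mathbb{U}} T\big(\mu_{R_P}(x, y),\, \mu_X(y)\big)    (fuzzy upper approximation)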

Fuzzy-rough feature selection

• Search for reducts
  – Minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts

• Traditional approach
  – Greedy hill-climbing algorithm used
  – Other search techniques have been applied (e.g. PSO)

• Problems
  – Complexity is problematic for large data (e.g. over several thousand features)
  – No explicit handling of redundancy
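
A minimal sketch (in Python, not the authors' implementation) of the greedy hill-climbing search described above; the evaluate argument stands in for the fuzzy-rough dependency measure and is assumed to return a score in [0, 1], with 1 meaning the fuzzy lower approximations are fully preserved:

def greedy_hill_climb(features, evaluate):
    """Greedy forward search for a (super)reduct.

    features : iterable of candidate feature identifiers
    evaluate : maps a set of features to a quality score in [0, 1]
               (e.g. the fuzzy-rough dependency degree; assumed here)
    """
    subset, best = set(), evaluate(set())
    remaining = set(features)
    while remaining and best < 1.0:
        # score each remaining feature when added to the current subset
        gains = {f: evaluate(subset | {f}) for f in remaining}
        candidate = max(gains, key=gains.get)
        if gains[candidate] <= best:
            break  # no single addition improves the measure
        subset.add(candidate)
        remaining.remove(candidate)
        best = gains[candidate]
    return subset, best

With a toy measure such as evaluate = lambda s: len(s & {"a", "b"}) / 2, the search returns ({"a", "b"}, 1.0).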

Feature grouping

• Idea: don’t need to consider all features
  – Those that are highly correlated with each other carry the same or similar information
  – Therefore, we can group these, and work on a group-by-group basis

• This paper: based on greedy hill-climbing
  – Group-then-rank approach

• Relevancy and redundancy handled by
  – Correlation: similar features grouped together
  – Internal ranking (correlation with decision feature)
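
A rough sketch of this grouping step, for illustration only; corr, corr_with_decision and tau are assumed names for the feature-feature correlation measure, the correlation with the decision feature, and the grouping threshold, not identifiers from the authors' code:

def form_groups(features, corr, corr_with_decision, tau):
    """Group features by pairwise correlation, then rank each group
    internally by correlation with the decision feature."""
    groups = []
    for f in features:
        # each feature seeds a group of everything sufficiently
        # correlated with it (groups may therefore overlap)
        members = [g for g in features if g == f or corr(f, g) >= tau]
        # internal ranking: most relevant to the decision first
        members.sort(key=corr_with_decision, reverse=True)
        groups.append(members)
    return groups

Note that groups may overlap, which matches the later worked example where f1 appears in several groups.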

[Diagram: an example group F1 of mutually correlated features (f1, f2, f4, f7, f8, f9)]

Forming groups of features

[Diagram: from the data, correlations between features are calculated (correlation measure, threshold τ); features are placed into groups F1, F2, F3, ..., Fn to capture redundancy, and each group is internally ranked (#1, #2, ..., #m) by correlation with the decision feature to capture relevancy]

Selecting features

[Diagram: feature subset search and selection loop, in which a search mechanism proposes candidate subsets, subset evaluation scores them, and the selected subset(s) are returned]

Fuzzy-rough feature grouping

Initial experimentation

• Setup:
  – 10 datasets (9-2557 features)
  – 3 classifiers
  – Stratified 5 x 10-fold cross-validation

• Performance evaluation in terms of
  – Subset size
  – Classification accuracy
  – Execution time

• FRFG compared with
  – Traditional greedy hill-climber (GHC)
  – GA & PSO (200 generations, population size: 40)
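
A sketch of how the stated evaluation protocol (stratified 5 x 10-fold cross-validation) could be reproduced; the classifier and scoring shown here are placeholders, not the authors' exact setup:

from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_classifier(clf, X, y):
    # stratified 10-fold cross-validation repeated 5 times, as in the slides
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()

# e.g. IBk (k=3) roughly corresponds to a 3-nearest-neighbour classifier:
# mean_acc, std_acc = evaluate_classifier(KNeighborsClassifier(n_neighbors=3), X_selected, y)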

Results: average subset size

Results: classification accuracy

[Charts: classification accuracy for the JRip and IBk (k=3) classifiers]

Results: execution times (s)

Conclusion

FRFG
– Motivation: reduce computational overhead; improve consideration of redundancy
– Group-then-rank approach
– Parameter τ determines granularity of grouping
– Weka implementation available: http://bit.ly/1oic2xM

Future work
– Automatic determination of parameter τ
– Experimentation using much larger data, other FS methods, etc.
– Clustering of features
– Unsupervised selection?

Thank you!

Simple example

Dataset of six features

After initialisation, the following groups are formed

Within each group, rank determines relevance: e.g. f4 more relevant than f3

Ordering of groups

Greedy hill-climber

[Diagram: the groups formed, e.g. F1 = {f1, f4, f3}, F2 = {f2}, F3 = {f1, f3}, F4 = {f5, f4, f1}, etc.]

F = {F4, F1, F3, F5, F2, F6}

Simple example...

• First group to be considered: F4
  – Feature f4 is preferable over others
  – So, add this to the current (initially empty) subset R
  – Evaluate M(R + {f4}):
    • If better score than the current best evaluation, store f4
    • Current best evaluation = M(R + {f4})
  – Set of features which appear in F4 ({f1, f4, f5}):
    • Add to the set Avoids

• Next feature group with elements that do not appear in Avoids: F1

And so on… (a code sketch of this selection loop follows the group diagrams below)

[Diagram: group F4 = {f5, f4, f1} and group F1 = {f1, f4, f3}, the first two groups considered in the example]
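
Read together with the worked example above, the selection loop could be sketched as follows; the group ordering, the helper names, and the exact point at which a group's members are added to Avoids are assumptions made for illustration, not the authors' code:

def frfg_select(ordered_groups, evaluate):
    """Group-then-rank selection following the worked example.

    ordered_groups : feature groups in the order they are considered
                     (F4, F1, F3, ... in the example), each one ranked
                     internally from most to least relevant
    evaluate       : subset evaluation measure M (e.g. the fuzzy-rough
                     measure); higher scores are better
    """
    selected = set()   # the subset R built up by the search
    avoids = set()     # features already covered by earlier groups
    best = evaluate(selected)
    for group in ordered_groups:
        # only groups containing something not yet covered are considered
        candidates = [f for f in group if f not in avoids]
        if not candidates:
            continue
        top = candidates[0]                  # highest-ranked uncovered feature
        score = evaluate(selected | {top})
        if score > best:
            selected.add(top)
            best = score
        avoids.update(group)                 # the whole group is now covered
    return selected, best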