Feature Grouping-Based Fuzzy-Rough Feature Selection
Richard Jensen, Neil Mac Parthaláin, Chris Cornelis
Outline
• Motivation/Feature Selection (FS)
• Rough set theory
• Fuzzy-rough feature selection
• Feature grouping
• Experimentation
The problem: too much data
• The amount of data is growing exponentially
– A staggering 4300% annual growth in global data
• Therefore, there is a need for FS and other data reduction methods
– Curse of dimensionality: a problem for machine learning techniques
• The complexity of the problem is vast
– e.g. the powerset of features for FS
Feature selection
• Remove features that are:
– Noisy
– Irrelevant
– Misleading
• Task: find a subset that
– Optimises a measure of subset goodness
– Has small/minimal cardinality
• In rough set theory, this is a search for reducts
– Much research in this area
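Formally, in standard RST a reduct R of the full feature set C is defined via the dependency degree $\gamma_P(D) = |POS_P(D)| / |U|$:

$$\gamma_R(D) = \gamma_C(D) \quad \text{and} \quad \gamma_S(D) < \gamma_C(D) \ \text{ for every } S \subset R$$

i.e. R preserves the positive region of the decision D, and no proper subset of R does.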
Rough set theory (RST)
• For a subset of features P, the data is partitioned into equivalence classes [x]_P
[Figure: a concept X with its lower approximation (equivalence classes fully contained in X) and its upper approximation (equivalence classes that overlap X)]
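In symbols, the classical (Pawlak) approximations depicted above are:

$$\underline{P}X = \{\, x \mid [x]_P \subseteq X \,\} \qquad \overline{P}X = \{\, x \mid [x]_P \cap X \neq \emptyset \,\}$$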
Rough set feature selection
• By considering more features, concepts become easier to define…
Rough set theory
• Problems:
– Rough set methods (usually) require data discretization beforehand
– Extensions require thresholds, e.g. tolerance rough sets
– Also no flexibility in approximations
• E.g. objects either belong fully to the lower (or upper) approximation, or not at all
Fuzzy-rough sets
• Extends rough set theory
– Use of fuzzy tolerance instead of crisp equivalence
– Approximations are fuzzified
– Collapses to traditional RST when data is crisp
• New definitions:

Fuzzy upper approximation: $\mu_{R_P \uparrow X}(x) = \sup_{y \in U} \mathcal{T}(\mu_{R_P}(x,y),\, \mu_X(y))$

Fuzzy lower approximation: $\mu_{R_P \downarrow X}(x) = \inf_{y \in U} \mathcal{I}(\mu_{R_P}(x,y),\, \mu_X(y))$

where $\mathcal{T}$ is a t-norm, $\mathcal{I}$ is a fuzzy implicator, and $\mu_{R_P}(x,y)$ is the fuzzy tolerance between objects x and y under feature subset P.
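As a concrete illustration, a minimal NumPy sketch of the fuzzy lower approximation, assuming the Łukasiewicz implicator I(a, b) = min(1, 1 − a + b) and a precomputed fuzzy tolerance matrix; both choices are assumptions for illustration, not fixed by the slides:

    import numpy as np

    def fuzzy_lower_approx(sim, concept):
        """mu_{R_P down X}(x) = inf_y I(mu_{R_P}(x, y), mu_X(y)).

        sim     : (n, n) fuzzy tolerance matrix for feature subset P
        concept : (n,) membership of each object to the concept X
        Uses the Lukasiewicz implicator I(a, b) = min(1, 1 - a + b).
        """
        imp = np.minimum(1.0, 1.0 - sim + concept[None, :])
        return imp.min(axis=1)  # infimum over all objects y

The upper approximation is analogous: apply a t-norm (e.g. the Łukasiewicz t-norm max(0, a + b − 1)) and take the supremum (max) over y.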
Fuzzy-rough feature selection
• Search for reducts
– Minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts
• Traditional approach
– Greedy hill-climbing algorithm used (sketched below)
– Other search techniques have been applied (e.g. PSO)
• Problems
– Complexity is problematic for large data (e.g. over several thousand features)
– No explicit handling of redundancy
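A sketch of that greedy hill-climbing search; `evaluate` is a stand-in for the subset measure, e.g. the fuzzy-rough dependency degree $\gamma'_P = \frac{1}{|U|}\sum_{x \in U} \mu_{POS_P}(x)$, and its exact form is assumed here:

    def greedy_hill_climb(features, evaluate):
        """Repeatedly add the single feature that most improves the
        subset evaluation; stop when no feature yields an improvement.
        `evaluate` maps a feature subset to a score in [0, 1]."""
        subset = frozenset()
        best = evaluate(subset)
        while True:
            gain, pick = 0.0, None
            for f in features - subset:
                score = evaluate(subset | {f})
                if score - best > gain:
                    gain, pick = score - best, f
            if pick is None:       # no feature improves the measure
                return subset
            subset |= {pick}
            best += gain

Each iteration evaluates every remaining feature against the measure, which is exactly the per-step cost that FRFG aims to cut by working group by group.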
Feature grouping
• Idea: we don't need to consider all features
– Those that are highly correlated with each other carry the same or similar information
– Therefore, we can group these and work on a group-by-group basis
• This paper: based on greedy hill-climbing
– Group-then-rank approach
• Relevancy and redundancy handled by
– Correlation: similar features grouped together (redundancy)
– Internal ranking: correlation with the decision feature (relevancy)
[Figure: an example group F1 of mutually-correlated features f1, f2, f4, f7, f8, f9]
Forming groups of features
[Figure: pipeline for forming groups — from the data, pairwise correlations are calculated using a correlation measure with threshold τ; mutually-correlated features are grouped together (redundancy), and each group F1, F2, F3, …, Fn is internally ranked #1…#m by correlation with the decision feature (relevancy), yielding internally-ranked feature groups; a code sketch follows]
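A minimal sketch of this grouping stage, assuming Pearson correlation as the correlation measure and ordering groups by their best member's relevance; the slides fix neither choice, and `form_groups` with its arguments is an illustrative name:

    import numpy as np

    def form_groups(X, y, tau):
        """Seed one group per feature: all features whose absolute
        pairwise correlation with the seed is at least tau (redundancy),
        internally ranked by correlation with the decision y (relevancy).
        Assumes no constant-valued features (correlation is undefined)."""
        n_feat = X.shape[1]
        corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-feature
        relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1]
                            for j in range(n_feat)])  # feature-decision
        groups = []
        for j in range(n_feat):
            members = [k for k in range(n_feat) if corr[j, k] >= tau]
            members.sort(key=lambda k: relevance[k], reverse=True)
            groups.append(members)
        # order the groups themselves by their top member's relevance
        groups.sort(key=lambda g: relevance[g[0]], reverse=True)
        return groups

Note that groups may overlap (a feature can appear in several), which matches the worked example later in the slides.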
Selecting features
[Figure: feature subset search and selection — a search mechanism proposes candidate subsets, subset evaluation scores them against the data, and the selected subset(s) are output; in fuzzy-rough feature grouping, the search is group-based and the evaluation is fuzzy-rough]
Initial experimentation
• Setup:
– 10 datasets (9-2557 features)
– 3 classifiers
– Stratified 5 x 10-fold cross-validation
• Performance evaluation in terms of
– Subset size
– Classification accuracy
– Execution time
• FRFG compared with
– Traditional greedy hill-climber (GHC)
– GA & PSO (200 generations, population size: 40)
Results: average subset size
Results: classification accuracy
[Charts: per-dataset accuracy for the JRip and IBk (k=3) classifiers]
Results: execution times (s)
Conclusion
• FRFG
– Motivation: reduce computational overhead; improve consideration of redundancy
– Group-then-rank approach
– Parameter τ determines the granularity of grouping
– Weka implementation available: http://bit.ly/1oic2xM
• Future work
– Automatic determination of parameter τ
– Experimentation using much larger data, other FS methods, etc.
– Clustering of features
– Unsupervised selection?
Thank you!
Simple example
• Dataset of six features
• After initialisation, the following groups are formed:
F1 = {f1, f4, f3}, F2 = {f2}, F3 = {f1, f3}, F4 = {f5, f4, f1}, etc.
• Within each group, rank determines relevance: e.g. f4 is more relevant than f3
• Ordering of groups for the greedy hill-climber: F = {F4, F1, F3, F5, F2, F6}
Simple example...
• First group to be considered: F4
– Feature f4 is preferable over the others
– So, add this to the current (initially empty) subset R
– Evaluate M(R + {f4}):
• If better score than the current best evaluation, store f4
• Current best evaluation = M(R + {f4})
– Set of features which appear in F4: {f1, f4, f5}
• Add these to the set Avoids
• Next feature group with elements that do not appear in Avoids: F1
• And so on… (a code sketch of the full loop follows below)
[Figure: groups F4 = {f5, f4, f1} and F1 = {f1, f4, f3}, the first two groups considered]
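Putting the worked example together, a sketch of the FRFG selection loop; the Avoids logic follows the slides, while the choice of candidate within a group and tie-breaking details are assumptions:

    def frfg_select(ordered_groups, evaluate):
        """Group-then-rank selection, as in the worked example.

        ordered_groups : groups in overall order (e.g. F4, F1, F3, ...),
                         each a list of features ranked by relevance
        evaluate       : subset measure M, higher is better
        """
        selected, avoids = [], set()
        best = evaluate(selected)
        for group in ordered_groups:
            fresh = [f for f in group if f not in avoids]
            if not fresh:            # every member already considered
                continue
            candidate = fresh[0]     # highest-ranked unseen feature
            score = evaluate(selected + [candidate])
            if score > best:         # keep it only if M improves
                selected.append(candidate)
                best = score
            avoids.update(group)     # mark the whole group as considered
        return selected

In the example, the first group F4 contributes f4, its members {f1, f4, f5} enter Avoids, and F1 is the next group still holding an unseen feature (f3), matching the walkthrough above.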