
Page 1: Feature Selection for Regression Problems

Feature Selection for Regression Problems

M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, P. E. Pintelas

Educational Software Development Laboratory and Computers and Applications Laboratory

Department of Mathematics, University of Patras, Greece

Page 2: Feature Selection for Regression Problems

Scope

To investigate the most suitable wrapper feature selection technique (if any) for some well-known regression algorithms.

Page 3: Feature Selection for Regression Problems

Contents

• Introduction
• Feature selection techniques
• Wrapper algorithms
• Experiments
• Conclusions

Page 4: Feature Selection for Regression Problems

Introduction

What is the feature subset selection problem?

• Occurs prior to the learning (induction) algorithm.
• Selection of the relevant features (variables) that influence the prediction of the learning algorithm.

Page 5: Feature Selection for Regression Problems

Why is feature selection important?

May improve performance of learning algorithm

The learning algorithm may not scale up to the size of the full feature set, either in sample size or running time

Allows us to better understand the domain

Cheaper to collect a reduced set of features

Page 6: Feature Selection for Regression Problems

Characterising features

Generally, features are characterised as:

Relevant: features which have an influence on the output and whose role cannot be assumed by the rest.

Irrelevant: features not having any influence on the output, and whose values are generated at random for each example.

Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy).
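As a small illustration of these three categories (not part of the original slides), the toy data below contains one relevant feature, one irrelevant feature drawn at random, and one redundant feature that is simply a rescaled copy of the relevant one; all names and numbers are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

x_relevant = rng.normal(size=n)        # influences the output
x_irrelevant = rng.normal(size=n)      # generated at random, no influence on the output
x_redundant = 2.0 * x_relevant         # can take the role of x_relevant

y = 3.0 * x_relevant + rng.normal(scale=0.1, size=n)

# Correlation with the target flags the relevant feature, but it cannot by itself
# tell the relevant feature apart from its redundant copy.
for name, x in [("relevant", x_relevant),
                ("irrelevant", x_irrelevant),
                ("redundant", x_redundant)]:
    print(name, round(float(np.corrcoef(x, y)[0, 1]), 3))
```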

Page 7: Feature Selection for Regression Problems

Typical Feature Selection – First step

[Diagram: Original feature set → (1) Generation → subset → (2) Evaluation → goodness of the subset → (3) Stopping criterion (No: loop back; Yes) → (4) Validation]

Generates a subset of features for evaluation.

Can start with:
• no features
• all features
• a random subset of features

Page 8: Feature Selection for Regression Problems

Typical Feature Selection – Second step

[Same four-step diagram as on the previous slide]

Measures the goodness of the subset and compares it with the previous best subset.

If it is found to be better, it replaces the previous best subset.

Page 9: Feature Selection for Regression Problems

Typical Feature Selection – Third step

[Same four-step diagram as on the previous slides]

Based on the generation procedure:
• a pre-defined number of features
• a pre-defined number of iterations

Based on the evaluation function:
• whether addition or deletion of a feature does not produce a better subset
• whether an optimal subset, according to some evaluation function, is achieved

Page 10: Feature Selection for Regression Problems

Typical Feature Selection - Fourth step

[Same four-step diagram as on the previous slides]

This step is basically not part of the feature selection process itself: the results are compared with already established results, or with results from competing feature selection methods.
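Putting the four steps together, the following is a minimal sketch of the generic wrapper-style loop, assuming scikit-learn-style regressors and using the mean cross-validated score as the "goodness" measure; the function names and the candidate-generation interface are our own illustration, not the original study's code.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def goodness(learner, X, y, subset, cv=10):
    # Step 2 (Evaluation): cross-validated score of the learner on the feature subset.
    if not subset:
        return -np.inf
    return cross_val_score(clone(learner), X[:, sorted(subset)], y, cv=cv).mean()

def select_features(learner, X, y, generate_candidates, max_iter=30):
    # Step 1 (Generation) proposes candidate subsets; Step 3 (Stopping criterion) ends
    # the search after a fixed number of iterations or when no candidate improves.
    best, best_score = frozenset(), -np.inf
    for _ in range(max_iter):
        candidates = generate_candidates(best, X.shape[1])
        if not candidates:
            break
        scored = [(goodness(learner, X, y, s), s) for s in candidates]
        score, subset = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        best, best_score = subset, score
    return best, best_score

# Step 4 (Validation) happens outside the search: the chosen subset is compared with
# established results or with competing feature selection methods on held-out data.
```

Passing, for instance, `generate_candidates = lambda current, n: [current | {f} for f in range(n) if f not in current]` turns this loop into forward selection.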

Page 11: Feature Selection for Regression Problems

Categorization of feature selection techniques

Feature selection methods are grouped into two broad groups:

• Filter methods take the set of data (features), attempt to trim some, and then hand this new set of features to the learning algorithm.

• Wrapper methods use the accuracy of the learning algorithm as the evaluation measure.
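The practical difference is in what gets scored. A minimal sketch, assuming scikit-learn: the filter ranks features from the data alone (here with a univariate F-test), while the wrapper scores a candidate subset by the learning algorithm's own cross-validated performance; the choice of statistic and of scoring scheme is ours.

```python
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.model_selection import cross_val_score

def filter_select(X, y, k):
    # Filter: score each feature from the data alone (univariate F-test) and keep
    # the top k, without consulting any learning algorithm.
    scores, _ = f_regression(X, y)
    return np.argsort(scores)[::-1][:k]

def wrapper_goodness(learner, X, y, subset):
    # Wrapper: the goodness of a subset is the learning algorithm's own
    # cross-validated performance when trained on that subset.
    return cross_val_score(learner, X[:, list(subset)], y, cv=10).mean()
```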

Page 12: Feature Selection for Regression Problems

Argument for wrapper methods

The estimated accuracy of the learning algorithm is the best available heuristic for measuring the value of features.

Different learning algorithms may perform better with different feature sets, even if they are using the same training set.

Page 13: Feature Selection for Regression Problems

Wrapper selection algorithms (1)

The simplest method is forward selection (FS). It starts with the empty set and greedily adds features one at a time (without backtracking).

Backward stepwise selection (BS) starts with all features in the feature set and greedily removes them one at a time (without backtracking).
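A minimal sketch of both greedy searches, assuming scikit-learn-style regressors and the mean cross-validated score as the evaluation measure; the helper names are ours.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def cv_score(learner, X, y, subset):
    if not subset:
        return -np.inf
    return cross_val_score(clone(learner), X[:, sorted(subset)], y, cv=10).mean()

def forward_selection(learner, X, y):
    # FS: start from the empty set and greedily add the single most helpful feature,
    # stopping when no addition improves the score (no backtracking).
    selected, best = set(), -np.inf
    while True:
        moves = [(cv_score(learner, X, y, selected | {f}), f)
                 for f in range(X.shape[1]) if f not in selected]
        if not moves:
            break
        score, f = max(moves, key=lambda pair: pair[0])
        if score <= best:
            break
        selected.add(f)
        best = score
    return selected

def backward_selection(learner, X, y):
    # BS: start from the full feature set and greedily remove the feature whose
    # removal helps most, stopping when no removal improves the score.
    selected = set(range(X.shape[1]))
    best = cv_score(learner, X, y, selected)
    while len(selected) > 1:
        moves = [(cv_score(learner, X, y, selected - {f}), f) for f in selected]
        score, f = max(moves, key=lambda pair: pair[0])
        if score <= best:
            break
        selected.remove(f)
        best = score
    return selected
```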

Page 14: Feature Selection for Regression Problems

Wrapper selection algorithms (2)

The Best First search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and is expanded in the same manner by adding single features (with backtracking). Best First search can be combined with forward (BFFS) or backward (BFBS) selection.

Genetic algorithm selection. A solution is typically a fixed length binary string representing a feature subset—the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process where each successive generation is produced by applying genetic operators such as crossover and mutation to the members of the current generation.
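A minimal sketch of the bit-string encoding and of one generation of the genetic search, with the learner's cross-validated score as the fitness; the population handling, selection scheme, and mutation rate are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(learner, X, y, bits):
    # A candidate subset is a fixed-length 0/1 string: bits[i] == 1 means feature i is kept.
    if bits.sum() == 0:
        return -np.inf
    return cross_val_score(clone(learner), X[:, bits.astype(bool)], y, cv=10).mean()

def next_generation(population, fitnesses, p_mutate=0.02):
    # One iteration: keep the fitter half as parents, then build children by
    # one-point crossover and bit-flip mutation.
    n, length = population.shape
    parents = population[np.argsort(fitnesses)[::-1][: n // 2]]
    children = []
    while len(children) < n:
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, length)
        child = np.concatenate([a[:cut], b[cut:]])
        flips = rng.random(length) < p_mutate
        children.append(np.where(flips, 1 - child, child))
    return np.array(children)
```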

Page 15: Feature Selection for Regression Problems

Experiments

For the purpose of the present study, we used four well-known learning algorithms (RepTree, M5rules, K*, SMOreg), the feature selection algorithms presented above, and 12 datasets from the UCI repository.

Page 16: Feature Selection for Regression Problems

Methodology of experiments

The whole training set was divided into ten mutually exclusive and equal-sized subsets, and for each subset the learner was trained on the union of all the other subsets.

The best features are selected according to the feature selection algorithm and the performance of the subset is measured by how well it predicts the values of the test instances.

This cross-validation procedure was run 10 times for each algorithm, and the average value over the 10 cross-validations was calculated.
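A minimal sketch of this protocol, assuming scikit-learn-style regressors and taking the correlation coefficient between predicted and actual values on the test folds as the performance measure; the WEKA learners used in the study (RepTree, M5rules, K*, SMOreg) are not reproduced here, so any regressor stands in for them.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

def repeated_cv_correlation(learner, X, y, repeats=10, folds=10):
    # Average, over `repeats` runs, of the correlation coefficient between the
    # pooled out-of-fold predictions and the true target values.
    run_scores = []
    for run in range(repeats):
        kfold = KFold(n_splits=folds, shuffle=True, random_state=run)
        predicted, actual = [], []
        for train, test in kfold.split(X):
            model = clone(learner).fit(X[train], y[train])
            predicted.append(model.predict(X[test]))
            actual.append(y[test])
        run_scores.append(np.corrcoef(np.concatenate(predicted),
                                      np.concatenate(actual))[0, 1])
    return float(np.mean(run_scores))
```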

Page 17: Feature Selection for Regression Problems

Experiment with regression tree - RepTree

BS is a slightly better feature selection method (on average) than the others for RepTree.

Average correlation coefficient:

WS     FS     BS     BFFS   BFBS   GS
0.72   0.73   0.74   0.73   0.73   0.73

Page 18: Feature Selection for Regression Problems

Experiment with rule learner - M5rules

Average correlation coefficient:

WS     FS     BS     BFFS   BFBS   GS
0.79   0.82   0.83   0.82   0.83   0.83

BS, BFBS and GS are the best feature selection methods (on average) for the M5rules learner.

Page 19: Feature Selection for Regression Problems

Experiment with instance-based learner - K*

Average correlation coefficient:

WS     FS     BS     BFFS   BFBS   GS
0.71   0.79   0.80   0.79   0.80   0.79

BS and BFBS are the best feature selection methods (on average) for the K* algorithm.

Page 20: Feature Selection for Regression Problems

Experiment with SMOreg

Average correlation coefficient:

WS     FS     BS     BFFS   BFBS   GS
0.80   0.81   0.81   0.81   0.81   0.81

All feature selection methods give similar results for SMOreg.

Page 21: Feature Selection for Regression Problems

Conclusions

None of the described feature selection algorithms is superior to the others on all data sets for a specific learning algorithm.

More generally, none of the described feature selection algorithms is superior to the others on all data sets.

Backward selection strategies are very inefficient for large-scale datasets, which may have hundreds of original features.

Forward selection wrapper methods are less able to improve the performance of a given learner, but they are less expensive in terms of computational effort and use fewer features for the induction.

Genetic selection typically requires a large number of evaluations to reach a minimum.

Page 22: Feature Selection for Regression Problems

Future Work

We will use a light filter feature selection procedure as a preprocessing step in order to reduce the computational cost of the wrapping procedure without harming accuracy.
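A minimal sketch of that idea, assuming scikit-learn: a cheap univariate filter first keeps the top-k features, and the wrapper search (forward selection here) then runs only over the reduced set; the choice of filter statistic, of k, and of search strategy are illustrative assumptions, not part of the slides.

```python
import numpy as np
from sklearn.base import clone
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score

def filter_then_wrap(learner, X, y, k=20):
    # Light filter step: keep the k features with the highest univariate F-score,
    # so that the expensive wrapper search only explores a reduced space.
    keep = SelectKBest(f_regression, k=min(k, X.shape[1])).fit(X, y).get_support(indices=True)

    # Wrapper step: forward selection restricted to the pre-filtered features,
    # scored by the learner's cross-validated performance.
    selected, best = [], -np.inf
    while True:
        moves = [(cross_val_score(clone(learner), X[:, selected + [f]], y, cv=10).mean(), f)
                 for f in keep if f not in selected]
        if not moves:
            break
        score, f = max(moves, key=lambda pair: pair[0])
        if score <= best:
            break
        selected.append(int(f))
        best = score
    return selected
```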