Filtering noisy continuous labeled examples


Page 1: Filtering noisy continuous labeled examples

Filtering noisy continuous labeled examples

José Ramón Quevedo

María Dolores García

Elena Montañés

Artificial Intelligence Centre, Oviedo University (Spain)

IBERAMIA 2002

Page 2: Filtering noisy continuous labeled examples

Index

1. Introduction

2. The Principle

3. The Algorithm

4. Divide and Conquer

5. Experimentation

6. Conclusions

Page 3: Filtering noisy continuous labeled examples

Introduction

[Slide diagram: a plot of f(x) against x containing good examples and noisy examples; the Noisy Continuous Labeled Examples Filter removes the noisy ones before the data set reaches the Machine Learning System.]

Page 4: Filtering noisy continuous labeled examples

The Principle

The examples whose neighbour is a noisy one would improve their k-cnn errors if the noisy example were removed.

If removing an example improves the k-cnn errors of the rest of the examples in the data set, that example is probably a noisy one.


[Slide figure: "Example: Step Function" — a step function taking values 0 and 1, illustrating the 2-cnn errors of examples e3 and e6, and the errors recomputed without e3 and without e6.]
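To make the principle concrete, here is a minimal Python sketch of a leave-one-out k-cnn error. It assumes k-cnnError(e) is the absolute difference between the continuous label of e and the mean label of its k nearest neighbours among the other examples; the paper's exact definition may differ, and the step-function data below is only a toy stand-in for the slide's figure.

import numpy as np

def k_cnn_error(i, X, y, k=2):
    """Leave-one-out k nearest neighbour error of example i (assumed definition)."""
    dists = np.linalg.norm(X - X[i], axis=1)   # distance from example i to every example
    dists[i] = np.inf                          # never pick the example itself
    neighbours = np.argsort(dists)[:k]         # the k closest other examples
    return abs(y[i] - y[neighbours].mean())    # deviation from the neighbours' mean label

# Toy step function with a noisy label at index 2: its 2-cnn error stands out,
# and removing it would improve the 2-cnn errors of its neighbours.
X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 1, 0, 1, 1, 1, 1], dtype=float)
print([round(k_cnn_error(i, X, y), 2) for i in range(8)])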

Page 5: Filtering noisy continuous labeled examples

The Algorithm


[Slide diagram: the Original Data Set passes through the Noisy Continuous Labeled Examples Filter to yield the Filtered Data Set.]

for each example e, sorted by decreasing k-cnnError {
    if (k-cnnError(e) <= MinError) break;
    if (prudentNoisy(DS - {e}))
        DS = DS - {e};
    else
        break;
}
return DS;

Here MinError is a threshold computed from the k-cnnError(e_i) values of all the examples e_i in the data set, and prudentNoisy compares the error E_N(e) of each example that remains (the RemovedExamples excluded) with the error E'_N(e) obtained after the candidate removal, accepting the removal only when those errors do not get worse.
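Below is a hedged Python rendering of the loop above, reusing the hypothetical k_cnn_error from the sketch in the Principle section. MinError and prudentNoisy are stand-ins, not the paper's formulas: here MinError is taken as the mean k-cnn error of the original data set, and prudentNoisy accepts a removal only if the mean k-cnn error of the remaining examples does not increase.

import numpy as np

def mean_error(X, y, k=2):
    return float(np.mean([k_cnn_error(i, X, y, k) for i in range(len(y))]))

def nclef(X, y, k=2):
    """NCLEF sketch: greedily drop the worst example while it is prudent to do so."""
    X, y = X.copy(), y.copy()
    min_error = mean_error(X, y, k)            # stand-in for the slide's MinError
    while len(y) > k + 1:
        errors = [k_cnn_error(i, X, y, k) for i in range(len(y))]
        worst = int(np.argmax(errors))         # example with the largest k-cnn error
        if errors[worst] <= min_error:         # no example is suspicious enough
            break
        X2, y2 = np.delete(X, worst, axis=0), np.delete(y, worst)
        if mean_error(X2, y2, k) <= mean_error(X, y, k):   # stand-in for prudentNoisy
            X, y = X2, y2                      # the removal helped: commit it
        else:
            break                              # the removal hurt: stop filtering
    return X, y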

Page 6: Filtering noisy continuous labeled examples

Divide and Conquer


Problem: high computational cost, O(NCLEF) = N · O(LOO(k-cnn)) = O(A²·N³)

Solution: use Divide & Conquer over the data set:
• Split: choose an example, using the ||·||₁ norm, that splits the data set into two subsets with a similar number of examples (see the sketch below)
• Stop: a constant threshold M, the maximum number of examples per subset

Result: O(NCLEFDC) = O(N·log(N) + N·A²)

[Slide diagram: D&C splits the Original Data Set into Data SubSets, NCLEF filters each one, and the Filtered SubSets merge into the Filtered Data Set; the whole pipeline is NCLEFDC.]
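Under the same assumptions, here is a sketch of the divide-and-conquer wrapper: subsets of at most M examples are filtered directly with the nclef sketch above, while larger sets are split at the median of the attribute with the widest range, a simple stand-in for the slide's ||·||₁-based split into two subsets of similar size.

import numpy as np

def nclefdc(X, y, k=2, M=64):
    """NCLEFDC sketch: recursively split, filter each subset, merge the results."""
    if len(y) <= M:                            # Stop: constant threshold M
        return nclef(X, y, k)
    attr = int(np.argmax(X.max(axis=0) - X.min(axis=0)))   # widest attribute (assumption)
    order = np.argsort(X[:, attr])             # split at the median of that attribute
    half = len(order) // 2                     # two subsets of similar size
    Xl, yl = nclefdc(X[order[:half]], y[order[:half]], k, M)
    Xr, yr = nclefdc(X[order[half:]], y[order[half:]], k, M)
    return np.vstack([Xl, Xr]), np.concatenate([yl, yr])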

Page 7: Filtering noisy continuous labeled examples


Experimental Results

• Experimentation data sets: Torgo's Repository
• 29 continuous data sets
• High diversity in the number of examples and attributes
• Experiment: cross validation with 10 folds


Relative error of each learner, without filtering and with NCLEFDC, at 0% and 10% label noise:

                             Cubist 1.10   m5'   RT4.1
without filter ( 0% noise)       46%       42%    48%
NCLEFDC        ( 0% noise)       47%       42%    47%
without filter (10% noise)       54%       54%    60%
NCLEFDC        (10% noise)       51%       49%    55%

Page 8: Filtering noisy continuous labeled examples

Conclusions

• NCLEFDC:
– Filters noisy continuous examples
– O(NCLEFDC) = O(N·log₂(N) + N·A²)

• Use of NCLEFDC:
– Without noisy examples: similar error
– With noise: significant improvement

• Future work:
– Filter noisy discrete examples
– Filter noisy examples and noisy attributes at the same time

Page 9: Filtering noisy continuous labeled examples

Filtering noisy continuous labeled examples

José Ramón Quevedo

María Dolores García

Elena Montañés

Artificial Intelligence Centre, Oviedo University (Spain)

IBERAMIA 2002