Filtering noisy continuous labeled examples
José Ramón Quevedo
María Dolores García
Elena Montañés
Artificial Intelligence Centre, Oviedo University (Spain)
IBERAMIA 2002
Index
1. Introduction
2. The Principle
3. The Algorithm
4. Divide and Conquer
5. Experimentation
6. Conclusions
Introduction
[Figure: a data set f(x) containing good examples and noisy examples passes through the Noisy Continuous Labeled Examples Filter before reaching the Machine Learning System.]
The Principle

The examples whose neighbour is a noisy one would improve their k-cnn errors if the noisy example were removed. Therefore, if removing an example improves the k-cnn errors of the rest of the examples in the data set, that example is probably a noisy one.
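The principle can be sketched as below. This is a minimal illustration, assuming the k-cnn error of an example is the absolute difference between its label and the mean label of its k nearest neighbours (the paper's exact definition may differ); `k_cnn_error` and the step-function data are illustrative, not the authors' code.

```python
import numpy as np

def k_cnn_error(xs, ys, i, k=2):
    """k-cnn error of example i: |label - mean label of its k nearest
    neighbours| (assumed definition, for illustration)."""
    d = np.abs(xs - xs[i])
    d[i] = np.inf                       # an example is not its own neighbour
    nbrs = np.argsort(d)[:k]
    return abs(ys[i] - ys[nbrs].mean())

# Step-function data with one noisy label: e1 should read 0.0 but reads 1.0.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 1.0])
noisy = 1

# Total k-cnn error of the *other* examples, with and without the suspect.
before = sum(k_cnn_error(xs, ys, j) for j in range(len(xs)) if j != noisy)
keep = np.arange(len(xs)) != noisy
after = sum(k_cnn_error(xs[keep], ys[keep], j) for j in range(len(xs) - 1))

print(before, after)   # 2.0 -> 1.5: removal improves the rest of the set
```

Because the total error of the remaining examples drops (here from 2.0 to 1.5), the principle flags e1 as probably noisy.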
Example: Step Function

[Figure: 2-cnn errors on a step function with labels 0 and 1; the panels compare the full data set against the data set without e3 and without e6.]
The Algorithm
The Noisy Continuous Labeled Examples Filter turns the Original Data Set into the Filtered Data Set:

    for each example e, sorted by decreasing k-cnnError {
        if (k-cnnError(e) <= MinError) break;
        if (prudentNoisy(DS - {e}))
            DS = DS - {e};
        else
            break;
    }
    return DS;

MinError is computed from the k-cnnError(e_i) of all examples e_i in the data set, and prudentNoisy(E) holds while the number of RemovedExamples stays bounded relative to N.
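A runnable sketch of the filter loop follows. The concrete choices for MinError (the mean initial k-cnn error) and for the prudence bound (remove at most a fixed fraction of the set, `max_removed_frac`) are assumptions standing in for the slide's thresholds; `nclef` and `k_cnn_error` are illustrative names, not the authors' code.

```python
import numpy as np

def k_cnn_error(xs, ys, i, k=2):
    """Assumed k-cnn error: |label - mean label of k nearest neighbours|."""
    d = np.abs(xs - xs[i])
    d[i] = np.inf
    return abs(ys[i] - ys[np.argsort(d)[:k]].mean())

def nclef(xs, ys, k=2, max_removed_frac=0.2):
    """Greedy filter loop: repeatedly drop the example with the largest
    k-cnn error while removal is 'prudent' and improves the rest of the
    set. MinError and the prudence bound are assumed stand-ins for the
    slide's thresholds."""
    xs, ys = np.asarray(xs, float).copy(), np.asarray(ys, float).copy()
    n0 = len(xs)
    min_error = np.mean([k_cnn_error(xs, ys, i, k) for i in range(n0)])
    removed = 0
    while len(xs) > k + 1:
        errs = [k_cnn_error(xs, ys, i, k) for i in range(len(xs))]
        worst = int(np.argmax(errs))
        if errs[worst] <= min_error:                  # nothing stands out
            break
        if (removed + 1) / n0 > max_removed_frac:     # prudence bound
            break
        keep = np.arange(len(xs)) != worst
        before = sum(e for i, e in enumerate(errs) if i != worst)
        after = sum(k_cnn_error(xs[keep], ys[keep], i, k)
                    for i in range(len(xs) - 1))
        if after >= before:                           # removal does not help
            break
        xs, ys = xs[keep], ys[keep]
        removed += 1
    return xs, ys

# Step function with a noisy label at x = 1.0:
fx, fy = nclef(np.array([0., 1., 2., 3., 4., 5.]),
               np.array([0., 1., 0., 1., 1., 1.]))
print(fx.tolist())   # [0.0, 2.0, 3.0, 4.0, 5.0]: the noisy example is gone
```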
Divide and Conquer
Problem: High Computational Cost
O(NCLEF) = N · O(LOO(A-cnn)) = O(A²·N³)

Solution: use Divide & Conquer over the data set:
• Split: choose an example that, under the L1 norm (‖·‖₁), divides the data set into two parts with a similar number of examples
• Stop: constant threshold M, the maximum number of examples per part

Result: O(NCLEFDC) = O(N·log(N) + N·A²)
[Diagram: the Original Data Set is split by D&C into Data SubSets; NCLEF filters each SubSet; the Filtered SubSets are merged into the Filtered Data Set. The whole pipeline is NCLEFDC.]
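The split/filter/merge wrapper can be sketched as below. The median split on a single attribute is a simplification of the ‖·‖₁-based split described above, and `nclef_dc`, `filter_fn`, and `m` are illustrative names.

```python
import numpy as np

def nclef_dc(xs, ys, filter_fn, m=32):
    """Recursively split the data set until each part has at most m
    examples (the constant threshold M), run the base filter on each
    part, and merge the filtered parts. The median split stands in for
    the L1-norm split of the original algorithm."""
    if len(xs) <= m:                        # stop: constant threshold M
        return filter_fn(xs, ys)
    median = np.median(xs)
    left = xs <= median
    if left.all() or not left.any():        # degenerate split: stop here
        return filter_fn(xs, ys)
    xl, yl = nclef_dc(xs[left], ys[left], filter_fn, m)
    xr, yr = nclef_dc(xs[~left], ys[~left], filter_fn, m)
    return np.concatenate([xl, xr]), np.concatenate([yl, yr])

# With an identity "filter", the wrapper just partitions and re-merges:
xs = np.linspace(0.0, 1.0, 100)
ys = xs ** 2
fx, fy = nclef_dc(xs, ys, lambda a, b: (a, b), m=16)
print(len(fx))   # 100
```

Because each level of the recursion roughly halves the parts, the base filter only ever runs on sets of at most M examples, which is what brings the cost down to O(N·log(N) + N·A²).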
Experimental Results

• Experimentation data sets: Torgo's Repository
• 29 continuous data sets
• High diversity in the number of examples and attributes
• Experiment: cross-validation with 10 folds
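The evaluation protocol can be sketched as a 10-fold cross-validation loop. Everything concrete here is an assumption for illustration: `nn_predict` is a toy 1-nearest-neighbour regressor standing in for Cubist/m5'/RT4.1, and relative error is taken as the fold's absolute error divided by the error of always predicting the training mean.

```python
import numpy as np

def nn_predict(xtr, ytr, xte):
    """Toy 1-nearest-neighbour regressor (stand-in learner)."""
    return np.array([ytr[np.argmin(np.abs(xtr - x))] for x in xte])

def ten_fold_relative_error(xs, ys, predict_fn, folds=10, seed=0):
    """Average relative error over `folds` disjoint test splits; relative
    error = sum |y - prediction| over the fold, divided by the error of
    predicting the training mean (assumed definition)."""
    idx = np.random.default_rng(seed).permutation(len(xs))
    errs = []
    for part in np.array_split(idx, folds):
        test = np.zeros(len(xs), dtype=bool)
        test[part] = True
        preds = predict_fn(xs[~test], ys[~test], xs[test])
        baseline = np.abs(ys[test] - ys[~test].mean()).sum()
        errs.append(np.abs(ys[test] - preds).sum() / baseline)
    return float(np.mean(errs))

xs = np.linspace(0.0, 1.0, 50)
ys = xs ** 2
err = ten_fold_relative_error(xs, ys, nn_predict)
print(err)   # well below 1.0: 1-NN beats the mean baseline here
```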
Relative error by learner, without added noise (0%) and with 10% added noise:

                         Cubist 1.10    m5'    RT4.1
    without filter  (0%)     46%        42%     48%
    NCLEFDC         (0%)     47%        42%     47%
    without filter (10%)     54%        54%     60%
    NCLEFDC        (10%)     51%        49%     55%
Conclusions
• NCLEFDC:
  – Filters noisy continuous labeled examples
  – O(NCLEFDC) = O(N·log(N) + N·A²)
• Use of NCLEFDC:
  – Without noisy examples: similar error
  – With noise: significant improvement
• Future work:
  – Filter noisy discrete examples
  – Filter noisy examples and noisy attributes at the same time