AISTATS 2010 Active Learning Challenge: A Fast Active Learning Algorithm Based on Parzen Window Classification
L. Lan, H. Shi, Z. Wang, S. Vucetic (Temple University)


Slide 1 (title slide)

Slide 2
AISTATS 2010 Active Learning Challenge: A Fast Active Learning Algorithm Based on Parzen Window Classification
L. Lan, H. Shi, Z. Wang, S. Vucetic
Temple University

Slide 3: Introduction
Pool-based active learning
  • Data labeling is expensive.
  • Large amounts of unlabeled data are available at low cost.
  • The goal is to label as few of the unlabeled examples as possible while achieving accuracy as high as possible.
2010 Active Learning Challenge
  • Provided an opportunity for practitioners to evaluate active learning algorithms within an unbiased setup.
  • Data sets came from 6 different application domains.

Slide 4: Challenge Data Sets
Common properties
  • Binary classification
  • Class-imbalanced
Differences
  • Features
  • Concept

Data Set | Domain              | Feat. Type | Feat. Num. | Sparsity % | Missing % | Label  | Train Num. | Train pos:neg | Test Num.
A        | Handwriting Rec.    | mixed      | 92         | 79.02      | 0         | binary | 17535      | 1267:16268    | 17535
B        | Marketing           | mixed      | 250        | 46.89      | 25.76     | binary | 25000      | 2289:22711    | 25000
C        | Chemo-informatics   | mixed      | 851        | 8.6        | 0         | binary | 25720      | 2095:23625    | 25720
D        | Text classification | binary     | 12000      | 99.67      | 0         | binary | 10000      | 2519:7481     | 10000
E        | Embryology          | continuous | 154        | 0.04       | 0.0004    | binary | 32252      | 2912:29340    | 32252
F        | Ecology             | mixed      | 12         | 0          | 0         | binary | 67628      | 5194:62434    | 67628

Slide 5: Challenge Setup
Given 1 positive seed example, repeat:
  • Select which unlabeled examples to label.
  • Train a classifier.
  • Evaluate its accuracy (AUC: Area Under the ROC Curve).
The active learning algorithm as a whole is evaluated by the ALC (Area under the Learning Curve).

Slide 6: Algorithm Design Issues
Querying strategy
  • How many examples to label at each stage?
  • Which unlabeled examples to select?
Classification algorithm
  • Simple vs. powerful; easy to implement vs. involved.
Preprocessing and feature selection
  • Often the critical issue.

Slide 7: Components of Our Approach
  • Data preprocessing: normalization.
  • Feature selection filtering: Pearson correlation; Kruskal-Wallis test.
  • Regularized Parzen Window Classifier: parameter tuning by cross-validation.
  • Ensemble of classifiers: classifiers differ by the selected features.
  • Active learning strategy: uncertainty sampling + clustering-based random sampling.

Slide 8: Algorithm Details
Data preprocessing
  • Missing values: did not address this issue.
  • Normalization: mean = 0 and std = 1 for all non-binary features.
Feature selection filters
  • Pearson correlation test and Kruskal-Wallis test.
  • Calculated a p-value for each feature, then either selected the M features with the lowest p-values or selected all features with a p-value below 0.05.

Slide 9: Algorithm Details
Classification model: the Regularized Parzen Window Classifier (RPWC),

  P(y = +1 | x) = (Σ_{i: y_i = +1} K(x, x_i) + ε) / (Σ_i K(x, x_i) + 2ε)

where ε is the regularizing parameter (set to 10^-5 in our experiments) and K is the Gaussian kernel of the form

  K(x, x') = exp(-||x - x'||^2 / σ^2)

where σ represents the kernel size. RPWC is easy to implement and can learn highly nonlinear problems.

Slide 10: Algorithm Details
Classification model (continued): an ensemble of RPWC classifiers. The base classifiers differ in the features used:
  • all features
  • p-value of the Pearson correlation
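To make the RPWC of slide 9 concrete, here is a minimal NumPy sketch of a regularized Parzen window classifier. It is an illustration, not the authors' code: the function names (`gaussian_kernel`, `rpwc_predict`) are invented, and the regularization follows the standard form in which ε is added to the positive-class kernel sum and 2ε to the total.

```python
import numpy as np

def gaussian_kernel(X, X_train, sigma):
    """Gaussian kernel K(x, x') = exp(-||x - x'||^2 / sigma^2)."""
    # Squared Euclidean distance between every query and training point.
    d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def rpwc_predict(X, X_train, y_train, sigma=1.0, eps=1e-5):
    """Regularized Parzen window estimate of P(y = +1 | x).

    eps is the regularizing parameter (the slides use 1e-5); it keeps the
    estimate defined even when a query point is far from all labeled
    examples, pulling the score toward the uninformative value 0.5.
    """
    K = gaussian_kernel(X, X_train, sigma)
    pos = K[:, y_train == 1].sum(axis=1)   # kernel mass of positive neighbors
    total = K.sum(axis=1)                  # kernel mass of all neighbors
    return (pos + eps) / (total + 2 * eps)
```

With ε = 10^-5, a query point with negligible kernel mass from every labeled example scores essentially 0.5, which conveniently marks it as maximally uncertain for the querying step.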
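The querying strategy on slide 7 pairs uncertainty sampling with clustering-based random sampling. The sketch below shows one plausible reading of each half, assuming class-probability scores such as those produced by the RPWC of slide 9 and precomputed cluster assignments; the function names and the round-robin cluster scheme are illustrative assumptions, not details from the slides.

```python
import numpy as np

def uncertainty_sample(scores, n_queries):
    """Pick the unlabeled examples whose estimated P(y = +1 | x) is
    closest to 0.5, i.e. the ones the current model is least sure about."""
    uncertainty = np.abs(scores - 0.5)        # 0 means maximally uncertain
    return np.argsort(uncertainty)[:n_queries]

def clustered_random_sample(cluster_ids, n_queries, rng):
    """Random sampling spread over clusters (a simple stand-in for the
    slides' clustering-based random sampling): cycle over the clusters,
    drawing one not-yet-chosen member from each, so early queries cover
    the whole input space rather than a single dense region."""
    clusters = np.unique(cluster_ids)
    remaining = {c: list(np.flatnonzero(cluster_ids == c)) for c in clusters}
    picks = []
    while len(picks) < n_queries and any(remaining.values()):
        for c in clusters:
            if remaining[c] and len(picks) < n_queries:
                picks.append(remaining[c].pop(rng.integers(len(remaining[c]))))
    return np.array(picks)
```

Early in the challenge protocol, when the single positive seed makes probability estimates unreliable, cluster-spread random queries are the safer choice; once enough labels accumulate, uncertainty sampling focuses the labeling budget near the decision boundary.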