![Page 1: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/1.jpg)
Negative Selection Algorithms at GECCO 2005
7/22/2005
![Page 2: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/2.jpg)
AIS track of GECCO 2005
• 11 regular papers
– 5 “negative selection algorithm” related
– 3 “immune network model” related
– multi-agent simulation, gene library, antigenic search
• 2 posters
– Immune network model, clonal selection
![Page 3: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/3.jpg)
Papers on “Negative selection algorithms”
• Ji & Dasgupta, “Estimating the detector coverage in a negative selection algorithm”
• Gonzalez et al, “Discriminating and visualizing anomalies using negative selection algorithm and self-organizing maps”
• Stibor et al, “Is negative selection appropriate for anomaly detection?”
• Shapiro et al, “An evolutionary algorithm to generate hyper-ellipsoid detectors for negative selection”
• Hang et al, “Applying both positive and negative selection to supervised learning for anomaly detection”
![Page 4: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/4.jpg)
“Discriminating and visualizing anomalies using negative selection algorithm and self-organizing maps”
Main Idea:
• Combination of NS (negative selection) and SOM (self-organizing map)
• Visualize the anomalies
![Page 5: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/5.jpg)
Key feature
• Using negative selection to produce artificial anomalies instead of detectors
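A minimal sketch of this idea, assuming real-valued samples in the unit hypercube and plain rejection sampling against a fixed self radius. The function name, the radius value, and the sampling scheme are illustrative assumptions; the paper itself uses the RRNS algorithm, which this sketch does not reproduce.

```python
import math
import random

def generate_artificial_anomalies(self_samples, n_candidates, self_radius, seed=0):
    """Keep random candidates that lie outside the self radius of every self sample.

    Assumption: simple rejection sampling in [0, 1]^n, not the RRNS algorithm.
    """
    rng = random.Random(seed)
    dim = len(self_samples[0])
    anomalies = []
    for _ in range(n_candidates):
        candidate = [rng.random() for _ in range(dim)]
        # Negative selection step: discard any candidate that matches self.
        if all(math.dist(candidate, s) > self_radius for s in self_samples):
            anomalies.append(candidate)
    return anomalies

# Hypothetical self samples clustered around the centre of the unit square.
self_samples = [[0.5, 0.5], [0.55, 0.45], [0.45, 0.55]]
anomalies = generate_artificial_anomalies(self_samples, n_candidates=200, self_radius=0.2)
```

The surviving points are used as labelled "anomaly" training samples rather than as detectors.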
![Page 6: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/6.jpg)
SOM
• A type of neural network
• Captures the features of the input and provides a structural representation
• Output neurons are organized in a one- or two-dimensional lattice
• The weight vectors of these neurons represent prototypes (cluster centroids)
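A rough sketch of SOM training on a one-dimensional lattice, in plain Python. The learning-rate and neighbourhood schedules, the default parameters, and the function names are assumptions chosen for illustration, not the settings used in the paper or in SOM-PAK.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(samples, n_nodes, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D SOM; returns one weight vector (prototype) per node."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = radius0 or n_nodes / 2
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay (an assumption)
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the lattice: nodes near the BMU move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical data: two small clusters in the unit square.
samples = [[0.1, 0.1], [0.15, 0.1], [0.9, 0.9], [0.85, 0.95]]
weights = train_som(samples, n_nodes=4)
```

Each trained weight vector can then be read as a prototype for the inputs that map to that node.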
![Page 7: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/7.jpg)
Three phases of NS-SOM
![Page 8: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/8.jpg)
NS-SOM model
• “training SOM with only normal samples will produce a map that only reflect the structure of the self space, ignoring the non-self space”
• N-dimensional, real-valued
• During the second phase: if the input samples are labeled, … (moving to the third phase).
• The first phase is executed just once, but the second and third phases can be executed each time a new set of samples becomes available
• Visual representation by a 2-D grid corresponding to the network
![Page 9: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/9.jpg)
SOM output
• “A visual representation of the feature (self/non-self) space could be generated by drawing the 2-dimensional grid corresponding to the network, and assigning each node a different color depending on the category it represents (normal, unknown anomaly, or known anomaly).”
• “Two different SOM topologies were used with a rectangular output layer of 8×8 and 16×16 nodes.”
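The colouring described in the first quote can be sketched as a text grid. This assumes each node takes the majority category of the labelled samples mapped to it, which is an illustrative choice and not necessarily the paper's labelling rule; the symbols and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical one-character "colours" for the three categories.
SYMBOLS = {"normal": ".", "unknown anomaly": "?", "known anomaly": "!"}

def label_nodes(weights, samples, labels):
    """Give each node the majority label of the samples mapped to it (None if empty)."""
    votes = [Counter() for _ in weights]
    for x, lab in zip(samples, labels):
        bmu = min(range(len(weights)), key=lambda j: math.dist(weights[j], x))
        votes[bmu][lab] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def render(node_labels, rows, cols):
    """Draw the rows x cols grid, one symbol per node, blank for unlabelled nodes."""
    return "\n".join(
        "".join(SYMBOLS.get(node_labels[r * cols + c], " ") for c in range(cols))
        for r in range(rows)
    )

# Hypothetical 2x2 map whose prototypes sit at the corners of the unit square.
grid_weights = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
grid_samples = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.9]]
grid_labels = ["normal", "known anomaly", "unknown anomaly"]
node_labels = label_nodes(grid_weights, grid_samples, grid_labels)
picture = render(node_labels, rows=2, cols=2)
```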
![Page 10: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/10.jpg)
Output visualization
![Page 11: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/11.jpg)
• Implementation
– NS: RRNS algorithm by Gonzalez et al
– SOM: using the SOM-PAK package by Helsinki University of Technology http://www.cis.hut.fi/
• Experiments
– Iris data set
– Wisconsin Breast Cancer data set
![Page 12: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/12.jpg)
“Is negative selection appropriate for anomaly detection?”
• Problems in negative selection (specific schemes and applications)
• Comparison with SVM (Support Vector Machine): does it require examples of one class or of two classes?
![Page 13: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/13.jpg)
• General problem: candidates are generated by a simple random search
• Shape space <-> affinity
• “holes are necessary, to generalizing beyond training set”
– No holes: overfitting
– Too many holes: underfitting
![Page 14: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/14.jpg)
Criticism for binary representation
• “the hamming shape-space and the r-chunk matching rule only appropriate and applicable for anomaly detection problems for a small value of l (e.g. 0<l<32)”
– Totally based on Esponda et al’s analysis of the number of holes
* Although I want to focus on introducing rather than criticizing this work, the authors seem to confuse Hamming and r-chunk.
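To make the distinction concrete, here is a small sketch of the two notions for binary strings of length l. It uses the usual textbook definitions (Hamming distance counts differing positions; an r-chunk detector is a position plus a substring of length r); the function names and example strings are illustrative assumptions.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def r_chunk_matches(chunk, position, s):
    """An r-chunk detector matches s if s contains chunk exactly at position."""
    return s[position:position + len(chunk)] == chunk

s = "10110100"                                  # l = 8
distance = hamming_distance(s, "10010110")      # whole-string comparison
hit = r_chunk_matches("110", 2, s)              # r = 3, window starts at index 2
miss = r_chunk_matches("101", 2, s)
```

Hamming distance compares whole strings, while an r-chunk detector only constrains one window of r contiguous bits.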
![Page 15: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/15.jpg)
Criticism for real-valued representation
• Positive selection (Self Detector Classification) is more straightforward.
• It is not clear how to choose the self radius.
– “From our point of view, it is an approach which requires two classes in the learning phase in order to determine the self-radius.” – no reason given.
• Finding an optimal distribution of the detectors is a problem (Gonzalez et al’s method takes “a vast amount of time”).
![Page 16: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/16.jpg)
Occam’s razor principle
• When two competing theories make exactly the same predictions, the simpler one is better.
![Page 17: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/17.jpg)
Comparison with SVM
• SVM is a machine learning algorithm for a two-class classification problem.
• The input data is mapped into a higher-dimensional feature space, where a linear decision region is constructed.
• A one-class SVM was proposed by Schölkopf et al.
– Provides good results in high-dimensional space (no detail or results provided)
![Page 18: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/18.jpg)
Summary
• Unfortunately, the paper cites several related works and then makes a scary claim.
• Little was done to analyze or propose alternatives, except proposing “Self Detector Classification” – detection by directly checking against all training samples.
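A minimal sketch of what "directly checking against all training samples" could look like for real-valued data, assuming Euclidean distance and a single fixed self radius. The function name and the radius value are assumptions, not details taken from the paper.

```python
import math

def is_self(sample, self_training, self_radius):
    """Positive selection: a sample is self if it is within self_radius of any training sample."""
    return any(math.dist(sample, s) <= self_radius for s in self_training)

# Hypothetical training set of normal (self) samples.
self_training = [[0.5, 0.5], [0.6, 0.5]]
near = is_self([0.55, 0.5], self_training, self_radius=0.1)   # close to both samples
far = is_self([0.9, 0.9], self_training, self_radius=0.1)     # would be flagged as an anomaly
```

No detectors are generated; the cost is one distance computation per training sample for every test point.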
![Page 19: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/19.jpg)
“Applying both positive and negative selection to supervised learning for anomaly detection”
• Use synthetic anomalies to deal with anomaly detection (supervised learning from class-imbalanced data sets)
– GA: positive selection
– Synthetic data: negative selection
• Categorical/discrete data
• Categorical/discrete data
![Page 20: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/20.jpg)
Two categories of methods
• At data level: mainly focusing on re-sampling
– Under-sampling the normal class
– Over-sampling the anomaly class
– Combination
• At algorithm level
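The data-level options can be sketched with plain random re-sampling. This is an illustrative assumption about the simplest form of under- and over-sampling, not a method from the paper; the function name and the target size are hypothetical.

```python
import random

def rebalance(normal, anomaly, target, seed=0):
    """Randomly under-sample the normal class and over-sample the anomaly class to `target` items."""
    rng = random.Random(seed)
    # Under-sampling: draw without replacement from the majority class.
    under = rng.sample(normal, min(target, len(normal)))
    # Over-sampling: duplicate randomly chosen minority samples.
    extra = [rng.choice(anomaly) for _ in range(max(0, target - len(anomaly)))]
    return under, anomaly + extra

# Hypothetical imbalanced data set: 100 normal items, 5 anomalies.
normal = list(range(100))
anomaly = [-1, -2, -3, -4, -5]
under, over = rebalance(normal, anomaly, target=20)
```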
![Page 21: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/21.jpg)
Other works using this strategy
• Gonzalez et al
• SMOTE (Synthetic Minority Over-sampling TEchnique)
– “taking each minority class sample and introducing synthetic examples along the line segment joining any/all of the k minority class nearest neighbors.”
![Page 22: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/22.jpg)
How SMOTE generates synthetic samples
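A simplified sketch of the interpolation step: pick a minority sample, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them. The function name, the default k, and the random choice of base sample are assumptions; the original SMOTE procedure iterates over every minority sample.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating towards nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself).
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

# Hypothetical minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_synthetic=10)
```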
![Page 23: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/23.jpg)
Phase 1: co-evolving patterns of the normal data (positive selection)
• A number of non-interbreeding subpopulations: no cooperation, no competition
• Randomly initialized
• All converged schemata together form the decision boundary.
• Individuals consist of four sections:
![Page 24: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/24.jpg)
• Fitness-proportionate selection
• Uniform crossover
• Bit-flipping mutation
• Subpopulation size = 100
• Crossover rate = 0.65
• Mutation rate = 0.15
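One generation of a GA with these operators and rates might look like the sketch below. The bit-string encoding, the 16-bit length, the count-of-ones fitness, and the per-bit reading of the mutation rate are all placeholder assumptions; the slides do not give the paper's encoding or fitness function.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate selection; falls back to uniform if all fitnesses are zero."""
    if sum(fitnesses) <= 0:
        return rng.choice(population)
    return rng.choices(population, weights=fitnesses, k=1)[0]

def uniform_crossover(a, b, rng):
    """Swap each gene between the two parents with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2

def bit_flip_mutation(individual, rate, rng):
    """Flip each bit independently with probability `rate` (per-bit rate is an assumption)."""
    return [1 - g if rng.random() < rate else g for g in individual]

def next_generation(population, fitness_fn, rng, crossover_rate=0.65, mutation_rate=0.15):
    """Produce a new subpopulation of the same size."""
    fitnesses = [fitness_fn(ind) for ind in population]
    offspring = []
    while len(offspring) < len(population):
        a = roulette_select(population, fitnesses, rng)
        b = roulette_select(population, fitnesses, rng)
        if rng.random() < crossover_rate:
            a, b = uniform_crossover(a, b, rng)
        offspring.append(bit_flip_mutation(a, mutation_rate, rng))
        offspring.append(bit_flip_mutation(b, mutation_rate, rng))
    return offspring[:len(population)]

rng = random.Random(0)
# Hypothetical subpopulation: 100 random 16-bit individuals, fitness = number of ones.
population = [[rng.randint(0, 1) for _ in range(16)] for _ in range(100)]
new_population = next_generation(population, fitness_fn=sum, rng=rng)
```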
![Page 25: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/25.jpg)
Phase 2: synthetic generation of anomalous samples
• Strategy 1: with seed
– Starting with vacant neighbors of the examples of the anomaly class
• 2n neighbors for n-dimensional data
• “Vacant” means neither normal nor anomaly
– Check whether candidates are covered by a schema of the normal class. Those covered are removed.
• Strategy 2: without seed – in the case of no anomaly examples
– Starting with random positions
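Strategy 1 can be sketched for discrete data as follows. The sketch assumes attributes are encoded as integers, that a neighbour differs by one step in a single dimension (giving up to 2n neighbours), and that a schema is a tuple with `None` as a wildcard. These encodings and all names are illustrative assumptions.

```python
def axis_neighbors(point, domain_sizes):
    """Up to 2n neighbours of an n-dimensional point: one step down/up in each dimension."""
    result = []
    for d, size in enumerate(domain_sizes):
        for delta in (-1, 1):
            value = point[d] + delta
            if 0 <= value < size:  # stay inside the attribute's domain
                result.append(point[:d] + (value,) + point[d + 1:])
    return result

def covered(point, schema):
    """A schema covers a point if every non-wildcard position agrees."""
    return all(s is None or s == p for s, p in zip(schema, point))

def synthesize_with_seed(anomalies, normals, schemata, domain_sizes):
    """Vacant neighbours of anomaly examples that no normal-class schema covers."""
    occupied = set(normals) | set(anomalies)
    synthetic, seen = [], set()
    for seed_point in anomalies:
        for cand in axis_neighbors(seed_point, domain_sizes):
            if cand in occupied or cand in seen:
                continue  # not vacant, or already generated
            seen.add(cand)
            if not any(covered(cand, s) for s in schemata):
                synthetic.append(cand)
    return synthetic

# Hypothetical 2-D example on a 4x4 grid.
anomalies = [(2, 2)]
normals = [(0, 0), (1, 2)]
schemata = [(0, None), (None, 3)]   # learned patterns of the normal class
synthetic = synthesize_with_seed(anomalies, normals, schemata, domain_sizes=(4, 4))
```

Here (1, 2) is rejected because it is a normal example and (2, 3) because a normal-class schema covers it.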
![Page 26: Negative Selection Algorithms at GECCO 2005 7/22/2005](https://reader033.vdocuments.us/reader033/viewer/2022061306/55148540550346f06e8b4bd0/html5/thumbnails/26.jpg)
Experiments
• UCI data sets: 14 used
• Multi-class data are mapped into a 2-class data set
– Version 1: natural distribution
– Version 2: balanced natural distribution
– Version 3: balanced extreme distribution
(“balanced” means “processed by the approach described in this paper”)
• Classifiers used: C4.5 and Naive Bayes
• Result: v2>v3>>v1