A Technical Seminar on

Classification and Novel Class Detection of Feature Based Stream Data

Date: 21st of May, 2014

Shri Ramdeo Baba College of Engineering and Management

Presentation By: Rimjhim Singh

Under the Guidance of: Dr. M.B. Chandak

Contents:

- Stream Data Classification
- Novel Class Detection
- Data Generation
- Training Classifiers
- Steps Involved
- Applications
- Conclusion
- Future Scope

Stream Data Classification

- Stream data: a continuous sequence of data items or packets.
- Managing online transactions requires classifying data as it arrives.
- The space and time required must be minimized.
- The data is dynamic in nature.

Example:

Intrusion Detection:
- Data arriving on a network may contain attacks, viruses, worms, etc.
- Such data must be classified, and the cause of its arrival identified.
- Stream data classification can be used for this.

Characteristics of Stream Data.

Infinite Length:
- The stream is fast and continuous.
- Storing all of it is impractical.
- Incremental learning is required.

Concept Drift:
- The underlying concept of the stream changes over time.
- The classifier has to be updated.
- Classifiers must adapt to these changes.

Concept Evolution:
- New classes emerge in the data.
- Example: during intrusion detection in a network, a new type of attack appears.

Feature Evolution:
- New features emerge.
- Example: text streams on Twitter.

Labelling of Data:
- Labelling is a difficult process.
- Data arrives at very high speed.

Novel Class Detection:

Novel class:
- Let M be the current ensemble of classification models. A class c is an existing class if at least one of the models Mi in M has been trained with class c. Otherwise, c is a novel class.

A single model or an ensemble of models can be used.
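The definition of existing and novel classes can be written down directly. The sketch below is only an illustration: the `Model` class and its `trained_classes` field are hypothetical names, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for one classifier Mi in the ensemble M."""
    trained_classes: set = field(default_factory=set)


def is_novel_class(label, ensemble):
    """A class is novel if no model in the ensemble was trained with it."""
    return not any(label in m.trained_classes for m in ensemble)


ensemble = [Model({"normal", "dos"}), Model({"normal", "probe"})]
print(is_novel_class("probe", ensemble))  # False: one model has seen "probe"
print(is_novel_class("worm", ensemble))   # True: no model has seen "worm"
```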

Data Generation:

- The stream is divided into chunks of data.
- The most recent chunks are classified.
- The classified instances are then labelled.
- Once labelled, the data is ready for training.
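A minimal sketch of this chunk-based flow, assuming fixed-size chunks. The callbacks `classify`, `get_labels` and `train` are hypothetical placeholders for whatever classifier and labelling process is actually used.

```python
from itertools import islice


def chunks(stream, chunk_size):
    """Yield successive fixed-size lists from a possibly unbounded iterator."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def run(stream, chunk_size, classify, get_labels, train):
    """Classify each chunk first, then label it, then use it for training."""
    for chunk in chunks(stream, chunk_size):
        predictions = [classify(x) for x in chunk]  # classify the recent chunk
        labels = get_labels(chunk)                  # labelling happens afterwards
        train(chunk, labels)                        # chunk is now training data
        yield predictions


for chunk in chunks(range(7), 3):
    print(chunk)
```

The last chunk may be shorter than `chunk_size` when the stream ends; an unbounded stream simply keeps producing full chunks.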

Training a Classifier:

- K clusters are built.
- Cluster summaries are saved. These are also known as pseudopoints.
- Each summary contains:
  - the centroid of the cluster,
  - the radius of the cluster,
  - the frequency of data points.
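The following sketch shows one way the pseudopoints could be built from a labelled chunk. Several choices here are assumptions rather than details from the slides: plain k-means with the first K points as initial centroids, the radius taken as the distance to the farthest member, and the frequency stored per class label.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans_groups(points, k, iters=10):
    """Plain k-means; returns the member indices of each cluster."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation (assumption)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(idx)
        centroids = [mean([points[i] for i in g]) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return groups


def build_pseudopoints(points, labels, k):
    """Summarise each non-empty cluster as centroid, radius and class frequencies."""
    summaries = []
    for g in kmeans_groups(points, k):
        if not g:
            continue
        members = [points[i] for i in g]
        c = mean(members)
        summaries.append({
            "centroid": c,
            "radius": max(dist(c, p) for p in members),
            "freq": Counter(labels[i] for i in g),
        })
    return summaries


points = [(0, 0), (10, 10), (0, 1), (10, 11)]
labels = ["a", "b", "a", "b"]
for h in build_pseudopoints(points, labels, k=2):
    print(h)
```

Only the summaries are kept after this step, so the raw points of the chunk can be discarded.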

Properties of Ensemble ‘M’

Classification of test instance Xj by Mi:
- Find the pseudopoint ‘h’ ∈ Mi whose centroid is closest to Xj. The predicted class is the one with the highest frequency in ‘h’.
- The point is finally classified by the voting of all models.

Decision Boundary of ‘Mi’:
- Equal to the union of the feature spaces encompassed by its pseudopoints.

Decision Boundary of ‘M’:
- Equal to the union of the decision boundaries of all Mi, where Mi belongs to M.
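These properties translate into a short sketch. It assumes each model is a list of pseudopoints stored as dictionaries with `centroid`, `radius` and `freq` keys (hypothetical names), and that a pseudopoint encompasses the hypersphere of its radius around its centroid.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_with_model(model, x):
    """Majority class of the pseudopoint whose centroid is closest to x."""
    h = min(model, key=lambda p: dist(p["centroid"], x))
    return h["freq"].most_common(1)[0][0]


def classify(ensemble, x):
    """Final class decided by the votes of all models."""
    votes = Counter(classify_with_model(m, x) for m in ensemble)
    return votes.most_common(1)[0][0]


def inside_model_boundary(model, x):
    """x is inside Mi's boundary if any pseudopoint's hypersphere covers it."""
    return any(dist(p["centroid"], x) <= p["radius"] for p in model)


def inside_ensemble_boundary(ensemble, x):
    """The boundary of M is the union of the boundaries of its models."""
    return any(inside_model_boundary(m, x) for m in ensemble)


m1 = [{"centroid": (0, 0), "radius": 1.0, "freq": Counter(a=5, b=1)},
      {"centroid": (5, 5), "radius": 1.0, "freq": Counter(b=4)}]
m2 = [{"centroid": (0.5, 0), "radius": 1.0, "freq": Counter(a=3)},
      {"centroid": (5, 4), "radius": 1.5, "freq": Counter(b=6)}]
ensemble = [m1, m2]

print(classify(ensemble, (0.2, 0.1)))                # a
print(inside_ensemble_boundary(ensemble, (20, 20)))  # False
```

A point that falls outside the boundary of every model is a natural candidate for the outlier check in the later steps.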

Feature Selection:

Lossy Fixed:
- The same feature set is used throughout.

Lossy Local:
- Each model or training chunk has its own feature set.

Lossless Homogenizing:
- Both the model and the incoming instance expand their feature sets.
- The union of the feature sets is taken.
- This is the best of the three techniques.
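The lossless homogenizing idea can be illustrated with sparse feature dictionaries. This is a sketch under one assumption not stated in the slides: a feature missing on either side is treated as having value zero, which suits word counts in text streams.

```python
def homogenize(model_point, instance):
    """Project a model pseudopoint and an instance onto the union of their features.

    Both arguments are dicts mapping feature name to value. Missing features
    are filled with 0.0 (an assumption made for this sketch).
    """
    union = sorted(set(model_point) | set(instance))
    model_vec = [model_point.get(f, 0.0) for f in union]
    instance_vec = [instance.get(f, 0.0) for f in union]
    return union, model_vec, instance_vec


centroid = {"goal": 0.5, "match": 1.0}   # features known to the model
tweet = {"match": 2.0, "vote": 1.0}      # instance with a newly evolved feature

features, model_vec, instance_vec = homogenize(centroid, tweet)
print(features)      # ['goal', 'match', 'vote']
print(model_vec)     # [0.5, 1.0, 0.0]
print(instance_vec)  # [0.0, 2.0, 1.0]
```

Neither side drops a feature, so distances computed on the two vectors use all the information available in both.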

Steps Involved in Classification and Novel Class Detection:

1. Outlier Detection using Adaptive Threshold.
2. Novel Class Detection.
3. Simultaneous Novel Class Detection.

Outlier Detection Using Adaptive Threshold:

- Check whether the instance is an outlier: F_outlier or outlier.
- An adaptive threshold is used.
- A lower false alarm rate is obtained by handling:
  - marginal false-novel instances,
  - marginal false-existing instances.
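The slides do not give the threshold update rule, so the sketch below is only one plausible reading. It assumes an instance weight of exp(radius - distance) relative to the nearest pseudopoint, a starting threshold of 0.7, and small fixed `step` and `margin` values; all of these are illustrative assumptions.

```python
import math


def instance_weight(distance, radius):
    """At least 1 inside the pseudopoint's hypersphere, below 1 outside (assumed form)."""
    return math.exp(radius - distance)


class AdaptiveThreshold:
    """Outlier threshold nudged by marginal mistakes (illustrative rule)."""

    def __init__(self, threshold=0.7, step=0.01, margin=0.05):
        self.threshold = threshold
        self.step = step
        self.margin = margin

    def is_outlier(self, weight):
        return weight < self.threshold

    def update(self, weight, was_flagged, truly_novel):
        """Adjust only when the mistaken instance lay close to the threshold."""
        if abs(weight - self.threshold) > self.margin:
            return
        if was_flagged and not truly_novel:
            # Marginal false-novel: an existing-class instance was flagged,
            # so lower the threshold to flag fewer instances.
            self.threshold -= self.step
        elif not was_flagged and truly_novel:
            # Marginal false-existing: a novel instance slipped through,
            # so raise the threshold to flag more instances.
            self.threshold += self.step


t = AdaptiveThreshold()
w = 0.68
flagged = t.is_outlier(w)                       # flagged as outlier
t.update(w, was_flagged=flagged, truly_novel=False)
print(round(t.threshold, 2))                    # 0.69
```

Mistakes far from the threshold leave it unchanged, so only borderline cases move the boundary.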

Novel Class Detection:

- F_outliers occur for three reasons: noise, concept drift, or concept evolution.
- The goal is to pick out the F_outliers that occur due to concept evolution.
- For this we need to calculate:
  - the distance between the outlier and the existing-class pseudopoints,
  - the cohesion among the different outliers in the buffer.
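One way to combine the two quantities is a silhouette-style score, sketched below. The exact formula and the neighbourhood size `q` are assumptions for illustration; the slides only state that distance and cohesion are calculated.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_nearest_distance(x, points, q):
    """Mean distance from x to its q nearest points in a non-empty list."""
    nearest = sorted(dist(x, p) for p in points)[:q]
    return sum(nearest) / len(nearest)


def novelty_score(x, other_outliers, existing_centroids, q=2):
    """Score in [-1, 1]; positive means x is closer to the other outliers.

    a: cohesion, the mean distance to the q nearest other outliers in the buffer.
    b: separation, the mean distance to the q nearest existing-class pseudopoints.
    """
    a = mean_nearest_distance(x, other_outliers, q)
    b = mean_nearest_distance(x, existing_centroids, q)
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest


existing_centroids = [(0, 0), (1, 0)]
buffer_outliers = [(10, 10), (10, 11), (11, 10)]

x = buffer_outliers[0]
score = novelty_score(x, buffer_outliers[1:], existing_centroids)
print(score > 0)  # True: x sits with the other outliers, far from existing classes
```

Outliers caused by noise tend to lack close neighbours in the buffer and so score low, while a group of mutually close outliers far from every existing class scores high.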

Simultaneous Multi Class Detection:

- Multiple novel classes may occur simultaneously.
- Principle:
  - Cohesion between instances of the same class should be high.
  - Distance between instances of different classes should be large.
- Graphs are used.
- Two phases: 1. Separation phase. 2. Merging phase.
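The slides name the two phases without detailing them, so the sketch below only illustrates the graph idea under assumed rules. Separation links every point to its nearest neighbour and takes connected components; merging joins components whose centroids lie within a hypothetical `max_gap`.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def separate(points):
    """Connected components of the nearest-neighbour graph (needs >= 2 points)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: dist(p, points[k]))
        parent[find(i)] = find(j)          # add edge i - nearest neighbour

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def merge(points, components, max_gap):
    """Merge components whose centroids are within max_gap (assumed rule)."""
    def centroid(comp):
        members = [points[i] for i in comp]
        return tuple(sum(col) / len(members) for col in zip(*members))

    merged = []
    for comp in components:
        for m in merged:
            if dist(centroid(m), centroid(comp)) <= max_gap:
                m.extend(comp)
                break
        else:
            merged.append(list(comp))
    return merged


outliers = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
components = separate(outliers)
print(components)                             # [[0, 1, 2], [3, 4]]
print(len(merge(outliers, components, 2.0)))  # 2 candidate novel classes
```

Each remaining component after merging is treated as a separate candidate novel class.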

Applications:

- Network security.
- Social media.
- Credit card fraud, etc.

Problem Definition:

To classify feature-based stream data and detect novel classes in it more efficiently, using a suitable tool.

Conclusion:

- The majority of the algorithms used for classification and novel class detection suffer from either feature evolution or a high false alarm rate.
- The methodology adapts properly to normal concept drift, but it takes time to handle abrupt drift.
- Multiple novel classes are identified and separated efficiently.

Future Scope:

- Work can be done on making the cluster size dynamic and adaptive.
- Work can be done on handling abrupt drift efficiently.
- If an existing class is divided into two, work can be done on judging whether the two parts have the same feature space and whether they are novel.


THANK YOU