Active Learning Genetic Programming for Record Deduplication
M. KIRUTHIKA B2711409
GUIDE: Mrs. R. SANTHINI RAJESWARI, M.C.A., M.Phil.
This work presents Active Learning GP (AGP), a semi-supervised genetic programming (GP) approach, and instantiates it for the data deduplication problem.
AGP uses an active learning approach in which a committee of multi-attribute functions votes for classifying record pairs as duplicates or not.
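The committee vote described above can be illustrated with a minimal sketch. It is not the AGP implementation: the three committee members, their thresholds, the `committee_vote` helper, and the use of simple majority voting are all assumptions made for illustration. Each member is a hypothetical multi-attribute function that combines per-attribute similarity scores into a duplicate / not-duplicate decision.

```python
from typing import Callable, Dict, List, Tuple

# Similarity evidence for one record pair: attribute name -> score in [0, 1].
Evidence = Dict[str, float]
# A committee member maps evidence to True (duplicate) or False (not duplicate).
Member = Callable[[Evidence], bool]

# Hypothetical multi-attribute functions; in AGP these would be evolved by GP,
# here they are hand-written placeholders with made-up thresholds.
COMMITTEE: List[Member] = [
    lambda e: e["Fnam"] + e["Lnam"] > 1.5,
    lambda e: e["Mid"] > 0.9,
    lambda e: e["Fnam"] * e["Dob"] > 0.8,
]


def committee_vote(committee: List[Member], evidence: Evidence) -> Tuple[bool, float]:
    """Return (majority decision, fraction of members agreeing with it)."""
    yes = sum(1 for member in committee if member(evidence))
    no = len(committee) - yes
    is_duplicate = yes > no
    agreement = max(yes, no) / len(committee)
    return is_duplicate, agreement


if __name__ == "__main__":
    pair = {"Fnam": 0.9, "Lnam": 0.8, "Mid": 0.2, "Dob": 0.5}
    print(committee_vote(COMMITTEE, pair))
```

In an active learning setting, a low agreement value is a natural signal that the pair should be shown to a user for labeling; that policy is likewise an assumption of this sketch.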
Front end: VB.NET
Back end: SQL Server 2005
Security lapses in cloud computing. The cloud cannot support confidential data. The cloud is weak in its security aspects. Duplication cannot be detected in the cloud. Duplication must be avoided manually.
Cloud with an automatic deduplication finder. Duplicates are completely avoided. The server can be confident that its data is free of duplication.
Modules: Identifying; Labeled data preprocessing in server DB; Frame learning environment; Voting duplication; Detecting deduplication
This is the initial module. It is involved in grabbing the best file from a set of duplicate files, or of files considered to be duplicates.
This module compares records in pairs, from the top record to the bottom record.
Voting is a similar process that picks out duplicates by comparing the final record test and the function test.
This module uses a definitive source, such as labeled examples, to find duplicates.
Learning programming is introduced in this module. It picks out duplicates from the server space and produces an accurate result.
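The top-to-bottom pair comparison mentioned in the module descriptions can be sketched as follows. This is a simplified illustration, not the project's VB.NET code: the `normalize` and `find_duplicate_pairs` helpers are hypothetical, and treating two records as duplicates when their normalized fields are identical is an assumption chosen to keep the example short.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def normalize(record: Sequence[str]) -> Tuple[str, ...]:
    """Lowercase each field and collapse runs of whitespace."""
    return tuple(" ".join(field.lower().split()) for field in record)


def find_duplicate_pairs(records: List[Sequence[str]]) -> List[Tuple[int, int]]:
    """Compare every record with every later record, top to bottom."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if normalize(a) == normalize(b):
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    rows = [("Anna", "Smith"), ("anna ", "SMITH"), ("Bob", "Jones")]
    print(find_duplicate_pairs(rows))
```

Comparing all pairs costs on the order of n² comparisons for n records, which is one reason practical systems usually narrow the candidate pairs first.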
Sl.No  Field  Data Type & Size  Description
1.     Fnam   Varchar(30)       First Name
2.     Lnam   Varchar(30)       Last Name
3.     Doj    Date(8)           Date of joining
4.     Dob    Date(8)           Date of birth
5.     Conno  Integer(10)       Contact Number
6.     Desig  Varchar(20)       Designation
7.     Addr   Varchar(30)       Address
8.     Mid    Varchar(30)       Mail id
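As a rough illustration of how two rows of this table might be compared attribute by attribute, the sketch below builds one similarity score per field. The `UserDetails` class and the `field_similarity` helper are hypothetical, the use of `difflib.SequenceMatcher` as the similarity measure is an assumption, and all fields (including dates and the contact number) are held as strings for simplicity.

```python
from dataclasses import dataclass, fields
from difflib import SequenceMatcher
from typing import Dict


@dataclass
class UserDetails:
    # Field names follow the table above; every value is kept as a string here.
    Fnam: str
    Lnam: str
    Doj: str
    Dob: str
    Conno: str
    Desig: str
    Addr: str
    Mid: str


def field_similarity(a: UserDetails, b: UserDetails) -> Dict[str, float]:
    """Return a similarity score in [0, 1] for each field of the two records."""
    scores = {}
    for f in fields(UserDetails):
        left = str(getattr(a, f.name)).strip().lower()
        right = str(getattr(b, f.name)).strip().lower()
        scores[f.name] = SequenceMatcher(None, left, right).ratio()
    return scores


if __name__ == "__main__":
    r1 = UserDetails("Anna", "Smith", "2010-06-01", "1988-02-14",
                     "9000000000", "Clerk", "12 Hill Road", "anna@example.com")
    r2 = UserDetails("Anne", "Smith", "2010-06-01", "1988-02-14",
                     "9000000000", "Clerk", "12 Hill Road", "anna@example.com")
    print(field_similarity(r1, r2))
```

A dictionary of per-field scores like this is the kind of evidence a multi-attribute function could combine into a single duplicate / not-duplicate decision.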
User Details
Load File
Client File Upload
Server Interface
Cloud Space
Monitor Duplication
Capture Retrieved File
File Deduplication
Open Loaded File
Testing Methodology
Error Found, Solution, Description
Validation: All file formats are difficult to compare.
Maintain a document file format for uploading/downloading.
Duplication cannot be identified if the file environment changes.
It can be implemented in:
◦ Cloud service
◦ Server maintainer
◦ Data warehouse
◦ Data management system
This work introduced an active learning genetic programming algorithm, named AGP, and instantiated it for the task of record deduplication. The method follows a semi-supervised approach and uses the principles of active and reinforcement learning to evolve a committee of multi-attribute functions.
Deduplication maintains redundancy to ensure that the data is recoverable in the event of data corruption. Deduplication applies only to files on a file server; it is not supported for Exchange databases or SQL databases.
◦ Install deduplication.
◦ Enable and configure deduplication on an existing volume.
◦ Observe the results of deduplication.