active learning genetic programming for record deduplication


Upload: mohana-sundaram

Posted on 28-Dec-2015


Page 1: Active Learning Genetic Programming for Record Deduplication

M. KIRUTHIKA, B2711409

GUIDE: Mrs. R. SANTHINI RAJESWARI, M.C.A., M.Phil.


This work presents Active Learning GP (AGP), a semi-supervised genetic programming (GP) approach, and instantiates it for the record deduplication problem.

AGP uses an active learning approach in which a committee of multi-attribute functions votes on whether each record pair is a duplicate.
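The committee vote and the active-learning step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the AGP implementation: each record pair is assumed to be already reduced to a tuple of per-attribute similarity scores in [0, 1], the three committee members are hypothetical hand-written functions (in AGP they would be evolved GP individuals), and the names `vote` and `most_disputed` are invented for this sketch. Pairs whose vote share is closest to one half are the ones the committee disagrees on most, so they are the natural candidates to hand to a user for labeling (query by committee).

```python
from typing import Callable, Dict, List, Tuple

# A record pair reduced to per-attribute similarity scores in [0, 1] (assumed input format).
SimVector = Tuple[float, ...]
# A committee member: any function mapping a similarity vector to a score.
MultiAttrFn = Callable[[SimVector], float]


def vote(committee: List[MultiAttrFn], pair: SimVector,
         threshold: float = 0.5) -> Tuple[bool, float]:
    """Majority vote: a member votes 'duplicate' when its score exceeds the threshold.

    Returns the committee decision and the share of members voting 'duplicate'.
    """
    yes = sum(1 for f in committee if f(pair) > threshold)
    share = yes / len(committee)
    return share > 0.5, share


def most_disputed(committee: List[MultiAttrFn], pairs: Dict[str, SimVector],
                  k: int = 1) -> List[str]:
    """Active-learning step: return the k unlabeled pairs the committee disagrees on most."""
    def disagreement(pair_id: str) -> float:
        _, share = vote(committee, pairs[pair_id])
        return 0.5 - abs(share - 0.5)  # 0.5 for an even split, 0.0 for a unanimous vote
    return sorted(pairs, key=disagreement, reverse=True)[:k]


# Hypothetical committee over (name similarity, e-mail similarity).
committee: List[MultiAttrFn] = [
    lambda s: (s[0] + s[1]) / 2,  # average of both attributes
    lambda s: s[0] * s[1],        # both attributes must agree
    lambda s: max(s),             # one strong attribute is enough
]
pairs = {"p1": (0.90, 0.95), "p2": (0.60, 0.70), "p3": (0.10, 0.20)}

print(vote(committee, pairs["p1"]))          # -> (True, 1.0)
print(most_disputed(committee, pairs, k=1))  # -> ['p2']
```

Pair `p2` is selected because two members call it a duplicate and one does not; a label for it teaches the committee more than a label for the unanimous pairs would.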


FRONT END: VB.NET
BACK END: SQL Server 2005


Security lapses in cloud computing: the cloud cannot safely hold confidential data, and its security is weak. Duplicates cannot be detected in the cloud, so duplication must be avoided manually.


A cloud with an automatic deduplication finder. Duplicates are detected and removed automatically, so the server can be confident that its data is free of duplication.


Identifying Labeled Data

Preprocessing in Server DB

Frame Learning Environment

Voting Duplication

Detecting Deduplication


This is the initial module: it picks the best file from a set of files that are duplicates, or are suspected to be duplicates.

This module compares record pairs, working from the top record down to the bottom records.
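Comparing pairs from the top record to the bottom one amounts to enumerating every unordered pair exactly once. A minimal sketch, assuming records are Python dicts keyed by field names from the User Details table; the helpers `record_pairs` and `exact_duplicates` and their exact-match rule (trimmed, case-insensitive equality on chosen fields) are hypothetical and far simpler than the learned functions AGP uses:

```python
from itertools import combinations
from typing import Dict, Iterator, List, Sequence, Tuple

Record = Dict[str, str]


def record_pairs(records: Sequence[Record]) -> Iterator[Tuple[int, int]]:
    """Yield each index pair (i, j) with i < j, from the top record downwards."""
    return combinations(range(len(records)), 2)


def exact_duplicates(records: Sequence[Record], keys: List[str]) -> List[Tuple[int, int]]:
    """Hypothetical baseline: pairs equal on every key after trimming and lowercasing."""
    def normalize(record: Record) -> Tuple[str, ...]:
        return tuple(record[k].strip().lower() for k in keys)
    return [(i, j) for i, j in record_pairs(records)
            if normalize(records[i]) == normalize(records[j])]


records = [
    {"Fnam": "Anu", "Lnam": "Ravi", "Mid": "anu@example.com"},
    {"Fnam": "Bala", "Lnam": "Kumar", "Mid": "bala@example.com"},
    {"Fnam": "anu ", "Lnam": "RAVI", "Mid": "ANU@example.com"},
]
print(list(record_pairs(records)))                          # [(0, 1), (0, 2), (1, 2)]
print(exact_duplicates(records, ["Fnam", "Lnam", "Mid"]))   # [(0, 2)]
```

With n records this produces n(n-1)/2 pairs, so large tables usually need a blocking step that only pairs records sharing some cheap key.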

Voting is a similar process: it picks out duplicates by comparing the result of the final record test with the result of the function test.


This module uses a definitive source, such as labeled examples, to find duplicates.
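Labeled examples let each candidate multi-attribute function be scored. The sketch below assumes F1 (the harmonic mean of precision and recall on the duplicate class) as the fitness measure, which is a common choice for deduplication but is an assumption here; `f1_fitness`, `average`, `product`, and the sample pairs are hypothetical.

```python
from typing import Callable, List, Tuple

SimVector = Tuple[float, ...]


def f1_fitness(func: Callable[[SimVector], float],
               labeled: List[Tuple[SimVector, bool]],
               threshold: float = 0.5) -> float:
    """F1 score of `func`, used as a duplicate classifier, on labeled record pairs."""
    true_pos = false_pos = false_neg = 0
    for sims, is_duplicate in labeled:
        predicted = func(sims) > threshold
        if predicted and is_duplicate:
            true_pos += 1
        elif predicted:
            false_pos += 1
        elif is_duplicate:
            false_neg += 1
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def average(s: SimVector) -> float:
    return sum(s) / len(s)


def product(s: SimVector) -> float:
    return s[0] * s[1]


# Hypothetical labeled pairs: ((name similarity, e-mail similarity), is_duplicate).
labeled = [((0.9, 0.9), True), ((0.8, 0.3), True), ((0.2, 0.1), False), ((0.7, 0.6), False)]
print(f1_fitness(average, labeled), f1_fitness(product, labeled))  # about 0.8 and 0.667
```

The average scores higher here because the product rejects a true duplicate whose e-mail similarity is low, even though it makes no false-positive mistakes.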

Active learning genetic programming is introduced in this module: it picks up duplicates from the server space and aims to produce accurate results.
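The genetic programming step can be pictured as a loop that evolves multi-attribute functions against the labeled pairs. This is a deliberately simplified sketch, not the AGP algorithm: trees combine attribute similarities with four hypothetical operators, variation is mutation only (no crossover), selection keeps the best half unchanged, and fitness is F1 at a fixed 0.5 threshold. In an AGP-style system the top individuals of the final population would form the voting committee. All names (`evaluate`, `random_tree`, `mutate`, `fitness`, `evolve`) are invented for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical encoding: an int leaf is the index of an attribute similarity;
# an internal node is a tuple (operator_name, left_subtree, right_subtree).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]
Labeled = List[Tuple[Tuple[float, ...], bool]]

# Hypothetical operator set; each maps two scores in [0, 1] to a score in [0, 1].
OPS: Dict[str, Callable[[float, float], float]] = {
    "avg": lambda a, b: (a + b) / 2,
    "mul": lambda a, b: a * b,
    "max": max,
    "min": min,
}


def evaluate(tree: Tree, sims: Tuple[float, ...]) -> float:
    """Score a record pair (given as per-attribute similarities) with a tree."""
    if isinstance(tree, int):
        return sims[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, sims), evaluate(right, sims))


def random_tree(rng: random.Random, n_attrs: int, depth: int = 2) -> Tree:
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_attrs)
    return (rng.choice(list(OPS)), random_tree(rng, n_attrs, depth - 1),
            random_tree(rng, n_attrs, depth - 1))


def mutate(rng: random.Random, tree: Tree, n_attrs: int) -> Tree:
    """Replace one randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(rng, n_attrs)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, n_attrs), right)
    return (op, left, mutate(rng, right, n_attrs))


def fitness(tree: Tree, labeled: Labeled, threshold: float = 0.5) -> float:
    """F1 of the tree used as a duplicate classifier on the labeled pairs."""
    tp = fp = fn = 0
    for sims, is_dup in labeled:
        predicted = evaluate(tree, sims) > threshold
        if predicted and is_dup:
            tp += 1
        elif predicted:
            fp += 1
        elif is_dup:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def evolve(labeled: Labeled, n_attrs: int, pop_size: int = 20,
           generations: int = 15, seed: int = 0) -> Tuple[Tree, List[float]]:
    """Return the best tree found and the best fitness recorded in each generation."""
    rng = random.Random(seed)
    population = [random_tree(rng, n_attrs) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, labeled), reverse=True)
        history.append(fitness(population[0], labeled))
        survivors = population[: pop_size // 2]  # elitism: the best half is kept unchanged
        children = [mutate(rng, rng.choice(survivors), n_attrs)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    population.sort(key=lambda t: fitness(t, labeled), reverse=True)
    return population[0], history


# Hypothetical labeled pairs over (name similarity, e-mail similarity).
labeled = [((0.95, 0.90), True), ((0.90, 0.20), True), ((0.85, 0.95), True),
           ((0.30, 0.90), False), ((0.20, 0.10), False), ((0.60, 0.30), False)]
best, history = evolve(labeled, n_attrs=2)
print(best, history[-1])
```

Because the best half always survives, the best fitness per generation never decreases; crossover, larger populations, and the active-learning queries from the committee would be the natural next additions.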


Sl.No  Field  Data Type (Size)  Description
1.     Fnam   Varchar(30)       First Name
2.     Lnam   Varchar(30)       Last Name
3.     Doj    Date(8)           Date of Joining
4.     Dob    Date(8)           Date of Birth
5.     Conno  Integer(10)       Contact Number
6.     Desig  Varchar(20)       Designation
7.     Addr   Varchar(30)       Address
8.     Mid    Varchar(30)       Mail ID

User Details
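The User Details table maps naturally onto a record type, and a pair of records onto a vector of per-attribute similarities, which is the input the multi-attribute functions operate on. A minimal sketch, assuming Python's standard `difflib.SequenceMatcher` ratio as the string similarity and exact equality for dates (both choices are assumptions; `UserDetails` and `attribute_similarities` are hypothetical names). `Conno` is held as a string here: a 10-digit contact number can exceed 2,147,483,647, the largest value of SQL Server's 4-byte `int`, so a text or `bigint` column is the safer mapping.

```python
from dataclasses import dataclass, fields
from datetime import date
from difflib import SequenceMatcher
from typing import Tuple


@dataclass
class UserDetails:
    """One row of the User Details table; field names follow the table."""
    Fnam: str
    Lnam: str
    Doj: date
    Dob: date
    Conno: str   # kept as text: 10-digit numbers overflow a 32-bit int
    Desig: str
    Addr: str
    Mid: str


def attribute_similarities(a: UserDetails, b: UserDetails) -> Tuple[float, ...]:
    """Return one similarity score in [0, 1] per attribute, in table order."""
    sims = []
    for f in fields(UserDetails):
        x, y = getattr(a, f.name), getattr(b, f.name)
        if isinstance(x, date):
            sims.append(1.0 if x == y else 0.0)   # dates must match exactly
        else:
            sims.append(SequenceMatcher(None, x.lower(), y.lower()).ratio())
    return tuple(sims)


a = UserDetails("Anu", "Ravi", date(2010, 6, 1), date(1988, 3, 14),
                "9876543210", "Clerk", "12 Main Road", "anu@example.com")
b = UserDetails("anu", "Ravi", date(2010, 6, 1), date(1988, 3, 14),
                "9876543210", "Clerk", "12 Main Rd", "anu@example.com")
sims = attribute_similarities(a, b)
print([round(s, 2) for s in sims])
```

Only the address differs, so every score except the seventh is 1.0; this eight-element vector is what a committee member from the earlier sketches would consume.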


Load File

Client File Upload

Server Interface

Cloud Space

Monitor Duplication

Capture Retrieved File

File Deduplication

Open Loaded File
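The file flow above (client upload into cloud space, duplication monitoring, file deduplication, opening a loaded file) can be sketched with exact-content hashing. This only catches byte-identical files, a much narrower task than the record-level matching AGP learns; the `CloudSpace` class and its methods are hypothetical stand-ins for the server interface, and SHA-256 from Python's `hashlib` is an assumed choice of hash.

```python
import hashlib
from typing import Dict, Optional


class CloudSpace:
    """Hypothetical in-memory stand-in for the server's cloud space."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # content hash -> stored bytes (one copy only)
        self.names: Dict[str, str] = {}    # file name -> content hash

    def upload(self, name: str, data: bytes) -> bool:
        """Store a client file; return True if identical content was already stored."""
        digest = hashlib.sha256(data).hexdigest()
        duplicate = digest in self.blobs
        if not duplicate:
            self.blobs[digest] = data      # first copy: keep the bytes
        self.names[name] = digest          # every name points at the shared copy
        return duplicate

    def open(self, name: str) -> Optional[bytes]:
        """Return the file's content, or None if no file with that name was uploaded."""
        digest = self.names.get(name)
        return None if digest is None else self.blobs[digest]


cloud = CloudSpace()
print(cloud.upload("report.txt", b"quarterly figures"))          # False
print(cloud.upload("copy_of_report.txt", b"quarterly figures"))  # True
print(len(cloud.blobs), cloud.open("copy_of_report.txt"))        # 1 b'quarterly figures'
```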


Testing Methodology: Validation

Error Found: All file formats are difficult to compare.

Solution: Maintain a single document file format for uploading and downloading.

Description: Duplicates cannot be identified if the file environment changes.
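One way to reduce the changed-environment problem for plain-text documents is to normalize content before hashing, so that files differing only in encoding marks, line endings, or trailing whitespace produce the same digest. A minimal sketch, assuming UTF-8 text files; `normalized_digest` is a hypothetical helper, and binary formats such as word-processor files would need format-specific text extraction first.

```python
import hashlib


def normalized_digest(data: bytes) -> str:
    """SHA-256 of a UTF-8 text document after environment-specific noise is removed."""
    text = data.decode("utf-8-sig")                        # drops a leading BOM if present
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify Windows/old-Mac line endings
    lines = [line.rstrip() for line in text.split("\n")]   # ignore trailing whitespace
    while lines and lines[-1] == "":                       # ignore trailing blank lines
        lines.pop()
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


windows_copy = b"Name: Anu\r\nRole: Clerk\r\n"
unix_copy = b"\xef\xbb\xbfName: Anu\nRole: Clerk   \n"
print(hashlib.sha256(windows_copy).digest() == hashlib.sha256(unix_copy).digest())  # False
print(normalized_digest(windows_copy) == normalized_digest(unix_copy))              # True
```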


It can be applied in:
◦ Cloud services
◦ Server maintenance
◦ Data warehouses
◦ Data management systems


This work introduced an active learning genetic programming algorithm, named AGP, and instantiated it for the task of record deduplication. The method follows a semi-supervised approach and uses the principles of active and reinforcement learning to evolve a committee of multi-attribute functions.


Deduplication keeps redundant copies of critical metadata and of the most frequently referenced data chunks, so that data remains recoverable if corruption occurs. Deduplication applies only to files on a file server; it is not supported for Exchange or SQL Server databases.

Install deduplication.

Enable and configure deduplication on an existing volume.

Observe the results of deduplication.
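The savings that the last step asks you to observe can be estimated with a toy model: split files into chunks, store each distinct chunk once, and compare logical bytes with stored bytes. This sketch uses fixed-size chunks and SHA-256 for simplicity; production systems, including the Windows Server feature, typically use variable-size chunking, and `dedup_savings` is a hypothetical helper, not part of any deduplication tool.

```python
import hashlib
from typing import Iterable, Tuple


def dedup_savings(files: Iterable[bytes], chunk_size: int = 4096) -> Tuple[int, int]:
    """Return (logical_bytes, stored_bytes) when each distinct fixed-size chunk is stored once."""
    seen = set()
    logical = stored = 0
    for data in files:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:        # first time this content appears: store it
                seen.add(digest)
                stored += len(chunk)
    return logical, stored


logical, stored = dedup_savings([b"AAAABBBBCCCC", b"AAAABBBBDDDD"], chunk_size=4)
print(logical, stored, f"savings {1 - stored / logical:.0%}")  # 24 16 savings 33%
```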
