gntsuntechnologies.comgntsuntechnologies.com/projects/2016_dotnet_ieee/29.docxweb viewquery...


Upload: lamque

Post on 03-Aug-2018


Query Reorganisation Algorithms for Efficient Boolean Information Filtering

Abstract

In the information filtering paradigm, clients subscribe to a server with continuous queries that express their information needs and get notified every time appropriate information is published. To perform this task in an efficient way, servers employ indexing schemes that support fast matches of the incoming information with the query database. Such indexing schemes involve (i) main-memory trie-based data structures that cluster similar queries by capturing common elements between them and (ii) efficient filtering mechanisms that exploit this clustering to achieve high throughput and low filtering times. However, state-of-the-art indexing schemes are sensitive to the query insertion order and cannot adapt to an evolving query workload, degrading the filtering performance over time. In this paper, we present an adaptive trie-based algorithm that outperforms current methods by relying on query statistics to reorganise the query database. Contrary to previous approaches, we show that the nature of the constructed tries, rather than their compactness, is the determining factor for efficient filtering performance. Our algorithm does not depend on the order of insertion of queries in the database, manages to cluster queries even when clustering possibilities are limited, and achieves more than 96% filtering time improvement over its state-of-the-art competitors. Finally, we demonstrate that our solution is easily extensible to multi-core machines.


Front End (MVC RAZOR)


Back End (SQL Server)

Software Tools (Visual Studio 2012, SQL 2008).

User:
1. Register and then log in to the system.
2. Search for player details.
3. Submit Boolean filtering queries.
4. View the responses to the submitted queries.
5. Modify account details.

Server:
1. Log in to the system.
2. Upload the player details.
3. View the player details.
4. View the player tree structure details using AngularJS.
5. View the average length of queries.


Database:
1. Online Social (used as the project database), accessed through Entity Framework.

MVC Controller:
1. Server controller
2. Home controller
3. Data controller
4. My controller

Angular Controller:
1. Part7 Controller
2. Part6 Controller
3. Part5 Controller
4. Part4 Controller
5. Part3 Controller
6. Part2 Controller

Four MVC controllers and six Angular controllers have been created, based on the action methods.


SYSTEM ANALYSIS

EXISTING SYSTEM

Algorithm TREE creates a new node for every term that cannot be indexed into an existing trie.

We outline existing trie-based solutions and discuss extensions of our algorithm to support more expressive queries. When a second query arrives, the algorithm considers whether to store it in an existing trie or to create a new trie. In general, to insert a new query q, STAR iterates through its keywords and utilises the hash table to find all candidate tries.
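The insertion step described above can be illustrated with a minimal sketch. It is not the TREE or STAR implementation: the class names `Node` and `TrieForest`, the rule of picking the candidate trie that shares the longest existing path, and the `match` routine are all assumptions made for illustration. Queries are treated as conjunctions of keywords, and a hash table maps each root keyword to its trie.

```python
class Node:
    """One trie node, labelled with a single keyword (hypothetical helper)."""
    def __init__(self, word):
        self.word = word
        self.children = {}   # keyword -> child Node
        self.queries = []    # ids of queries whose keywords end at this node


class TrieForest:
    """Forest of keyword tries; `roots` plays the role of the hash table."""
    def __init__(self):
        self.roots = {}      # root keyword -> root Node

    def _shared_depth(self, root, rest):
        # Count how many leading keywords of `rest` already exist below `root`.
        node, depth = root, 0
        for word in rest:
            if word not in node.children:
                break
            node = node.children[word]
            depth += 1
        return depth

    def insert(self, qid, words):
        words = list(dict.fromkeys(words))   # drop duplicates, keep order
        if not words:
            raise ValueError("a query needs at least one keyword")
        # Use the hash table to find every candidate trie for this query.
        best = None
        for word in words:
            root = self.roots.get(word)
            if root is None:
                continue
            rest = [w for w in words if w != word]
            depth = self._shared_depth(root, rest)
            # Assumption: greedily prefer the trie sharing the longest path.
            if best is None or depth > best[0]:
                best = (depth, root, rest)
        if best is None:
            # No candidate trie: start a new one rooted at the first keyword.
            root = self.roots[words[0]] = Node(words[0])
            rest = words[1:]
        else:
            _, root, rest = best
        node = root
        for word in rest:
            if word not in node.children:
                node.children[word] = Node(word)
            node = node.children[word]
        node.queries.append(qid)

    def match(self, document_words):
        """Return the ids of all queries whose keywords all occur in the document."""
        doc = set(document_words)
        matched = []
        stack = [root for word, root in self.roots.items() if word in doc]
        while stack:
            node = stack.pop()
            matched.extend(node.queries)
            stack.extend(c for w, c in node.children.items() if w in doc)
        return sorted(matched)
```

Because every keyword of a query lies on the path from the root to the node that stores it, a traversal that only follows keywords present in the document reaches exactly the matching queries. The greedy choice made at insertion time is what makes the resulting forest depend on the order in which queries arrive.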

PROPOSED SYSTEM

The limitations of the proposed family of algorithms include (i) reduced efficiency on limited query vocabularies and/or very short continuous queries, (ii) increased memory usage for indexing queries with disjunctions, as the different disjuncts need to be split and indexed at different tries, and (iii) corpus-dependent parameter/algorithm setup.

Efficiency issues were identified by many researchers that proposed tree and trie-based algorithms for supporting fast filtering under various data models and query languages (e.g., Boolean, VSM), both for main-memory and secondary storage. However, all these approaches use a greedy clustering method that is sensitive to the insertion order of submitted queries and do not consider that an evolving query workload might require the reorganisation of the query database to achieve efficient filtering performance.

Like these approaches, the proposed algorithm uses tries to capture common elements of queries. However, the key differences from these approaches lie in (i) the collection and utilisation of statistics on the importance of keywords in the indexed queries, (ii) the reorganisation of the query database according to both word and query importance, and (iii) the demonstration that the nature of the trie forest is more important than its compactness when it comes to filtering efficiency. Interestingly, all previous works were aiming at minimising the size of the trie forest, since there was an implicit conjecture that a small forest would result in lower filtering times due to fewer node visits.
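To make the idea of statistics-driven reorganisation concrete, the sketch below rebuilds a trie forest from keyword statistics. It is only an illustration under stated assumptions: the function names `keyword_statistics` and `reorganise` are hypothetical, keyword importance is approximated by the number of queries containing the keyword, and placing the most frequent keyword first is one possible ordering rule, not one taken from the source.

```python
from collections import Counter


def keyword_statistics(queries):
    """Count, for every keyword, the number of queries that contain it."""
    stats = Counter()
    for words in queries.values():
        stats.update(set(words))
    return stats


def reorganise(queries):
    """Rebuild the forest from scratch using the current keyword statistics.

    `queries` maps a query id to its list of keywords. The forest is a
    nested dict: keyword -> {"children": {...}, "queries": set of ids}.
    """
    stats = keyword_statistics(queries)
    forest = {}
    for qid, words in queries.items():
        # Assumed ordering rule: most frequent keyword first, ties broken
        # alphabetically so every query has one canonical keyword order.
        ordered = sorted(set(words), key=lambda w: (-stats[w], w))
        if not ordered:
            continue  # skip empty queries
        level = forest
        for word in ordered:
            node = level.setdefault(word, {"children": {}, "queries": set()})
            level = node["children"]
        node["queries"].add(qid)
    return forest
```

Since each query is reduced to a canonical keyword sequence before it is indexed, rebuilding the forest from the same set of queries yields the same structure whatever order they were submitted in, which is the property the greedy methods lack.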

ALGORITHM

Trie-based algorithm:

A trie, also called digital tree and sometimes radix tree or prefix tree (as it can be searched by prefixes), is a kind of search tree: an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.
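As a small illustration of this definition, the following sketch stores string keys character by character and retrieves all keys that share a prefix. The class name `Trie` and its method names are chosen for the example only.

```python
class Trie:
    """Minimal character trie storing a dynamic set of string keys."""
    def __init__(self):
        self.children = {}    # character -> child Trie node
        self.is_key = False   # True if a stored key ends at this node

    def insert(self, key):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.is_key = True

    def keys_with_prefix(self, prefix):
        # Walk down the path spelled by the prefix.
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every key stored in the subtree below that path.
        found = []
        stack = [(node, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_key:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

Keys with a common prefix share the nodes for that prefix, which is the clustering effect the trie-based indexing schemes rely on.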


Filtering Algorithm:

Models are developed using data mining and machine learning algorithms to find patterns in training data. These models are then used to make predictions on real data. Most of the models rely on a classification or clustering technique that identifies the user based on the training set.


SYSTEM SPECIFICATION

HARDWARE REQUIREMENTS:

System : Pentium IV 2.4 GHz.
Hard Disk : 40 GB.
Floppy Drive : 1.44 MB.
Monitor : 14" Colour Monitor.
Mouse : Optical Mouse.
RAM : 512 MB.


SOFTWARE REQUIREMENTS:

Operating System : Windows 7 Ultimate.
Coding Language : MVC 4 Razor with AngularJS.
Front-End : Visual Studio 2012 Professional.
Database : SQL Server 2008.

CONCLUSION

The most interesting conclusions concerning the performance of the algorithms are extracted from the cross comparison results of the examined query collections. By comparing the filtering performance of the examined algorithms, we observe that TREE, RETRIE, STAR-HF, and STAR-LF are very sensitive to vocabulary variations, as the increase in filtering time is 122%, 46%, 229%, and 196% respectively when increasing the vocabulary size, for the same query database size and query length. On the other hand, Algorithms STAR-LR and STAR-HR present a decrease in filtering time, since word statistics for bigger vocabularies contain more information to be exploited. Finally, comparing the absolute filtering times for the two collections, we conclude that Algorithms STAR-LR and STAR-HR deliver a steady filtering efficiency independently of the vocabulary size used.