Query Reorganisation Algorithms for Efficient Boolean Information Filtering
Abstract
In the information filtering paradigm, clients subscribe to a server with
continuous queries that express their information needs and get notified every time
appropriate information is published. To perform this task in an efficient way,
servers employ indexing schemes that support fast matches of the incoming
information with the query database. Such indexing schemes involve (i) main-
memory trie-based data structures that cluster similar queries by capturing
common elements between them and (ii) efficient filtering mechanisms that exploit
this clustering to achieve high throughput and low filtering times. However, state-
of-the-art indexing schemes are sensitive to the query insertion order and cannot
adapt to an evolving query workload, degrading the filtering performance over
time. In this paper, we present an adaptive trie-based algorithm that outperforms
current methods by relying on query statistics to reorganise the query database.
Contrary to previous approaches, we show that the nature of the constructed tries,
rather than their compactness, is the determining factor for efficient filtering
performance. Our algorithm does not depend on the order of insertion of queries in
the database, manages to cluster queries even when clustering possibilities are
limited, and achieves more than 96% filtering time improvement over its state-of-
the-art competitors. Finally, we demonstrate that our solution is easily extensible to
multi-core machines.
Front End: MVC Razor
Back End: SQL Server
Software Tools: Visual Studio 2012, SQL Server 2008
User:
1. Register and then log in to the system.
2. Search for player details.
3. Search using Boolean filtering queries.
4. View the responses to the submitted queries.
5. Modify account details.
Server:
1. Log in to the system.
2. Upload the player details.
3. View the player details.
4. View the player tree structure details using AngularJS.
5. View the average length of queries.
Database: Online Social (as My Database), accessed using Entity Framework.
MVC Controller
1. Server controller
2. Home controller
3. Data controller
4. My controller
Angular Controller
1. Part7 Controller
2. Part6 Controller
3. Part5 Controller
4. Part4 Controller
5. Part3 Controller
6. Part2 Controller
Four MVC controllers and six Angular controllers have been created, based on the action methods.
SYSTEM ANALYSIS
EXISTING SYSTEM
Algorithm TREE creates a new node for every term that cannot be indexed into an existing trie.
We outline existing trie-based solutions and discuss extensions of our algorithm to support more expressive queries. When a second query arrives, it is either stored in an existing trie or a new trie is created for it. In general, to insert a new query q, STAR iterates through its keywords and utilises the hash table to find all candidate tries.
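The sensitivity of greedy insertion to query order can be illustrated with a minimal sketch. It is not the TREE or STAR implementation from the paper: the class names `Node` and `GreedyForest`, the helper `build`, and the rule "use the first keyword that already has a root" are assumptions made for illustration only. Queries are treated as conjunctions of keywords, and a hash table maps each root keyword to its trie.

```python
class Node:
    """One trie node: a keyword, its children, and the queries stored here."""
    def __init__(self, term):
        self.term = term
        self.children = {}
        self.query_ids = []


class GreedyForest:
    """Hypothetical greedy trie forest (illustrative, not the paper's code)."""
    def __init__(self):
        self.roots = {}  # hash table: root keyword -> root node

    def insert(self, query_id, terms):
        terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
        if not terms:
            raise ValueError("a query needs at least one keyword")
        # Greedy choice: reuse the first keyword that already roots a trie,
        # otherwise start a new trie at the first keyword.
        root_term = next((t for t in terms if t in self.roots), terms[0])
        node = self.roots.setdefault(root_term, Node(root_term))
        remaining = [t for t in terms if t != root_term]
        # Follow existing children for as long as a keyword is shared.
        while remaining:
            shared = next((t for t in remaining if t in node.children), None)
            if shared is None:
                break
            node = node.children[shared]
            remaining.remove(shared)
        # Create a new node for every keyword that could not be indexed.
        for t in remaining:
            node.children[t] = Node(t)
            node = node.children[t]
        node.query_ids.append(query_id)

    def node_count(self):
        stack, count = list(self.roots.values()), 0
        while stack:
            current = stack.pop()
            count += 1
            stack.extend(current.children.values())
        return count


def build(queries):
    forest = GreedyForest()
    for query_id, terms in queries:
        forest.insert(query_id, terms)
    return forest


q1 = ("q1", ["x", "y"])
q2 = ("q2", ["y", "z"])
q3 = ("q3", ["y"])
order_a = [q1, q2, q3]
order_b = [q3, q1, q2]
print(build(order_a).node_count())  # 4
print(build(order_b).node_count())  # 3
```

The same three queries produce forests of different sizes depending only on the order in which they arrive, which is the kind of insertion-order dependence described above.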
PROPOSED SYSTEM
The limitations of the proposed family of algorithms include (i) reduced
efficiency on limited query vocabularies and/or very short continuous queries,
(ii) increased memory usage for indexing queries with disjunctions, as the
different disjuncts need to be split and indexed at different tries, and
(iii) corpus-dependent parameter/algorithm setup.
Efficiency issues were identified by many researchers who proposed tree and
trie-based algorithms for supporting fast filtering under various data models and
query languages (e.g., Boolean, VSM), both for main-memory and secondary
storage. However, all these approaches use a greedy clustering method that is
sensitive to the insertion order of submitted queries and do not consider that an
evolving query workload might require the reorganisation of the query database to
achieve efficient filtering performance.
Like these approaches, the proposed algorithm uses tries to capture common
elements of queries. However, the key differences lie in (i) the collection
and utilisation of statistics on the importance of keywords in the indexed queries,
(ii) the reorganisation of the query database according to both word and query
importance, and (iii) the demonstration that the nature of the trie forest is more
important than its compactness when it comes to filtering efficiency. Interestingly,
all previous works aimed at minimising the size of the trie forest, since there
was an implicit conjecture that a small forest would result in lower filtering times
due to fewer node visits.
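A minimal sketch of statistics-driven reorganisation follows. It is an assumption-laden illustration, not the paper's STAR algorithm: keyword importance is approximated here by how many indexed queries contain the keyword, and the function names `reorganise`, `count_nodes`, and `filter_document` are hypothetical. The point it shows is that once every query's keywords are ordered by a global statistic, the rebuilt forest no longer depends on the order in which queries were submitted.

```python
from collections import Counter


def reorganise(queries):
    """Rebuild the trie forest from scratch using keyword statistics.

    Assumption: importance = number of queries containing the keyword.
    The forest is a nested dict: keyword -> {"children": ..., "queries": ...}.
    """
    freq = Counter(t for _, terms in queries for t in set(terms))
    forest = {}
    for query_id, terms in queries:
        # Most frequent keywords first; ties broken alphabetically.
        ordered = sorted(set(terms), key=lambda t: (-freq[t], t))
        if not ordered:
            continue  # skip empty queries
        level = forest
        for t in ordered:
            node = level.setdefault(t, {"children": {}, "queries": []})
            level = node["children"]
        node["queries"].append(query_id)
    return forest


def count_nodes(forest):
    return sum(1 + count_nodes(n["children"]) for n in forest.values())


def filter_document(forest, doc_terms):
    """Return the ids of all queries whose keywords all occur in the document."""
    doc = set(doc_terms)
    matched = []
    stack = [forest]
    while stack:
        level = stack.pop()
        for term in doc:
            node = level.get(term)
            if node is not None:
                # Every keyword on the path so far occurs in the document.
                matched.extend(node["queries"])
                stack.append(node["children"])
    return sorted(matched)


q1 = ("q1", ["x", "y"])
q2 = ("q2", ["y", "z"])
q3 = ("q3", ["y"])
order_a = [q1, q2, q3]
order_b = [q3, q1, q2]

print(reorganise(order_a) == reorganise(order_b))  # True
print(count_nodes(reorganise(order_a)))            # 3
print(filter_document(reorganise(order_a), ["y", "z", "w"]))  # ['q2', 'q3']
```

This sketch happens to yield a compact forest for the toy input, but compactness is not its goal. Which statistic to rank keywords by is exactly the design choice the text identifies as decisive for filtering time.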
ALGORITHM
Trie-based algorithm:
A trie, also called a digital tree and sometimes a radix tree or prefix tree (as it can be searched by prefixes), is a kind of search tree: an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.
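As a small illustration of the definition above, the following sketch stores string keys character by character and retrieves them by prefix. The class name `Trie` and its method names are chosen for this example only.

```python
class Trie:
    """Minimal character trie storing a dynamic set of string keys."""
    def __init__(self):
        self.children = {}   # character -> child Trie node
        self.is_key = False  # True if a stored key ends at this node

    def insert(self, key):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.is_key = True

    def keys_with_prefix(self, prefix):
        # Walk down to the node that represents the prefix.
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored key below that node.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_key:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for word in ["trie", "tree", "trip", "star"]:
    trie.insert(word)

print(trie.keys_with_prefix("tr"))  # ['tree', 'trie', 'trip']
```

Keys that share a prefix share the nodes for that prefix, which is the property the query indexing schemes exploit when clustering similar queries.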
Filtering Algorithm:
Models are developed using data mining and machine learning algorithms to find patterns in training data. These models are then used to make predictions on real data. Most of them rely on a classification or clustering technique, built from the training set, to identify the user.
SYSTEM SPECIFICATION
HARDWARE REQUIREMENTS:
System : Pentium IV 2.4 GHz.
Hard Disk : 40 GB.
Floppy Drive : 1.44 MB.
Monitor : 14" Colour Monitor.
Mouse : Optical Mouse.
RAM : 512 MB.
SOFTWARE REQUIREMENTS:
Operating system : Windows 7 Ultimate.
Coding Language : MVC 4 Razor with AngularJS
Front-End : Visual Studio 2012 Professional.
Data Base : SQL Server 2008.
CONCLUSION
The most interesting conclusions concerning the performance of the algorithms are
drawn from the cross-comparison results of the examined query collections. By
comparing the filtering performance of the examined algorithms, we observe that TREE,
RETRIE, STAR-HF, and STAR-LF are very sensitive to vocabulary variations, as the
increase in filtering time is 122%, 46%, 229%, and 196%, respectively, when increasing
the vocabulary size for the same query database size and query length. On the other
hand, Algorithms STAR-LR and STAR-HR present a decrease in filtering time, since
word statistics for bigger vocabularies contain more information to be exploited. Finally,
comparing the absolute filtering times for the two collections, we conclude that
Algorithms STAR-LR and STAR-HR deliver a steady filtering efficiency independently
of the vocabulary size used.