thesis online library


PROJECT REPORT ON

“Development of Application to Enhance The Web Page Search Results”

Submitted for partial fulfillment of the degree of

BACHELOR OF ENGINEERING

(Computer Science & Engineering)

By

Khushbu Wandhe
Komal Sahare
Monica Pardhi
Rashmeet Sabharwal

VIII Semester B.E. CSE
Department of Computer Science & Engineering

Under the Guidance of
Prof. N.M. Nirkhi
Department of Computer Science & Engineering

Department of Computer Science & Engineering
G.H.Raisoni College of Engineering, Nagpur.
(An Autonomous Institution Under UGC Act 1956)
2011-2012


CERTIFICATE

This is to certify that the dissertation entitled

“Development of Application to Enhance the Web Page Search Results”

is a bona fide work and is submitted to the Rashtrasant Tukdoji Maharaj University, Nagpur,

by

Khushbu Wandhe
Komal Sahare
Monica Pardhi
Rashmeet Sabharwal

in partial fulfillment of the degree of BACHELOR OF ENGINEERING in Computer Science & Engineering, during the academic year 2011-2012, under my guidance.

Prof. N.M. Nirkhi
Department of Computer Sc. & Engineering
G.H.Raisoni College of Engineering, Nagpur.

Head
Department of Computer Sci & Engineering
G.H.Raisoni College of Engineering, Nagpur.

Dr. P.R. Bajaj
Director
G.H.Raisoni College of Engineering, Nagpur.

Department of Computer Science & Engineering
G.H.Raisoni College of Engineering, Nagpur.
(An Autonomous Institution Under UGC Act 1956)
2011-2012

Acknowledgement

It is our great pleasure to express our sincere gratitude to all who helped, directly or indirectly, in completing the project work prepared for the course of B.E. Computer Science & Engineering (8th Sem) at G.H.R.C.E., Nagpur.

We wish to express our gratitude to our guide Prof. N.M. Nirkhi and the Head of Department for giving us every possible help and excellent suggestions for improving our programming.

We also thank the staff of G.H.R.C.E., who helped us from time to time throughout this project work.

Projectees:

Khushbu Wandhe
Komal Sahare
Monica Pardhi
Rashmeet Sabharwal

INDEX

Sr. No.   NAME OF TOPIC

1         Introduction
2         Requirement Analysis
3         Process View
4         Design
5         Implementation & Testing
6         Future Plans
7         Bibliography

LIST OF FIGURES

FIG. No.  NAME OF FIGURE

1         Partition of Search Results
2         Selecting a Cluster
3         Initialization
4         Authentication
5         File Menu
6         Search Menu
7         Search for Keyword
8         Result of Search
9         Selected Page Opened in Browser


INTRODUCTION

Web page clustering puts web pages together in groups, based on similarity or other relationship measures. Tightly coupled pages, that is, pages in the same cluster, are treated as a single item in the data analysis steps that follow. A complete data mining analysis could be performed using web page information as it appears in web logs, but when the number of pages increases (for example, on a large-scale corporate web server or a server that uses dynamic web pages) this process becomes difficult or even infeasible. Web page clustering is a reasonable way to deal with this issue.

Traditional web page clustering algorithms use the full text of the documents to generate feature vectors. Such methods often produce unsatisfactory results because web pages contain much noisy information, such as decoration, interaction elements, and advertisements. The varying length of web pages is another significant factor that degrades performance. In this report, we investigate the use of several summarization techniques to tackle these issues when clustering web pages. Compared with the full-text representation of the web pages, our experimental results indicate that the proposed approach effectively addresses the problems of noisy information and varying length, and thus significantly boosts clustering performance.

Web information is usually accessed through search engines and thematic web directories. Search engines such as Google return a ranked list that is not organized by concept and does not connect information extracted from several web pages. There are, however, search engines, for example Vivísimo, that show a cluster hierarchy in addition to the list of relevant documents. When thematic web directories are used, the documents are shown classified in taxonomies, and the search process uses that taxonomy.

In this context, document clustering algorithms are very useful for tasks such as automatic grouping before and after the search, search by similarity, and visualization of search results in a structured way.


REQUIREMENT ANALYSIS

Why “Visual Basic .NET” Framework

Microsoft Visual Basic .NET is a fast and easy way to create applications for Microsoft Windows. Visual Basic .NET provides a complete set of tools that simplify rapid application development for experienced and inexperienced users alike.

The Graphical User Interface (GUI) designer provided by Visual Basic .NET avoids the need to write numerous lines of code to describe the appearance and location of interface elements. VB.NET has evolved from the original BASIC language. It contains several hundred statements, functions and keywords. Beginners can create useful applications by learning just a few of the keywords, yet the power of the language allows professionals to accomplish anything that can be accomplished using any other Windows programming language. VB.NET provides a graphical environment in which you visually design the forms and controls that become the building blocks of your applications. VB.NET supports many useful tools that help you to be more productive. A project typically takes less time to develop in VB.NET than in many other languages.

Features of Visual Basic.NET

The Timer control responds to the passage of time. Timers are independent of the user, and can be programmed to take actions at regular intervals. A typical use is checking the system clock to see if it is time to perform some task. Timers are also useful for other kinds of background processing. Each timer control has an Interval property that specifies the number of milliseconds that pass between one timer event and the next. Unless it is disabled, a timer continues to receive an event (appropriately named the Timer event) at roughly equal intervals of time. At run time the timer is invisible, and its position and size are irrelevant. The Timer event is periodic.

Because VB.NET provides functions of this kind, including functions that help in capturing images, this project was developed using Visual Basic .NET.

Web Browser

A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier (URI) and may be a web page, image, video, or other piece of content.

Mini Database

A mini database is a collection of web pages required to run the project for demonstration purposes. When this project is integrated with a website or website-based software, this database will not be required, as the tool will itself access the internet and display the web pages.

Hardware Requirements:

1. Pentium-based PC or processor
2. More than 128 MB of RAM for better performance

PROCESS VIEW

Why cluster web page search results

With current search engines getting ever more powerful, is there any need for a new type of search engine? The answer is a resounding yes. The largest problem with current search engines is the ordered list of results, coupled with the enormous size of the internet. The first few links in any search may not be the ones desired; in fact, the first several hundred or thousand may be on topics completely unrelated to what the user was searching for. The usual solution is to refine the search, which often involves adding more keywords, altering the keywords, or using advanced Boolean features. This can be a time-consuming process, and even afterwards the user may not find the pages they are looking for, as those pages may still be thousands of places down the ordered list. A new way of visualizing the result set is required, and web search clustering is one way to provide it. By dynamically generating a series of clusters that can be used as filters on the result set, a user can very quickly get an overview of the entire result set and the information it contains, and can filter down to the required topic with ease.

How to cluster web page search results

The clusters are formed by partitioning the result set into groups of pages, where the pages within each group are in some way related. The aim is to generate clusters that contain pages about the same topic, thus dynamically partitioning the result set into topics of potential interest to the user. Ideally, the semantic properties of the content of the pages should be used for clustering, but this is intractable. There are many syntactic properties which may indicate that two pages are related: pages may have common words, common phrases, common in-links, common out-links, or even common words or phrases in the in-linking or out-linked pages. Both the state-of-the-art technology for search result clustering and the contributions made by this project form clusters solely from properties such as these; little or no attempt is made to understand the semantics of the natural language within the pages or to derive the topics from such an understanding.
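As an illustration of one such syntactic property, the sketch below scores two pages by the words they have in common. It is written in Python purely for illustration (the project itself is written in VB.NET); the function name `word_overlap` and the choice of the shared-to-total word ratio as the score are assumptions of this sketch, not the measure used by the project.

```python
def word_overlap(page_a, page_b):
    """Return the share of distinct words that two page texts have in common.

    Hypothetical helper: 0.0 means no common words, 1.0 means the two
    pages use exactly the same set of words.
    """
    words_a = set(page_a.lower().split())
    words_b = set(page_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    # Common words divided by all distinct words seen in either page.
    return len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    print(word_overlap("jaguar car dealer", "jaguar car review"))
    print(word_overlap("jaguar cat", "python snake"))
```

Pages whose score exceeds some chosen threshold could then be treated as candidates for the same cluster.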


Fig 1:- Partition of Search Results

As shown in figure 1, there are two main tasks for web search clustering. The first is to partition the search results into a set of clusters; the second is to generate an accurate description or name for each cluster. Both tasks are important in enabling users to find what they need easily, but this report focuses primarily on the first problem.

Fig 2:- Selecting a Cluster

Information

There are two main kinds of information available for clustering web documents: textual information and link information.

Textual information is the raw data contained in the pages; this may be in the form of individual words or phrases of arbitrary length. Textual information can be found in many sources: it may occur in the page directly as plain text, or as hidden text such as the alt text of images, meta tags (keywords or page description), and the page title; it can also occur in the URL.

This Project

In this project, textual information is used: the clustering algorithm uses all phrases of arbitrary length shared by two or more documents. Link information was considered carefully throughout the project, but the overhead of the approximately forty-fold increase in page downloads and processing time proved too large.
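The idea of collecting phrases shared by two or more documents can be sketched as follows. This is an illustrative Python sketch, not the project's VB.NET code: the function name `shared_phrases`, the whitespace tokenization, and the `max_len` cap on phrase length (the report speaks of phrases of arbitrary length) are all assumptions made to keep the example small.

```python
from collections import defaultdict


def shared_phrases(docs, max_len=3):
    """Map each phrase to the documents containing it, keeping only
    phrases that occur in at least two documents.

    docs: dict of document name -> text (hypothetical input format).
    max_len: longest phrase, in words, that is considered (assumption).
    """
    index = defaultdict(set)
    for name, text in docs.items():
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                index[" ".join(words[i:i + n])].add(name)
    return {phrase: sorted(names)
            for phrase, names in index.items() if len(names) >= 2}


if __name__ == "__main__":
    pages = {
        "a.html": "jaguar car dealer",
        "b.html": "jaguar car review",
        "c.html": "jaguar cat habitat",
    }
    for phrase, names in sorted(shared_phrases(pages).items()):
        print(phrase, names)
```

Each shared phrase, together with the documents that contain it, is a candidate cluster; longer shared phrases generally indicate a more specific topic.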

Fast String Algorithm


The algorithm preprocesses the target string (key) that is being searched for, but not the string being searched in (unlike some algorithms that preprocess the string to be searched and can then amortize the expense of the preprocessing by searching repeatedly). The execution time of this algorithm, while still linear in the size of the string being searched, can have a significantly lower constant factor than many other search algorithms: it doesn't need to check every character of the string to be searched, but rather skips over some of them. Generally the algorithm gets faster as the key being searched for becomes longer. Its efficiency derives from the fact that with each unsuccessful attempt to find a match between the search string and the text it is searching, it uses the information gained from that attempt to rule out as many positions of the text as possible where the string cannot match.

- - - - - - - X - - - - - - -
A N P A N M A N - - - - - - -
- A N P A N M A N - - - - - -
- - A N P A N M A N - - - - -
- - - A N P A N M A N - - - -
- - - - A N P A N M A N - - -
- - - - - A N P A N M A N - -
- - - - - - A N P A N M A N -
- - - - - - - A N P A N M A N

The X in position 8 excludes all 8 of the possible starting positions shown.

The Fast String Algorithm's attempts to check whether a match exists at a particular position work backwards. If it starts a search at the beginning of a text for the word "ANPANMAN", for instance, it checks the eighth position of the text to see if it contains an "N". If it finds the "N", it moves to the seventh position to see if that contains the last "A" of the word, and so on until it checks the first position of the text for an "A".

Why the Fast String Algorithm takes this backward approach is clearer when we consider what happens if the verification fails, for instance, if instead of an "N" in the eighth position, we find an "X". The "X" doesn't appear anywhere in "ANPANMAN", and this means there is no match for the search string at the very start of the text, or at the next seven positions following it, since those would all fall across the "X" as well. After checking just one character of the text against the eight-character word "ANPANMAN", we are able to skip ahead and start looking for a match ending at the sixteenth position of the text.

This explains why the best-case performance of the algorithm, for a text of length N and a fixed pattern of length M, is N/M: in the best case, only one in M characters needs to be checked. This also explains the somewhat counter-intuitive result that the longer the pattern we are looking for, the faster the algorithm will usually be able to find it.
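The best case can be checked with a small experiment that counts how many characters of the text are actually inspected. The Python sketch below is illustrative only (the project itself is written in VB.NET); the function name `count_inspected` is hypothetical, and for brevity the skip distance is taken only from the text character aligned with the end of the pattern, which is a simplification of the full algorithm.

```python
def count_inspected(text, pattern):
    """Count the text characters examined while scanning for pattern.

    Each alignment is checked from right to left; after every attempt
    the pattern jumps ahead by a distance looked up from the text
    character under the pattern's last position (simplified rule).
    """
    m = len(pattern)
    # Distance from the rightmost occurrence of each character
    # (last position excluded) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    inspected = 0
    start = 0
    while start + m <= len(text):
        k = m - 1
        while k >= 0:
            inspected += 1
            if text[start + k] != pattern[k]:
                break
            k -= 1
        start += shift.get(text[start + m - 1], m)
    return inspected


if __name__ == "__main__":
    # 64 characters, none of which occurs in the 8-character pattern.
    print(count_inspected("X" * 64, "ANPANMAN"))
```

With a 64-character text that contains none of the pattern's letters and an 8-character pattern, only 8 characters are inspected, one per pattern length.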

The algorithm precomputes two tables to process the information it obtains in each failed verification: one table calculates how many positions ahead to start the next search based on the value of the character that caused the mismatch; the other makes a similar calculation based on how many characters were matched successfully before the match attempt failed. (Because these two tables return results indicating how far ahead in the text to "jump", they are sometimes called "jump tables", which should not be confused with the more common meaning of jump tables in computer science.) When a mismatch occurs, the algorithm shifts by the larger of the two jump values.

The first table

- - - - A M A N - - - - - - -
A N P A N M A N - - - - - - -
- A N P A N M A N - - - - - -
- - A N P A N M A N - - - - -
- - - A N P A N M A N - - - -
- - - - A N P A N M A N - - -
- - - - - A N P A N M A N - -
- - - - - - A N P A N M A N -

The mismatch "A" in position 5 (3 back from the last letter of the needle) excludes the first 6 of the possible starting positions shown.

Populate the first table as follows. For each i less than the length of the search string, construct the pattern consisting of the last i characters of the string preceded by a mismatched character, right-align the pattern and the string, and record the fewest characters the pattern must shift left for a match.

For instance, for the search string ANPANMAN, the table would be as follows. In the Pattern column, [^N] stands for any character that is not 'N', so [^N]MAN signifies a substring consisting of a character that is not 'N' followed by the characters 'MAN'.

i   Pattern        Left Shift

0   [^N]           1
1   [^A]N          8
2   [^M]AN         3
3   [^N]MAN        6
4   [^A]NMAN       6
5   [^P]ANMAN      6
6   [^N]PANMAN     6
7   [^A]NPANMAN    6

Notes on the first four rows:

i = 0: The letter to the left of the final N in 'ANPANMAN' is not N (it is A), so the pattern [^N] must shift one position left for a match; Left_Shift = 1.
i = 1: [^A]N is not a substring of ANPANMAN, so Left_Shift is the number of letters in 'ANPANMAN'; Left_Shift = 8.
i = 2: [^M]AN matches ANPANMAN three positions to the left (at 'PAN'); Left_Shift = 3.
i = 3: [^N]MAN is not a substring of 'ANPANMAN', but a match becomes possible 6 positions to the left, where the pattern overlaps only the leading 'AN' (as in '[^N]MANPANMAN'); Left_Shift = 6.
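The construction of the first table can be reproduced by trying every shift in turn. The Python sketch below is illustrative only (the project itself is written in VB.NET); `left_shift_table` is a hypothetical name, and the brute-force search is chosen for readability over the faster preprocessing normally used.

```python
def left_shift_table(pattern):
    """Return, for each i (number of characters already matched at the
    end of the pattern), the smallest left shift that could still match.
    """
    m = len(pattern)
    table = []
    for i in range(m):
        mismatch = m - i - 1  # position of the character that failed
        for shift in range(1, m + 1):
            # The matched suffix must agree with the shifted pattern
            # wherever the two overlap.
            suffix_ok = all(
                k - shift < 0 or pattern[k - shift] == pattern[k]
                for k in range(m - i, m)
            )
            # The character now under the mismatch position must be a
            # different one, otherwise the same mismatch would repeat.
            j = mismatch - shift
            mismatch_ok = j < 0 or pattern[j] != pattern[mismatch]
            if suffix_ok and mismatch_ok:
                table.append(shift)
                break
    return table


if __name__ == "__main__":
    print(left_shift_table("ANPANMAN"))
```

For ANPANMAN this yields the shifts 1, 8, 3, 6, 6, 6, 6, 6 for i = 0 to 7, in agreement with the table above. A shift equal to the pattern length always satisfies both conditions, so every row receives a value.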

The second table

The second table is easier to calculate: start at the last character of the sought string and move towards the first character. Each time you move left, if the character you are on is not in the table already, add it; its Shift value is its distance from the rightmost character. All other characters receive a shift equal to the length of the search string.

Example: for the string ANPANMAN, the second table would be as shown (for clarity, entries are shown in the order they would be added to the table). The shift for N is not zero: the final character of the string is not itself recorded, so the entry for N comes from the second N from the right, at distance 3.

Character              Shift

A                      1
M                      2
N                      3
P                      5
all other characters   8
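A minimal Python sketch of the character table, together with a simplified search that uses only this table, is given below. Python is used only for illustration (the project itself is written in VB.NET), the function names are hypothetical, and the search is the single-table (Horspool) simplification rather than the full two-table method described in this section.

```python
def char_shift_table(pattern):
    """Build the character table: distance of each character's
    rightmost occurrence (last position excluded) from the end."""
    m = len(pattern)
    table = {}
    for distance in range(1, m):
        ch = pattern[m - 1 - distance]
        table.setdefault(ch, distance)  # keep the smallest distance
    return table


def find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shifts = char_shift_table(pattern)
    hits = []
    start = 0
    while start <= n - m:
        # Compare right to left, as the algorithm prescribes.
        k = m - 1
        while k >= 0 and text[start + k] == pattern[k]:
            k -= 1
        if k < 0:
            hits.append(start)
        # Jump by the table entry of the text character under the
        # pattern's last position; unknown characters skip m places.
        start += shifts.get(text[start + m - 1], m)
    return hits


if __name__ == "__main__":
    print(char_shift_table("ANPANMAN"))
    print(find_all("XXANPANMANXX ANPANMAN", "ANPANMAN"))
```

The table produced for ANPANMAN contains A = 1, M = 2, N = 3 and P = 5; any character missing from the table uses the default shift of 8, the pattern length.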

DESIGN

CODING FOR SEARCH BUTTON:

Private Sub btnSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSearch.Click
    Dim SerchKey As String
    Dim index As Integer
    Dim filename As String
    Dim foundnowords As Integer

    ' Pad the keyword with spaces so that only whole words match.
    SerchKey = Trim(txtSearchMain.Text)
    SerchKey = " " & SerchKey & " "
    MessageBox.Show("co=" + itemcount.ToString())

    page = 0
    For index = 0 To itemcount - 1
        page = page + 1
        pb.Value = page ' advance the progress bar
        filename = databaselist.FileListBox1.Items(index)
        ReadFile(filename)
        foundnowords = SearchForWord(ReadText, SerchKey)

        ' Keep only the pages in which the keyword was found.
        If foundnowords >= 1 Then
            listAvailable.Items.Add(databaselist.FileListBox1.Items(page - 1))
        End If
    Next
    lbltot.Text = listAvailable.Items.Count.ToString()
End Sub

SECOND SEARCH BUTTON:

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    ' Point the file list at the mini database folder next to the executable.
    databaselist.FileListBox1.Path = My.Application.Info.DirectoryPath & "\database"
    itemcount = databaselist.FileListBox1.Items.Count
    pb.Maximum = itemcount
End Sub

Private Sub ReadFile(ByVal file As String)
    Dim mindex As Integer
    Dim apppath As String = My.Application.Info.DirectoryPath
    Dim reader As System.IO.StreamReader

    ' Load the whole page into ReadText, one line at a time.
    ReadText = ""
    reader = New System.IO.StreamReader(apppath & "\database\" & file)
    mindex = 0
    While reader.Peek <> -1
        ReadText = ReadText & reader.ReadLine & vbCrLf
        mindex = mindex + 1
    End While
    reader.Close()
End Sub

Private Function SearchForWord(ByVal FileText As String, ByVal Findtext As String) As Integer
    Dim i, mlen, flen As Integer
    Dim s As String
    Dim count As Integer = 0

    mlen = Len(FileText)
    flen = Len(Findtext)

    ' Slide a window of the keyword's length over the text and count
    ' the matches, ignoring case on both sides.
    For i = 1 To mlen
        s = Mid(FileText, i, flen)
        If LCase(s) = LCase(Findtext) Then
            count = count + 1
        End If
    Next

    ' A page counts as a hit only when the keyword occurs at least twice.
    If count >= 2 Then
        Return count
    End If

    Return 0
End Function

Private Sub btnSearchCat_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSearchCat.Click
    Dim ch As Char()
    Dim mycount As Integer

    sm = True
    EnterString = txtCat.Text
    ch = EnterString.ToCharArray()

    ' Count the commas that separate the category names.
    For i = 0 To EnterString.Length - 1
        If ch(i) = "," Then
            mycount = mycount + 1
        End If
    Next

    If (mycount < 5) Then
        ' Split the comma-separated categories into the word() array.
        Do
            EnterString = BuildWord(EnterString)
        Loop While EnterString.Length > 1

        ' Search each category and regenerate the result pages.
        For cn = 0 To wordcount
            FindSite(cn)
            MakeHtmlFile(cn)
            MsgBox("Search Complete For " + word(cn), MsgBoxStyle.Information, "Result")
        Next cn

        Process.Start(My.Application.Info.DirectoryPath & "\Main.html")
    Else
        MessageBox.Show("Enter five or fewer categories.")
    End If
End Sub

Private Function BuildWord(ByVal str As String) As String

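    Dim ch As Char()
    Dim j As Integer = 0
    Dim i As Integer

    ch = str.ToCharArray()
    EnterString = ""

    ' After the first word, the leading character is the comma left over
    ' from the previous call; blank it out.
    If (wordcount > 0) Then
        ch(0) = " "
    End If

    ' Copy characters into the current word until the next comma.
    For i = 0 To str.Length - 1
        If ch(i) = "," Then
            wordcount = wordcount + 1
            Exit For
        End If
        word(wordcount) = word(wordcount) + ch(i)
    Next i

    ' Return the unprocessed remainder, starting at the comma.
    For j = i To str.Length - 1
        EnterString = EnterString + ch(j)
    Next j

    Return EnterString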

End Function

Private Sub FindSite(ByVal cn As Integer)
    ' Only five categories (0 to 4) are supported.
    If cn < 0 OrElse cn > 4 Then
        Return
    End If

    Dim serchfound As Integer = 0
    Dim numofsite As Integer = listAvailable.Items.Count
    pb.Maximum = numofsite
    searchpage = 0

    For i = 0 To numofsite - 1
        searchpage = searchpage + 1
        pb.Value = searchpage
        filestr = listAvailable.Items(i)
        ReadFile(filestr)
        serchfound = SearchForWord(ReadText, " " & word(cn) & " ")

        If serchfound >= 1 Then
            ' Record the page under the category that matched.
            Select Case cn
                Case 0
                    Categry1(noofsite) = listAvailable.Items(searchpage - 1)
                    noofsite = noofsite + 1
                Case 1
                    Categry2(noofsite1) = listAvailable.Items(searchpage - 1)
                    noofsite1 = noofsite1 + 1
                Case 2
                    Categry3(noofsite2) = listAvailable.Items(searchpage - 1)
                    noofsite2 = noofsite2 + 1
                Case 3
                    Categry4(noofsite3) = listAvailable.Items(searchpage - 1)
                    noofsite3 = noofsite3 + 1
                Case 4
                    Categry5(noofsite4) = listAvailable.Items(searchpage - 1)
                    noofsite4 = noofsite4 + 1
            End Select
        End If
    Next i
End Sub

Private Sub MakeHtmlFile(ByVal cn As Integer)
    Dim Writer As System.IO.StreamWriter
    Dim i As Integer = 0
    Dim path As String = My.Application.Info.DirectoryPath

    ' Overwrite Main.html, the index page that links to each category page.
    Writer = New System.IO.StreamWriter(My.Application.Info.DirectoryPath & "\Main.html", False)

    Writer.WriteLine("<html>")
    Writer.WriteLine("<body>")
    Writer.WriteLine("<p align=center><b><font color=#FF0000>YOUR RESULT </font></b></p>")

    ' One link per category, pointing at SubCat1.html to SubCat5.html.
    For i = 0 To wordcount
        If i <= 4 Then
            Writer.WriteLine("<p><font size=5 ><a href=""file:///" & My.Application.Info.DirectoryPath & "/SubCat" & (i + 1).ToString() & ".html"">" & word(i) & "</a></font></p>")
        End If
    Next i

'If cn = 3 Then 'Writer.WriteLine("<p><font size=" & "5 ><a href=" & """" & "file:///" & My.Application.Info.DirectoryPath & "/" + "SubCat4.html" & """" & ">" & word(cn) & "</a></font></p>") 'End If

'Writer.WriteLine("<p><font size=" & 5 & "> " + txtCat.Text & "</font></p>")

' For i = 0 To noofsite - 1 'Writer.WriteLine("<p align=" & "left" & "><font color=" & "#0000FF" & "><b><a href=" & """" & path & "\database\" & Categry(i) & """" & ">Page" & i + 1 & "</a></b></font></p>") 'Next

    Writer.WriteLine("</body>")
    Writer.WriteLine("</html>")
    Writer.Close()

    ' Regenerate the result page of every category in use.
    For i = 0 To wordcount
        Select Case i
            Case 0
                SubCat1()
            Case 1
                SubCat2()
            Case 2
                SubCat3()
            Case 3
                SubCat4()
            Case 4
                SubCat5()
        End Select
    Next i

    sm = False
End Sub

Public Sub SubCat1()
    Dim Writer As System.IO.StreamWriter
    Dim i As Integer = 0
    Dim path As String = My.Application.Info.DirectoryPath

    Writer = New System.IO.StreamWriter(My.Application.Info.DirectoryPath & "\SubCat1.html", False)
    Writer.WriteLine("<html>")
    Writer.WriteLine("<body>")

    Writer.WriteLine("<p align=center><b><font color=#FF0000>SEARCH RESULT FOR " & UCase(word(0)) & "</font></b></p>")
    If noofsite > 0 Then
        ' Category name followed by the number of matching pages.
        Writer.WriteLine("<p><font size=5> " + word(0) & noofsite.ToString() & "</font></p>")
        For i = 0 To noofsite - 1
            Writer.WriteLine("<p align=left><font color=#0000FF><b><a href=""" & path & "\database\" & Categry1(i) & """>Page" & i + 1 & "</a></b></font></p>")
        Next
    End If

    Writer.WriteLine("</body>")
    Writer.WriteLine("</html>")
    Writer.Close()
End Sub

Public Sub SubCat2()
    Dim Writer As System.IO.StreamWriter
    Dim i As Integer = 0
    Dim path As String = My.Application.Info.DirectoryPath

    Writer = New System.IO.StreamWriter(My.Application.Info.DirectoryPath & "\SubCat2.html", False)
    Writer.WriteLine("<html>")
    Writer.WriteLine("<body>")
    Writer.WriteLine("<p align=center><b><font color=#FF0000>SEARCH RESULT FOR " & word(1) & "</font></b></p>")
    If noofsite1 > 0 Then
        Writer.WriteLine("<p><font size=5> " + UCase(word(1)) & noofsite1.ToString() & "</font></p>")
        For i = 0 To noofsite1 - 1
            Writer.WriteLine("<p align=left><font color=#0000FF><b><a href=""" & path & "\database\" & Categry2(i) & """>Page" & i + 1 & "</a></b></font></p>")
        Next
    End If

    Writer.WriteLine("</body>")
    Writer.WriteLine("</html>")
    Writer.Close()
End Sub

Public Sub SubCat3()
    Dim Writer As System.IO.StreamWriter
    Dim i As Integer = 0
    Dim path As String = My.Application.Info.DirectoryPath

    Writer = New System.IO.StreamWriter(My.Application.Info.DirectoryPath & "\SubCat3.html", False)
    Writer.WriteLine("<html>")
    Writer.WriteLine("<body>")
    Writer.WriteLine("<p align=center><b><font color=#FF0000>SEARCH RESULT FOR " & word(2) & "</font></b></p>")
    If noofsite2 > 0 Then
        Writer.WriteLine("<p><font size=5> " + word(2) & noofsite2.ToString() & "</font></p>")
        For i = 0 To noofsite2 - 1
            Writer.WriteLine("<p align=left><font color=#0000FF><b><a href=""" & path & "\database\" & Categry3(i) & """>Page" & i + 1 & "</a></b></font></p>")
        Next
    End If

    Writer.WriteLine("</body>")
    Writer.WriteLine("</html>")
    Writer.Close()
End Sub

Public Sub SubCat4()
    Dim Writer As System.IO.StreamWriter
    Dim i As Integer = 0
    Dim path As String = My.Application.Info.DirectoryPath

    Writer = New System.IO.StreamWriter(My.Application.Info.DirectoryPath & "\SubCat4.html", False)
    Writer.WriteLine("<html>")
    Writer.WriteLine("<body>")
    Writer.WriteLine("<p align=center><b><font color=#FF0000>SEARCH RESULT FOR " & word(3) & "</font></b></p>")
    If noofsite3 > 0 Then
        Writer.WriteLine("<p><font size=5> " + word(3) + noofsite3.ToString() & "</font></p>")
        For i = 0 To noofsite3 - 1
            Writer.WriteLine("<p align=left><font color=#0000FF><b><a href=""" & path & "\database\" & Categry4(i) & """>Page" & i + 1 & "</a></b></font></p>")
        Next
    End If

    Writer.WriteLine("</body>")
    Writer.WriteLine("</html>")
    Writer.Close()
End Sub

Public Sub SubCat5()
    Dim Writer As System.IO.StreamWriter
    Dim i As Integer = 0
    Dim path As String = My.Application.Info.DirectoryPath

    Writer = New System.IO.StreamWriter(My.Application.Info.DirectoryPath & "\SubCat5.html", False)
    Writer.WriteLine("<html>")
    Writer.WriteLine("<body>")
    Writer.WriteLine("<p align=center><b><font color=#FF0000>SEARCH RESULT FOR " & word(4) & "</font></b></p>")
    If noofsite4 > 0 Then
        Writer.WriteLine("<p><font size=5> " + word(4) + noofsite4.ToString() & "</font></p>")
        For i = 0 To noofsite4 - 1
            Writer.WriteLine("<p align=left><font color=#0000FF><b><a href=""" & path & "\database\" & Categry5(i) & """>Page" & i + 1 & "</a></b></font></p>")
        Next
    End If

    Writer.WriteLine("</body>")
    Writer.WriteLine("</html>")
    Writer.Close()
End Sub

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    ' Show the list of pages in the mini database.
    databaselist.Show()
End Sub

Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
    Me.Hide()
End Sub

    Private Sub btnBoth_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBoth.Click
        Dim SerchKey As String
        Dim index As Integer
        SerchKey = Trim(txtSearchMain.Text)
        'Pad with spaces so that only whole words match
        SerchKey = " " & SerchKey & " "
        Dim filename As String
        Dim foundnowords As Integer
        MessageBox.Show("co=" + itemcount.ToString())
        page = 0

        'Scan every stored page for the keyword
        For index = 0 To itemcount - 1
            page = page + 1
            pb.Value = page
            filename = databaselist.FileListBox1.Items(index)
            ReadFile(filename)
            foundnowords = SearchForWord(ReadText, SerchKey)

            If foundnowords < 1 Then
                ' MessageBox.Show("dd" + foundnowords.ToString())
            Else
                listAvailable.Items.Add(databaselist.FileListBox1.Items(page - 1))
            End If
        Next
        lbltot.Text = listAvailable.Items.Count.ToString()

        'Count the commas to find out how many categories were entered
        Dim ch As Char()
        Dim mycount As Integer
        sm = True
        EnterString = txtCat.Text
        ch = EnterString.ToCharArray()
        For i = 0 To EnterString.Length - 1
            If ch(i) = ","c Then
                mycount = mycount + 1
            End If
        Next

        If (mycount < 5) Then
            'Extract the category words one at a time (replaces the GoTo loop)
            Do
                EnterString = BuildWord(EnterString)
            Loop While EnterString.Length > 1

            For cn = 0 To wordcount
                FindSite(cn)
                MakeHtmlFile(cn)
                MsgBox("Search Complete For " + word(cn), MsgBoxStyle.Information, "Result")
            Next cn

            Process.Start(My.Application.Info.DirectoryPath & "\Main.html")
        Else
            MessageBox.Show("Enter five or fewer categories.")
        End If
    End Sub
End Class
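
The handler above counts the commas in txtCat.Text to limit the input to five categories and then extracts the words through repeated BuildWord calls. As a minimal sketch only, assuming the categories are plain comma-separated text, the same check could be written with String.Split. The names CategoryParsing and ParseCategories are hypothetical and are not part of the project; unlike the comma count, this version ignores empty entries.

```vbnet
Imports System
Imports System.Collections.Generic

Module CategoryParsing
    ' Hypothetical helper: splits comma-separated category text and
    ' rejects input that holds more than maxCategories entries.
    Function ParseCategories(ByVal input As String, Optional ByVal maxCategories As Integer = 5) As List(Of String)
        Dim result As New List(Of String)()
        For Each part As String In input.Split(","c)
            Dim trimmed As String = part.Trim()
            ' Skip empty entries such as the one after a trailing comma
            If trimmed.Length > 0 Then
                result.Add(trimmed)
            End If
        Next
        If result.Count > maxCategories Then
            Throw New ArgumentException("Enter " & maxCategories.ToString() & " or fewer categories.")
        End If
        Return result
    End Function

    Sub Main()
        ' Sample input is made up for illustration
        For Each category As String In ParseCategories("sports, music , news")
            Console.WriteLine(category)
        Next
    End Sub
End Module
```

Each returned entry could then be passed to FindSite and MakeHtmlFile in place of the word array, if the project chose to adopt this form.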

CODING FOR LOGIN FORM

Public Class LoginForm1
    Private Sub OK_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles OK.Click
        If UsernameTextBox.Text = "ABC" AndAlso PasswordTextBox.Text = "ABC" Then
            MessageBox.Show("Login Successfully Done")
            Me.Hide()
            MAIN.Show()
        Else
            MessageBox.Show("Invalid Log In")
        End If
    End Sub

    Private Sub Cancel_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Cancel.Click
        Me.Close()
    End Sub
End Class

CODING FOR MAIN MENU

Public Class MAIN
    Dim childform(1) As databaselist
    Dim childform1(1) As Form1

    Dim chil As Integer = 0
    Dim SorucePath As String = ""
    Dim Filename As String = ""

    Private Sub CreateWebDataToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CreateWebDataToolStripMenuItem.Click
        Dim objOpenFileDialog As New OpenFileDialog
        'Set the Open dialog properties
        With objOpenFileDialog
            .Filter = "Html files (*.htm;*.html)|*.htm;*.html|All files (*.*)|*.*"
            .FilterIndex = 1
            .Title = "Select a File"
        End With

        'Show the Open dialog and, if the user clicks the Open button,
        'remember the selected file
        Dim picked As Boolean = (objOpenFileDialog.ShowDialog() = Windows.Forms.DialogResult.OK)
        If picked Then
            SorucePath = objOpenFileDialog.FileName
        End If

        'Clean up
        objOpenFileDialog.Dispose()
        objOpenFileDialog = Nothing

        'Nothing to copy if the dialog was cancelled
        If Not picked Then
            Exit Sub
        End If

        Filename = FindFilename()

        Try
            'Copy the selected page into the local database folder
            System.IO.File.Copy(SorucePath, My.Application.Info.DirectoryPath & "\database\" + Filename)
            MsgBox("File created")
            ' File.Delete("c:\testFile.txt")
        Catch ex As Exception
            MessageBox.Show("" + ex.Message)
        End Try
    End Sub

    Private Function FindFilename() As String
        Dim revesefilename As String = ""
        Dim filename As String = ""
        Dim ch As Char()
        Dim ch1 As Char()
        ch = SorucePath.ToCharArray()

        'Walk backwards from the end of the path to the last backslash
        Dim co As Integer = SorucePath.Length - 1
        While co > 1
            If ch(co) = "\" Then
                Exit While
            End If
            revesefilename = revesefilename + ch(co)
            co = co - 1
        End While

        'The characters were collected in reverse order, so reverse them again
        ch1 = revesefilename.ToCharArray()
        co = revesefilename.Length - 1
        While co > -1
            filename = filename + ch1(co)
            co = co - 1
        End While
        Return filename
    End Function

    Private Sub FindAllWebDataToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles FindAllWebDataToolStripMenuItem.Click
        chil = chil + 1
        'Grow the array when needed so that repeated clicks stay within bounds
        If chil > childform.GetUpperBound(0) Then
            ReDim Preserve childform(chil)
        End If
        childform(chil) = New databaselist()
        childform(chil).MdiParent = Me
        childform(chil).Show()
    End Sub

    Private Sub SearchWebPageToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles SearchWebPageToolStripMenuItem.Click
        Dim ind As Integer = 0
        ind = ind + 1
        childform1(ind) = New Form1()
        childform1(ind).MdiParent = Me
        childform1(ind).Show()
        'Form1.Show()
    End Sub

    Private Sub SearchWebDataToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles SearchWebDataToolStripMenuItem.Click
        'Open the database folder in Windows Explorer
        Dim Path As String = My.Application.Info.DirectoryPath + "\Database\My Folder"
        Dim info As IO.FileInfo = My.Computer.FileSystem.GetFileInfo(Path)
        Path = info.DirectoryName
        Process.Start("explorer.exe", Path)
    End Sub

    Private Sub MAIN_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Form1.Hide()
        databaselist.Hide()
    End Sub

    Private Sub ExitToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ExitToolStripMenuItem.Click
        Application.Exit()
    End Sub

    Private Sub FileMenuToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles FileMenuToolStripMenuItem.Click

    End Sub
End Class
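
FindFilename in the listing above recovers the file name by scanning the selected path backwards to the last backslash. A shorter sketch follows, assuming the standard System.IO.Path class is acceptable in this project. The module name, the ExtractFileName wrapper and the sample path are made up for illustration.

```vbnet
Imports System
Imports System.IO

Module FileNameSketch
    ' Hypothetical wrapper: returns the last component of a full path.
    Function ExtractFileName(ByVal fullPath As String) As String
        ' Path.GetFileName returns the part after the last directory separator
        Return Path.GetFileName(fullPath)
    End Function

    Sub Main()
        ' Sample path standing in for SorucePath
        Dim sourcePath As String = Path.Combine("pages", "index.html")
        Console.WriteLine(ExtractFileName(sourcePath))
    End Sub
End Module
```

On Windows, where the application runs, the backslash is a directory separator, so a path returned by OpenFileDialog would be handled the same way as in the hand-written loop.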

CODING FOR DATABASE DIALOG BOX

Public Class databaselist
    Public itemcount As Integer

    Private Sub databaselist_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        FileListBox1.Path = My.Application.Info.DirectoryPath & "\database"
        itemcount = FileListBox1.Items.Count
        'pb.Maximum = itemcount
        lbltot.Text = itemcount.ToString()
    End Sub

    Private Sub FileListBox1_SelectedIndexChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles FileListBox1.SelectedIndexChanged

    End Sub

    Private Sub lbl_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles lbl.Click

    End Sub
End Class

CODING FOR TIMER

Public Class Splash
    Dim w As Integer = -10
    Dim per As Integer = 0
    Dim temp As Integer = 0

    Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick
        'Widen the progress bar button by one pixel per tick
        w = w + 1
        Button1.Width = w
        If (w Mod 5 = 0) Then
            per = per + 1
            lblPer.Text = "" + per.ToString() + "%"
            If (per = 100) Then
                lblPer.Text = "100% Complete"
            End If
        End If
        If (w > 490) Then
            Timer1.Stop()
            Label2.Visible = True
            Timer2.Start()
        End If
    End Sub

    Private Sub Splash_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Timer1.Start()
    End Sub

    Private Sub Timer2_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer2.Tick
        'Show the login form after ten ticks
        temp = temp + 1
        If (temp = 10) Then
            LoginForm1.Show()
            Me.Hide()
            Timer2.Stop()
        End If
    End Sub

    Private Sub PictureBox1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles PictureBox1.Click

    End Sub
End Class


Implementation & Testing

MODULES:

In the first module, we will create the user interfaces: all the forms will be built and the flow between them will be decided. The second module will be the user entry module, where the details of every person using the system are entered. Hardware circuitry will be developed in parallel with the first, second and third modules. In the last module, the software and hardware will be tested and changed if necessary.

Fig.3:- Initialization


Fig.4:- Authentication

Fig.5:- File Menu


Fig.6:- Search Menu

Fig.7:- Search for Keyword


Fig.8:- Result of Search

Fig.9:- Selected Page opened in Browser


FUTURE PLANS

In Social Networking Sites

A clustering-based search engine will help users find the exact communities, fan pages, or discussion boards they are looking for.

Various Fields in Corporate World

It can also be used in fields such as banking, colleges, business firms, and census systems.

Mobile Systems

This software can also be implemented on mobile devices, which serve a large group of internet users.


BIBLIOGRAPHY

Daniel Wayne Crabtree, Improvements To Web Page Clustering Method.

IEEE papers based on Web Page Clustering.

John M. Pierre, Practical Issues for Automated Categorization of Web Pages, September 2000.