Restricted Search Engine
Restricted Search Engine
Laurent Balat
Christophe Decis
Thomas Forey
Sebastien Leclercq
ESSI2 Project
Supervisor: Johny BOND
June 2002
Introduction (1)
• What is a search engine?
• 3 types:
  – disciplinary
  – global
  – thematic
• Internet users spend more than 50% of their time searching!
Introduction (2)
• Lots of pages can’t be reached.
[Figure: the Web, the indexable Web, Google]
How does it work?
• The search engine is composed of two parts.
• First part: the Web site spider
[Figure: Web, Spider, Processing (PDF unit, DOC unit, HTML processing unit), Indexing, Database, Constraint]
How does it work?
• User part architecture
[Figure: Database, Query engine, Query interface, User]
Constraints
• Domain Restriction.
• Search depth.
• Theme: accepted and rejected words.
• Document type.
• Time delay.
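The five constraints above could be grouped in one configuration object. The sketch below is only an illustration: the class name, field names, and default values are hypothetical, and "time delay" is assumed here to mean a pause between two requests.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Constraints:
    """Hypothetical container for the five constraints listed on the slide."""
    allowed_domains: set = field(default_factory=set)    # domain restriction
    max_depth: int = 3                                    # search depth
    start_words: set = field(default_factory=set)         # theme: accepted words
    stop_words: set = field(default_factory=set)          # theme: rejected words
    document_types: set = field(default_factory=lambda: {"text/html"})
    delay_seconds: float = 1.0                            # assumed: pause between requests

    def allows_link(self, url: str, depth: int) -> bool:
        """Domain and depth checks that can be made before any download."""
        host = urlparse(url).hostname or ""
        in_domain = any(
            host == d or host.endswith("." + d) for d in self.allowed_domains
        )
        return in_domain and depth <= self.max_depth

    def allows_type(self, content_type: str) -> bool:
        """Document-type check, e.g. against an HTTP Content-Type header."""
        return content_type.split(";")[0].strip().lower() in self.document_types
```

With `Constraints(allowed_domains={"example.org"}, max_depth=2)`, a link to `http://www.example.org/a` at depth 1 is accepted, while any link outside the domain or deeper than 2 is rejected. An empty `allowed_domains` set rejects every link in this sketch.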
The Spider Part
[Figure: spider flow chart with the steps: link priority queue, check if link already visited, HTTP HEAD on the link, check data type against constraints, download (with download error case), push page onto the page data stack]
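The spider steps named on this slide can be sketched as a small loop. This is not the project's code: the function name, the parameters, and the choice of link depth as the queue priority are assumptions, and the network calls are passed in as functions so the example runs on an in-memory fake web.

```python
import heapq


def crawl(seeds, head, download, extract_links, allowed_types, max_depth):
    """Visit links in priority order and return the stack of downloaded pages.

    head(url)           -> content type string, or None on error
    download(url)       -> page data, or None on download error
    extract_links(data) -> iterable of URLs found in the page
    """
    queue = [(0, url) for url in seeds]      # (depth, url): shallow links first
    heapq.heapify(queue)
    visited = set()
    pages = []                               # stack of (url, data)
    while queue:
        depth, url = heapq.heappop(queue)
        if url in visited:                   # check if link already visited
            continue
        visited.add(url)
        content_type = head(url)             # HTTP HEAD before downloading
        if content_type not in allowed_types:
            continue                         # type rejected by the constraints
        data = download(url)
        if data is None:                     # download error
            continue
        pages.append((url, data))            # push page
        if depth < max_depth:
            for link in extract_links(data):
                if link not in visited:
                    heapq.heappush(queue, (depth + 1, link))
    return pages


# Tiny in-memory "web" (hypothetical data) so the sketch needs no network.
FAKE_WEB = {
    "a": ("text/html", ["b", "c"]),
    "b": ("text/html", ["a", "d"]),
    "c": ("application/zip", []),
    "d": ("text/html", []),
}
pages = crawl(
    seeds=["a"],
    head=lambda u: FAKE_WEB[u][0] if u in FAKE_WEB else None,
    download=lambda u: FAKE_WEB[u][1] if u in FAKE_WEB else None,
    extract_links=lambda data: data,
    allowed_types={"text/html"},
    max_depth=1,
)
print([url for url, _ in pages])
```

Issuing the HEAD request first lets the spider reject an unwanted document type without paying for the full download.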
Document Processing
• Analyse the document type.
• Send the document to the appropriate unit.
• Extract words and links.
• Try to resolve bad links.
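A minimal sketch of these four steps for the HTML case, using only the Python standard library. The names `HtmlUnit`, `UNITS`, and `process` are hypothetical, and "resolving bad links" is interpreted here, as an assumption, as turning relative links into absolute ones.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class HtmlUnit(HTMLParser):
    """Hypothetical HTML unit: collects words and absolute links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Split the text into lowercase alphanumeric words.
        self.words.extend(w.lower() for w in re.findall(r"[A-Za-z0-9]+", data))


def process_html(url, data):
    unit = HtmlUnit(url)
    unit.feed(data)
    return unit.words, unit.links


# Dispatch table from document type to processing unit; PDF or DOC units
# would be registered here in the same way.
UNITS = {"text/html": process_html}


def process(url, content_type, data):
    """Send the document to the unit registered for its type."""
    unit = UNITS.get(content_type)
    if unit is None:
        raise ValueError(f"no processing unit for {content_type}")
    return unit(url, data)
```

Calling `process("http://example.org/dir/page.html", "text/html", html_text)` returns the list of words and the list of absolute links found in `html_text`.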
Indexing
• Binary search tree:
  – quick to build
  – efficient to use
• Check constraints:
  – start list and stop list.
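The idea of a binary search tree index filtered by a start list and a stop list can be sketched as follows. The class names and the per-word counter are hypothetical; the start list is assumed to mean "only these words are accepted when it is given".

```python
class Node:
    def __init__(self, word):
        self.word = word
        self.count = 1
        self.left = None
        self.right = None


class WordIndex:
    """Unbalanced binary search tree keyed by word (illustrative only)."""

    def __init__(self, stop_list=(), start_list=None):
        self.root = None
        self.stop_list = set(stop_list)
        self.start_list = set(start_list) if start_list is not None else None

    def accepts(self, word):
        """Apply the stop list first, then the optional start list."""
        if word in self.stop_list:
            return False
        return self.start_list is None or word in self.start_list

    def add(self, word):
        if not self.accepts(word):
            return
        if self.root is None:
            self.root = Node(word)
            return
        node = self.root
        while True:
            if word == node.word:
                node.count += 1              # word already indexed
                return
            side = "left" if word < node.word else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(word))
                return
            node = child

    def items(self):
        """In-order traversal: yields (word, count) in alphabetical order."""
        stack, node = [], self.root
        while stack or node:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.word, node.count
            node = node.right
```

A tree like this is quick to build, but without rebalancing it degrades to a linked list when words arrive already sorted.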
Database
• MySQL database.
• General structure:
  – Keywords
  – Web links
  – Correspondence between keywords and links
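One possible shape for this three-part structure is sketched below. The slides name MySQL; the sketch uses Python's built-in `sqlite3` as a stand-in so it runs without a server, and every table name, column name, and the `occurrences` column are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE NOT NULL);
CREATE TABLE links    (id INTEGER PRIMARY KEY, url  TEXT UNIQUE NOT NULL);
CREATE TABLE keyword_link (
    keyword_id  INTEGER NOT NULL REFERENCES keywords(id),
    link_id     INTEGER NOT NULL REFERENCES links(id),
    occurrences INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (keyword_id, link_id)
);
"""


def get_id(db, table, column, value):
    """Insert value if it is missing and return its row id."""
    # table and column come from the fixed calls below, never from user input.
    db.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = db.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,))
    return row.fetchone()[0]


def store(db, url, word_counts):
    """Record one page: its link, its keywords, and their correspondence."""
    link_id = get_id(db, "links", "url", url)
    for word, count in word_counts.items():
        keyword_id = get_id(db, "keywords", "word", word)
        db.execute(
            "INSERT OR REPLACE INTO keyword_link VALUES (?, ?, ?)",
            (keyword_id, link_id, count),
        )
    db.commit()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
store(db, "http://example.org/", {"spider": 2, "web": 1})
store(db, "http://example.org/b", {"web": 3})
```

Keeping keywords and links in separate tables means each word and each URL is stored once, and the correspondence table only holds pairs of ids.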
User interface and query engine
• The web page is generated by a CGI script.
• The query engine queries the database.
• The results are formatted for display.
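The query and formatting steps might look like the sketch below. It is an illustration under several assumptions: `sqlite3` stands in for MySQL, the three tables and their columns are hypothetical, a page matches only if it contains every query word, and the CGI plumbing itself is left out.

```python
import sqlite3
from html import escape


def search(db, words):
    """Return URLs of pages containing every word, most occurrences first."""
    words = sorted({w.lower() for w in words})
    if not words:
        return []
    marks = ",".join("?" * len(words))       # one placeholder per word
    rows = db.execute(
        f"""SELECT l.url FROM links l
            JOIN keyword_link kl ON kl.link_id = l.id
            JOIN keywords k ON k.id = kl.keyword_id
            WHERE k.word IN ({marks})
            GROUP BY l.id
            HAVING COUNT(DISTINCT k.word) = ?
            ORDER BY SUM(kl.occurrences) DESC, l.url""",
        (*words, len(words)),
    )
    return [url for (url,) in rows]


def format_results(urls):
    """Format the result list as an HTML fragment."""
    if not urls:
        return "<p>No results.</p>"
    items = "".join(f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls)
    return f"<ul>{items}</ul>"


# Small hypothetical data set so the sketch is self-contained.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE keyword_link (keyword_id INTEGER, link_id INTEGER, occurrences INTEGER);
INSERT INTO keywords VALUES (1, 'spider'), (2, 'web');
INSERT INTO links VALUES (1, 'http://example.org/a'), (2, 'http://example.org/b');
INSERT INTO keyword_link VALUES (1, 1, 2), (2, 1, 1), (2, 2, 3);
""")
print(format_results(search(db, ["web", "spider"])))
```

The query words are passed as bound parameters and the URLs are escaped before being written into the page, so neither the SQL nor the HTML is built from raw user input.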
Demonstration (1)
• Fill the database
Demonstration (2)
• How to search pages?
Conclusion
• Results and perspectives
  – An original search engine.
  – Easy to improve by adding units to process different file formats (ps, doc, xls, …).
• Teamwork and division of tasks.
• This project showed us how to use the different tools seen this year.
References
http://www.w3c.org
http://www.mysql.com
http://www.sgi.com/tech/stl
http://www.searchengineshowdown.com