![Page 1: Searching the Internet CSCI-N 100 Department of Computer and Information Science](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649e855503460f94b8751d/html5/thumbnails/1.jpg)
Searching the Internet
CSCI-N 100 Department of Computer and Information Science

Searching the Internet
- What is the Internet?
- Does anyone own the Internet?
- How is the Internet controlled?

The Internet…
- It is not a centrally owned or organized institution.
- It is not a single entity.
- It is not a 'Den of Iniquity'.
- It is not crawling with eight-year-old children controlling nuclear bombs.
- The Internet is not a hive of viruses waiting to attack your computer.
- The Internet is not just for pimple-faced teenagers with propeller beanies.

The Internet…
- Is a vast repository of information.
- Is relatively universal.
- Is dynamic, changing minute by minute.

The Internet
- InterNIC (Internet Network Information Center): an international coalition of Internet organizations that has what control there is of the Internet
- IAB (Internet Architecture Board): an organization that sets standards for the Internet
- ICANN (Internet Corporation for Assigned Names and Numbers): an organization responsible for the global coordination of the Internet's system of unique identifiers
- W3C (World Wide Web Consortium): develops interoperable technologies, specifications, guidelines, software, and tools

Search engines
- A search engine is an information retrieval system
- It allows one to ask for content meeting specific criteria
- The list of results is often sorted with respect to some measure of relevance
- Search engines use regularly updated indexes to operate quickly and efficiently
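
The two ideas on this slide, an index built ahead of time and results sorted by relevance, can be sketched in a few lines. This is only an illustration: the page names and text are made up, and "relevance" is assumed here to be a simple count of how often the query words appear, which is far cruder than what real search engines use.

```python
from collections import Counter, defaultdict

# Hypothetical mini-collection; the page names and text are invented.
PAGES = {
    "page_a": "peanut butter recipes with peanut oil",
    "page_b": "butter cookies",
    "page_c": "history of the peanut",
}

def build_index(pages):
    """Map each word to a Counter of {page name: number of occurrences}."""
    index = defaultdict(Counter)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word][name] += 1
    return index

def search(index, query):
    """Return pages containing any query word, highest score first."""
    scores = Counter()
    for word in query.lower().split():
        # Look the word up in the index instead of rescanning every page.
        for name, count in index.get(word, {}).items():
            scores[name] += count
    # Sort by score (descending), then by name so ties have a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))

INDEX = build_index(PAGES)
print(search(INDEX, "peanut butter"))  # ['page_a', 'page_b', 'page_c']
```

The index is built once and reused for every query, which is why it has to be updated regularly as pages change.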

Search engines
- First search engines
  - Archie
    - the name is "archive" without the "v"
    - created in 1990 by a student in Montreal
    - the program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of filenames
    - could not search by file contents
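
A rough sketch of the Archie idea: collect directory listings, then match a search term against file names only. The host names and paths below are invented, and this is not Archie's actual code, just an illustration of why a filename database cannot answer questions about what is inside a file.

```python
# Hypothetical directory listings keyed by FTP host; hosts and paths are invented.
LISTINGS = {
    "ftp.example.edu": ["/pub/games/chess.tar", "/pub/docs/README"],
    "ftp.example.org": ["/mirror/chess-openings.txt", "/mirror/maps/world.gif"],
}

def build_filename_db(listings):
    """Flatten the per-host listings into (host, path) records."""
    return [(host, path) for host, paths in listings.items() for path in paths]

def find_files(db, term):
    """Match the term against the file name only, never the file contents."""
    term = term.lower()
    return [
        (host, path)
        for host, path in db
        if term in path.rsplit("/", 1)[-1].lower()  # last path component
    ]

DB = build_filename_db(LISTINGS)
print(find_files(DB, "chess"))
```

A search for "chess" finds both chess files, but a file about chess that happens to be named README would never be found.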

Search engines
- Gopher
  - indexed plain text documents
  - created in 1991 at the University of Minnesota; Gopher was named after the school's mascot
  - most of the Gopher sites became websites after the creation of the World Wide Web, because these were text files

Search engines
- Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives)
  - provided a keyword search of most Gopher menu titles in the entire Gopher listings
- Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display)
  - a tool for obtaining menu information from various Gopher servers

And the answer is …
- People have trouble with
  - How to ask
  - What to ask
  - Where to ask
  - When to ask

How to ask
- Search criteria
  - Build a query
    - Date
    - File name
    - Location
    - Keyword
    - Domain
    - Country

How to ask
- Boolean phrases
  - And, + (plus)
    - Finds documents containing all of the specified words or phrases
    - Peanut AND butter finds documents with both the word peanut and the word butter.
  - Or
    - Finds documents containing at least one of the specified words or phrases
    - Peanut OR butter finds documents containing either peanut or butter. The found documents could contain both items, but not necessarily.
  - Not, - (minus)
    - Excludes documents containing the specified word or phrase
    - Peanut NOT butter finds documents with peanut but not containing butter.
  - Wild card (*)
    - Finds documents containing words that start with the given characters; * fills in the rest
    - Pea* returns all pages with any word beginning with pea, including peanut (Be Careful!!)
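
The Boolean operators map directly onto set operations, which the following sketch tries to show. The four documents are made up, and the wild card is assumed to mean "any word starting with these letters", which is one common interpretation.

```python
# Hypothetical documents; names and text are made up for illustration.
DOCS = {
    "d1": "peanut butter sandwich",
    "d2": "peanut brittle",
    "d3": "butter cookies",
    "d4": "split pea soup",
}

def docs_with(word):
    """Set of documents containing the exact word."""
    return {name for name, text in DOCS.items() if word in text.split()}

def docs_with_prefix(prefix):
    """Set of documents containing any word that starts with the prefix."""
    return {
        name
        for name, text in DOCS.items()
        if any(w.startswith(prefix) for w in text.split())
    }

peanut, butter = docs_with("peanut"), docs_with("butter")

print(sorted(peanut & butter))          # peanut AND butter -> ['d1']
print(sorted(peanut | butter))          # peanut OR butter  -> ['d1', 'd2', 'd3']
print(sorted(peanut - butter))          # peanut NOT butter -> ['d2']
print(sorted(docs_with_prefix("pea")))  # pea*              -> ['d1', 'd2', 'd4']
```

The last line shows why the wild card needs care: pea* picks up the peanut documents as well as the pea soup one.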

What to ask
- All of these words
  - Documents must contain all of the words you list
- This exact phrase
  - Documents must contain these exact words in the order you typed them
- Any of these words
  - Documents must contain at least one of the words you list
- None of these words
  - Documents that contain these words will be omitted from your results
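
The four fields above can be read as four tests applied to each document. Here is a minimal sketch under that reading; the function names, parameter names, and sample documents are hypothetical and do not correspond to any particular search engine's form.

```python
def contains_phrase(words, phrase_words):
    """True if phrase_words appears in words as a consecutive run."""
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words for i in range(len(words) - n + 1))

def matches(text, all_words=(), exact_phrase="", any_words=(), none_words=()):
    """Apply the four form fields to one document's text."""
    words = text.lower().split()
    present = set(words)
    if not all(w.lower() in present for w in all_words):
        return False  # "all of these words" failed
    if exact_phrase and not contains_phrase(words, exact_phrase.lower().split()):
        return False  # "this exact phrase" failed
    if any_words and not any(w.lower() in present for w in any_words):
        return False  # "any of these words" failed
    if any(w.lower() in present for w in none_words):
        return False  # "none of these words" failed
    return True

# Made-up documents for the example.
DOCS = [
    "creamy peanut butter recipe",
    "butter and peanut cookies",
    "peanut butter and jelly sandwich",
]

hits = [d for d in DOCS if matches(d, exact_phrase="peanut butter", none_words=["jelly"])]
print(hits)  # ['creamy peanut butter recipe']
```

The second document contains both words but not in order, so the exact phrase test rejects it; the third is dropped because it contains an excluded word.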

Where to ask
- Search engines
  - Do not really search the World Wide Web directly
  - Search a database of the full text of web pages selected from the billions of web pages out there residing on servers
  - Search engine databases are selected and built by computer robot programs called "spiders"
  - After spiders find pages, they pass them on to another computer program for "indexing."
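
The spider-then-indexer pipeline can be sketched with a tiny pretend web held in memory. Everything here is an assumption made for illustration: the URLs, the page text, and the choice to follow links breadth-first. A real spider would fetch pages over the network.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "http://a.example/": ("home page about peanuts", ["http://a.example/b", "http://c.example/"]),
    "http://a.example/b": ("butter recipes", ["http://a.example/"]),
    "http://c.example/": ("cookie shop", []),
    "http://d.example/": ("unlinked page", []),  # no page links here
}

def crawl(start_url):
    """Spider: follow links breadth-first, yielding (url, text) for each page found."""
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        yield url, text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)

def index_pages(pages):
    """Indexer: record which URLs contain each word."""
    index = {}
    for url, text in pages:
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

INDEX = index_pages(crawl("http://a.example/"))
print(sorted(INDEX["butter"]))  # ['http://a.example/b']
print("unlinked" in INDEX)      # False
```

Searches run against `INDEX`, not against the web itself, so the page that no spider reached simply does not exist as far as the search engine is concerned.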

Types of Search Tools
- Search engines
  - built by computer robot programs ("spiders") -- not by human selection
  - NOT organized by subject categories -- all pages are ranked by a computer algorithm
  - contain full-text (every word) of the web pages they link to -- you find pages by matching words in the pages you want
  - huge and often retrieve a lot of information -- for complex searches use ones that allow you to search within results
  - unevaluated -- contain the good, the bad, and the ugly -- YOU must evaluate everything you find
  - Examples: Google, Yahoo, Ask.com

Types of Search Tools
- Subject directories
  - built by human selection -- not by computers or robot programs
  - organized into subject categories, classification of pages by subjects -- subjects not standardized and vary according to the scope of each directory
  - NEVER contain full-text of the web pages they link to -- you can only search what you can see (titles, descriptions, subject categories, etc.) -- use broad or general terms
  - small and specialized to large, but smaller than most search engines -- huge range in size
  - often carefully evaluated and annotated (but not always!!)

Directories
- Librarians Index: www.lii.org
- Infomine: infomine.ucr.edu
- AcademicInfo: www.academicinfo.us
- About.com: www.about.com
- Google Directory: directory.google.com
- Yahoo!: dir.yahoo.com

Types of Search Tools
- Searchable database contents or the "Invisible Web"
  - The Invisible Web is estimated to offer two to three times as many pages as the visible web
  - Pages in non-HTML formats (PDF, Word, Excel, Corel suite, etc.) are "translated" into HTML
  - Script-based pages, whose links contain a ? or other script coding, are no longer excluded by most search engines
  - Pages generated dynamically by other types of database software (e.g., Active Server Pages, Cold Fusion) can be indexed if there is a stable URL somewhere that search engine spiders can find
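
To see what "links that contain a ?" means in practice, the sketch below splits a URL into its parts using Python's standard `urllib.parse` module. The example URLs and the `describe_url` helper are made up for illustration; the part after the ? is the query string that a script on the server uses to build the page.

```python
from urllib.parse import parse_qs, urlsplit

def describe_url(url):
    """Split a URL and report whether it carries a query string (the part after '?')."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        "is_script_based": bool(parts.query),
        "params": parse_qs(parts.query),
    }

# Hypothetical URLs for the example.
print(describe_url("http://www.example.com/catalog.asp?item=42&color=red"))
print(describe_url("http://www.example.com/about.html"))
```

The first URL is stable in the sense the slide describes: the same address always asks the server for the same item, so a spider that finds the link can index the resulting page.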

Types of search engines
- Meta-Search Engines
  - When you submit keywords in a meta-search engine's search box, it transmits your search simultaneously to several individual search engines and their databases of web pages
  - Meta-search engines do not own a database of Web pages
  - Examples: Dogpile.com, Clusty.com, Surfwax.com
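
A meta-search engine can be pictured as a program that forwards one query to several engines at once and merges what comes back. In this sketch the two "engines" are stand-in functions returning made-up URLs; a real meta-search engine would send the query over the network, and how it merges and ranks results is an assumption here (first seen, first listed).

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in engines: each returns a ranked list of URLs for a query.
# The functions and URLs are hypothetical.
def engine_one(query):
    return ["http://x.example/1", "http://y.example/2"]

def engine_two(query):
    return ["http://y.example/2", "http://z.example/3"]

def meta_search(query, engines):
    """Send the same query to every engine at once, then merge without duplicates."""
    with ThreadPoolExecutor() as pool:
        # map() runs the calls concurrently but returns results in engine order.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("peanut butter", [engine_one, engine_two]))
# ['http://x.example/1', 'http://y.example/2', 'http://z.example/3']
```

Note that `meta_search` stores no pages of its own, which matches the point that meta-search engines do not own a database of Web pages.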

References
- Module #8: Communication and Internet protocols
  - http://www.cs.iupui.edu/~aharris/mmcc/mod8/abip.html
- Module #2: Communication and the World Wide Web
  - http://www.cs.iupui.edu/~aharris/mmcc/mod2/abwww.html
- World Wide Web Consortium
  - http://www.w3.org/
- Search engine
  - http://en.wikipedia.org/wiki/Search_engine

References
- The BEST Search Engines, UC Berkeley - Teaching Library Internet Workshops
  - http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
  - http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html