Web Search
Module 6
INST 734
Doug Oard
Agenda
The Web
• Crawling
• Web search
Washington Post, February 10, 2011
“The Web”
[Figure: the Web built on Internet communication protocols, alongside RTSP, FTP, and Email]
• HTML (data/display)
• HTTP (transfer)
• URL (e.g., http://www.foo.org/snarf.html)
• Web server and file system
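The diagram ties three pieces together: a URL names a resource, HTTP carries the request to a Web server, and HTML is what comes back. As a minimal sketch using only the Python standard library, the code below splits the slide's example URL into scheme, host, and path, then builds the plain-text HTTP/1.1 GET request a client could send to that server. The function name `http_get_request` is hypothetical; the sketch opens no network connection and handles only `http://` URLs.

```python
from urllib.parse import urlsplit


def http_get_request(url: str) -> str:
    """Build the HTTP/1.1 GET request text a client could send for `url` (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles plain http:// URLs")
    # The path names the resource on the Web server; default to "/" if empty.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # HTTP/1.1 requires a Host header; header lines end with CRLF and an
    # empty line marks the end of the request headers.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines)


url = "http://www.foo.org/snarf.html"
parts = urlsplit(url)
print(parts.scheme, parts.hostname, parts.path)
print(http_get_request(url))
```

The URL alone tells the client which server to contact (`www.foo.org`) and which file to ask for (`/snarf.html`); HTTP is the agreed format for asking.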
Internet vs. Web
• Internet: collection of global networks
• Web: way of managing information exchange
• There are many other uses for the Internet
  – File transfer (FTP)
  – Email (SMTP, POP, IMAP)
What “Caused” the Web?
• Affordable storage
  – 300,000 (typed) words/$ by 1995
• Adequate network capacity
  – 25,000 simultaneous transfers by 1995
  – 1 second/screen (of text) by 1995
• Display capability
  – 10% of US population could see images by 1995
• Effective search capabilities
  – Lycos and Yahoo! achieved useful scale in 1994-1995
Internet Hosts
[Figure: number of Internet hosts, January 1981 to January 2011; y-axis 0 to 1,000,000,000]
Internet Users
[Figure: portion of the global population using the Internet, January 1994 to January 2012; y-axis 0% to 35%]
Source: http://www.internetworldstats.com/
[Figure: two pie charts of shares by language: English, Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian, Korean]
Billions of Queries per Month (December 2012)
• Google: 114.7
• Baidu: 14.5
• Bing+Yahoo: 13.1
• Yandex: 4.8
• Other: 28.7
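Turning the monthly volumes into shares makes the concentration explicit. The short Python sketch below uses only the December 2012 figures from the slide: the five entries total about 175.8 billion queries per month, so Google's 114.7 billion is roughly 65% of the total.

```python
# Monthly query volumes in billions, December 2012 (figures from the slide).
queries = {
    "Google": 114.7,
    "Baidu": 14.5,
    "Bing+Yahoo": 13.1,
    "Yandex": 4.8,
    "Other": 28.7,
}

# Share of the combined monthly volume, in percent.
total = sum(queries.values())
shares = {name: 100 * volume / total for name, volume in queries.items()}

print(f"Total: {total:.1f} billion queries/month")
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11} {pct:5.1f}%")
```

Note that "Other" (about 16%) outranks every named engine except Google.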
An Adoption Curve: Blogs
[Figure: number of weblogs tracked, March 2003 to October 2005; y-axis 0 to 20,000,000, with three successive doubling intervals marked]
• 18.9 million weblogs tracked
• Doubling in size approximately every 5 months
• Consistent doubling over the last 36 months
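Doubling roughly every 5 months is exponential growth: N(t) = N0 · 2^(t/5), with t in months. As a sketch that assumes the rate held exactly, Python can turn it into more familiar figures: about a 5.3-fold increase per year and about a 147-fold increase over 36 months, which would put the count 36 months before the 18.9 million mark at roughly 130,000. The helper names `growth_factor` and `doubling_time` are hypothetical.

```python
import math

DOUBLING_MONTHS = 5.0  # "doubling in size approximately every 5 months" (slide)


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth over `months` if size doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def doubling_time(factor: float, months: float) -> float:
    """Doubling time implied by growing by `factor` over `months`."""
    return months * math.log(2) / math.log(factor)


current = 18.9e6  # weblogs tracked at the end of the chart (slide)
print(f"Per year: x{growth_factor(12):.2f}")
print(f"Over 36 months: x{growth_factor(36):.0f}")
# Assumes exact exponential growth, which real adoption curves only approximate.
print(f"Implied size 36 months earlier: {current / growth_factor(36):,.0f}")
```

`doubling_time` runs the calculation in reverse, which is how a doubling period like "every 5 months" would be estimated from two observed counts.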
November 2013
Source: http://www.searchenginejournal.com/growth-social-media-2-0-infographic/77055/
Agenda
• The Web
Crawling
• Web search