introduction to web science


Upload: kevork

Post on 07-Jan-2016


TRANSCRIPT

Page 1: Introduction to Web Science

1

Dr Alexiei Dingli

Introduction to Web Science

Web 1.0

Page 2: Introduction to Web Science

2

• Packet switching network

• IP Addressing

• Internet Applications

• The WWW and markup

• Searching the WWW

• Intelligent Agents

• Internet Governance

Introducing Web 1.0

Page 3: Introduction to Web Science

3

• Local area network (LAN)

– Network of computers located close together

• Wide area networks (WANs)

– Networks of computers connected over greater distances

• Circuit

– Combination of telephone lines and closed switches that connect them to each other

Packet-Switched Networks (1)

Page 4: Introduction to Web Science

4

• Circuit switching is used in telephone communication

• The Internet uses packet switching

• Packet switching relies on computers called ‘routers’ and on programs called ‘routing algorithms’

Packet-Switched Networks (2)

Page 5: Introduction to Web Science

5

Packet-Switched Networks (3)

• Information is divided into packets

• It is passed from node to node

• It is recomposed as one chunk on the destination server

Page 6: Introduction to Web Science

6

Page 7: Introduction to Web Science

7

• Routing computers

– Computers that decide how best to forward packets

• Routing algorithms

– Rules contained in programs on router computers that determine the best path on which to send packets

– Programs apply their routing algorithms to information they have stored in routing tables

Routing Packets
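
The idea of a routing algorithm filling a routing table can be sketched as follows. This is a minimal illustration only: the four-router network and its link costs are invented, and Dijkstra's shortest-path algorithm is used as one well-known example of a routing algorithm, not as the method any particular router uses.

```python
import heapq

# Hypothetical network: router names and link costs are made up for illustration.
LINKS = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

def build_routing_table(links, source):
    """Return {destination: (next_hop, total_cost)} using Dijkstra's algorithm."""
    best = {source: (None, 0)}
    queue = [(0, source, None)]  # (cost so far, node, first hop taken from source)
    visited = set()
    while queue:
        cost, node, first_hop = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in links[node].items():
            new_cost = cost + link_cost
            # Leaving the source, the neighbour itself is the first hop.
            hop = neighbour if node == source else first_hop
            if neighbour not in best or new_cost < best[neighbour][1]:
                best[neighbour] = (hop, new_cost)
                heapq.heappush(queue, (new_cost, neighbour, hop))
    del best[source]
    return best

table = build_routing_table(LINKS, "A")
for destination, (next_hop, cost) in sorted(table.items()):
    print(f"to {destination}: forward to {next_hop} (cost {cost})")
```

Router A only needs to remember the next hop for each destination; the rest of the path is decided by the routers further along.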

Page 8: Introduction to Web Science

8

• Communications protocol suite

– Packet switched protocol

  • No end-to-end connection is required

  • Each message broken down into small pieces called packets

  • Packets possibly routed to destination over different paths

– Transmission Control Protocol (TCP)

  • Breaks messages into packets

  • Numbers packets in order

  • Reorders packets at the destination

– Internet Protocol (IP)

  • Routes packets to the proper destination

TCP/IP
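
The three TCP steps listed above (break, number, reorder) can be imitated in a few lines. This is a toy sketch, not real TCP: the packet size, the message and the function names `to_packets` and `reassemble` are assumptions made for the example, and there is no acknowledgement or retransmission.

```python
import random

def to_packets(message: bytes, size: int):
    """Split a message into (sequence_number, payload) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Put packets back in sequence order and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Packets may take different paths to the destination."
packets = to_packets(message, size=8)

random.shuffle(packets)          # simulate packets arriving out of order
restored = reassemble(packets)
print(restored == message)
```

Because every packet carries its own sequence number, the order of arrival does not matter to the receiver.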

Page 9: Introduction to Web Science

9

Open Systems Interconnections Model

OSI model layers, from the highest to the lowest, with example protocols. (The TCP/IP protocol suite is a separate, simpler model whose application layer covers OSI layers 5 to 7.)

7 Application     HTTP, SMTP, FTP, Telnet, SSH, Whois, etc. (layers 5 to 7)

6 Presentation

5 Session

4 Transport       TCP, UDP

3 Network         IP

2 Data Link       Ethernet

1 Physical        Wire, Radio, Fibre Optic

Page 10: Introduction to Web Science

10

• Internet addresses are based on a 32-bit number called an IP address

• IP addresses are written as four numbers, each between 0 and 255, separated by periods (dotted-decimal notation)

• An address such as 126.204.89.56 uniquely identifies a computer connected to the Internet

• IP Subnetting conceptually divides a large network into smaller sub-networks

IP Address
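
To make the "32-bit number" point concrete, the sketch below converts the example address from the slide into its integer and binary forms using Python's standard `ipaddress` module. The variable names are illustrative only.

```python
import ipaddress

addr = ipaddress.IPv4Address("126.204.89.56")

# The same address as a single 32-bit integer.
as_int = int(addr)

# Rebuild that integer by hand from the four dotted-decimal numbers.
octets = [int(part) for part in str(addr).split(".")]
rebuilt = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

# 32 binary digits, 8 per dotted-decimal number.
as_bits = format(as_int, "032b")

print(as_int == rebuilt)
print(as_bits)
```

Each of the four numbers contributes exactly eight bits, which is why none of them can exceed 255.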

Page 11: Introduction to Web Science

11

IP Classes (1)

Page 12: Introduction to Web Science

12

IP Classes (2)

Class     Leading Bits   Number of Networks   Addresses Per Network

Class A   0              126                  16,777,214

Class B   10             16,384               65,534

Class C   110            2,097,152            254
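
The figures in the table follow from simple bit arithmetic, which the sketch below reproduces. It assumes the classic classful split (8, 16 and 24 network bits for classes A, B and C) and the usual reservation of two Class A networks (0 and 127); the function names are made up for this example.

```python
def classful_counts(leading_bits, prefix_length, reserved_networks=0):
    """Return (networks, addresses_per_network) for one address class.

    prefix_length is the size of the network part in bits, leading bits included.
    """
    networks = 2 ** (prefix_length - len(leading_bits)) - reserved_networks
    # The all-zeros and all-ones host parts are not assigned to hosts.
    hosts = 2 ** (32 - prefix_length) - 2
    return networks, hosts

def address_class(address):
    """Classify a dotted-decimal address by the leading bits of its first number."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:    # leading bit 0
        return "A"
    if first_octet < 192:    # leading bits 10
        return "B"
    if first_octet < 224:    # leading bits 110
        return "C"
    return "D or E"

class_a = classful_counts("0", 8, reserved_networks=2)   # networks 0 and 127 reserved
class_b = classful_counts("10", 16)
class_c = classful_counts("110", 24)

print(class_a, class_b, class_c)
print(address_class("126.204.89.56"))
```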

Page 13: Introduction to Web Science

13

Subnetting

Page 14: Introduction to Web Science

14

• Explosion in size of IP routing tables.

• Every time more address space was needed, the administrator would have to apply for a new block of addresses.

• Any changes to the internal structure of a company's network would potentially affect devices and sites outside the organization.

• Keeping track of all those different Class C networks would be a bit of a headache in its own right.

Without subnetting …

Page 15: Introduction to Web Science

15

• Better Match to Physical Network Structure

• Flexibility

• Invisibility To Public Internet

• No Need To Request New IP Addresses

• No Routing Table Entry Proliferation

Benefits of Subnetting

Alexiei Dingli
Better Match to Physical Network Structure: Hosts can be grouped into subnets that reflect the way they are actually structured in the organization's physical network.

Flexibility: The number of subnets and number of hosts per subnet can be customized for each organization. Each can decide on its own subnet structure and change it as required.

Invisibility To Public Internet: Subnetting was implemented so that the internal division of a network into subnets is visible only within the organization; to the rest of the Internet the organization is still just one big, flat, “network”. This also means that any changes made to the internal structure are not visible outside the organization.

No Need To Request New IP Addresses: Organizations don't have to constantly requisition more IP addresses, as they would in the workaround of using multiple small Class C blocks.

No Routing Table Entry Proliferation: Since the subnet structure exists only within the organization, routers outside that organization know nothing about it. The organization still maintains a single (or perhaps a few) routing table entries for all of its devices. Only routers inside the organization need to worry about routing between subnets.
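
As a rough illustration of subnetting, the sketch below divides one organization-wide block into smaller subnets with the standard `ipaddress` module. The block `172.16.0.0/16`, the choice of /24 subnets and the host address are assumptions picked for the example, not values from the lecture.

```python
import ipaddress

# Hypothetical organization block (a private range, used here only as an example).
organization = ipaddress.ip_network("172.16.0.0/16")

# Internally, the organization splits the block into /24 subnets.
subnets = list(organization.subnets(new_prefix=24))
usable_hosts = subnets[0].num_addresses - 2   # minus network and broadcast addresses

# Find which internal subnet a given host belongs to.
host = ipaddress.ip_address("172.16.5.20")
home_subnet = next(subnet for subnet in subnets if host in subnet)

print(len(subnets), "subnets of", usable_hosts, "hosts each")
print(host, "lives in", home_subnet)
```

Routers outside the organization would still see only the single /16 block; the /24 structure matters only to routers inside it.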
Page 16: Introduction to Web Science

16

• Network Layer

• Developed in 1994

• Will replace the IPv4 standard

– Limits on network addresses will eventually lead to exhaustion of available addresses (projected by 2023)

– IPv4 supports only 4,294,967,296 addresses (32 bits)

• Improvements include

– Providing future cell phones and mobile devices with their own unique and permanent addresses

– Support for about 3.4 × 10^38 addresses (128 bits)

IPv6 (or IP Next Generation)
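
The two address-space figures on this slide can be checked with a few lines of arithmetic. The example address uses the `2001:db8::/32` prefix, which is reserved for documentation; it is included only to show what a 128-bit address looks like.

```python
import ipaddress

ipv4_total = 2 ** 32     # 32-bit addresses
ipv6_total = 2 ** 128    # 128-bit addresses

# How many complete IPv4 address spaces fit inside the IPv6 address space.
ratio = ipv6_total // ipv4_total

example = ipaddress.ip_address("2001:db8::1")   # documentation-only example address

print(f"{ipv4_total:,}")
print(f"{ipv6_total:.1e}")
print(example.version, example.max_prefixlen)
```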

Page 17: Introduction to Web Science

17

• A Uniform Resource Locator (URL) consists of names and abbreviations that are much easier to remember than IP addresses

• The HTTP protocol defines how an Internet resource is accessed

• An address such as www.microsoft.com is called a domain name

• Domain Name System (DNS)

– A database of Internet names

– DNS Servers convert Internet names to IP addresses

– Top level domains

Domain Names
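
The parts of a URL named above can be pulled apart with Python's standard `urllib.parse` module. The sketch is illustrative: the path `/index.html` is invented, and no DNS lookup is performed, so it runs without a network connection.

```python
from urllib.parse import urlsplit

url = "http://www.microsoft.com/index.html"   # the path is a made-up example

parts = urlsplit(url)
labels = parts.hostname.split(".")   # domain name labels, most specific first
top_level_domain = labels[-1]

print("protocol:        ", parts.scheme)
print("domain name:     ", parts.hostname)
print("top-level domain:", top_level_domain)
print("resource path:   ", parts.path)
```

Turning the domain name into an IP address is the job of a DNS server; in Python, `socket.gethostbyname` asks the system resolver to do that, but it needs network access and is left out here.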

Page 18: Introduction to Web Science

18

Top-Level Domain Names

• Internet Corporation for Assigned Names and Numbers (ICANN)

– Responsible for managing domain names and coordinating them with IP address registrars

Page 19: Introduction to Web Science

19

• The web was not an ‘open’ place

• Only one company, Network Solutions, sold .com, .net and .org domains

• A domain cost 100 dollars, with a two-year minimum

• Back then, there was a good chance you could still buy a dictionary word as a .com

• In 2000, Network Solutions lost its monopoly position and domain prices dropped by over 95%

• Since then innovation halted and Network Solutions became one of thousands of anonymous domain registrars

Domain Name case study

Page 20: Introduction to Web Science

20

• E-Mail

• File transfers

• Instant messaging (IM)

• Newsgroups

• Streaming audio and video

• Internet telephony

• World Wide Web (WWW)

Internet Applications

Page 21: Introduction to Web Science

21

• Most popular and widely used Internet application

• 30 billion e-mails sent every day

– Spam: junk e-mail messages

– Spam costs corporate America $9 billion per year

• Every e-mail message contains a header that describes the source and destination of the message

• E-mail messages are text, but may have attachments of many types of digital data

– Viruses are often transmitted via e-mail

E-Mail
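
The header and attachment structure described above can be seen by building a message with Python's standard `email` package. This is a sketch under obvious assumptions: the addresses, subject, body and attachment bytes are all invented, and nothing is actually sent.

```python
from email import message_from_string
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"     # source (hypothetical address)
msg["To"] = "bob@example.org"         # destination (hypothetical address)
msg["Subject"] = "Lecture notes"
msg.set_content("See you at the Web 1.0 lecture.")

# Attachments carry non-text data; these bytes are only a placeholder.
msg.add_attachment(b"placeholder image bytes",
                   maintype="image", subtype="png", filename="slide.png")

raw = msg.as_string()                 # the text that would travel between mail servers
parsed = message_from_string(raw)     # what a receiving program reads back

print(parsed["From"], "->", parsed["To"])
print(parsed.get_content_type())
```

Printing `raw` shows that even the binary attachment is carried as encoded text, under the header lines.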

Page 22: Introduction to Web Science

22

• E-mail sent across the Internet is managed and stored by mail servers

• Simple Mail Transfer Protocol (SMTP) is the standard for sending mail to the server

• Post Office Protocol (POP) is the standard for retrieving mail from the server

• The Internet Message Access Protocol (IMAP), originally the Interactive Mail Access Protocol, is a newer e-mail protocol

SMTP, POP, and IMAP (1)

Page 23: Introduction to Web Science

23

SMTP, POP, and IMAP (2)

Page 24: Introduction to Web Science

24

• Use complex email addresses rather than a name and surname combination

– Why? Bots? Name directories?

• Control exposure of your email address

– How? JavaScript? JPEG?

• Use multiple email addresses for different purposes

– On what occasions?

• Use content-filtering software

– Black list spam filter

– White list spam filter

– Challenge-response using graphical challenges?

Controlling Spam
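
The black list, white list and challenge-response ideas can be combined in a very small filter. The sketch below is only one possible arrangement, and every address, domain and name in it (`WHITELIST`, `BLACKLIST`, `classify`) is hypothetical.

```python
# Hypothetical lists for illustration only.
WHITELIST = {"friend@example.org"}     # senders that are always accepted
BLACKLIST = {"spam.example"}           # sender domains that are always rejected

def classify(sender: str) -> str:
    """Return 'accept', 'reject' or 'challenge' for a sender address."""
    sender = sender.strip().lower()
    if sender in WHITELIST:
        return "accept"
    domain = sender.rsplit("@", 1)[-1]
    if domain in BLACKLIST:
        return "reject"
    # Unknown senders would be asked to solve a challenge before delivery.
    return "challenge"

for address in ["friend@example.org", "offers@spam.example", "stranger@example.net"]:
    print(address, "->", classify(address))
```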

Page 25: Introduction to Web Science

25

• Hotmail (1995)

• First place to get a free email address, disconnected from an ISP

• 4 years later, 30 million people worldwide were exchanging @hotmail email addresses

• Bought by Microsoft in 1998 for just 400 million dollars

• 2007: the end of Hotmail

– Transformation to “Live” mail to become an integrated part of Microsoft’s “Live” family

E-Mail Case Study

Page 26: Introduction to Web Science

26

• File transfer protocol (FTP)

– Protocol providing for transmission of a file between an Internet server and a user’s computer

• Peer-to-peer (P2P) file sharing

– Share data from one computer to another

– Every user can be a server

– Examples: Napster, Kazaa, Gnutella, Torrent

– With P2P, every user on the network can make data available to every other user on the network

File Transfers

Page 27: Introduction to Web Science

27

• Allows user to create a private chat session with another user

• IM started with AOL

• IM sneaking into corporate networks

• Many Web-based companies use IM technology for customer service

– eBay

Instant Messaging

Page 28: Introduction to Web Science

28

Page 29: Introduction to Web Science

29

• ICQ: a play on the phrase “I seek you”

• 1996: the first easy-to-use instant messenger program where you could add friends to your list and see if they were online

• Back then it was revolutionary for the masses and it became the ‘application’ everybody had installed

• Acquired by AOL in June 1998 for a whopping $287 million

• Eventually the program gained too many additional features, which made the application heavy and disorganized

• Competition from AOL IM, Yahoo IM, and MSN Messenger increased, and friends on your ICQ list left the application, eventually resulting in a mass abandonment of the network

ICQ case study

Page 30: Introduction to Web Science

30

• Online, bulletin board discussion forums

• Users post and read messages

• More than 100,000 newsgroups

• Millions of newsgroup readers

• Important information resource, especially for technical issues and products

• Newsgroup messages distributed using open standard

– Many are uncensored

Usenet Newsgroups

Page 31: Introduction to Web Science

31

• Creating and sending audio and video files

– Sports

  • Basketball at sports.yahoo.com

  • Major league baseball

– News

  • Fox News

  • CNN radio

– Business

  • ZDNet

– Education

  • Warriors of the Net

Streaming Audio and Video

Page 32: Introduction to Web Science

32

• Voice-over Internet Protocol (VoIP)

• Use your computer like a telephone

• Software connects computers via the Internet and transmits voice data

• Savings come from eliminating toll charges between locations

Internet Telephony

Page 33: Introduction to Web Science

33

Internet TV

Page 34: Introduction to Web Science

34

• Collection of hyperlinked computer files on the Internet

• Client-server application

– Web servers

– Web browsers as clients

• WWW standards

– Hypertext markup language (HTML)

  • Current standard for writing Web pages

  • Tags in HTML instruct the client browser how to format and display the Web page content

– Hypertext transfer protocol (HTTP)

  • Establishes a connection between Web server and client

– Extensible markup language (XML)

  • A meta-markup language

  • Gives meaning to the data enclosed within XML tags

The World Wide Web

Page 35: Introduction to Web Science

35

• Create your own free homepage on the web

• 1997: fifth most popular website, with over 500,000 homepages created

• Yahoo bought Geocities two years later for $3.57 billion and started to actively commercialize the homepages with various types of advertising, which proved to be their death sentence

• With ‘real’ web hosting becoming affordable for anybody, the need for free homepages in this form vanished

Website case study

Page 36: Introduction to Web Science

36

• SGML is a rich meta language that is useful for defining markup languages

• HTML is particularly useful for displaying Web pages

• XML defines data structures for electronic commerce (and much more …)

Overview of Markup Languages

Page 37: Introduction to Web Science

37

Development of Markup Languages

http://www.w3.org/

Page 38: Introduction to Web Science

38

• ISO adopted SGML as a standard in 1986 (ISO 8879)

• SGML is nonproprietary and platform-independent

• SGML supports user-defined tags and architecture to complement the required richness of documents

Standard Generalized Markup Language

Page 39: Introduction to Web Science

39

• XML is a descendant of SGML

• XML allows designers to easily describe and deliver structured data from any application in a standard, consistent way

• XML can be embedded within an HTML document

• XML allows you to create your own customized markup language.

Extensible Markup Language

Page 40: Introduction to Web Science

40

• Tag – a piece of markup

– An opening tag <name>

– A closing tag </name>

• Element – well-formed usage of tags

– <name>Alexiei</name>

• Attribute – properties

– <name length="7">Alexiei</name>

• Rules to keep XML well formed

1. Elements can be nested but must not overlap

2. Names are case sensitive

3. Attribute values must be quoted

4. An end tag is required

• Shorthand

– <abc></abc> is equivalent to <abc/>

Learn XML in a slide
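
The terms on this slide map directly onto what an XML parser returns. The sketch below uses Python's standard `xml.etree.ElementTree` on the slide's own element; only the variable names are invented.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<name length="7">Alexiei</name>')

print(element.tag)       # the tag name
print(element.attrib)    # attributes as a dictionary of strings
print(element.text)      # the content between the opening and closing tags

# The shorthand form parses to the same empty element as the long form.
empty_long = ET.fromstring("<abc></abc>")
empty_short = ET.fromstring("<abc/>")
same = ET.tostring(empty_long) == ET.tostring(empty_short)
print(same)
```

Note that attribute values always come back as strings, so `length` has to be converted with `int()` before it can be used as a number.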

Page 41: Introduction to Web Science

41

<book>E-Commerce</booK>

<book pages=100>E-Commerce</book>

<book pages="100"><title>E-Commerce</book></title>

<book pages="100"><title>E-Commerce</title></book>

<book pages="100"><title>E-Commerce</title><author><name>Gary</name><surname>Schneider</surname></author></book>

Some XML examples
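
One way to check which of these examples are well formed is to hand them to a parser and see which ones it rejects. The sketch below does this with Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up, and the attribute values are written with straight quotes, as XML requires.

```python
import xml.etree.ElementTree as ET

EXAMPLES = [
    '<book>E-Commerce</booK>',                               # end tag differs in case
    '<book pages=100>E-Commerce</book>',                     # attribute value not quoted
    '<book pages="100"><title>E-Commerce</book></title>',    # overlapping elements
    '<book pages="100"><title>E-Commerce</title></book>',
    '<book pages="100"><title>E-Commerce</title><author>'
    '<name>Gary</name><surname>Schneider</surname></author></book>',
]

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = [is_well_formed(example) for example in EXAMPLES]
for example, ok in zip(EXAMPLES, results):
    print("well formed" if ok else "NOT well formed", ":", example)
```

The first three examples each break one of the well-formedness rules (case sensitivity, quoted attributes, no overlapping); the last two are accepted.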

Page 43: Introduction to Web Science

43

Processing a Request for an XML Page

• Why go through all this hassle?

• How would you go about displaying HTML on a

– PC

– Handheld

– Mobile

Page 44: Introduction to Web Science

44

• Tim Berners-Lee invented HTML

• HTML is a document production language that includes a set of tags that define the format and style of a document

• HTML is based on SGML

• HTML is an application of SGML: one particular SGML document type, defined by a Document Type Definition (DTD)

Hypertext Markup Language

Page 45: Introduction to Web Science

45

• An HTML document contains both document content and tags

• The tags are the HTML codes inserted in a document to specify the format on screen

• Each tag is enclosed in angle brackets (< >)

• Most tags are two-sided – opening and closing tags

• Well formed tags, bots, meta tags?? Why are they important?

HTML Tags

Page 46: Introduction to Web Science

46

• Hyperlinks are bits of text that connect the current document to:

– Another location in the same document

– Another document on the same host machine

– Another document on the Internet

– Can they link to a toaster at home?

• Hyperlinks are created using the HTML anchor tag

• Two popular link structures:

– Linear hyperlink structure

– Hierarchical hyperlink structure

HTML Links
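
Since hyperlinks are created with the anchor tag, a program can list a page's links by looking for `<a>` tags and reading their `href` attribute. The sketch below does so with Python's standard `html.parser`; the class name `LinkCollector` and the small page are invented, with one link for each of the three kinds of target listed above.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

# Hypothetical page content for illustration.
PAGE = """
<a href="#top">Another location in the same document</a>
<a href="notes.html">Another document on the same host</a>
<a href="http://www.w3.org/">Another document on the Internet</a>
"""

collector = LinkCollector()
collector.feed(PAGE)
links = collector.links
print(links)
```

A search engine spider follows essentially this step for every page it downloads, then visits the links it found.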

Page 47: Introduction to Web Science

47

• HTML version 1.0 was introduced in 1991

• HTML 2.0 was released in Nov. 1995 (RFC 1866)

• HTML 3.2 was introduced in 1997

• HTML 4.0 was released by W3C in Dec 1997

• HTML 4.01 was released in Dec 1999

• XHTML 1.0 became a W3C recommendation in Jan 2000

HTML Version History

Page 48: Introduction to Web Science

48

• Low-end editors display HTML code on the screen and allow you to insert HTML tag pairs by clicking selected buttons

• High-end editors are Web site builder programs; they provide a rich environment that displays the Web page, not the HTML code

• Microsoft FrontPage and Macromedia Dreamweaver are examples of Web site builders

HTML Editors (1)

Page 49: Introduction to Web Science

49

HTML Editors (2)

Page 50: Introduction to Web Science

50

• HTML and XML only display and exchange data

• No interactivity; no processing of data

• Scripting languages

– Provides basic interactivity

  • Rollovers

  • Crawling text

– JavaScript

– VBScript

• Full-featured Web programming

– Java

– Client side scripting or browser side scripting

– Applets

– J2EE

• Common Gateway Interface (CGI)

– Allows passing of data between a static HTML page and a computer program

Static versus Dynamic Pages
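
The difference between a static page and a dynamic one can be sketched with a function that plays the role of a CGI-style program: it receives the data a form would send (the query string) and produces the HTML response. This is an assumption-laden illustration, not a real CGI deployment; the function name `render_page` and the `name` parameter are invented.

```python
import html
from urllib.parse import parse_qs

def render_page(query_string: str) -> str:
    """Build an HTTP-style response whose HTML depends on the submitted data."""
    params = parse_qs(query_string)
    name = params.get("name", ["visitor"])[0]
    # Escape user input so it is displayed as text rather than interpreted as markup.
    body = "<html><body><p>Hello, " + html.escape(name) + "!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

response = render_page("name=Ada")
print(response)
```

A static page would return the same HTML for every request; here the output changes with the data that was passed in.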

Page 51: Introduction to Web Science

51

• Most data on the Internet is part of the WWW

• Search engines – large databases that index WWW content

• Building the search engine database

– Submit a site to the search engine administrator for listing

– Spiders

  • Metatags

– Google

– Yahoo

Searching the WWW

Page 52: Introduction to Web Science

52

• A search engine is a special kind of Web software that finds other Web pages that match a word or phrase you entered

• A Web directory is a listing of hyperlinks to Web pages that is organized into hierarchical categories, e.g. http://directory.google.com/

• Search engines contain three major parts: spider, index, and utility

Search Engines
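
The "index" and "utility" parts of a search engine can be sketched with an inverted index, a structure that maps each word to the pages containing it. The example is a minimal illustration: the three pages stand in for what a spider would have collected, and the file names, texts and function names are all invented.

```python
from collections import defaultdict

# Hypothetical pages, standing in for the spider's output.
PAGES = {
    "a.html": "packet switching on the internet",
    "b.html": "searching the web with a search engine",
    "c.html": "the internet and the web",
}

def build_index(pages):
    """Index: map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Utility: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "the web")))
print(sorted(search(index, "internet")))
```

Looking a word up in the index is fast because the expensive work, reading every page, was done once when the index was built.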

Page 53: Introduction to Web Science

53

Popular Search Engines

Page 54: Introduction to Web Science

54

Spiders and Crawlers

Page 55: Introduction to Web Science

55

Indexing

Page 56: Introduction to Web Science

56

• Search engine AltaVista was the Google of the last millennium

• First real effort to index the World Wide Web

• One of the few search engines that actually came up with good search results

• Had a hard time fighting spam listings in their results

• While spam grew exponentially in AltaVista, a company named Google found a way to prioritize web pages more intelligently, and thus keep spam out better

Search Engine case study

Page 57: Introduction to Web Science

57

• PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value

• Google interprets a link from page A to page B as a vote, by page A, for page B

• But Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote

• Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

Case Study: Google’s PageRank
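
The "links as weighted votes" idea can be sketched as a simple iterative calculation. This is a simplified PageRank-style computation for illustration, not Google's production algorithm: the four-page link graph is invented, and the damping factor of 0.85 and the 50 iterations are assumed values.

```python
# Hypothetical link graph: each page lists the pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly let every page share its current rank among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C collects the most votes, and page A ranks well mainly because its single vote comes from the important page C, which is the effect described in the last bullet.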

Page 58: Introduction to Web Science

58

• An intelligent agent is a program that performs functions such as information gathering, information filtering, or mediation, running in the background on behalf of a person or entity

• What agents can you think of?

Intelligent Agents

Page 59: Introduction to Web Science

59

• Search Agents

– Improve your information retrieval on the Internet

– Used to find pages on the Web easily and quickly

  • Meta Agents, Specialised (MP3), etc

• Web Agents

– Improve browsing experience

  • Automate form filling, off-line browsing, etc

• Monitoring Agents

– Monitor web sites or specific themes

– Used to get automatic alerts about the latest news

Intelligent Agents (2)

Page 60: Introduction to Web Science

60

• Virtual Assistants

– Artificial life

– Characters, plants, animals or people living on your desktop

• Shop Bots

– Allow users to compare prices on the Internet

– Find the best price for books, CDs, movies, etc.

• Webmastering Agents

– Make it easy to manage a Web site and make it more effective

– Monitor broken links, content gathering etc.

Intelligent Agents (3)

Page 61: Introduction to Web Science

61

• Other agents …

– Development agents

  • Used to develop other agents

– Games agents

  • Used in games

Intelligent Agents (4)

Page 62: Introduction to Web Science

62

Ms Dewey: not your ordinary search agent!

Page 63: Introduction to Web Science

63

• Internet Engineering Task Force (IETF)

– Works in groups to develop standards

• Internet Engineering Steering Group (IESG)

– Approves or disapproves standards developed by the IETF

• Internet Architecture Board (IAB)

– The oversight authority for the standards development process

• World Wide Web Consortium (W3C)

– Promotes the WWW and develops new web technologies and standards

Internet Governance

Page 64: Introduction to Web Science

64

• We’re all very familiar with Web 1.0

• But what makes Web 2.0?

• Next lecture …

Conclusion

Page 65: Introduction to Web Science

65

Questions?