Introduction to Web Programming
ICS213, 1 / 2011
Dr. Seung Hwan Kang
Outline
• 1.1 A Brief Introduction to the Internet
• 1.2 The World Wide Web
• 1.3 Web Browsers
• 1.4 Web Servers
• 1.5 Uniform Resource Locators (URL)
• 1.6 Multipurpose Internet Mail Extensions (MIME)
• 1.7 The Hypertext Transfer Protocol (HTTP)
• 1.8 Security
• 1.9 The Web Programmer’s Toolbox
1.1 A Brief Introduction to the Internet
1.1.1 Origins
– Advanced Research Projects Agency Network (ARPANET) - late 1960s and early 1970s
  • Network reliability
  • For ARPA-funded research organizations
– BITnet, CSnet - late 1970s and early 1980s
  • Email and file transfer for other institutions
– National Science Foundation Network (NSFnet) - 1986
  • Originally for non-DOD (Department of Defense) funded places
  • Initially connected five supercomputer centers
  • By 1990, it had replaced ARPAnet for non-military uses
  • Soon became the network for all (by the early 1990s)
– NSFnet eventually became known as the Internet
1.1.2 What the Internet is:
• A world-wide network of computer networks
• At the lowest level, since 1982, all connections use TCP/IP
• TCP/IP hides the differences among devices connected to the Internet
• Transmission Control Protocol (TCP) provides
  – Error-free data transportation
  – In-order delivery (data will always arrive in the order in which it was sent)
  – Unsegmented data stream (data can be sent in pieces of any size at any time)
• In networking terms, HTTP (HyperText Transfer Protocol) is layered over TCP: HTTP uses TCP to transport its message data. Likewise, TCP is layered over IP (Internet Protocol).
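The stream behaviour described above can be tried on a single machine. The following is a minimal sketch, not part of the original slides: the helper name `tcp_stream_demo` and the sample payload are assumptions for illustration. The sender writes the bytes of an HTTP request line in pieces of arbitrary size; the receiver reads at most 4 bytes at a time and still sees one ordered byte stream.

```python
import socket
import threading


def tcp_stream_demo(pieces):
    """Send byte pieces over a loopback TCP connection and return what arrives."""
    # Port 0 asks the operating system for any free port.
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]

    def serve():
        conn, _addr = server.accept()
        with conn:
            for piece in pieces:
                conn.sendall(piece)  # the sender may write pieces of any size

    sender = threading.Thread(target=serve)
    sender.start()

    received = b""
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            chunk = client.recv(4)  # read at most 4 bytes at a time
            if not chunk:           # empty read: the sender closed the connection
                break
            received += chunk

    sender.join()
    server.close()
    return received


if __name__ == "__main__":
    print(tcp_stream_demo([b"GET ", b"/degrees", b".html HTTP/1.1\r\n"]))
```

The piece boundaries chosen by the sender are invisible to the receiver, which is why HTTP needs its own framing (header lines, a blank line, a content length) on top of TCP.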
1.1.3 Internet Protocol (IP) Addresses
– Every node has a unique numeric address
– Form: 32-bit binary number
  • New standard, IPv6, has 128 bits (1998)
– Organizations are assigned groups of IP addresses for their computers
– Problem: By the mid-1980s, several different protocols had been invented and were being used on the Internet, all with different user interfaces (Telnet, FTP, Usenet, mailto)
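To make the "32-bit binary number" form concrete, here is a small sketch using Python's standard `ipaddress` module. The function name `describe` is an assumption for illustration; 172.17.28.45 is the address that appears in the static-page diagram later in these slides, and `::1` (the IPv6 loopback address) is added only to show the 128-bit case.

```python
import ipaddress


def describe(address_text):
    """Return the version, width in bits, and numeric forms of an IP address."""
    addr = ipaddress.ip_address(address_text)
    bits = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    return {
        "version": addr.version,
        "bits": bits,
        "as_integer": int(addr),
        "as_binary": format(int(addr), "0{}b".format(bits)),
    }


if __name__ == "__main__":
    for text in ("172.17.28.45", "::1"):
        print(text, describe(text))
```

The dotted form is only a human-friendly way of writing the four bytes of the 32-bit number.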
1.1.4 Domain Names
• Form: host-name.domain-names
• First domain is the smallest; last is the largest
• Last domain specifies the type of organization
• Fully qualified domain name - the host name and all of the domain names
• DNS (Domain Name System) servers - convert fully qualified domain names to IPs
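The naming rules above can be sketched in a few lines. This is an illustrative sketch only: `split_fqdn` and `resolve` are hypothetical helper names, and `cis.payap.ac.th` is a host name that appears elsewhere in these slides. `resolve` uses the standard `socket.gethostbyname` call, which asks the system resolver (and so a DNS server) and therefore needs network access.

```python
import socket


def split_fqdn(fqdn):
    """Split a fully qualified domain name into (host name, list of domains)."""
    labels = fqdn.rstrip(".").split(".")
    # The first label is the host; the remaining labels run from the
    # smallest domain to the largest.
    return labels[0], labels[1:]


def resolve(fqdn):
    """Convert a fully qualified domain name to an IPv4 address string."""
    return socket.gethostbyname(fqdn)


if __name__ == "__main__":
    print(split_fqdn("cis.payap.ac.th"))
```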
Figure 1.1 Domain Name Conversion
Client and Server
• Clients and servers are programs that communicate with each other over the Internet
• A server runs continuously, waiting to be contacted by a client
  – Each server provides certain services
  – Services include providing web pages
• A client sends a message to a server requesting the service provided by that server
  – The client usually provides some information (parameters) with the request
Client and Server - Static HTML pages
Figure: BROWSER and SERVER diagram (labels: 172.17.28.45, hello.htm)
1. The browser requests a particular HTML file
2. The server locates the file and sends it to the browser
3. The browser displays the file
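The static-page exchange above can be reproduced locally with Python's standard `http.server` and `http.client` modules. This is a sketch under stated assumptions, not the setup used in the course: the helper `fetch_static`, the temporary document root, and the loopback address are all illustrative choices.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def fetch_static(filename, content):
    """Serve one file from a temporary document root and fetch it over HTTP."""
    with tempfile.TemporaryDirectory() as root:
        pathlib.Path(root, filename).write_text(content, encoding="utf-8")
        handler = functools.partial(QuietHandler, directory=root)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            # Step 1: the client requests a particular HTML file.
            conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
            conn.request("GET", "/" + filename)
            # Step 2: the server locates the file and sends it back.
            response = conn.getresponse()
            status = response.status
            body = response.read().decode("utf-8")
            conn.close()
            return status, body
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    # Step 3 (displaying the file) is left to a real browser; here we print it.
    print(fetch_static("hello.htm", "<html><body>Hello</body></html>"))
```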
Client and Server - CGI Scripts
Figure: BROWSER and SERVER diagram (label: hello.cgi)
1. The browser sends a request to the server
2. The server locates the CGI program and passes it the request information
3. The CGI program processes the request and sends data to the server
4. The server sends the data to the browser
5. The browser displays the data
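As a rough illustration of what the CGI program in this exchange does, the sketch below builds the text such a program would write to standard output. The function name `cgi_output` and the `name` query parameter are assumptions for illustration; `QUERY_STRING` is the standard CGI environment variable through which the server passes the query part of the URL.

```python
import html
import os
from urllib.parse import parse_qs


def cgi_output(environ):
    """Build what a CGI program would write to standard output."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    # Escape user input before placing it in the generated document.
    body = "<html><body><p>Hello, {}!</p></body></html>".format(html.escape(name))
    # A CGI program prints header fields, then a blank line, then the document.
    return "Content-type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server passes request information in environment variables.
    print(cgi_output(os.environ), end="")
```

The server reads this output, adds the status line and any remaining headers, and forwards the result to the browser.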
Client and Server - Server-side Scripting
Figure: BROWSER and SERVER diagram (label: hello.php)
1. The browser sends a request to the server
2. The server checks the file and executes the embedded scripts
3. The embedded code is passed to the appropriate interpreter, which generates the final formatted document
4. The browser displays the document
1.2 The World-Wide Web
• A possible solution to the proliferation of different protocols being used on the Internet
1.2.1 Origins
– Tim Berners-Lee at CERN (the European Organization for Nuclear Research) proposed the Web in 1989
  • Purpose: to allow scientists to have access to many databases of scientific work through their own computers
  • http://www.w3.org/History/1989/proposal.html
– Document form: hypertext
– Pages? Documents? Resources?
  • We’ll call them documents
– Hypermedia - more than just text - images, sound, etc.
1.2.2 Web or the Internet?
• The Web uses one of the protocols, HTTP, that runs on the Internet; there are several others (telnet, FTP, mailto, etc.)
• The Semantic Web - “Machine-Understandable information” by Tim Berners-Lee (1998)
Figure: The Semantic Web as a “layer cake” (Source: James Hendler, 2001, p. 30)
1.3 Web Browsers
• Browsers are clients - they always initiate, and servers react (although sometimes servers require responses)
• Mosaic - NCSA (the National Center for Supercomputing Applications, a unit of the Univ. of Illinois), in early 1993
  – First to use a GUI (Graphical User Interface); led to an explosion of Web use
  – Initially for X-Windows, under UNIX, but was ported to other platforms by late 1993
• Most requests are for existing documents, using HTTP
  – But some requests are for program execution, with the output being returned as a document
Figure: Mosaic Beta version 0.4, Sep 9 1994
1.4 Web Servers
• Provide responses to browser requests, either existing documents or dynamically built documents
• Browser-server connection is now maintained through more than one request-response cycle
• All communications between browsers and servers use HTTP
1.4.1 Web Server Operation
• Web servers run as background processes in the operating system
  – Monitor a communication port on the host, accepting HTTP messages when they appear
• All current Web servers came from either
  1. The original from CERN
  2. The second one, from NCSA
1.4.2 General Server Characteristics
• Web servers have two main directories:
  1. Document root (servable documents)
  2. Server root (server system software)
• Document root is accessed indirectly by clients
  – Its actual location is set by the server configuration file
  – Requests are mapped to the actual location
• Virtual document trees
• Virtual hosts
• Proxy servers
• Web servers now support other Internet protocols
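The mapping from a requested path to a location under the document root can be sketched as follows. This is a simplified illustration, not how any particular server implements it: the function name `map_to_document_root`, the default index file name `index.html`, and the forward-slash form of the XAMPP document root are assumptions.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def map_to_document_root(document_root, url_path, index="index.html"):
    """Map the path part of a request URL to a file under the document root."""
    path = unquote(urlsplit(url_path).path)
    # Normalize "." and ".." segments so a request cannot climb out of the root.
    normalized = posixpath.normpath("/" + path.lstrip("/"))
    # A path that names a directory is completed with the index document.
    if path.endswith("/"):
        normalized = posixpath.join(normalized, index)
    return document_root.rstrip("/") + normalized


if __name__ == "__main__":
    print(map_to_document_root("C:/xampp/htdocs", "/index.php"))
    print(map_to_document_root("C:/xampp/htdocs", "/"))
```

Clients never see the document root itself; they only see paths relative to it.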
Example server configuration file: C:\xampp\apache\conf\httpd.conf
1.4.3 Apache
• Apache (open source, fast, reliable)
  – Directives (operation control):
      ServerName
      ServerRoot
      ServerAdmin
      DocumentRoot
      Alias
      Redirect
      DirectoryIndex
      UserDir
  – http://httpd.apache.org/
1.4.4 IIS
• Internet Information Server
  – Operation is managed through a program with a GUI
1.5 Uniform Resource Locators (URLs)
1.5.1 URL Formats
    scheme:object-address
– The scheme is often a communications protocol, such as http, gopher, news, telnet, or ftp
• For the http protocol, the object-address is: //fully-qualified-domain-name/doc-path
• For the file protocol, only the doc path is needed
• Host name may include a port number, as in zeppo:80 (80 is the default)
• URLs cannot include spaces or any of a collection of other special characters (semicolons, colons, ...)
• e.g. http://ic.payap.ac.th/index.php
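The URL parts described above can be pulled apart with Python's standard `urllib.parse` module. The helper name `describe_url` is an assumption for illustration, and filling in port 80 when none is given only makes sense for the http scheme.

```python
from urllib.parse import quote, urlsplit


def describe_url(url):
    """Split an http URL into scheme, host, port, and doc path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # No explicit port means the http default, 80.
        "port": parts.port if parts.port is not None else 80,
        "doc_path": parts.path,
    }


if __name__ == "__main__":
    print(describe_url("http://ic.payap.ac.th/index.php"))
    # Characters that are not allowed in a URL are percent-encoded.
    print(quote("my notes.html"))
```

Percent-encoding is how a space or semicolon in a file name is carried inside a URL without breaking its syntax.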
1.5.2 URL Paths
• The doc path may be abbreviated as a partial path
  – The rest is furnished by the server configuration
      # C:\xampp\apache\conf\httpd.conf
      DocumentRoot "C:\xampp\htdocs"
• If the doc path ends with a slash, it means it is a directory
      http://www.payap.ac.th/
      http://cis.payap.ac.th/index.php
1.6 Multipurpose Internet Mail Extensions (MIME)
•Originally developed for e-mail
•Used to specify to the browser the form of a file returned by the server (attached by the server to the beginning of the document)
1.6.1 Type specifications
–Form: type/subtype
–Examples: text/plain, text/html, image/gif, image/jpeg
• Server gets type from the requested file name’s suffix (.html implies text/html)
  – e.g. the suffix-to-type table in C:\xampp\apache\conf\mime.types
• Browser gets the type explicitly from the server
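The suffix-to-type lookup a server performs can be imitated with Python's standard `mimetypes` module, which ships its own table of common suffixes. This is only an analogy to the server's mime.types file; the helper name `type_for` and the sample file names are assumptions.

```python
import mimetypes


def type_for(filename):
    """Guess the MIME type of a file from its name's suffix."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


if __name__ == "__main__":
    for name in ("hello.htm", "index.html", "logo.gif", "photo.jpeg"):
        print(name, "->", type_for(name))
```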
1.6.2 Experimental Document Types
• Subtype begins with x-
  • e.g., video/x-msvideo
• Experimental types require the server to send a helper application or plug-in so the browser can deal with the file
1.7 The Hypertext Transfer Protocol (HTTP)
•The protocol used by ALL Web communications
•Invented by Tim Berners-Lee in 1990
•RFC 1945 (1996) - HTTP/1.0
•RFC 2068 (1997) - HTTP/1.1
•RFC 2616 (1999) - HTTP/1.1 (update to RFC 2068)
Features of HTTP
• Application-level, client-server protocol
  – Primarily for distributed hypermedia systems
  – Flexible, thus has many other uses, e.g.
    • Name servers
    • Distributed and collaborative document management systems
• HTTP is small and fast
  – Minimal performance overhead
  – Easy to implement
• HTTP is a stateless protocol
  – Each request is an independent transaction, unrelated to any previous requests (unlike session-based protocols, e.g. FTP)
  – Advantage
    • Simplifies server design: information about previous transactions does not need to be stored
  – Disadvantage
    • More information must be included in each request
HTTP Operation
• On the Internet, HTTP usually uses TCP/IP connections
• TCP port 80 is the default (though others can be specified)
• HTTP uses a request/response paradigm
  – Client establishes a connection to the server and sends it a request
  – Server responds to the request by generating a response (which may or may not contain content)
1.7.1 The Request Phase
• Form:
  1. HTTP method, path part of the URL, HTTP version
  2. Header fields
  3. Blank line
  4. Message body
• An example of the first line of a request:
    GET /degrees.html HTTP/1.1
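The four-part request form above can be assembled as plain text. The sketch below is illustrative: `build_request` is a hypothetical helper and `www.example.com` is a placeholder host, neither of which comes from the slides. HTTP/1.1 requires a Host header, so the helper always adds one.

```python
def build_request(method, path, host, headers=None, body=""):
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = ["{} {} HTTP/1.1".format(method, path), "Host: {}".format(host)]
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # Lines end with CRLF; an empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    print(build_request("GET", "/degrees.html", "www.example.com",
                        {"Accept": "text/*"}))
```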
Table 1.1 HTTP Request Methods
HTTP Headers
• Four categories of header fields: general, request, response, and entity
• Common request fields:
    Accept: text/plain
    Accept: text/*
    If-Modified-Since: date
• Common response fields:
    Content-length: 488
    Content-type: text/html
• Can communicate with HTTP without a browser
    > telnet blanca.uccs.edu http
    GET /respond.html HTTP/1.1
    Host: blanca.uccs.edu
1.7.2 The Response Phase
• Form:
  • Status line
  • Response header fields
  • Blank line
  • Response body
• Status line format:
    HTTP version, status code, explanation
  • e.g. HTTP/1.1 200 OK
    (Current version is 1.1)
• Status code is a three-digit number; first digit specifies the general status
1 => Informational
2 => Success
3 => Redirection
4 => Client error
5 => Server error
•The header field, Content-type, is required
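The status line format and the first-digit classes above translate directly into code. This is a small sketch for illustration; the names `STATUS_CLASSES` and `parse_status_line` are assumptions, not part of any library.

```python
# General status indicated by the first digit of the status code.
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client error",
    "5": "Server error",
}


def parse_status_line(line):
    """Split 'HTTP-version status-code explanation' and classify the code."""
    # The explanation may contain spaces, so split at most twice.
    version, code, explanation = line.strip().split(" ", 2)
    return version, int(code), explanation, STATUS_CLASSES[code[0]]


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.1 200 OK"))
    print(parse_status_line("HTTP/1.1 404 Not Found"))
```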
Common status codes:
200 : OK
201 : Created
202 : Accepted
204 : No Content
301 : Moved Permanently
302 : Moved Temporarily
400 : Bad Request
401 : Unauthorized
403 : Forbidden
404 : Not Found
500 : Internal Server Error
503 : Service Unavailable
504 : Gateway Timeout
505 : HTTP Version Not Supported
HTTP Response Example
HTTP/1.1 200 OK
Date: Tue, 18 May 2004 16:45:13 GMT
Server: Apache (Red-Hat/Linux)
Last-modified: Tue, 18 May 2004 16:38:38 GMT
Etag: "841fb-4b-3d1a0179"
Accept-ranges: bytes
Content-length: 364
Connection: close
Content-type: text/html; charset=ISO-8859-1
• Both request headers and response headers must be followed by a blank line
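The role of the blank line is easiest to see when a response is taken apart. The sketch below is a simplified illustration (it ignores repeated and folded header lines); `parse_response` is a hypothetical helper and the sample response is shortened from the example above.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line, header dict, and body."""
    # The first blank line (CRLF CRLF) ends the headers and starts the body.
    head, _sep, body = raw.partition("\r\n\r\n")
    status_line, _sep, header_block = head.partition("\r\n")
    headers = {}
    for line in header_block.split("\r\n"):
        if line:
            name, _colon, value = line.partition(":")
            # Header field names are case-insensitive, so store them lowercased.
            headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


if __name__ == "__main__":
    sample = (
        "HTTP/1.1 200 OK\r\n"
        "Content-length: 5\r\n"
        "Content-type: text/html; charset=ISO-8859-1\r\n"
        "\r\n"
        "hello"
    )
    print(parse_response(sample))
```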
1.8 Security
• The main security concerns for communication between a client and a server are as follows:
1. Privacy – it must not be possible for the private information to be stolen while on its way to the server. e.g. credit card number
2. Integrity – it must not be possible for the private information to be modified on its way to the server. e.g. encrypted data instead of $1,000
3. Authentication – it must be possible for both the client and the server to be certain of each other’s identity.
4. Nonrepudiation – it must be possible to legally prove that the message was actually sent and received.
1.9 The Web Programmer’s Toolbox
• Document languages and programming languages that are the building blocks of the Web and Web programming
  • (X)HTML
  • Plug-ins
  • Filters
  • XML
  • JavaScript
  • Java, Perl, Ruby, PHP
1.9.1 Overview of (X)HTML
• Used to describe the general form and layout of documents
• An (X)HTML document is a mix of content and controls
  – Controls are tags and their attributes
• Tags often delimit content and specify something about how the content should be arranged in the document
• Attributes provide additional information about the content of a tag
1.9.2 Tools for Creating (X)HTML Documents
• XHTML editors - make document creation easier
  • Shortcuts to typing tag names, spell-checker
• WYSIWYG (what-you-see-is-what-you-get) XHTML editors
  • Need not know XHTML to create XHTML documents
1.9.3 Plug-ins and Filters
• Plug-ins
  • Integrated into tools like word processors, effectively converting them to WYSIWYG XHTML editors
• Filters
  • Convert documents in other formats to XHTML
• Advantages of both filters and plug-ins:
  – Existing documents produced with other tools can be converted to XHTML documents
  – Use a tool you already know to produce XHTML
• Disadvantages of both filters and plug-ins:
  – XHTML output of both is not perfect - must be fine tuned
  – XHTML may be non-standard
  – You have two versions of the document, which are difficult to synchronize
1.9.4 Overview of XML
•A meta-markup language
•Used to create a new markup language for a particular purpose or area
•Because the tags are designed for a specific area, they can be meaningful
•No presentation details
•A simple and universal way of representing data of any textual kind
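As an illustration of tags designed for one specific area, the sketch below defines a tiny, made-up markup for course descriptions and reads it with Python's standard `xml.etree.ElementTree` module. The tag and attribute names (`course`, `code`, `title`, `instructor`) are hypothetical; the values are taken from the title slide of this deck.

```python
import xml.etree.ElementTree as ET

# A hypothetical course-description markup: the tags carry meaning, not layout.
COURSE_XML = """
<course code="ICS213">
  <title>Introduction to Web Programming</title>
  <instructor>Seung Hwan Kang</instructor>
</course>
"""


def read_course(xml_text):
    """Extract the course fields from the hypothetical course markup."""
    root = ET.fromstring(xml_text.strip())
    return {
        "code": root.get("code"),
        "title": root.findtext("title"),
        "instructor": root.findtext("instructor"),
    }


if __name__ == "__main__":
    print(read_course(COURSE_XML))
```

Nothing in the document says how a course should be displayed, which matches the "no presentation details" point above.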
1.9.5 JavaScript
•A client-side HTML-embedded scripting language
•Only related to Java through syntax
•Dynamically typed; object-based (uses prototypes) rather than class-based object-oriented
•Provides a way to access elements of HTML documents and dynamically change them
1.9.6 Overview of Java
•General purpose object-oriented programming language
•Based on C++, but simpler and safer
•Our focus is on applets, servlets, and JSP (JavaServer Pages)
1.9.7 Overview of Perl
•Provides server-side computation for HTML documents, through CGI
•Perl is good for CGI programming because:
  – Direct access to operating system functions
  – Powerful character string pattern-matching operations
  – Access to database systems
•Perl is highly platform independent, and has been ported to all common platforms
•Perl is not just for CGI
1.9.8 Overview of PHP
•A server-side scripting language
•An alternative to CGI programming
•Similar to JavaScript
•Great for form processing and database access through the Web
1.9.9 Overview of Ruby
•A server-side scripting language
•An alternative to CGI
•Similar to PHP, but purely object-oriented
•Great for form processing and database access through the Web
1.9.10 Overview of Rails
•Ruby on Rails is based on the Model-View-Controller (MVC) architecture for applications, which clearly separates the presentation and the data model from the program logic.
1.9.11 Overview of Ajax
•Asynchronous JavaScript + XML (not AJAX)
•An enriched Web experience for those using a certain category of Web interactions
  • e.g. Google API (http://code.google.com)
TIOBE Programming Community Index for June 2010
(source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html)
References
1. Berners-Lee, T. (1989) Information Management: A Proposal, http://www.w3.org/History/1989/proposal.html Accessed: 27 November 2003.
2. Berners-Lee, T., Hendler, J. & Lassila, O. (2001) Agents and the Semantic Web, http://www.cs.umd.edu/users/hendler/AgentWeb.html Accessed: 27 November 2003.
3. Berners-Lee, T. (1998) Semantic Web Road map, http://www.w3.org/DesignIssues/Semantic.html Accessed: 27 November 2003.
4. Google Code (2010) Google, http://code.google.com/ Accessed: 1 March 2010.
5. Hendler, J. (2001) Agents and the Semantic Web, IEEE Intelligent Systems, (Mar./Apr.), pp. 30-37.
6. Sebesta, R. W. (2008) Programming the World Wide Web, 4th edn, Pearson/Addison Wesley. (Chapter 1)
7. TIOBE Software (2010) TIOBE Programming Community Index for May 2011, http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Accessed: 06 June 2011.