
UNIT 6

BUILDING WEB APPLICATIONS

6.1 APPLICATION LAYER PROTOCOLS

- Domain Name System (DNS): TCP/UDP port 53
- HTTP: TCP port 80
- Simple Mail Transfer Protocol (SMTP): TCP port 25
- Post Office Protocol (POP): TCP port 110
- Telnet: TCP port 23
- DHCP: UDP port 67
- FTP: TCP ports 20 and 21
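These well-known port assignments are also recorded in the operating system's services database, so they can be looked up programmatically. As a quick sketch using Python's standard socket module (results depend on the local services database):

```python
import socket

# Look up well-known port numbers from the OS services database
print(socket.getservbyname("http", "tcp"))   # 80
print(socket.getservbyname("smtp", "tcp"))   # 25
```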

6.1.1 DNS Services and Protocol

In data networks, devices are assigned IP addresses so that they can participate in sending and receiving messages over the network. However, most people have a hard time remembering this numeric address. Hence, domain names were created to convert the numeric address into a simple, recognizable name. On the Internet, these domain names, such as http://www.cisco.com, are much easier for people to remember than 198.132.219.25, which, at the time of this writing, is the numeric address for this server. Also, if Cisco decides to change the numeric address, it is transparent to the user, because the domain name will remain http://www.cisco.com. The new address will simply be linked to the existing domain name and connectivity is maintained, as shown in Figure 3-11. When networks were small, it was a simple task to maintain the mapping between domain names and the addresses they represented. However, as networks began to grow and the number of devices increased, this manual system became unworkable.


DNS was created for domain name–to–address resolution for these networks. DNS uses a distributed set of servers to resolve the names associated with these numbered addresses.

6.1.1.1 HOW DNS WORKS

The DNS protocol defines an automated service that matches resource names with the required numeric network address. It includes the format for queries, responses, and data formats. DNS protocol communications use a single format called a message. This message format is used for all types of client queries and server responses, error messages, and the transfer of resource record information between servers. DNS is a client/server service; however, it differs from the other client/server services that you are examining. Whereas other services use a client that is an application (web browser, e-mail client, and so on), the DNS client runs as a service itself. The DNS client, sometimes called the DNS resolver, supports name resolution for the other network applications and other services that need it. When configuring a network device, you generally provide one or more DNS server addresses that the DNS client can use for name resolution. Usually the Internet service provider (ISP) gives you the addresses to use for the DNS servers. When a user’s application requests to connect to a remote device by name, the requesting DNS client queries one of these DNS servers to resolve the name to a numeric address.

Computer operating systems also have a utility called nslookup that allows the user to manually query the name servers to resolve a given host name. You also can use this utility to troubleshoot name resolution issues and to verify the current status of the name servers.
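The same resolver service that nslookup exercises is exposed to programs through the operating system. A minimal sketch using Python's socket module ("localhost" is used here so the lookup resolves locally without querying an external DNS server):

```python
import socket

# Ask the resolver to map a host name to an IPv4 address,
# the same lookup that nslookup performs interactively.
addr = socket.gethostbyname("localhost")
print(addr)  # 127.0.0.1
```

Passing a real host name such as www.cisco.com instead would cause the DNS client to query one of the configured DNS servers.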

6.1.1.2 NAME RESOLUTION AND CACHING

A DNS server provides the name resolution using the name daemon, which is often called named (pronounced name-dee). The DNS server acts as the phone book for the Internet: It translates human-readable computer host names, for example, http://www.cisco.com, into the IP addresses that networking equipment needs for delivering information. The DNS server stores different types of resource records used to resolve names. These records contain the name, address, and type of record. Some of these record types are

- A: An end device address
- NS: An authoritative name server
- CNAME: The canonical name (or fully qualified domain name [FQDN]) for an alias; used when multiple services have the single network address but each service has its own entry in DNS
- MX: Mail exchange record; maps a domain name to a list of mail exchange servers for that domain

When a client makes a query, the "named" process first looks at its own records to see whether it can resolve the name. If it is unable to resolve the name using its stored records, it contacts other servers to resolve the name. The request can be passed along to a number of servers, which can take extra time and consume bandwidth. When a match is found and returned to the original requesting server, the server temporarily stores the numbered address that matches the name in its cache. If that same name is requested again, the first server can return the address by using the value stored in its name cache. Caching reduces both the DNS query data network traffic and the workloads of servers higher up the hierarchy. The DNS client service on Windows PCs optimizes the performance of DNS name resolution by storing previously resolved names in memory as well. The ipconfig /displaydns command displays all the cached DNS entries on a Windows XP or 2000 computer system.
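The caching behavior described above amounts to keeping a name-to-address table and consulting it before doing any real lookup. A minimal sketch (simplified: a real cache also honors record TTLs):

```python
import socket

dns_cache = {}  # name -> address, like named's cache or the Windows client cache

def resolve(name, lookup=socket.gethostbyname):
    """Answer from the cache when possible; otherwise perform the lookup
    (which may involve contacting other servers) and remember the result."""
    if name not in dns_cache:
        dns_cache[name] = lookup(name)
    return dns_cache[name]

resolve("localhost")   # first call performs the lookup
resolve("localhost")   # second call is answered from the cache
```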


6.1.1.3 DNS HIERARCHY

DNS uses a hierarchical system to create a name database to provide name resolution. The hierarchy looks like an inverted tree with the root at the top and branches below. At the top of the hierarchy, the root servers maintain records about how to reach the top-level domain servers, which in turn have records that point to the secondary-level domain servers, and so on. The different top-level domains represent either the type of organization or the country of origin. The following are examples of top-level domains:

- .au: Australia
- .co: Colombia
- .com: A business or industry
- .jp: Japan
- .org: A nonprofit organization

After top-level domains are second-level domain names, and below them are other lower-level domains. A great example of that is the domain name http://www.cisco.netacad.net. The .net is the top-level domain, .netacad is the second-level domain, and .cisco is at the lower level. Each domain name is a path down this inverted tree starting from the root. For example, as shown in Figure 3-12, the root DNS servers might not know exactly where the e-mail server mail.cisco.com is located, but they maintain a record for the .com domain within the top-level domain. Likewise, the servers within the .com domain might not have a record for mail.cisco.com, but they do have a record for the cisco.com secondary-level domain. The servers within the cisco.com domain have a record (an MX record, to be precise) for mail.cisco.com. DNS relies on this hierarchy of decentralized servers to store and maintain these resource records. The resource records list domain names that the server can resolve and alternative servers that can process requests. If a given server has resource records that correspond to its level in the domain hierarchy, it is said to be authoritative for those records.
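The referral chain described above can be modeled as a walk down the tree, where each server either answers authoritatively or refers the query one level down. A toy sketch (the server names and zone data below are invented for illustration, reusing the address from the text):

```python
# Each "server" knows only the names for which it is authoritative,
# plus referrals to the next level down the tree.
zones = {
    "root":       {"com.": "tld-server"},
    "tld-server": {"cisco.com.": "cisco-ns"},
    "cisco-ns":   {"mail.cisco.com.": "198.132.219.25"},
}

def resolve(name, server="root"):
    for suffix, answer in zones[server].items():
        if name == suffix:
            return answer                 # authoritative record found
        if name.endswith(suffix):
            return resolve(name, answer)  # follow the referral down
    return None

print(resolve("mail.cisco.com."))
```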


6.1.2 WWW SERVICE AND HTTP

When a web address (or URL) is typed into a web browser, the web browser establishes a connection to the web service running on the server using HTTP. URLs and URIs (uniform resource identifiers) are the names most people associate with web addresses. Web browsers are the client applications computers use to connect to the World Wide Web and access resources stored on a web server. As with most server processes, the web server runs as a background service and makes different types of files available. To access the content, web clients make connections to the server and request the desired resources. The server replies with the resources and, upon receipt, the browser interprets the data and presents it to the user. Browsers can interpret and present many data types (such as plain text or HTML, the language in which web pages are constructed). Other types of data, however, might require another service or program, typically referred to as a plug-in or add-on. To help the browser determine what type of file it is receiving, the server specifies what kind of data the file contains. First, the browser interprets the three parts of the URL:

- http: The protocol or scheme
- www.cisco.com: The server name
- web-server.htm: The specific filename requested

The browser then checks with a name server to convert http://www.cisco.com into a numeric address, which it uses to connect to the server. Using the HTTP requirements, the browser sends a GET request to the server and asks for the file web-server.htm. The server in turn sends the HTML code for this web page to the browser. Finally, the browser deciphers the HTML code and formats the page for the browser window.
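This decomposition is exactly what a URL parser performs. As a sketch, Python's standard urllib.parse module splits the URL from the text into the same three parts:

```python
from urllib.parse import urlparse

parts = urlparse("http://www.cisco.com/web-server.htm")
print(parts.scheme)  # http            -> the protocol or scheme
print(parts.netloc)  # www.cisco.com   -> the server name
print(parts.path)    # /web-server.htm -> the specific file requested
```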

HTTP, one of the protocols in the TCP/IP suite, was originally developed to publish and retrieve HTML pages and is now used for distributed, collaborative information systems. HTTP is used across the World Wide Web for data transfer and is one of the most used application protocols. HTTP specifies a request/response protocol. When a client, typically a web browser, sends a request message to a server, the HTTP protocol defines the message types the client uses to request the web page and the message types the server uses to respond. The three common message types are:

- GET
- POST
- PUT

GET is a client request for data. A web browser sends the GET message to request pages from a web server. As shown in Figure 3-13, when the server receives the GET request, it responds with a status line, such as HTTP/1.1 200 OK, and a message of its own, the body of which can be the requested file, an error message, or some other information.

POST and PUT are used to send messages that upload data to the web server. For example, when the user enters data into a form embedded in a web page, POST includes the data in the message sent to the server. PUT uploads resources or content to the web server. Although it is remarkably flexible, HTTP is not a secure protocol. The POST messages upload information to the server in plain text that can be intercepted and read. Similarly, the server responses, typically HTML pages, are unencrypted. For secure communication across the Internet, the Secure HTTP (HTTPS) protocol is used for accessing and posting web server information. HTTPS can use authentication and encryption to secure data as it travels between the client and server. HTTPS specifies additional rules for passing data between the application layer and the transport layer.
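The GET exchange described above is plain text on the wire, which is also why unsecured HTTP can be intercepted and read. A sketch of the request message a browser would send for web-server.htm (no connection is made here; the host and file name come from the earlier example):

```python
host = "www.cisco.com"
resource = "/web-server.htm"

# Request line, then headers, then a blank line ending the message
request = (
    f"GET {resource} HTTP/1.1\r\n"
    f"Host: {host}\r\n"
    "Connection: close\r\n"
    "\r\n"
)
print(request)
# A successful server response begins with a status line such as:
#   HTTP/1.1 200 OK
```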


6.1.3 E-MAIL SERVICES AND SMTP/POP PROTOCOLS

E-mail, the most popular network service, has revolutionized how people communicate through its simplicity and speed. Yet to run on a computer or other end device, e-mail requires several applications and services. Two examples of application layer protocols are Post Office Protocol (POP) and Simple Mail Transfer Protocol (SMTP). As with HTTP, these protocols define client/server processes.

POP and POP3 (Post Office Protocol, version 3) are inbound mail delivery protocols and are typical client/server protocols. They deliver e-mail from the e-mail server to the client (MUA).

SMTP, on the other hand, governs the transfer of outbound e-mail from the sending client to the e-mail server (MDA), as well as the transport of e-mail between e-mail servers (MTA). (These acronyms are defined in the next section.) SMTP enables e-mail to be transported across data networks between different types of server and client software and makes e-mail exchange over the Internet possible. When people compose e-mail messages, they typically use an application called a Mail User Agent (MUA), or e-mail client. The MUA allows messages to be sent and places received messages into the client mailbox, both of which are distinct processes, as shown in Figure 3-14. To receive e-mail messages from an e-mail server, the e-mail client can use POP. Sending e-mail from either a client or a server uses message formats and command strings defined by the SMTP protocol. Usually an e-mail client provides the functionality of both protocols within one application.
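An MUA's sending side can be approximated with Python's standard library: the message is composed with email.message and would be handed to the outbound e-mail server with smtplib. The server name and addresses below are hypothetical, and the actual send is commented out because it requires a reachable SMTP server:

```python
import smtplib
from email.message import EmailMessage

# Compose the message (the MUA's job)
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Subject"] = "Hello"
msg.set_content("Sent outbound via SMTP; retrieved later by the client via POP.")

# Hand it to the sending e-mail server over SMTP (hypothetical host):
# with smtplib.SMTP("mail.example.com", 25) as server:
#     server.send_message(msg)
```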


6.1.4 E-MAIL SERVER PROCESSES: MTA AND MDA

The e-mail server operates two separate processes:

- Mail Transfer Agent (MTA)
- Mail Delivery Agent (MDA)

The Mail Transfer Agent (MTA) process is used to forward e-mail. As shown in Figure 3-15, the MTA receives messages from the MUA or from another MTA on another e-mail server. Based on the message header, it determines how a message has to be forwarded to reach its destination. If the mail is addressed to a user whose mailbox is on the local server, the mail is passed to the MDA. If the mail is for a user not on the local server, the MTA routes the e-mail to the MTA on the appropriate server.

In Figure 3-16, you see that the Mail Delivery Agent (MDA) accepts a piece of e-mail from a Mail Transfer Agent (MTA) and performs the delivery. The MDA receives all the inbound mail from the MTA and places it into the appropriate users’ mailboxes. The MDA can also resolve final delivery issues, such as virus scanning, spam filtering, and return-receipt handling.

Most e-mail communications use the MUA, MTA, and MDA applications. However, there are other alternatives for e-mail delivery. A client can be connected to a corporate e-mail system, such as IBM Lotus Notes, Novell Groupwise, or Microsoft Exchange. These systems often have their own internal e-mail format, and their clients typically communicate with the e-mail server using a proprietary protocol.

The server sends or receives e-mail over the Internet through the product's Internet mail gateway, which performs any necessary reformatting. If, for example, two people who work for the same company exchange e-mail with each other using a proprietary protocol, their messages can stay completely within the corporate e-mail system of the company. As another alternative, computers that do not have an MUA can still connect to a mail service on a web browser to retrieve and send messages in this manner. Some computers can run their own MTA and manage interdomain e-mail themselves.

The SMTP protocol message format uses a rigid set of commands and replies. These commands support the procedures used in SMTP, such as session initiation, mail transaction, forwarding mail, verifying mailbox names, expanding mailing lists, and the opening and closing exchanges. Some of the commands specified in the SMTP protocol are:

- HELO: Identifies the SMTP client process to the SMTP server process
- EHLO: A newer version of HELO that includes service extensions
- MAIL FROM: Identifies the sender
- RCPT TO: Identifies the recipient
- DATA: Identifies the body of the message
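Put together, the client side of a minimal mail transaction using these commands looks like the following script (server replies are omitted and all names are made up; a lone "." on its own line ends the DATA section):

```python
# Client side of a minimal SMTP transaction, one command per line
transaction = "\r\n".join([
    "EHLO client.example.com",
    "MAIL FROM:<alice@example.com>",
    "RCPT TO:<bob@example.org>",
    "DATA",
    "Subject: Greetings",
    "",
    "Hello over SMTP.",
    ".",          # end of the DATA section
    "QUIT",
])
print(transaction)
```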

6.1.5 FTP

FTP is another commonly used application layer protocol. FTP was developed to allow file transfers between a client and a server. An FTP client is an application that runs on a computer that is used to push and pull files from a server running the FTP daemon (FTPd). To successfully transfer files, FTP requires two connections between the client and the server: one for commands and replies, and the other for the actual file transfer. The client establishes the first connection to the server on TCP port 21.

This connection is used for control traffic, consisting of client commands and server replies. The client establishes the second connection to the server over TCP port 20. This connection is for the actual file transfer and is created every time a file is transferred. The file transfer can happen in either direction, as shown in Figure 3-17. The client can download (pull) a file from the server or upload (push) a file to the server.
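Python's standard ftplib follows this two-connection model: constructing the FTP object opens the control connection on TCP port 21, and each transfer call sets up a separate data connection. A sketch in which the host, credentials, and file names are all hypothetical:

```python
from ftplib import FTP

def pull_file(host, user, password, remote_name, local_name):
    """Download (pull) a file: control connection on TCP 21,
    a separate data connection for the transfer itself."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_name, "wb") as f:
            ftp.retrbinary(f"RETR {remote_name}", f.write)

def push_file(host, user, password, local_name, remote_name):
    """Upload (push) a file over a new data connection."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_name, "rb") as f:
            ftp.storbinary(f"STOR {remote_name}", f)

# Example usage (requires a reachable FTP server):
# pull_file("ftp.example.com", "user", "secret", "report.txt", "report.txt")
```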


6.1.6 DHCP

DHCP enables clients on a network to obtain IP addresses and other information from a DHCP server. The protocol automates the assignment of IP addresses, subnet masks, gateways, and other IP networking parameters. DHCP allows a host to obtain an IP address dynamically when it connects to the network. The client contacts the DHCP server and requests an address. The DHCP server chooses an address from a configured range of addresses called a pool and assigns it to the host client for a set period. DHCP is preferred on larger networks and on networks where the user population changes frequently. New users might arrive with laptops and need a connection; others have new workstations that need to be connected. Rather than have the network administrator assign an IP address for each workstation, it is more efficient to have IP addresses assigned automatically using DHCP. When a DHCP-configured device boots up or connects to the network, the client broadcasts a DHCP DISCOVER packet to identify any available DHCP servers on the network. A DHCP server replies with a DHCP OFFER, which is a lease offer message with an assigned IP address, subnet mask, DNS server, and default gateway information, as well as the duration of the lease.

DHCP-distributed addresses are not permanently assigned to hosts but are only leased for a period of time. If the host is powered down or taken off the network, the address is returned to the pool for reuse. This is especially helpful with mobile users who come and go on a network. Users can freely move from location to location and re-establish network connections. The host can obtain an IP address after the hardware connection is made, either through a wired or wireless LAN.

DHCP makes it possible for you to access the Internet using wireless hotspots at airports or coffee shops. As you enter the area, your laptop DHCP client contacts the local DHCP server through a wireless connection. The DHCP server assigns an IP address to your laptop. Various types of devices can be DHCP servers when running DHCP service software. The DHCP server in most medium to large networks is usually a local dedicated PC-based server. With home networks, the DHCP server is usually located at the ISP, and a host on the home network receives its IP configuration directly from the ISP. Many home networks and small businesses use an Integrated Services Router (ISR) device to connect to the ISP. In this case, the ISR is both a DHCP client and a server. The ISR acts as a client to receive its IP configuration from the ISP and then acts as a DHCP server for internal hosts on the local network.

DHCP can pose a security risk because any device connected to the network can receive an address. This risk makes physical security an important factor when determining whether to use dynamic or static (manual) addressing. Dynamic and static addressing have their places in network designs. Many networks use both DHCP and static addressing. DHCP is used for general-purpose hosts such as end-user devices, and static, or fixed, addresses are used for network devices such as gateways, switches, servers, and printers.

The client can receive multiple DHCP OFFER packets if the local network has more than one DHCP server. The client must choose between them and broadcast a DHCP REQUEST packet that identifies the explicit server and lease offer that it is accepting. A client can choose to request an address that it had previously been allocated by the server. Assuming that the IP address requested by the client, or offered by the server, is still valid, the chosen server would return a DHCP ACK (acknowledgment) message. The ACK message lets the client know that the lease is finalized. If the offer is no longer valid for some reason, perhaps because of a timeout or another client allocating the lease, the chosen server must respond to the client with a DHCP NAK (negative acknowledgment) message. When the client has the lease, it must be renewed prior to the lease expiration through another DHCP REQUEST message. The DHCP server ensures that all IP addresses are unique. (An IP address cannot be assigned to two different network devices simultaneously.)
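The lease life cycle described above (OFFER, REQUEST, ACK or NAK, and release back to the pool) can be sketched as a small state machine over an address pool. This is a toy model of the server's bookkeeping, not the real packet exchange:

```python
class DhcpPool:
    """Toy model of a DHCP server's address pool and lease table."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.leases = {}               # client id -> leased address

    def offer(self, client):
        # DHCP OFFER: propose the first free address, if any
        return self.free[0] if self.free else None

    def request(self, client, address):
        # DHCP REQUEST -> ACK if the address is still available, else NAK
        if address in self.free:
            self.free.remove(address)
            self.leases[client] = address
            return "ACK"
        return "NAK"

    def release(self, client):
        # Lease expired or host left: the address returns to the pool
        self.free.append(self.leases.pop(client))

pool = DhcpPool(["192.168.1.10", "192.168.1.11"])
addr = pool.offer("host-a")
print(pool.request("host-a", addr))   # ACK: lease finalized
print(pool.request("host-b", addr))   # NAK: already leased to host-a
```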


6.1.7 FILE-SHARING SERVICES AND SMB PROTOCOL

Server Message Block (SMB) is a client/server file-sharing protocol. IBM developed SMB in the late 1980s to describe the structure of shared network resources, such as directories, files, printers, and serial ports. It is a request/response protocol. Unlike the file sharing supported by FTP, clients establish a long-term connection to servers. After the connection is established, the user of the client can access the resources on the server as if the resource is local to the client host.

SMB file-sharing and print services have become the mainstay of Microsoft networking. With the introduction of the Windows 2000 series of software, Microsoft changed the underlying structure for using SMB. In previous versions of Microsoft products, the SMB services used a non-TCP/IP protocol to implement name resolution. Beginning with Windows 2000, all subsequent Microsoft products use DNS naming. This allows TCP/IP protocols to directly support SMB resource sharing, as shown in Figure 3-19.

The Linux and UNIX operating systems also provide a method of sharing resources with Microsoft networks using a version of SMB called SAMBA. The Apple Macintosh operating systems also support resource sharing using the SMB protocol.

The SMB protocol describes file system access and indicates how clients can make requests for files. It also describes the SMB protocol interprocess communication. All SMB messages share a common format. This format uses a fixed-sized header followed by a variable-sized parameter and data component.

SMB messages can perform the following tasks:

- Start, authenticate, and terminate sessions
- Control file and printer access
- Allow an application to send or receive messages to or from another device
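The "fixed-size header followed by a variable-sized body" layout can be sketched with struct packing. The field widths below are illustrative only, not the actual SMB header definition:

```python
import struct

def pack_message(command, body):
    """Fixed-size header (4-byte magic, 1-byte command, 2-byte body length)
    followed by a variable-sized body. Field sizes are illustrative."""
    header = struct.pack("<4sBH", b"\xffSMB", command, len(body))
    return header + body

msg = pack_message(0x72, b"negotiate")
print(len(msg))  # 7-byte header + 9-byte body = 16
```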

6.1.8 TELNET SERVICES AND PROTOCOL

Long before desktop computers with sophisticated graphical interfaces existed, people used text-based systems that were often just display terminals physically attached to a central computer. After networks were available, people needed a way to remotely access the computer systems in the same manner that they did with the directly attached terminals. Telnet was developed to meet that need. It dates back to the early 1970s and is among the oldest of the application layer protocols and services in the TCP/IP suite. Telnet is a client/server protocol that provides a standard method of emulating text-based terminal devices over the data network. Both the protocol itself and the client software that implements the protocol are commonly referred to as Telnet. The Telnet service is depicted in Figure 3-21.

Appropriately enough, a connection using Telnet is called a VTY (Virtual Terminal) session, or connection. Telnet specifies how a VTY session is established and terminated. It also provides the syntax and order of the commands used to initiate the Telnet session, and it provides control commands that can be issued during a session. Each Telnet command consists of at least 2 bytes. The first byte is a special character called the Interpret as Command (IAC) character. As its name implies, the IAC character defines the next byte as a command rather than text. Rather than using a physical device to connect to the server, Telnet uses software to create a virtual device that provides the same features of a terminal session with access to the server command-line interface (CLI).

To support Telnet client connections, the server runs a service called the Telnet daemon. A virtual terminal connection is established from an end device using a Telnet client application.

Most operating systems include an application layer Telnet client. On a Microsoft Windows PC, Telnet can be run from the command prompt. Other common terminal applications that run as Telnet clients are HyperTerminal, Minicom, and TeraTerm. When a Telnet connection is established, users can perform any authorized function on the server, just as if they were using a command-line session on the server itself. If authorized, they can start and stop processes, configure the device, and even shut down the system. The following are some sample Telnet protocol commands:

- Are You There (AYT): Enables the user to request that a response, usually a prompt icon, appear on the terminal screen to indicate that the VTY session is active
- Erase Line (EL): Deletes all text from the current line
- Interrupt Process (IP): Suspends, interrupts, aborts, or terminates the process to which the virtual terminal is connected. For example, if a user started a program on the Telnet server through the VTY, he or she could send an IP command to stop the program
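On the wire, each of these commands is the IAC byte followed by the command's code, as described above. A sketch using the standard command codes assigned in RFC 854:

```python
IAC = 255   # Interpret as Command: marks the next byte as a command
AYT = 246   # Are You There
EL  = 248   # Erase Line
IP  = 244   # Interrupt Process

# Each Telnet command is at least 2 bytes: IAC, then the command code
are_you_there = bytes([IAC, AYT])
print(are_you_there)  # b'\xff\xf6'
```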

Although the Telnet protocol supports user authentication, it does not support the transport of encrypted data. All data exchanged during a Telnet session is transported as plain text across the network. This means that the data can be intercepted and easily understood. The Secure Shell (SSH) protocol offers an alternate and secure method for server access. SSH provides the structure for secure remote login and other secure network services. It also provides stronger authentication than Telnet and supports the transport of session data using encryption. As a best practice, network professionals should use SSH in place of Telnet, whenever possible.


6.2 PRINCIPLES OF WEB ENGINEERING

In the absence of a disciplined approach to Web-based system development, we will find sooner or later that Web-based applications do not deliver the desired performance and quality, and that the development process becomes increasingly complex, difficult to manage and refine, expensive, and grossly behind schedule. Web engineering, an emerging new discipline, advocates a process and a systematic approach to the development of high-quality Internet- and Web-based systems.

We provide a broad and objective definition of Web engineering as follows.

Web engineering is the establishment and use of sound scientific, engineering and management principles and disciplined and systematic approaches to the successful development, deployment and maintenance of high quality Web-based systems and applications.

Web engineering principles and approaches can bring the potential chaos in Web-based system development under control, minimise risks, and enhance maintainability and quality.

An emerging discipline is stretching itself across the professional Web world, one that some say will help pave the often bumpy road to the Internet for aspiring and practicing professionals and the customers they serve. It is called Web engineering. Although the title remains relatively unknown (it was introduced in Brisbane at the 1998 WWW7 annual conference), many Web pros have been applying and advocating its principles for years. Loosely defined, Web engineering incorporates principles and guidelines to design, manage and maintain complex websites, applications and functionality with a keen eye toward usability, performance, security, quality and reliability. Simply put, Web engineering takes existing best practices and formalizes them into a sound methodology that everyone can comfortably adhere to and understand. Several professions, including software engineering, publishing, accounting, and the medical and legal fields, have used such processes to produce better services, and Web professionals can benefit from their experience.

Others agree and, to their credit, have led the cause. In a 2005 paper titled "Web Engineering: Introduction and Perspectives," San Murugesan of Southern Cross University and Athula Ginige of the University of Western Sydney state: "Many organizations are heading toward a Web crisis in which they are unable to keep the system updated and/or grow their system at the rate that is needed. This crisis involves the proliferation of quickly 'hacked together' Web systems that are kept running via a continual stream of patches or upgrades developed without systematic approaches." Essentially, as the Web continues to grow at an exponential rate and methods of producing and managing websites become more fragmented, a standard is necessary to promote a professional environment and create a consistent user experience.

However, what has been missing from the effort to support Web engineering principles, and the title of Web Engineer, is a solid business case for why Web engineering is important to aspiring and practicing Web professionals and those that teach, hire or employ them. For the practicing Web professional within the enterprise (large sites and applications), Web engineering can provide:

- Processes, techniques and principles that can improve the development, deployment and management of sites and Web applications, saving time and valuable resources.


- Better metrics and ROI benchmarks that can be shared across departments, business units and among external clients.

- A respectable title for existing skill sets that are often misunderstood and taken for granted by the human resource and hiring manager communities.

For the small to medium business Web professionals (Webmasters, Web Designers and Web Developers), Web engineering represents:

- Resources to better educate their customer base (internal and external) regarding the complexity of the skills required to develop and manage today's websites.

- Keeping up with industry advances and developments that can be effectively and efficiently incorporated across all departments.

- Promoting a higher level of professionalism on the Web.

For the World Organization of Webmasters (WOW), Web engineering represents:

- An elevation of the profession and promotion of a higher wage scale.

- Validation of its organizational efforts, dating back to 1996, to standardize and support the Web profession.

It's time to take the Web profession to a higher level. Perhaps Web engineering will not only be a boon to the industry but elevate the status of Web professionals worldwide. Managing complex and useful websites is no easy task, and using basic engineering principles is one way to improve and standardize the ongoing development of the Web as a whole.

6.3 DATABASE DRIVEN WEBPAGES

6.3.1 WHAT ARE DYNAMIC WEB PAGES?

To understand dynamic web pages, you first have to understand normal, or 'static,' web pages. Typical non-dynamic web pages do not change every time the page is loaded by the browser, nor do they change if a user clicks on a button. The only change that you will see in static web pages is watching them load and unload, as happens when you click on a hyperlink. In a nutshell: static web pages (normal pages you build) always look the same, and the content never changes unless you load a new page or you change the page yourself and upload the new version to the web server. Dynamic pages do the opposite: they can change every time they are loaded (without you having to make those changes), and they can change their content based on what users do, such as clicking on some text or an image. (I am not talking about loading a new page!)

6.3.2 DATABASE DRIVEN WEB PAGES

One of the most common types of dynamic web pages is the database-driven type. This means that you have a web page that grabs information from a database (the web page is connected to the database by programming) and inserts that information into the web page each time it is loaded. If the information stored in the database changes, the web page connected to the database will also change accordingly, and automatically, without human intervention. This is commonly seen on online banking sites, where you can log in (by entering your user name and password) and check your bank account balance. Your bank account information is stored in a database and has been connected to the web page with programming, thus enabling you to see your banking information. Imagine if the web page holding your banking information had to be built traditionally (that is, by hand) every time your bank balance changed! Even a thousand monkeys working 24/7, drinking 5 cups of coffee a day, would not be able to keep up!
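As a sketch of this idea (using Python's built-in sqlite3 module and an invented accounts table, purely for illustration), the same rendering code produces different page content as soon as the database row changes:

```python
# A minimal sketch of a database-driven page: the HTML is regenerated
# from the database on every request, so updating the table updates the
# page automatically. The table and column names here are hypothetical.
import sqlite3

def render_balance_page(conn, user_id):
    """Build the page body from whatever is currently in the database."""
    row = conn.execute(
        "SELECT name, balance FROM accounts WHERE id = ?", (user_id,)
    ).fetchone()
    return f"<h1>Welcome, {row[0]}</h1><p>Your balance is {row[1]:.2f}</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 'Alice', 120.50)")

page = render_balance_page(conn, 1)      # reflects the current balance
conn.execute("UPDATE accounts SET balance = 99.25 WHERE id = 1")
updated = render_balance_page(conn, 1)   # same code, new content
```

No one edited the page by hand: the second call simply picked up the new row, which is exactly the property the banking example above relies on.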

Database driven sites can be built using several competing technologies, each with its own advantages. Some of those technologies/tools include:

1. PHP

2. JSP

3. ASP

4. PERL

5. Cold Fusion

6.3.4 BENEFITS OF A DATABASE DRIVEN WEBSITE

Database driven websites can provide much more functionality than a static site can. Extended functionality could include:

Enabling many (potentially non-technical) users to provide content for the website. Users can publish articles on the website without needing to FTP them to a web server.

Shopping cart

You can provide advanced search functionality that enables users to filter the results based on a given field. They can then sort those results by a field - say "Price" or "Date".

Customized homepage

You can allow your users to perform tasks such as registering for a newsletter, post questions to your forums, provide comments on a blog, update their profile, etc.

Integration with corporate applications such as CRM systems, HR systems etc.
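The advanced-search capability listed above can be sketched as follows; the products schema and field names are hypothetical. User-supplied filter values are bound as query parameters, while the sort field is checked against a whitelist, since column names cannot be parameterized:

```python
# Sketch of advanced search: filter by a field, then sort by "price" or
# "date". Parameter binding protects the filter value; the whitelist
# protects the ORDER BY clause from SQL injection.
import sqlite3

ALLOWED_SORT_FIELDS = {"price", "date"}

def search_products(conn, max_price, sort_by="price"):
    if sort_by not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"cannot sort by {sort_by!r}")
    query = f"SELECT name, price FROM products WHERE price <= ? ORDER BY {sort_by}"
    return conn.execute(query, (max_price,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL, date TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    ("widget", 9.99, "2020-01-01"),
    ("gadget", 24.99, "2020-02-01"),
    ("gizmo", 4.99, "2020-03-01"),
])
results = search_products(conn, max_price=10.0)
# results: [("gizmo", 4.99), ("widget", 9.99)] - filtered, sorted by price
```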

6.4 MIDDLEWARE

Middleware is a class of software technologies designed to help manage the complexity and heterogeneity inherent in distributed systems. It is defined as a layer of software above the operating system but below the application program that provides a common programming abstraction across a distributed system, as shown in Figure 1. In doing so, it provides a higher-level building block for programmers than Application Programming Interfaces (APIs) such as sockets that are provided by the operating system. This significantly reduces the burden on application programmers by relieving them of this kind of tedious and error-prone programming. Middleware is sometimes informally called “plumbing” because it connects parts of a distributed application with data pipes and then passes data between them.

Middleware frameworks are designed to mask some of the kinds of heterogeneity that programmers of distributed systems must deal with. They always mask heterogeneity of networks and hardware. Most middleware frameworks also mask heterogeneity of operating systems or programming languages, or both. A few, such as CORBA, also mask heterogeneity among vendor implementations of the same middleware standard. Finally, programming abstractions offered by middleware can provide transparency with respect to distribution in one or more of the following dimensions: location, concurrency, replication, failures, and mobility.

The classical definition of an operating system is “the software that makes the hardware useable.” Similarly, middleware can be considered to be the software that makes a distributed system programmable. Just as a bare computer without an operating system could be programmed with great difficulty, programming a distributed system is in general much more difficult without middleware, especially when heterogeneous operation is required. Likewise, it is possible to program an application with an assembler language or even machine code, but most programmers find it far more productive to use high-level languages for this purpose, and the resulting code is of course also portable.

6.4.1 CATEGORIES OF MIDDLEWARE

There are a small number of different kinds of middleware that have been developed. These vary in terms of the programming abstractions they provide and the kinds of heterogeneity they mask beyond network and hardware.

DISTRIBUTED TUPLES:

Distributed relational databases (see Distributed Databases) offer the abstraction of distributed tuples and are the most widely deployed kind of middleware today. The Structured Query Language (SQL) allows programmers to manipulate sets of these tuples (a database) in an English-like language, yet with intuitive semantics and rigorous mathematical foundations based on set theory and predicate calculus. Distributed relational databases also offer the abstraction of a transaction. Distributed relational database products typically offer heterogeneity across programming languages, but most do not offer much, if any, heterogeneity across vendor implementations. Transaction Processing Monitors (TPMs) are commonly used for end-to-end resource management of client queries, especially server-side process management and managing multi-database transactions.

Linda (see Linda) is a framework offering a distributed tuple abstraction called Tuple Space (TS). Linda’s API provides associative access to TS, but without any relational semantics. Linda offers spatial decoupling by allowing depositing and withdrawing processes to be unaware of each other’s identities. It offers temporal decoupling by allowing them to have non-overlapping lifetimes. Jini is a Java framework for intelligent devices, especially in the home. Jini is built on top of JavaSpaces, which is very closely related to Linda’s TS.
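A toy, in-process version of Linda's tuple space can illustrate the associative access and the spatial/temporal decoupling described above (this is a sketch of the idea, not Linda's actual API):

```python
# A toy Tuple Space in the spirit of Linda: processes deposit tuples with
# out() and withdraw matching ones with in_(), without knowing each
# other's identity (spatial decoupling). in_() blocks until a match
# exists, so depositor and withdrawer need not overlap in time.
import threading

class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):                      # deposit a tuple
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def in_(self, pattern):                  # withdraw first match; None is a wildcard
        with self._cond:
            while True:
                for t in self._tuples:
                    if len(t) == len(pattern) and all(
                        p is None or p == v for p, v in zip(pattern, t)
                    ):
                        self._tuples.remove(t)
                        return t
                self._cond.wait()

ts = TupleSpace()
ts.out(("temperature", "room1", 21.5))
match = ts.in_(("temperature", "room1", None))   # associative lookup by pattern
```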

REMOTE PROCEDURE CALL:

Remote procedure call (RPC; see Remote Procedure Calls) middleware extends the procedure call interface familiar to virtually all programmers to offer the abstraction of being able to invoke a procedure whose body is across a network. RPC systems are usually synchronous, and thus offer no potential for parallelism without using multiple threads, and they typically have limited exception handling facilities.

MESSAGE-ORIENTED MIDDLEWARE:

Message-Oriented Middleware (MOM) provides the abstraction of a message queue that can be accessed across a network. It is a generalization of the well-known operating system construct: the mailbox. It is very flexible in how it can be configured with the topology of programs that deposit and withdraw messages from a given queue. Many MOM products offer queues with persistence, replication, or real-time performance. MOM offers the same kind of spatial and temporal decoupling that Linda does.
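A minimal sketch of the message-queue abstraction, using Python's standard queue module in place of a real MOM product (which would add persistence, replication, and network access). The producer and consumer share only the queue, not each other's identity:

```python
# The MOM "queue": producer deposits messages, consumer withdraws them.
# The queue buffers messages if the consumer is slow, giving the same
# spatial and temporal decoupling the text attributes to Linda.
import queue
import threading

mailbox = queue.Queue()
received = []

def consumer():
    while True:
        msg = mailbox.get()      # blocks until a message is available
        if msg is None:          # sentinel: shut down
            break
        received.append(msg)

t = threading.Thread(target=consumer)
t.start()
mailbox.put({"type": "order", "id": 1})   # producer deposits messages
mailbox.put({"type": "order", "id": 2})
mailbox.put(None)
t.join()
# received now holds both messages, in order
```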

DISTRIBUTED OBJECT MIDDLEWARE:

Distributed object middleware provides the abstraction of an object that is remote yet whose methods can be invoked just like those of an object in the same address space as the caller. Distributed objects make all the software engineering benefits of object-oriented techniques (encapsulation, inheritance, and polymorphism) available to the distributed application developer.

The Common Object Request Broker Architecture (CORBA) is a standard for distributed object computing. It is part of the Object Management Architecture (OMA), developed by the Object Management Group (OMG), and is the broadest distributed object middleware available in terms of scope. It encompasses not only CORBA's distributed object abstraction but also other elements of the OMA which address general-purpose and vertical-market components helpful for distributed application developers. CORBA offers heterogeneity across programming languages and vendor implementations. CORBA (and the OMA) is considered by most experts to be the most advanced kind of middleware commercially available and the most faithful to classical object-oriented programming principles. Its standards are publicly available and well defined.

DCOM is a distributed object technology from Microsoft that evolved from its Object Linking and Embedding (OLE) and Component Object Model (COM). DCOM's distributed object abstraction is augmented by other Microsoft technologies, including Microsoft Transaction Server and Active Directory. DCOM provides heterogeneity across languages but not across operating systems or tool vendors. COM+ is the next-generation DCOM that greatly simplifies the programming of DCOM.

SOAP is a distributed object framework from Microsoft that is based on XML and the HyperText Transfer Protocol (HTTP). Its specification is public, and it provides heterogeneity across both language and vendor. Microsoft's distributed object framework .NET also has heterogeneity across language and vendor among its stated goals.

Java has a facility called Remote Method Invocation (RMI) that is similar to the distributed object abstraction of CORBA and DCOM. RMI provides heterogeneity across operating system and Java vendor, but not across language. However, supporting only Java allows closer integration with some of its features, which can ease programming and provide greater functionality.
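The idea shared by CORBA, DCOM, and RMI can be sketched in miniature: the caller invokes methods on a client-side proxy exactly as it would on a local object, and the proxy forwards the call through a transport layer. Here the transport is a plain function call standing in for the wire protocol, an assumption made to keep the sketch self-contained:

```python
# Sketch of the distributed-object abstraction: a local proxy intercepts
# method calls and forwards them to the (conceptually remote) object.

class Account:                       # the "remote" implementation object
    def __init__(self, balance):
        self._balance = balance
    def deposit(self, amount):
        self._balance += amount
        return self._balance

def transport(target, method, args):
    """Stand-in for the ORB / RMI / DCOM wire protocol."""
    return getattr(target, method)(*args)

class Proxy:
    """Client-side stub: looks like the object, forwards every call."""
    def __init__(self, target):
        self._target = target
    def __getattr__(self, name):
        def remote_call(*args):
            return transport(self._target, name, args)
        return remote_call

remote_account = Proxy(Account(100))
new_balance = remote_account.deposit(25)    # reads exactly like a local call
```

Real middleware generates such stubs from an interface definition and marshals arguments over the network, but the caller's view is the same as above.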

6.4.2 ROLE

Middleware is a term which refers to the set of services, composed of IAA, APIs, and management systems, which support the needs of a distributed, networked computing environment. As the name "middleware" suggests, it is, using the hierarchical model, a layer of services which sits between applications and the services they access. An excellent overview of middleware can be found in RFC 2768. The RFC's authors admit that even today middleware is still not a well-defined term, but the above definition is consistent with the RFC's and is sufficient for the purposes of this paper.

Middleware, at first glance, seems somewhat innocuous or invisible, especially when it works well. Looks can be deceiving, however. Middleware is a critical component in the CoS model. It is the middleware which supports the creation and removal of users, provides the ability to verify an identity, and manages the addition and removal of services associated with user ids, and even the addition and removal of services available to users. Within the enterprise, a good middleware solution makes the delivery of services transparent to users. There is no need to know which server is providing a given service and a single login/password provides access to all of the enterprise's services, subject to whatever policies have been established. In the business-to-business world, where multiple enterprises establish electronic relationships, middleware will perform the same function but it is complicated by the fact that there are currently no middleware standards in place.

The IT community today certainly recognizes the importance of middleware. The Internet2 community has formed a working group to investigate middleware. Companies like Microsoft (Active Directory) and Novell (NDS) have committed huge amounts of money and, in effect, bet their corporate futures on gaining adoption of their middleware implementations as standards. This makes middleware, in the authors' opinion, the single most important component in future networked computing environments. The Open Source community, and the Linux community in particular, must recognize its importance.

A proprietary solution such as Active Directory or NDS will irreparably harm the Open Source movement. The success of Linux in the enterprise has largely been due to its adherence to open standards and the availability of source code. This has permitted managers to easily deploy Linux into existing enterprise IT architectures where open standards exist. If the middleware standard isn't open, the implications are obvious. For example, consider the SMB file protocol and the open-source Samba software. With the introduction of Microsoft Windows 2000 and the incorporation of Active Directory into file and print services, Samba's viability was immediately threatened. This scenario will repeat itself over and over unless the middleware is standards-based and open.

6.5 ENTERPRISE-WIDE WEB-BASED SYSTEMS

Over the last few years, many of our customers have struggled with the decision of the type of implementation they should follow for their reliability solutions. Quite often, there are conflicting requirements and needs, from both engineering and an information technologies (IT) standpoint, which necessitate a compromise between functionality, usability, ease of initial deployment, long-term support, cost, etc. The first question to consider is whether you need enterprise-wide software for your particular application. By “enterprise-wide,” we are referring to software that provides centralized data storage and allows multiple users to access the system simultaneously. This type of software is typically more expensive and more complex to implement. It requires adequate server hardware, experienced IT personnel to configure and support the server(s), appropriate licensing for the underlying database (e.g. ORACLE or SQL Server), a secure and reliable connection to the server for each user, ongoing database maintenance and backups, and so on.

An enterprise-wide system is likely to be appropriate if you are dealing with process-oriented analyses that require input and review by multiple people, such as FMEA or FRACAS. In these cases, the organization will benefit from centralized data storage and the ability for multiple users to log in to the system from various locations and query or update the shared information. ReliaSoft's Xfmea Enterprise and XFRACAS products are examples of enterprise-wide software designed to support the FMEA and FRACAS needs of the entire organization, while providing consistency, a feedback loop for corrective actions, a searchable "knowledge base" of known issues, etc.

Another powerful application for an enterprise-wide system would be an automated data analysis and presentation system, such as ReliaSoft's Dashboard. This type of system is designed to collect data from a variety of sources (e.g. shipments, warranty claims, failure analyses, etc.), automatically analyze the data and present the analysis results throughout the organization. This approach is appropriate for analyses that can be performed without input from an experienced analyst (such as line charts showing trends over time or bar charts of issues ranked by quantity) and usually requires custom development to establish the data flow, analysis and presentation mechanisms that are appropriate for an organization’s particular processes.

However, if you are working with individual statistical data analyses (such as fitting a distribution to life data or simulating the operation of a complex system over time), an enterprise-wide system is not required, even if you have many users across the enterprise. In this case, the analysis is typically (and appropriately) performed by one analyst at a time and usually requires the computing power and usability that a standalone software product such as ReliaSoft's Weibull++ or BlockSim can provide. Under this scenario, employing an enterprise-wide Web-based system would be both prohibitive and unnecessary.

If you have decided that an enterprise-wide system is appropriate, the next question to consider is whether it should be Web-based or client-server. With systems that are deployed via a Web browser, there is little or no software that needs to be installed or updated on each user's computer. This characteristic is understandably attractive to many IT departments! If a Web-based system can deliver the desired functionality, usability and performance then it may be preferred over a client-server approach, which requires software to be installed and updated for each user.

The technology available for developing Web-based systems has been improving and continues to improve all the time. However, for many types of tasks, users continue to expect a higher level of usability and performance than can currently be achieved in a Web-based interface. In those cases, your organization will need to carefully weigh the advantages and disadvantages of the relative convenience of the Web-based system versus the superior performance and usability of the client-server option… or consider a terminal server approach, as described next.

6.6 DISTRIBUTED OBJECTS

“An OBJECT which resides on a computer in a distributed system to which messages can easily be sent by programs executing on other computers. Such messages abstract away much of the programming details required to send data over a network, thus reducing the programming effort needed. The three main distributed object technologies are common object request broker architecture, remote method invocation, and distributed component object model.”

Distributed object technology seamlessly integrates different modules of software developed in more than one programming language and residing on systems of various platforms and architecture.

One of the advantages most commonly cited for adopting the object approach is that it is a more "natural" way of capturing real-world situations within a programme. It is easy to see that the object concepts, of self-contained, encapsulated data entities communicating with other such entities, are easily paralleled within real distributed systems.

An object in our system consists of data encapsulated by a fixed number of methods (operations). That is, the only way the data can be accessed (read from or written to) is by invoking (calling) one of its methods. The data of an object can be stored in state, or instance, variables. Each method of the object may access some or all of the object's variables, may interact with other objects, and may perform transformations (computations) on the object's variables and variables local to the method. The interaction between objects is by messaging. Objects can only communicate with one another by sending messages to each other's interfaces (so there are no shared variables). The interface to the method is all that a user of the object is concerned with. The message structure relates to the method interface, and not necessarily to the method internals, or implementation. In our model it is important to notice that there are essentially two kinds of messages, namely those requesting the invocation of a method (the "call") and those returning the result of the method invocation (the "return"). The reason that we distinguish the call from the return is that in both cases data may be communicated between objects.

In real distributed systems objects may be at remote sites, and so may be communicating over an underlying network. It is natural, therefore, to model the messaging as asynchronous communication. If we do not wish to consider the possibility of underlying network errors, then we could use a synchronous model, where communication events can be considered as shared, atomic interactions. (Note that there are several definitions of synchrony. In this instance we are not referring to the request/confirmation model.)

6.6.1 SECURITY ISSUES

Security is an essential requirement at different stages of an application developed using this technology, because:

The application depends on code from systems connected over the network (possibly including the Internet). The authenticity and integrity of the code must be verified before allowing it access to protected resources, and network transfer should not compromise data sensitivity.

The application may be developed in different programming languages, so exposures such as buffer overflows must be considered very carefully.

Each module of the application may require a varying degree of access to system resources, and hence resource access should be granted in a very controlled manner.

The following are examples of the distributed object technology platforms available today:

1. Microsoft .NET

2. CORBA

3. Java RMI

6.6.1.1 THREATS

1. Unauthorized Access to Objects: This class of threats is essentially concerned with an object accessing another object (e.g., invoking a method in another object) for which it has no authorization. It also includes threats against the propagation of authorizations. To counteract such threats in an object-oriented system, the natural point of control is the reception of the message at an object. For instance, one can propose a scheme whereby each object has associated with it some access control information (e.g., access control lists) which can be used to determine whether the received method call from a given object should be granted or not. We may either state who is allowed to invoke a method, state who is not allowed to invoke the method, or state that only objects with certain properties (attributes) may invoke the method. Then we can develop suitable mechanisms to prevent misuse in access control information propagation. Such policies form part of a discretionary access control model for an object-oriented system and they will be addressed in a separate paper.
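The message-reception control point described above can be sketched like this; the object identifiers and ACL layout are hypothetical:

```python
# Sketch of discretionary access control at message reception: each
# object carries an access control list mapping method names to the
# caller identities allowed to invoke them; the check happens before
# the call is dispatched to the method.

class ProtectedObject:
    def __init__(self, acl):
        self._acl = acl                  # e.g. {"read": {"obj-A"}}

    def receive(self, caller_id, method, *args):
        allowed = self._acl.get(method, set())
        if caller_id not in allowed:
            raise PermissionError(f"{caller_id} may not invoke {method}")
        return getattr(self, method)(*args)

    def read(self):
        return "secret data"

obj = ProtectedObject({"read": {"obj-A"}})
value = obj.receive("obj-A", "read")     # authorized caller: granted
try:
    obj.receive("obj-C", "read")         # unauthorized caller: denied
except PermissionError:
    denied = True
```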

2. Unauthorized Information Flow: By information flow control, we mean that the flow of information between two entities must be suitably restricted so as not to violate a prescribed set of flow policies. For instance, when information is transferred between objects during messaging, we wish to ensure that the flow of information does not contravene our flow policy. This paper is primarily concerned with such information flow security policies for an object-oriented system. Such an information flow model addresses the mandatory security issues of an object-oriented system. In fact, we view the discretionary access control model to be "on top" of the information flow (mandatory security) model for an object system.

3. Identification and Authentication: Implicit in the above two requirements is that an object can be uniquely identified. However, we do not just require unique identifiers, but also a guarantee that the identifier does in fact belong to the object that it claims to be. We do not want to allow objects to masquerade as other objects by falsifying their identifiers. Policies and mechanisms to implement such guarantees fall within the notion of authentication.

4. Data Protection: Finally we need to protect the communications themselves. There are a number of possible threats to communication which are the domain of communications security. The main approach to providing communications security employs cryptographic techniques. We shall not be concerned with this aspect in this paper.

6.6.1.2 THE OBJECT MODEL PROPERTIES

1. All objects and factory objects have a unique identifier.

2. All methods in an object have a unique name.

3. Messaging is asynchronous message passing, with non-deterministic delay.

4. All factory objects have two methods: "create-object" and "delete-object".

5. A factory object maintains a record of the identifiers of objects it has created but not deleted (i.e., those existing objects).

6. Factory objects assign unique identifiers to objects that they create.

7. All objects are single-threaded: only one method may be invoked on an object at a time (thus avoiding problems of deletion and change of security labels with multithreaded objects).

8. All methods have local storage.

9. Creation is via objects invoking the "create-object" method of a factory. This creation is atomic and complete (all variables and labels are assigned). The factory manages the mapping from the initial interface values and labels to the implementation (internal) values and labels.

10. Objects can only request their own deletion. To delete itself, an object must invoke the "delete-object" factory method, and then terminate itself.

11. All objects have a "Change-Label" method. These methods manage the internal mapping from the interface variables and labels to their internal representations.
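Several of the numbered rules (unique identifiers, the factory's record of live objects, and self-requested deletion) can be sketched together; the class names here are invented for illustration:

```python
# Sketch of the factory-object rules: the factory assigns unique
# identifiers (rule 6), records objects it has created but not deleted
# (rule 5), and deletion is requested by the object itself (rule 10).
import itertools

class Factory:
    def __init__(self):
        self._ids = itertools.count(1)
        self._live = {}                    # id -> object (created, not deleted)

    def create_object(self, cls, *args):
        oid = next(self._ids)              # unique identifier
        obj = cls(oid, self, *args)        # creation is atomic and complete
        self._live[oid] = obj
        return obj

    def delete_object(self, oid):
        del self._live[oid]

class Obj:
    def __init__(self, oid, factory):
        self.oid, self._factory = oid, factory
    def terminate(self):                   # an object requests its own deletion
        self._factory.delete_object(self.oid)

factory = Factory()
a, b = factory.create_object(Obj), factory.create_object(Obj)
a.terminate()
# the factory's record now lists only b as an existing object
```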

6.6.1.3 SECURITY PROPERTIES

GROUP 1 - SYSTEM SET UP

• The labels should be attached to variables, methods, and messages, and should form a lattice.

• Methods, messages, and variables should be assigned single labels, except for factory methods.

• Factory methods may have a set of labels assigned to them; Factory objects are "trusted" to work at many labels.

• The labels of the methods are fixed. Assignment of labels to methods falls outside this model.

• The underlying inter-object communication is "trusted" not to breach the security policy. That is, it faithfully communicates the messages but performs no other actions.

• The object monitor is "trusted" in that it faithfully manages the interactions between the method and the object's variables.

GROUP 2 - MESSAGE PASSING

• The label of a message should be equal to the label of the method in which it originates, except if the message contains no data, in which case the label of the message is 'system-low' or 'unclassified'.

• The contents of a message are considered to be at the level of the message.

• The label of a message should be dominated by the label of the receiving method.

• If a method m1 sends a message msg1 to invoke method m2, and receives message msg2 on successful termination of m2, where data is exchanged in both directions, then m1 and m2 must be at the same label.

GROUP 3 – METHODS

• The label of a method should dominate the labels of the object's variables that the method has read access to.

• The label of a method should be dominated by the labels of the object's variables that the method has write access to.

• If a method is performing both reading and writing operations on a variable, then the label of the method should be equal to that of the variable.

GROUP 4 - MODIFYING LABELS

• The labels of the variables of a newly created object should dominate the label of the requesting message.

• A message requesting the invocation of a "Change-Label" method must contain the name of the variable (whose label is to change) and the new label; the new label must dominate the current label of that variable.

• The label of the "Change-Label" method must be dominated by the label of each of the object's variables.
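The dominance checks in Groups 2 and 3 can be sketched with a simple, linearly ordered label lattice (the three levels chosen here are an assumption; any lattice would do):

```python
# Sketch of label dominance: a method's label must dominate the labels
# of variables it reads and be dominated by the labels of variables it
# writes (the Group 3 rules).
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}

def dominates(a, b):
    """True if label a dominates label b in this linear lattice."""
    return LEVELS[a] >= LEVELS[b]

def method_label_ok(method_label, read_vars, write_vars):
    return all(dominates(method_label, v) for v in read_vars) and \
           all(dominates(v, method_label) for v in write_vars)

# a 'confidential' method reading an unclassified variable and writing a
# secret one satisfies both rules:
ok = method_label_ok("confidential", read_vars=["unclassified"], write_vars=["secret"])
# an unclassified method reading a secret variable violates them:
bad = method_label_ok("unclassified", read_vars=["secret"], write_vars=[])
```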

6.7 REMOTE PROCEDURE CALL
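As one concrete illustration of the remote procedure call abstraction, Python's standard library includes a simple RPC framework (XML-RPC over HTTP); the port number below is an arbitrary choice. The client invokes add() as if it were a local procedure, while the body executes in the server process:

```python
# XML-RPC sketch: register a procedure on a server, then invoke it
# remotely through a proxy. The call is synchronous, matching the usual
# RPC semantics described earlier in this unit.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

server = SimpleXMLRPCServer(("127.0.0.1", 8765), logRequests=False)
server.register_function(lambda a, b: a + b, "add")   # the remote procedure
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy("http://127.0.0.1:8765")
result = client.add(2, 3)      # looks local; the body runs in the server
server.shutdown()
```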

6.8 LIGHTWEIGHT DISTRIBUTED OBJECTS

One of the disadvantages of object-oriented development is that it can, and often does, mean a more slowly performing application. Unfortunately, this has more to do with a lack of care when designing objects than with a fundamental flaw in object-oriented development.

CRITERIA BALANCE

First, you must realize that every development project requires balance. No project is without certain drawbacks. For a certain investment in time and money, there's only so much quality, so many features, and so much performance that you can buy. There's no way to have the highest quality software, along with the most features and the best performance, without spending a great deal of time and money. In any software development project, you must focus on which performance characteristic is critical and make compromises on other less important areas. Of course, if you have unlimited time and money, then you will have more flexibility than those of us with time and money constraints. In many cases, performance is one of the criteria that is considered last - and in today's world of high-speed processors, that may be the right answer. However, it's not possible to completely ignore performance without eventually having performance issues.

ONLY WHAT YOU NEED

An economical way to address performance is by creating lightweight objects - in other words, objects that only initialize what they need when needed. An object should be a full-featured way for the user to utilize a specific kind of information; however, providing a full-featured solution doesn't mean that you have to completely prepare all of the features. You can wait until the feature of the object is needed and then initialize it. For instance, a product object may allow you to view the extended attributes of a product. However, it may not be necessary to initialize those extended attributes if the user doesn't ask for them. This can substantially reduce the amount of time necessary to create an instance of the object. This is important when you need to initialize large numbers of objects, such as when you display an object list.

PATTERNS FOR LIGHTWEIGHT OBJECTS

Creating a lightweight object involves an internal field, a sentinel value, and an initialization function. These work together to allow you to selectively initialize parts of the object as needed. The property is what the user of the object uses to access the functionality, so that the call can be intercepted with a small amount of code to determine whether or not the functionality is already loaded. The way the property determines if the functionality is loaded is to look for a sentinel value in the internal field which, if present, indicates that the functionality has not been loaded. The sentinel value is set in the constructor for the object or during the object's initialization. Thus, if the field has any value other than the sentinel value, it has been initialized. In some rare cases, you may not be able to determine the state by using a sentinel value. In those cases, a Boolean flag is used to determine if the functionality within the object has been initialized. If that portion of the functionality hasn't been initialized, the private initialization function is called. This initialization function initializes all of the functionality, or throws an error if unsuccessful. Once the initialization function has completed, control is returned to the property, which can then return the internal field.
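The pattern just described (sentinel value, intercepting property, private initialization function) can be sketched as follows; the Product class and its extended attributes are hypothetical, and the expensive database call is simulated to keep the sketch self-contained:

```python
# Lightweight-object sketch: the sentinel marks the extended attributes
# as not-yet-loaded; the property intercepts access and calls the private
# initialization function only on first use.

_UNLOADED = object()          # sentinel value

class Product:
    def __init__(self, product_id):
        self.product_id = product_id          # cheap, always initialized
        self._extended = _UNLOADED            # internal field holds the sentinel
        self.load_count = 0                   # instrumentation for the sketch

    def _load_extended(self):                 # private initialization function
        self.load_count += 1
        # in a real object this would be the expensive database/remote call
        self._extended = {"weight": "1.2kg", "colour": "red"}

    @property
    def extended_attributes(self):
        if self._extended is _UNLOADED:       # any other value means "loaded"
            self._load_extended()
        return self._extended

p = Product(42)                # construction stays fast: nothing loaded yet
attrs = p.extended_attributes  # first access triggers the load
attrs2 = p.extended_attributes # second access reuses the cached value
```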

PICKING PROPERTIES AND METHODS

Now that you have a pattern for how to make the object lightweight, it's important to decide which properties and methods you want to make lightweight. When making that decision, it's important to consider how many times the object will be created, the amount of effort required to make the request when the developer-user first asks for the extended functionality, and the amount of time that will be saved. In general, you should make a property or method lightweight when the object will be initialized fairly regularly in the course of the application and initializing the functionality in the object would take substantial resources. For instance, any remote call to get information, such as a database call, is probably worth making lightweight. Conversely, objects that are only created once, or functionality which won't save much in the way of resources, are probably not worth considering for becoming lightweight.

COUPLING WHERE NECESSARY

A final tactic for making lightweight objects is allowing collections to initialize the object differently than any other user of the object. In general, each object is initialized by providing its key identifying information, and the object loads itself from a database. However, this doesn't take into account the efficiencies that simultaneous retrieval of multiple database records can create. To accomplish this, collections are allowed to initialize objects with a record from a query result set instead of using the standard methods of construction. This technique is permitted from the collection because collections are tightly coupled with the objects they hold. In this way, the collection can control how its objects are created and can allow objects to be initialized much more quickly than if each object were initialized individually. This technique can be very powerful since it leverages the native behaviour of set-based database results.
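A sketch of this collection-level initialization, with a hypothetical products table: the collection runs a single query and hands each row to a constructor path reserved for it, rather than letting each object load itself with a query of its own:

```python
# Collection-level optimization: one query for all rows, with each row
# passed straight to a row-based constructor that only the tightly
# coupled collection is meant to use.
import sqlite3

class Product:
    @classmethod
    def _from_row(cls, row):          # bypasses the normal load-by-id path
        obj = cls.__new__(cls)
        obj.product_id, obj.name = row
        return obj

class ProductCollection:
    def __init__(self, conn):
        rows = conn.execute("SELECT id, name FROM products").fetchall()
        self.items = [Product._from_row(r) for r in rows]   # one round trip

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
products = ProductCollection(conn)
names = [p.name for p in products.items]
```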