web browser

MINI PROJECT REPORT

PROJECT NAME: WEB BROWSER & DOWNLOAD MANAGER

REPRESENTED BY: Abhijeet Kumar Shah

WEB BROWSER INTRODUCTION

A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. The World Wide Web (abbreviated as WWW or W3, commonly known as the Web) is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks.

An information resource is identified by a Uniform Resource Identifier (URI) and may be a web page, image, video, or other piece of content. Hyperlinks present in resources enable users to easily navigate their browsers to related resources. A web browser can also be defined as application software designed to enable users to access, retrieve, and view documents and other resources on the Internet.

History

The major web browsers are Firefox, Google Chrome, Internet Explorer, Opera, and Safari.

The first web browser, WorldWideWeb (later renamed Nexus), was invented in 1990 by Sir Tim Berners-Lee.

In 1992, Robert Cailliau developed the first web browser for the Macintosh, called Samba.

In 1993, Marc Andreessen and his team at NCSA released Mosaic, one of the first graphical web browsers and "the world's first popular browser". Mosaic introduced support for sound, video clips, forms, bookmarks, and history files. Andreessen went on to co-found Netscape.

In 1994, Netscape released the first commercial web browser, Netscape Navigator (code-named Mozilla), which became a major driver of the development of the web.

In 1994, development of the Opera browser began as a research project at Telenor, a telecommunications company in Oslo, Norway. Opera was first made available on the Internet in 1996. Opera also competes in the fast-growing mobile phone web browser market, being preinstalled on over 40 million phones.

In 1995, Microsoft responded with its Internet Explorer, also heavily influenced by Mosaic, initiating the industry's first browser war.

The most recent major entrant to the browser market is Google's Chrome, first released in September 2008. Chrome's take-up has increased significantly year on year.

Apple's Safari had its first beta release in January 2003; as of April 2011, it had a dominant share of Apple-based web browsing, accounting for just over 7% of the entire browser market.

The most commonly used browsers are Lynx (1993), Chrome (2008), Opera (1995), IE (1995), SeaMonkey (2005), Firefox (2002), Safari (2003), Maxthon (2004), Lunascape (2005), NetSurf (2007), Iron (2008), ChromePlus (2009), and Chimera (2002).

Historical Web Browsers

Active Worlds

Air_Mosaic

Amiga

Arachne

Charlotte

EI*Net

EmailSiphon

Enhanced NCSA Mosaic

GetRight

HotJava

IBM WebExplorer

internetMCI

IWENG

MacWeb

NetAttache

NetCaptor

NETCOMplete

NetCruiser

NetManage Chameleon

NetPositive

PlanetWeb

Quarterdeck WebC

SPRY_Mosaic

Spyglass Enhanced Mosaic

TueV Mosaic for X

WWWC

User Interface

Most browsers share the following user interface elements:

Back and forward buttons to go back to the previous resource and forward again, respectively.

A refresh or reload button to reload the current resource.

A stop button to cancel loading the resource. In some browsers, the stop button is merged with the reload button.

A home button to return to the user's home page.

An address bar to input the Uniform Resource Identifier (URI) of the desired resource and display it.

A search bar to input terms into a search engine. In some browsers, the search bar is merged with the address bar.

A status bar to display progress in loading the resource and the URI of a link when the cursor hovers over it, along with page zooming controls.

Browser Structure

The main components of a browser are:

The user interface - this includes the address bar, back/forward buttons, bookmarking menu, etc.: every part of the browser display except the main window where you see the requested page.

The browser engine - marshals the actions between the UI and the rendering engine.

The rendering engine - responsible for displaying the requested content. For example, if the requested content is HTML, it is responsible for parsing the HTML and CSS and displaying the parsed content on the screen.

Networking - used for network calls, like HTTP requests. It has a platform-independent interface with separate implementations underneath for each platform.

UI backend - used for drawing basic widgets like combo boxes and windows. It exposes a generic interface that is not platform specific. Underneath, it uses the operating system's user interface methods.

JavaScript interpreter - used to parse and execute the JavaScript code.

Data storage - a persistence layer. The browser needs to save all sorts of data on the hard disk, for example cookies. The new HTML specification (HTML5) defines a 'web database', which is a complete (although light) database in the browser.

It is important to note that Chrome, unlike most browsers, holds multiple instances of the rendering engine - one for each tab. Each tab is a separate process.

Rendering Engine

A web browser engine, also called a layout engine or rendering engine, is a software component that takes marked-up content (such as HTML, XML, image files, etc.) and formatting information (such as CSS, XSL, etc.) and displays the formatted content on the screen.

The basic flow of the rendering engine:

The rendering engine starts by parsing the HTML document and turning the tags into DOM (Document Object Model) nodes in a tree called the "content tree".

The styling information, together with visual instructions in the HTML, is used to create another tree - the render tree.

The layout process gives each node the exact coordinates where it should appear on the screen.

The next stage is painting - the render tree is traversed and each node is painted using the UI backend layer.

Parse tree

Parsers usually divide the work between two components - the lexer (tokenizer) that is responsible for breaking the input into valid tokens, and the parser that is responsible for constructing the parse tree by analyzing the document structure according to the language syntax rules.

The parsing process is iterative. The parser will usually ask the lexer for a new token and try to match the token with one of the syntax rules. If a rule is matched, a node corresponding to the token will be added to the parse tree and the parser will ask for another token.
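The division of work between lexer and parser can be sketched in a few lines of Python. This is only an illustration on a toy grammar (numbers joined by + and -), not HTML; the function names lex and parse and the tuple-based tree are assumptions made for the example.

```python
def lex(text):
    """Lexer: break the input into valid (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                          # whitespace separates tokens
        elif ch in "0123456789":
            j = i
            while j < len(text) and text[j] in "0123456789":
                j += 1
            tokens.append(("NUM", int(text[i:j])))
            i = j
        elif ch in "+-":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


def parse(tokens):
    """Parser for the toy rule  expr := NUM (OP NUM)*  ; returns a nested-tuple tree."""
    pos = 0

    def next_token(expected):
        # Ask for the next token and check it against the syntax rule.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != expected:
            raise SyntaxError(f"expected {expected} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    tree = next_token("NUM")
    while pos < len(tokens):
        op = next_token("OP")
        right = next_token("NUM")
        tree = (op[1], tree, right)         # add a node for the matched rule
    return tree


print(parse(lex("2 + 3 - 1")))
# ('-', ('+', ('NUM', 2), ('NUM', 3)), ('NUM', 1))
```

Each pass through the loop in parse asks for new tokens, matches them against the rule, and adds a node to the tree, which mirrors the iterative process described above.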

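DOM Tree

The output tree, the "parse tree", is a tree of DOM (Document Object Model) element and attribute nodes. The DOM is the object representation of the HTML document and the interface of HTML elements to the outside world, such as JavaScript. The root of the tree is the "Document" object.

The DOM has an almost one-to-one relation to the markup. For example, consider this markup:

<html> <body> <p>Hello World </p> <div> <img src="example.png"/></div> </body> </html>

In the DOM tree of the above markup, the Document root contains the html element, which contains body; body in turn contains a p element (holding the text "Hello World") and a div element (holding the img element).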
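The tree for this markup can be reproduced with a short Python sketch built on the standard library's html.parser module. The Node and TreeBuilder classes and the dump helper are names invented for this example, and the builder is far simpler than a real browser's: it mirrors the tags one-to-one and does not create implicit elements.

```python
from html.parser import HTMLParser


class Node:
    """A minimal tree node; attaches itself to its parent on creation."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    """Builds a DOM-like tree whose root is a 'Document' node."""
    def __init__(self):
        super().__init__()
        self.root = Node("Document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        self.current = Node(tag, self.current)      # descend into the new element

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent      # climb back to the parent

    def handle_data(self, data):
        if data.strip():                            # ignore whitespace between tags
            Node(f"#text {data.strip()!r}", self.current)


def dump(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed('<html> <body> <p>Hello World </p> <div> <img src="example.png"/></div> </body> </html>')
builder.close()
print("\n".join(dump(builder.root)))
# Document
#   html
#     body
#       p
#         #text 'Hello World'
#       div
#         img
```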

Parser Algorithm

HTML cannot be parsed using regular top-down or bottom-up parsers, so browsers use a custom parsing algorithm. The algorithm consists of two stages - tokenization and tree construction.

Tokenization is the lexical analysis, parsing the input into tokens. Among HTML tokens are start tags, end tags, attribute names and attribute values.

The tokenizer recognizes the token, gives it to the tree constructor, and consumes the next character for recognizing the next token, and so on until the end of the input.

Tokenization

Basic example - tokenizing the following HTML:

<html> <body> Hello world </body> </html>
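A heavily simplified tokenizer for this input might look like the Python sketch below. It is a state machine with only three states ("data", "tag open", "tag name"); the state names, the token tuples, and the tokenize function are assumptions for illustration, and attributes, comments, and error handling are left out.

```python
def tokenize(html):
    """Return a list of ('StartTag', name), ('EndTag', name), ('Character', ch) and ('EOF', '') tokens."""
    tokens = []
    state = "data"
    kind, name = None, ""
    for ch in html:
        if state == "data":
            if ch == "<":
                state = "tag open"                  # a tag is starting
            else:
                tokens.append(("Character", ch))    # plain text: one token per character
        elif state == "tag open":
            if ch == "/":
                kind, name = "EndTag", ""           # "</" begins an end tag
            else:
                kind, name = "StartTag", ch.lower()
            state = "tag name"
        elif state == "tag name":
            if ch == ">":
                tokens.append((kind, name))         # tag complete: emit the token
                state = "data"
            else:
                name += ch.lower()
    tokens.append(("EOF", ""))
    return tokens


tokens = tokenize("<html> <body> Hello world </body> </html>")
tags = [t for t in tokens if t[0] in ("StartTag", "EndTag")]
text = "".join(value for kind, value in tokens if kind == "Character")
print(tags)
print(repr(text.strip()))
```

Every character between the tags becomes its own Character token, and each token would be handed to the tree constructor as soon as it is emitted.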

Tree construction algorithm

The input to the tree construction stage is a sequence of tokens from the tokenization stage.

The first mode is the "initial mode". Receiving the html token will cause a move to the "before html" mode and a reprocessing of the token in that mode. This will cause a creation of the HTMLHtmlElement element and it will be appended to the root Document object.

The state will be changed to "before head". We receive the "body" token. An HTMLHeadElement will be created implicitly although we don't have a "head" token and it will be added to the tree.

We now move to the "in head" mode and then to "after head". The body token is reprocessed, an HTMLBodyElement is created and inserted and the mode is transferred to "in body".

The character tokens of the "Hello world" string are now received. The first one will cause creation and insertion of a "Text" node and the other characters will be appended to that node.

Receiving the body end token causes a transfer to "after body" mode. We then receive the html end tag, which moves us to "after after body" mode. Receiving the end-of-file token ends the parsing.
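The walk through the insertion modes can be imitated with a small Python sketch. It is an assumption-laden simplification: the Node class, the build_tree function, and the token tuples are invented for the example, the head element is assumed to have no content, and only the modes named above are modelled.

```python
class Node:
    """A DOM-like node: the Document root, an element, or a '#text' node."""
    def __init__(self, name, data=""):
        self.name, self.data, self.children = name, data, []


def build_tree(tokens):
    document = Node("Document")
    stack = [document]          # open elements; stack[-1] is the current node
    mode = "initial"
    trace = [mode]              # insertion modes visited, in order

    def insert(name):
        node = Node(name)
        stack[-1].children.append(node)
        stack.append(node)

    for kind, value in tokens:
        # Simplification: whitespace outside the body is dropped.
        if kind == "Character" and value.isspace() and mode != "in body":
            continue
        while True:
            next_mode, reprocess = mode, False
            if mode == "initial":
                next_mode, reprocess = "before html", True
            elif mode == "before html":
                insert("html")
                next_mode = "before head"
                reprocess = not (kind == "StartTag" and value == "html")
            elif mode == "before head":
                insert("head")  # created implicitly when no head token arrives
                next_mode = "in head"
                reprocess = not (kind == "StartTag" and value == "head")
            elif mode == "in head":
                stack.pop()     # close head (this sketch assumes it is empty)
                next_mode = "after head"
                reprocess = not (kind == "EndTag" and value == "head")
            elif mode == "after head":
                insert("body")
                next_mode = "in body"
                reprocess = not (kind == "StartTag" and value == "body")
            elif mode == "in body":
                if kind == "Character":
                    siblings = stack[-1].children
                    if siblings and siblings[-1].name == "#text":
                        siblings[-1].data += value      # append to the open Text node
                    else:
                        siblings.append(Node("#text", value))
                elif kind == "StartTag":
                    insert(value)
                elif kind == "EndTag" and value == "body":
                    next_mode = "after body"
                elif kind == "EndTag" and any(n.name == value for n in stack):
                    while stack.pop().name != value:
                        pass
            elif mode == "after body":
                if kind == "EndTag" and value == "html":
                    next_mode = "after after body"
            if next_mode != mode:
                mode = next_mode
                trace.append(mode)
            if not reprocess:
                break
    return document, trace


tokens = (
    [("StartTag", "html"), ("StartTag", "body")]
    + [("Character", ch) for ch in "Hello world"]
    + [("EndTag", "body"), ("EndTag", "html"), ("EOF", "")]
)
document, trace = build_tree(tokens)
print(" -> ".join(trace))
```

For the example tokens, the trace lists the same sequence of modes as the description above, and the resulting tree has an implicitly created head next to the body that holds the "Hello world" text node.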

LAYOUT

When the renderer is created and added to the tree, it does not have a position and size. Calculating these values is called layout or reflow.

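HTML uses a flow-based layout model, meaning that most of the time it is possible to compute the geometry in a single pass. Elements later in the flow typically do not affect the geometry of elements earlier in the flow, so layout can proceed left-to-right, top-to-bottom through the document. There are exceptions: HTML tables, for example, may require more than one pass.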

Layout is a recursive process. It begins at the root renderer, which corresponds to the <html> element of the HTML document. Layout computes geometric information for each renderer that requires it.

The position of the root renderer is 0,0 and its dimensions are the viewport - the visible part of the browser window.
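The recursive, single-pass nature of layout can be illustrated with a minimal Python sketch. It assumes the simplest possible model: every box is a block that takes the full width of its parent, children are stacked vertically, and leaf boxes have a known content height. The Box class and the layout and layout_document functions are hypothetical names, not a real engine's API.

```python
class Box:
    """A renderer with a content height (for leaves) and child renderers."""
    def __init__(self, name, content_height=0, children=()):
        self.name = name
        self.content_height = content_height    # assumed to be known for leaf content
        self.children = list(children)
        self.x = self.y = self.width = self.height = 0


def layout(box, x, y, width):
    """Position box at (x, y) with the given width and return its computed height."""
    box.x, box.y, box.width = x, y, width
    cursor = y
    for child in box.children:
        # Each child is placed below the previous one (top-to-bottom flow).
        cursor += layout(child, x, cursor, width)
    box.height = box.content_height + (cursor - y)
    return box.height


def layout_document(root, viewport_width, viewport_height):
    """The root renderer sits at 0,0 and takes the dimensions of the viewport."""
    layout(root, 0, 0, viewport_width)
    root.height = viewport_height


p = Box("p", content_height=20)
div = Box("div", content_height=50)
body = Box("body", children=[p, div])
root = Box("html", children=[body])
layout_document(root, 800, 600)
for box in (root, body, p, div):
    print(box.name, box.x, box.y, box.width, box.height)
```

Because each box only needs the position handed down by its parent and the heights returned by its children, one recursive pass is enough in this model.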

Rendering Engine Used by Browsers

Graphical Based

Boxely - for AOL applications

Gecko - for Firefox, Camino, K-Meleon, SeaMonkey, Netscape, and other Gecko-based browsers.

GtkHTML - for Novell Evolution and other GTK+ programs

HTMLayout - embeddable HTML/CSS rendering engine - component for Windows and Windows Mobile operating systems

KHTML - for Konqueror

NetFront - for Access NetFront

NetSurf - for NetSurf

Presto - for Opera 7 and above, Macromedia Dreamweaver MX and MX 2004 (Mac), and Adobe Creative Suite 2.

Prince XML - for Prince XML.

Robin - for The Bat!

Tasman - for Internet Explorer 5 for Mac, Microsoft Office 2004 for Mac, and Microsoft Office 2008 for Mac.

Trident - for Internet Explorer since version 4.0.

Tkhtml - for hv3

WebKit - for Google Chrome, iOS, Safari, Arora, Midori, OmniWeb, Shiira, iCab since version 4, Web, SRWare Iron, Rekonq, and in Maxthon 3.

Text based

Lynx

Links

w3m

DOWNLOAD MANAGER INTRODUCTION

A download manager is a computer program dedicated to the task of downloading files from the Internet for storage.

The typical download manager at a minimum provides a means to recover from errors without losing the work already completed, and can optionally split the file to be downloaded into two or more segments, which are then transferred in parallel, potentially making the process faster within the limits of the available bandwidth.

Multi-source is the name given to files that are downloaded in parallel from several sources.
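Splitting a file into segments amounts to computing byte ranges. The Python sketch below shows one way to do this, assuming the server supports HTTP range requests; the function names split_ranges and range_header are hypothetical, and no network access is performed.

```python
def split_ranges(total_size, segments):
    """Return inclusive (start, end) byte ranges that together cover total_size bytes."""
    if total_size <= 0 or segments <= 0:
        raise ValueError("total_size and segments must be positive")
    segments = min(segments, total_size)        # never create empty segments
    base, extra = divmod(total_size, segments)
    ranges, start = [], 0
    for i in range(segments):
        length = base + (1 if i < extra else 0) # spread the remainder over the first segments
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start, end):
    """Build the HTTP Range header for one segment (byte positions are inclusive)."""
    return {"Range": f"bytes={start}-{end}"}


for start, end in split_ranges(1000, 3):
    print(range_header(start, end))
# {'Range': 'bytes=0-333'}
# {'Range': 'bytes=334-666'}
# {'Range': 'bytes=667-999'}
```

Each range could then be requested on its own connection and the parts written to the correct offsets of the output file. Resuming an interrupted download works the same way, with the range starting at the number of bytes already saved.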

Features

Pausing the download of large files and reconnecting later to continue.

Downloading files reliably over poor connections, especially slow networks.

Downloading several files from a site automatically according to simple rules.

Mirror downloads, that is, downloading the same file from different sites.

Scheduled downloads (including automatic hang-up and shutdown).

Limiting the download speed while keeping connections stable.

Automatic subfolder generation.
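As an illustration of the speed-limiting feature, the Python sketch below computes how long a download loop should pause so that its average rate stays under a limit. The function name throttle_delay and its parameters are assumptions for this example; real download managers may use other schemes.

```python
def throttle_delay(bytes_done, elapsed_seconds, limit_bytes_per_sec):
    """Return the seconds to wait so the average rate does not exceed the limit."""
    if limit_bytes_per_sec <= 0:
        raise ValueError("limit must be positive")
    # Earliest time at which bytes_done bytes are allowed to have been transferred.
    earliest = bytes_done / limit_bytes_per_sec
    return max(0.0, earliest - elapsed_seconds)


print(throttle_delay(100_000, 1.0, 50_000))   # 1.0 (ahead of the limit, so wait)
print(throttle_delay(10_000, 1.0, 50_000))    # 0.0 (below the limit, no wait)
```

A download loop would call this after each chunk it receives and sleep for the returned time before reading the next chunk.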

Examples of download managers:

Download Accelerator Plus - Speeds up file downloads and resumes interrupted downloads. Features include file preview, file shredder and top downloads list.

FlashGet - Automatically splits files into sections, and downloads each split simultaneously. Download jobs can be placed in specifically-named categories for quick access.

Internet Download Accelerator - Integrates with Internet Explorer, Firefox, Mozilla, Opera, Netscape and others. You can download and save video from popular video sharing services: YouTube, Google Video, Metacafe and others.

Internet Download Manager - Accelerate downloads, resume broken or interrupted downloads, and schedule downloads. The program features dynamic file segmentation and download logic optimizer to achieve better download speed and higher Internet connection performance.

TubeTilla Pro - Download YouTube videos and convert them to various formats like wmv, mp4 and mp3.

Video Get - Downloads video from YouTube and others. Converts video to variety of video formats.

WebPix - Automatically downloads pictures from a web site, lets you view them quickly and browse thumbnails in an instant.

Download managers support different protocols, such as:

HTTP, HTTPS, FTP, SFTP, MMS, RTSP, Metalink, Magnet links, BitTorrent, eDonkey, etc.

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems.

Hypertext Transfer Protocol Secure (HTTPS) is a widely-used communications protocol for secure communication over a computer network, with especially wide deployment on the Internet.

File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one host to another host over a TCP-based network, such as the Internet.

Microsoft Media Server (MMS) is the name of Microsoft's proprietary network streaming protocol used to transfer unicast data in Windows Media Services (previously called NetShow Services). MMS can be transported via UDP or TCP. The MMS default port is UDP/TCP 1755.

The Real Time Streaming Protocol (RTSP) is a network control protocol designed for use in entertainment and communications systems to control streaming media servers. The protocol is used for establishing and controlling media sessions between end points.

Magnet links mainly refer to resources available for download via peer-to-peer networks.

Real Time Messaging Protocol (RTMP) was initially a proprietary protocol developed by Macromedia for streaming audio, video, and data over the Internet between a Flash player and a server.

BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data over the Internet.
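A download manager has to decide which protocol handler to use for a given link, and the URL scheme is the natural key. The Python sketch below shows one possible lookup using the standard urllib.parse module; the PROTOCOLS table, the protocol_for function, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical table mapping URL schemes to the protocols described above.
PROTOCOLS = {
    "http": "HTTP",
    "https": "HTTPS",
    "ftp": "FTP",
    "sftp": "SFTP",
    "mms": "MMS",
    "rtsp": "RTSP",
    "rtmp": "RTMP",
    "magnet": "Magnet link",
}


def protocol_for(url):
    """Return the protocol name for a URL, based on its scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in PROTOCOLS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return PROTOCOLS[scheme]


print(protocol_for("https://example.com/file.zip"))     # HTTPS
print(protocol_for("ftp://example.com/pub/file.iso"))   # FTP
print(protocol_for("magnet:?xt=urn:btih:abc"))          # Magnet link
```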

USES

For dial-up users, download managers can automatically dial the Internet Service Provider at night, when rates or tariffs are usually much lower, download the specified files, and hang up. They can also record which links the user clicks on during the day and queue these files for later download.

For broadband users, download managers can help download very large files by resuming broken downloads, by limiting the bandwidth used, so that other internet activities are not affected (slowed) and the server is not overloaded, or by automatically navigating a site and downloading pre-specified content (photo galleries, MP3 collections, etc.).

THANK YOU
