LIS650 lecture 0 Introductory lecture Thomas Krichel 2005-01-21 and 2005-01-28

LIS650 lecture 0

Introductory lecture

Thomas Krichel

2005-01-21 and 2005-01-28

today

• Administrative introduction to the course

• Talk about you

• Substantive introduction to the course. The subject matter is not just about HTML

– web

• servers

• client

– XML

– HTML

• Fairly general but abstract

• Probably the second-toughest lecture in the course

course resources

• Course home page is at

http://wotan.liu.edu/home/krichel/lis650p05s

• Subscribe to class mailing list https://lists.liu.edu/mailman/listinfo/cwp-lis650-krichel

• Me. Do not hesitate to ask. Send me email. I will usually answer to the class mailing list.

• I plan to come here on several days to counsel students. I will announce all times publicly. Students who need extra tuition should ask.

general assessment

• First quiz next lecture.

• If you miss a lecture, let me know in advance.

• In addition to the quizzes, we have

– the web site assessment

– the final web site

• Final grade is calculated by computer. Quizzes go through a complicated discounting scheme. It disregards the worst performance.

web site assessment

• Look at the web site of a university Library and Information Science department.

• A list is at http://informationr.net/wl/

• Write a text not describing, but commenting on the web site.

• State the site URL so that I can look at it.

• Please keep your text short, no more than 2 pages.

• Ask others for opinions if you want.

the final web site

• Contents should be equivalent to a student essay.

• Good contents and good architecture are important to a straight A.

• It should be a contribution to knowledge on a topic.

• Personal sites are no longer allowed.

• Deadline to finish web site: one week after the end of the last lecture.

• You will not be able to change your web site between the deadline and the time that the grade is issued.

course history

• Course was first run as an institute 2002-05-13 to 2002-05-17

• Title was “Webmastering I: the static web site”.

• To the curriculum committee, this title did not sound academic enough.

• Since “Web Site Architecture and Design” is now the full title, WeSAD (pronounced like “wizard”) is the official abbreviation.

• Webmastering is still what we want to learn.

teaching WeSAD

• WeSAD combines many aspects:

– Authoring pages

– Work on the organization of data to fit onto pages

– Set display style of different pages

– Organize the contribution of data

– Maintain a technical web installation

• Some of them can be learned in a course, but others can not.

• Emphasis has to be on learnable elements.

teaching philosophy

• Pointing and clicking in a piece of software is not enough

• Explain underlying principles

• Promote standards

– XHTML 1.0

– CSS level 2.1

• Avoid proprietary software

WeSAD contents

• Deals with the maintenance of a passive web site. Such a web site remains the same whatever the user does with it.

• Topics include

– (x)html

– css

– site usability and information architecture, as far as relevant for static web sites

– http, uri, web server

things this course does not do

• Forms: allow you to design forms that users fill in. But you do not have the programming skills to do something with the form.

• Frames: allow you to put several documents into one physical document. Most experts advise against them.

• We do not cover image maps.

• We don’t do some advanced CSS properties.

• Some exotic features of HTML are overlooked.

Other courses: webmastering II

• Deals with building active web sites.

– Users fill in a form

– Users submit the form

– Web server returns a page that is specific to the request of the user.

• Teaches a language called PHP, which is widely used to generate such web sites.

– Gets you introduced to computer programming

– Gets you to train analytical thinking.

other courses: webmastering III

• It deals with XML

– XML is a syntax to encode any kind of data.

– XML can be constrained to only allow certain types of data (XML Schema)

– XML can be transformed to render the data in various ways (XSLT)

• The aim is to achieve a separation of contents and presentation of a web page.

• This is an advanced course. It covers both Schema and Transformation

literature

• I work from the text of the official standard at http://www.w3.org/TR/html4/

• You can work from any HTML book.

• The W3C is the standard making body for the Web. Anything that they say is the standard.

• But some people don't behave according to the standard.

world wide web

The World Wide Web (Web) is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:

– A uniform naming scheme for locating resources on the Web (i.e. URIs).

– Protocols, for access to named resources over the Internet (e.g., http).

– Hypertext, for easy navigation among resources (e.g., HTML).

URI introduction

• Every resource available on the Web -- HTML document, image, video clip, program, etc. -- has an address that may be encoded by a Uniform Resource Identifier, or "URI".

• URIs typically consist of three pieces:

– The name of the mechanism used

• to access the resource

• or to otherwise “resolve” it

– The name of the machine hosting the resource.

– The name of the resource itself, given as a path.

example URI

• http://openlib.org/home/krichel

This URI may be read as follows: There is a document available via the HTTP protocol, residing on the site openlib.org, accessible via the path "/home/krichel".

• mailto:krichel@openlib.org

This URI may be read as follows: There is email user krichel in a domain openlib.org to whom email may be sent.
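
The reading of the two example URIs can be checked with a short sketch. It assumes Python and its standard urllib.parse module, neither of which is part of the course; the function name uri_pieces is made up for this illustration.

```python
from urllib.parse import urlsplit

def uri_pieces(uri):
    """Split a URI into the three pieces named on the slide (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "mechanism": parts.scheme,   # how the resource is accessed or resolved
        "machine": parts.hostname,   # None when the URI names no host
        "path": parts.path,          # the name of the resource itself
    }

web = uri_pieces("http://openlib.org/home/krichel")
mail = uri_pieces("mailto:krichel@openlib.org")

print(web)
print(mail)
```

Note that the mailto URI has a mechanism and a path but no machine part in this parse: the domain is simply part of the mailbox name.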

Internet application protocols

• On the Internet, machines use different application-level protocols to do things

• Common protocols include

– http

– dns

– telnet

– smtp

– ssh

– ftp

• All of the ones cited are client/server protocols

– client issues a request

– server gives a response

client and server

• The web operates on a client/server model

• The client software runs on the local PC that you are using. It is called

– a web browser (not politically correct)

– a user agent (that's better)

• Our server is a piece of hardware called wotan.liu.edu, “wotan” for short

– It runs the Debian GNU/Linux operating system on an Intel architecture.

– It provides http daemon software that serves http requests. The particular software is called Apache.

the http protocol

• http is the most widely used application-level protocol on the web.

• http is stateless. Each transaction is self-contained. Each transaction has no relationship to the previous one.

• http has a limited vocabulary of requests and responses. It is no good, say, for operating a machine remotely.

• http is insecure. The contents of http transactions (requests/responses) can be observed.

• We therefore cannot use it to build web pages.
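
A request/response exchange can be tried out locally. The following is only a sketch: it assumes Python's standard http.server and http.client modules in place of Apache and a real user agent, and the handler class HelloHandler and the helper fetch are invented for the illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Toy server side: answers every GET request with the same small page."""

    def do_GET(self):
        body = b"<div>Hello, world</div>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet

def fetch(port, path):
    """Client side: one self-contained transaction (request, then response)."""
    connection = http.client.HTTPConnection("127.0.0.1", port)
    connection.request("GET", path)
    response = connection.getresponse()
    result = (response.status, response.getheader("Content-Type"), response.read())
    connection.close()
    return result

# Port 0 asks the operating system for any free port on this machine.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first = fetch(port, "/test.html")
second = fetch(port, "/test.html")  # the server keeps no memory of the first request

server.shutdown()
server.server_close()
print(first)
```

Both transactions yield the same response because nothing in the second request refers to the first one.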

working with a remote machine

• There are two traditional ways to work with a remote machine

– issue commands to it

• used to be done with “telnet”

– transfer files to and from it

• used to be done with “ftp”

• Telnet and ftp servers are not available on wotan.liu.edu. Telnet and ftp do not encrypt the communication stream. Therefore they are not secure.

communication with wotan

• The protocol that we use for communicating with the server is the secure shell, ssh for short. It is based on public-key cryptography.

• There are two PC programs commonly used as ssh clients

– putty for issuing commands

– winscp for file transfer.

• winscp is the one we will use. It offers a range of other facilities besides file transfer.

• Mac users should investigate a program called “fugu”.

important rule

• When you compose web pages, you use winscp.

• When you look at your own web pages, you use a common web user agent.

• Never use winscp to look at your own web pages. You will not rot in hell, but you will be confused.

user name & password

• You can choose your user name as a short form of your own name.

• It should be all lowercase and cannot contain spaces.

• Your final project pages can be placed in a subdirectory, say at

http://wotan.liu.edu/~username/project

• We will worry about that later.

registration time

• As part of the course, you are being provided with web space on the server wotan.liu.edu, at the URL

http://wotan.liu.edu/~username

where username is a user name that you will choose now.

• You may wish to make the user name some short form of your name. Remember you will be able to have that site for many years to come.

free software

• I maintain the wotan.liu.edu server, but you can build your own server if

– you have Internet access

– you have an old PC to spare

• All the server software, as well as putty and winscp, is free and open-source. It is one of my fundamental beliefs that free information should run on free software.

installing winscp

• http://winscp.sourceforge.net/eng/download.php has

– “installation package”, for use if you have administrator rights on the machine where you are installing

– “application”, for use otherwise, i.e. to just download and run the application

• At installation time, when/if asked about the default interface, I suggest you use “Windows explorer style”, rather than the default “Norton commander style”. You can change that later, so no panic.

other stuff: installing “user agents”

• Download and install a recent version of at least two browsers. I suggest

– Mozilla Firefox at http://www.mozilla.org/products/firefox/

– Opera at http://www.opera.com

– Netscape Navigator at http://channels.netscape.com/ns/browsers/download.jsp

open a wotan session with winscp

• the host name is “wotan.liu.edu”

• give your user name

• click on “save”, this will save the session, after “ok”

• you will be led to the list of saved sessions

• double click to open the session

• at first connection you will see a warning you can ignore

• you can save the password as part of the session. It is risky to do that in a public classroom

initial remote files on wotan

• a set of files starting with a dot.

– These are places where Linux Masters exert their black magic.

– Leave them alone.

• a directory called public_html

– This is the place where web masters exert their magic. You can go into that directory to see the files that you have on your web site at the moment.

– There should be one file

• validated.html

– do NOT double-click that file!

validated.html

• This is your model web page. You should leave it alone and never change it.

• To create a new web page, right click (remember never double-click) on validated.html, and choose "duplicate" from the menu. Do not choose "copy".

• You will be asked to supply a name for the file. You may also be asked to give your password again. Erase any contents in that box, and then enter the file name you want to create (say test.html). Always have that file name end with ".html".

• Did I say you should not double-click in winscp?

test.html

• In your test.html file, look for the string

<p id="validator">

• Right before that string, insert

<div>Hello, world</div>

• Save your file.

• Do not double click test.html!

• Open a web user agent, point it to the URL http://wotan.liu.edu/~username/test.html

public_html

• Imagine you are user username and you have a file file in public_html.

• The web server will map requests to http://wotan.liu.edu/~username/file to show the file /home/username/public_html/file.

• Here username stands for your user id, file is the file name, and "/" is the directory separator.

• If file ends with ".html" or ".htm", the web browser will be told that the file is an HTML file. It will be rendered accordingly by the browser.
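
The mapping from URL to file can be written down as a small function. This is a sketch in Python of the rule as stated on this slide, not the actual Apache configuration on wotan; the function name map_request and its home_root parameter are assumptions made for the illustration, and it does no safety checking of the path.

```python
import mimetypes
import posixpath

def map_request(url_path, home_root="/home"):
    """Map "/~username/file" to "<home_root>/username/public_html/file".

    Hypothetical helper; returns None for paths that are not of the
    "/~username/..." form.
    """
    if not url_path.startswith("/~"):
        return None
    user, _, rest = url_path[2:].partition("/")
    if not user:
        return None
    return posixpath.join(home_root, user, "public_html", rest)

file_on_disk = map_request("/~username/test.html")

# The file name extension decides which media type the browser is told about.
content_type = mimetypes.guess_type("test.htm")[0]

print(file_on_disk, content_type)
```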

index.html

• The web server on wotan will map requests to http://wotan.liu.edu/~username to show the file /home/username/public_html/index.html

• If this file is not there, the server will prepare an HTML document from the list of files that it finds in the directory and send it to the user agent.

• Once you have a file index.html, the web user can no longer see the list of files in your directory.

HTML and XHTML

• HTML is the hypertext markup language

• HTML is a markup language that is widely used on the Web.

• The latest, and probably last version of HTML is at http://www.w3.org/TR/html4/

• The W3C, the standard making body for the Web, have issued XHTML, a replacement of HTML that is compatible with XML.

• We will work with XHTML. But we will call it HTML by abuse of language.

SGML HTML XML

• You will probably have come across these terms.

• SGML was developed first. HTML and XML were developed from SGML in different ways.

– HTML is an SGML application, defined by a DTD.

– XML is a simplified subset of SGML.

• One common thing here is the ML. It stands for Markup Language.

• Markup is everything in a document that is not content.

procedural/descriptive

• Markup can be given in two ways

• 1: Procedural

– Codes identify point size, style, font, etc.

– Usually only understood by defining tool

– Example: Microsoft Word

• 2: Descriptive

– Describes purpose of text within the document

– Chapter head, Paragraph, Section Head, TOC

– Structure and Style are kept separate

– Example: LaTeX, SGML

SGML

• Standard Generalized Markup Language

• Descriptive approach with three separate layers

– structure: types of information in document

– content: the information itself

– style: defines how to typeset the document

• Developed for the publishing industry by a group of consultants.

• So complicated that no software implements it fully.

• But an important idea that remains of it is the document type definition.

Document Type Definition (DTD)

• The DTD is a non-SGML language that describes SGML document types

• Describes information the document handles, e.g.

– title

– chapter

• Relationships between fields, e.g.

– a chapter contains sections

– a title comes at the top of the document

XML

• Since SGML is so complicated, it is not good for use on the Web.

• So the W3C has issued XML, the eXtensible Markup Language.

• Every XML document is an SGML document, but not the other way around.

• Thus XML is like SGML but with many features removed.

• XML defines the syntax that we will use in the course. We have to study that syntax in some detail.

XML elements

• XML is based on elements. There are basically three ways of writing an element.

• The first way is to write <element/>.

• Here element is the name of the element.

• Such an element is called an empty element.

• Example: <bang/>

• This is an empty element, the name of which is “bang”.

non-empty elements

• If name is the name of the element, you can give an element contents contents by writing <name>contents</name>.

• contents is simple character data.

• Here <name> is called a start tag. </name> is called the end tag. Both tags surround the contents of the element.

• Remember the previous slide? Then note that <name/> is just a shortcut for <name></name>.

Examples

• <greeting>bonjour</greeting>

• <greeting>здравствуйте</greeting>

• <sentence>She says <greeting>hello</greeting> to you.</sentence>

• <examples>
<example>I koh Glos essa, und es duard ma ned wei.</example>
<example>Ja mogu esti staklo, i ne boli me.</example>
<example>Kristala jan dezaket, ez det minik ematen.</example>
</examples>
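
To see how software reads such elements, here is a sketch that parses two of the examples. It assumes Python's standard xml.etree.ElementTree module, which is not part of the course material.

```python
import xml.etree.ElementTree as ET

# An element that contains character data and another element.
sentence = ET.fromstring(
    "<sentence>She says <greeting>hello</greeting> to you.</sentence>"
)
greeting = sentence.find("greeting")

before = sentence.text    # character data before the inner element
inside = greeting.text    # contents of the inner element
after = greeting.tail     # character data after the inner element's end tag
whole = "".join(sentence.itertext())

# <bang/> and <bang></bang> are two spellings of the same empty element.
short_form = ET.tostring(ET.fromstring("<bang/>"))
long_form = ET.tostring(ET.fromstring("<bang></bang>"))

print(repr(before), repr(inside), repr(after))
```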

attributes to elements

• Elements can have attributes. Here is an element with two attributes

• <name attribute_name_one="value_one" attribute_name_two="value_two"/>

• Here attribute_name_one and attribute_name_two are attribute names and value_one and value_two are attribute values. The element itself is empty.

• Example: <greeting language="french">bonjour</greeting>

more on attributes

• An element cannot have two attributes with the same name.

• Attribute values are simple strings. You cannot have an element inside an attribute value.

• Attribute names are separated from their values by the = sign.

• Attribute values can be enclosed in single or double quotes. It does not matter. Double quotes are more common, so I suggest you use those.

more examples

<poet born="1799" died="1837">

<name lang="ru">Александр Сергеевич Пушкин</name>

<name lang="en">Alexander S. Pushkin</name>

<name lang="fr">Alexandre Pouchkine</name>

</poet>
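
Attributes become easy to use once the document is parsed. The sketch below assumes Python's standard xml.etree.ElementTree module; the variable names are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

poet_xml = """<poet born="1799" died="1837">
<name lang="ru">Александр Сергеевич Пушкин</name>
<name lang="en">Alexander S. Pushkin</name>
<name lang="fr">Alexandre Pouchkine</name>
</poet>"""

poet = ET.fromstring(poet_xml)

# Attribute values are always strings, so numbers have to be converted.
years_between = int(poet.get("died")) - int(poet.get("born"))

# Use the lang attribute of each name element as a lookup key.
names_by_language = {name.get("lang"): name.text for name in poet.findall("name")}

print(years_between, names_by_language["en"])
```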

XML document

• An XML document is a piece of data that is written in XML.

• But sometimes the author of a document makes a mistake, and, in fact the XML is wrong in some ways.

• If there is no mistake, the document is called well-formed.

• If a document is not well-formed, it really is not an XML document.

some rules for well-formedness

• All elements must be properly nested. You can only close the outer element after all inner elements are closed. Examples

– <a><b></a></b> not well-formed

– <a><b></b></a> well-formed

• An attribute must have a value. Thus you cannot write <result abstract>... </result>. The value may be empty, as in <result abstract=''>...</result> or <result abstract="">... </result>.

more rules for well-formedness

• There must be one single element in the document that all other elements are children of.

– It is called the root element.

– All other elements are called children of the root.

– Whitespace that surrounds the root element is ignored.

– The root element may be preceded by a prologue. A prologue is anything before the root element.

• There can be other things in an XML document that are not elements.
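
The well-formedness rules can be tested mechanically: an XML parser refuses anything that breaks them. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

checks = {
    "<a><b></a></b>": is_well_formed("<a><b></a></b>"),                # bad nesting
    "<a><b></b></a>": is_well_formed("<a><b></b></a>"),                # proper nesting
    "<result abstract>": is_well_formed("<result abstract>...</result>"),      # no value
    "<result abstract=''>": is_well_formed("<result abstract=''>...</result>"),  # empty value
    "two roots": is_well_formed("<a/><b/>"),                           # no single root
    "repeated attribute": is_well_formed('<a x="1" x="2"/>'),          # same name twice
}

for label, result in checks.items():
    print(label, "->", "well-formed" if result else "not well-formed")
```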

other things: comments

• In an XML document, you can make comments about your code. These are notes to yourself.

• Comments start with <!--

• Comments end with -->

• Example: <!-- this is a comment -->

• Comments can not be nested.

• Can appear anywhere in the document.

• They can enclose elements.

other things: XML declaration

• The XML declaration is a special line that says that what follows is XML and gives some very basic information about that XML. It is trendy to use it.

• It is optional, but if it is there it has to be on the first line.

• You will need to have an XML declaration if your character encoding is not UTF-8. We will come back to this point later.

other things: XML declaration

• Normally the XML declaration looks like

• <?xml version="1.0" encoding="encoding"?>

• where encoding is the character encoding. By default, the character encoding is UTF-8, so if you use that, you do not need to mention it.

• There is now a version "1.1" of XML around, but

– it is not widely deployed

– it is not much different from version 1.0
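
Why the declaration matters for encodings other than UTF-8 can be shown with a small experiment. This is a sketch that assumes Python's standard xml.etree.ElementTree module and uses ISO-8859-1 as the example of a non-default encoding.

```python
import xml.etree.ElementTree as ET

text = "<greeting>fran\u00e7ais</greeting>"   # contains c with cedilla

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")
declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

# UTF-8 is the default, so no declaration is needed.
from_utf8 = ET.fromstring(utf8_bytes).text

# ISO-8859-1 bytes are understood once the declaration names the encoding.
from_latin1 = ET.fromstring(declaration + latin1_bytes).text

# Without the declaration the same bytes are read as UTF-8 and rejected.
try:
    ET.fromstring(latin1_bytes)
    undeclared_latin1_parses = True
except ET.ParseError:
    undeclared_latin1_parses = False

print(from_utf8, from_latin1, undeclared_latin1_parses)
```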

other things: document type declaration

• XML documents, like any SGML documents, accept document type declarations.

• A document type declaration tells us something about the vocabulary of elements and attributes used in the document.

• It should appear before the root element, after the XML declaration, if you have one.

• It takes the form <!DOCTYPE mumbojumbo >

• We will come back to the document type declaration later.

nodes

• elements, attributes, character data etc all are things that are used in the XML document.

• "node" is a word to characterize everything that can be put in the XML document.

• Thus an element is a node of type element. A comment is a node of type comment. "Hello, world!" is a node of type character data.

• Exercise: open the source code for your test file. Show your neighbor all the nodes and tell her/him what type they are.
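
For checking your answers to the exercise, a program can list the nodes too. The sketch below assumes Python's standard xml.dom.minidom module and a tiny made-up document rather than your real test file; the names NODE_TYPE_NAMES and list_nodes are invented here. Attributes are not listed because the DOM does not treat them as children of their element.

```python
from xml.dom import Node, minidom

NODE_TYPE_NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "character data",
    Node.COMMENT_NODE: "comment",
}

def list_nodes(node):
    """Return (type, label) pairs for all nodes below the given node."""
    found = []
    for child in node.childNodes:
        kind = NODE_TYPE_NAMES.get(child.nodeType, "other")
        if kind == "element":
            found.append((kind, child.tagName))
            found.extend(list_nodes(child))   # look inside the element
        else:
            found.append((kind, getattr(child, "data", child.nodeName)))
    return found

document = minidom.parseString("<div><!-- a note -->Hello, world!</div>")
nodes = list_nodes(document)

for kind, label in nodes:
    print(kind, repr(label))
```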

HTML

• HyperText Markup Language

• HTML is an SGML DTD

– Head, Title, Body, Paragraph, etc.

– Headings, Bold, Italic, etc.

– Table, List, Image, etc.

– Links to other documents

– Forms

– and many others

HTML history

• HTML was a very bare-bones language when first invented by Tim Berners-Lee. It did not describe pages with much of a visual appeal.

• In the 90s, successful browsers invented “extensions” that aimed to stretch the visual boundaries of HTML.

• Some of these extensions found their way into the official HTML spec issued by the W3C.

• Later the W3C developed style sheets as a way to accommodate display requirements without having to extend HTML.

HTML versions

• HTML 4.01 is the last version of HTML. This version has two different DTDs:

– the loose DTD

– the strict DTD

• I only cover the elements of the strict DTD.

• The loose DTD has more elements, but all the functionality of these elements is best done with style sheets.

• Thus, the pages created with HTML only will look rather boring.

• But we do cover style sheets later.

XHTML

• XHTML is HTML written in an XML syntax.

• Every XHTML document has to be well-formed XML.

• non-XHTML HTML documents can violate some well-formedness constraints, including

– HTML element names are not case sensitive

– some HTML elements do not need closing.

– there is no need for a single root element in an HTML document.

XHTML: pain without gain?

• In this course we study XHTML.

• When I say HTML in the following, I mean XHTML.

• Reasons to study XHTML rather than HTML

– syntactic rules of XML are easier to understand.

– any tool that can work with XML can be applied to XHTML, but can not be applied to HTML.

– in general XML documents are more computer understandable. This is crucial in the age of the search engine.

Example HTML snippet

<a href="http://openlib.org/home/krichel" title="homepage of Thomas Krichel">Thomas Krichel</a>

– the whole thing is an <a> element. It creates an anchor. (I use < and > to surround element names.)

– “href” is an attribute name

– “http://openlib.org/home/krichel” is the value of the "href" attribute

(I surround attribute names with straight quotes)

– 'Thomas Krichel' is character data.
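
Because XHTML is XML, a generic XML tool can pick the snippet apart. The sketch below assumes Python's standard xml.etree.ElementTree module; the surrounding div and p elements are added only to make a small page for the illustration.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring(
    '<div><p>Written by '
    '<a href="http://openlib.org/home/krichel" '
    'title="homepage of Thomas Krichel">Thomas Krichel</a>.</p></div>'
)

# Collect, for every <a> element, its "href" value and its character data.
links = [(anchor.get("href"), anchor.text) for anchor in page.iter("a")]

first_anchor = page.find(".//a")
attribute_names = sorted(first_anchor.attrib)

print(links)
```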

Characters: concept

• A character set combines two things

– Character repertoire: a set of characters e.g. "A", "ض", "‼", "₣"

– Character code positions: defines a number for each character in the repertoire.

• Character encoding is a way to encode the code positions in bytes.

• To correctly display a document, the user agent needs to know both!
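
The difference between a code position and an encoding shows up in a few lines of code. This is a sketch in Python, which uses the Unicode character set; the helper name describe is invented, and ISO-8859-1 stands in for "the wrong encoding".

```python
def describe(character):
    """Show one character's code position and two ways to encode it in bytes."""
    return {
        "code_position": ord(character),
        "utf-8": character.encode("utf-8"),
        "utf-16-be": character.encode("utf-16-be"),
    }

letter_a = describe("A")
franc = describe("\u20a3")   # the French franc sign from the slide

# Same bytes, wrong encoding assumed: the user agent shows rubbish.
misread = franc["utf-8"].decode("iso-8859-1")

print(letter_a)
print(franc)
print(repr(misread))
```

The code position stays the same whichever encoding is used; only the bytes differ, which is why a user agent needs to be told the encoding.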

playing safe with characters

• Only use the characters on the US keyboard, don't insert symbols.

• Save as ASCII or UTF-8. All ASCII files are also UTF-8 files.

• Never save as "Unicode" within MS Notepad.

• If you encounter a character that is not on your keyboard, use an SGML entity.

• The SGML entity is the last special SGML thing that we have to study.

SGML entities

• SGML entities are a way to represent non-ASCII characters when only ASCII input is possible.

• Codes can be of the form &code;

– Ex. &eacute;

• Inserts an e with acute accent.

– this is called a character entity

– Codes are often abbreviations of the character names

• Codes can be in numeric form

– Ex. &#38; (decimal) or &#x26; (hexadecimal) to insert an ampersand

– this is called a numeric entity

XHTML entities

• They are officially defined in three files that are maintained by the W3C

– http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent

– http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent

– http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

• A sample line is

<!ENTITY ccedil "&#231;"> <!-- latin small letter c with cedilla, U+00E7 ISOlat1 -->

• <!ENTITY is DTD speak for defining an entity

• it is followed by the character form and the numeric form of the entity

• the rest of the line is a comment, of course

entities used in XML

• There are three that you need to know and use.

– &lt; stands for <

– &gt; stands for >

– &amp; stands for &

• Every time you want to insert <, > or & in the documents, you have to use the entities instead.

• Examples:– krichel&#64;openlib.org

– je suis Fran&ccedil;ais

– Marks &amp; Spencers
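
What a user agent makes of these entities can be reproduced with a short sketch. It assumes Python's standard html module, which knows the XHTML entity names; it is not the W3C entity files themselves.

```python
import html
from html.entities import name2codepoint

examples = [
    "krichel&#64;openlib.org",
    "je suis Fran&ccedil;ais",
    "Marks &amp; Spencers",
]

# Replace every entity by the character it stands for.
decoded = [html.unescape(example) for example in examples]

# The other direction: protect <, > and & before putting text into a document.
escaped = html.escape("if a < b & b > c", quote=False)

# The numeric form behind a character entity name.
ccedil_number = name2codepoint["ccedil"]

# Decimal and hexadecimal numeric entities for the same character.
ampersands = (html.unescape("&#38;"), html.unescape("&#x26;"))

print(decoded)
print(escaped)
```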

other examples

• The course has an examples page at http://wotan.liu.edu/home/krichel/lis650/examples.

• Thomas will now show you further examples.

special topic: images

• The appeal of the web to the masses has a lot to do with its capability to transport images.

• Image formats are independent of the web, but there are three formats that are widely supported by user agents.

– GIF

– JPEG

– PNG

• The resolution of the image is an important factor.

resolution

• On a pixel image the term resolution is often used to say how many pixels there are horizontally and vertically.

• The larger the number of pixels the wider it will appear on the screen.

• But you will never know how large it is on the screen because that depends on how many pixels your user's screen draws per inch of display.

• The web is a bad place for control freaks.
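
The arithmetic behind this is a single division. The sketch below is in Python; the function name and the screen densities of 72, 96 and 120 pixels per inch are example values chosen for the illustration, not figures from the lecture.

```python
def displayed_width_inches(pixel_width, screen_pixels_per_inch):
    """Physical width of an image on a screen that draws the given pixels per inch."""
    return pixel_width / screen_pixels_per_inch

image_width = 600  # pixels, fixed by the image file

widths = {
    ppi: displayed_width_inches(image_width, ppi)
    for ppi in (72, 96, 120)   # three hypothetical screens
}

for ppi, inches in widths.items():
    print(f"{ppi} pixels per inch: {inches:.2f} inches wide")
```

The same 600-pixel image is more than eight inches wide on the first screen and five inches wide on the last, and the author of the page cannot know which screen the user has.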

GIF

• stands for graphics interchange format.

• developed by CompuServe.

• unresolved copyright issues make the format abhorred by the free software community.

• 256 colors maximum

• uses a lossless compression technique

GIF has three tricks

• interlacing:

– when downloading the file, the browser can show every fourth row first

– user gets an idea of the picture before it is sharp

• transparency

– some GIFs are transparent, so you can see them on top of what already exists

– technically, the GIF has one color as the background color, and pixels of that color are ignored by the user agent

• animation

– some GIFs are in fact sequences of GIFs that can be rendered one after the other.

JPEG

• The Joint Photographic Experts Group is a standard-making body for images

• JPEG images can support millions of colors.

• The compression is lossy, i.e. the JPEG file will look like the original image, but not be the same.

• The compression does not work well with drawings.

• There are no copyright and patent problems with JPEG

Homework

• Look at course home page.

• Install winscp and browsers at home.

• Prepare a summary, one page at most, of the type of website that you want to build. Bring a printed copy with you next week.

• Prepare for quiz at the beginning of next lecture.

http://openlib.org/home/krichel

Please shut down the computers when you are done.

Thank you for your attention!