university of erlangen-nuremberg computational linguistics instructor: professor airi salminen

XML for Information Management – Day 2. Airi Salminen, University of Erlangen-Nuremberg, Computational Linguistics. Instructor: Professor Airi Salminen (http://users.jyu.fi/~airi/), 12.1.-16.1. 2009



Page 1

XML for Information Management – Day 2
Airi Salminen

University of Erlangen-Nuremberg, Computational Linguistics
Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/
12.1.-16.1. 2009

Page 2

Day 2: Background of XML

Outline

1. Markup languages
2. Structured documents
3. World Wide Web Consortium

Page 3

1. Markup languages

Markup

• intended for human readers
• intended for computers

Page 4

Markup for human readers

to clarify the written expression

• punctuational
• presentational

Text has always included some kind of markup, also before the time of computers. Without punctuational markup (spaces and punctuation), the same sentence reads:

Texthasalwaysincludedsomekindofmarkupalsobeforethetimeofcomputers

Page 5

Markup for computers

to provide information for a software module

• presentational
• procedural
• descriptive (also called declarative)

In markup languages there is a clear separation of markup and primary content. Markup is metadata, adding some information to the primary data.

Page 6

Presentational markup

information about the way the software module should present the primary content to the human perceiver

The markup in an HTML file:

In <i>markup languages</i> there is clear separation of <i>markup</i> and <i>primary content</i>. Markup is <i>metadata</i>, adding some information to the primary data.

The tags <i> and </i> represent presentational markup in HTML.
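Because markup and primary content are clearly separated, a program can pull them apart. The following is a minimal sketch using Python's standard html.parser module: it recovers the primary content of the HTML example (the character data with all tags dropped) and, separately, the spans that the presentational <i> markup applies to. The class name MarkupSeparator is hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class MarkupSeparator(HTMLParser):
    """Separates an HTML fragment into primary content and the spans marked with <i>."""

    def __init__(self):
        super().__init__()
        self.primary_parts = []   # character data only; the tags themselves are dropped
        self.italic_spans = []    # the text that each <i>...</i> pair applies to
        self._in_italic = False

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self._in_italic = True
            self.italic_spans.append("")

    def handle_endtag(self, tag):
        if tag == "i":
            self._in_italic = False

    def handle_data(self, data):
        self.primary_parts.append(data)
        if self._in_italic:
            self.italic_spans[-1] += data

    def primary_content(self):
        return "".join(self.primary_parts)


fragment = (
    "In <i>markup languages</i> there is clear separation of <i>markup</i> and "
    "<i>primary content</i>. Markup is <i>metadata</i>, adding some information "
    "to the primary data."
)
parser = MarkupSeparator()
parser.feed(fragment)
parser.close()  # flushes any buffered trailing text
print(parser.primary_content())
print(parser.italic_spans)
```

The primary content survives unchanged when the markup is removed, which is what makes markup metadata rather than part of the data.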

Page 7

Procedural markup

a processing instruction for the software module

The markup in an XML file:

<![CDATA[<element>Example of an XML element</element>]]>

The strings <![CDATA[ and ]]> represent procedural markup in XML.

<![CDATA[ instructs the XML processor to regard all text up to the following ]]> as character data.

]]> instructs the XML processor to continue normal identification of markup.
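The effect of the CDATA instructions can be observed with any XML parser. This sketch uses Python's standard xml.etree.ElementTree; the <doc> root element is a hypothetical wrapper added only so that the example forms a complete XML document.

```python
import xml.etree.ElementTree as ET

# The CDATA example, wrapped in a hypothetical <doc> root so it forms a complete document.
with_cdata = ET.fromstring(
    "<doc><![CDATA[<element>Example of an XML element</element>]]></doc>"
)
# The same text without the CDATA section, for comparison.
without_cdata = ET.fromstring(
    "<doc><element>Example of an XML element</element></doc>"
)

# Inside the CDATA section the tags are not recognized as markup:
# <doc> has no child elements, and the tags survive as character data.
print(len(with_cdata))   # 0
print(with_cdata.text)   # <element>Example of an XML element</element>

# Without CDATA the processor recognizes <element> as a child element.
print(len(without_cdata), without_cdata[0].tag)  # 1 element
```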

Page 8

Declarative markup

describes what a piece of primary content is, or declares that the piece is a member of a particular class

The markup in an XML file:

<student>
  <first_name>Steve</first_name>
  <last_name>Chung</last_name>
  <email>[email protected]</email>
</student>

XML is primarily intended for declarative markup.
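Declarative markup lets a program ask for content by what it is. A minimal sketch with Python's standard xml.etree.ElementTree reads the <student> example into a dictionary keyed by element name. The function name read_student is hypothetical, and the e-mail address is an example.com placeholder, since the address in the example above is not available.

```python
import xml.etree.ElementTree as ET

# The <student> example; the e-mail address is a hypothetical placeholder.
student_xml = """
<student>
  <first_name>Steve</first_name>
  <last_name>Chung</last_name>
  <email>steve.chung@example.com</email>
</student>
"""


def read_student(xml_text):
    """Return the student's fields as a dict keyed by element name."""
    student = ET.fromstring(xml_text.strip())
    # The element names say what each piece of content is, so fields are looked up
    # by name instead of by position or by how they are displayed.
    return {child.tag: (child.text or "") for child in student}


record = read_student(student_xml)
print(record["first_name"], record["last_name"])  # Steve Chung
```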

Page 9

Markup in XML

‣ All markup delivers information to the XML processor. A DTD represents metamarkup: it makes it possible to define the markup vocabulary.

‣ Markup in an XML document is usually classified with respect to the application.

‣ Processing instructions represent procedural markup.

‣ Element tags represent declarative markup.

‣ In the specification of an XML application, element names can be given different kinds of meanings: they can act as processing instructions to the application, or as instructions about how the application should present the content.
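The distinction between processing instructions and element tags is visible in the parsed document tree. This sketch walks a tree built by Python's standard xml.dom.minidom and labels each node as a processing instruction or an element. The sample document and the function name classify_markup are hypothetical; the xml-stylesheet target is the one standardized by W3C for linking a stylesheet to an XML document.

```python
from xml.dom import Node, minidom

# A small hypothetical document: one processing instruction followed by elements.
document = (
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<student><first_name>Steve</first_name><last_name>Chung</last_name></student>"
)


def classify_markup(xml_text):
    """List (kind, name) pairs for processing instructions and elements in document order."""
    found = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                # procedural markup: an instruction addressed to an application via its target
                found.append(("processing instruction", child.target))
            elif child.nodeType == Node.ELEMENT_NODE:
                # declarative markup: the element name says what the content is
                found.append(("element", child.tagName))
                walk(child)

    walk(minidom.parseString(xml_text))
    return found


for kind, name in classify_markup(document):
    print(f"{kind}: {name}")
```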

Page 10

Example of HTML markup

<html>
  <head><title>University of Jyv&auml;skyl&auml;</title></head>
  <body>
    <h2>Faculties</h2>
    <ul>
      <li>Humanities
      <li>Information Technology
      <li>Social Sciences
    </ul>
    <br>
    <address>[email protected]</address>
  </body>
</html>

The element markup describes the structure for WWW publishing.
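Since the HTML markup only describes structure for publishing, a program that wants the list of faculties has to rely on that publishing structure: the items of a ul list, with HTML's rule that a new <li> implicitly ends the previous one. A sketch with Python's standard html.parser, applied to the body of the example with the <address> element left out; the class name ListItemCollector is hypothetical.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collects the text of <li> items, allowing the end tag </li> to be omitted."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # text chunks of the <li> being read, or None outside an item

    def _finish_item(self):
        if self._current is not None:
            self.items.append("".join(self._current).strip())
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # In HTML a new <li> implicitly closes the previous one.
            self._finish_item()
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("li", "ul"):
            self._finish_item()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


# Part of the HTML example's body.
html = (
    "<h2>Faculties</h2><ul><li>Humanities<li>Information Technology "
    "<li>Social Sciences</ul><br>"
)
collector = ListItemCollector()
collector.feed(html)
collector.close()
print(collector.items)
```

Nothing in this markup says that the list items are faculties; the program only knows they are list items.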

Page 11

<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
  <contact_email>[email protected]</contact_email>
</university>

The same primary content, with XML markup describing the content of the elements.
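With the XML version, the same question can be asked directly by name. A short sketch with Python's standard xml.etree.ElementTree; the contact_email element is left out of this copy of the document because its address is not available.

```python
import xml.etree.ElementTree as ET

# A copy of the XML example without the contact_email element.
university_xml = (
    "<university><name>University of Jyväskylä</name>"
    "<faculties>Faculties<faculty>Humanities</faculty>"
    "<faculty>Information Technology</faculty>"
    "<faculty>Social Sciences</faculty></faculties></university>"
)

root = ET.fromstring(university_xml)
# The markup names the content, so the program asks for "name" and "faculty" directly.
name = root.findtext("name")
faculties = [faculty.text for faculty in root.iterfind("faculties/faculty")]
print(name)       # University of Jyväskylä
print(faculties)  # ['Humanities', 'Information Technology', 'Social Sciences']
```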

Page 12

Logical structure of the HTML document

html
  head
    title
      University of Jyväskylä
  body
    h2
      Faculties
    ul
      li
        Humanities
      li
        Information Technology
      li
        Social Sciences
    br
    address
      [email protected]

Logical structure of the XML document

university
  name
    University of Jyväskylä
  faculties
    Faculties
    faculty
      Humanities
    faculty
      Information Technology
    faculty
      Social Sciences
  contact_email
    [email protected]
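A logical structure like the XML one can be printed directly from a parsed document. This sketch uses Python's standard xml.etree.ElementTree and lists element names and non-blank text nodes with indentation; the function name outline is hypothetical, and the contact_email element is omitted from the sample because its address is not available.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the logical structure of element as indented lines of names and text."""
    pad = "  " * depth
    lines = [pad + element.tag]
    if element.text and element.text.strip():
        lines.append(pad + "  " + element.text.strip())
    for child in element:
        lines.extend(outline(child, depth + 1))
        # Text after a child element (its "tail") belongs to the parent element.
        if child.tail and child.tail.strip():
            lines.append(pad + "  " + child.tail.strip())
    return lines


university_xml = (
    "<university><name>University of Jyväskylä</name>"
    "<faculties>Faculties<faculty>Humanities</faculty>"
    "<faculty>Information Technology</faculty>"
    "<faculty>Social Sciences</faculty></faculties></university>"
)
print("\n".join(outline(ET.fromstring(university_xml))))
```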

Page 13

2. Structured documents

Structured document

‣ structure, content, and external presentation can be separated from each other and processed separately

‣ structural components have names

‣ structural components can be recognized by software modules

‣ it is possible to define the structure

Page 14

Structured document: structure, content, and layout

Content: an open language standard, e.g., SGML, XML

Structure: different languages for defining the structure, e.g., DTD, XML Schema, RELAX NG for XML

Layout: different languages for defining the layout, e.g., CSS and XSL for XML

Page 15

Structured document: structure, content, and layout

Example

Structure: DTD.txt

Content: rhymes.txt, rhymes.xml

Layout: style.txt, style.css

Content with style attachment: rhymes with style attachment.xml, rhymes with style attachment.txt
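One common way to attach a CSS stylesheet to XML content while keeping the layout in a separate file is the xml-stylesheet processing instruction defined in the W3C Recommendation "Associating Style Sheets with XML documents". The example files are not reproduced here, so this is only a sketch under the assumption that the style attachment works this way. The function attach_stylesheet and the sample content (including its element names) are hypothetical.

```python
from xml.dom import minidom


def attach_stylesheet(xml_text, href):
    """Return xml_text with an xml-stylesheet processing instruction linking it to href.

    Only a processing instruction is added; the content and its structure stay
    unchanged, and the layout itself stays in the separate CSS file.
    """
    pi = f'<?xml-stylesheet type="text/css" href="{href}"?>'
    if xml_text.startswith("<?xml "):
        # The XML declaration must remain first, so the PI goes right after it.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text


# Hypothetical content document; the element names are illustrative only.
content = '<?xml version="1.0"?>\n<rhymes><rhyme><line>A line of a rhyme</line></rhyme></rhymes>'
styled = attach_stylesheet(content, "style.css")
print(styled)

# Parsing confirms the result is still well-formed and carries the stylesheet link.
doc = minidom.parseString(styled)
pis = [n for n in doc.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
print(pis[0].target, pis[0].data)
```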

Page 16

Management of structured documents

‣ document management

‣ management of the data contained in documents

Page 17

Characteristics in the management of structured documents

‣ Design. Adopting the approach of structured document management in an environment often requires careful planning before documents are created. This includes schema design and layout design.

‣ Content production. Content can be produced by different types of software, e.g., by a syntax-directed editor. Content is checked for validity against the schema.

‣ Evolution. Schema versioning, layout versioning.

‣ Operations. The most typical operation is some kind of transformation.

‣ Software. Many kinds of software systems are used.
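Checking validity means testing whether each element's content conforms to the structure the schema declares. Real validation would use a DTD, XML Schema, or RELAX NG processor; the sketch below only imitates the idea with Python's re and xml.etree.ElementTree modules, treating a hypothetical content model for the <student> example, in the spirit of a DTD declaration such as <!ELEMENT student (first_name, last_name, email)>, as a regular expression over the sequence of child element names. The names CONTENT_MODELS and validate are illustrative, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models: regular expressions over space-separated child names.
# An empty pattern means "no child elements" (text only).
CONTENT_MODELS = {
    "student": r"first_name last_name email",
    "first_name": r"",
    "last_name": r"",
    "email": r"",
}


def validate(element):
    """Return a list of validity errors for element and its descendants (empty if valid)."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        errors.append(f"undeclared element <{element.tag}>")
    else:
        children = " ".join(child.tag for child in element)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{element.tag}> has children '{children}', expected '{model}'")
    for child in element:
        errors.extend(validate(child))
    return errors


valid = ET.fromstring(
    "<student><first_name>Steve</first_name><last_name>Chung</last_name>"
    "<email>steve.chung@example.com</email></student>"
)
invalid = ET.fromstring(
    "<student><last_name>Chung</last_name><first_name>Steve</first_name></student>"
)
print(validate(valid))    # []
print(validate(invalid))
```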

Page 18

Traditional document management:

- No schema design.
- Processing is applied to a document as a whole.
- Content, structure, and layout are together.

Structured document management:

- Schema design is important. Layout is also designed.
- Schemas can be utilized in various ways. Semantic information is attached in the schemas.
- Processing of document parts.
- Content, structure, and layout can be processed separately.
- Management is required for content, schema, and stylesheet items and their different versions.

Page 19

Database management:

- The database is often the information repository of one software system, called a Database Management System (DBMS); data is processed by the operations of the DBMS.
- Design is divided into schema design and view design.
- Content is produced gradually, by the operations of the DBMS.
- Queries are the most important operations.

Structured document management:

- Different software systems are used to manipulate data.
- Schema design is often related to extensive sectoral standard development. Layout requires design as well.
- Content is produced by different kinds of programs, e.g., interactively by structure editors or automatically.
- Transformations are the most important operations.
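A typical transformation turns descriptively marked-up content into a presentation format. XSLT is a common tool for this, but Python's standard library has no XSLT processor, so this sketch performs a comparable transformation directly with xml.etree.ElementTree: it produces an HTML page similar to the HTML example from the XML university document. The function name to_html is hypothetical, and the contact_email element is left out because its address is not available.

```python
import xml.etree.ElementTree as ET

university_xml = (
    "<university><name>University of Jyväskylä</name>"
    "<faculties>Faculties<faculty>Humanities</faculty>"
    "<faculty>Information Technology</faculty>"
    "<faculty>Social Sciences</faculty></faculties></university>"
)


def to_html(xml_text):
    """Transform the university document into a small HTML page for Web publishing."""
    university = ET.fromstring(xml_text)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = university.findtext("name")
    body = ET.SubElement(html, "body")
    faculties = university.find("faculties")
    # The heading text comes from the content of <faculties> itself.
    ET.SubElement(body, "h2").text = (faculties.text or "").strip()
    ul = ET.SubElement(body, "ul")
    for faculty in faculties.iterfind("faculty"):
        ET.SubElement(ul, "li").text = faculty.text
    return ET.tostring(html, encoding="unicode")


print(to_html(university_xml))
```

The same source document could be transformed into other formats as well; the content is written once and the presentation is produced from it.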

Page 20

Database languages

‣ definition languages
‣ query languages

Structured document languages

‣ definition languages
‣ style languages
‣ various manipulation, transformation, and query languages
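Query languages for XML, such as W3C's XPath, select parts of documents by their structure and content. Python's xml.etree.ElementTree supports a limited subset of XPath, which is enough for a small sketch. The student list is hypothetical: it extends the <student> example with an invented second record, and both e-mail addresses are example.com placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical student list; the second record and the addresses are invented for illustration.
students_xml = """<students>
  <student><first_name>Steve</first_name><last_name>Chung</last_name><email>steve.chung@example.com</email></student>
  <student><first_name>Maria</first_name><last_name>Virtanen</last_name><email>maria.virtanen@example.com</email></student>
</students>"""

root = ET.fromstring(students_xml)

# Select every <student> that has a <last_name> child whose text is exactly "Chung".
matches = root.findall("student[last_name='Chung']")
emails = [student.findtext("email") for student in matches]
print(emails)

# Select all <last_name> elements anywhere below the root.
last_names = [element.text for element in root.iterfind(".//last_name")]
print(last_names)
```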

Page 21

3. World Wide Web Consortium

‣ W3C develops specifications to support the use of the Web; they are publicly available at http://www.w3.org/TR/

‣ Development is systematic.

‣ The development process is specified and published.

Page 22

Phases of the development process

‣ Working Draft: represents work in progress.

‣ Candidate Recommendation: has received significant review from its immediate technical community; issued with an explicit call for implementation and technical feedback.

‣ Proposed Recommendation: represents consensus in the development group; proposed to the Advisory Committee for review.

‣ Recommendation: represents consensus within W3C; widespread implementation is encouraged.

Page 23

What happens to a W3C Recommendation?

‣ It remains a Recommendation indefinitely.

‣ W3C rescinds the Recommendation. A report called Rescinded Recommendation is published.

‣ A new version of the Recommendation is developed.

‣ Minor modifications are made. A report called Proposed Edited Recommendation is published.