Markup Languages & XML, by Vishal Kamtam Venkatesh
What are markup languages?
A markup language uses tags to define elements within a document.
It is human-readable.
The two most popular markup languages are HTML and XML.
What is HTML?
HTML stands for HyperText Markup Language.
An HTML document is made up of elements, which are written with tags.
The document begins with <html> and ends with </html>.
Elements are nested one inside another.
Tags can carry attributes.
HTML describes structure using two main sections: <head> and <body>.
Example HTML code:
<HTML>
<head>
<title>Hello World</title>
</head>
<body bgcolor="#000000">
<font color="#ffffff">
<H1>Hello World</H1>
</font>
</body>
</HTML>
Output: the heading "Hello World" in white text on a black background.
How do they work?
Representation of HTML as Parse Tree
<html>
<body>
<p>
Hello World
</p>
<div> <img src="example.png"/></div>
</body>
</html>
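The nesting of tags is what gives the parse tree its shape. As a rough sketch (not how a browser does it), Python's standard html.parser module can print that tree for the markup above. The class name TreePrinter and the helper parse_tree_lines are made up for this illustration.

```python
from html.parser import HTMLParser


class TreePrinter(HTMLParser):
    """Collects one indented line per tag or text node (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1          # children of this tag sit one level deeper

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img .../> have no children.
        self.lines.append("  " * self.depth + tag)

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:                 # skip whitespace between tags
            self.lines.append("  " * self.depth + repr(text))


SOURCE = """<html>
<body>
<p>
Hello World
</p>
<div> <img src="example.png"/></div>
</body>
</html>"""


def parse_tree_lines(source):
    printer = TreePrinter()
    printer.feed(source)
    printer.close()
    return printer.lines


print("\n".join(parse_tree_lines(SOURCE)))
```

Each level of indentation in the printed lines corresponds to one level of the parse tree: html contains body, which contains p and div.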
Representation in CFG
HTML can be described by classes of text:
Text is any string of characters literally interpreted (i.e. user text with no tags)
Char is any single character legal in HTML (blanks included)
Element is Text, or a pair of matching tags and the document between them, or an unmatched tag followed by a document
Doc is a sequence of elements
ListItem is the <LI> tag followed by a document followed by </LI>
List is a sequence of zero or more list items
HTML Grammar
Char → a | A | …
Text → ε | Char Text
Doc → ε | Element Doc
Element → Text | <I> Doc </I> | <P> Doc | <OL> List </OL>
ListItem → <LI> Doc </LI>
List → ε | ListItem List
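A grammar like this can be turned into a small recognizer with one function per non-terminal. The sketch below is only an illustration for this toy grammar, not a real HTML parser. It reads List as zero or more list items, treats any run of non-tag characters as Text, and ignores whitespace between tags (an assumption made here). All function names are hypothetical.

```python
import re

CLOSERS = {"</I>", "</OL>", "</LI>"}


def tokenize(source):
    """Split the input into upper-cased tags and non-blank text runs."""
    tokens = []
    for piece in re.findall(r"<[^>]+>|[^<]+", source):
        piece = piece.strip()
        if not piece:
            continue
        tokens.append(piece.upper() if piece.startswith("<") else piece)
    return tokens


def expect(tokens, i, tag):
    """Consume the given closing tag or fail."""
    if i >= len(tokens) or tokens[i] != tag:
        raise SyntaxError(f"expected {tag}")
    return i + 1


def parse_doc(tokens, i):
    """Doc -> ε | Element Doc. Returns the index just past the Doc."""
    while i < len(tokens) and tokens[i] not in CLOSERS:
        tok = tokens[i]
        if not tok.startswith("<"):      # Element -> Text
            i += 1
        elif tok == "<I>":               # Element -> <I> Doc </I>
            i = expect(tokens, parse_doc(tokens, i + 1), "</I>")
        elif tok == "<P>":               # Element -> <P> Doc
            i = parse_doc(tokens, i + 1)
        elif tok == "<OL>":              # Element -> <OL> List </OL>
            i = expect(tokens, parse_list(tokens, i + 1), "</OL>")
        else:
            raise SyntaxError(f"unexpected tag {tok}")
    return i


def parse_list(tokens, i):
    """List -> ε | ListItem List, with ListItem -> <LI> Doc </LI>."""
    while i < len(tokens) and tokens[i] == "<LI>":
        i = expect(tokens, parse_doc(tokens, i + 1), "</LI>")
    return i


def is_doc(source):
    """True if the whole input can be derived from Doc."""
    tokens = tokenize(source)
    try:
        return parse_doc(tokens, 0) == len(tokens)
    except SyntaxError:
        return False


print(is_doc("<P><I>popular markup languages</I>"
             "<OL><LI>HTML</LI><LI>XML</LI></OL>"))
print(is_doc("<OL><LI>HTML<LI>XML</OL>"))
```

The second input is rejected because this grammar requires every <LI> to be closed by </LI>, even though real browsers accept the shorter form.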
HTML Example
<html>
<body>
<p>
<I> popular markup languages</I>
<ol>
<li>HTML</li>
<li>XML</li>
</ol>
</body>
</html>
The text is rendered as:
popular markup languages
1. HTML
2. XML
Extensible Markup Language (XML)
XML has user-defined tags, whereas HTML has predefined tags.
XML is designed to describe data, not to display it.
e.g. the text "12 Maple Street" can be marked up as an address:
<ADDR>12 Maple Street</ADDR>
In most web applications, XML is used to describe data, while HTML is used to format and display the data.
Example XML
<sentence>
<subject><noun>Mary</noun></subject>
<predicate>
<transitive-verb>likes</transitive-verb>
<object><noun>John</noun></object>
</predicate>
<period>.</period>
</sentence>
PARSE TREE
PRODUCTION RULES
<sentence> → <subject> <predicate> <period>
<subject> → <noun>
<predicate> → <intransitive-verb>
<predicate> → <transitive-verb> <object>
<object> → <noun>
<noun> → Mary | John
<intransitive-verb> → believes
<transitive-verb> → likes
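The XML document and the production rules describe the same tree, which can be checked mechanically. This is a minimal sketch using Python's standard xml.etree.ElementTree. The tables RULES and WORDS are hand-copied from the rules above, the entry that lets <period> contain "." is an assumption (the rules do not expand <period>), and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

XML = """<sentence>
<subject><noun>Mary</noun></subject>
<predicate>
<transitive-verb>likes</transitive-verb>
<object><noun>John</noun></object>
</predicate>
<period>.</period>
</sentence>"""

# Allowed right-hand sides for each non-terminal (element name).
RULES = {
    "sentence": [["subject", "predicate", "period"]],
    "subject": [["noun"]],
    "predicate": [["intransitive-verb"], ["transitive-verb", "object"]],
    "object": [["noun"]],
}

# Words allowed inside leaf elements. The "period" entry is an assumption.
WORDS = {
    "noun": {"Mary", "John"},
    "intransitive-verb": {"believes"},
    "transitive-verb": {"likes"},
    "period": {"."},
}


def follows_rules(elem):
    """True if this element and all its descendants match some production."""
    if elem.tag in WORDS:
        return len(elem) == 0 and (elem.text or "").strip() in WORDS[elem.tag]
    children = [child.tag for child in elem]
    return children in RULES.get(elem.tag, []) and all(
        follows_rules(child) for child in elem
    )


def tree_lines(elem, depth=0):
    """One indented line per node, leaves shown with their text."""
    label = elem.tag
    if len(elem) == 0:
        label += ": " + (elem.text or "").strip()
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(XML)
print("\n".join(tree_lines(root)))
print(follows_rules(root))
```

The indented listing is a text version of the parse tree: each element is a node and each production corresponds to a node together with its children.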
XML’s DTD
The DTD (Document Type Definition) lets us define our own grammar.
It uses a context-free grammar notation in which the rule bodies are regular expressions.
Form of DTD:
<!DOCTYPE name-of-DTD [list of element definitions]>
Element definition:
<!ELEMENT element-name (description of element)>
Element Description
Element descriptions are regular expressions.
Basis:
Other element names
#PCDATA, for any text without tags
Operators:
| for union
, for concatenation
* for zero or more occurrences
? for zero or one occurrence
+ for one or more occurrences
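Because element descriptions are regular expressions, they map almost directly onto an ordinary regex engine. The sketch below translates a description into a Python regex over the sequence of child element names. It is an illustration only: it handles element names and the five operators, does not treat #PCDATA, and the function names and sample element names are chosen for the example.

```python
import re

# An element name, or one of the DTD operators and parentheses.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*?+,]")


def model_to_regex(model):
    """Translate a DTD element description into a compiled Python regex."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue                 # concatenation is implicit in a regex
        if tok in "()|*?+":
            parts.append(tok)        # same meaning in both notations
        else:
            # Each name matches itself followed by one separating space.
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))


def children_match(model, children):
    """True if the list of child element names fits the description."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None


print(children_match("(MANF, MODEL, SPEED)", ["MANF", "MODEL", "SPEED"]))
print(children_match("(MODEL, PRICE, DISK+)", ["MODEL", "PRICE"]))
print(children_match("(HARDDISK | CD | DVD)", ["CD"]))
```

The second call fails because + demands at least one DISK, while * in a description such as (PC*) would also accept an empty list of children.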
Example DTD-1
<!DOCTYPE PcSpecifications [
<!ELEMENT PCS (PC*)>
<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>
<!ELEMENT MODEL (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>
<!ELEMENT MANF (#PCDATA)>
<!ELEMENT SPEED (#PCDATA)>
<!ELEMENT RAM (#PCDATA)>
<!ELEMENT DISK (HARDDISK | CD)>
<!ELEMENT HARDDISK (MANF, MODEL, SIZE)>
<!ELEMENT SIZE (#PCDATA)>
<!ELEMENT CD (SPEED)>
]>
PC Specs XML Document
<PCS>
  <PC>
    <MODEL>4560</MODEL> <PRICE>$2295</PRICE>
    <PROCESSOR>
      <MANF>Intel</MANF> <MODEL>Pentium</MODEL> <SPEED>4Ghz</SPEED>
    </PROCESSOR>
    <RAM>8192</RAM>
    <DISK>
      <HARDDISK> <MANF>Maxtor</MANF> <MODEL>Diamond</MODEL> <SIZE>2000Gb</SIZE> </HARDDISK>
    </DISK>
    <DISK> <CD><SPEED>32x</SPEED></CD> </DISK>
  </PC>
  <PC> ….. </PC>
</PCS>
DTD and Production Rules
DTD:
<!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>
Production Rule:
PROCESSOR → MANF MODEL SPEED
DTD:
<!ELEMENT DISK (HARDDISK|CD|DVD)>
Production Rule:
DISK → HARDDISK | CD | DVD
DTD:
<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>
Production Rule:
PC → A B
A → MODEL PRICE PROCESSOR RAM
B → DISK+
The last production is not a legal CFG production because it uses the + operator, so we introduce C:
B → C B | C
C → DISK
Substituting A and C back in, we can rewrite the productions as:
PC → MODEL PRICE PROCESSOR RAM B
B → DISK B | DISK
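To see that the recursive rule for B really stands in for the + operator, one can enumerate what B derives and compare it with the regular expression. A small sketch, with derive_b and DISK_PLUS as names invented here and the enumeration cut off at a chosen maximum length:

```python
import re


def derive_b(max_disks):
    """All strings derivable from B -> DISK B | DISK with at most max_disks symbols."""
    if max_disks < 1:
        return set()
    results = {"DISK"}                        # B -> DISK
    for tail in derive_b(max_disks - 1):      # B -> DISK B
        results.add("DISK " + tail)
    return results


# The DTD's DISK+ written as a Python regex over space-separated names.
DISK_PLUS = re.compile(r"DISK( DISK)*")

strings = derive_b(4)
print(sorted(strings, key=len))
print(all(DISK_PLUS.fullmatch(s) for s in strings))
```

Up to the chosen bound, the grammar produces exactly the non-empty sequences of DISK, which is what DISK+ describes.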
Example DTD-2
<!DOCTYPE SENTENCES [
<!ELEMENT SENTENCES (SENTENCE*)>
<!ELEMENT SENTENCE (NOUN-PHRASE, VERB-PHRASE)>
<!ELEMENT NOUN-PHRASE (CMPLX-NOUN | (CMPLX-NOUN, PREP-PHRASE))>
<!ELEMENT VERB-PHRASE (CMPLX-VERB | (CMPLX-VERB, PREP-PHRASE))>
<!ELEMENT PREP-PHRASE (PREP, CMPLX-NOUN)>
<!ELEMENT CMPLX-NOUN (ARTICLE, NOUN)>
<!ELEMENT CMPLX-VERB (VERB | (VERB, NOUN-PHRASE))>
<!ELEMENT ARTICLE (a|the)>
<!ELEMENT NOUN (boy|girl|flower)>
<!ELEMENT VERB (touches|likes|sees)>
<!ELEMENT PREP (with)>
]>
Production Rules
(SENTENCE) → (NOUN-PHRASE)(VERB-PHRASE)
(NOUN-PHRASE) → (CMPLX-NOUN) | (CMPLX-NOUN)(PREP-PHRASE)
(VERB-PHRASE) → (CMPLX-VERB) | (CMPLX-VERB)(PREP-PHRASE)
(PREP-PHRASE) → (PREP)(CMPLX-NOUN)
(CMPLX-NOUN) → (ARTICLE)(NOUN)
(CMPLX-VERB) → (VERB) | (VERB)(NOUN-PHRASE)
(ARTICLE) → a | the
(NOUN) → boy | girl | flower
(VERB) → touches | likes | sees
(PREP) → with
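These productions can be run directly to decide whether a sentence belongs to the language. The following is a minimal backtracking sketch, not an efficient parser. The GRAMMAR table is copied from the rules, with the CMPLX-VERB entry taken from the DTD, words compared in lower case, and the function names ends and generates chosen for the example.

```python
GRAMMAR = {
    "SENTENCE": [["NOUN-PHRASE", "VERB-PHRASE"]],
    "NOUN-PHRASE": [["CMPLX-NOUN"], ["CMPLX-NOUN", "PREP-PHRASE"]],
    "VERB-PHRASE": [["CMPLX-VERB"], ["CMPLX-VERB", "PREP-PHRASE"]],
    "PREP-PHRASE": [["PREP", "CMPLX-NOUN"]],
    "CMPLX-NOUN": [["ARTICLE", "NOUN"]],
    "CMPLX-VERB": [["VERB"], ["VERB", "NOUN-PHRASE"]],
    "ARTICLE": [["a"], ["the"]],
    "NOUN": [["boy"], ["girl"], ["flower"]],
    "VERB": [["touches"], ["likes"], ["sees"]],
    "PREP": [["with"]],
}


def ends(symbol, words, start):
    """Positions where a derivation of `symbol` beginning at `start` can end."""
    if symbol not in GRAMMAR:                 # terminal: must match one word
        if start < len(words) and words[start] == symbol:
            return {start + 1}
        return set()
    result = set()
    for rhs in GRAMMAR[symbol]:               # try every alternative
        positions = {start}
        for part in rhs:                      # match the parts left to right
            positions = {e for p in positions for e in ends(part, words, p)}
        result |= positions
    return result


def generates(sentence):
    """True if the grammar can derive the whole sentence from SENTENCE."""
    words = sentence.lower().split()
    return len(words) in ends("SENTENCE", words, 0)


print(generates("the girl likes the boy"))
print(generates("the boy sees the flower with the girl"))
print(generates("girl likes the boy"))
```

The last sentence is rejected because every CMPLX-NOUN must start with an ARTICLE. The approach works here because the grammar has no left recursion, so the recursion always consumes a word before returning to the same rule.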
CONCLUSION
Both a DTD for XML and a CFG describe languages with certain rules and restrictions, and thereby declare what is legal and what is not in a given language.
An XML document is considered valid if it is well formed and conforms to its DTD.
A string is valid in a given context-free language if the context-free grammar for that language can generate it.
THANK YOU!