DIGITIZATION OF RARE LIBRARY MATERIALS
Metadata - Introduction
Mark-up
© Adolf Knoll, National Library of the Czech Republic
[Figure: vehicles labelled by country (CZ, D, I, LV, LT, OK) and by type (car, lorry, motorcycle, aircraft)]
Marked up categories of objects
All are means of transport.
Marked: the country (CZ, LV, D, I, OK, LT)
Marked: the type (car, motorcycle, lorry, aircraft)
However, the way of marking differs: if the type is a means of road transport, the Czech Republic is marked as CZ, but if it is a flying object, it is marked OK.
How is the mark-up driven?
The same object can belong to different categories marked differently.
This is done on the basis of object properties.
The property considered decisive is taken as the basis for classification.
In many cases, we cannot foresee the number of such decisive properties.
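The idea of a decisive property can be sketched in a few lines of Python. This is only an illustration: the function name country_mark and the two lookup tables are hypothetical, and only the Czech codes (CZ for road transport, OK for aircraft) come from the example above.

```python
# Hypothetical sketch: the decisive property (here, the kind of transport)
# selects which marking scheme applies to the same country.
ROAD_TYPES = {"car", "lorry", "motorcycle"}

# Only the Czech codes are taken from the text; other countries are omitted.
ROAD_MARK = {"Czech Republic": "CZ"}
AIR_MARK = {"Czech Republic": "OK"}


def country_mark(country: str, vehicle_type: str) -> str:
    """Return the country mark for a vehicle, chosen by its decisive property."""
    if vehicle_type in ROAD_TYPES:
        return ROAD_MARK[country]
    if vehicle_type == "aircraft":
        return AIR_MARK[country]
    # A decisive property we did not foresee: no rule exists yet.
    raise ValueError(f"no marking rule for type {vehicle_type!r}")


print(country_mark("Czech Republic", "car"))
print(country_mark("Czech Republic", "aircraft"))
```

The final branch reflects the last point above: a type nobody foresaw has no rule until one is added.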
Text processing
To display text on a page, it must be arranged graphically.
If the text is electronic, its characters, organized into sequences separated by blanks, must be driven by some tool so that they are displayed or printed where we want.
There are several ways to do this:
Page Description Language, e.g. PostScript
Text editors:
making a paragraph in obsolete editors: break line + add an empty line + indent
making a paragraph in modern editors: declare that a block of text is a ¶paragraph¶
The paragraph is marked, but what to do with it?
Text processing: Mark-up and behaviour
An object can be marked by a sequence of characters (a Latvian car marked as LV) or by a symbol or sign (a Latvian man marked by [symbol], a paragraph marked by ¶).
Under certain conditions, we may need to assign some behaviour to objects marked in a certain way. We therefore need behavioural information stored somewhere to tell identically marked objects how to behave:
During an ice-hockey championship, men marked by one symbol will play against men marked by another.
Cars marked LV will undergo a different mandatory technical inspection than cars marked CZ.
In a good text editor, e.g. MS Word, the formatting of the marked object (paragraph) is set separately (indented or not, how many points of spacing before or after the paragraph, etc.)
In the language of the web, HTML, the situation is analogous: the <P>paragraph</P> is marked as shown, while the web browser knows that it must be displayed on a separate line after some space is left.
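The separation of mark-up from behaviour can be illustrated with a minimal Python sketch. The names document, styles and render, and the concrete settings in styles, are assumptions made for this example only; they do not describe how any particular editor or browser works.

```python
# Mark-up: each block of text carries only a mark saying what it is.
document = [
    ("heading", "Text processing"),
    ("paragraph", "To display text on a page, it must be arranged graphically."),
]

# Behaviour: kept separately, keyed by the mark (hypothetical settings).
styles = {
    "heading": {"indent": 0, "blank_lines_before": 0, "upper": True},
    "paragraph": {"indent": 4, "blank_lines_before": 1, "upper": False},
}


def render(blocks, rules):
    """Apply the behaviour stored in `rules` to every marked block."""
    lines = []
    for mark, text in blocks:
        rule = rules[mark]
        lines.extend([""] * rule["blank_lines_before"])
        if rule["upper"]:
            text = text.upper()
        lines.append(" " * rule["indent"] + text)
    return "\n".join(lines)


print(render(document, styles))
```

Changing a value in styles changes how every paragraph is displayed, while the marked text in document stays untouched.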
What is marked up?
We have seen that objects are marked up.
These objects can be objects from the real world or their representations.
The objects can be represented by their denominations, which - when written - are mere sequences of characters.
However, they can also be represented by their images or symbols, or by sound.
Object: Car
Properties of the object INSECT
It may also be necessary to mark other properties of the object that are relevant for grouping or classifying its concrete representations.
[Figure: pictures labelled beetle, beetle, fly, butterfly, spider, spider]
Concrete INSECT
The concrete insect can be a beetle that is a ladybird, a goldsmith beetle, a longicorn beetle, or a May bug, etc.
This is its name, which differs between languages: in Czech, for example, the beetles listed above are named beruška, zlatohlávek, tesařík, chroust.
However, the differences in names do not affect the correctness of the content mark-up.
Summing up
When marking an object, we should evidently distinguish between:
the mark-up of the content
the complementary properties of the marked object
the names assigned to the object
the information about how such an object should behave if activated (display, printing, projection, …)
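The four kinds of information can be kept apart in a single record, as in the following sketch. The class name MarkedObject, its field names and the sample values for properties and behaviour are hypothetical; only the beetle names come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class MarkedObject:
    content: tuple                                   # content mark-up
    properties: dict = field(default_factory=dict)   # complementary properties
    names: dict = field(default_factory=dict)        # names, keyed by language
    behaviour: dict = field(default_factory=dict)    # behaviour per kind of output


ladybird_en = MarkedObject(
    content=("INSECT", "BEETLE"),
    properties={"representation": "text"},           # assumed sample value
    names={"en": "ladybird"},
    behaviour={"display": "bold", "printing": "plain"},  # assumed sample values
)
ladybird_cs = MarkedObject(content=("INSECT", "BEETLE"), names={"cs": "beruška"})

# Different names, identical content mark-up.
print(ladybird_en.content == ladybird_cs.content)
```

Because the names live in their own field, translating them never touches the content mark-up.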
How to describe the object?
INSECT > BEETLE > PICTURE
INSECT > BEETLE > TEXT: Colorado Beetle
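One possible way to write such a description down is as nested elements, sketched here with Python's standard xml.etree.ElementTree module. Using the category names as element names, and the src attribute on PICTURE, are assumptions of this example rather than a format prescribed by the slides.

```python
import xml.etree.ElementTree as ET

# Content mark-up: the object is an INSECT, more precisely a BEETLE.
insect = ET.Element("INSECT")
beetle = ET.SubElement(insect, "BEETLE")

# Two representations of the same object: its written name and its picture.
text = ET.SubElement(beetle, "TEXT")
text.text = "Colorado Beetle"
ET.SubElement(beetle, "PICTURE", src="Image_Colorado_Beetle.gif")

description = ET.tostring(insect, encoding="unicode")
print(description)
```

Nothing in this description says how the beetle should look on screen or on paper; it only says what the object is.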
How to prescribe behaviour to the described object?
The behaviour is prescribed by special rules, in this case formatting rules.
This behaviour is separate from the mark-up of the contents.
A formatting rule can act only on the representations of the objects and mark up how they behave when these representations are activated.
Among the images used for my PowerPoint presentation, there is also this one:
<p><img SRC="Image_Colorado_Beetle.gif" ALT="Image of a Colorado Beetle" height=196 width=166 align=CENTER>Colorado Beetle<br> This beetle is very nice.
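A formatting rule of this kind might look like the sketch below. The function format_as_html and the dictionary described are hypothetical; the point is only that the rule uses the representations (name and picture) and that the content marks do not appear in what it produces.

```python
from html import escape

# Described object: content marks plus its representations.
described = {
    "content": ["INSECT", "BEETLE"],
    "text": "Colorado Beetle",
    "picture": "Image_Colorado_Beetle.gif",
}


def format_as_html(obj):
    """Hypothetical formatting rule: prescribes display only, ignores content marks."""
    return (
        f'<p><img src="{escape(obj["picture"])}" '
        f'alt="Image of a {escape(obj["text"])}">'
        f'{escape(obj["text"])}</p>'
    )


html_output = format_as_html(described)
print(html_output)

# The content marks are not part of the formatted output.
print("INSECT" in html_output)
```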
The problems
In the formatted output here, we have lost the description of the contents, which is necessary for other kinds of work.
It is evident from this that this kind of output cannot serve as the only existing source data.
However, it can be accepted as one of the possible appearances of the source data.
Source data and access
Source data
Appearance no. 1
Appearance no. 2
Appearance no. 3
A direct and simple look inside the source data is desirable.
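The relation between one source and several appearances can be sketched as follows. The three functions and the shape of source are assumptions chosen for illustration; JSON stands in here for any plain format that allows a direct look inside the source data.

```python
import json

# One set of source data: content mark-up plus a name.
source = {"content": ["INSECT", "BEETLE"], "name": "Colorado Beetle"}


def as_plain_text(src):
    """Appearance no. 1: the name only."""
    return src["name"]


def as_html(src):
    """Appearance no. 2: the name as an HTML paragraph."""
    return f"<p>{src['name']}</p>"


def as_labelled_text(src):
    """Appearance no. 3: the name together with its content marks."""
    return f"{' / '.join(src['content'])}: {src['name']}"


for appearance in (as_plain_text, as_html, as_labelled_text):
    print(appearance(source))

# A direct and simple look inside the source data itself.
print(json.dumps(source, indent=2))
```

Each appearance is derived from the source, and none of them replaces it.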