1
Croatian Internet serials
Croatian Electronic Publishing
Results of a survey on e-serials
and usage of metadata
Sofija Klarin, Sonja Pigac, Damir Pavelić
National and University Library, Zagreb
Faculty of Economics, Zagreb
2
Topics
Part 1 • Context: facts, presumptions and questions
Part 2 • Results of the survey “Croatian remote access e-serials”
Part 3 • Use of metadata in e-serials, possibilities for Croatia
3
1. Electronic publishing using the Internet
• the explosion of publishing activity since the 1990s raises problems of searching, retrieval, identification and preservation of electronic documents
• World Wide Web (1995)
• cataloguer-based management vs. author-based management (Koehler)
4
1.1 How big is the Web?
Lawrence & Giles (1999):
• 800 million web pages
• 15 TB of information
• 6 TB of text
BrightPlanet - LexiBot software (2000):
• 19 TB - the “surface” Web
• 7,500 TB - the “deep” Web
Kulturarw3 project - Sweden:
• web harvesting
• 7.5 million files
• 300 GB
Croatia (since 1991):
• 8000 .hr domains
• types, number of files?
• types of resources?
• publishers?
Too big?
5
1.2 Lawrence & Giles (1999):
• 83% of sites on the Web contain commercial content and 6% contain scientific or educational content
Valuable material?
05.08.2000 most visited Croatian sites (Proof)
6
1.3 Persistence of Web documents (Koehler, 1999)
• Web pages are unstable:
– they change (within a year, 99% of web pages undergo some degree of change)
– they disappear (5% return within a specific period of time)
• Two types of change:
– change of content (20% in a week)
– change of structure (20% in a week)
Too ephemeral?
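Koehler's two types of change can be told apart mechanically if a harvester keeps successive snapshots of a page. The sketch below is an illustration, not Koehler's method: it assumes two stored HTML snapshots of the same page and treats any difference in visible text as a change of content and any difference in the sequence of tags as a change of structure. The class and function names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Split an HTML page into its tag sequence (structure) and its text (content)."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)

    def handle_data(self, data):
        words = data.split()
        if words:  # ignore whitespace-only text between tags
            self.text.append(" ".join(words))


def fingerprint(html):
    """Return (structure_hash, content_hash) for one snapshot of a page."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    structure = hashlib.sha1(" ".join(parser.tags).encode("utf-8")).hexdigest()
    content = hashlib.sha1("\n".join(parser.text).encode("utf-8")).hexdigest()
    return structure, content


def classify_change(old_html, new_html):
    """Report which kinds of change occurred between two snapshots."""
    old_structure, old_content = fingerprint(old_html)
    new_structure, new_content = fingerprint(new_html)
    kinds = []
    if old_content != new_content:
        kinds.append("content")
    if old_structure != new_structure:
        kinds.append("structure")
    return kinds or ["unchanged"]
```

A page whose text changes inside the same markup is reported as `["content"]`; wrapping the same text in an extra `<div>` is reported as `["structure"]`.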
7
1.4 Low use of metadata on the WWW
Lawrence & Giles (1999)
• the simple HTML "keywords" and "description" metatags are only used on the homepages of 34% of sites
• only 0.3% of sites use the Dublin Core metadata standard
• who are Web “publishers”? Can they accept standards for the management and interchange of metadata?
Search/retrieval?
Reliability?
Authenticity?
Interchange?
Publishers?
8
1.5 Products of electronic publishing
• local access • hybrid • remote access resources
• monograph publications (finite publications) • continuing resources? – serials – integrating resources
• data and/or programs
• public access • restricted access
• static • dynamic
New types of resources?
9
2. The survey (January 2000 - April 2001)
• Aim of the survey on e-serials: to establish their quantity, categories, persistence, publishers and metadata usage in the Croatian web space...
• sample:
- electronic publications which consist of successive parts with numerical or chronological designations
- in Croatian or produced by Croatian publishers, available via WWW
• items excluded:
OPACs or databases, lists/archives, web sites, online services, advertisements
10
2.1 Identification
• Lists, directories, portals, search engines:
CroLinks http://www.crolinks.com
www.hr - News, media, journals
Iskon - Net.hr portal http://www.iskon.hr
Google, Yahoo
• from their print versions
• from publishers
11
2.2 Numbers
• Total number: 153
– disappeared: 16
– changed URL: 12
– ceased: 2
– changed title: 1
• For comparison:
– Denmark (national library): 1069 (2000)
– Norway (national library): 299 (1999)
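Counts such as “disappeared” and “changed URL” come from re-checking the same list of titles at a later date. A minimal sketch of that comparison, assuming each snapshot is kept as a hypothetical `{title: url}` dictionary, is shown below. Only what can be detected mechanically is reported; whether a missing title has ceased or changed its name still has to be checked by hand.

```python
def compare_snapshots(before, after):
    """Compare two {title: url} snapshots of a list of e-serials.

    Reports titles missing from the later snapshot, titles found at a new
    URL, unchanged titles and titles that are new in the later snapshot.
    """
    report = {"disappeared": [], "changed_url": [], "unchanged": [], "new": []}
    for title, url in before.items():
        if title not in after:
            report["disappeared"].append(title)
        elif after[title] != url:
            report["changed_url"].append(title)
        else:
            report["unchanged"].append(title)
    report["new"] = [title for title in after if title not in before]
    return report
```

For example, Glas Koncila recorded at http://www.hbk.hr/GK/gk.htm in the first snapshot and at http://www.glas-koncila.hr in the second is listed under `changed_url`.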
12
2.3 Categories:
Religious magazines
Weekly/fortnightly magazines
Scientific journals
Student journals
Serials published by universities, scientific institutes
Serials published by civil services
Serials of unknown type
Newspapers
Serials published by societies
Serials published by companies
Journals
Counts: 28, 42, 9, 10, 8, 14, 4, 9, 11, 18 (sum: 153)
13
2.4 Editions: electronic only, or both electronic and print
• electronic only: 110 (e.g. Mountain Biking, Morsko prase)
• both electronic and print: 42 (e.g. Vjesnik, Večernji list, Slobodna Dalmacija)
• print became electronic: 1 (Internet Monitor)
14
2.5 Place of publication:
– Zagreb: 115
– Split: 6
– Rijeka: 5
– Osijek, Dubrovnik, Varaždin, Čakovec, Slavonski Brod: 2 each
– Karlovac, Zadar, Pula, Koprivnica, Ičići, Prelog, Sv. Ivan Zelina, Rovinj, Virovitica: 1 each
– other (AT): 1
– unknown: 4
15
2.6 URLs: Croatian domain or …?
.hr 82%
www.vjesnik.hr
www.vecernji-list.hr
www.slobodnadalmacija.hr
www.nacional.hr
www.vef.hr/vetarhiv
www.nn.hr/Glasilo/index.htm
www.hi-fi.hr/hgz
wam.hi-fi.hr
www.agr.hr/smotra/index.htm
www.monitor.hr
www.gradst.hr/engmod
www.bug.hr, etc.
.com 17%
www.hrvatska.com/glas-podravine
duhovno-vrelo.com
www.win-ini.com
cyberdream.croadria.com
www.zarez.com
www.hrvatska.com/bilten.html
www.kapital.com, etc.
other 1%
www.moravek.net/kla
www.hrvatskenovine.at
1 item has 3 URLs / domains (.hr, .com, .net); 1 item has 2 URLs / domains (.hr, .com)
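Shares like .hr 82% / .com 17% / other 1% can be recomputed from a plain list of URLs. The sketch below is illustrative: the function names are hypothetical, a missing `http://` scheme is added before parsing, and every URL in the list is counted once, so an item published under several URLs (as noted above) needs a decision beforehand about which URL represents it.

```python
from collections import Counter
from urllib.parse import urlparse


def tld(url):
    """Return the top-level domain of a URL, adding a scheme if it is missing."""
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""  # hostname is lower-cased by urlparse
    return host.rsplit(".", 1)[-1] if "." in host else host


def tld_shares(urls):
    """Percentage of URLs per top-level domain, rounded to one decimal."""
    counts = Counter(tld(u) for u in urls)
    total = sum(counts.values())
    return {domain: round(100 * n / total, 1) for domain, n in counts.most_common()}


sample = ["www.vjesnik.hr", "www.nacional.hr", "www.zarez.com", "www.hrvatskenovine.at"]
print(tld_shares(sample))
```

For this four-URL sample the result is `{'hr': 50.0, 'com': 25.0, 'at': 25.0}`; running it on the full list of 153 titles would reproduce the proportions on the slide.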
16
2.6.1 Domains, URLs
• 28 items have their own domain name, e.g. www.vjesnik.hr, www.morsko-prase.hr
• 12 items changed URL:
– 5 moved from a directory on another site to their own domain name
e.g. http://www.hbk.hr/GK/gk.htm → http://www.glas-koncila.hr
– 5 internal changes within the same site (domain)
e.g. http://www.kdb.hr/projekt/paedro/index.htm → http://www.kdb.hr/paedro/
– 1 from .hr to .com
– 1 from .com to .hr
• 16 items disappeared:
– 11 .hr: 68.75% (share of .hr overall: 82%)
– 5 .com: 31.25% (share of .com overall: 17%)
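The percentages in the last bullet are shares of the 16 disappeared titles. Dividing each by the domain's overall share shows whether a domain loses titles out of proportion to its size. A small sketch of that arithmetic, with a hypothetical function name (with only 16 disappearances the ratios are indicative at best):

```python
def domain_loss_profile(lost, overall_share):
    """Compare each domain's share of disappeared titles with its overall share.

    lost: {domain: number of titles that disappeared}
    overall_share: {domain: share of all titles, in percent}
    Returns {domain: (share among disappeared titles in %, ratio to overall share)}.
    A ratio above 1 means the domain lost titles more often than its size suggests.
    """
    total_lost = sum(lost.values())
    profile = {}
    for domain, count in lost.items():
        share = 100 * count / total_lost
        profile[domain] = (round(share, 2), round(share / overall_share[domain], 2))
    return profile


print(domain_loss_profile({"hr": 11, "com": 5}, {"hr": 82, "com": 17}))
```

With the survey's figures this gives `{'hr': (68.75, 0.84), 'com': (31.25, 1.84)}`, i.e. .com titles disappeared at roughly 1.8 times their overall share.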
17
2.7 Chronological overview
Chart: number of titles per year, 1994 to 2001
year titles
1994 2
1995 6
1996 11
1997 21
1998 26
1999 33
2000 24
2001 2
unknown 26
18
2.8 Low metadata use
Croatian e-serials
• HTML metatags “keywords”, “description”, “author”:
– 32.8% (September 2000)
– 33.3% (April 2001)
• 1 title uses the DC metadata standard
Lawrence & Giles (1999)
• simple HTML metatags are only used on the homepages of 34% of sites
• only 0.3% of sites use the Dublin Core metadata standard
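A usage rate like the 32.8% / 33.3% above can be measured by scanning each homepage for the three simple metatags. The sketch below assumes that a homepage counts if it carries at least one of “keywords”, “description” or “author”, which may differ from how the survey counted; the homepages are passed in as HTML strings, and the names are hypothetical.

```python
from html.parser import HTMLParser

# The three simple metatags counted in the survey.
SURVEYED_NAMES = ("keywords", "description", "author")


class MetaNameCollector(HTMLParser):
    """Collect the lower-cased NAME attributes of all <meta> tags on a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...> matches too.
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name.lower())


def uses_simple_metatags(html):
    """True if the page carries at least one of the surveyed metatags."""
    collector = MetaNameCollector()
    collector.feed(html)
    collector.close()
    return any(name in collector.names for name in SURVEYED_NAMES)


def usage_rate(homepages):
    """Percentage of homepages (HTML strings) with at least one surveyed metatag."""
    hits = sum(uses_simple_metatags(page) for page in homepages)
    return round(100 * hits / len(homepages), 1)
```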
19
2.9 Metadata
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="Adobe PageMill 3.0 Win">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
<TITLE>ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS</TITLE>
<LINK REV="made" HREF="mailto:[email protected]">
<META NAME="keywords" CONTENT="Croatia, agriculture, science, publication, agricultural, economics, rural, sociology, plant, pathology, herbology, animal, nutrition, engineering, soil, amelioration, microbiology, dairy, agronomy, breeding, genetics, botany, zoology, crops, fishery, beekeeping, husbandry, forades, grassland, ornamental, landscape, architecture, farm, management, enology, viticulture, pomology">
<META NAME="description" CONTENT="On-line Scientific Journal AGRICULTURAE CONSPECTUS SCIENTIFICUS PUBLISHED BY FACULTY OF AGRICULTURE UNIVERSITY ZAGREB">
<META NAME="copyright" CONTENT="ACS Agriculture Conspectus Scientificus">
<META NAME="revisit-after" CONTENT="60 Days">
<META NAME="Robot" CONTENT="ALL">
<META NAME="DC.Title" CONTENT="ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS">
<META NAME="DC.Creator" CONTENT="Agriculture Conspectus Scientificus, Faculty of Agriculture, Zagreb CROATIA">
<META NAME="DC.Publisher" CONTENT="Faculty of Agriculture University of Zagreb">
</HEAD>
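Harvesting tools read the `DC.*` metatags in a header like the one above back into a record. A minimal sketch of that step, in the spirit of DC-dot's extraction but not its actual code, is shown below. It assumes metatag names of the form `DC.Element` and collapses the line breaks that often appear inside CONTENT values. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreCollector(HTMLParser):
    """Gather <meta name="DC.element" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content:
            element = name[3:].lower()          # "DC.Title" -> "title"
            value = " ".join(content.split())   # undo line breaks inside CONTENT
            self.record.setdefault(element, []).append(value)


def extract_dc(html):
    """Return {dc_element: [values]} for the DC metatags embedded in html."""
    collector = DublinCoreCollector()
    collector.feed(html)
    collector.close()
    return collector.record
```

Applied to the ACS header above, it yields one value each for `title`, `creator` and `publisher` and ignores the non-DC metatags.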
20
2.10 Metadata questionnaire
• sent in April 2001 by e-mail to 160 e-publishers, editors, webmasters…
• to find out more about their familiarity with metadata, and their intentions to use metadata and cooperate with librarians
• an effort to raise awareness among publishers of the need for an “electronic title page” in their publications
21
Chart: “Do you know what metadata is?” (NO / YES) and “Do you use metadata?” (NO / YES)
27 answers received, representing 32 publications (response rate 17.3% or 20.6%)
6 incorrect statements:
• 4 claim to use metadata (they don’t!)
• 2 claim not to use metadata (they do!)
22
The benefits of metadata:
• facilitates search and retrieval 69.6%
• promotes the company/publication 56.5%
• helps identify the author and the content of the publication 52.2%
• everybody uses metadata 13%
• reliability and authenticity of the publication 8.7%
• contains copyright information 4.3%
<title> 95.7%
<keywords> 95.7%
<author> 52.2%
<description> 60.9%
<copyright> 21.7%
23
Metadata is created by...
Chart: webmaster / editor-in-chief / both
25.8% don’t use metadata because they:
• know nothing about metadata 50%
• don’t have enough time 12.5%
• don’t have enough employees 12.5%
24
Metadata generators? (DC-dot, TagGen, DC assist, EdNA, AHDS, Reggie, Nordic DC metadata generator, SAFARI)
• aware of their existence 11%
• not aware 71%
– would like to be informed 100%
25
Metadata is contained in...
• homepage only 26.1%
• all pages (same metadata) 17.4%
• all pages (different metadata) 47.8%
26
Metadata standardization?
1. Have you heard of metadata standardization?
2. Which metadata schema do you know of?
3. Would a metadata guideline help you?
4. Is standardization important for your work?
5. Would you like to have standardized metadata in your publication?
Chart: YES / NO answers to questions 1 to 5
27
Could librarians help you?
Chart: YES / NO answers
YES:
• librarians work on standardization of bibliographic description 48%
• I’d appreciate any help 44%
• librarians describe print publications 32%
• librarians work on standardization of metadata 12%
• we are already familiar with library activities (ISBN, ISSN, CIP…) 24%
NO:
• librarians don’t know much about the Web 50%
• webmasters should do that 44%
• can do it by myself 25%
28
E-journals available through the library WebPAC?
YES 93.8%
• it’s useful information for users 75%
• it’s important to treat both print and e-publications in the same way 75%
• it’s useful for publishers 46.4%
NO 6.3%
• people prefer to use search engines
• web publications often change their URLs: “I’m not sure librarians should catalogue them”
29
Dublin Core Metadata Initiative survey
From Feb. 20th to March 9th, 2001.
The purpose of the questionnaire was to help achieve some of the DC Libraries Working Group’s objectives for 2001, including: (1) to collect and share examples of Dublin Core use in libraries and (2) to stimulate discussion that will feed into the process of drafting an application profile for the use of Dublin Core in libraries
DC-General and DC-Libraries lists, CORC Users List, and The Alberta Library Metadata List
29 responses from 9 countries
Most used: creator, publisher, title, rights, type, identifier, format, description
Low use of qualifiers
http://dublincore.org
30
3. Use of metadata in e-serials and possibilities in Croatia
E-serials
- digital / hybrid libraries
- databases (publishers, vendors)
cooperation (BIBLINK) hosted.ukoln.ac.uk/biblink
- separately (web pages)
31
3.1. Using metadata
1. Inside the document – HTML (XML):
<head> metadata </head><body> document described above </body>
2. Separate file:
- metadata records + links to e-serials (bibliography, similar serials…)
- a file containing metadata, linked from a web page that has no metadata in its <head> (DC web page)
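For option 1, a common convention for embedding Dublin Core in HTML (as in RFC 2731) is to declare the element set with `<link rel="schema.DC">` and prefix each metatag name with `DC.`. The sketch below renders a record that way. It assumes the record is a plain dictionary of unqualified DC 1.1 element names mapped to one value or a list of values; the function name is hypothetical, and values are HTML-escaped so that quotes or ampersands cannot break the tag.

```python
from html import escape

DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dc_head_markup(record):
    """Render {element: value or [values]} as DC <link>/<meta> lines for <head>."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            # escape() with quote=True also escapes double quotes in the value
            lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)


print(dc_head_markup({
    "title": "ACS-Agriculturae Conspectus Scientificus",
    "publisher": "Faculty of Agriculture University of Zagreb",
    "identifier": "http://www.agr.hr/smotra",
}))
```

The output is four lines (the schema link plus one `meta` per element) that can be pasted between `<head>` and `</head>`.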
32
3.2 Metadata schemes
- before the Internet and electronic publications (cataloguing, exchange – MARC, GILS, CIMI)
- development of the Internet (searching, cataloguing, exchange)
Qualified Dublin Core (dublincore.org)
- translations in 21 languages; no Croatian version yet, but a Croatian translation has been finished
33
3.3. Creation & conversion tools
- Creating metadata (templates): Nordic DC metadata creator (including URN generator; choice of controlled vocabularies, classification, date format, identifier)
- Creation / change of templates: Reggie, Mantis (OCLC), HotMETA (search DC)
- Automatic extraction / gathering from HTML (enter a URL):
DC-dot (results in DC, RDF, XHTML; additional corrections possible)
Donor metatagenerator (similar to Nordic DC)
34
- Automatic production
Klarity (automatically generates metadata based on concepts found in text)
Scorpion (automatic classification to DDC)
- Commercial software
TagGen Dublin Core edition (number of schemes and possibilities)
Metabrowser (shows Metadata and Web Pages simultaneously)
http://dublincore.org/tools
35
DC-dot (http://www.agr.hr/smotra)
36
Donor (http://www.agr.hr/smotra)
37
Metabrowser – “Metabrowser is a web browser that catalogues web pages using schemas such as Dublin Core, GILS, AGLS. Metabrowser allows metadata to be added to web pages accessible from a local or network drive or sent to an external system such as a database or firewalled web server”
38
Conversion:
- DC → MARC (Danish, Finnish, Icelandic, Norwegian, Swedish and US MARC formats)
Nordic Metadata Project: DC to MARC converter
(www.bibsys.no/mete/d2m)
- Crosswalks: DC, MARC, MARC21, EAD, GILS, ISAD, FGDC
(www.ukoln.ac.uk/metadata/interoperability)
39
Nordic metadata project: DC to MARC converter
008 010508s
245 $a ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS
260 $b Faculty of Agriculture University of Zagreb
856 $u http://www.agr.hr/smotra
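The converter output above maps three DC elements onto MARC fields: title to 245 $a, publisher to 260 $b and identifier (the URL) to 856 $u. The sketch below reproduces only those three mappings. It is not the Nordic converter's code. A real crosswalk also covers creator, subject, date and the other elements, sets indicators and builds the 008 field. The table and function names are hypothetical.

```python
# Minimal DC-to-MARC crosswalk limited to the three fields shown above.
DC_TO_MARC = {
    "title": ("245", "a"),
    "publisher": ("260", "b"),
    "identifier": ("856", "u"),
}


def dc_to_marc(record):
    """Turn {dc_element: value} into a sorted list of (tag, subfield, value).

    Elements without a mapping in DC_TO_MARC are silently skipped.
    """
    fields = []
    for element, value in record.items():
        if element in DC_TO_MARC:
            tag, code = DC_TO_MARC[element]
            fields.append((tag, code, value))
    return sorted(fields)


def as_text(fields):
    """Format fields in the same 'tag $code value' style as the slide."""
    return "\n".join(f"{tag} ${code} {value}" for tag, code, value in fields)


print(as_text(dc_to_marc({
    "title": "ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS",
    "publisher": "Faculty of Agriculture University of Zagreb",
    "identifier": "http://www.agr.hr/smotra",
})))
```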
40
Conversion MARC → XML → MARC
(www.logos.com/marc)
(www.culture.fr/BiblioML): additional applications needed
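The MARC → XML → MARC round trip can be illustrated with a concrete XML format. The sketch below uses MARCXML, the Library of Congress schema for MARC 21 in XML. This format is chosen for illustration only and is not the Logos or BiblioML format referred to above. It handles only control fields and single-subfield data fields with blank indicators. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace (Library of Congress)
ET.register_namespace("", MARC_NS)           # serialize it as the default namespace


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def marc_to_xml(control, data):
    """MARC -> XML.

    control: {tag: value} for control fields such as 008
    data: list of (tag, subfield_code, value); indicators are left blank
    """
    record = ET.Element(q("record"))
    for tag, value in control.items():
        ET.SubElement(record, q("controlfield"), {"tag": tag}).text = value
    for tag, code, value in data:
        field = ET.SubElement(record, q("datafield"), {"tag": tag, "ind1": " ", "ind2": " "})
        ET.SubElement(field, q("subfield"), {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")


def xml_to_marc(xml_text):
    """XML -> MARC: recover the control and data fields written by marc_to_xml."""
    record = ET.fromstring(xml_text)
    control = {cf.get("tag"): cf.text for cf in record.findall(q("controlfield"))}
    data = [
        (field.get("tag"), sub.get("code"), sub.text)
        for field in record.findall(q("datafield"))
        for sub in field.findall(q("subfield"))
    ]
    return control, data
```

Feeding the converter output above through `marc_to_xml` and then `xml_to_marc` returns the same 008, 245, 260 and 856 values.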
41
3.4. Which model / scheme?
- company / organization needs
- connection and cooperation with other companies / organizations
- budget
- standardization
- software and upgrading possibilities
- exchange of data / records
Libraries, publishers, vendors: different needs and aims
42
3.4.1 Choose scheme and strategy - Croatia
Libraries
- bibliographic control
- up-to-date record collections (users benefit)
- exchange
Publishers
- timely, accurate and full exposure of their products and services
- search and retrieval: benefits users and publishers
- standardized records in databases for possible exchange and profit
Cooperation!
43
Use knowledge and experience from foreign projects:
- Biblink
- CORC (Cooperative Online Resource Catalog)
- DONOR (Directory of Netherlands Online Resources)
- Inform publishers of standards and possibilities (survey)
- Point out the necessity of standardization and of using one primary (major) scheme (Dublin Core?)
- Show them how to use free web-available tools
44
3.5 DC – RDF - XML
Qualified Dublin Core is sufficient for basic description and serves our needs at the start
RDF (Resource Description Framework) is about to become a standard (Semantic Web)
XML (Extensible Markup Language) is already a growing standard (structure, exchange, e-business, internal control…)
45
RDF: development is still in progress, but…
Many projects and tools exist (creation, conversion)
Constant work, often non-commercial (learn & use)
Croatia: use the same metadata scheme (DC?), enriched with an internal metadata scheme if needed (for publishers' use)
- embed it into HTML documents
- eventually convert it to RDF/XML
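The last step, converting embedded DC metadata to RDF/XML, can be sketched with the standard library. The example below writes one `rdf:Description` about the publication's URL and adds one `dc:` property per element from the DC 1.1 namespace. It assumes unqualified element names and a single value per element, and it handles neither qualifiers nor repeated elements. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_to_rdfxml(resource_url, record):
    """Serialize {dc_element: value} as RDF/XML describing resource_url."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_url}
    )
    for element, value in record.items():
        # Each DC element becomes a property of the described resource.
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_to_rdfxml("http://www.agr.hr/smotra", {
    "title": "ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS",
    "publisher": "Faculty of Agriculture University of Zagreb",
}))
```

Because the same dictionary can feed both the HTML metatags and this RDF/XML output, a publisher who starts with embedded DC does not have to re-enter anything when moving to RDF.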
46
3.6. Conclusion
Low use of any metadata scheme opens the possibility of adopting one primary scheme (DC?) and an emerging standard (RDF?)
Concentrate on the starting point and the strategy; use the experience of others
Build environment to help publishers (similar to Biblink)
Cooperation among libraries and publishers is essential
47
3.7 Links
http://dublincore.org www.ifla.org
www.ukoln.ac.uk www.w3c.org
www.editeur.org www.xml.com
www.logos.com/marc www.culture.fr/BiblioML