Cezary Mazurek ([email protected]), Marcin Werla ([email protected])
Poznań Supercomputing and Networking Center (Poznań, Poland)
2009-09-30 ECDL 2009, Corfu, Greece
Main organizational models
▪ Regional digital libraries
  ▪ Created and maintained by several institutions from a particular region
  ▪ Gather mostly resources related to the region, its history and culture, but also academic educational materials and national cultural heritage
▪ Institutional digital libraries
  ▪ Created and maintained by single institutions (such as universities)
  ▪ Gather mostly resources related to present activities (as institutional repositories do) and to the history of the institution
▪ In many cases the technical base and support for digital libraries is provided by local computing or networking centres (such as PSNC)
[Figure legend: regional digital libraries; institutional digital libraries]
▪ Overall number of digital objects: 285 thousand
▪ Number of active digital libraries: 19 regional, 21 institutional
▪ Number of cooperating institutions: several hundred libraries, museums and archives
▪ Plus several other digital libraries in the phase of planning, configuration or initial content uploading
Main aims
▪ To facilitate the use of resources from Polish digital libraries
▪ To increase the visibility of these resources on the Internet
▪ To create new, advanced network services, both for end users and for creators of digital libraries, on the basis of these resources
Basic assumptions
▪ No need or requirement to move resources to the Digital Libraries Federation (DLF)
▪ No fees for using the DLF or for being a part of it
▪ Open standards are the basis for cooperation
  ▪ Individual digital libraries can use different technological platforms
Basic functions
▪ Search in the available publications
  ▪ Simple
  ▪ Advanced
▪ Digitization plans
  ▪ Searchable
  ▪ Report
  ▪ API for the prevention of duplicated digitization
▪ Location of digital objects on the basis of their OAI identifiers
▪ Database of Polish digital libraries
▪ Statistics and reports
▪ Information in the DLF is updated daily (nightly)
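Locating an object from its OAI identifier could be sketched roughly as follows. The sketch assumes identifiers follow the common `oai:<repository>:<local-id>` form; the registry contents, the URL and the function names are hypothetical and do not describe the actual DLF implementation.

```python
# Hypothetical registry: repository identifier -> base URL of the digital library.
REGISTRY = {
    "example-library.org": "http://example-library.org/",
}

def parse_oai_identifier(identifier):
    """Split 'oai:<repository>:<local-id>' into (repository, local_id)."""
    # maxsplit=2 keeps any further colons inside the local identifier.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return parts[1], parts[2]

def locate(identifier, registry=REGISTRY):
    """Return the base URL of the library holding the object, or None if unknown."""
    repository, _local_id = parse_oai_identifier(identifier)
    return registry.get(repository)
```

A caller would pass an identifier taken from harvested metadata to `locate` and redirect the user to the returned library, falling back to an error page when the result is `None`.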
[Screenshot: Digital Libraries Federation search plugin]
[Diagram: institutions, digital libraries, metadata aggregator]
We gather information about content providers and their information systems
▪ Database of Polish Digital Libraries in the DLF
We gather the metadata of objects that should be visible in Europeana
▪ Done with OAI-PMH
  ▪ In most cases we require an OAI-PMH interface
  ▪ In truly special cases we can do it in a different way (e.g. Polish Internet Library)
▪ At present we harvest only simple Dublin Core
  ▪ Work on a new national metadata schema started in September 2009
  ▪ Approximate time of development: 3 months
  ▪ Approximate time of deployment: ???
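A minimal sketch of harvesting simple Dublin Core over OAI-PMH is given below. It builds `ListRecords` request URLs and parses one response page; the function names and the record layout are illustrative assumptions, not the DLF code, and fetching the URL over the network is left to the caller.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for simple Dublin Core (oai_dc)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        fields = {}
        for element in record.iter():
            # Collect every Dublin Core element; elements may repeat.
            if element.tag.startswith(DC_NS) and element.text:
                name = element.tag[len(DC_NS):]
                fields.setdefault(name, []).append(element.text.strip())
        records.append({"identifier": identifier, "metadata": fields})
    token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    # An absent or empty token means the list is complete.
    return records, (token.strip() if token and token.strip() else None)
```

A nightly harvest would loop: request `list_records_url(base)`, parse the page, and repeat with the returned token until it is `None`.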
We will try to clean up, normalize and enrich the metadata
▪ At the DLF level, dictionaries are built automatically from the aggregated metadata
  ▪ Separately for each metadata element
  ▪ Separately for each metadata language
▪ Differences between the metadata from various digital libraries have a negative impact on the search possibilities available to end users
▪ That is why metadata normalization is so important
▪ A basic analysis shows which elements are crucial and which should be easy to clean up
  ▪ The analysis was done in April 2009 on the metadata of 214 254 aggregated objects
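Building such dictionaries separately per element and per language might look like the sketch below. The record layout (a `lang` key plus a `metadata` dict of value lists) and the function name are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def build_dictionaries(records):
    """Count value usage separately for each (element, language) pair.

    Each record is assumed to be a dict with a 'lang' key (language of the
    description) and a 'metadata' dict mapping element names to value lists.
    """
    dictionaries = defaultdict(Counter)
    for record in records:
        for element, values in record["metadata"].items():
            dictionaries[(element, record["lang"])].update(values)
    return dictionaries
```

The number of keys in each counter gives the number of unique values for that element, and the sum of its counts gives how many times values were used.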
DC element     Number of unique values   How many times values were used in metadata   Average number of uses per value
format                              39                                       209 789                            5 379,2
language                           195                                       210 529                            1 079,6
type                               822                                       211 816                              257,7
rights                           1 192                                       246 093                              206,5
coverage                            66                                         2 390                               36,2
publisher                       18 002                                       310 764                               17,3
contributor                     12 979                                        83 464                                6,4
subject                         78 440                                       438 871                                5,6
relation                         9 292                                        48 319                                5,2
date                            47 581                                       209 589                                4,4
identifier                       6 426                                        27 666                                4,3
description                     43 657                                       180 391                                4,1
source                          16 996                                        52 506                                3,1
creator                         21 908                                        67 503                                3,1
title                          210 745                                       227 039                                1,1
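The last column of the table is simply the total number of uses divided by the number of unique values. The short sketch below recomputes it for a few rows and ranks elements by reuse; the function names are illustrative, and reading high reuse as a sign of an easier clean-up is an interpretation, not a claim made on the slides.

```python
# (unique values, total uses) for a few elements, taken from the table above.
STATS = {
    "format": (39, 209789),
    "language": (195, 210529),
    "type": (822, 211816),
    "title": (210745, 227039),
}

def average_uses(unique_values, total_uses):
    """Average number of times a single distinct value is used."""
    return total_uses / unique_values

def rank_by_reuse(stats):
    """Element names ordered from most to least reused values."""
    return sorted(stats, key=lambda name: average_uses(*stats[name]), reverse=True)
```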
▪ Format
  ▪ In 99% of descriptions: MIME type (e.g. text/html, image/x.djvu)
▪ Language
  ▪ In most cases: ISO 639-2 (pol, ger, lat, fre etc.)
  ▪ Sometimes a single value "pol, ger" instead of two separate values "pol" and "ger"
▪ Rights
  ▪ Name of the institution which holds the original object
▪ Type
  ▪ …
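Splitting combined language values such as "pol, ger" into separate codes could be sketched as follows. The accepted separators (comma, semicolon, whitespace) and the lower-casing are assumptions, not rules stated on the slides.

```python
import re

def split_language_values(values):
    """Split combined values such as 'pol, ger' into separate codes."""
    codes = []
    for value in values:
        for code in re.split(r"[,;]\s*|\s+", value.strip()):
            code = code.lower()
            # Skip empty fragments and keep each code only once, in order.
            if code and code not in codes:
                codes.append(code)
    return codes
```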
Values for "Type" (top 20)
Value                  English gloss       Number of objects with the value   % of aggregated objects   % of aggr. obj. (after clean-up)
czasopismo             periodical                                    44 709                     20,9%                              33,8%
gazeta                 newspaper                                     32 921                     15,4%                              31,3%
gazety                 newspapers                                    23 119                     10,8%
Czasopismo             periodical                                    20 965                      9,8%
książka                book                                          12 503                      5,8%
Gazeta                 newspaper                                     11 098                      5,2%
pocztówka              postcard                                       5 768                      2,7%
czasopisma             periodicals                                    4 962                      2,3%
text                   text                                           4 452                      2,1%
grafika                graphic                                        3 863                      1,8%
fotografia             photograph                                     3 596                      1,7%
artykuł z czasopisma   journal article                                3 164                      1,5%                               2,6%
artykuł                article                                        2 455                      1,1%
Czasopisma             periodicals                                    1 710                      0,8%
dzienniki urzędowe     official journals                              1 516                      0,7%
stary druk             old print                                      1 222                      0,6%                               1,1%
starodruk              old print                                      1 221                      0,6%
rysunek                drawing                                        1 094                      0,5%
rękopis                manuscript                                     1 062                      0,5%
mapa                   map                                            1 028                      0,5%
Sum                                                                                             85,1%                              68,9%
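The effect of the clean-up on the "Type" values can be reproduced with a small sketch that folds case and maps spelling variants to one canonical value. The variant mapping below is inferred from the figures in the table (for example, treating "artykuł" as "artykuł z czasopisma" matches the 2,6% total); it is an assumption, not the authors' documented rule set.

```python
from collections import Counter

TOTAL_OBJECTS = 214254  # aggregated objects in the April 2009 analysis

# Assumed mapping of spelling variants to one canonical value.
CANONICAL = {
    "czasopisma": "czasopismo",
    "gazety": "gazeta",
    "artykuł": "artykuł z czasopisma",
    "starodruk": "stary druk",
}

def normalize_type(value):
    """Lower-case the value, then map known variants to the canonical form."""
    key = value.strip().lower()
    return CANONICAL.get(key, key)

def merge_counts(counts):
    """Re-aggregate object counts after normalization."""
    merged = Counter()
    for value, count in counts.items():
        merged[normalize_type(value)] += count
    return merged

def share(count, total=TOTAL_OBJECTS):
    """Percentage of aggregated objects, rounded to one decimal place."""
    return round(100 * count / total, 1)
```

Applied to the counts in the table, the four periodical variants merge into one value and the three newspaper variants into another.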
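(Polish version of objects' description)
Value                                             English gloss                                        No. of associations   % of all associations
gazety regionalne                                 regional newspapers                                                12214                   2,56%
czasopisma                                        periodicals                                                         7716                   1,62%
prasa polska                                      Polish press                                                        5424                   1,14%
czasopisma niemieckie                             German periodicals                                                  5009                   1,05%
gazety sublokalne                                 sub-local newspapers                                                4968                   1,04%
Grodków                                           Grodków (town)                                                      4962                   1,04%
Grottkau                                          Grottkau (German name of Grodków)                                   4961                   1,04%
Wielkopolska                                      Greater Poland                                                      4422                   0,93%
19 w.                                             19th century                                                        4249                   0,89%
Prusy                                             Prussia                                                             4164                   0,87%
Czasopisma regionalne i lokalne polskie -19 w.    Polish regional and local periodicals, 19th century                 4140                   0,87%
wiadomości polityczne                             political news                                                      4094                   0,86%
Gazety polskie - 1918-1939 r.                     Polish newspapers, 1918-1939                                        4077                   0,85%
kultura                                           culture                                                             4071                   0,85%
czasopisma sublokalne                             sub-local periodicals                                               3813                   0,80%
Górny Śląsk                                       Upper Silesia                                                       3731                   0,78%
architektura                                      architecture                                                        3566                   0,75%
Wrocław                                           Wrocław (city)                                                      3515                   0,74%
Śląsk                                             Silesia                                                             3448                   0,72%
budownictwo                                       construction                                                        3388                   0,71%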
2009-09-27 ECDL 2009, Corfu, Greece
Confused with coverage: temporal, spatial
(Polish version of objects' description)
Value                                           No. of associations   % of all associations
Poznań                                                        54943                  12,62%
Telecomp Service na zlecenie PBI                              22310                   5,12%
Kraków                                                        13662                   3,14%
Warszawa                                                      11245                   2,58%
Toruń                                                         11221                   2,58%
Katowice                                                       8187                   1,88%
Drukarnia Polska                                               7998                   1,84%
Drukarnia Dziennika Poznańskiego T.A.                          6828                   1,57%
Warszawa : Telecomp Service na zlecenie PBI                    6824                   1,57%
Drukarnia Dziennika Poznańskiego S.A.                          5785                   1,33%
Nakładem F[ranciszka] T[adeusza] Rakowicza                     5406                   1,24%
Kielce                                                         5292                   1,22%
Krakowskie Wydawnictwo Prasowe RSW "Prasa"                     5137                   1,18%
Breslau                                                        5130                   1,18%
E. Neugebauer                                                  4959                   1,14%
Wangefield                                                     4959                   1,14%
Grottkau                                                       4959                   1,14%
Bydgoszcz                                                      4752                   1,09%
Drukarnia Dziennika Poznańskiego                               3923                   0,90%
Drukarnia J. I. Kraszewskiego                                  3869                   0,89%
Geographical location…
▪ There are over 40 digital libraries in Poland, filled with content and metadata coming from hundreds of institutions in different domains
▪ We harvest the metadata and provide a single point of access to it
  ▪ The PIONIER Network Digital Libraries Federation (http://fbc.pionier.net.pl/)
▪ The software used for this service will be released as open source by the end of this year
▪ Cooperation with Europeana (though not only that) requires clean-up and normalization of metadata
▪ This is currently our biggest challenge
▪ But we do not want to solve it only by technical means at the level of our aggregator
▪ Close cooperation with content providers, and some organizational changes prepared by them, should result in a more efficient and sustainable metadata improvement process than a purely technical solution
Thank you for your attention. Any questions?