October 21st, 2010 www.gbv.de
Current trends in library search:
From electronic card boxes to large scale,
aggregated search engines
Till Kinstler, Verbundzentrale des GBV (VZG)
Taken from Avram, Henriette D.: The MARC Pilot Project. Final Report. Library of Congress, Washington, DC., 1968, http://www.eric.ed.gov/ERICWebPortal/detail?accno=ED029663
01410nam a2200349 i 4500
001 177062495
003 DE-601
005 20100706235250.0
008 950105s1957 xxu 000 0 eng d
016 7 $a 452218721 $2 DE-101
040 $a GyGoGBV $b ger $e rakwb
041 0 $a eng
044 $a xxu $a xxk
245 00 $a Information systems in documentation : $b based on the Symposium on Systems for Information Retrieval held at Western Reserve University, Cleveland, Ohio, in April, 1957 / $c ed.: Jesse H. Shera; A. Kent; J. W. Perry.
260 $a New York, NY [u.a.] : $b Interscience Publ., $c 1957.
300 $a XV, 639 S : $b Ill., graph. Darst ; $c gr. 8.
490 0 $a Advances in documentation and library science ; $v 2
653 0 $a Information storage and retrieval systems
700 1 $a Shera, Jesse H., $0 (DE-601)400477432.
700 1 $a Kent, A..
700 1 $a Perry, J. W..
710 2 $a Symposium on Systems for Information Retrieval $c (1957, Cleveland, Ohio)
711 2 $a Symposium on Systems for Information Retrieval $d (1957.04. : $c Cleveland, Ohio)
830 $v 2 $w (DE-601)129356271
950 $a Literaturrecherche $a Kongreß $2 GBV
050 0 $a Z695.92
060 0 $a Z 1008
082 00 $a 029.75
084 $a 06.74 ; Informationssysteme $2 bcl
084 $a 35.99 ; Chemie: Sonstiges $2 bcl
900 $a GBV $b SUB+Uni Göttingen <7> $d !FMAG! ZA 18582:2 $x L $z LC
954 $a 40 $b 742655784 $c 01 $d ZA 18582:2 $e u $x 0007
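A minimal sketch of how display lines like the ones in the record above could be split into tag, indicators and subfields. It assumes the simplified text layout shown on this slide (tag, optional indicators, then `$x value` subfields), not the binary MARC exchange format; the function name `parse_field` is hypothetical, and the leader line at the top of the record is expected to be skipped by the caller.

```python
import re

def parse_field(line):
    """Split one display line of the record into its parts (illustrative only)."""
    tag, _, rest = line.strip().partition(" ")
    if "$" not in rest:
        # Control fields such as 001 or 005 carry a plain value, no subfields.
        return {"tag": tag, "indicators": "", "subfields": [], "value": rest.strip()}
    # Anything between the tag and the first "$" is treated as indicators.
    indicators = rest.partition("$")[0].strip()
    # Each subfield is "$", a one-character code, whitespace, then its value.
    subfields = [
        (code, value.strip())
        for code, value in re.findall(r"\$(\w)\s+([^$]*)", rest)
    ]
    return {"tag": tag, "indicators": indicators, "subfields": subfields, "value": None}

field = parse_field("260 $a New York, NY [u.a.] : $b Interscience Publ., $c 1957.")
for code, value in field["subfields"]:
    print(field["tag"], code, value)
```

Even this small example shows why indexing such data is not trivial: punctuation like the trailing ":" and "," belongs to the cataloguing rules, not to the values themselves.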
GBV (Common Library Network)
~ 400 (mainly academic) libraries
~ 120 million data records describing books, articles, digital objects, ... available through these libraries
What do we do with this data?
„In addition, we have also found that the poor usability, high complexity, and lack of integration of many electronic resource discovery systems, have raised the entry threshold of information technology literacy. This acts as a barrier to information search and retrieval. […]
Users find database structures hinder. They have to learn the procedural knowledge for using a particular database as well as have some basic knowledge of how the data table is organised and what subject matter the built-in thesauri refers to; both have limited transferability. The participants did not appear to lack information technology or digital literacy, as they had demonstrated they were able to use other internet-based search and retrieval tools.“
(Wong, W. ; Stelmaszewska, H. ; Barn, B. ; Bhimani, N. ; Barn, S.: JISC User Behaviour Observational Study: User Behaviour in Resource Discovery. Final Report / JISC. Version: November 2009. http://www.jisc.ac.uk/media/documents/publications/programme/2010/ubirdfinalreport.pdf)
Search Engine Index
Suchkiste goals:
Provide a single access point to the DFG Nationallizenzen collection (~150 million digital resources)
Make better use of the data with a search engine built on information retrieval technology → Solr
User experience based on web standards
Open up library data silos to the web
Challenges: data collection
Where to get it? Legal questions?
Coverage?
How to get/transfer it? OAI-PMH? RSS? ftp? „Dumps“? „XML“? Tapes?
Updates?
Processing? Normalisation? Deduplication/Clustering? Variety of data (formats), structure (implicit and explicit), documentation?, messy data, errors in data (encoding, structure...)
Storage and management of large data sets
…
→ lots of manual(!) work
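To make the "Normalisation? Deduplication/Clustering?" point concrete, here is a rough sketch of clustering records by a normalised match key. The record layout (`id`, `title`, `year`) and the choice of key are assumptions for illustration only; real library data usually needs far more careful matching than this.

```python
import re
import unicodedata
from collections import defaultdict

def normalise(text):
    # Decompose accented characters and drop the combining marks (ö -> o).
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then collapse every run of other characters to one space.
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()

def match_key(record):
    # Assumed key: normalised title plus publication year.
    return (normalise(record["title"]), record.get("year", ""))

def cluster(records):
    # Group record ids that share the same match key.
    clusters = defaultdict(list)
    for record in records:
        clusters[match_key(record)].append(record["id"])
    return dict(clusters)

records = [
    {"id": "a", "title": "Information systems in documentation", "year": "1957"},
    {"id": "b", "title": "Information Systems in Documentation.", "year": "1957"},
    {"id": "c", "title": "Göttingen studies", "year": "1957"},
]
print(cluster(records))
```

Records "a" and "b" end up in one cluster because case and trailing punctuation are ignored. Characters without a Unicode decomposition (such as ß) are simply dropped here, which is one of many places where a real pipeline would need manual rules.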
Challenges: Search Engines
How to index structured library data?
Relevance ranking?
Factors? (TF/IDF?, popularity?, availability?, „freshness“?, „context“?, …)
Use structure and content of (library) metadata?
Mixing „metadata“ and „fulltext“?
Mixing (data on) different media?
Minor issues: search suggestions, stemming
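One way to picture "TF/IDF" combined with "use structure of metadata" is a toy scorer that weights term matches by the field they occur in. This is a simplified sketch, not the scoring formula used by Solr or Lucene; the field names and boost values in `FIELD_BOOSTS` are assumptions.

```python
import math
from collections import Counter

# Assumed weights: a hit in the title counts more than one in the full text.
FIELD_BOOSTS = {"title": 3.0, "subject": 2.0, "fulltext": 1.0}

def tokenize(text):
    return text.lower().split()

def rank(query, docs):
    """Return record ids ordered by a field-weighted TF-IDF score."""
    n = len(docs)
    # Document frequency: in how many records does a term occur (any field)?
    df = Counter()
    for doc in docs.values():
        terms = set()
        for field in FIELD_BOOSTS:
            terms.update(tokenize(doc.get(field, "")))
        df.update(terms)
    scores = {}
    for doc_id, doc in docs.items():
        score = 0.0
        for term in tokenize(query):
            if df[term] == 0:
                continue  # term occurs nowhere, contributes nothing
            idf = math.log(1 + n / df[term])
            for field, boost in FIELD_BOOSTS.items():
                tf = tokenize(doc.get(field, "")).count(term)
                score += boost * tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "r1": {"title": "information retrieval systems"},
    "r2": {"title": "chemistry handbook", "fulltext": "a note on retrieval"},
    "r3": {"title": "library history"},
}
print(rank("retrieval", docs))
```

With these weights, a record matching in the title ranks above one matching only in the full text. Factors such as popularity, availability or freshness would enter as further terms in the score.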
Challenges: User Interfaces
Overall user experience
Single search box ↔ advanced search making use of data structure
Browsing, Visualisation, Refining, Facets?
Making it part of the web
Search suggestions, spelling corrections, ...
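For the single search box with facets, a Solr-backed interface typically sends the user's input as `q` and asks for facet counts in the same request. The sketch below only builds such a request URL; the base URL, core name and field names are hypothetical, while `q`, `wt`, `facet`, `facet.field` and `fq` are standard Solr request parameters.

```python
from urllib.parse import urlencode

def build_select_url(base_url, user_query, facet_fields, filters=()):
    """Build a Solr select URL with facet counts and optional filter queries."""
    params = [("q", user_query), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field the interface offers for refining.
    params += [("facet.field", field) for field in facet_fields]
    # Each facet value the user clicked becomes a filter query (fq).
    params += [("fq", f'{field}:"{value}"') for field, value in filters]
    return f"{base_url}/select?{urlencode(params)}"

url = build_select_url(
    "http://localhost:8983/solr/catalogue",  # hypothetical Solr core
    "information retrieval",
    facet_fields=["year", "language"],       # hypothetical field names
    filters=[("language", "eng")],
)
print(url)
```

Keeping the clicked facet values in `fq` rather than appending them to `q` leaves the user's query untouched, so refining a result list does not change how it is ranked.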
Next:
One central index of all GBV data (~120 million records)
Beyond opening HTML record views to the web, opening data for use on the web: linked data