August 2005 IFLA - CDNL 1 The International Internet Preservation Consortium (IIPC)

Upload: rebecca-mcdowell

Post on 27-Mar-2015

TRANSCRIPT

Page 1:

August 2005 IFLA - CDNL 1

The International Internet Preservation Consortium (IIPC)

Page 2:

Synopsis

• The IIPC - what is it?

• Background

• IIPC goals and organisation

• IIPC issues

• IIPC future?

• Concluding remarks

Page 3:

The IIPC - What is it?

• International collaboration for preserving Internet content

• Mission: Acquire, preserve and make accessible Internet (WWW) content for future generations

• 12 participating institutions
– National libraries of: Australia, Canada, Denmark, Finland, France, Iceland, Italy, Norway, Sweden
– The British Library (UK), the Library of Congress (USA) and the Internet Archive (USA)

• Chartered in Paris in July 2003; agreement in effect for 3 years

• Future not decided, but IIPC seeks to involve national libraries

• IIPC welcomes inquiries about future membership

Page 4:

Background

• The Internet is a specific medium with attributes of:
– Books, journals, radio, images, video

• Characterised by
– Exponential growth since 1994
– Proliferation
– Immense volume
– Anybody can publish
– Accessible everywhere

Page 5:

Archiving the Web – WHY - Who

Presently and in the future, a large and significant part of our culture will exist ONLY on the Internet

• If Web pages are not collected in an orderly and continuous manner they will disappear, and with them an important part of the world's cultural and intellectual heritage

Therefore we should:
• Preserve material that is only available on the Web
• Preserve scholarly data and secure access to it because it is:
– Important and valuable
– Cited
– Difficult to find and locate

A logical extension of national libraries' mission and goals

LEGAL DEPOSIT LAW

Page 6:

Evolution of Legal Deposit Law in Iceland (timeline figure; labels shown: 1662, 1697, 1886, 1909-1941, 1949, 1977, 2003, WWW)

Page 7:

Pre IIPC Development

• 1996 - Internet Archive, Sweden, Australia

• 1998 – Nordic co-operation

• 2000 - 2003 – LoC, BnF, UK, Austria, Slovenia, Czech Republic, Lithuania, Canada

• IFLA 2002: Brewster Kahle presents the IA and Web archiving

• September 2002 – IA proposes a project with a few libraries

• September 2002 – Meeting in Rome (during ECDL)

• January 2003 – Meeting in Paris (COBRA +)

• July 2003 – IIPC incorporated

Page 8:

IIPC Goals

To build a virtual global distributed collection to ensure that the distributed and linked nature of the original web material is not lost forever

Find a new way of collaborating among national heritage institutions, in order to create a network of heritage institutions that can build and preserve the global distributed collection

(Diagram labels: Global information space of the Internet; Global Distributed Collection)

Page 9:

IIPC Organisation

• Steering group: one person from each institution

• Working groups

– Access

– Content Management

– Deep Web

– Framework

– Metrics and Testbed

– Researchers Requirements

Page 10:

IIPC Objectives

Collaborative work, within each country's legislative framework, to identify, develop and facilitate implementation of solutions for selecting, collecting, preserving and providing access to internet content

Facilitate international coverage of internet content archive collections within national legal frameworks, in accordance with national collection policies

International advocacy for initiatives that encourage the collection and preservation of, and access to, internet content

Provide a forum for sharing knowledge about internet content archiving both within the Consortium and beyond

Develop and recommend standards

Develop interoperable tools and techniques to acquire, archive and provide access to web sites

Raise awareness of internet preservation issues and initiatives through conferences, workshops, training events and publications.

Page 11:

IIPC Results

Intangible

• Common understanding and clarification of issues

• Definition of the overall architecture for web archiving with system interface specifications

• Proposed standards for Web Archive file format and Metadata

• Access requirements with Use cases illustrating common understanding of the functionality of a web archive

• Identification and requirement specification of new access tools

• Curator tool for controlling and scheduling the collection of web content

• Definition of the WARC (Web ARChive) file format to store information blocks harvested by web crawlers

Page 12:

IIPC Results

Tangible

• Heritrix Crawler/Harvester
– “Smart crawling”
– Continuous harvesting

• Full Text Indexer/Search Engine – searching/browsing the content of a Web Archive

• Extract data from an archived database

• ARC file manipulation tool

Page 13:

IIPC Future - Issues

Collection building
• Broad scope: representative collection of the Web
• Narrow scope: in-depth collection of selected sites

Registration
• Cataloguing is not possible
• Indexing of text (with time element)

Access
• Direct using a URL
• Search engine (Google type)
• Data mining (analytical and statistical methods)

Long-term preservation of a web archive: a conscious omission
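The registration and access points above (indexing of text with a time element, direct access using a URL) can be illustrated with a minimal sketch. This is not IIPC software: the class `CaptureIndex`, its methods, and the sample URL and page text are all hypothetical, chosen only to show one way a capture time might be attached to each indexed page.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class CaptureIndex:
    """Toy in-memory index of archived pages (hypothetical, for illustration)."""

    def __init__(self):
        # url -> list of (capture_time, text), kept sorted by capture_time
        self._captures = defaultdict(list)

    def add(self, url, capture_time, text):
        """Record one harvested copy of a page."""
        self._captures[url].append((capture_time, text))
        self._captures[url].sort(key=lambda capture: capture[0])

    def lookup(self, url, as_of):
        """Direct access by URL: latest capture at or before `as_of`, else None."""
        captures = self._captures.get(url, [])
        times = [capture_time for capture_time, _ in captures]
        pos = bisect_right(times, as_of)
        return captures[pos - 1] if pos else None

    def search(self, word, start, end):
        """Text search limited to captures made between `start` and `end`."""
        word = word.lower()
        return sorted(
            (capture_time, url)
            for url, captures in self._captures.items()
            for capture_time, text in captures
            if start <= capture_time <= end and word in text.lower().split()
        )


# Sample data (assumed values, not taken from any real archive)
index = CaptureIndex()
index.add("http://example.org/", datetime(2004, 1, 10), "library news")
index.add("http://example.org/", datetime(2005, 6, 1), "web archive news")

# Which copy of the page was current at the end of 2004?
hit = index.lookup("http://example.org/", datetime(2004, 12, 31))

# Which captures made during 2005 mention the word "archive"?
found = index.search("archive", datetime(2005, 1, 1), datetime(2005, 12, 31))
```

The point of the sketch is that every entry carries a capture time, so both URL lookup and text search can be answered for a chosen date or period rather than only for the latest copy.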

Page 14:

IIPC Future

Current IIPC charter ends in July 2006

Proposals for continuation will be discussed at the next meeting in late October 2005

Challenge is to keep the work focused and effective

Many problems remain unsolved; hopefully new members can help

Page 15:

Concluding remarks

Creating and accessing a Web Archive is:
• Very complex, challenging and exciting - not a problem nor a burden
• Collection – Preservation – Access

The first phase has started

Our knowledge of the Web and its contents is incomplete

All present software and tools must be improved

International cooperation needed to:
• Define and develop standards, techniques and methods
• Create national and even global Web Archives
• Provide access to the archives

Page 16:

EXTRACT FROM THE ARCHIVE (diagram; labels: ARCHIVE; TXT, SOUND, VIDEO, IMAGE; Create index, shown four times; INDEX DATABASE; INTERNET BROWSER)

Page 17:

National Bibliography - from Print to Digital (diagram; labels: Books, Journals, Sound Rec., Video, Micro, CDs, Manuscr., Internet, Films; INDEX; Present National Bibliography; National Bibliography reflecting new law; Bibliography of National Cultural Heritage; National; Gallery; Archive; Museum)