Preserving the Union . . . WEBSITE
(and Amherst and Hamilton and Skidmore)
J. Douglass Klein
Associate Dean for Information Technology
Union College
Acknowledgements
– Tom McFadden; Ellen Fladger; Dave Cossey; Diane Keller; Tom Smith
– Daria D'Arenzo; Susan Edelberg
– Peter MacDonald; Ned Stankus
– Leo Geoffrion
… and many, many more.
With funding from -
Primary Reference: http://www.union.edu/PUBLIC/ECODEPT/kleind/wwwarchive
So, what did your website look like in…
1997 ?
1998 ?
2000 ?
2001 ??
2003 ?
Outline: Archiving the web
What is the college website?
Why do we care?
 - What do we want to save?
 - Who are we saving it for?
How do we do it?
The Union web [original diagram]
Data Storage: Digits to Dust
"Digital information lasts forever, or five years--whichever comes first,"
-- Jeff Rothenberg, senior computer scientist at RAND Corp.
LONGEVITY: Magnetic tape breaks down from exposure to air, heat, and humidity; optical disks can decay and surface dyes can fade in sunlight, sometimes causing the loss of information stored on them.
OBSOLESCENCE: As UNIVAC drives or programs such as WordPerfect 4.0 become obsolete, information stored when using them may be lost, too.
MIGRATION: Information can be lost or corrupted as it is transferred periodically from one type of media or computer system to a newer one.
...And That's Just One Problem
(1998)
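One common safeguard against the migration losses described above is a fixity check: record a checksum before the copy and compare it afterward. The sketch below is illustrative only. The function names `sha256_of` and `migrate_with_fixity` are hypothetical, SHA-256 is an assumed choice of hash, and the file paths and page content are made up for the demonstration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity(source: Path, destination: Path) -> str:
    """Copy source to destination and verify the copy is bit-identical.

    Raises IOError if the checksums differ, so a corrupted migration
    is noticed at copy time rather than years later.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Fixity check failed for {source} -> {destination}")
    return after


if __name__ == "__main__":
    # Hypothetical example: move one saved page from "old" to "new" media.
    with tempfile.TemporaryDirectory() as tmp:
        old_media = Path(tmp) / "old_media" / "index.html"
        new_media = Path(tmp) / "new_media" / "index.html"
        old_media.parent.mkdir(parents=True)
        old_media.write_text("<html><body>Union College, 1997</body></html>")
        checksum = migrate_with_fixity(old_media, new_media)
        print("migrated intact, sha256 =", checksum)
```

A check like this only confirms that the bits survived the move. It says nothing about whether future hardware or software can still interpret them, which is the separate obsolescence problem.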
Dark Ages II
“Author shows why our data is at far greater risk than we've ever imagined, and envisions a frightening future, where so much critical information is lost that civilization itself could collapse. . .”
- amazon.com capsule review
NINCH
http://www.ninch.org/forum/price.report.html
Challenge of digital preservation versus
Benefits of vast access versus
Issues of intellectual property rights versus
Legal incentives to delete digital data
Who wants to know?
Historians – the web in context; what was it, how was it used
Institutional Records – College policies; curriculum; etc.
Lawyers and Accountants – every individual transaction
Data Extinction
• Migration
• Emulation
• Encapsulation
• Universal Virtual Computer
Claire Tristram, “Data Extinction,” MIT Tech Review, Oct. 2002
Archiving Work in Progress
BTN archiving
DSpace Web archive database
(the Candle project)
Video recording
Wayback Machine
http://seattlepi.nwsource.com/dayart/20010228/226messy.jpg
DSpace
The Elements
Title: A name given to the resource.
Creator: An entity primarily responsible for making the content of the resource.
Subject: A topic of the content of the resource.
Description: An account of the content of the resource.
Publisher: An entity responsible for making the resource available.
Contributor: An entity responsible for making contributions to the content of the resource.
Date: A date of an event in the lifecycle of the resource.
Type: The nature or genre of the content of the resource.
Format: The physical or digital manifestation of the resource.
Identifier: An unambiguous reference to the resource within a given context.
Source: A reference to a resource from which the present resource is derived.
Language: A language of the intellectual content of the resource.
Relation: A reference to a related resource.
Coverage: The extent or scope of the content of the resource.
Rights: Information about rights held in and over the resource.
http://dublincore.org/documents/dces/
Also: Dependencies (HW & SW); Context
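To make the element list concrete, here is a minimal sketch of how one archived page might be described with the fifteen Dublin Core elements, plus the two extra fields noted above (dependencies and context). It is an assumption-laden illustration, not DSpace's own ingest format: the helper `build_record`, the `<record>` wrapper, the un-namespaced extra fields, and every sample value are hypothetical. Only the element names and the `dc` namespace URI come from the Dublin Core specification.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen element names, in the order listed on the slide.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_record(fields: Mapping[str, str],
                 extras: Optional[Mapping[str, str]] = None) -> ET.Element:
    """Build an XML record from Dublin Core fields plus local extras.

    `fields` may contain any subset of the fifteen element names.
    `extras` holds local, non-Dublin-Core fields (hypothetical here),
    such as hardware/software dependencies and context.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for name in DC_ELEMENTS:
        if name in fields:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = fields[name]
    for name, value in (extras or {}).items():
        ET.SubElement(record, name).text = value
    return record


if __name__ == "__main__":
    # All values below are invented for illustration.
    snapshot = build_record(
        {
            "title": "College home page snapshot",
            "publisher": "Union College",
            "date": "1997",
            "type": "InteractiveResource",
            "format": "text/html",
            "identifier": "http://www.union.edu/",
            "language": "en",
        },
        extras={
            "dependencies": "HTML 3.2 browser",
            "context": "Public home page as seen by prospective students",
        },
    )
    print(ET.tostring(snapshot, encoding="unicode"))
```

Keeping the extra fields outside the `dc` namespace makes it obvious which parts of the record follow the standard and which are local additions.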
Archiving in the Digital Age: There’s a will, but is there a way?
Kevin Guthrie, President, JSTOR
Conclusion
There are still many issues left to solve. One is ensuring systematic migration to media (and software) that continue to be viewable; another is the thorny issue of web pages generated on the fly from underlying databases.
Nevertheless, the first lesson is: Think about what it is that you need to preserve, and why. Then start asking the technical questions. Not the other way around. The solutions are not one-size-fits-all, because the problems are not.
The second lesson is: Chances are good that nothing you do now will last more than a few decades at best unless you rethink, refresh, and migrate to newer media.
Conclusion
The problem – digital data, stored on complex networks, plus rapid obsolescence of hardware, software, and storage media
The future – we will not be able to save everything; nor should we
The solution – planning, prioritizing, commitment, continuous attention
DON’T DO NOTHING