storage and preservation
![Page 1: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/1.jpg)
Storage and Preservation
Week 3
LBSC 671
Creating Information Infrastructures
![Page 2: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/2.jpg)
Physical Storage
• Segregate by:
  – Users (e.g., Chemistry library)
  – Type (e.g., audiovisual materials)
  – Usage frequency (e.g., offsite storage)
  – Size (e.g., folios)
• Arrange in a way that facilitates access
  – Topical shelf order (e.g., Dewey Decimal System)
• Foster preservation
  – Environment (temperature, humidity, light)
  – Access controls (closed stacks, gloves, …)
![Page 3: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/3.jpg)
High-Density Shelving
http://www.kmhsystems.com/high-density-storage.html
![Page 4: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/4.jpg)
Compact Storage Robot
Kyushu University, Japan
![Page 5: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/5.jpg)
Closed Stacks
University of Education, Ghana
![Page 6: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/6.jpg)
Preservation
c. 3000 BCE
![Page 7: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/7.jpg)
Organic Decay
• Rag paper: 300-2,000 years
• Acidic paper: 25-50 years
• Acetate film: 40 years
• Nitrate film: 40-100 years
Image Permanence Institute, 2012
ISO 11799:2003
![Page 8: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/8.jpg)
Threats to Physical Collections
• Organic decay
• Intentional actions
  – Pilferage and vandalism
  – Official acts
• Disasters
  – Natural disasters
    • Flood, tornado, earthquake, …
  – Accidents
    • Fire, sprinkler malfunction, …
  – Armed conflict
![Page 9: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/9.jpg)
Disaster Mitigation Examples
• Flood:
  – Know where you can vacuum freeze dry
    • Decide quickly what to freeze
    • Air dry or dehumidify the rest
  – Immerse wet or muddy tape or film in water
    • Then air dry or dehumidify
  – Replace wet archival boxes immediately
• Fire:
  – Handle as fragile, wrap in clean paper
  – Pack between cardboard to stiffen
http://matrix.msu.edu/~disaster/balcplan.php
![Page 10: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/10.jpg)
Digital Preservation
• Preservation of born-digital materials
  – Preserving appearance and interpretability
  – Preserving behavior
• Digitization for preservation
  – Scanning (of paper, of microfilm)
  – Audio digitization
  – Video digitization
  – Volumetric imaging
    • Digital holography, computational tomography
![Page 11: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/11.jpg)
Binary Data Representation
Example: American Standard Code for Information Interchange (ASCII)
01000001 = A    01100001 = a
01000010 = B    01100010 = b
01000011 = C    01100011 = c
01000100 = D    01100100 = d
01000101 = E    01100101 = e
01000110 = F    01100110 = f
01000111 = G    01100111 = g
01001000 = H    01101000 = h
01001001 = I    01101001 = i
01001010 = J    01101010 = j
01001011 = K    01101011 = k
01001100 = L    01101100 = l
01001101 = M    01101101 = m
01001110 = N    01101110 = n
01001111 = O    01101111 = o
01010000 = P    01110000 = p
01010001 = Q    01110001 = q
…               …
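The ASCII table above can be reproduced with a few lines of Python. This is a minimal sketch; the function names `to_ascii_bits` and `from_ascii_bits` are made up for this illustration and are not from the slides.

```python
def to_ascii_bits(text):
    """Return one 8-bit binary string per ASCII character in text."""
    return [format(ord(ch), "08b") for ch in text]


def from_ascii_bits(bit_strings):
    """Turn a list of 8-bit binary strings back into text."""
    return "".join(chr(int(bits, 2)) for bits in bit_strings)


# Print a few rows in the same "bits = letter" layout as the slide.
for ch, bits in zip("ABCabc", to_ascii_bits("ABCabc")):
    print(f"{bits} = {ch}")

# Upper- and lower-case letters differ in a single bit (value 32).
case_bit = ord("a") ^ ord("A")
```

Reading the two columns side by side shows the same thing as `case_bit`: each lower-case code is the upper-case code with one extra bit set.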
![Page 12: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/12.jpg)
Units of Size
| Unit | Abbreviation | Size (bytes) |
|---|---|---|
| bit | b | 1/8 |
| byte | B | 1 |
| kilobyte | KB | 2^10 = 1,024 |
| megabyte | MB | 2^20 = 1,048,576 |
| gigabyte | GB | 2^30 = 1,073,741,824 |
| terabyte | TB | 2^40 = 1,099,511,627,776 |
| petabyte | PB | 2^50 = 1,125,899,906,842,624 |
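Each unit in the table is 2^10 = 1,024 times the one before it. A short sketch of that arithmetic follows, using the binary (power-of-two) convention from the slide. The helper names `unit_size_bytes` and `human_readable` are illustrative assumptions.

```python
ABBREVIATIONS = ["B", "KB", "MB", "GB", "TB", "PB"]


def unit_size_bytes(index):
    """Size in bytes of the unit at this position (0 = byte, 1 = KB, ...)."""
    return 2 ** (10 * index)


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    while value >= 1024 and index < len(ABBREVIATIONS) - 1:
        value /= 1024
        index += 1
    return f"{value:.1f} {ABBREVIATIONS[index]}"


for i, abbreviation in enumerate(ABBREVIATIONS):
    print(f"{abbreviation}: {unit_size_bytes(i):,} bytes")
```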
![Page 13: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/13.jpg)
![Page 14: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/14.jpg)
![Page 15: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/15.jpg)
Georges Seurat, A Sunday Afternoon on the Island of La Grande Jatte
Nothing new…
![Page 16: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/16.jpg)
Basic Audio Coding
• Sample at twice the highest frequency
  – 8 bits or 16 bits per sample
• Speech (0-4 kHz) requires 8 kB/s
  – Standard telephone channel (1-byte samples)
• Music (0-22 kHz) requires 172 kB/s
  – Standard for CD-quality audio (2-byte samples)
Sampler
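The two data rates on this slide follow from sample rate × bytes per sample × channels. The sketch below reproduces them under stated assumptions: CD audio is taken as stereo (two channels) sampled at 44.1 kHz, i.e. a highest frequency of 22,050 Hz, and the 172 figure is read in units of 1,024 bytes. The function name `pcm_bytes_per_second` is made up for this example.

```python
def pcm_bytes_per_second(highest_freq_hz, bytes_per_sample, channels=1):
    """Uncompressed data rate when sampling at twice the highest frequency."""
    sample_rate = 2 * highest_freq_hz
    return sample_rate * bytes_per_sample * channels


# Telephone speech: 0-4 kHz, 1-byte samples, one channel.
speech = pcm_bytes_per_second(4_000, 1)

# CD-quality music: 2-byte samples, assumed stereo at 44.1 kHz.
cd = pcm_bytes_per_second(22_050, 2, channels=2)

print(f"speech: {speech} bytes/s")
print(f"CD:     {cd} bytes/s (about {cd / 1024:.0f} units of 1,024 bytes/s)")
```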
![Page 17: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/17.jpg)
MPEG Encoding
Frame Types
I1 B1 B2 B3 P1 B4 B5 B6 P2 B7 B8 B9 I2
I: Intra. Encode complete image, similar to JPEG
P: Forward Predicted. Motion relative to previous I's and P's
B: Bidirectionally Predicted. Motion relative to previous and future I's and P's
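Because a B frame refers to a future I or P frame, that future frame has to be decoded first. The sketch below reorders the display sequence from the slide into one possible decode order. It is a simplified illustration, not an MPEG implementation, and the function name `decode_order` is an assumption.

```python
def decode_order(display_order):
    """Reorder frames so each B frame follows both anchors it depends on."""
    ordered = []
    pending_b = []  # B frames waiting for their future anchor (I or P)
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            # An I or P frame: emit it, then the B frames that needed it.
            ordered.append(frame)
            ordered.extend(pending_b)
            pending_b = []
    ordered.extend(pending_b)  # trailing B frames with no later anchor
    return ordered


display = "I1 B1 B2 B3 P1 B4 B5 B6 P2 B7 B8 B9 I2".split()
print(" ".join(decode_order(display)))
```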
![Page 18: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/18.jpg)
Volumetric Imaging
![Page 19: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/19.jpg)
Rotating Storage Media
• Fixed magnetic disk
  – Hard drives
• Removable magnetic disk
  – Floppy disk
• Removable optical disc
  – CD, DVD, Blu-ray
![Page 20: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/20.jpg)
Magnetic Disk (Hard Drive)
Shelly, Cashman and Vermatt, Discovering Computers, 2004
![Page 21: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/21.jpg)
Optical Disc
![Page 22: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/22.jpg)
Optical Disc Technologies
near infrared, red, violet
![Page 23: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/23.jpg)
Magnetic Tape
• Tapes store data sequentially
  – Fast transfer, but no practical “random access”
• Used only for low-use storage
  – Disaster recovery, offline storage
![Page 24: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/24.jpg)
Solid-State Memory
• ROM
  – Does not require power to retain content
  – Used for “Basic Input/Output System” (BIOS)
• RAM
  – Cheap and fast, but works only while power is on
• Flash memory (Solid State Disk, memory sticks)
  – Much faster “random access” than rotating disk
    • ~10,000 times faster, but ~10 times more expensive per bit
  – Limited number of lifetime write operations (~5,000)
    • But Zipf’s law permits “wear leveling”
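A toy sketch of why wear leveling helps when writes follow a Zipf-like pattern, where a few blocks receive most of the writes. Everything here is an illustrative assumption: the function names, the block count, and the "always write to the least-worn block" policy, which ignores the remapping overhead a real flash controller would have.

```python
def zipf_like_writes(num_blocks, top_count):
    """Logical block ids where the block of rank r is written top_count // r times."""
    writes = []
    for rank in range(1, num_blocks + 1):
        writes.extend([rank - 1] * (top_count // rank))
    return writes


def wear_without_leveling(writes, num_blocks):
    """Each logical block always lands on the same physical block."""
    wear = [0] * num_blocks
    for block in writes:
        wear[block] += 1
    return wear


def wear_with_leveling(writes, num_blocks):
    """Each write is redirected to the least-worn physical block."""
    wear = [0] * num_blocks
    for _ in writes:
        target = wear.index(min(wear))
        wear[target] += 1
    return wear


writes = zipf_like_writes(num_blocks=8, top_count=840)
direct = wear_without_leveling(writes, 8)
leveled = wear_with_leveling(writes, 8)
print("max wear without leveling:", max(direct))
print("max wear with leveling:   ", max(leveled))
```

With the same total number of writes, the most heavily used block wears far more slowly once writes are spread out, so the per-block write limit is reached much later.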
![Page 25: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/25.jpg)
Threats to Digital Collections
• Business decisions
  – Termination of service
  – Termination of infrastructure support
    • e.g., reading Amiga files, displaying WordPerfect
• Malfunctions
  – Hardware failure, operator error, software bugs, …
• Vandalism (hackers)
• Disasters
  – Physical risks to servers
  – Electromagnetic pulse
![Page 26: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/26.jpg)
![Page 27: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/27.jpg)
http://www.crashplan.com/medialifespan/
![Page 28: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/28.jpg)
Media Migration
• What format should old tapes be converted to?
  – Newer tape
  – Rotating media
  – Solid state disks
• How often must we “refresh” these media?
![Page 29: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/29.jpg)
Risk Management
• Redundancy drives down uncorrelated risk
  – Let p be the probability of loss of one copy
  – Then p*p*p is the chance of losing all three copies held at 3 independent sites
  – Example: if p=0.01 then p*p*p=0.000001
• Two fundamental problems:
  – Unanticipated correlation
    • For example, an operating system bug
  – Underestimated “black swan” probabilities
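The p*p*p calculation, and the way correlation undermines it, can be sketched as follows. The second function is a deliberately simple assumed model, not something from the slides: q is a hypothetical probability that one shared cause (such as the same software bug at every site) destroys all copies at once.

```python
def independent_loss(p, copies):
    """Probability that every copy is lost when losses are independent."""
    return p ** copies


def loss_with_common_cause(p, copies, q):
    """Same, plus an assumed common-cause event of probability q
    that wipes out all copies together."""
    return q + (1 - q) * p ** copies


p = 0.01
print(independent_loss(p, 3))                  # the slide's p*p*p case
print(loss_with_common_cause(p, 3, q=0.001))   # dominated by q, not by p
```

Even a small q swamps the benefit of adding further independent copies, which is the point of the "unanticipated correlation" bullet.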
![Page 30: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/30.jpg)
Layered Defense
• Good storage practices
  – Offline: media migration
  – Online: uninterruptible power, RAID, backups
• Distributed storage
  – Storage Resource Broker (SRB), LOCKSS, …
• Air gaps
  – Interrupt unexpected correlation
![Page 31: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/31.jpg)
Source: Wikipedia
Data Centers
![Page 32: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/32.jpg)
Shared Data Center Locations
http://www.datacentermap.com/usa/datacenters.html
![Page 33: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/33.jpg)
Data Center Electricity Use (USA) 2010
Jonathan Koomey, Analytics Press, 2010
![Page 34: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/34.jpg)
Digital Federal Depository Library
http://lockss-usdocs.stanford.edu
![Page 35: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/35.jpg)
LOCKSS Distributed Repair
![Page 36: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/36.jpg)
ITHAKA
• JSTOR digitization
  – Back runs of journals
  – Recently expanded to books
• Portico preservation
  – Centralized management, originally for journals
    • Release triggers: discontinuation, loss of access
  – Also service for books and datasets
![Page 37: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/37.jpg)
HathiTrust
• Centralized repository for digitized books
  – Google Books digitization (via owning libraries)
  – Microsoft book search (ran from 2006-2008)
  – Internet Archive
    • Million Book Project, Project Gutenberg, contributions, …
  – Cooperative digitization
6,549,680 Total volumes
3,798,116 Book titles
153,311 Serial titles
1,300,896 Public Domain
As of August 13, 2010
![Page 38: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/38.jpg)
Jeremy York, IFLA 2010
![Page 39: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/39.jpg)
Indiana University Digitization
![Page 40: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/40.jpg)
Preserving Behavior
• Word processors
  – Formatting, track changes, undo deleted text
• Spreadsheets
  – Formulas, visualizations
• Databases
  – Queries, forms, derived values
• Computer-Aided Design (CAD)
  – Display, modification, manufacturing
• Software
  – Simulation, games, embedded systems, …
![Page 41: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/41.jpg)
Behavior Preservation Strategies
• Format migration
  – For example, convert WordPerfect to PDF
• Emulation
  – Allows running old software on newer systems
![Page 42: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/42.jpg)
http://www.ibiblio.org/apollo/
Apollo Guidance Computer Emulation
![Page 43: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/43.jpg)
An Integrated Strategy
• Delay decay of organic materials to buy time
• Balance quality and scale
  – For future access, quantity has a quality all its own
• Rescue high-value at-risk collections
• Design diversity into the process
  – Technologies, risk exposure, institutions
• Adequately resource the process
![Page 44: Storage and Preservation](https://reader035.vdocuments.us/reader035/viewer/2022062315/568164a9550346895dd6a55a/html5/thumbnails/44.jpg)
Before You Go!
• On a sheet of paper (no names), answer the following question:
What was the muddiest point in today’s class?