How to build your own Dark Archive (in your spare time)
Priscilla Caplan, FCLA
Topics
• History: What we thought we were going to do
• Geography: Where theory meets reality
• Horticulture: Some thorny details
FCLA Digital Archive Plan
• Dark archive using tape storage
• 3-year project with help from IMLS
• Focus on data for cost analysis
• Treatment based on Action Plans
• Limit ingest to formats with Action Plan
• Canonicalization & forward format migration
• Make tools available as Open Source
FCLA Digital Archive Plan
• Dark archive using tape storage
• ?-year project with help from IMLS
• Focus on data for cost analysis
• Treatment based on Action Plans
• Limit ingest to formats with Action Plan
• Canonicalization & forward format migration
• Make tools available as Open Source
FCLA Digital Archive Plan
• Dark archive using tape storage
• ?-year project with help from IMLS
• Focus on designing DAITSS
• Treatment based on Action Plans
• Limit ingest to formats with Action Plan
• Canonicalization & forward format migration
• Make tools available as Open Source
FCLA Digital Archive Plan
• Dark archive using tape storage
• ?-year project with help from IMLS
• Focus on designing DAITSS
• Treatment based on Action Plans and Background Reports
• Limit ingest to formats with Action Plan
• Canonicalization & forward format migration
• Make tools available as Open Source
FCLA Digital Archive Plan
• Dark archive using tape storage
• ?-year project with help from IMLS
• Focus on designing DAITSS
• Treatment based on Action Plans and Background Reports
• Unlimited ingest; two preservation levels
• Canonicalization & forward format migration
• Make tools available as Open Source
FCLA Digital Archive Plan
• Dark archive using tape storage
• ?-year project with help from IMLS
• Focus on designing DAITSS
• Treatment based on Action Plans and Background Reports
• Unlimited ingest; two preservation levels
• Normalization, forward migration, bit preservation of original
• Make tools available as Open Source
FCLA Digital Archive Plan
• Dark archive using tape storage
• ?-year project with help from IMLS
• Focus on designing DAITSS
• Treatment based on Action Plans and Background Reports
• Unlimited ingest; two preservation levels
• Normalization, forward migration, bit preservation of original
• Make DAITSS available as Open Source
Theory 1: Preservation Strategies
[Figure: preservation strategies plotted on two axes. Objective runs from "Preserve Technology" to "Preserve Objects"; applicability runs from "Specific" to "General". Strategies shown: Maintain original technology, Programmable Chips, Emulation, Viewer, Re-engineer Software, Virtual Machine, Universal Virtual Computer, Version Migration, Format Standardization, Rosetta Stone Translation, Typed Object Conversion, Persistent Archives, Object Interchange Format.]
Source: Thibodeau, 2002.
Mass Migration
[Diagram labels: formats A, B, and C; migration processes P1 and P2]
Migration On Request
[Diagram labels: formats A, B, and C; migration processes P1, P2, and P3]
Mass Migration Or MOR
[Diagram labels: formats A, B, and C; migration processes P1, P2, and P3]
Mass Migration Or MOR + Normalization
[Diagram labels: formats A and B; multiple copies marked N; one copy marked M; processes P1 and P2]
Theory 2: OAIS
[Figure: OAIS functional model. Functional entities: Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning. External roles: Producer, Consumer, Management. Labeled flows: SIP, AIP, DIP, Descriptive Info, queries, result sets, orders.]
Formal OAIS Compliance
“A conforming OAIS archive...
• … shall support the model of information described in 2.2
• … shall fulfill the responsibilities listed in 3.1”
OAIS Information Model
• Content Information
– Content data object
– Representation Information
• Preservation Descriptive Information
– Context Info
– Reference Info
– Provenance Info
– Fixity Info
Responsibilities in 3.1
FCLA’s OAIS Compliance
• Formal agreements with “Producers”
• Documented SIP, DIP, AIP
• Metadata stored redundantly with content data objects
• Retaining both original and migrated AIPs
• No content data objects altered in repository
• All representation info ends in specification library
• Clear separation of functions (4.1)
DAITSS Functional Architecture
[Diagram labels: SIP, Ingest, AIP, Storage management, Dissemination, DIP, Reporting, Mgmt DB]
Ingest Functions
• METS validation and metadata extraction
• File format identification and validation
• Extraction of technical metadata
• Harvesting of external files
• Normalization and Forward Migration
• AIP creation
• Storage update
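
The "file format identification" step above could be sketched as a check of a file's leading bytes. This is a minimal illustration under stated assumptions, not how DAITSS itself identifies formats: `SIGNATURES` and `identify_format` are hypothetical names, and the signature table covers only a handful of formats.

```python
from typing import Optional

# Leading-byte signatures for a few common formats (illustrative subset only).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF header
    (b"MM\x00*", "TIFF"),   # big-endian TIFF header
    (b"\x1f\x8b", "GZIP"),
    (b"<?xml", "XML"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a format name based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.4\n"))      # PDF
    print(identify_format(b"plain old text"))  # None
```

A check like this misses, for example, an XML file that starts with a byte-order mark or whitespace, which is one reason identification is paired with validation in the list above.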
What’s a (S)(A)(D)IP anyway?
[Diagram: a SIP holding an XML file, a PDF file, and an AVI file]
[Diagram: the same SIP alongside an AIP; the AIP side is labeled with multiple XML files, multiple TIFF files, and a Database]
Theory 3: Risk Management
Formats
• Risk of format obsolescence
• Risk of loss in migration
• Action Plans and Background Reports
– whether to normalize
– long-term strategy and short-term actions
– when to revisit
Background Reports
• Format description
• Pointer to specification
• How to recognize
• History and duration
• Openness, maintenance body
• Platform support
• Legal issues
• Perceived popularity
• Limitations
• Related specifications
• Conclusions
• ALL GOOD THINGS FOR A GLOBAL DIGITAL FORMATS REGISTRY!
TANSTAASF
• There ain’t no such thing as a simple format
– XML?
  • Extension technologies
  • External references (DTDs, entity references, Schema, external files, stylesheets, …)
– ASCII?
  • No way to indicate character encoding
Redundancy
• Content:
– multiple independently written masters
– routine normalization
– bit preservation of original
– retention of intermediate versions
• Integrity: SHA-1 and MD5 checksums
• Metadata: in XML with content and in RDBMS
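
The integrity point above (both SHA-1 and MD5 recorded for each file) can be sketched with Python's standard `hashlib`. The function names `compute_checksums` and `is_intact` are hypothetical and chosen for illustration; nothing here is taken from the DAITSS code.

```python
import hashlib
from typing import Dict


def compute_checksums(data: bytes) -> Dict[str, str]:
    """Compute both digests so that each can cross-check the other."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def is_intact(data: bytes, recorded: Dict[str, str]) -> bool:
    """True only if both freshly computed digests match the recorded ones."""
    return compute_checksums(data) == recorded


if __name__ == "__main__":
    original = b"content data object"
    recorded = compute_checksums(original)
    print(is_intact(original, recorded))          # True
    print(is_intact(original + b"!", recorded))   # False
```

Recording two independent digests is itself a form of redundancy: a stored value that is corrupted in one field can still be checked against the other.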
Metadata Redundancy
• How to store all metadata pertaining to an object with the object?
• No existing / suitable METS extension schema
• Direct map to DAITSS tables
– elements for each table
– sub-elements for each column
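
The "direct map" idea above (one element per table, one sub-element per column) might look like the following sketch. The root element name `daitss` and the table and column names in the example are assumptions made for illustration; the actual DAITSS schema is not shown in these slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_to_xml(tables: Dict[str, List[dict]]) -> str:
    """Serialize database rows as XML: an element per table row,
    with a sub-element per column."""
    root = ET.Element("daitss")  # hypothetical wrapper element
    for table_name, rows in tables.items():
        for row in rows:
            table_el = ET.SubElement(root, table_name)
            for column, value in row.items():
                ET.SubElement(table_el, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(rows_to_xml({"DATA_FILE": [{"DFID": "F1", "SIZE": 1024}]}))
```

Because the XML mirrors the tables one-to-one, the same metadata can be reloaded into the RDBMS from the copy stored with the object.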
Theory 4: File formats
Preferred file formats
• Pass fidelity test
• Pass “future” test
– Well documented, well supported
– Standards or de facto standards (widely used)
– Without proprietary technologies e.g. codecs
• Without access inhibitors e.g. encryption
Preferred file formats for FDA
• We can’t control what comes in
• Will do bit-level preservation on anything
• Will normalize to preferred format if possible
• Encourage use of preferred formats on campuses
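
The policy above (bit-level preservation for everything, normalization only where a preferred format exists) can be sketched as a small decision function. The table `NORMALIZATION_TARGETS`, its single PDF-to-TIFF entry, and the function `plan_treatment` are hypothetical illustrations, not the archive's actual rules.

```python
from typing import Dict, List

# Hypothetical table: incoming format -> preferred format to normalize into.
NORMALIZATION_TARGETS: Dict[str, str] = {"PDF": "TIFF"}


def plan_treatment(fmt: str, targets: Dict[str, str] = NORMALIZATION_TARGETS) -> List[str]:
    """List the preservation actions for a file of the given format."""
    actions = ["bit-level preservation of original"]  # applied to every file
    if fmt in targets:
        actions.append(f"normalize to {targets[fmt]}")
    return actions


if __name__ == "__main__":
    print(plan_treatment("PDF"))
    print(plan_treatment("UNKNOWN"))
```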
But what’s a file format anyway?
• Format profiles, e.g. GeoTIFF or XML document with DTD
• Technical characteristics adhere to bitstreams
[Diagram: a TIFF 6.0 file with parts labeled Metadata-1, Image-1, Image-2, and Metadata-2]
And files can have multiple layered formats
Foo.AVI
Foo.PDF
Foo.XML
Foo.tar
Foo.tgz
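
The layering above (files inside a tar archive inside a gzip stream) can be peeled apart programmatically. The sketch below uses Python's standard `gzip` and `tarfile` modules; `peel_layers`, `make_tgz`, and the "leaf" label are hypothetical names, and only the gzip and tar layers are recognized.

```python
import gzip
import io
import tarfile
from typing import Dict, List, Tuple


def _is_tar(data: bytes) -> bool:
    # POSIX and GNU tar headers carry the magic "ustar" at offset 257.
    return data[257:262] == b"ustar"


def peel_layers(name: str, data: bytes) -> List[Tuple[str, str]]:
    """Return (name, layer) pairs for a file and everything nested inside it."""
    found: List[Tuple[str, str]] = []
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        found.append((name, "gzip"))
        found += peel_layers(name + " (decompressed)", gzip.decompress(data))
    elif _is_tar(data):
        found.append((name, "tar"))
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
            for member in tar.getmembers():
                if member.isfile():
                    found += peel_layers(member.name, tar.extractfile(member).read())
    else:
        found.append((name, "leaf"))  # no further container layer recognized
    return found


def make_tgz(files: Dict[str, bytes]) -> bytes:
    """Build a small .tgz in memory so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for fname, content in files.items():
            info = tarfile.TarInfo(fname)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


if __name__ == "__main__":
    tgz = make_tgz({"Foo.XML": b"<?xml version='1.0'?><a/>", "Foo.PDF": b"%PDF-1.4"})
    for entry in peel_layers("Foo.tgz", tgz):
        print(entry)
```

Each layer is a format in its own right, so each may need its own identification, validation, and technical metadata.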
DAITSS Data Model
[Diagram labels: Information Package; Intellectual entity (1); Data File (1..n); Bitstream (0..n)]
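
One possible reading of the cardinalities above, sketched as Python dataclasses. The containment shown here (a package describes one intellectual entity and holds one or more data files, each with zero or more bitstreams) is an assumption about how the diagram's boxes relate, and all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    kind: str  # e.g. "image", "audio"


@dataclass
class DataFile:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)  # 0..n


@dataclass
class InformationPackage:
    intellectual_entity: str    # exactly 1
    data_files: List[DataFile]  # 1..n

    def __post_init__(self) -> None:
        # Enforce the 1..n cardinality on data files.
        if not self.data_files:
            raise ValueError("an information package needs at least one data file")


if __name__ == "__main__":
    tiff = DataFile("page1.tif", [Bitstream("image"), Bitstream("image")])
    package = InformationPackage("Thesis 42", [tiff, DataFile("descriptor.xml")])
    print(len(package.data_files), len(tiff.bitstreams))
```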
DAITSS Data File Object
[Diagram labels: Data File; Text File; PDF File; TIFF File; Markup File; XML; SGML; DTD]
DAITSS Bitstream Object
[Diagram labels: Bitstream; Audio; Image; Text; Video; JPEG Image; TIFF Image]
Environment
• Software (rendering, runtime, OS, driver)
• Hardware (processor, memory, video card)
• Is environment a property of file format?
• Which of many environments do you record?
• To be meaningful, must environment be arbitrarily recursive?
http://www.fcla.edu/digitalArchive/
[email protected]