ICS-FORTH September 20, 2004
Reliability Modelling for Long Term Digital Preservation
Panos Constantopoulos, Martin Doerr, Meropi Petraki
Foundation for Research and Technology - Hellas
Institute of Computer Science
Heraklion, Greece May 12, 2005
Information Systems Laboratory
The CIDOC CRM
Outline
Problem statement
Approach
Case studies
Conclusion
Problem Statement
All digital material is vulnerable to loss.
Cultural and scientific memory needs long-term preservation:
We would like to have the Library of Alexandria back...
A large museum may keep and describe a million objects.
It may not want to lose more than 10 objects per year
= 1% loss in 1000 years!
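The arithmetic behind this target can be checked with a short sketch. The function names are illustrative, and the constant per-object loss rate used for the exponential variant is an assumption that the slide does not state.

```python
import math

def annual_loss_rate(collection_size, max_losses_per_year):
    """Tolerated loss rate per object and year."""
    return max_losses_per_year / collection_size

def fraction_lost_linear(rate, years):
    """Slide arithmetic: losses simply add up year by year."""
    return rate * years

def fraction_lost_exponential(rate, years):
    """Assumption: each object is lost at a constant rate (exponential model)."""
    return 1.0 - math.exp(-rate * years)

rate = annual_loss_rate(1_000_000, 10)  # 10 objects out of a million
print(f"tolerated loss rate per object: {rate:.0e} per year")
print(f"implied per-object MTTF: {1.0 / rate:,.0f} years")
print(f"lost after 1000 years (linear): {fraction_lost_linear(rate, 1000):.2%}")
print(f"lost after 1000 years (exponential): {fraction_lost_exponential(rate, 1000):.2%}")
```

Under the constant-rate assumption, the goal corresponds to a mean time to failure of about 100,000 years per object, far beyond the lifetime of any single carrier.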
Problem Statement
Risk factors:
Media decay and failure
Access component obsolescence (format, H/W)
Human and software errors
External events
Format obsolescence:
Best studied. Measures are standards, technology preservation, and migration.
For knowledge in text form, textual databases, vector graphics, and bitmap images, the problem is reasonably solved with XML and extensive documentation.
Problem Statement
Hardware obsolescence:
Systematic, foreseeable. Reasonable solution: carrier migration.
Human errors:
Stochastic failure. Can be reduced but not avoided. Solution: replication and control.
Software errors:
Difficult to model and to foresee. Solution: replication, multiple S/W platforms, and control.
Problem Statement
External Events:
Stochastic failure. Solution: replication and control
Media decay:
Stochastic and systematic failure. Solution: Preventive carrier migration, replication and control
Problem Statement
Summary:
In the long term, the basic strategy is carrier migration, replication and control.
The expected lifetime of information exceeds that of any platform or technology.
The respective risk management has hardly been addressed.
“The Gitksan strategy”: the longest human memories known
People of the Haida and Gitksan tribes in British Columbia, resident there since the Ice Age, keep oral historical memories of land ownership reaching back more than 10,000 years by:
Distribution to multiple, selected human carriers, annual quality control, and totem poles as mnemonic aids.
Approach
Statistical modelling of the long-term risk of data loss due to media decay and failure and to external events.
Analyze the risk factors of different configurations.
In models for long time spans, complex aging effects average out, e.g. preventive replacement results in a constant average failure rate. Long-term studies are simpler than short-term ones!
Extrapolation of current technology:
Optimal strategy: maintain a constant failure rate at all times.
This strategy is independent of technology: the failure rate has to be reevaluated at each technology change and maintained throughout each technology period. Random processes have no memory.
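The remark that random processes have no memory can be illustrated with a minimal sketch. It assumes the standard exponential model R(t) = exp(-λt) for a constant failure rate λ; the function names are illustrative and do not come from the slides.

```python
import math

def reliability(rate_per_year, years):
    """Survival probability under a constant failure rate: R(t) = exp(-rate * t)."""
    return math.exp(-rate_per_year * years)

def conditional_survival(rate_per_year, age_years, horizon_years):
    """P(survive a further `horizon_years` | already survived `age_years`)."""
    return (reliability(rate_per_year, age_years + horizon_years)
            / reliability(rate_per_year, age_years))

rate = 1.0 / 3.0  # e.g. a carrier with an MTTF of 3 years
fresh = reliability(rate, 5.0)
aged = conditional_survival(rate, 10.0, 5.0)
# Both values coincide: the age of the system does not matter.
print(f"new system, next 5 years: {fresh:.4f}")
print(f"10-year-old system, next 5 years: {aged:.4f}")
```

This is why the analysis can restart from the same model in each technology period: with a constant rate, only the current rate matters, not the history.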
Approach
Analytical models that allow for:
Dominant factor analysis
Cost/benefit analysis (future work) to achieve the politically set reliability goal
Methods: “memoryless” Markov chains and fault trees
Evaluation with the program SHARPE.
Case 1: Mirror Disks
Assumptions:
Two identical disks with constant failure rate λ; the system fails if both are destroyed.
MTTF = 1/λ: mean time to failure
MTTR = 1/μ: mean time to repair
MTTFD = 1/θ: mean time to failure detection
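Markov chain state diagram. States: 2 (both disks working), 1 (one disk failed, not yet detected), 1D (one disk failed, failure detected), F (system failure).
Transitions: 2 → 1 at rate 2λ; 1 → 1D at rate θ; 1D → 2 at rate μ; 1 → F at rate λ; 1D → F at rate λ.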
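A worked sketch of how the MTTF of this chain can be obtained. It assumes the transitions are 2 → 1 at rate 2λ, 1 → 1D at rate θ, 1D → 2 at rate μ, and 1 → F and 1D → F at rate λ, which is a reading of the diagram labels rather than something the text states. The symbols T_2, T_1 and T_{1D} (expected time to reach F from each state) are introduced here for illustration only.

```latex
\begin{align}
T_2    &= \frac{1}{2\lambda} + T_1 \\
T_1    &= \frac{1}{\lambda+\theta} + \frac{\theta}{\lambda+\theta}\,T_{1D} \\
T_{1D} &= \frac{1}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,T_2 \\
\Rightarrow\quad
T_2    &= \frac{1}{\lambda}
          + \frac{(\lambda+\theta)(\lambda+\mu)}{2\lambda^{2}\,(\lambda+\theta+\mu)}
\end{align}
```

Without detection (θ → 0) the expression reduces to 3/(2λ), the MTTF of two unrepaired disks in parallel. As θ and μ grow without bound, T_2 grows without bound as well.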
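Case 1: Mirror Disks
[Figure: MTTF of the configuration (in years, logarithmic scale from 1 to 10000) plotted against MTTFD (in days, 15 to 800), with one curve each for a disk MTTF of 3, 5, 10 and 20 years. Marked detection times: 120 d = 4 months, 360 d = 12 months, 740 d = 2 years.]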
Case 1: Mirror Disks
Results:
MTTF = 3 yrs, MTTR = 50 hrs, MTTFD = 14 days:
MTTF total = 106.46 yrs
MTTFD = MTTR = 0 => MTTF = ∞!
The dominant factor is the time needed to detect a failure and to repair it! Any disk quality can be compensated for by faster detection and repair, within realistic limits.
Any uncontrolled medium will lose its data in the long term.
=> cost/benefit analysis to be done!
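The figures on this slide can be approximated with a small sketch. It assumes the mirror-disk chain has the states 2, 1, 1D and F, with transitions 2 → 1 at rate 2λ, 1 → 1D at rate θ, 1D → 2 at rate μ, and 1 → F and 1D → F at rate λ, which is a reading of the Case 1 diagram. The unit conversions (365 days and 8760 hours per year) and the function name `mirror_mttf` are likewise assumptions, not taken from the slides or from SHARPE.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365-day year
DAYS_PER_YEAR = 365.0

def mirror_mttf(disk_mttf_years, mttr_hours, mttfd_days):
    """MTTF (years) of two mirrored disks with failure detection and repair."""
    lam = 1.0 / disk_mttf_years        # disk failure rate per year
    mu = HOURS_PER_YEAR / mttr_hours   # repair rate per year
    theta = DAYS_PER_YEAR / mttfd_days # detection rate per year
    # Expected time to reach F from each state:
    #   T2  = 1/(2*lam) + T1
    #   T1  = 1/(lam+theta) + theta/(lam+theta) * T1D
    #   T1D = 1/(lam+mu)    + mu/(lam+mu)       * T2
    # Substituting gives T2 = fixed + loop * T2.
    p_detect = theta / (lam + theta)   # failure detected before second disk fails
    p_repair = mu / (lam + mu)         # repair finished before second disk fails
    fixed = 1.0 / (2.0 * lam) + 1.0 / (lam + theta) + p_detect / (lam + mu)
    loop = p_detect * p_repair
    return fixed / (1.0 - loop)

print(f"slide parameters: {mirror_mttf(3, 50, 14):.2f} years")
for disk_mttf in (3, 5, 10, 20):
    row = [f"{mirror_mttf(disk_mttf, 50, d):8.1f}" for d in (14, 120, 360, 740)]
    print(f"disk MTTF {disk_mttf:2d} yrs, MTTFD 14/120/360/740 d:", *row)
```

With the slide's parameters this sketch gives roughly 106 years, and the sweep shows how quickly the configuration MTTF drops as detection slows, whatever the quality of the disk.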
Case 2: Mirror Disks + Backup Tape
MTTF = 1/λ: mean time to failure
MTTR = 1/μ: mean time to repair
MTTFD = 1/θ: mean time to failure detection
1,2 = disk, 3 = tape.
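[Figure: Markov chain state diagram. States, labelled by disks and tape: (2,1), (1,1), (1D,1), (0,1), (0D,1), (2,0), (2,0D), (1,0), (1,0D), (1D,0), (1D,0D) and the failure state F, where D marks a detected failure. Transition rates shown: 2λ1, λ1, λ2, θ1, θ2, μ1, μ2, μ3.]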
Case 2: Mirror Disks + Backup Tape
Parameter values:
MTTF disk = 1/λ1 = 3 years
MTTF tape = 1/λ2 = 5 years
MTTR1 = 1/μ1 = 50 hours
MTTR2 = 1/μ2 = 100 hours
MTTR3 = 1/μ3 = 12 hours
MTTFD disk = 1/θ1 = 14 days
MTTFD tape = 1/θ2 = 60 days
Result:
MTTF of the configuration = 2550 years
R(t) = 0.96 (t = 100 years)
Coming closer!
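The reported reliability is consistent with a simple cross-check. The sketch assumes that the configuration as a whole fails at an approximately constant rate of 1/MTTF, so that R(t) ≈ exp(-t/MTTF). This approximation is an assumption made here and is not the SHARPE evaluation itself.

```python
import math

def reliability_exponential(mttf_years, t_years):
    """Approximate R(t), assuming a constant system failure rate of 1/MTTF."""
    return math.exp(-t_years / mttf_years)

r_100 = reliability_exponential(2550.0, 100.0)
print(f"R(100 years) is approximately {r_100:.2f}")
```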
Case 2: Mirror Disks + Backup Tape
Adding fire!
At least one more backup is needed in a third room.

| Scenario | MTTF of the configuration |
|---|---|
| Without fire | 2551 years |
| Fire, backup in the same room | 773 years |
| Fire, backup in another room | 2375 years |
Case 3: Distributed carriers
Assumptions: Data are distributed to N independent systems, each with mirror disks and tape.
Question: Which percentage of my data will exist after 1000 years? (Binomial model)

| N | 50%: k/n | R(k/n) | 90%: k/n | R(k/n) |
|---|---|---|---|---|
| 2 | 1/2 | 0.8597 | 2/2 | 0.3911 |
| 10 | 5/10 | 0.8731 | 9/10 | 0.063 |
| 100 | 50/100 | 0.9960 | 90/100 | 0 |
| 500 | 250/500 | 1 | | |
| 1000 | 500/1000 | 1 | | |
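The binomial model behind this table can be sketched as follows: R(k/n) is the probability that at least k of n independent systems survive. The per-system survival probability after 1000 years is not given on the slide; here it is inferred from the 2/2 entry as sqrt(0.3911), about 0.625, which is an assumption. The function name is illustrative.

```python
import math

def k_out_of_n_reliability(k, n, p):
    """Probability that at least k of n independent systems survive,
    each with survival probability p (binomial model)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(k, n + 1))

# Assumption: per-system survival probability inferred from the 2/2 table entry.
P_SYSTEM = math.sqrt(0.3911)

for n, k in [(2, 1), (2, 2), (10, 5), (10, 9), (100, 50), (100, 90),
             (500, 250), (1000, 500)]:
    print(f"at least {k:4d} of {n:4d} survive: "
          f"{k_out_of_n_reliability(k, n, P_SYSTEM):.4f}")
```

With this inferred value the sketch comes close to the tabulated entries. Small differences in the last digit are to be expected because the original per-system value is unknown.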
Case 3: Distributed carriers
If all data are on one system:
High probability of preserving all data
High probability of losing all data
If the data are spread over many individual systems:
Some data will be lost for sure
Some data will survive for sure
Conclusion:
The optimal strategy may combine both modes!
Conclusions
Some results are not very intuitive:
The influence of failure detection and repair time
The effect of data distribution
The effect of external events
Long-term risk modelling allows for simplifications that make analytical models possible.
Analytical models can effectively be turned into decision support tools and combined with cost/benefit models.
Future work: a practical decision support tool