
Unsupervised Creation of Small World Networks for the Preservation of Digital Objects

Charles L. Cartledge

Michael L. Nelson

Old Dominion University

Department of Computer Science

Norfolk, Virginia

SP145 JCDL Short Paper Presentation 2

Order of Presentation

• Technology enablers
• Constraints
• Simple rules for Complex Behavior
• Simulation approach
• Simulation results
• Future work


Motivation

[Figure: timeline marked 1907, 2007, 2107]


Technology Enablers

Cost data: http://www.archivebuilders.com/whitepapers/22011p.pdf


Constraints

“ … Tomorrow we could see the National Library of Medicine abolished by Congress, Elsevier dismantled by a corporate raider, the Royal Society declared bankrupt, or the University of Michigan Press destroyed by a meteor. All are highly unlikely, but over a long period of time unlikely events will happen. …”

(emphasis CLC)

W. Y. Arms, “Preservation of Scientific Serials: Three Current Examples,” Journal of Electronic Publishing, Dec., 1999

Expectancy data: http://www.cdc.gov/nchs/data/nvsr/nvsr57/nvsr57_14.pdf

80 75

12 – 101 yrs

Picture: Patricia W. and J Douglas Perry Library, Old Dominion University

http://www2.westminster-mo.edu/wc_users/homepages/staff/brownr/ClosedCollegeIndex.htm

Those that die do so in an average of 23 yrs.

http://www.lbl.gov/Science-Articles/Archive/ssc-and-future.html

http://www.dod.mil/brac/

http://www.hq.nasa.gov/office/pao/97budget/zbr.txt

5 – 60 yrs


Reynolds’s Rules for Flocking

• Collision Avoidance: avoid collisions with nearby flock mates
• Velocity Matching: attempt to match velocity with nearby flock mates
• Flock Centering: attempt to stay close to nearby flock mates

Images and rules: http://www.red3d.com/cwr/boids/

My interpretation

• Namespace collision avoidance

• Following others to available storage locations

• Deleting copies of one’s self to provide room for late arrivers


Types of Graphs

                        Regular   Small World   Random
Path length             Long      Shorter       Short
Clustering coefficient  High      Still high    Low

(Each graph has 20 vertices and 40 edges.)
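The comparison above can be reproduced numerically with a short sketch. It assumes a ring lattice as the regular graph, Watts-Strogatz style rewiring with an arbitrarily chosen probability of 0.1 for the small world graph, and 40 uniformly sampled edges for the random graph; none of these construction details come from the slides, and all function names are illustrative.

```python
import random
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Regular graph: each vertex links to its k nearest neighbors on a ring."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, p, rng):
    """Assumed Watts-Strogatz style rewiring: move one end of each edge with probability p."""
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    edges = [(v, w) for v in adj for w in adj[v] if v < w]
    for v, w in edges:
        if rng.random() < p:
            choices = [u for u in new if u != v and u not in new[v]]
            if choices:
                u = rng.choice(choices)
                new[v].discard(w)
                new[w].discard(v)
                new[v].add(u)
                new[u].add(v)
    return new

def random_graph(n, m, rng):
    """Random graph with exactly m edges sampled uniformly from all vertex pairs."""
    adj = {v: set() for v in range(n)}
    for v, w in rng.sample(list(combinations(range(n), 2)), m):
        adj[v].add(w)
        adj[w].add(v)
    return adj

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from each vertex)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering_coefficient(adj):
    """Average over vertices of the fraction of neighbor pairs that are themselves linked."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

rng = random.Random(42)  # fixed seed, chosen arbitrarily for repeatability
lattice = ring_lattice(20, 4)  # 20 vertices, 40 edges
graphs = {
    "regular": lattice,
    "small world": rewire(lattice, 0.1, rng),
    "random": random_graph(20, 40, rng),
}
for name, g in graphs.items():
    print(f"{name:12s} L={average_path_length(g):.2f} CC={clustering_coefficient(g):.2f}")
```

With only 20 vertices the differences between the three graphs are modest and vary with the seed; the qualitative pattern in the table is easier to see on larger graphs.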


Desirable Graph Properties


Unsupervised Small World Graph Creation

• gamma = 0.0

• alpha = 0.99

• gamma = 0.7

• alpha = 0.99
• 0.2 <= beta <= 0.66
• gamma < 0.6

CC (clustering coefficient) is shown as dark lines

L (average path length) is shown as light lines


Phases/Activities

Creation (Human or archivist activities)

Wandering (Autonomous activities)

Connecting (Autonomous activities)

Flocking (Autonomous activities)


Creation

Any DO


Wandering

[Figure: DOs A, B, C, and D. A wandering DO asks each DO it visits "Who are you connected to?" and receives replies such as "Connected to: <Nil>", "Connected to: A", "Connected to: B", and "Connected to: B, C".]
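The question-and-reply exchange in the wandering phase can be sketched as a simple traversal. This is only an illustration under stated assumptions: the slides show the "Who are you connected to?" query but not the order in which DOs are visited, so the breadth-first order, the `wander` function, and the dictionary used as a stand-in for the network are all hypothetical.

```python
from collections import deque

def wander(start, who_are_you_connected_to):
    """Visit DOs reachable from `start`, asking each one for its connections.

    `who_are_you_connected_to` is a hypothetical callable standing in for the
    query a wandering DO sends; it returns the names a DO is connected to.
    """
    known = {}                 # local view: DO name -> reported connections
    frontier = deque([start])  # breadth-first order is an assumption
    while frontier:
        do = frontier.popleft()
        if do in known:
            continue
        reply = who_are_you_connected_to(do)
        known[do] = set(reply)
        frontier.extend(name for name in reply if name not in known)
    return known

# Toy network using the DO names from the figure; the links are made up.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
view = wander("A", lambda name: graph[name])
print(sorted(view))  # D has no links, so a wanderer starting at A never hears of it
```

The wanderer ends up with purely local knowledge: it learns only about DOs that some visited DO reported, which is the kind of locally gleaned data the conclusions refer to.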


Connecting

[Figure: DOs A, B, C, and D. Candidate links are labeled "Possible connection"; each ends as either "Connection established" or "Connection NOT established".]


Flocking

[Figure: DOs A, B, C, and D with replicas A’ and A’’, C’ and C’’, and D’ and D’’.]


Typical Simulation Parameters

• alpha = 0.5

• beta = 0.6

• gamma = 0.1

• Number of DOs = 1000

• Number of hosts = 1000

• Min number desired replicas = 3

• Max number desired replicas = 10

• Max number of replicas per host = 20
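The parameters above can be collected into a single configuration object, and the replica limits checked with a small capacity experiment. The sketch below is an assumption-laden illustration: the field names are invented, the roles of alpha, beta, and gamma are not defined in this transcript so they are stored but unused, and `place_replicas` is a naive random placement, not the flocking behavior described in the presentation.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationParameters:
    # Values from the slide; field names are illustrative.
    alpha: float = 0.5   # role not defined in the transcript; unused here
    beta: float = 0.6    # role not defined in the transcript; unused here
    gamma: float = 0.1   # role not defined in the transcript; unused here
    num_dos: int = 1000
    num_hosts: int = 1000
    min_replicas: int = 3
    max_replicas: int = 10
    max_replicas_per_host: int = 20

def place_replicas(params, rng):
    """Naive random placement (NOT the flocking algorithm).

    Each DO draws a desired replica count between the min and max, then picks
    that many distinct hosts that still have room. Returns the placement and
    the per-host load.
    """
    load = [0] * params.num_hosts
    placement = {}
    for do in range(params.num_dos):
        want = rng.randint(params.min_replicas, params.max_replicas)
        open_hosts = [h for h in range(params.num_hosts)
                      if load[h] < params.max_replicas_per_host]
        hosts = rng.sample(open_hosts, min(want, len(open_hosts)))
        for h in hosts:
            load[h] += 1
        placement[do] = set(hosts)
    return placement, load

params = SimulationParameters()
placement, load = place_replicas(params, random.Random(0))  # arbitrary seed
print("total replicas:", sum(load), "busiest host holds:", max(load))
```

With these values the hosts can hold 1000 × 20 = 20,000 replicas while the DOs want at most 1000 × 10 = 10,000, so even this naive placement always finds room for every desired replica.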


Simulation Results and Analysis

Future work

• Test the autonomous graphs for resilience to error and attack

• Test what happens when a graph becomes disconnected

• Test what happens when a disconnected graph becomes re-connected


Conclusions

• We have shown that Digital Objects can autonomously create small world graphs based on locally gleaned data

• These graphs can be used for long-term preservation

• We intend to study these graphs focusing on their tolerance to isolated and widespread failures


And that concludes my presentation.


Backup Information

• Equations for Average Path Length and Clustering Coefficients
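The equations themselves did not survive in this transcript. For reference, the commonly used definitions for an undirected, connected graph with n vertices are given below; they are assumed, not confirmed, to match the ones on the original slide. Here d(v_i, v_j) is the shortest-path distance between two vertices, k_i is the degree of vertex v_i, and e_i is the number of edges among the neighbors of v_i.

```latex
% Average path length: mean shortest-path distance over all ordered vertex pairs
L = \frac{1}{n(n-1)} \sum_{i \neq j} d(v_i, v_j)

% Local clustering coefficient of vertex v_i (taken as 0 when k_i < 2)
C_i = \frac{2 e_i}{k_i (k_i - 1)}

% Clustering coefficient of the graph: mean of the local coefficients
CC = \frac{1}{n} \sum_{i=1}^{n} C_i
```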