
Micah Altman
Associate Director, Harvard-MIT Data Center
Institute for Quantitative Social Science, Harvard University

Bryan Beecher
Director of Computing and Network Services
Inter-university Consortium for Political and Social Research, University of Michigan

Marc Maynard
Director of Technical Services
The Roper Center for Public Opinion Research, University of Connecticut

Jonathan Crabtree
Assistant Director for Archives and Information Technology
HW Odum Institute for Research in Social Science, University of North Carolina

CNI 2008 Fall Task Force Meeting 1


Our Story
• Who are you guys?
• What problem are you trying to solve?
• What have you done?
• Why do we care?


Data-PASS
• Partnership devoted to identifying, acquiring and preserving data at risk of being lost to the social science research community
• Partners
– ICPSR
– Odum Institute
– Harvard-MIT Data Center
– Roper Center
– National Archives


http://flickr.com/photos/phauly/35555985/


Data-PASS


Data-PASS
• Lots of little files (social science data)
– ASCII data files
– PDF technical documentation (codebooks)
– Millions of ’em
• Archival storage
– Was tape
– Now disk


Before


After


Archival storage?


http://failblog.org/2008/02/08/floppy-fail/


Archival storage?
• Remote disks
• Grids
• Clouds
• With partners?


Why roll your own?
• Policy-driven
• Auditable
• Asymmetric
• Independence of each location


Syndicated Storage Platform (SSP)
• Start with LOCKSS
– Lots of Copies Keep Stuff Safe
– But used in a closed network
• Private LOCKSS Network (PLN)
– A few of them out there
– MetaArchive perhaps the best known
• Biggest selling point was independence of each node in the PLN


PLNs
• LOCKSS is really easy to set up
• PLNs are more difficult
• Other differences between a traditional PLN and our needs
– Our content isn’t harvestable via HTTP
– Our PLN nodes are different sizes
– Our trust model requirement prevents a centralized authority controlling the network


SSP = Stone Soup Platform?
• ICPSR and Odum set up a small PLN
• HMDC provided a harvester and designed the schema
• Odum built the Comparator
• Roper is building the Invitor


PLN


Schema
• Nodes
– IP address
– Storage commitment
• AUs
– Max size
– # in the PLN
• Lots more
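
The schema fields named on this slide could be modelled roughly as below. This is only an illustrative sketch, not the actual Data-PASS schema: the class names, the gigabyte units, the `name` field, and the reading of "# in the PLN" as a required copy count are all assumptions, and the feasibility check is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One PLN node, using the two fields listed under "Nodes"."""
    ip_address: str
    storage_commitment_gb: int  # unit is an assumption


@dataclass
class ArchivalUnit:
    """One AU, using the two fields listed under "AUs"."""
    name: str             # hypothetical identifier, not from the slide
    max_size_gb: int      # "Max size"; unit is an assumption
    copies_required: int  # assumed reading of "# in the PLN"


@dataclass
class Schema:
    nodes: List[Node] = field(default_factory=list)
    aus: List[ArchivalUnit] = field(default_factory=list)

    def total_commitment_gb(self) -> int:
        return sum(n.storage_commitment_gb for n in self.nodes)

    def total_demand_gb(self) -> int:
        return sum(au.max_size_gb * au.copies_required for au in self.aus)

    def is_feasible(self) -> bool:
        # Necessary (not sufficient) conditions: no AU asks for more copies
        # than there are nodes, and total demand fits total commitment.
        enough_nodes = all(au.copies_required <= len(self.nodes) for au in self.aus)
        return enough_nodes and self.total_demand_gb() <= self.total_commitment_gb()


# Nodes of different sizes, as the PLN slide describes (example addresses).
schema = Schema(
    nodes=[Node("192.0.2.10", 500), Node("192.0.2.20", 200), Node("192.0.2.30", 100)],
    aus=[ArchivalUnit("au-001", 50, 3), ArchivalUnit("au-002", 100, 2)],
)
print(schema.total_commitment_gb(), schema.total_demand_gb(), schema.is_feasible())
```

Keeping the storage commitment per node, rather than one network-wide figure, is what would let a schema like this describe an asymmetric network.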


Comparator
• diff for our SSP
• Compares
– Contents of the LOCKSS Cache Manager [sic]
– Schema
• Produces
– List of differences between “what is” and “what should be”
– Feeds into another tool for “fixing the PLN”
• Machine-actionable output (XML)
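
The "what is" versus "what should be" comparison might look something like the sketch below. It is not the Odum Comparator: the node-to-AU dictionaries, the function names, and the XML element and attribute names are all invented for illustration, and no real LOCKSS API is called.

```python
import xml.etree.ElementTree as ET


def compare(desired, actual):
    """Return (action, node, au) tuples that would make `actual` match `desired`.

    Both arguments map a node name to the set of AU names held on that node.
    """
    differences = []
    for node in sorted(set(desired) | set(actual)):
        want = desired.get(node, set())
        have = actual.get(node, set())
        for au in sorted(want - have):   # should be there, is not
            differences.append(("ADD", node, au))
        for au in sorted(have - want):   # is there, should not be
            differences.append(("DROP", node, au))
    return differences


def to_xml(differences):
    """Serialize the differences as a small XML report (hypothetical format)."""
    root = ET.Element("differences")
    for action, node, au in differences:
        ET.SubElement(root, "difference", action=action, node=node, au=au)
    return ET.tostring(root, encoding="unicode")


# "What should be" (from the schema) and "what is" (from the caches).
desired = {"node-a": {"au-001", "au-002"}, "node-b": {"au-001"}}
actual = {"node-a": {"au-001"}, "node-b": {"au-001", "au-003"}}

diffs = compare(desired, actual)
report = to_xml(diffs)
print(report)
```

Sorting nodes and AUs keeps the report deterministic, which makes successive reports easy to compare with each other.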


Invitor
• Reads the report from the Comparator
• Issues requests to PLN nodes to ADD or DROP an AU
– Expectation is that PLN nodes always accept an ADD if they can
– An offer they cannot refuse
• Requests may be reviewed/approved by a human administrator (or not)
• USENET news technology?
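
A rough sketch of the Invitor's read-then-request loop follows. It is not the Roper implementation: the XML report format, the function names, and the `approve` hook standing in for the optional human review are assumptions, and the requests are only collected in a list rather than sent to any node.

```python
import xml.etree.ElementTree as ET

# Hypothetical Comparator report; the element and attribute names are invented.
SAMPLE_REPORT = """
<differences>
  <difference action="ADD" node="node-a" au="au-002"/>
  <difference action="DROP" node="node-b" au="au-003"/>
</differences>
"""


def read_report(xml_text):
    """Parse the report into (action, node, au) request tuples."""
    root = ET.fromstring(xml_text.strip())
    return [(d.get("action"), d.get("node"), d.get("au"))
            for d in root.iter("difference")]


def issue_requests(requests, approve=None):
    """Return the requests that would be issued to PLN nodes.

    `approve` is an optional callback standing in for human review;
    when it is None, every request goes out unreviewed.
    """
    issued = []
    for request in requests:
        action = request[0]
        if action not in ("ADD", "DROP"):
            raise ValueError("unknown action: %r" % (action,))
        if approve is None or approve(request):
            issued.append(request)
    return issued


requests = read_report(SAMPLE_REPORT)
unreviewed = issue_requests(requests)
adds_only = issue_requests(requests, approve=lambda r: r[0] == "ADD")
print(unreviewed)
print(adds_only)
```

Making the review step an optional callback mirrors the "(or not)" on the slide: the same loop serves both a fully automatic network and one with an administrator in the loop.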


Summary
• Data-PASS is a group of archives committed to preserving social science data
• Exploring various technology options
• One avenue is a custom LOCKSS deployment
– Network schema
– OAI data harvester
– Comparison tool
– Network update tool
