Post on 20-Dec-2015
A LOOMING CRISIS: MAINTAINING ACCESS
TO ELECTRONIC RESEARCH PRODUCTS
Daphne Fautin, University of Kansas
Gail Kampmeier, Illinois Natural History Survey
Electronic PEET Products
  Project web pages
  Images
  Literature - publications, reports, field journals
  Gene sequences and other molecular data
  Character matrices & keys
  Databases - data & structure
What Happens…
  When project funding ceases
  When project members disperse
  When PIs retire, change research topics, move, or …
Who will champion access to the electronic resources produced by PEETs, AToLs, BSIs, PBIs, …?
Fate of Our Electronic Resources
  Who should be responsible?
    Institutions originally receiving project funding?
    Funding agencies?
    Those creating the resources?
    Professional societies?
Issues
  Who owns the products? (not an issue only for electronic media)
  How can the products continue to be served?
  How should the products best be preserved?
This is a global issue
Among efforts to grapple with it is the 2005 National Science Board Report 05-40
www.nsf.gov/pubs/2005/nsb0540
(NPR this morning on electronic art and art museums)
LIBRARIES have historically been the repository of scholarly output (= publications)
MUSEUMS have been custodians of specimens
Some other physical objects end up in TRADITIONAL ARCHIVES
Archiving
  WHICH products should be preserved
  HOW should they be preserved
  WHERE should they be preserved
    Locally, supercomputers, electronic archives, etc.
  Metadata: retrieval requires excellent documentation
  Software versions: a practical challenge, not a technical one
  (remember Gene Stoermer!)
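The points about metadata and software versions can be illustrated with a small "sidecar" record stored next to each archived file. This is only a minimal sketch: the function name `write_sidecar`, the `.meta.json` suffix, and every field name are hypothetical choices made for illustration, not part of the talk or of any metadata standard.

```python
import json
from pathlib import Path


def write_sidecar(data_file, title, creator, software, software_version, description):
    """Write <data_file>.meta.json describing the file and the software that made it.

    All field names here are illustrative assumptions, not a published schema.
    """
    data_path = Path(data_file)
    record = {
        "file": data_path.name,
        "title": title,
        "creator": creator,
        "description": description,
        # Recording the producing software and its version is what lets a
        # later reader work out how to open or migrate the file.
        "software": software,
        "software_version": software_version,
        "size_bytes": data_path.stat().st_size,
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar("matrix.nex", ...)` would leave `matrix.nex.meta.json` beside the data file, so the documentation travels with the file when it is copied to another archive.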
Internet Archive
  Mr. Peabody’s WayBack Machine…
Caveats: Pages Not Archived
  Anything requiring interaction with the server
    Forms, database-generated content
    JavaScript not resolving to true URLs
    Server-side image maps
  Pages with robot exclusion headers (robots.txt)
  Orphan pages (no inbound links)
  Unknown sites
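The robots.txt caveat can be checked before relying on a crawler-based archive. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text supplied as a string. The helper name `crawler_allowed`, the sample rules, and the `example.org` URLs are illustrative assumptions, as is the user-agent string `ia_archiver`, which is used here only as an example of a crawler name.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: one named crawler is kept out of /private/,
# every other crawler is allowed everywhere.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
"""
```

With these sample rules, a page under `/private/` would be skipped by the named crawler while the rest of the site stays eligible, so a project that wants its pages captured should confirm that its own robots.txt does not exclude them.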
Images
  Scanned
    Resolution
    Format standard: TIF?
  Produced digitally
    Format evolution of production software if not saved as flat TIF
Literature, Reports, Field Journals...
  Issues similar to images
    Format evolution
    Media migration
    Metadata for retrieval
    OCR for finding individual items
  Solutions are library-like, requiring recurring infusions of
    $$$
    Personnel
    Migrate as formats evolve, versions change
    Time
    Digital lifetime determination
Gene sequences and other molecular data
  A central archive – a library!
  Maintained by a Federal agency
Character Matrices & Keys
  DELTA/INTKEY (example of standard in danger of format evolution)
  Lucid (now in Version 3.4)
  MacClade
  PAUP
  Hennig86
  MorphoBank
  Others…
Relational Databases: Content & Structure
  Archiving
    Metadata essential for discovery
    Convert to flat files
      Software-independent format (e.g. comma delimited)
      Lose relational structure – but relationships can be coded
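One way to picture "convert to flat files" while still coding the relationships is sketched below. It assumes the database is SQLite (chosen only because Python ships with a driver for it; the slides do not name a database system) and writes one comma-delimited file per table plus a `relationships.csv` that records each foreign key. The function name and the output file names are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_to_flat_files(db_path, out_dir):
    """Dump every user table to <table>.csv and foreign keys to relationships.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        relationships = []
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            with open(out / f"{table}.csv", "w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow([col[0] for col in cursor.description])  # header row
                writer.writerows(cursor)
            # Each foreign_key_list row is: id, seq, parent table, child column,
            # parent column, followed by update/delete/match settings.
            for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
                relationships.append([table, fk[3], fk[2], fk[4]])
        with open(out / "relationships.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["child_table", "child_column", "parent_table", "parent_column"])
            writer.writerows(relationships)
    finally:
        conn.close()
    return tables
```

The table files stay readable without any database software, and `relationships.csv` keeps enough information to rebuild the links between them later. Relationships that exist only by convention, without a declared foreign key, would not be captured and would need to be documented by hand.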
Relational Databases: Content & Structure
  Continued service
    Version changes
    High maintenance (some require professional DBA)
    One size generally does not fit all – makes it difficult to pass on
    Maintain also “front end” (required for queries) scripting language: e.g. ColdFusion, PHP
A SILVER BULLET or SILVER BUCKSHOT?
Concentration of resources vs. discovery of new methods by diversification
TO MAINTAIN ACCESS TO ELECTRONIC RESEARCH
PRODUCTS
Demonstrate value / usefulness
  Hits / citations
  Can be problematic for taxonomy / systematics
Become part of large entity
the data portal for and legacy of
www.iobis.org
the main provider of marine data to
(currently the third-largest data provider with nearly 10 million
records)
www.gbif.org
Maintaining functionality
A distributed resource
PORTAL
CONTRIBUTORS
OBIS, GBIF, FishBase Consortium
Individuals, Institutions
LIBRARIES have been custodians of scholarly knowledge
DIGITAL LIBRARIES
www.nsf.gov/pubs/2005/nsb0540
The Foundation should actively engage with the community to ensure that community policies and priorities are established and then updated in a timely way.
Develop a clear technical and financial strategy; create policy for key issues consistent with the technical and financial strategy.
Recurring Challenges
  $$$
  Personnel
  Time
  Format evolution / back compatibility
  Metadata – complete, appropriate (controlled vocabulary)
  Digital lifetime - determining what, if anything, should be truly discarded
IT’S UP TO US