managing large and complex data sets
DESCRIPTION
Presentation given by Catherine Hardman of the Archaeology Data Service in York. The presentation was given at the 'Managing Archaeology Data' event on Monday 7th March 2011 at the University of Glasgow.
TRANSCRIPT
![Page 1: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/1.jpg)
Managing large and complex data sets:
… THE CHALLENGES OF ARCHIVING AND ONLINE DELIVERY
CATHERINE HARDMAN
![Page 2: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/2.jpg)
The problem… in 1996
My lithics report here, on floppy disc
![Page 3: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/3.jpg)
The ADS: some ancient history
The Archaeology Data Service:
• set up in 1996
• one of five AHDS subject centres
• based within the University of York
Funding:
• initially received funding from
  • Arts and Humanities Research Council (AHRC)
  • Joint Information Systems Committee (JISC)
• presently receives core funding from AHRC alongside cross-sectoral, project-based funding.
![Page 4: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/4.jpg)
What do we do?
Our remit:
“To support research, learning and teaching with high quality and dependable digital resources.”
In practice this means three key things:
• That ADS collect and preserve datasets
• That we allow full, easy and free access to these
• And that we additionally provide guidance and support to data creators
![Page 5: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/5.jpg)
No need for digital preservation
Domesday Book: Publisher: William of Normandy (1086) – still readable
![Page 6: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/6.jpg)
Where’s preservation when you need it?
Domesday Disc: Publisher: BBC (1986) – nearly lost
![Page 7: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/7.jpg)
Why is it important?
![Page 8: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/8.jpg)
Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. and Stafford, S.G. 1997. Nongeospatial Metadata for the Ecological Sciences. Ecological Applications. 7: 330-342.
What’s the problem? Information Entropy
![Page 9: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/9.jpg)
The scale of the problem in the 1990s
Strategies for protecting physical media
• None: 47%
• Humidity control: 8%
• Heat control: 7%
• Fire-resistant container: 23%
• Anti-magnetic: 10%
• Anti-static protected: 5%
Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’ Condron et al 1999
![Page 10: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/10.jpg)
Protecting Physical media
…never the twain
![Page 11: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/11.jpg)
The scale of the problem in the 1990s
The popularity of storage options
• Hard disc: 28%
• Tape: 22%
• CD-ROM: 14%
• Network: 13%
• Floppy disc: 23%
Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’ Condron et al 1999
![Page 12: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/12.jpg)
8" Floppy
3.5" Floppy
5.25" Floppy
12" Optical Disk
5.25" Optical Disk
CD-ROM
Sparq Disk Cartridge
Zip Disk
Click!
DVD-ROM
Jaz Disk
Floptical Disk
Punch Tape
Rectangular Hole Punch Card
IBM 3480
DLT Tape
DG90M Tape
DC4_120
8mm
D-eight
QIC DC600
G2000 Tape
4mm Tape
Ditto Max
9-Track Reel
Cassette tape
Memory Stick
MultiMedia Card
SD Memory Card
xD Picture Card
Smart Media
CompactFlash
Travan
![Page 13: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/13.jpg)
Why is it all so difficult?
• Deterioration of the storage medium
• Obsolescence of the storage medium
• Failure to document the format adequately
• Obsolescence of the software
• Obsolescence of the hardware
• Long-term management
![Page 14: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/14.jpg)
How do we do it?
Open Archival Information System (OAIS)
![Page 15: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/15.jpg)
But that’s people…
![Page 16: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/16.jpg)
Migration-based approach & controlled ingest
Aim to connect with data producers early on in their project lifecycles to ensure that preservation planning is a key consideration during the project rather than an afterthought.
![Page 17: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/17.jpg)
Guides to help you do all that.
![Page 18: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/18.jpg)
It hasn’t really got much easier
The goal posts keep moving!
![Page 19: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/19.jpg)
The size of digital archives held by different types of archaeological bodies
[Bar chart. Vertical axis: number of archiving bodies (0 to 40). Horizontal axis: archive size (1-5Mb, 5-10Mb, 10-50Mb, 50-100Mb, 100-1,000Mb, >1Gb). Series: National body, Local gov. archaeology, Field archaeology, HEI, Museum, Consultancy.]
http://ads.ahds.ac.uk/
Archaeology Data Service
![Page 20: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/20.jpg)
Big Data Project
Roughly how much data would be generated by a single project?
Average project size (estimated)
[Pie chart. Categories: over 200GB, 150 - 200GB, 100 - 150GB, 50 - 100GB, under 50GB. Percentages, in the order given on the slide: 19%, 3%, 3%, 25%, 50%.]
![Page 21: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/21.jpg)
Which of these data collection techniques do you carry out?
Technologies used
[Pie chart. Categories: 3D Laser Scanning, Sidescan Sonar, Multibeam Scanning, Single Beam Scanning, Geophysics, Acoustic Tracking, Sub bottom profiling, Geographic (eg GIS), Lidar, Digital Video, Video Movie Clips, Still Images, CAD (2D or 3D), Other. Percentages, in the order given on the slide: 12%, 4%, 4%, 3%, 8%, 1%, 3%, 11%, 9%, 9%, 7%, 14%, 3%, 12%.]
![Page 22: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/22.jpg)
What are the main software packages you use?
Software (noted more than once)
[Pie chart. Categories: 3D Studio Max, ArcGIS, AutoCAD, BAE SOCETSET, CODA, ENVI / IDL, ERDAS Imagine, Golden Software Surfer, Leica Cyclone, MicroStation, Pointools, Polyworks, RapidForm, TerraScan, Trimble Realworks, Custom software, MySQL. Percentages, in the order given on the slide: 4%, 10%, 12%, 4%, 4%, 4%, 6%, 6%, 10%, 4%, 4%, 4%, 8%, 6%, 4%, 4%, 4%.]
![Page 23: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/23.jpg)
Do you have an archiving policy for the data sets / types in question?
Archival policy?
[Pie chart. Categories: Yes, No, No response. Percentages, in the order given on the slide: 48%, 27%, 25%.]
![Page 24: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/24.jpg)
back-up
![Page 25: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/25.jpg)
When you start a new project …would you consider using existing datasets?
• Yes: 28
• Not answered: 2
![Page 26: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/26.jpg)
This is the opportunity!
![Page 27: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/27.jpg)
![Page 28: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/28.jpg)
Making the inaccessible accessible
to make available unpublished fieldwork reports in an easily retrievable fashion. There are currently 8018 reports available and this number is increasing steadily through the OASIS project in England and Scotland.
![Page 29: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/29.jpg)
Blurring the distinction …
… between publication and archives …
![Page 30: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/30.jpg)
Making the LEAP…
![Page 31: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/31.jpg)
![Page 32: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/32.jpg)
What does that mean for you?
Plan for reuse
![Page 33: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/33.jpg)
How do you do that?
• Include a data management plan (use the DCCs)
• Order your data
• File naming strategy
• Version control
• Back-up (in the field)
• Consider your file formats
• Dissemination plan (and its longevity)
• What does the long term look like?
• Discuss requirements with an archive
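As one possible illustration of the file naming, version control and back-up points, here is a minimal Python sketch that lists every file in a project folder, flags names that break a naming convention, and records a SHA-256 checksum so that back-up copies can be compared later. The naming pattern (lowercase name ending in `_vNN` plus an extension) and the function names `is_well_named`, `sha256_of` and `build_manifest` are assumptions made up for this example; they are not an ADS requirement or tool.

```python
import hashlib
import re
from pathlib import Path

# Hypothetical convention, for illustration only: lowercase letters, digits,
# hyphens and underscores, then a two-digit version suffix and an extension,
# e.g. "trench01_plan_v01.dxf".
NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]*_v\d{2}\.[a-z0-9]+")


def is_well_named(filename: str) -> bool:
    """Return True if the file name follows the assumed convention."""
    return NAME_PATTERN.fullmatch(filename) is not None


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> list[dict]:
    """List every file under folder with its naming status and checksum."""
    rows = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            rows.append({
                "file": path.relative_to(folder).as_posix(),
                "well_named": is_well_named(path.name),
                "sha256": sha256_of(path),
            })
    return rows
```

Running `build_manifest(Path("my_project"))` on the working copy and again on a back-up, then comparing the two lists, would show whether any file is missing or has changed. The same manifest could be handed to an archive alongside the data.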
![Page 34: Managing large and complex data sets](https://reader035.vdocuments.us/reader035/viewer/2022070317/55635500d8b42a90698b5889/html5/thumbnails/34.jpg)
We’re here to help
http://archaeologydataservice.ac.uk/
http://guides.archaeologydataservice.ac.uk/