the natural history open data challenge @ ota16
Diverse collections spanning space and time
Challenge of scale: >80 million specimens!
Challenge of speed (digitising within a lifetime)
Ambitious digitisation programme (DCP)
Institutional policy “open by default”
Higher Classification
Scientific name: Thymelicus lineola (Ochsenheimer, 1808)
Family: Hesperiidae
Location
Locality: Tilbury Docks
State/province: England
Country: United Kingdom
Continent: Europe
Decimal latitude: 51.4605
Decimal longitude: 0.3449
Collection Event
Recorded by: T G. Howarth; Howarth
Collection date: 31 / 07 / 1938
Most iCollections specimens will have ~30 fields containing data (over 100 different fields across all collections)
There are some issues… (where is H. M. Edelsten!?)
Complete NHM Specimen Dataset (3.3M records)
http://bit.ly/2goEpBB
GitHub Gist – NHM API:
http://bit.ly/2gtukRv
iCollections Datasets
http://bit.ly/2gGZub5
Even more data…
http://www.gbif.org/occurrence
Potential Challenges
How did collecting effort change over time?
Which collector collected from the most distinct localities? Can we make a ranking table and mash up the data with Wikipedia or other sources?
What can we learn about the collectors – who travelled the furthest or most regularly?
Were most specimens collected in rural areas? Is there collection bias in particular counties?
How can we make the data more attractive to different audiences?
How could we display the data in more engaging or informative ways?
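A possible starting point for the first few challenges, sketched in Python. This is only an illustration under stated assumptions: the field names (`recorded_by`, `locality`, `year`, `lat`, `lon`) are hypothetical and would need mapping to the columns in the downloaded dataset. Only the Tilbury Docks locality, year and coordinates come from the example record above; the collectors and the other rows are placeholders.

```python
from collections import Counter, defaultdict
from math import asin, cos, radians, sin, sqrt

# Illustrative records only. Field names are assumptions, not the dataset's
# actual column names; only the first row's locality, year and coordinates
# are taken from the example specimen record.
records = [
    {"recorded_by": "Collector A", "locality": "Tilbury Docks", "year": 1938,
     "lat": 51.4605, "lon": 0.3449},
    {"recorded_by": "Collector A", "locality": "Locality X", "year": 1938,
     "lat": 52.0, "lon": 0.0},
    {"recorded_by": "Collector A", "locality": "Tilbury Docks", "year": 1939,
     "lat": 51.4605, "lon": 0.3449},
    {"recorded_by": "Collector B", "locality": "Locality Y", "year": 1939,
     "lat": 50.5, "lon": -1.0},
]


def records_per_year(rows):
    """Count specimens per collection year (a rough proxy for collecting effort)."""
    return Counter(row["year"] for row in rows)


def rank_by_distinct_localities(rows):
    """Return (collector, number of distinct localities), highest first."""
    localities = defaultdict(set)
    for row in rows:
        localities[row["recorded_by"]].add(row["locality"])
    counts = {name: len(places) for name, places in localities.items()}
    # Sort by count descending, then by name so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


if __name__ == "__main__":
    print(records_per_year(records))
    print(rank_by_distinct_localities(records))
    print(round(haversine_km(51.4605, 0.3449, 52.0, 0.0), 1))
```

Real collector names will need cleaning before ranking: the example record lists the same person as both "T G. Howarth" and "Howarth", so naive grouping would split one collector into two.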