The Natural History Open Data Challenge @ OTA16

Upload: margaret-gold

Post on 12-Apr-2017


TRANSCRIPT

Diverse collections spanning space and time

Challenge of scale: >80 million specimens!

Challenge of speed (digitising within a lifetime)

Ambitious digitisation programme (DCP)

Institutional policy “open by default”

Higher Classification
Scientific name: Thymelicus lineola (Ochsenheimer, 1808)
Family: Hesperiidae

Location
Locality: Tilbury Docks
State/province: England
Country: United Kingdom
Continent: Europe
Decimal latitude: 51.4605
Decimal longitude: 0.3449

Collection Event
Recorded by: T G. Howarth; Howarth
Collection date: 31 / 07 / 1938

Most iCollections specimens will have ~30 fields containing data (over 100 different fields across all collections)
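
As a rough illustration of working with a record like the example above, the sketch below holds its fields in a Python dictionary, parses the day / month / year collection date, and splits the semicolon-separated "Recorded by" value. The dictionary keys are Darwin Core style names chosen for illustration; they are an assumption, not necessarily the exact field names used in the NHM datasets.

```python
from datetime import date

# Example record from the slide. Key names are illustrative assumptions.
record = {
    "scientificName": "Thymelicus lineola (Ochsenheimer, 1808)",
    "family": "Hesperiidae",
    "locality": "Tilbury Docks",
    "stateProvince": "England",
    "country": "United Kingdom",
    "continent": "Europe",
    "decimalLatitude": 51.4605,
    "decimalLongitude": 0.3449,
    "recordedBy": "T G. Howarth; Howarth",
    "collectionDate": "31 / 07 / 1938",
}


def parse_collection_date(text):
    """Parse a 'DD / MM / YYYY' string into a datetime.date."""
    day, month, year = (int(part.strip()) for part in text.split("/"))
    return date(year, month, day)


def split_collectors(text):
    """Split a semicolon-separated 'Recorded by' value into names."""
    return [name.strip() for name in text.split(";") if name.strip()]


collected_on = parse_collection_date(record["collectionDate"])
collectors = split_collectors(record["recordedBy"])
print(collected_on.isoformat(), collectors)
```

Note that the split keeps both "T G. Howarth" and "Howarth" as separate entries; deciding whether they refer to the same person is a data-cleaning question this sketch does not attempt to answer.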

There are some issues… (where is H. M. Edelsten!?)

http://data.nhm.ac.uk

Complete NHM Specimen Dataset (3.3M records)

http://bit.ly/2goEpBB

GitHub Gist – NHM API:

http://bit.ly/2gtukRv

iCollections Datasets

http://bit.ly/2gGZub5

Even more data…

http://www.gbif.org/occurrence
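
For anyone wanting to query these sources programmatically, here is a minimal sketch that only builds search URLs and does not send any requests. It assumes the NHM Data Portal exposes a CKAN-style `datastore_search` action and that GBIF's occurrence search lives under `https://api.gbif.org/v1`; neither detail is stated on the slides, and the resource ID below is a hypothetical placeholder that would need to be replaced with a real dataset resource ID from the portal.

```python
from urllib.parse import urlencode

NHM_PORTAL = "http://data.nhm.ac.uk"   # portal address from the slides
GBIF_API = "https://api.gbif.org/v1"   # assumed GBIF API base URL


def nhm_search_url(resource_id, query, limit=10):
    """Build a CKAN-style datastore_search URL (assumed API shape)."""
    params = {"resource_id": resource_id, "q": query, "limit": limit}
    return f"{NHM_PORTAL}/api/3/action/datastore_search?{urlencode(params)}"


def gbif_search_url(scientific_name, limit=10):
    """Build a GBIF occurrence search URL (assumed API shape)."""
    params = {"scientificName": scientific_name, "limit": limit}
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


# "YOUR-RESOURCE-ID" is a hypothetical placeholder, not a real resource ID.
nhm_url = nhm_search_url("YOUR-RESOURCE-ID", "Thymelicus lineola", limit=5)
gbif_url = gbif_search_url("Thymelicus lineola", limit=5)
print(nhm_url)
print(gbif_url)
```

The resulting URLs could be fetched with any HTTP client; the GitHub Gist linked above is the better guide to the field names and parameters the NHM API actually accepts.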

Potential Challenges

How did collecting effort change over time?

Which collector collected from the most distinct localities? Can we make a ranking table and mash up the data with Wikipedia or other sources?

What can we learn about the collectors – who travelled the furthest or most regularly?

Were most specimens collected in rural areas? Is there collection bias in particular counties?

How can we make the data more attractive to different audiences?

How could we display the data in more engaging or informative ways?
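
Several of the challenge questions above reduce to simple aggregations over specimen records. The sketch below is one possible starting point, not a prescribed method: it counts records per decade (collecting effort over time), ranks collectors by number of distinct localities, and computes great-circle distance between two coordinates with the haversine formula (a building block for "who travelled the furthest"). The sample records and the keys `collector`, `locality`, and `year` are made up purely for illustration.

```python
from collections import Counter, defaultdict
from math import asin, cos, radians, sin, sqrt

# Synthetic placeholder records, NOT real NHM data.
records = [
    {"collector": "Collector A", "locality": "Locality 1", "year": 1938},
    {"collector": "Collector A", "locality": "Locality 2", "year": 1941},
    {"collector": "Collector A", "locality": "Locality 2", "year": 1952},
    {"collector": "Collector B", "locality": "Locality 1", "year": 1936},
]


def records_per_decade(rows):
    """Count records per decade, e.g. 1938 falls in the 1930 bucket."""
    return Counter((row["year"] // 10) * 10 for row in rows)


def rank_by_distinct_localities(rows):
    """Return (collector, distinct locality count), highest count first."""
    localities = defaultdict(set)
    for row in rows:
        localities[row["collector"]].add(row["locality"])
    counts = [(name, len(places)) for name, places in localities.items()]
    # Sort by count descending, then by name for a stable ranking.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two decimal-degree points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(min(1.0, a)))


per_decade = records_per_decade(records)
ranking = rank_by_distinct_localities(records)
print(sorted(per_decade.items()))
print(ranking)
```

On the real datasets the same functions would need the actual field names, plus handling for missing dates and coordinates, and some reconciliation of collector name variants before any ranking is meaningful.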
