DESCRIPTION

Invited talk given at The Natural History Museum, London, 17 March 2009 (I gave a very similar talk at the Department of Zoology, University of Stockholm, 12 March 2009).

TRANSCRIPT

Going Digital

Rod Page

Photo by Keith Marshall http://www.flickr.com/photos/keithmarshall/432924465/

If you are not online you are invisible

(Web 1.0)

All useful information will be online

(Web 1.0)

Value is explicit and based on usage (links)

(Web 1.0)

Reputation is created…

(Web 2.0)

…not conferred by authority

(Web 2.0)

Everything will have a URL

(Web 3.0)

Yes, I’ve drunk the Kool-Aid

…but I’m not alone

Social networking

Dinosaurs ban it

Scaremongers say it causes cancer

Some “get it”

Some do real work with it

#uksnow

@kzelnio Could you do me a favour: 10.1016/j.anbehav.2008.12.017

Where is the (digital) museum?


www.nhm.ac.uk

Zoology

This dataset is not accessible by the public. For more information please contact the

Department of Zoology.

Silo

http://www.flickr.com/photos/kenmccown/132990634/

404

GBIF

http://www.flickr.com/photos/chrisfreeland/3306689322/

Top 10 GBIF data providers

League table

Columns: Museum; GBIF data; Open access journal; Staff publications online; Social networking

3,446,016 yes Twitter, Facebook, YouTube, etc.

(searchable collections) yes (Twitter)

412,797 (planned) (planned)

Why go digital?


Diverse kinds of data

Apomys datae

Apomys specimen

How do we integrate these data?

Why integrate?

Learn stuff we don’t know

• There are known knowns, things we know that we know

• There are known unknowns, things we now know we don’t know

• But there are also unknown unknowns, things we do not know we don't know

Unknown knowns

Things we know …without knowing that we know

Melissotarsus insularis

Melissotarsus insularis: no hit

CASENT0107663-D01: DQ176312

DQ176312: Melissotarsus sp. BLF m1

CASENT0107663-D01: Melissotarsus insularis

Melissotarsus insularis = Melissotarsus sp. BLF m1

No one source has all the answers
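The Melissotarsus example can be read as a join on shared identifiers. The sketch below is only an illustration: the two dictionaries are hypothetical stand-ins for a specimen database and a sequence database, and the record layout (including the `voucher` field) is an assumption. Only the identifiers and names come from the slides.

```python
# Hypothetical stand-ins for two independent sources. The layout is assumed;
# the identifiers and names are the ones shown on the slides.
specimen_db = {
    "CASENT0107663-D01": "Melissotarsus insularis",  # specimen -> determined name
}
sequence_db = {
    "DQ176312": {
        "organism": "Melissotarsus sp. BLF m1",      # name attached to the sequence
        "voucher": "CASENT0107663-D01",              # specimen the sequence came from
    },
}


def names_linked_by_specimen(specimens, sequences):
    """Return (sequence name, specimen name, accession) for shared vouchers."""
    links = []
    for accession, record in sequences.items():
        specimen_name = specimens.get(record["voucher"])
        # Only report cases where the two sources disagree on the name.
        if specimen_name is not None and specimen_name != record["organism"]:
            links.append((record["organism"], specimen_name, accession))
    return links


links = names_linked_by_specimen(specimen_db, sequence_db)
for seq_name, spec_name, accession in links:
    print(f"{seq_name} = {spec_name} (via {accession})")
```

Neither source alone says the two names refer to the same thing; the equivalence only appears once the specimen identifier is used as the join key.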

Joining the dots

Identifiers

Digital Object Identifier (DOI)

Identifies a publication

Globally unique

10.1016/j.ympev.2006.04.006
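Because a DOI is globally unique, turning one into a link is a matter of prefixing it with a resolver address. The following is a minimal sketch that assumes the public doi.org resolver; the function name `doi_to_url` is made up for this illustration.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Build a resolver URL for a DOI, tolerating a few common prefixes."""
    doi = doi.strip()
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Keep "/" literal; percent-encode anything that is unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("10.1016/j.ympev.2006.04.006"))
```

The resolver redirects to wherever the publisher currently hosts the paper, which is what protects a DOI-based link from link rot.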

Paper

Why have DOIs?

Link rot

Refs

2006

Cites

2006

Forward Cites

2006 2009

Shoulders of giants

progress is incremental

reuse past results

Forward Cites

2006 2008

Species

Genes

data linking

data citation

http://iphylo.org/~rpage/challenge

demo

Vision

Chromis circumaurea

What should museums do?


Do nothing


The new hotness!

Don’t try this at home

• Image storage (Flickr)

• Video storage (YouTube, Vimeo)

• Bibliographies (Connotea, Mendeley)

• Social networking (Facebook, Twitter)

• Annotation (CMS, Wikis, Blogs)

• Bulk storage (Amazon S3)

• Bulk computing (Amazon EC2)

Make it easy


http://www.flickr.com/photos/scobleizer/2256358640/

http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

http://abacus.gene.ucl.ac.uk/software/paml.html

http://mrbayes.csit.fsu.edu/

http://www.tree-puzzle.de/

http://atgc.lirmm.fr/phyml/

No branding

No corporate style

No permission needed

Institution provides infrastructure…

…then gets out of the way

Top five European papers in evolutionary biology 1996-2006

1,118 – 4,512 citations

Partnerships (EOL)


“Dance of the initiatives”

Christine Hine

Danger of too much money

Million Dollar Page

EOL in its present form

sucks

Can I do science with it?

Not yet…

Intellectual Property


Fear

Ignorance

Silo

http://www.flickr.com/photos/kenmccown/132990634/

AMNH Conditions

1. Except as otherwise expressly stated herein, the information, records, or images in these databases may not be reproduced, distributed, or publicly displayed, in whole or in part, without the express written permission of the American Museum of Natural History (AMNH).

2. AMNH does not grant permission for anyone to use, download, reproduce, publicly display, distribute, or reprint all or substantially all of the information, records, or images in the database.

3. Subsets of the information, records, or images in the database may be used, downloaded, reproduced, publicly displayed, distributed, or reprinted strictly for educational, scientific, scholarly, and other non-profit uses provided that AMNH is appropriately cited as the source of the information.

4. Subsets of the records from the database downloaded for use with data from other data sets must be clearly identified by the attribution “AMNH.”

5. Data are provided to individual users with the understanding that said data will not be passed on to third parties or redistributed, except with approval from AMNH.

6. …

2. AMNH does not grant permission for anyone to use, download, reproduce, publicly display, distribute, or reprint all or substantially all of the information, records, or images in the database.

Elachistocleis ovalis

http://www.flickr.com/photos/lleonebio/3328398741/

You know more than the AMNH database does!

FEATURES             Location/Qualifiers
     source          1..2400
                     /organism="Elachistocleis ovalis"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /specimen_voucher="AMNH A141136"
                     /db_xref="taxon:367647"
                     /country="Guyana: Dubulay Ranch on the Berbice River, 200ft, 5'40'55N, 57'51'32W"
     misc_RNA        <1..>2400
                     /note="contains 12S ribosomal RNA, tRNA-Val, and 16S ribosomal RNA"

DQ283405
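The GenBank record for DQ283405 already carries the museum's specimen number, so anyone holding a copy of the sequence can recover the voucher. As a rough sketch, the qualifiers can be pulled out with a regular expression; the helper name `qualifiers` is invented here, and the input is an excerpt of the feature table quoted on the slide, not a full GenBank parser's input.

```python
import re

# Excerpt of the feature table for DQ283405, as quoted on the slide.
record = (
    'source 1..2400 /organism="Elachistocleis ovalis" '
    '/organelle="mitochondrion" /mol_type="genomic DNA" '
    '/specimen_voucher="AMNH A141136" /db_xref="taxon:367647"'
)


def qualifiers(text):
    """Return a dict of /key="value" qualifiers found in feature-table text.

    If a key occurs more than once, the last value wins; that is fine for
    this excerpt but a simplification of the real format.
    """
    return dict(re.findall(r'/(\w+)="([^"]*)"', text))


q = qualifiers(record)
print(q["specimen_voucher"], "->", q["organism"])
```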

Tens of thousands of copies all around the world

AMNH Conditions

1. Except as otherwise expressly stated herein, the information, records, or images in these databases may not be reproduced, distributed, or publicly displayed, in whole or in part, without the express written permission of the American Museum of Natural History (AMNH).

2. AMNH does not grant permission for anyone to use, download, reproduce, publicly display, distribute, or reprint all or substantially all of the information, records, or images in the database.

3. Subsets of the information, records, or images in the database may be used, downloaded, reproduced, publicly displayed, distributed, or reprinted strictly for educational, scientific, scholarly, and other non-profit uses provided that AMNH is appropriately cited as the source of the information.

4. Subsets of the records from the database downloaded for use with data from other data sets must be clearly identified by the attribution “AMNH.”

5. Data are provided to individual users with the understanding that said data will not be passed on to third parties or redistributed, except with approval from AMNH.

6. …

You are going digital whether you like it or not…

If it is on the web it will be found, and used

This is a good thing

Creative Commons

Why be open?


Caeciliidae

Caeciliidae

Caeciliidae

Pagellus erythrinus

Pagellus erythrinus

Pagellus erythrinus

Mannophryne trinitatis

MVZ 199828 (Aneides flavipunctatus)

MVZ 199838

Errors in databases

Errors in publications

The Carmen Electra argument for Open Access

treemap

reuse data

Electra pilosa

Carmen Electra versus Electra

(guess who wins…)

reuse data

Homo sapiens

AJ711044

should be AJ971044

How do we find and fix these errors?

Don’t release data until it is “perfect”

(wrong)

“given enough eyes, all bugs are shallow”

Eric S Raymond

Credit


Google PageRank

(Figure: four linked pages, A to D; scores shown are 1.49, 1.58, 0.15 and 0.78.)

PageRank for web pages
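To make the idea concrete, here is a small sketch of the simple iterative PageRank formula, PR(p) = (1 - d) + d × Σ PR(q)/outlinks(q), with damping factor d = 0.85. The link graph is an assumption: the transcript does not preserve the slide's arrows or which score belongs to which page, so the graph below was chosen only because it yields the same four scores (1.49, 1.58, 0.15, 0.78).

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / outdegree(q)) over inlinks q.

    `links` maps each page to the list of pages it links to. Every page is
    assumed to have at least one outgoing link.
    """
    pages = list(links)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Hypothetical link structure (an assumption, not taken from the slide).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print({page: round(score, 2) for page, score in ranks.items()})
```

A page that nothing links to (D in this graph) keeps only the baseline score of 1 - d, while a page that collects links from several others accumulates the most. Value comes from being linked to, which is the point carried over to citations and data.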

Scientific citation

H-index for authors
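For reference, the h-index is the largest number h such that an author has h papers with at least h citations each. A minimal sketch, with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for position, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Illustrative citation counts for five papers (not real data).
print(h_index([10, 8, 5, 4, 3]))
```

The same calculation would work for a collection's datasets or specimens, provided each one has an identifier that citations can be counted against.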

Impact factor for journals

What about an impact factor for data?

Metric of the value of the data

Incentive to have globally unique, citable identifiers

What to digitise first?


First digitise that which has been cited

W D Lang Nature 139, 191 (1937) doi:10.1038/139191a0

http://www.flickr.com/photos/mtl_shag/1403957285/

www.nhm.ac.uk

V S Smith

…end
