Typing in Darwin Core: do we need dwc:basisOfRecord? TDWG 2014. Markus Döring, GBIF. Jönköping, October 2014.

Upload: markus-doering

Post on 14-Aug-2015

Page 1: Döring dwc basisofrecord

Typing in Darwin Core: do we need dwc:basisOfRecord?

TDWG 2014
Markus Döring, GBIF

Jönköping, October 2014

Page 2: Döring dwc basisofrecord

dc:type
“The nature or genre of the resource.”
“Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary” - but it has range rdfs:Class

DCMI Type Vocabulary:
Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text

Page 3: Döring dwc basisofrecord

dwc:basisOfRecord
“The Darwin Core Type Vocabulary extends and refines terms from the Dublin Core Type Vocabulary to describe and categorize resources more specifically for biodiversity applications. The basisOfRecord should be populated with the value from the Darwin Core Type Vocabulary that best corresponds to the resource being shared.”

Occurrence        dc:Event            HumanObservation
MaterialSample    dc:StillImage       MachineObservation
Taxon             dc:MovingImage      PreservedSpecimen
Nomen*            dc:Sound            FossilSpecimen
                  dc:Text             LivingSpecimen
                  dc:PhysicalObject

DwC types vocabulary has been merged into DwC namespace!
http://rs.tdwg.org/dwc/terms/PreservedSpecimen

Page 4: Döring dwc basisofrecord

Typing DwC XML
• XML protocols (DiGIR, TAPIR) use simple, flat DwC
• Occurrences typed by
    • dc:type
    • BasisOfRecord

Page 5: Döring dwc basisofrecord

Typing DwC Archives
• rowType with class term defines type for all records in a file
• extension files have their own rowType
• dwc:basisOfRecord in addition
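To make the rowType idea concrete, here is a minimal sketch that reads the rowType of each table from a DwC-A `meta.xml` descriptor. The descriptor below is a hypothetical example (file names and column indexes are invented for illustration), and `row_types` is a helper name introduced here, not part of any library.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Darwin Core Archive descriptors (meta.xml).
DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"

# Hypothetical descriptor: file names and column indexes are made up.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/MeasurementOrFact">
    <files><location>measurements.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/measurementType"/>
  </extension>
</archive>"""


def row_types(meta_xml: str) -> dict:
    """Map each data file named in the descriptor to the rowType of its table."""
    root = ET.fromstring(meta_xml)
    location_path = f"{{{DWC_TEXT_NS}}}files/{{{DWC_TEXT_NS}}}location"
    types = {}
    for table in root:  # the <core> element and every <extension> element
        location = table.find(location_path)
        types[location.text] = table.get("rowType")
    return types


print(row_types(META_XML))
```

Every row of `occurrence.txt` is typed as dwc:Occurrence by the core rowType, while the basisOfRecord column can still carry a second, per-record type, which is the redundancy the talk questions.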

Page 6: Döring dwc basisofrecord

Occurrences in DwC-A
• dwc:Occurrence core rowType
    • single flat Occurrence core, similar to XML
• dwc:Occurrence extension (inherits all core values!)
    • dwc:Taxon core, “checklists”
    • dwc:Event core, sampling / monitoring
• dwc:MaterialSample for specimens (not observations)
    • as core or extension?
    • subset of basisOfRecord values is applicable
    • should dwc:Occurrence be restricted to observations?

https://github.com/mdoering/dwca-examples

Page 7: Döring dwc basisofrecord

Typing in DwC RDF
• rdf:type primarily defines the type of an RDF resource
• values can be URIs from dcmitype, dwc, …
• other terms could describe the resource nature
    • dc:type, dcterms:type, dwc:basisOfRecord
    • … but not recommended
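As a small illustration of typing through rdf:type, the sketch below writes N-Triples statements by hand, without an RDF library. The subject URIs under `example.org` are hypothetical, and `type_triple` is a helper name introduced for this example only.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"


def type_triple(subject_uri: str, class_uri: str) -> str:
    """Return one N-Triples statement that types a resource with rdf:type."""
    return f"<{subject_uri}> <{RDF_TYPE}> <{class_uri}> ."


# Hypothetical resources: one typed with a DwC class, one with a DCMI type.
print(type_triple("http://example.org/occurrence/1", DWC + "Occurrence"))
print(type_triple("http://example.org/image/7", DCMITYPE + "StillImage"))
```

With the class URI carried by rdf:type, a separate dc:type or dwc:basisOfRecord statement on the same resource would only repeat the information.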

Page 8: Döring dwc basisofrecord

basisOfRecord @ GBIF
(verbatim value, number of records)

HumanObservation           265573716
PreservedSpecimen          139572007
Observation                 72567152
O                           26314003
S                           19055660
specimen                     9809257
Occurrence                   6733802
Colectado                    1774372
voucher                      1672883
OtherSpecimen                1575388
specimen(SP)                 1325600
FossilSpecimen               1228399
HO                           1119094
Accession                     976457
FossileSpecimen               906052
Observaciõn humana            842668
fossil                        764820
MachineObservation            672918
LivingSpecimen                558926
FossilRecord                  372226
Observasjon                   349598
Objekt                        267634
Unpublished Report            255305
Voucher                       211771
Personal Communication        180665
Museum specimen               175324
Reportado                     162128
F                             143949
herbarium specimen            141170
Published Report              134974
L                             131428
genomic DNA                   118827
Observado                     113815
Plant                         110017
preserved                     107149
still image                    96591
Especimen preservado           85682
FishPreparation                75023
Unknown                        74037
Fossil Specimen                73975
collected specimen             73311
Compound observation           72926
Especimen preservado           70272
Literature                     40380
DrawingOrPhotograph            37105
VirtualSpecimen                26505
living organism                11630
PreservedTissue                11091
living, growing plant           6189
fluid specimen                  7870
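The spread of verbatim values above suggests why consumers have to normalize basisOfRecord before using it. The following is a minimal sketch of such a normalizer. The lookup table is an illustrative assumption covering only a few of the listed values; it is not GBIF's actual interpretation logic, and `normalize_basis_of_record` is a name introduced here.

```python
import re
import unicodedata

# Illustrative lookup only. Keys are simplified forms of a few verbatim values
# from the list; the target term chosen for each is an assumption of this sketch.
_LOOKUP = {
    "humanobservation": "HumanObservation",
    "observacionhumana": "HumanObservation",
    "machineobservation": "MachineObservation",
    "preservedspecimen": "PreservedSpecimen",
    "especimenpreservado": "PreservedSpecimen",
    "fossilspecimen": "FossilSpecimen",
    "fossilespecimen": "FossilSpecimen",
    "livingspecimen": "LivingSpecimen",
}


def _key(value: str) -> str:
    """Strip accents, case, spaces and punctuation so spelling variants collide."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def normalize_basis_of_record(value: str):
    """Return a controlled term for a verbatim value, or None if unrecognized."""
    return _LOOKUP.get(_key(value))


for verbatim in ["Observaciõn humana", "Fossil Specimen", "F"]:
    print(verbatim, "->", normalize_basis_of_record(verbatim))
```

Single-letter codes such as "F", "O" or "S" are deliberately left unmapped here, since their meaning cannot be recovered from the value alone.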

Page 10: Döring dwc basisofrecord

Evidence model
• Keep evidence for Occurrence as distinct entities
• Occurrence only for organism in place and time
• Feasible for publishers?
    • overly normalized for flat sources?
    • Evidence location != occurrence location

[Diagram: an Occurrence node, shown with Organism, Place and Time, linked by four hasEvidence edges to StillImage, MachineObservation, MaterialSample and Literature]
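One way to picture the evidence model is as a pair of small record types. This is a sketch under stated assumptions: the class and field names (`Evidence`, `Occurrence`, `has_evidence`, `flatten`) and all sample values are hypothetical and only mirror the labels in the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    evidence_type: str  # e.g. "MaterialSample", "StillImage", "Literature"
    location: str       # where the evidence itself is kept


@dataclass
class Occurrence:
    organism: str
    place: str
    time: str
    has_evidence: list = field(default_factory=list)


def flatten(occurrence: Occurrence) -> list:
    """Return one flat row per piece of evidence, repeating the occurrence data."""
    return [
        {
            "organism": occurrence.organism,
            "place": occurrence.place,
            "time": occurrence.time,
            "evidenceType": ev.evidence_type,
            "evidenceLocation": ev.location,
        }
        for ev in occurrence.has_evidence
    ]


# Invented sample data: one occurrence backed by two pieces of evidence.
sighting = Occurrence(
    organism="Puma concolor",
    place="field site A",
    time="2014-10-27",
    has_evidence=[
        Evidence("StillImage", "image archive"),
        Evidence("MaterialSample", "museum collection"),
    ],
)
rows = flatten(sighting)
print(rows)
```

Flattening shows both concerns raised on the slide: a flat source has to repeat the occurrence for every piece of evidence, and the evidence location is a separate value from the occurrence place.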

Page 11: Döring dwc basisofrecord

Typing Evidence
• Basic evidence types
    • MaterialSample
    • Observation
    • Media
• Extend base types as class hierarchy in DwC?
    • rdf:type/rowType
• An evidenceType property with an external vocabulary?
    • is this just another name for basisOfRecord?

Page 12: Döring dwc basisofrecord

Managing type vocabulary
• Many dimensions. Multiple inheritance or many vocabularies?
    • preservationMethod
    • samplingProtocol
    • organismPart
• Manage vocabulary
    • simply GitHub?
• Format
    • YAML
    • BCO OWL
    • RDF
    • SKOS
    • OBO file format

Page 13: Döring dwc basisofrecord

Do we need basisOfRecord?
• Occurrence → type=dwc:Occurrence
    • HumanObservation → samplingProtocol=human
    • MachineObservation → samplingProtocol=machine
• PhysicalObject → type=dwc:MaterialSample
    • PreservedSpecimen → preparations=preserved
    • FossilSpecimen → preparations=fossil (???)
    • LivingSpecimen → preparations=alive (seed)
• legacy values
    • Germplasm → preparations=seed, culture collection, …
    • Literature → dc:source (evidence in literature)
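The proposed replacement of basisOfRecord by a type plus a second property can be sketched as a simple lookup. The pairing of each basisOfRecord value with its replacement follows one reading of the slide's two columns, and `REPLACEMENTS` and `retype` are names introduced for this example. Legacy values (Germplasm, Literature) are left out because the slide does not give them a single replacement.

```python
# One reading of the slide: each basisOfRecord value becomes a type plus
# either samplingProtocol or preparations.
REPLACEMENTS = {
    "HumanObservation": {"type": "dwc:Occurrence", "samplingProtocol": "human"},
    "MachineObservation": {"type": "dwc:Occurrence", "samplingProtocol": "machine"},
    "PreservedSpecimen": {"type": "dwc:MaterialSample", "preparations": "preserved"},
    "FossilSpecimen": {"type": "dwc:MaterialSample", "preparations": "fossil"},
    "LivingSpecimen": {"type": "dwc:MaterialSample", "preparations": "alive"},
}


def retype(record: dict) -> dict:
    """Return a copy of the record with basisOfRecord replaced by type + property."""
    replacement = REPLACEMENTS.get(record.get("basisOfRecord"))
    if replacement is None:
        return dict(record)  # unknown or legacy value: leave the record as it is
    retyped = {k: v for k, v in record.items() if k != "basisOfRecord"}
    for term, value in replacement.items():
        retyped.setdefault(term, value)  # keep values the publisher already gave
    return retyped


print(retype({"basisOfRecord": "HumanObservation", "scientificName": "Puma concolor"}))
```

Using `setdefault` means an existing, more detailed preparations value is never overwritten by the coarse replacement, which matters given how heavily that term is already used.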

Page 14: Döring dwc basisofrecord

preparations @ GBIF
(verbatim value, number of records)

hb                                        10674247
Pinned                                     1592815
skin                                       1309430
Alcohol                                    1125061
herbarium specimen                         1146250
whole animal (ethanol)                     1015170
Ethanol                                     752583
herbarium specimen of unspecified type      662727
Skin; Skull                                 657289
Dry                                         550633
study skin                                  434260
mounted                                     429388
pin                                         417088
Skin: Whole                                 340390
shell (dry)                                 309713
ETOH - 1                                    286076
Sheet                                       284165
dried                                       277271
not applicable                              230226
alcoholic                                   352914
skin (dry)                                  214550
unknown (fossil)                            188080
eggs                                        175639
skeleton                                    169125
skull                                       160763
dry-mount                                   152522
whole organism                              145996
skull; study skin                           140141
Skin, Y                                     127946
Envelope                                    127829
lichen - 1                                  116223
Not Mounted                                 113817
75% EtOH                                    344052
pointed                                     107153
Packet                                      106134
microscopic slide                           100764
whole animal (pinned)                        97852
Exicado - 1                                  97459
SS; Tissue-false                             97432
whole animal                                 92908
Fluid                                        90661
70% Ethanol                                  90403
whole organism (isopropanol)                 89954
skin,skull                                   82584
Skin Study                                   81486
fossil - 1                                   77413
fluid                                        75123
dried and pressed                            70738
skin; skull                                  70238
fossil                                       64859

Page 17: Döring dwc basisofrecord

Is preparations overloaded?

preservationMethod
NCD: http://rs.tdwg.org/ontology/voc/Collection#SpecimenPreservationMethodTypeTerm

organismPart
ABCD KindOfUnit: “Part(s), physical state, or class of materials represented by this specimen.”
Examples: whole organisms, antlers, bark, blood samples, bones, eggs, feathers, fruits, galls, heads, leaves

Page 18: Döring dwc basisofrecord

Discussion
• Restrict Occurrence to observations?
    • use MaterialSample for all physical things
• Do we want to use all new DwC terms as classes?
    • is it legitimate to use them as rowType / rdf:type?
    • do we need new id terms, e.g. FossilSpecimenID?
• Type by single vocabulary or multiple “dimensions”
    • typing by class hierarchy or properties
• How do we want to manage a type vocabulary?
• dc:type needed if we have a more specific type?