döring dwc basisofrecord
Typing in Darwin Core: do we need dwc:basisOfRecord?
TDWG 2014
Markus Döring, GBIF
Jönköping, October 2014
dc:type
“The nature or genre of the resource.”
“Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary” - but it has range rdfs:Class
DCMI Type Vocabulary: Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text
dwc:basisOfRecord
“The Darwin Core Type Vocabulary extends and refines terms from the Dublin Core Type Vocabulary to describe and categorize resources more specifically for biodiversity applications. The basisOfRecord should be populated with the value from the Darwin Core Type Vocabulary that best corresponds to the resource being shared.”
• Occurrence, MaterialSample, Taxon, Nomen*
• dc:Event, dc:StillImage, dc:MovingImage, dc:Sound, dc:Text, dc:PhysicalObject
• HumanObservation, MachineObservation, PreservedSpecimen, FossilSpecimen, LivingSpecimen
The DwC type vocabulary has been merged into the DwC namespace! Example: http://rs.tdwg.org/dwc/terms/PreservedSpecimen
Typing DwC XML
• XML protocols (DiGIR, TAPIR) use simple, flat DwC
• Occurrences typed by
  • dc:type
  • BasisOfRecord
Typing DwC Archives
• rowType with class term defines type for all records in a file
• extension files have their own rowType
• dwc:basisOfRecord in addition
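To make the rowType idea concrete, here is a minimal sketch that reads the rowType of the core and of each extension from a DwC-A meta.xml. The archive descriptor, the file names and the choice of MaterialSample as an extension are hypothetical; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta.xml: an Occurrence core plus one extension file.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/MaterialSample">
    <files><location>sample.txt</location></files>
    <coreid index="0"/>
  </extension>
</archive>"""

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}


def row_types(meta_xml: str) -> dict:
    """Map each data file to the class term that types all of its records."""
    root = ET.fromstring(meta_xml)
    result = {}
    # The core and every extension carry their own rowType attribute.
    for element in root.findall("dwca:core", NS) + root.findall("dwca:extension", NS):
        location = element.findtext("dwca:files/dwca:location", namespaces=NS)
        result[location] = element.get("rowType")
    return result


if __name__ == "__main__":
    for filename, row_type in row_types(META_XML).items():
        print(filename, "->", row_type)
```

Every record in a file shares that file's rowType; a per-record dwc:basisOfRecord column, as in field index 1 above, is the only place where individual rows can differ.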
Occurrences in DwC-A
• dwc:Occurrence core rowType
  • single flat Occurrence core, similar to XML
• dwc:Occurrence extension (inherits all core values!)
  • dwc:Taxon core, “checklists”
  • dwc:Event core, sampling / monitoring
• dwc:MaterialSample for specimens (not observations)
  • as core or extension?
  • subset of basisOfRecord values is applicable
  • should dwc:Occurrence be restricted to observations?
https://github.com/mdoering/dwca-examples
Typing in DwC RDF
• rdf:type primarily defines the type of an RDF resource
  • values can be URIs from dcmitype, dwc, …
• other terms could describe the nature of the resource
  • dc:type, dcterms:type, dwc:basisOfRecord
  • … but not recommended
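As a rough illustration of typing by rdf:type, the sketch below models triples as plain Python tuples rather than using an RDF library. The subject URIs are made up; the namespace URIs are the published ones for rdf, dwc and dcmitype.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"

# Hypothetical (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/specimen/1", RDF_TYPE, DWC + "PreservedSpecimen"),
    ("http://example.org/specimen/1", RDF_TYPE, DCMITYPE + "PhysicalObject"),
    ("http://example.org/photo/7", RDF_TYPE, DCMITYPE + "StillImage"),
]


def types_of(subject, triples):
    """Return every rdf:type value asserted for a subject."""
    return {o for s, p, o in triples if s == subject and p == RDF_TYPE}


if __name__ == "__main__":
    print(sorted(types_of("http://example.org/specimen/1", TRIPLES)))
```

A resource can carry several rdf:type values at once, here a specific dwc class next to a generic dcmitype class, which is one reason a separate typing property may be redundant in RDF.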
basisOfRecord @ GBIF
HumanObservation         265573716    Museum specimen         175324
PreservedSpecimen        139572007    Reportado               162128
Observation               72567152    F                       143949
O                         26314003    herbarium specimen      141170
S                         19055660    Published Report        134974
specimen                   9809257    L                       131428
Occurrence                 6733802    genomic DNA             118827
Colectado                  1774372    Observado               113815
voucher                    1672883    Plant                   110017
OtherSpecimen              1575388    preserved               107149
specimen(SP)               1325600    still image              96591
FossilSpecimen             1228399    Especimen preservado     85682
HO                         1119094    FishPreparation          75023
Accession                   976457    Unknown                  74037
FossileSpecimen             906052    Fossil Specimen          73975
Observaciõn humana          842668    collected specimen       73311
fossil                      764820    Compound observation     72926
MachineObservation          672918    Especimen preservado     70272
LivingSpecimen              558926    Literature               40380
FossilRecord                372226    DrawingOrPhotograph      37105
Observasjon                 349598    VirtualSpecimen          26505
Objekt                      267634    living organism          11630
Unpublished Report          255305    PreservedTissue          11091
Voucher                     211771    living, growing plant     6189
Personal Communication      180665    fluid specimen            7870
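The spread of values above suggests why consumers end up normalizing basisOfRecord. The sketch below shows one possible approach: fold case and accents, accept canonical terms directly, and fall back to a small synonym table. The synonym table is a hypothetical illustration built from a few values in the list, not GBIF's actual interpretation rules.

```python
import unicodedata

CANONICAL = {
    "Occurrence", "HumanObservation", "MachineObservation", "MaterialSample",
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
}

# Hypothetical synonym table; keys are already case- and accent-folded.
SYNONYMS = {
    "observacion humana": "HumanObservation",
    "especimen preservado": "PreservedSpecimen",
    "museum specimen": "PreservedSpecimen",
    "herbarium specimen": "PreservedSpecimen",
    "fossilespecimen": "FossilSpecimen",
    "fossil specimen": "FossilSpecimen",
    "living organism": "LivingSpecimen",
}


def _key(value):
    """Strip accents, lowercase and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(unaccented.lower().split())


CANONICAL_BY_KEY = {_key(term): term for term in CANONICAL}


def normalize_basis_of_record(raw):
    """Return a canonical term, or None when the value cannot be interpreted."""
    key = _key(raw)
    if key in CANONICAL_BY_KEY:
        return CANONICAL_BY_KEY[key]
    return SYNONYMS.get(key)


if __name__ == "__main__":
    for raw in ["Observaciõn humana", "FossileSpecimen", "HO", "preservedspecimen"]:
        print(repr(raw), "->", normalize_basis_of_record(raw))
```

Single-letter codes such as "O", "S" or "HO" are deliberately left unmapped here, since their meaning depends on the publishing dataset.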
Evidence model
• Keep evidence for Occurrence as distinct entities
• Occurrence only for organism in place and time
• Feasible for publishers?
  • overly normalized for flat sources?
  • Evidence location != occurrence location
[Diagram: an Occurrence combines Organism, Place and Time; hasEvidence links connect it to StillImage, MachineObservation, MaterialSample and Literature]
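One way to picture the evidence model is as a small data structure: an Occurrence holds only organism, place and time, and each piece of evidence is a separate entity linked to it. The class names, field names and sample values below are hypothetical; the flatten helper shows what a flat source would have to repeat per evidence row.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    evidence_type: str               # e.g. "MaterialSample", "StillImage"
    identifier: str
    location: Optional[str] = None   # where the evidence is held, not where the organism occurred


@dataclass
class Occurrence:
    organism: str
    place: str
    time: str
    has_evidence: List[Evidence] = field(default_factory=list)


def flatten(occurrence):
    """Denormalize to one flat row per piece of evidence."""
    return [
        {
            "organism": occurrence.organism,
            "place": occurrence.place,
            "time": occurrence.time,
            "evidenceType": ev.evidence_type,
            "evidenceID": ev.identifier,
            "evidenceLocation": ev.location,
        }
        for ev in occurrence.has_evidence
    ]


# Hypothetical example data.
occ = Occurrence(organism="Puma concolor", place="hypothetical field site", time="2014-10-27")
occ.has_evidence.append(Evidence("MaterialSample", "urn:example:sample:1", location="hypothetical museum"))
occ.has_evidence.append(Evidence("StillImage", "urn:example:image:1"))

if __name__ == "__main__":
    for row in flatten(occ):
        print(row)
```

The flattened rows repeat organism, place and time for every piece of evidence, which is the normalization trade-off raised in the bullets.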
Typing Evidence
• Basic evidence types
  • MaterialSample
  • Observation
  • Media
• Extend base types as class hierarchy in DwC?
  • rdf:type/rowType
• An evidenceType property with an external vocabulary?
  • is this just another name for basisOfRecord?
Managing type vocabulary
• Many dimensions. Multiple inheritance or many vocabularies?
  • preservationMethod
  • samplingProtocol
  • organismPart
• Manage vocabulary
  • simply GitHub?
• Format
  • YAML
  • BCO OWL
  • RDF
  • SKOS
  • OBO file format
Do we need basisOfRecord?
• Occurrence -> type=dwc:Occurrence
  • HumanObservation -> samplingProtocol=human
  • MachineObservation -> samplingProtocol=machine
• PhysicalObject -> type=dwc:MaterialSample
  • PreservedSpecimen -> preparations=preserved
  • FossilSpecimen -> preparations=fossil (???)
  • LivingSpecimen -> preparations=alive (seed)
• legacy values
  • Germplasm -> preparations=seed, culture collection, …
  • Literature -> dc:source (evidence in literature)
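The replacement proposed on this slide can be sketched as a lookup that rewrites a basisOfRecord value into a type plus one property. The function name and record layout are hypothetical, and the pairing of values follows my reading of the slide; the FossilSpecimen entry is the one the slide itself marks with "(???)".

```python
# basisOfRecord value -> (type, property, value)
DECOMPOSITION = {
    "HumanObservation": ("dwc:Occurrence", "samplingProtocol", "human"),
    "MachineObservation": ("dwc:Occurrence", "samplingProtocol", "machine"),
    "PreservedSpecimen": ("dwc:MaterialSample", "preparations", "preserved"),
    "FossilSpecimen": ("dwc:MaterialSample", "preparations", "fossil"),  # "(???)" on the slide
    "LivingSpecimen": ("dwc:MaterialSample", "preparations", "alive"),
}


def decompose(record):
    """Return a copy of the record with basisOfRecord replaced by type + property."""
    out = {k: v for k, v in record.items() if k != "basisOfRecord"}
    basis = record.get("basisOfRecord")
    if basis in DECOMPOSITION:
        record_type, prop, value = DECOMPOSITION[basis]
        out["type"] = record_type
        # Do not overwrite a more specific value the publisher already supplied.
        out.setdefault(prop, value)
    elif basis is not None:
        # Unknown or legacy value: keep it untouched rather than guess.
        out["basisOfRecord"] = basis
    return out


if __name__ == "__main__":
    print(decompose({"basisOfRecord": "HumanObservation", "scientificName": "Puma concolor"}))
    print(decompose({"basisOfRecord": "Germplasm"}))
```

Legacy values such as Germplasm or Literature are passed through unchanged in this sketch, since the slide only hints at how they would be expressed.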
preparations @ GBIF
hb                                        10674247    dry-mount                       152522
Pinned                                     1592815    whole organism                  145996
skin                                       1309430    skull; study skin               140141
Alcohol                                    1125061    Skin, Y                         127946
herbarium specimen                         1146250    Envelope                        127829
whole animal (ethanol)                     1015170    lichen - 1                      116223
Ethanol                                     752583    Not Mounted                     113817
herbarium specimen of unspecified type      662727    75% EtOH                        344052
Skin; Skull                                 657289    pointed                         107153
Dry                                         550633    Packet                          106134
study skin                                  434260    microscopic slide               100764
mounted                                     429388    whole animal (pinned)            97852
pin                                         417088    Exicado - 1                      97459
Skin: Whole                                 340390    SS; Tissue-false                 97432
shell (dry)                                 309713    whole animal                     92908
ETOH - 1                                    286076    Fluid                            90661
Sheet                                       284165    70% Ethanol                      90403
dried                                       277271    whole organism (isopropanol)     89954
not applicable                              230226    skin,skull                       82584
alcoholic                                   352914    Skin Study                       81486
skin (dry)                                  214550    fossil - 1                       77413
unknown (fossil)                            188080    fluid                            75123
eggs                                        175639    dried and pressed                70738
skeleton                                    169125    skin; skull                      70238
skull                                       160763    fossil                           64859
Is preparations overloaded?
preservationMethod
  NCD: http://rs.tdwg.org/ontology/voc/Collection#SpecimenPreservationMethodTypeTerm
organismPart
  ABCD KindOfUnit: “Part(s), physical state, or class of materials represented by this specimen.”
  Examples: whole organisms, antlers, bark, blood samples, bones, eggs, feathers, fruits, galls, heads, leaves
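To illustrate the overloading, the naive sketch below splits values of the form "part (method)", such as "whole animal (ethanol)" from the GBIF list, into an organismPart and a preservationMethod. The function name and the output keys are assumptions, and the pattern is far too simple for real data (it would also misread "unknown (fossil)"); it only shows that two dimensions are packed into one field.

```python
import re

# Matches "<part> (<method>)" with exactly one pair of parentheses.
_PART_WITH_METHOD = re.compile(r"^\s*(?P<part>[^()]+?)\s*\((?P<method>[^()]+)\)\s*$")


def split_preparation(raw):
    """Split 'part (method)' values; leave everything else in preparations."""
    match = _PART_WITH_METHOD.match(raw)
    if match:
        return {
            "organismPart": match.group("part"),
            "preservationMethod": match.group("method").strip(),
        }
    return {"preparations": raw.strip()}


if __name__ == "__main__":
    for raw in ["whole animal (ethanol)", "skin (dry)", "Ethanol", "skull"]:
        print(repr(raw), "->", split_preparation(raw))
```

Values like "Ethanol" (a method only) or "skull" (a part only) cannot be assigned to either dimension without a vocabulary, which is the case for keeping the two as separate terms.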
Discussion
• Restrict Occurrence to observations?
  • use MaterialSample for all physical things
• Do we want to use all new DwC terms as classes?
  • is it legitimate to use them as rowType / rdf:type?
  • do we need new id terms, e.g. FossilSpecimenID?
• Type by single vocabulary or multiple “dimensions”?
  • typing by class hierarchy or properties
• How do we want to manage a type vocabulary?
• Is dc:type needed if we have a more specific type?