central arizona phoenix lter center for environmental studies arizona state university database...
TRANSCRIPT
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Database Design
Peter McCartney (CAP)
RDIFS Training WorkshopSevilleta LTER
October 28-30, 2002
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Two Perspectives on Design
• Field researcher– Goal is to facilitate entry and analysis of
what they observed– Tend emphasize information that
discriminates between cases– Data collection dictates design
• Data analyst– goal is to retrieve what they are looking for– Tend to emphasize information that
aggregates cases– Data use dictates design
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Two Interfaces
• Field forms– Organized by sampling events– Hierarchical
• Analysis and reporting– Matrix (“flat file”) structure– Aggregated or transformed values
( sums, z-scores, etc)– Encoded or decoded
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
The Data Cycle
RDBMSStorage
ILSDFWEWFDSF FWEFSDFWE DIFJDISLDIJFS
SDFSDFSDFSDIMLMKM FSMDSID MLSDIFMSLDFI
SDFSDFSL LSIDFSLDFM JIJIJIJLSDIFS
Sdfsdfsdfs dfsdfs 8989
sdfsdfsdsdfsd 90
sdfsdsd 4004
dfsdfsdfsdfsdfs dfsd
QueryAnalysis
QA/QCData Entry
Denormalized Normalized
Site, Type,Sample, date,taxon1A, urb, 1,2/2/99, 1451A, urb, 1, 2/2/99, 1231A, ag, 2, 2/3/99, 145…..1..A
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Normalization
• Information does not repeat• Every row can be uniquely
referenced• Attributes are independent• Attributes convey only one piece of
information• Each piece of information is
expressed in only one attribute
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Denormalization
• You can take a good thing too far• Large numbers of tables with small
amount of data will degrade performance and be difficult to read
• Denormalize to avoid excessive links
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Data Modeling
• Concentrate on logical design – independent of vendor
• Entity-Relationship (ER) diagram – Relational databases– Tables=entities, columns=attributes,
keys=relationships, code=procedures
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Entities - Tables
• Two dimensional matrix – fields and observations
• Should contain discrete attributes of information that repeat together as a group – use protocol as guide
• The majority of ecological datasets will have at least three entities– Site description (occurs once for each site)– Sample description (occurs each time a site
is sampled)– Observations (occurs once for each
observation made during a sample event)
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Attributes
• Data types– storage capacity– Precision– Performance
• Avoid dependencies– Do not make the meaning of one
attribute conditional on another• Avoid nesting
– Do not restrict the domain of one attribute based on the value of another
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Attribute properties
• Indexes– Primary Key– Alternate Key– Foreign key– Inversion key
• Null/not null• Default value• Assignment
– Identity– Production rules
• Check Constraints
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Attribute Domains
• Measurement scale• Units• Storage types• Domains
– Ranges– Enumerations– Rules
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Relationships (join conditions)
• Cardinality– Parent occurrences– Child occurrences
• properties– Propagating
• Enforcement– Keys– Triggers
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Some design considerations
• Use related tables to define domains if list is long and may grow, constraints if list is short
• Use alternate keys to enforce complicated nested identifiers, hidden primary keys for linking
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Spatial data
• Most ecological data have some spatial context– Some are just poorly referenced– Some gain spatial context through
relationships
• GIS systems– Powerful for storing and analyzing
spatial geometry – Not so good for managing data
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Spatial Data Entities
• Vector GIS – Attribute table with all the properties
of Tables– Plus some association with a
geometric representation– Geometry may be simple or complex
• Raster GIS– Cell attribute with similar properties
• View, Stored Procedures– Return tabular entities
Center for Environmental Studies
Arizona State University
Central Arizona PhoenixLTER
Strategies for integrating spatial data
• Use a GIS rather than a database– Advisable only if attribute data are minimal
and flat
• Hybrid GIS/relational database– Store geometries (points, lines, polygons) in
a GIS. – Use a key field to relate geometries to
tables in your database
• Use a spatial data server– Stores gis geometries in binary fields in your
database– Uses special software to access spatial and
non-spatial data in the same connection