usgs september 16, 2009 werc multitaxa database council for data integration
TRANSCRIPT
Statement of Problems• Survey data can be complex
• Data collectors can stray from protocols
• Over time, volume of data becomes enormous – difficult to keep organized
• Sharing data with our partners is labor intensive and time consuming
Objectives• Efficiently collect and manage survey data
while adhering to specific study design protocols
• Integrate data across studies, time and location
• Effectively and efficiently share data with our partners
• Leading to our ultimate objective…
These objective evolved over the years as we moved along the learning curve
• Develop a single multi-taxa database for all studies
• Develop study-specific electronic forms for data entry, review & edit
Multitaxa database
PDAPC
In the beginning…• Using Personal Digital Assistants (PDAs) to
collect survey data– Main goal: have data stored in electronic format
from the start – no need to migrate later from paper– Often stored in Excel in the beginning– Huge improvement:
• Easier to share and backup data
• Improved analysis with Excel capabilities
– NO INTEGRATION YET…
Next phase…• Data storage using Access database instead
of Excel– A database was built for each project– PDAs sync directly to a database– More improvement!
• Ability to create forms for viewing data• Greater QA/QC tools• Improved reporting capability• Still simple to export to Excel when needed
– NOT MUCH INTEGRATION YET…
Next phase (a big step…)• Create a single database for all projects
– Query across study, location, time– Standardize site names across projects– Centralize user access and permissions– Systematic data backup
• This database has been called the MultiTaxa database, or MTX– years in the making
• REAL INTEGRATION!
Current Status• BioInformatics Operational Plan created
– An attempt to identify the scope of work, organize it, put personnel in positions of authority and responsibility
• PlanOverview.pdf
• FAMissionStmts.pdf
– Responsibilities grouped into Functional Areas• Defined
• Personnel assigned
• Required skills listed
– Hard copies & electronic copies available• Please use as an outline for your operational planning;
keep the meat, leave the bones!
• Successfully connected PDAs to new database structure
Multitaxa database
•Pacific pocket mouse 2008
•Pacific pocket mouse 2009
•Incidental observations
• Legacy datasets being migrated to new database, QA/QC’d, and approved datasets pushed up to Califoria Dept. of Fish & Game BIOS website (NBII compatible).
Multitaxa database
legacy
legacy
legacy
Migration highlights• Survey types:
– Small mammal trapping
– Pitfall trapping
– Radio telemetry tracking
– Sea otter observations (aerially and from ground)
– Carnivore tracking stations
– Carnivore camera tracking
– Visual encounter
– Vegetation assessments
– Incidental observations
More highlights…• Record counts
– Survey Events: 18,438– Taxa Observations: 113,436– Weather Observations: 30,259
• Geographic range– Southern California– Central California shores– Western North America– South Pacific islands
More highlights…• Observed species
– Snakes
– Birds
– Ants
– Small and large Mammals
– Fish
– Amphibians
– Trees
– Plants
– Grasses, shrubs
Each study uses custom VIEWS of the database:
• To display only the relevant tables and fields
• To filter for records only from the given survey
• To control access to records
2B 180276 TRUE NOTE "PREV S54 S2" medium 1 99 Y N N N N N NULL NULL NULL NULL
2B 180286 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL
3A 180286 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL
3A 180286 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL
3B 552491 TRUE NULL medium 1 99 N N N N N N NULL NULL 24 10
3B 180276 TRUE NOTE "PREV S54 S2" medium 1 99 Y N N N N N NULL NULL NULL NULL
3B 552491 TRUE NULL medium 1 2 N N N N N N NULL NULL NULL NULL
3B 180343 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL
1B 180286 TRUE NULL medium 1 1 N N N N N Y NULL NULL NULL NULL
2A 180286 TRUE NULL medium 1 1 Y N N N N Y NULL NULL NULL NULL
2A 180286 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL
2A 552491 TRUE NULL medium 4 2 N N N N N N NULL NULL NULL NULL
2A 180286 TRUE NULL medium 1 2 Y Y N N N N NULL NULL NULL NULL
2B 180276 TRUE PPM SAND medium 21 99 N N N N N N NULL NULL NULL NULL
2B 180286 TRUE Repro is ps medium 1 99 N N N N N N NULL NULL NULL NULL
2B 180286 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL
2B 552491 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL
3A 180276 TRUERepro is ps; REDESIGNATED FROM 62S1-EXACT COORDS medium 21 99 Y N N N N N NULL NULL NULL NULL
3A 180286 TRUE NULL medium 21 99 N N N N N N NULL NULL NULL NULL
3A 180286 TRUE Repro is nu medium 1 99 N N N N N N NULL NULL NULL NULL
3A 552491 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL
3A 552491 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL
VIEWS
Electronic forms use these views to interface with the database• Each set of PDA forms is unique and
designed to support the workflow described in the protocol
Multitaxa databaseU
niqu
e vi
ew
PDA
PC
INSTEAD-OF TRIGGERS
• Created for each view • Most views not updateable• Trigger used to parse incoming record into
each table• Also in the proper order, per referential
integrity
SECURITY• Permissions granted on views through Roles
AUDITING• History tables
– Populated with update/delete triggers
• User metadata– User and date recorded via trigger in each DML operation
QA/QC• Metadata field included with each “main” table
Challenges• Managing ripple-effects from ERD changes
– So far, okay, but will continue to get more complex
• Access Projects seem to get around the triggers somehow– Sending SQL command in separate connection
used to override Project’s native functionality
• Learning curve• Staff number limitations
Future Plans• Browser based forms/Web forms
– .NET language
• Continue expansion to more PI’s datasets
• More development of spatial representations
Acknowledgments• USGS
– Chris Brown– Carlton Rochester– Curt Tamanaha– Bill Perry– Robert Lugo– Len Winkler– Robert Fisher– Tom Suchanek
• NBII– Jennifer Pollack
• U.S. Fish &Wildlife– Elise Watson
– Tony McKinner
• Calif. Dept. of Fish & Game– Tom Lupo
References– Pendragon http://www.pendragonsoftware.com/
– BIOS http://bios.dfg.ca.gov/
– INSTEAD-OF-TRIGGERS http://technet.microsoft.com/en-us/library/ms175521.aspx
– Audit fields http://sqlserver2000.databases.aspfaq.com/how-do-i-audit-changes-to-sql-server-data.html
– xCase http://www.xcase.com/
– Spatial viewers website http://www.werc.usgs.gov/geospatial/arcgisserversites.html