usgs september 16, 2009 werc multitaxa database council for data integration

30
USGS September 16, 2009 WERC Multitaxa Database Council for Data Integration

Upload: everett-floyd

Post on 28-Dec-2015

219 views

Category:

Documents


1 download

TRANSCRIPT

USGS

September 16, 2009

WERC Multitaxa Database

Council for Data Integration

Statement of Problems• Survey data can be complex

• Data collectors can stray from protocols

• Over time, volume of data becomes enormous – difficult to keep organized

• Sharing data with our partners is labor intensive and time consuming

Objectives• Efficiently collect and manage survey data

while adhering to specific study design protocols

• Integrate data across studies, time and location

• Effectively and efficiently share data with our partners

• Leading to our ultimate objective…

These objective evolved over the years as we moved along the learning curve

• Develop a single multi-taxa database for all studies

• Develop study-specific electronic forms for data entry, review & edit

Multitaxa database

PDAPC

History of our ApproachOur data collection methods have evolved

over the years…

In the beginning…• Using Personal Digital Assistants (PDAs) to

collect survey data– Main goal: have data stored in electronic format

from the start – no need to migrate later from paper– Often stored in Excel in the beginning– Huge improvement:

• Easier to share and backup data

• Improved analysis with Excel capabilities

– NO INTEGRATION YET…

Next phase…• Data storage using Access database instead

of Excel– A database was built for each project– PDAs sync directly to a database– More improvement!

• Ability to create forms for viewing data• Greater QA/QC tools• Improved reporting capability• Still simple to export to Excel when needed

– NOT MUCH INTEGRATION YET…

Enterprises data management model at that time:

And so on…

Next phase (a big step…)• Create a single database for all projects

– Query across study, location, time– Standardize site names across projects– Centralize user access and permissions– Systematic data backup

• This database has been called the MultiTaxa database, or MTX– years in the making

• REAL INTEGRATION!

Revised enterprise data management model:

A

A

B

B C

C

D

D

Current Status• BioInformatics Operational Plan created

– An attempt to identify the scope of work, organize it, put personnel in positions of authority and responsibility

• PlanOverview.pdf

• FAMissionStmts.pdf

– Responsibilities grouped into Functional Areas• Defined

• Personnel assigned

• Required skills listed

– Hard copies & electronic copies available• Please use as an outline for your operational planning;

keep the meat, leave the bones!

Multitaxa database

Developed a single Multitaxa database in SQL Server 2005

Now in production!

• Successfully connected PDAs to new database structure

Multitaxa database

•Pacific pocket mouse 2008

•Pacific pocket mouse 2009

•Incidental observations

• MS Access Project viewers successfully connected to new databaseOne set of forms per Study

• Legacy datasets being migrated to new database, QA/QC’d, and approved datasets pushed up to Califoria Dept. of Fish & Game BIOS website (NBII compatible).

Multitaxa database

legacy

legacy

legacy

Migration highlights• Survey types:

– Small mammal trapping

– Pitfall trapping

– Radio telemetry tracking

– Sea otter observations (aerially and from ground)

– Carnivore tracking stations

– Carnivore camera tracking

– Visual encounter

– Vegetation assessments

– Incidental observations

More highlights…• Record counts

– Survey Events: 18,438– Taxa Observations: 113,436– Weather Observations: 30,259

• Geographic range– Southern California– Central California shores– Western North America– South Pacific islands

More highlights…• Observed species

– Snakes

– Birds

– Ants

– Small and large Mammals

– Fish

– Amphibians

– Trees

– Plants

– Grasses, shrubs

Spatial Viewers

ARC GIS Server Web Mapping Application

Google Maps API in a Web Page

JavaScript API in a Web Page

Which technologies are we using?

Solutions

Each study uses custom VIEWS of the database:

• To display only the relevant tables and fields

• To filter for records only from the given survey

• To control access to records

2B 180276 TRUE NOTE "PREV S54 S2" medium 1 99 Y N N N N N NULL NULL NULL NULL

2B 180286 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL

3A 180286 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL

3A 180286 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL

3B 552491 TRUE NULL medium 1 99 N N N N N N NULL NULL 24 10

3B 180276 TRUE NOTE "PREV S54 S2" medium 1 99 Y N N N N N NULL NULL NULL NULL

3B 552491 TRUE NULL medium 1 2 N N N N N N NULL NULL NULL NULL

3B 180343 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL

1B 180286 TRUE NULL medium 1 1 N N N N N Y NULL NULL NULL NULL

2A 180286 TRUE NULL medium 1 1 Y N N N N Y NULL NULL NULL NULL

2A 180286 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL

2A 552491 TRUE NULL medium 4 2 N N N N N N NULL NULL NULL NULL

2A 180286 TRUE NULL medium 1 2 Y Y N N N N NULL NULL NULL NULL

2B 180276 TRUE PPM SAND medium 21 99 N N N N N N NULL NULL NULL NULL

2B 180286 TRUE Repro is ps medium 1 99 N N N N N N NULL NULL NULL NULL

2B 180286 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL

2B 552491 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL

3A 180276 TRUERepro is ps; REDESIGNATED FROM 62S1-EXACT COORDS medium 21 99 Y N N N N N NULL NULL NULL NULL

3A 180286 TRUE NULL medium 21 99 N N N N N N NULL NULL NULL NULL

3A 180286 TRUE Repro is nu medium 1 99 N N N N N N NULL NULL NULL NULL

3A 552491 TRUE NULL medium 1 99 Y N N N N N NULL NULL NULL NULL

3A 552491 TRUE NULL medium 21 99 Y N N N N N NULL NULL NULL NULL

VIEWS

Electronic forms use these views to interface with the database• Each set of PDA forms is unique and

designed to support the workflow described in the protocol

Multitaxa databaseU

niqu

e vi

ew

PDA

PC

INSTEAD-OF TRIGGERS

• Created for each view • Most views not updateable• Trigger used to parse incoming record into

each table• Also in the proper order, per referential

integrity

SECURITY• Permissions granted on views through Roles

AUDITING• History tables

– Populated with update/delete triggers

• User metadata– User and date recorded via trigger in each DML operation

QA/QC• Metadata field included with each “main” table

Challenges• Managing ripple-effects from ERD changes

– So far, okay, but will continue to get more complex

• Access Projects seem to get around the triggers somehow– Sending SQL command in separate connection

used to override Project’s native functionality

• Learning curve• Staff number limitations

Future Plans• Browser based forms/Web forms

– .NET language

• Continue expansion to more PI’s datasets

• More development of spatial representations

Acknowledgments• USGS

– Chris Brown– Carlton Rochester– Curt Tamanaha– Bill Perry– Robert Lugo– Len Winkler– Robert Fisher– Tom Suchanek

• NBII– Jennifer Pollack

• U.S. Fish &Wildlife– Elise Watson

– Tony McKinner

• Calif. Dept. of Fish & Game– Tom Lupo

References– Pendragon http://www.pendragonsoftware.com/

– BIOS http://bios.dfg.ca.gov/

– INSTEAD-OF-TRIGGERS http://technet.microsoft.com/en-us/library/ms175521.aspx

– Audit fields http://sqlserver2000.databases.aspfaq.com/how-do-i-audit-changes-to-sql-server-data.html

– xCase http://www.xcase.com/

– Spatial viewers website http://www.werc.usgs.gov/geospatial/arcgisserversites.html