Quality Assurance Procedures for CORIE Data


Upload: anjolie-black

Post on 01-Jan-2016



[Overview diagram: Instrument Network, Database, Web Visualization, Field Staff, Field Notes, Near Real-time Quality Assessment, Archival Quality Assessment, Ascii Public Data Archive, Quality Flags]

Realtime QA: Timeseries, Diagram of Slopes, Sequential Likelihood Ratio

Archival QA: Time, Pressure, Temperature, Salinity, Velocity, Meteorology

Real-time Quality Assurance

• Visual evaluation of data quality 4 times a week

• Automated testing for biofouling, checked by operator

• Real-time checks result in notification of field staff

• No database storage of notification

• No incorporation of assessment into database quality flags

• Non-automated use of assessment to deactivate web display of real-time data

Archival quality assurance

• CTD and ADP data quality assessment on a monthly basis.

• 1 month lag in assessment (January data tested at the end of February)

• CTD QA dependent on subsequent data

• ADP QA not dependent on subsequent data

• Data which passes QA is stored in ascii public archive

• QA not used to generate Quality flags in database

[Timeline figure, Feb to Apr: CTD QA and ADP QA]

Timeseries

• Timeseries of Depth, Salinity, Temperature displayed on website

• Inspected for instrument failure or biofouling

Cross-estuary slope diagram

• S-T plot of all stations

• Almost all stations should produce the same S-T line

• Chnke, ogi01, and ogi02 are exceptions

[Figure panels: "Mottb possibly biofouled"; "Extensive biofouling"]

Sequential Likelihood Ratio

• Based on linear S-T relationship across estuary

• Accounts for local variation from linear relationship

• Depends on S and T at daily maximum S at each station, river T (TR), and ocean T (TO)

• Modeled Sclean and Sbiofouled, based on T, TR, and TO, are compared to measured S

• Station specific ratio cutoff, trained on known biofouled data

• Used to generate a visual display

• Currently trained for lower estuary stations

• Extension of method to lateral bays under development

• Could be used for archival QA

[Figure: S-T diagram with labels TR, TO, SM, TM, Scl, Sbf and axis values 0 and 34]

CTD: time

• Radio network can produce data with bunched time values

• Expected timestep between data points is determined from data (median timestep)

• If the timesteps within a bunch are shorter than the median timestep, and the gap preceding the bunch has the correct length, the data are reassigned times evenly spaced over the gap

• If the gap is longer than the data clump, the data clump is discarded
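The time repair described above can be sketched as follows. This is a minimal illustration, not the CORIE code: the function name, the 25% tolerance, and the reading of "correct length" as "clump size times the median timestep" are all assumptions.

```python
from statistics import median

def repair_bunched_times(times, tol=0.25):
    """Respace or mark bunched timestamps (times: sorted seconds).

    Returns (new_times, dropped_indices). Dropped points keep their
    original time in new_times; the caller is expected to filter them.
    tol is a hypothetical tolerance, not a value from the slides.
    """
    steps = [b - a for a, b in zip(times, times[1:])]
    dt = median(steps)                      # expected timestep, taken from the data
    out = list(times)
    dropped = []
    i = 1
    while i < len(times):
        # Extend j over the run of short steps that follows point i.
        j = i
        while j + 1 < len(times) and times[j + 1] - times[j] < (1 - tol) * dt:
            j += 1
        k = j - i + 1                       # number of points in the candidate clump
        start = times[i - 1]                # last well-timed point before the gap
        gap = times[j] - start
        is_clump = k > 1 and times[i] - start > (1 + tol) * dt
        if is_clump and abs(gap - k * dt) <= tol * dt:
            # Gap has the expected length: spread the clump evenly over it.
            for n in range(k):
                out[i + n] = start + (n + 1) * gap / k
        elif is_clump and gap > k * dt:
            # Gap is longer than the clump can fill: discard the clump.
            dropped.extend(range(i, j + 1))
        i = j + 1 if is_clump else i + 1
    return out, dropped
```

With a 60 s sampling interval, four points stamped 477, 478, 479, 480 s after a last good point at 240 s would be moved to 300, 360, 420, 480 s.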

CTD: pressure data

• Pressure data is corrected for atmospheric pressure using the atmospheric pressure record from marsh or tansy

• Tested for spikes using a high-pass filter (4th-order non-causal Butterworth filter with a cutoff period of 1 hour, implemented using the MATLAB function idfilt)

• Spikes > 0.22 m are removed

• A period (30 minutes) around each spike is also removed

• If a tide period has extensive smaller spikes (mean noise > 0.01 m), the entire period is removed
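A rough sketch of the removal rules above, in Python rather than MATLAB. It assumes the high-pass filtered pressure (the residual) has already been computed and does not reproduce the Butterworth filtering step. The function names are hypothetical, the 30-minute period is assumed to be centered on the spike, and "mean noise" is assumed to mean the mean absolute residual.

```python
def flag_pressure_spikes(times, residual, spike_m=0.22, window_s=1800.0):
    """Return a list of booleans, True where a sample should be removed.

    times: seconds; residual: high-pass filtered pressure in metres.
    A sample is removed if it lies within window_s / 2 of any spike
    (assumption: the 30-minute period is centered on the spike).
    """
    spikes = [t for t, r in zip(times, residual) if abs(r) > spike_m]
    half = window_s / 2.0
    return [any(abs(t - s) <= half for s in spikes) for t in times]

def noisy_period(residual, noise_m=0.01):
    """True if the mean absolute residual over one tide period exceeds noise_m."""
    return sum(abs(r) for r in residual) / len(residual) > noise_m
```

A caller would apply `noisy_period` to the residual of each tide period and drop the whole period when it returns True.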

CTD: temperature

• Subject to range limits (<0 or >30)

• Subject to visual inspection

• Instrument failure has generally produced extensive invalid values

CTD: Salinity

• Main concern is biofouling, but conductivity sensors can also fail

• Sensor failure is detected by range check (S < 0 or > 35) and by visual inspection

• Biofouling is tested by using the cross-estuary S-T relationship

• Determine median S-T slope for each tidal period

CTD: Salinity

• Compare each instrument's S-T slope for that tidal period to the median

• Cutoff: abs(local slope) – abs(median slope) > 0.2 => biofouled

• When an instrument is considered biofouled, preceding data is considered biofouled until a clean cutoff is exceeded

• Clean cutoff: abs(local slope) – abs(median slope) < 0

• When median slope approaches 0, method fails

• If instrument is biofouled after a period of near-zero slope, then the entire period of near-zero slope is considered biofouled
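The slope comparison and backward flagging can be sketched as below. This is an illustrative reading of the rules above, not the archival QA code: the function name is hypothetical, and `zero_eps` (what counts as a near-zero median slope) is an assumed parameter, since the slides give no value for it.

```python
def flag_biofouling(local_slopes, median_slopes,
                    foul_cut=0.2, clean_cut=0.0, zero_eps=0.05):
    """Flag tidal periods as biofouled for one instrument.

    local_slopes[p]: the instrument's S-T slope in tidal period p.
    median_slopes[p]: the median S-T slope over all stations in period p.
    zero_eps is a hypothetical threshold for "median slope approaches 0".
    """
    n = len(local_slopes)
    excess = [abs(l) - abs(m) for l, m in zip(local_slopes, median_slopes)]
    near_zero = [abs(m) < zero_eps for m in median_slopes]
    fouled = [False] * n
    for p in range(n):
        # The test is not applied where the median slope is near zero.
        if near_zero[p] or excess[p] <= foul_cut:
            continue
        fouled[p] = True
        # Walk backwards: earlier periods are biofouled until one is clean.
        # Near-zero periods cannot be tested, so they are flagged as well.
        q = p - 1
        while q >= 0 and not fouled[q] and (near_zero[q] or excess[q] >= clean_cut):
            fouled[q] = True
            q -= 1
    return fouled
```

As the following slide notes, a rule of this kind yields both false positives and false negatives, so its output would still need a manual check.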

CTD: Salinity

• Automated assessment produces both false positives and false negatives

• Results are manually checked

[Figure panels: "False positive"; "False negative"; "Transient Biofouling"]

ADP: velocity

2 major sources of bad velocity data:

• Surface reflection

• Low signal strength

– signal strength decreases with distance from instrument

– decreases with biofouling of instrument over time

1 minor source: instrument roll-over

Unlike with conductivity sensors, biofouling is easily detected because signal strength is a measured variable

ADP: velocity

• Signal strength < 20 dB, data flagged bad

• Documentation recommends a cutoff of 9 dB, but testing indicates that a 9 dB cutoff admits some questionable data

• Surface reflection detected by increasing signal strength

• Signal strength increase can also be caused by variation in reflective material in water

• Determine approximate surface from pressure record, check for signal strength increase within 3 m of approximate surface
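The two velocity checks can be sketched for a single profile as below. Several details are assumptions for illustration only: an upward-looking instrument, cell ranges measured from the instrument, "increase" meaning stronger than the previous cell, and flagging every cell from the first increase outward. The function name is hypothetical.

```python
def flag_adp_profile(cell_ranges, signal_db, surface_range,
                     min_db=20.0, surface_band=3.0):
    """Return per-cell flags; True means the velocity in that cell is flagged bad.

    cell_ranges: distance of each cell from the instrument (m), increasing.
    signal_db: signal strength per cell (dB).
    surface_range: approximate distance to the surface from the pressure record (m).
    """
    # Low signal strength: below the 20 dB cutoff.
    bad = [s < min_db for s in signal_db]
    # Surface reflection: signal strength rises within 3 m of the surface.
    for i in range(1, len(cell_ranges)):
        near_surface = abs(cell_ranges[i] - surface_range) <= surface_band
        if near_surface and signal_db[i] > signal_db[i - 1]:
            for j in range(i, len(cell_ranges)):
                bad[j] = True               # assumption: flag this cell and all beyond
            break
    return bad
```

Restricting the increase test to cells near the surface is what keeps a mid-water rise in signal strength, such as one caused by reflective material in the water, from being treated as surface reflection.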

[Figure: signal strength profile showing surface reflection]

Meteorological Data

• Subject to physical range tests and visual inspection

• Further methods under development

Storage of Quality Assessment

• Data records which do not meet minimal quality standards are stored in the raw data files, but do not enter the database

• Notices of observer suspicion of data quality are not currently stored in a formal manner, and are not entered into the database

• Archival quality assurance procedures currently generate public archive files which contain only data which has passed the quality assurance tests

• The quality assurance flagging is not currently stored in the database

End

Models

• A model of the clean signal

– Temperature and salinity variation are correlated. Model: daily maximum salinity and corresponding temperature are jointly Gaussian.

– The probability density for observing the sequence of salinity measurements $\{s_n\}$, given the sequence of recorded mixing coefficients $\{T_n\}$ and a clean sensor:

$p(\{s_n\} \mid \{T_n\}, \text{clean})$

• A model of the biofouled signal

– Allows for a different degradation rate $m$ for each biofouling episode and an arbitrary onset time $\tau$, with these parameters fit to incoming data.

$\max_{m,\tau}\, p(\{s_n\} \mid \{T_n\}, m, \tau, \text{biofouled}) = p(\{s_n\} \mid \{T_n\}, \text{biofouled})$

– $m$ and $\tau$ are unknown

– These parameters are fit to the data sequence by maximum likelihood.

Regression Model: Mixture of Experts

• The correlation between salinity and temperature is not stationary.

– The detector system needs to switch between seasons.

– A mixture of local models can cover different behaviors.

• Both the experts and the gating network receive the same input vector.

• Each expert network tackles one of the different seasons.

• The gating network decides which of the experts should be used.

• Regression output: $\mu = \sum_i g_i \mu_i$, where $\mu_i$ is the output of expert network $i$ and $g_i$ is its gating weight

[Diagram: the input vector T feeds Expert Network 1, Expert Network 2, ..., Expert Network n and the Gating Network; the gating outputs g1, g2, ..., gn weight the expert outputs to form the Output]
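A minimal sketch of how a mixture of experts combines its outputs. The slides do not give the form of the experts or of the gating network, so linear experts and a softmax gate are assumed here, and all names are hypothetical.

```python
import math

def softmax(scores):
    """Normalise scores into positive weights that sum to 1."""
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(x, experts, gates):
    """Return (output, gating_weights) for input vector x.

    experts, gates: lists of (weights, bias) pairs, one pair per expert.
    Both the experts and the gating network see the same input x.
    """
    def linear(params):
        w, b = params
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    g = softmax([linear(p) for p in gates])     # gating weights g_i
    mu = [linear(p) for p in experts]           # expert outputs mu_i
    return sum(gi * mi for gi, mi in zip(g, mu)), g
```

When one gate score dominates, the output is essentially that expert's prediction, which is how such a model can switch between seasonal regimes.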

Ref.

Approach and Results

• Parameterized novelty detectors embedded in a sequential likelihood ratio test

$$\mathrm{SLR} = \frac{\max_{m,\tau}\, p(\{s_n\} \mid \{T_n\}, m, \tau, \text{biofouled})}{p(\{s_n\} \mid \{T_n\}, \text{clean})} = \frac{p(\{s_n\} \mid \{T_n\}, \text{biofouled})}{p(\{s_n\} \mid \{T_n\}, \text{clean})}$$

– SLR at current time N

$$\mathrm{SLR}_N = \max_{m,\tau} \sum_{n=1}^{N} \ln \frac{p(S_n \mid T_n, m, \tau, \text{fouled})}{p(S_n \mid T_n, \text{clean})}$$

is compared to a threshold to identify biofouling events.

• Results

– Automated biofouling detectors deployed throughout the estuary. Monitored by observer, and used to send out notices of biofouling events, but not incorporated directly into data flagging.
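A sequential likelihood ratio of this kind can be sketched with a deliberately simple pair of models. Everything specific here is an assumption for illustration: the clean model is Gaussian around a linear function of temperature (`a + b * T`), the biofouled model lowers that mean by `m` per sample after onset `tau`, both share one `sigma`, and the maximisation is a grid search. The deployed detectors may differ on every one of these points.

```python
def slr(s, T, a, b, sigma, m_grid):
    """Maximised log likelihood ratio (fouled vs clean) over samples 0..N.

    s, T: salinity and temperature sequences of equal length.
    a, b, sigma: hypothetical clean-model parameters, s ~ N(a + b*T, sigma^2).
    m_grid: candidate degradation rates (salinity units per sample).
    """
    best = 0.0      # m = 0 reproduces the clean model, so the maximum is never negative
    for tau in range(len(s)):               # candidate onset times
        for m in m_grid:                    # candidate degradation rates
            total = 0.0
            for n in range(tau, len(s)):    # samples before tau contribute zero
                clean_mean = a + b * T[n]
                fouled_mean = clean_mean - m * (n - tau + 1)
                # ln of the Gaussian density ratio; the normalising constants cancel.
                total += ((s[n] - clean_mean) ** 2
                          - (s[n] - fouled_mean) ** 2) / (2 * sigma ** 2)
            best = max(best, total)
    return best
```

A detector would recompute `slr` as each new sample arrives and raise a notice once the value exceeds a station-specific threshold.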

Ref.

Criteria for rejecting data before it enters the database

• rserial2db rejects data lines based on failed checksum or garbled line

Short input line: [RE^M], skipping.

Skipping unknown data line: [abedCT 0000 00 00 00 00 00 1516D +20.856, +07.947, +19.0889*6F]

Checksum failed for data line: W,üR'¢í?»TW%X¯»U»PT$CRdsdmaRV0CTDd00730R seabedCT 0000 00 00 00 00 00 1516D +09.502, +08.366, +08.0447*60

Short input line: [], skipping.

Skipping unknown data line: [W,ýS'¢è¾?»T W%Y »S»UT10394A141322 1316:0 746:1 :2 :3 :4 532:5 -1806:6 :7]

Line length = 162, must be 81 to 83 chars long, skipping data line: 10395A138173 1193:0 770:1 :2 :3 :4 282:5 :6 10395A138177 1192:0 770:1 :2 :3 :4 278:5 :6 :7

• Most data is not subjected to a sanity check (e.g., salinity < 0 or > 35)

• Certain stations are handled as special cases and are subject to sanity checks (ogi02 is checked for negative sal, temp, and cond)
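Two of the rejection criteria above are simple enough to sketch. This is not rserial2db itself: the function names are hypothetical, the 81 to 83 character rule is taken from the log message and presumably applies to one line format only, and the checksum test is omitted because the slides do not say how it is computed.

```python
def reject_reason(line, min_len=81, max_len=83):
    """Return a reason string if the data line should be skipped, else None."""
    if not (min_len <= len(line) <= max_len):
        return "Line length = %d, must be %d to %d chars long" % (
            len(line), min_len, max_len)
    return None

def passes_sanity(sal, temp, cond):
    """Sanity check of the kind applied to ogi02: no negative sal, temp, or cond."""
    return sal >= 0 and temp >= 0 and cond >= 0
```

A line of 162 characters, such as two records run together, would be rejected by the length rule.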

[Figure panels: "Depth spikes removed"; "Slope Comparison"; "Salinity Flagging"]