Automated QA/QC Technique for Climate Sensor Data
EPSCoR Hawaii HGDR Scientific Data Management Portal Development Team
TOC
• QA/QC Requirements
• Detecting Outliers
  – Types of Outliers
  – Detection Methods
  – Statistical Correlation Functions
  – QuaT Correlational Method
• Data Mining for further automation
QA/QC Requirements
• Detect abnormal data and outliers
• Correct abnormal data and outliers where possible
• Find additional properties/correlations among variables
  – To catch changes over time
Detecting Outliers
• Types of Outliers
  – Correctable Outliers
    • Caused by calibration, sensor cleaning, low battery voltage, erroneous sensor installation, etc. Outliers caused by these factors can be corrected
  – Error Values
    • Missing or impossible values caused by sensor failure: physical damage, irreversible factor effects
    • This type of outlier cannot be corrected and must be discarded
Detecting Outliers
• Detection Methods
  1. Normal value range check (Single variable)
  2. Diurnal pattern check (Single variable)
  3. Correlational pattern check (Multiple variables)
  4. Additional methods can be found by data mining
Normal value range check
For example, a humidity reading over 100% does not make sense. Regional and seasonal factors must also be considered.
• Knowledge Required
  – Known/valid normal value ranges for all variables
  – Subsets of normal value ranges for all variables in different regions or seasons
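A minimal sketch of such a range check is shown here in Python. The variable names, the numeric ranges, and the seasonal override table are all hypothetical placeholders; real limits would come from the known/valid ranges for each site.

```python
# Hypothetical valid ranges per variable: (low, high). Not taken from the slides.
NORMAL_RANGES = {
    "relative_humidity_pct": (0.0, 100.0),
    "air_temperature_c": (-10.0, 45.0),
}

# Hypothetical narrower ranges for a given (variable, season) pair.
SEASONAL_RANGES = {
    ("air_temperature_c", "winter"): (-10.0, 30.0),
}


def range_check(variable, value, season=None):
    """Return True if value lies inside the valid range for the variable.

    A seasonal range, when one is defined, takes precedence over the
    general range. Missing readings (None) always fail the check.
    """
    if value is None:
        return False
    low, high = SEASONAL_RANGES.get((variable, season), NORMAL_RANGES[variable])
    return low <= value <= high


readings = [("relative_humidity_pct", 87.0), ("relative_humidity_pct", 104.5)]
flags = [range_check(name, value) for name, value in readings]
```

Here the second humidity reading fails the check, while a temperature of 35 °C would pass the general range but fail the hypothetical winter range.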
Diurnal pattern check
Radiation should be high during the day and low at night.
• Knowledge Required
  – Known/valid diurnal pattern
  – Different diurnal patterns for all variables in different regions or seasons
• Challenge
  – How to slice time
  – What value ranges are considered high, average, or low for each variable; simply take the standard deviation?
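One possible answer to the challenge above, sketched in Python under stated assumptions: time is sliced by hour of day, and a reading is treated as normal if it lies within k standard deviations of the historical mean for that hour. The hourly slicing, the value k = 3, and the sample numbers are illustrative choices, not something the slides prescribe.

```python
from collections import defaultdict
from statistics import mean, stdev


def hourly_profile(history):
    """Build {hour: (mean, standard deviation)} from (hour, value) pairs.

    Hours with fewer than two readings are skipped because a sample
    standard deviation cannot be computed for them.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def diurnal_check(profile, hour, value, k=3.0):
    """Return True if value is within k standard deviations of the hourly mean."""
    hour_mean, hour_std = profile[hour]
    return abs(value - hour_mean) <= k * hour_std


# Made-up solar radiation history (W/m^2) at noon and at midnight.
history = [(12, 800), (12, 820), (12, 780), (12, 810),
           (0, 0), (0, 1), (0, 0), (0, 2)]
profile = hourly_profile(history)
```

With this profile, a noon reading of 790 passes, while a noon reading of 5 or a midnight reading of 600 is flagged.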
Correlational pattern check
For example, radiation and temperature should be correlated.
• Knowledge Required
  – Known correlations between the variables
How can we verify the correlations?
• Correlation functions from statistics will be useful
• Also, a method called QuaT might be useful to analyze the similarity of the trends of two variables along the timeline
Additional Analyses
4. Data mining might suggest additional methods
  – Finding additional correlations
  – Value range changes over time (global climate change)
Statistical Functions
• Pearson’s Product Moment
• Spearman’s Rank Correlation
• Kendall’s Rank Correlation
Pearson’s Product Moment
• Pearson’s only works for parametric (normally distributed) datasets
  – The dataset needs to be tested for normality before it can be analyzed
  – Normality test: Shapiro-Wilk normality test
• If a dataset is determined to be non-parametric, use Spearman’s, Kendall’s, or both instead
  – Also, outliers decrease the precision of Pearson’s
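To illustrate the last point, the following Python sketch computes Pearson's coefficient by hand and shows how a single faulty reading changes it. The data are made up, and the normality test is left out; a library routine such as SciPy's `scipy.stats.shapiro` could supply the Shapiro-Wilk step.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Made-up data: temperature rises linearly with radiation.
radiation = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
temperature = [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

# Same series with one erroneous reading, as a sensor fault might produce.
temperature_spiked = temperature[:-1] + [-40]

r_clean = pearson(radiation, temperature)
r_spiked = pearson(radiation, temperature_spiked)
```

The clean series gives a coefficient of essentially 1, while the single bad value is enough to pull the coefficient below zero, which is why outliers should be handled before Pearson's is trusted.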
Spearman’s & Kendall’s Correlation
• If a dataset is not parametric, these correlation functions can be used
• Both work on the ranks of the values rather than on the raw values
• Spearman’s – compares the differences between the ranks of paired values from the two variables
• Kendall’s – compares the number of pairs of observations ordered the same way in both variables (concordant) with the number ordered oppositely (discordant)
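Both rank correlations can be sketched in a few lines of Python. The helper names below are illustrative; Spearman's is computed as Pearson's coefficient on the ranks, and Kendall's is the simple tau-a variant, which applies no correction for ties.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
monotonic = [1, 4, 9, 16, 25]   # nonlinear but strictly increasing
shuffled = [2, 1, 4, 3, 5]      # two neighbouring pairs swapped
```

Because only the ordering matters, the nonlinear but monotonic series scores 1 on both measures, while the partly shuffled series scores 0.8 (Spearman's) and 0.6 (Kendall's).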
QuaT
• An algorithm to determine the similarity of two trend curves
• Introduced by Okabe A. & Masuyama A. of the University of Tokyo
• “A robust exploratory method for qualitative trend curve analysis”
• http://www.csis.u-tokyo.ac.jp/dp/8.pdf
QuaT - Basic steps of the algorithm
1. Find the peaks and bottoms of the curves being compared
2. Calculate the height of each peak
3. Determine the distinct height, a threshold height, and extract the peaks that are higher than or equal to the distinct height. In other words, ignore less distinct peaks
4. Compare the extracted peaks and determine whether the peaks of the two variables’ curves occur at the same times and in the same order of magnitude
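The four steps can be sketched roughly as follows in Python. This is a simplified illustration and not the algorithm as published by Okabe and Masuyama: the definition of peak height (peak value minus the higher of the two neighbouring bottoms), the treatment of the series endpoints as bottoms, and the `tolerance` parameter are all assumptions made for this example.

```python
def find_extrema(series):
    """Step 1: indices of strict local peaks and bottoms (plateaus are ignored)."""
    peaks, bottoms = [], []
    for i in range(1, len(series) - 1):
        if series[i - 1] < series[i] > series[i + 1]:
            peaks.append(i)
        elif series[i - 1] > series[i] < series[i + 1]:
            bottoms.append(i)
    return peaks, bottoms


def peak_heights(series):
    """Step 2: height of each peak above the higher of its neighbouring bottoms.

    The two endpoints are counted as bottoms so every peak has one on each side.
    """
    peaks, bottoms = find_extrema(series)
    floor = sorted(set(bottoms) | {0, len(series) - 1})
    heights = {}
    for p in peaks:
        left = max(b for b in floor if b < p)
        right = min(b for b in floor if b > p)
        heights[p] = series[p] - max(series[left], series[right])
    return heights


def distinct_peaks(series, distinct_height):
    """Step 3: keep only peaks at least as high as the distinct height."""
    heights = peak_heights(series)
    return [p for p in sorted(heights) if heights[p] >= distinct_height]


def similar_trend(a, b, height_a, height_b, tolerance=0):
    """Step 4: same peak times (within tolerance) and same magnitude order."""
    peaks_a = distinct_peaks(a, height_a)
    peaks_b = distinct_peaks(b, height_b)
    if len(peaks_a) != len(peaks_b):
        return False
    same_times = all(abs(i - j) <= tolerance for i, j in zip(peaks_a, peaks_b))
    order_a = sorted(range(len(peaks_a)), key=lambda k: a[peaks_a[k]])
    order_b = sorted(range(len(peaks_b)), key=lambda k: b[peaks_b[k]])
    return same_times and order_a == order_b


# Made-up series: the small bump at index 4 falls below the distinct height.
radiation = [0, 1, 5, 1, 2, 1, 9, 2, 0]
temperature = [20, 21, 24, 22, 22.5, 22, 27, 23, 21]
unrelated = [0, 9, 0, 0, 5, 0, 0, 0, 0]
```

With distinct heights of 2 for radiation and 1.5 for temperature, both curves keep their peaks at indices 2 and 6 in the same magnitude order, so they are judged similar; the unrelated curve peaks at different times and is not.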
Basic Relationship among and between the Variables
• Radiation (short, long, net, PAR)
• Rainfall (humidity, soil moisture)
• Temperature (air, surface, body)
• Wind (speed, direction)
| Affecting | Relationship | Affected | Specific Variable |
|---|---|---|---|
| Radiation Category | direct | Temperature Category | |
| Radiation Category | affect | Wind Category | |
| Radiation Category | inverse | Rainfall Category | Soil Moisture |
| Rainfall Category | inverse | Radiation Category | |
| Rainfall Category | inverse | Temperature Category | |
| Rainfall Category | affect | Wind Category | |
| Wind Category | inverse | Temperature Category | |
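The relationships above could drive an automated correlational check. The Python sketch below is one hypothetical encoding: "direct" is read as an expected positive correlation, "inverse" as an expected negative one, and "affect" as a relationship whose direction is not stated. The category keys and the `min_strength` cutoff of 0.3 are assumptions for illustration only.

```python
# Expected sign of the correlation for each (affecting, affected) pair:
# +1 for "direct", -1 for "inverse", None for "affect" (direction not stated).
EXPECTED_SIGN = {
    ("radiation", "temperature"): +1,
    ("radiation", "wind"): None,
    ("radiation", "rainfall"): -1,
    ("rainfall", "radiation"): -1,
    ("rainfall", "temperature"): -1,
    ("rainfall", "wind"): None,
    ("wind", "temperature"): -1,
}


def relationship_holds(affecting, affected, observed_correlation, min_strength=0.3):
    """Return True if an observed correlation agrees with the expected relationship.

    Correlations weaker than min_strength (an arbitrary cutoff) are treated
    as showing no relationship at all.
    """
    expected = EXPECTED_SIGN[(affecting, affected)]
    if abs(observed_correlation) < min_strength:
        return False
    if expected is None:
        return True
    return (observed_correlation > 0) == (expected > 0)


checks = [
    relationship_holds("radiation", "temperature", 0.8),
    relationship_holds("rainfall", "temperature", 0.6),
    relationship_holds("radiation", "wind", -0.5),
    relationship_holds("wind", "temperature", -0.1),
]
```

A window of data where rainfall and temperature rise together, or where wind and temperature show almost no correlation, would then be flagged for review.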