Water Quality Data Analysis & R Programming Internship
Central Coast Water Quality Preservation, Inc.
March 2013 – February 2014
Megan Gehrke, Graduate Student, CSU Monterey Bay
Advisor: Sarah Lopez, Central Coast Water Quality Preservation, Inc.
Submitted: May 2014
Table of Contents
Acknowledgements
Executive Summary
Introduction
Project Objectives
Project Approach
Project Outcomes
  Sample Completeness
  Literature Review
  “R” Code and Outputs
Conclusion
References
Appendix – Samples of Figures and Graphs Produced
Acknowledgements
This project was supported by Agriculture and Food Research Initiative
Competitive Grant no. 2011-38422-31204 from the USDA National Institute of
Food and Agriculture.
Executive Summary
Central Coast Water Quality Preservation, Inc. (CCWQP) is charged with
implementing a Cooperative Monitoring Program (CMP), in which monthly water
quality samples from 50 sites in agricultural watersheds of the Central Coast are
collected. The major goal of the CMP is to document changes in water quality over
time, ideally in response to advances in agricultural practices that protect water
quality. This internship focused on developing code for the “R” statistical software
to increase automation in the production of figures, tables,
and statistical results for the CMP annual report.
My specific objectives for this internship were to enhance my knowledge and
skills pertaining to agricultural water quality issues and water quality data
analysis, as well as to gain further professional experience in this field. I was able
to meet my objectives through experience working with, and learning how to
visually and quantitatively interpret, a large agricultural water quality dataset with
guidance from my advisor at CCWQP. Specifically, these objectives were met
through internship tasks, which included verifying dataset completeness,
researching water quality statistical trend analyses, and creating “R” code for the
purposes of trend analysis, data characterizations, and data summaries.
This experience has helped prepare me for a future career in water quality and
environmental analysis, which could include future work with the USDA under the
Forest Service or the NRCS.
Introduction
Central Coast Water Quality Preservation, Inc. (CCWQP) is charged with
implementing a Cooperative Monitoring Program (CMP), in which monthly water
quality samples from 50 sites in agricultural watersheds of the Central Coast are
collected. The samples are analyzed for ammonia, chlorophyll a, conductivity,
total dissolved solids, nitrate, dissolved oxygen and oxygen saturation, pH,
salinity, and turbidity, in addition to in-field measurements of air temperature,
water temperature, and flow. Samples are also analyzed for toxicity to
invertebrates, fish, and algae four times annually. The results are reported to the
Regional Water Quality Control Board (RWQCB) on behalf of farmers enrolled in
the Conditional Waiver for Irrigated Lands. The major goal of the CMP is to document
changes in water quality over time, ideally in response to advances in agricultural
practices that protect water quality. Past CMP reports were used by farmers,
regulators, conservation agencies, researchers, and environmental
organizations. This internship focused on developing code for the “R” statistical
software package (R Core Team 2012) to increase
automation in the production of figures, tables, and statistical trend analysis
results for the CMP annual report.
Project Objectives
The main objective for this internship was to develop and implement code for use
within “R” to provide figures and results for the 2013 CMP report, and to provide
a means of automation for the production of these figures and results for future
annual reports. A personal goal for this internship was to gain professional
experience relevant to watershed science and policy. Specifically, through this
internship I hoped to enhance my knowledge of the field of agricultural water
quality, as well as my data and statistical analysis and reporting skills. Having
these skills and experience will greatly improve my ability to pursue my desired
career path, especially at agencies such as the USDA.
Project Approach
The internship was structured according to the following tasks:
1. Data preparation and sample completeness check – “R” was used to
review 7 annual data files for completeness, as well as to format and
collate these files as necessary to support further data analysis.
2. Literature review – A review of water quality trend analysis literature was
performed.
3. Trend analyses – Code was developed for use within “R” to test for trends
within the 7-year dataset. Trend tests used include the Seasonal Mann
Kendall and Bayesian Change Point tests. Additionally, routines were
created to provide summary figures of the Seasonal Mann Kendall trend
analysis results.
4. Time series and data characterization – Code was developed for use
within “R” to perform data summaries and to create figures for time series
plots, box plots, stacked proportional bar plots, and pie charts of
regulatory exceedances.
These tasks were successfully completed through instruction and guidance from
my supervisor at CCWQP. Additionally, a great deal of independent work and
research was required to improve my skills in “R” and my knowledge of
environmental statistical analysis techniques.
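
Task 1 above can be illustrated with a minimal sketch, assuming each annual file is a CSV with identical columns. The file names, the column names (site, date, analyte, result), and the values are hypothetical stand-ins; this is not the code or the data used during the internship.

```r
# Hypothetical stand-in data: two small annual files written to a temp folder.
data_dir <- tempfile("cmp_")
dir.create(data_dir)
write.csv(data.frame(site = c("A", "B"), date = c("2006-01-15", "2006-02-15"),
                     analyte = "nitrate", result = c(1.2, 3.4)),
          file.path(data_dir, "cmp_2006.csv"), row.names = FALSE)
write.csv(data.frame(site = c("A", "B"), date = c("2007-01-15", "2007-02-15"),
                     analyte = "nitrate", result = c(1.0, NA)),
          file.path(data_dir, "cmp_2007.csv"), row.names = FALSE)

# Read every annual file found in the folder.
files <- sort(list.files(data_dir, pattern = "\\.csv$", full.names = TRUE))
annual <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Confirm that all files share the same columns before stacking them.
same_cols <- vapply(annual,
                    function(d) identical(names(d), names(annual[[1]])),
                    logical(1))
stopifnot(all(same_cols))

# Collate into one data frame, then convert dates and derive the year.
cmp <- do.call(rbind, annual)
cmp$date <- as.Date(cmp$date)
cmp$year <- as.integer(format(cmp$date, "%Y"))
print(cmp)
```

Checking column names before stacking is a deliberate choice here: a renamed or reordered column in one annual file would otherwise only surface later as an error or a misaligned table.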
Project Outcomes
Through this project I gained extensive experience with analyzing a large water
quality dataset using numerous data and statistical analysis techniques.
Additionally, I was able to broaden my knowledge of agricultural water quality
issues as well as my technical skills in Excel and “R”.
Sample Completeness
The dataset was tested for completeness using “R”. The results showed the
locations of missing data in the dataset. Sampling event summaries were then
reviewed for each missing data value to determine whether or not the value was
truly missing from the dataset.
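
One way such a completeness test could look is sketched below, assuming a long-format table with one row per site, monthly event, and analyte. The column names and data are hypothetical, and the approach (comparing observed rows against a full grid of expected combinations) is an illustration, not necessarily the method used for the CMP dataset.

```r
# Hypothetical observed data: 2 sites, 3 monthly events, 2 analytes.
obs <- expand.grid(site = c("A", "B"),
                   month = c("2013-01", "2013-02", "2013-03"),
                   analyte = c("nitrate", "turbidity"),
                   stringsAsFactors = FALSE)
obs$result <- seq_len(nrow(obs)) / 10
obs$result[3] <- NA   # a value recorded as missing
obs <- obs[-8, ]      # a row absent from the file altogether

# Every combination that a complete dataset should contain.
expected <- expand.grid(site = unique(obs$site),
                        month = unique(obs$month),
                        analyte = unique(obs$analyte),
                        stringsAsFactors = FALSE)

# Left-join observed results onto the expected grid; both kinds of gap become NA.
check <- merge(expected, obs, by = c("site", "month", "analyte"), all.x = TRUE)
missing <- check[is.na(check$result), c("site", "month", "analyte")]
missing <- missing[order(missing$site, missing$month, missing$analyte), ]
print(missing)
```

The resulting list of site, month, and analyte combinations is the kind of output that can then be compared by hand against sampling event summaries.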
Literature Review
A literature review of work on water quality trend analysis was performed, with
focus on agricultural water quality and the Seasonal Mann Kendall trend test
(Table 1).
Table 1. Results of a literature review of work in water quality trend analysis, with focus on agricultural water quality and the Seasonal Mann Kendall trend test.

Citation: Antonopoulos VZ, Papamichail DM, Mitsiou KA. 2001. Statistical and trend analysis of water quality and quantity data for the Strymon River in Greece. Hydrology and Earth System Sciences, 5(4):679-691.
Summary: Monthly water quality and discharge data were analyzed for trends and evaluated for best-fit models. The relationships of concentrations and loads with discharge were also examined using simple regression. The relation between concentration and discharge was weak, while the relation between load and discharge was very strong. Trends were detected using the non-parametric Spearman's criterion.

Citation: Bekele A, McFarland A. 2004. Regression-based flow adjustment procedures for trend analysis of water quality data. Transactions of the ASAE, 47(4):1093-1104.
Summary: Used the non-parametric Kendall's tau trend test (which is suitable when data are non-normal, have missing values, or are censored; non-parametric tests are preferred for data sets of moderate length) to test for monotonic trends in data adjusted for flow in 3 different ways. Kendall's tau is based on a rank-order statistic (it compares ranks rather than actual values). The objective was to determine the best flow adjustment method (OLS or LOWESS). The LOWESS method was found to be more appropriate than the OLS method because it was better able to define relationships between constituent concentration and flow. The default f-value of 0.5 was found to be adequate for reducing variability in constituent concentrations due to flow.

Citation: Berryman D, Bobee B, Cluis D, Haemmerli J. 1988. Non-parametric tests for trend detection in water quality time series. Water Resources Bulletin, 24(3):545-556.
Summary: Methods for choosing trend detection tests based on the identification of sources of serial dependence in the time series are discussed. The Spearman and Kendall tests for monotonic trends and the Mann-Whitney test for the detection of steps are identified as powerful non-parametric tests.

Citation: Bouraoui F, Turpin N, Boerlen P. 1999. Trend analysis of nutrient concentrations and loads in surface water in an intensively fertilized watershed. J. Environmental Quality, 28(6):1878-1885.
Summary: Analyzed nutrient concentrations and loads at the surface water outlet of a heavily fertilized watershed. A non-parametric statistical analysis (seasonal Mann-Kendall) was performed on mean monthly and mean annual data, which detected no trend. Next, data from the same month of each year were compared, and both decreasing and increasing trends were detected for certain constituents. The Mann-Kendall test was chosen based upon reviews of trend analysis of water quality by Walker (1994), Hirsch et al. (1991), and Berryman et al. (1988).

Citation: Crain AS, Martin GR. 2009. Trends in surface water quality at selected ambient-monitoring network stations in Kentucky, 1979-2004. Scientific Investigations Report 2009-5027, USDOI and USGS.
Summary: Used the S-Plus statistical software program (designed to detect monotonic trends) to perform trend analyses on water quality data. Tests used were the Seasonal Kendall non-parametric test and the Tobit-regression parametric test. One of these tests was selected for each constituent. Flow-adjustment methods provided with the S-Plus software were used to eliminate effects of flow on water quality variability.

Citation: Hirsch RM, Slack JR, Smith RA. 1982. Techniques of trend analysis for monthly water quality data. Water Resources Research, 18(1):107-121.
Summary: Presents techniques for analysis of monotonic water quality trends that account for non-normal distributions, seasonality, flow-relatedness, censored values, and serial correlation. The Seasonal Kendall test, the Kendall slope estimator (an estimator of trend magnitude for skewed data), and flow-adjusted constituent concentrations coupled with the Seasonal Kendall test were explored. Concluded that the methods explored were useful for long time series, as it is useful to have a set of objective procedures that are powerful over a wide range of situations for identifying trends.

Citation: McLeod AI, Hipel KW, Bodo BA. 1991. Trend analysis methodology for water quality time series. Environmetrics, 2(2):169-200.
Summary: Developed a general trend analysis methodology for water quality time series, designed for use with non-normal, positively skewed data with seasonal variation and interdependence of water quality variables and flow. The approach is divided into two categories: graphical studies and trend tests. Trend tests included Mann-Kendall, Kruskal-Wallis, and Spearman's partial-rank correlation. It was found that (1) the Spearman test has high power for water quality trend testing with seasonality, (2) flow-adjusted water quality data can eliminate sampling bias, and (3) it is important to test for seasonality before applying a test such as the seasonal Mann-Kendall test (which is less powerful when seasonality is not present).

Citation: Renwick WH, Vanni MJ, Zhang Q, Patton J. 2008. Water quality trends and changing agricultural practices in a Midwest U.S. watershed, 1994-2006. J. Environmental Quality, 37:1862-1874.
Summary: Analyzed changes in farm management practices that were likely to have an effect on water quality. Used an auto-regressive moving average model to include effects of discharge and season on constituent concentrations. Also used LOWESS plots and analyses of changes in the relation between discharge and concentration.

Citation: Yu Y, Zou S, Whittemore D. 1993. Non-parametric trend analysis of water quality data of rivers in Kansas. J. Hydrology, 150:61-80.
Summary: Four different non-parametric trend detection methods (Mann-Kendall, Seasonal Kendall, Sen's T test, Van Belle and Hughes Chi-square test) were used for a 9-year water quality dataset. The different methods were compared and were found to have practically equal power for datasets of at least 9 years in length. Lays out the steps for preliminary analyses (distribution, dependence, seasonality, and flow-relatedness tests).
“R” Code and Outputs
“R” code was developed to automate the creation of boxplots, time-series plots,
precipitation and flow plots, turbidity and flow plots, stacked barplots for toxicity
results, pie charts of regulatory exceedances, and summary statistics tables (see
Appendix for samples of these products).
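
As an illustration of the summary statistics and exceedance products listed above, the following is a minimal sketch on simulated data. The site names, the nitrate values, and the threshold of 10 are hypothetical and chosen only for demonstration; they are not CMP results or the regulatory limits used in the report.

```r
set.seed(1)  # reproducible hypothetical data
wq <- data.frame(site = rep(c("A", "B"), each = 12),
                 nitrate = c(rlnorm(12, meanlog = 1), rlnorm(12, meanlog = 3)))

# Summary statistics table: one row per site.
stats <- do.call(rbind, lapply(split(wq$nitrate, wq$site), function(x) {
  data.frame(n = length(x), min = min(x), median = median(x),
             mean = mean(x), max = max(x))
}))

# Proportion of samples above a threshold (10 is an illustrative value only).
limit <- 10
exceed <- tapply(wq$nitrate > limit, wq$site, mean)
stats$prop_exceed <- as.numeric(exceed[rownames(stats)])
print(round(stats, 2))
```

From a table like this, base R's boxplot() and pie() functions can draw the corresponding figures, for example boxplot(nitrate ~ site, data = wq).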
Additionally, code was developed to analyze water quality trends using the
Seasonal Mann Kendall test from the Kendall package (McLeod 2011). For
sampling sites with sufficient data, each analyte was tested for long-term
monotonic trends using these routines. The data were tested for trends across
the full dataset and across the dataset divided into wet and dry months. The
dataset was divided by season because seasonality can affect the trends detected
in agricultural water quality data: irrigation during the dry season and
precipitation during the wet season may have differing effects on water quality.
Another “R” routine was developed to
create summary figures of the Seasonal Mann Kendall test results for each
hydrologic unit.
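
The logic of the seasonal test can be sketched in base R. The internship used the Seasonal Mann Kendall routine from the Kendall package; the version below is instead a simplified re-implementation of the Seasonal Kendall statistic (Hirsch et al. 1982) with no correction for ties or serial correlation, run on simulated data. The function names, the data, and the choice of November through April as wet months are assumptions made for illustration.

```r
# Mann-Kendall S and its variance for one season (no tie correction).
mk_season <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 2) return(c(S = 0, varS = 0))
  d <- outer(x, x, "-")              # d[i, j] = x[i] - x[j]
  S <- sum(sign(d[lower.tri(d)]))    # later value minus earlier value
  c(S = S, varS = n * (n - 1) * (2 * n + 5) / 18)
}

# Seasonal Kendall: sum S and Var(S) over seasons, then a normal approximation.
seasonal_kendall <- function(value, season) {
  parts <- vapply(split(value, season), mk_season, c(S = 0, varS = 0))
  S <- sum(parts["S", ])
  varS <- sum(parts["varS", ])
  # Continuity correction: shrink S toward zero by one before standardizing.
  z <- if (S > 0) (S - 1) / sqrt(varS) else if (S < 0) (S + 1) / sqrt(varS) else 0
  list(S = S, varS = varS, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Hypothetical 7-year monthly series: upward trend plus a seasonal cycle.
years <- rep(2006:2012, each = 12)
months <- rep(1:12, times = 7)
value <- (years - 2006) + 5 * sin(2 * pi * months / 12)

# Full dataset, then wet and dry months separately (month split is assumed).
wet_months <- c(11, 12, 1, 2, 3, 4)
is_wet <- months %in% wet_months
full <- seasonal_kendall(value, months)
wet <- seasonal_kendall(value[is_wet], months[is_wet])
dry <- seasonal_kendall(value[!is_wet], months[!is_wet])
print(c(full = full$p.value, wet = wet$p.value, dry = dry$p.value))
```

Because each calendar month is compared only with the same month in other years, the seasonal cycle in the simulated series does not mask the year-to-year trend.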
The analyses, figures, and tables discussed above were produced for inclusion in
the 2013 CMP report. The “R” routines themselves will be used for reproduction
of these items in future annual reports.
Conclusion
Through this internship I was able to gain and develop valuable knowledge,
skills, and experience in my field of interest. I learned a great deal about water
quality within agricultural watersheds on the Central Coast and gained the skills
necessary to evaluate a large water quality dataset in terms of general data
characterization, statistical trend analysis, and summary statistics. Through all of
this, I gained further expertise in the use of “R” and am more confident in my
technical skills and knowledge. These skills are extremely valuable to my
professional development and to the enhancement of future career opportunities.
References
McLeod AI (2011). Kendall: Kendall rank correlation and Mann-Kendall trend
test. R package version 2.2. http://CRAN.R-project.org/package=Kendall
R Core Team (2012). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
http://www.R-project.org/
Appendix – Samples of Figures and Graphs Produced
A. Precipitation and flow time series plot.
B. Toxicity stacked proportional bar plot.
C. Turbidity and flow plots.
D. Pie charts showing proportions of regulatory exceedances.
E. Trend analysis summary.
F. Analyte boxplots.
G. Analyte time series plots.