---
title: "Introduction"
csl: stylefile.csl
bibliography: refs.bib
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE, message = F, warning = F}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", message = F, warning = F, fig.align = 'center')
library(peptools)
library(ggplot2)
library(mapview)
library(dplyr)
library(sf)
mptyps <- c("CartoDB.Positron", "CartoDB.DarkMatter", "OpenStreetMap", "Esri.WorldImagery", "OpenTopoMap")
```
## Basic use
First we load the peptools package. Note that we do not need to install it every time we use R, but we do need to load the package.
```{r}
library(peptools)
```
The package includes a `pepstations` data object that includes metadata for each station, including lat/lon and bay segment. We can use the sf and dplyr package to make this a spatial data object, then plot it with mapview (you may need to install these packages if you don't have them). Note that the peptools package has built-in mapping functions, this example just shows how to create a map from scratch.
```{r}
library(sf)
library(dplyr)
library(mapview)
locs <- pepstations %>%
sf::st_as_sf(coords = c('Longitude', 'Latitude'), crs = 4326) %>%
dplyr::mutate(bay_segment = as.character(bay_segment))
mapview(locs, zcol = 'bay_segment', layer.name = 'Bay segment')
```
The `pepseg` data object also included with the package shows the polygons for the bay segments.
```{r}
mapview(pepseg)
```
The water quality data can be imported using the `read_pepwq()` function. A compressed folder that includes the data can be downloaded from [here](https://gis.suffolkcountyny.gov/portal/home/item.html?id=5d4b53ec44204219a8da685f1859e096). After the data are downloaded and extracted, the Excel file with the raw data is named "Peconics SCDHS WQ data - 1976 - 2021.xlsx", or something similar depending on when the data were downloaded. The location of this file on your computer is passed to the import function. Below, a local file renamed as "currentdata.xlsx" that contains the water quality data is imported.
```{r}
dat <- read_pepwq('../inst/extdata/currentdata.xlsx')
head(dat)
```
The raw data includes multiple fields, but only the chlorophyll, secchi, and total nitrogen data are retained for reporting. The data are in long format with the `name` column showing which observation (chlorophyll, secchi, or total nitrogen) the row `value` shows and the `status` showing if the observation was above or below detection (indicated as `>` or `<`, secchi only). Each station is also grouped by major bay segment, defined as `r levels(dat$bay_segment)`.
A quick view of the number of observations and length of record at each station shows that effort was not continuous.
```{r, fig.height = 8, fig.width = 11}
library(ggplot2)
toplo <- dat %>%
select(bay_segment, BayStation, yr, name, value) %>%
group_by(bay_segment, BayStation, yr, name) %>%
summarise(`Obs. (n)` = n())
p <- ggplot(toplo, aes(x = yr, y = BayStation, fill = `Obs. (n)`)) +
geom_tile()+ #colour = 'lightgrey') +
facet_grid(bay_segment ~ name, scales = 'free_y', space = 'free_y') +
theme_bw() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor= element_blank(),
strip.background = element_blank(),
axis.title.x = element_blank(),
legend.position = 'top',
axis.text.y = element_text(size = 7)
) +
scale_fill_viridis_c()
p
```
The function `anlz_pepdat()` summarizes the station data by bay segment. The function returns annual medians and monthly medians by year for chlorophyll, secchi depth, and total nitrogen. The chlorophyll and secchi summaries are then used to determine if bay segment targets for water quality are met using the `anlz_attain()` and `anlz_attainpep()` functions.
Below shows how to use `anlz_pepdat()` to summarize the data by bay segment to estimate annual and monthly medians for chlorophyll, secchi depth, and total nitrogen. The output is a two-element list for the annual (ann) and monthly (mos) medians by segment.
```{r}
medpep <- anlz_medpep(dat)
medpep
```
This output can then be further analyzed with `anlz_attainpep()` to determine if the bay segment outcomes are met in each year for chlorophyll and secchi depth. The results are used by the plotting functions described below. In short, the `chl_sd` column indicates the categorical outcome for chlorophyll and secchi for each segment. The outcomes are integer values from zero to three. The relative exceedances of water quality thresholds for each segment, both in duration and magnitude, are indicated by higher integer values.
```{r}
anlz_attainpep(medpep)
```
## Plotting
The plotting functions are used to view station data, long-term trends for each bay segment, and annual results for the overall water quality assessment.
The `show_sitemappep()` function produces an interactive map of medians of water quality conditions at each station. Medians can be shown for chlorophyll, secchi depth, or total nitrogen and for all stations or only stations from selected bay segments. Each point on the map shows the median for the parameter, with the size and color of the point in proportion to the other median values shown on the map. The color scale for the median shows higher concentrations of chlorophyll/total nitrogen or shallower secchi depths in red and lower concentrations of chlorophyll/total nitrogen or deeper secchi depths in blue. Hovering the mouse pointer over a site location will indicate the site name and the median value. Clicking on a station point will reveal the underlying plot data.
Here, the 2020 chlorophyll medians are shown for stations in all bay segments.
```{r}
show_sitemappep(dat, yrsel = 2021)
```
Medians for a given month can also be shown.
```{r}
show_sitemappep(dat, mosel = 7)
```
A different year, parameter, and bay segment can also be chosen. Note that the size and color ramps are reversed for secchi depth, such that smaller and bluer points indicate larger secchi values.
```{r}
show_sitemappep(dat, yrsel = 2010, param = 'sd', bay_segment = c('2', '3'))
```
By default, the color and size scaling in `show_sitemappep()` is relative to only the points on the map. You can view scaling relative to all values in the dataset (across time and space) to get a sense of how the values for the selected year compare to the rest of the record. This can be done by changing the `relative` argument to `TRUE`.
```{r}
show_sitemappep(dat, yrsel = 2021, relative = T)
```
The scaling is also sensitive to outliers. The default is to use the maximum scaling as the 99th percentile value observed in the entire dataset for the chosen parameter. Otherwise, results that are visually difficult to interpret will be returned if the absolute maximum value is used to set the scale. If, however, you want to see scaling relative to a smaller quantile, you can change the value with the `maxrel` argument. The size and color ramps will be scaled to the defined upper quantile value. The actual observed value at a point will always be visible on mouseover.
```{r}
show_sitemappep(dat, yrsel = 2021, relative = T, maxrel = 0.8)
```
The `show_thrpep()` function provides a more descriptive assessment of annual trends for a chosen bay segment relative to thresholds. In this plot, we show the annual medians and non-parametric confidence internals (95%) across stations for a segment. The red line shows annual trends and the horizontal blue line indicates the threshold for chlorophyll-a.
```{r, fig.height = 5, fig.width = 8}
show_thrpep(dat, bay_segment = "1a", param = "chla")
```
We can show the same plot but for secchi depth by changing the `param = "chla"` to `param = "sd"`. Note the change in the horizontal reference lines for the secchi depth target. Secchi trends must also be interpreted inversely to chlorophyll, such that lower values generally indicate less desirable water quality.
```{r, fig.height = 5, fig.width = 8}
show_thrpep(dat, bay_segment = "1a", param = "sd")
```
Total nitrogen can also be plotted.
```{r, fig.height = 5, fig.width = 8}
show_thrpep(dat, bay_segment = "1a", param = "tn")
```
The year range to plot can also be specified using the `yrrng` argument, where the default is `yrrng = c(1990, 2020)`.
```{r, fig.height = 5, fig.width = 8}
show_thrpep(dat, bay_segment = "1a", param = "chla", yrrng = c(1976, 2021))
```
The `show_allthrpep()` function is a slight modification of `show_thrpep()` that shows annual median results for all bay segments combined. It accepts similar arguments as `show_thrpep()`. Note that threshold values for chlorophyll-a and Secchi depth can also be shown because the thresholds do not vary between segments. A modification fo the function will be required if these thresholds are modified for specific bay segments.
```{r, fig.height = 5, fig.width = 8}
show_allthrpep(dat, param = "chla")
```
The `show_thrpep()` function uses results from the `anlz_medpep()` function. For example, you can retrieve the values from the above plot as follows:
```{r}
dat %>%
anlz_medpep %>%
.[['ann']] %>%
filter(bay_segment == '1a') %>%
filter(var == 'chla') %>%
filter(yr >= 1988 & yr <= 2021)
```
Similarly, the `show_boxpep()` function provides an assessment of seasonal changes in chlorophyll, secchi depth, or total nitrogen values by bay segment. The most recent year is highlighted in red by default. This allows a simple evaluation of how the most recent year compared to historical averages. The threshold value is shown in blue text and as the dotted line (not shown for total nitrogen). This is the same dotted line shown in `show_thrpep()`.
```{r, fig.height = 5, fig.width = 8}
show_boxpep(dat, param = 'chla', bay_segment = "1a")
show_boxpep(dat, param = 'sd', bay_segment = "3")
```
A different subset of years and selected year of interest can also be viewed by changing the `yrrng` and `yrsel` arguments. Here we show 1980 compared to monthly averages for the last ten years.
```{r, fig.height = 5, fig.width = 8}
show_boxpep(dat, param = 'chla', bay_segment = "1a", yrrng = c(2008, 2018), yrsel = 1990)
```
The `show_thrpep()` function is useful to understand annual variation in chlorophyll, secchi, or total nitrogen in each bay segment. For chlorophyll and secchi, the information from these plots can provide an understanding of how the annual reporting outcomes are determined. An outcome integer from zero to three is assigned to each bay segment for each annual estimate of chlorophyll and secchi depth. These outcomes are based on both the exceedance of the annual estimate above the threshold (blue lines in `show_thrpep()`) and duration of the exceedance for the years prior. The following graphic describes this logic [@Janicki99].
```{r, echo = F, fig.cap = 'Outcomes for annual estimates of water quality are assigned an integer value from zero to three depending on both magnitude and duration of the exceedence.', out.width = '80%'}
knitr::include_graphics('outints.png')
```
For the Peconic Estuary, light attenuation is replaced with Secchi depth. The outcomes above are assigned for both chlorophyll and secchi depth. The duration criteria are determined based on whether the exceedance was observed for years prior to the current year. The exceedance criteria for chlorophyll and light-attenuation are currently the same for each segment. The peptools package contains a `peptargets` data file that is a reference for determining annual outcomes. This file is loaded automatically with the package and can be viewed from the command line.
```{r}
peptargets
```
The final plotting function is `show_matrixpep()`, which creates an annual reporting matrix that reflects the combined outcomes for chlorophyll and secchi depth. Tracking the attainment outcomes provides the framework from which bay management actions can be developed and initiated. For each year and segment, a color-coded management action is assigned:
__Stay the Course__: Continue planned projects. Report data via annual progress reports and Baywide Environmental Monitoring Report.
__Caution__: Review monitoring data and nitrogen loading estimates. Begin/continue TAC and Management Board development of specific management recommendations.
__On Alert__: Finalize development and implement appropriate management actions to get back on track.
The management category or action is based on the combination of outcomes for chlorophyll and secchi depth [@Janicki99].
```{r, echo = F, fig.cap = 'Management action categories assigned to each bay segment and year based on chlorophyll and Secchi depth outcomes.', out.width = '80%'}
knitr::include_graphics('matrixcats.png')
```
The results can be viewed with `show_matrixpep()`.
```{r, fig.height = 8, fig.width = 3}
show_matrixpep(dat)
```
The matrix is also a `ggplot` object and its layout can be changed using `ggplot` elements. Note the use of `txtsz = NULL` to remove the color labels.
```{r, fig.height = 1.5, fig.width = 8}
show_matrixpep(dat, txtsz = NULL) +
scale_y_continuous(expand = c(0,0), breaks = c(1976:2021)) +
coord_flip() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7))
```
If preferred, the matrix can also be returned in an HTML table that can be sorted and scrolled.
```{r}
show_matrixpep(dat, asreact = TRUE)
```
Use a sufficiently large number to view the entire matrix.
```{r}
show_matrixpep(dat, asreact = TRUE, nrows = 200)
```
Bay segment exceedances can also be viewed in a matrix using `show_wqmatrixpep()`. The not met/met outcome categories indicate if the median was above/below the threshold. However, the small/large exceedances used for the overall report card depend on degree of overlap of the confidence intervals with the threshold. The matrix outcome below is a simplification that shows a binary outcome (not met/met) for location of the median relative to the threshold.
```{r, fig.height = 8, fig.width = 3}
show_wqmatrixpep(dat)
```
By default, the `show_wqmatrixpep()` function returns chlorophyll exceedances by segment. Secchi depth exceedances can be viewed by changing the `param` argument. Note that exceedances are reversed, i.e., lower values are considered less desirable water quality conditions for Secchi.
```{r, fig.height = 8, fig.width = 3}
show_wqmatrixpep(dat, param = 'sd')
```
The results from `show_matrixpep()` for both secchi and chlorophyll can be combined for an individual segment using the `show_segmatrixpep()` function. Only one segment can be plotted for each function call.
```{r, fig.height = 8, fig.width = 2.5}
show_segmatrixpep(dat, bay_segment = '1a')
```
Finally, all plots for a selected segment can be shown together using the `show_plotlypep()` function that combines chlorophyll and secchi data. This function combines outputs from `show_thrpep()` and `show_segmatrixpep()`. The final plot is interactive and can be zoomed by dragging the mouse pointer over a section of the plot. Information about each cell or value can be seen by hovering over a location in the plot.
```{r, out.width = '100%', fig.height = 6, fig.width = 11}
show_plotlypep(rawdat, bay_segment = '1a')
```
## Testing different thresholds
By default, all plotting functions use the `peptargets` data frame included with the package, which assigns a threshold of 6.5 ft for secchi depth and 5.5 ug/L for chlorophyll to all segments. All plotting arguments have an optional argument called `trgs` that accepts user-provided thresholds. A new data frame can be passed to this argument to evaluate different thresholds. The following demonstrates how to create a custom thresholds data frame (a tibble specifically) and use it to evaluate changes on reporting outcomes. For examples, perhaps less stringent thresholds are required for the 1a segment (lower secchi, higher chlorophyll) and more stringent thresholds are required for the 3 segment (higher secchi, lower chlorophyll).
```{r}
segs <- c('1a', '1b', '2', '3')
newtrgs <- data.frame(
bay_segment = factor(segs, levels = segs),
name = factor(segs, levels = segs),
sd_thresh = c(5.0, 5.5, 6.5, 7.5),
chla_thresh = c(6.5, 6, 5.5, 5),
stringsAsFactors = F
)
```
This new data frame can be passed to the plotting functions.
```{r, fig.height = 7, fig.width = 3}
newthr <- show_matrixpep(dat, trgs = newtrgs)
newthr
```
Comparing the default values with the new results can easily be done by plotting the two side by side.
```{r, fig.height = 7, fig.width = 6}
library(patchwork)
oldthr <- show_matrixpep(dat)
oldthr + newthr + plot_layout(ncol = 2)
```
# References