Getting started

Installation

Install the package from r-universe as follows. The source code is available on GitHub.

# Install tbeploads in R:
install.packages('tbeploads', repos = c('https://tbep-tech.r-universe.dev', 'https://cloud.r-project.org'))

Load the package in an R session after installation:

library(tbeploads)

Usage

Load estimates are broadly defined as domestic point source (DPS), industrial point source (IPS), material losses (ML), nonpoint source (NPS), atmospheric deposition (AD), and groundwater sources and springs (GW). The functions are built around these sources with unique inputs for each.

DPS

The domestic point source (DPS) functions are designed to work with raw entity data provided by partners. The core function is anlz_dps_facility() that requires only a vector of file paths as input, where each path points to a file with monthly parameter concentration (mg/L) and flow data (million gallons per day). The data also describe whether the observations are end of pipe (direct inflow to the bay) or reuse (applied to the land), with each defined by outfall Ids typically noted as D-001, D-002, etc. and R-001, R-002, etc, respectively. Both are estimated as concentration times flow, whereas reuse includes an attenuation factor for land application depending on location. The file names must follow a specific convention, where metadata for each entity is found in the facilities() data object using information in the file name.

For convenience, four example files are included with the package. These files represent actual entities and facilities, but the data have been randomized. The paths to these files are used as input to the function. Non-trivial data pre-processing and quality control is needed for each file and those included in the package are the correct format. The output is returned as tons per month for TN, TP, TSS, and BOD and million cubic meters per month for flow (hy).

dpsfls <- list.files(system.file('extdata/', package = 'tbeploads'),
  pattern = 'ps_dom', full.names = TRUE)
anlz_dps_facility(dpsfls)
#> # A tibble: 144 × 11
#>     Year Month entity  facility coastco source tn_load tp_load tss_load bod_load
#>    <int> <int> <chr>   <chr>    <chr>   <chr>    <dbl>   <dbl>    <dbl>    <dbl>
#>  1  2021     1 Clearw… City of… 387     D-001    0.862  6.04     0.452     1.14 
#>  2  2021     2 Clearw… City of… 387     D-001    1.37   1.15     0.553     2.87 
#>  3  2021     3 Clearw… City of… 387     D-001    1.70   0.575    0.982     3.31 
#>  4  2021     4 Clearw… City of… 387     D-001    0.344  0.0801   0.105     0.656
#>  5  2021     5 Clearw… City of… 387     D-001    1.09   0.460    0.603     2.23 
#>  6  2021     6 Clearw… City of… 387     D-001    0.641  0.435    0.248     0.588
#>  7  2021     7 Clearw… City of… 387     D-001    0.667  0.183    0.360     1.33 
#>  8  2021     8 Clearw… City of… 387     D-001    0.138  0.164    0.0601    0.327
#>  9  2021     9 Clearw… City of… 387     D-001    0.421  1.63     0.281     0.655
#> 10  2021    10 Clearw… City of… 387     D-001    1.15   0.217    0.563     0.666
#> # ℹ 134 more rows
#> # ℹ 1 more variable: hy_load <dbl>

The anlz_dps() function uses anlz_dps_facility() to summarize the DPS results by location as facility (combines outfall data), entity (combines facility data), bay segment (combines entity data), and as all (combines bay segment data). The results can also be temporally summarized as monthly or annual totals. The location summary is defined by the summ argument and the temporal summary is defined by the summtime argument. The fls argument used by anlz_dps_facility() is also used by anlz_dps(). The output is tons per month for TN, TP, TSS, and BOD and as million cubic meters per month for flow (hy) if summtime = 'month' or tons per year for TN, TP, TSS, and BOD and million cubic meters per year for flow (hy) if summtime = 'year'.

# combine by entity and month
anlz_dps(dpsfls, summ = 'entity', summtime = 'month')
#> # A tibble: 108 × 10
#>     Year Month source   entity segment tn_load tp_load tss_load bod_load hy_load
#>    <int> <int> <chr>    <chr>  <chr>     <dbl>   <dbl>    <dbl>    <dbl>   <dbl>
#>  1  2021     1 DPS - e… Clear… Old Ta…  0.862  6.04     0.452    1.14     0.663 
#>  2  2021     1 DPS - r… Clear… Old Ta…  0.102  0.00852  0.00844  0.0473   0.165 
#>  3  2021     2 DPS - e… Clear… Old Ta…  1.37   1.15     0.553    2.87     0.948 
#>  4  2021     2 DPS - r… Clear… Old Ta…  0.210  0.0416   0.0169   0.0244   0.319 
#>  5  2021     3 DPS - e… Clear… Old Ta…  1.70   0.575    0.982    3.31     1.83  
#>  6  2021     3 DPS - r… Clear… Old Ta…  0.261  0.0439   0.0194   0.0554   0.417 
#>  7  2021     4 DPS - e… Clear… Old Ta…  0.344  0.0801   0.105    0.656    0.194 
#>  8  2021     4 DPS - r… Clear… Old Ta…  0.0472 0.00602  0.00245  0.00499  0.0563
#>  9  2021     5 DPS - e… Clear… Old Ta…  1.09   0.460    0.603    2.23     0.874 
#> 10  2021     5 DPS - r… Clear… Old Ta…  0.0354 0.00281  0.00298  0.00885  0.0729
#> # ℹ 98 more rows

# combine by bay segment and year
anlz_dps(dpsfls, summ = "segment", summtime = "year")
#> # A tibble: 9 × 8
#>    Year source            segment      tn_load tp_load tss_load bod_load hy_load
#>   <int> <chr>             <chr>          <dbl>   <dbl>    <dbl>    <dbl>   <dbl>
#> 1  2017 DPS - end of pipe Hillsboroug…   0.891 1.20e-1  2.90e-1  6.92e-1 5.25e-1
#> 2  2017 DPS - reuse       Hillsboroug… 464.    2.94e+1  5.10e+1  1.70e+2 1.10e+3
#> 3  2018 DPS - end of pipe Hillsboroug… 102.    1.49e+1  3.58e+1  8.24e+1 6.43e+1
#> 4  2018 DPS - reuse       Hillsboroug…  28.6   8.80e-1  1.82e+0  4.88e+0 3.95e+1
#> 5  2019 DPS - end of pipe Hillsboroug…  14.4   1.63e+0  2.92e+0  9.17e+0 6.55e+0
#> 6  2019 DPS - reuse       Hillsboroug…   4.21  8.47e-2  1.33e-1  5.20e-1 3.82e+0
#> 7  2021 DPS - reuse       Hillsboroug…   2.90  1.61e-9  1.61e-9  1.61e-9 1.76e+0
#> 8  2021 DPS - end of pipe Old Tampa B…   8.62  1.11e+1  4.34e+0  1.43e+1 7.35e+0
#> 9  2021 DPS - reuse       Old Tampa B…   1.48  2.35e-1  1.19e-1  7.17e-1 2.39e+0

IPS

The industrial point source (IPS) functions are designed to work with raw entity data provided by partners and are similar in functionality to the DPS functions. The core function is anlz_ips_facility() that requires only a vector of file paths as input, where each path points to a file with monthly parameter concentration (mg/L) and flow data (million gallons per day). Loads are estimated as concentration times flow. The file names must follow a specific convention, where metadata for each entity is found in the facilities() data object using information in the file name.

For convenience, four example files are included with the package. These files represent actual entities and facilities, but the data have been randomized. The paths to these files are used as input to the function. As before, non-trivial data pre-processing and quality control is needed for each file and those included in the package are the correct format. The output is returned as tons per month for TN, TP, TSS, and BOD and million cubic meters per month for flow (hy).

ipsfls <- list.files(system.file('extdata/', package = 'tbeploads'),
  pattern = 'ps_ind_', full.names = TRUE)
anlz_ips_facility(ipsfls)
#> # A tibble: 60 × 11
#>     Year Month entity  facility coastco source tn_load tp_load tss_load bod_load
#>    <int> <int> <chr>   <chr>    <chr>   <chr>    <dbl>   <dbl>    <dbl>    <dbl>
#>  1  2020     1 Busch … Busch G… 191a    D-002   0.0195 0.00253    0.443    0.850
#>  2  2020     2 Busch … Busch G… 191a    D-002   0.0306 0.00328    0.544    1.04 
#>  3  2020     3 Busch … Busch G… 191a    D-002   0.0972 0.00925    1.39     2.68 
#>  4  2020     4 Busch … Busch G… 191a    D-002   0.0398 0.0258     0.522    1.00 
#>  5  2020     5 Busch … Busch G… 191a    D-002   0.0820 0.00514    0.506    0.971
#>  6  2020     6 Busch … Busch G… 191a    D-002   0.0112 0.00594    0.355    0.681
#>  7  2020     7 Busch … Busch G… 191a    D-002   0.0430 0.00413    0.506    0.971
#>  8  2020     8 Busch … Busch G… 191a    D-002   0.0167 0.00225    0.199    0.382
#>  9  2020     9 Busch … Busch G… 191a    D-002   0.0226 0.00307    0.332    0.638
#> 10  2020    10 Busch … Busch G… 191a    D-002   0.0187 0.0186     0.333    0.638
#> # ℹ 50 more rows
#> # ℹ 1 more variable: hy_load <dbl>

The anlz_ips() function uses anlz_ips_facility() to summarize the IPS results by location as facility (combines outfall data), entity (combines facility data), bay segment (combines entity data), and as all (combines bay segment data). The results can also be temporally summarized as monthly or annual totals. The location summary is defined by the summ argument and the temporal summary is defined by the summtime argument. The fls argument used by anlz_ips_facility() is also used by anlz_ips(). The output is tons per month for TN, TP, TSS, and BOD and as million cubic meters per month for flow (hy) if summtime = 'month' or tons per year for TN, TP, TSS, and BOD and million cubic meters per year for flow (hy) if summtime = 'year'.

# combine by entity and month
anlz_ips(ipsfls, summ = 'entity', summtime = 'month')
#> # A tibble: 60 × 10
#>     Year Month source entity   segment tn_load tp_load tss_load bod_load hy_load
#>    <int> <int> <chr>  <chr>    <chr>     <dbl>   <dbl>    <dbl>    <dbl>   <dbl>
#>  1  2020     1 IPS    Busch G… Hillsb…  0.0195 0.00253    0.443    0.850  0.0804
#>  2  2020     2 IPS    Busch G… Hillsb…  0.0306 0.00328    0.544    1.04   0.0987
#>  3  2020     3 IPS    Busch G… Hillsb…  0.0972 0.00925    1.39     2.68   0.253 
#>  4  2020     4 IPS    Busch G… Hillsb…  0.0398 0.0258     0.522    1.00   0.0946
#>  5  2020     5 IPS    Busch G… Hillsb…  0.0820 0.00514    0.506    0.971  0.0917
#>  6  2020     6 IPS    Busch G… Hillsb…  0.0112 0.00594    0.355    0.681  0.0644
#>  7  2020     7 IPS    Busch G… Hillsb…  0.0430 0.00413    0.506    0.971  0.0918
#>  8  2020     8 IPS    Busch G… Hillsb…  0.0167 0.00225    0.199    0.382  0.0361
#>  9  2020     9 IPS    Busch G… Hillsb…  0.0226 0.00307    0.332    0.638  0.0603
#> 10  2020    10 IPS    Busch G… Hillsb…  0.0187 0.0186     0.333    0.638  0.0603
#> # ℹ 50 more rows

# combine by bay segment and year
anlz_ips(ipsfls, summ = "segment", summtime = "year")
#> # A tibble: 5 × 8
#>    Year source segment          tn_load tp_load tss_load bod_load hy_load
#>   <int> <chr>  <chr>              <dbl>   <dbl>    <dbl>    <dbl>   <dbl>
#> 1  2017 IPS    Hillsborough Bay  0.215   0.0612   0           0    0.188 
#> 2  2018 IPS    Hillsborough Bay  0.168   0.0456   0           0    0.140 
#> 3  2019 IPS    Hillsborough Bay  0.0950  0.0226   0           0    0.0763
#> 4  2020 IPS    Hillsborough Bay  0.437   0.0858   6.11       11.7  1.11  
#> 5  2021 IPS    Hillsborough Bay  0.0305  0.0515   0.0662      0    0.0184

ML

Material losses (ML) are estimates of nutrient loads to the bay primarily from fertilizer shipping activities at ports. Historically, loadings from material losses were much higher than at present. Only a few entities report material losses, typically as a total for the year and only for total nitrogen. The material losses as tons/yr are estimated from the tons shipped using an agreed upon loss rate. Values reported in the example files represent the estimated loss as the total tons of N shipped each year multiplied by 0.0023 and divided by 2000. The total N shipped at a facility each year can be obtained using a simple back-calculation (multiply by 2000, divide by 0.0023).

The core function is anlz_ml_facility() that requires only a vector of file paths as input, where each file should be one row per year per facility, where the row shows the total tons per year of total nitrogen loss. The file names must follow a specific convention, where metadata for each entity is found in the facilities() data object using information in the file name.

For convenience, four example files are included with the package. These files represent actual entities and facilities, but the data have been randomized. The paths to these files are used as input to the function. The output is nearly identical to the input data since no load calculations are used, except results are shown as monthly load as the annual loss divided by 12. Additional empty columns (e.g., TP load, TSS load, etc.) are also returned for consistency of reporting with other loading sources.

mlfls <- list.files(system.file('extdata/', package = 'tbeploads'),
  pattern = 'ps_indml', full.names = TRUE)
anlz_ml_facility(mlfls)
#> # A tibble: 60 × 11
#>     Year Month entity  facility coastco source tn_load tp_load tss_load bod_load
#>    <int> <int> <chr>   <chr>    <chr>   <lgl>    <dbl> <lgl>   <lgl>    <lgl>   
#>  1  2017     1 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  2  2017     2 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  3  2017     3 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  4  2017     4 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  5  2017     5 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  6  2017     6 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  7  2017     7 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  8  2017     8 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#>  9  2017     9 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#> 10  2017    10 Kinder… Kinder … <NA>    NA      0.0155 NA      NA       NA      
#> # ℹ 50 more rows
#> # ℹ 1 more variable: hy_load <lgl>

The anlz_ml() function uses anlz_ml_facility() to summarize the IPS results by location as facility, entity (combines facility data), bay segment (combines entity data), and as all (combines bay segment data). The results can also be temporally summarized as monthly or annual totals. The location summary is defined by the summ argument and the temporal summary is defined by the summtime argument. The fls argument used by anlz_ml_facility() is also used by anlz_ml(). The output is tons per month of TN if summtime = 'month' or tons per year of TN if summtime = 'year'. Columns for TP, TSS, BOD, and hydrologic load are also returned with zero load for consistency with other point source load calculation functions. Material loss loads are often combined with IPS loads for reporting.

# combine by entity and month
anlz_ml(mlfls, summ = 'entity', summtime = 'month')
#> # A tibble: 60 × 10
#>     Year Month source entity segment   tn_load tp_load tss_load bod_load hy_load
#>    <int> <int> <chr>  <chr>  <chr>       <dbl>   <int>    <int>    <int>   <int>
#>  1  2020     1 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  2  2020     2 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  3  2020     3 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  4  2020     4 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  5  2020     5 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  6  2020     6 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  7  2020     7 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  8  2020     8 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#>  9  2020     9 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#> 10  2020    10 ML     CSX    Hillsbor…  0.0743       0        0        0       0
#> # ℹ 50 more rows

# combine by bay segment and year
anlz_ml(mlfls, summ = "segment", summtime = "year")
#> # A tibble: 5 × 8
#>    Year source segment          tn_load tp_load tss_load bod_load hy_load
#>   <int> <chr>  <chr>              <dbl>   <int>    <int>    <int>   <int>
#> 1  2017 ML     Hillsborough Bay  0.186        0        0        0       0
#> 2  2018 ML     Hillsborough Bay  0.188        0        0        0       0
#> 3  2019 ML     Hillsborough Bay  0.0224       0        0        0       0
#> 4  2020 ML     Hillsborough Bay  0.892        0        0        0       0
#> 5  2021 ML     Hillsborough Bay  0.989        0        0        0       0