Getting started with agriReg

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

agriReg Authors

2026-03-27

Overview

agriReg provides a streamlined workflow for the most common regression tasks in agricultural research:

Task	Function
Clean field data	`clean_agri_data()`
Normalise yield	`yield_normalize()`
Flag outliers	`outlier_flag()`
Linear / mixed model	`fit_linear()`
Polynomial selection	`fit_polynomial()`
Nonlinear growth curves	`fit_nonlinear()`
Optimum dose / rate	`optimum_dose()`
Dose-response (LL.4/5)	`fit_dose_response()`
Effective dose (ED50…)	`ed_estimates()`
Compare models	`compare_models()`
Goodness-of-fit stats	`gof_stats()`
Residual diagnostics	`residual_check()`

1 · Data preparation

Load one of the bundled example datasets and inspect it.

wheat <- load_example_data("wheat_trial")
head(wheat)
#>   block nitrogen phosphorus variety yield biomass
#> 1     A        0        low      V1 3.362   4.935
#> 2     A        0        low      V2 3.746   5.826
#> 3     A        0       high      V1 3.799   5.679
#> 4     A        0       high      V2 4.239   6.296
#> 5     A       60        low      V1 5.408   7.625
#> 6     A       60        low      V2 5.381   7.965

Cleaning and outlier detection

wheat_clean <- clean_agri_data(wheat,
                                yield_col     = "yield",
                                flag_outliers = TRUE)
summary(wheat_clean)
#> agriData summary -- response: 'yield'
#> 
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   3.343   5.041   6.097   5.719   6.703   7.328 
#> 
#> Outliers flagged: 0 / 48

Normalisation

Normalise yield within each block using z-scoring:

wheat_norm <- yield_normalize(wheat_clean,
                               yield_col = "yield",
                               method    = "zscore",
                               group_by  = "block")
head(wheat_norm[, c("block", "yield", "yield_norm")])
#> agriData object
#>   Rows      : 6
#>   Columns   : 3
#> 
#> First 6 rows:
#>   block yield yield_norm
#> 1     A 3.362 -1.8258491
#> 2     A 3.746 -1.5347896
#> 3     A 3.799 -1.4946173
#> 4     A 4.239 -1.1611116
#> 5     A 5.408 -0.2750475
#> 6     A 5.381 -0.2955126

Advanced outlier flagging

The standalone outlier_flag() supports three methods:

wheat_clean <- outlier_flag(wheat_clean, col = "yield", method = "modz")
table(wheat_clean$yield_outlier)
#> 
#> FALSE 
#>    48

2 · Linear models

Simple OLS

m_lm <- fit_linear(wheat_clean, "yield ~ nitrogen + phosphorus")
print(m_lm)
#> ==============================
#>  agriReg: Linear Model (agriLM)
#> ==============================
#> Engine  : lm 
#> Formula : yield ~ nitrogen + phosphorus 
#> 
#> Coefficients:
#>   (Intercept)      nitrogen phosphoruslow 
#>     4.4240083     0.0167675    -0.4274167 
#> 
#> R2: 0.8784  |  Adj-R2: 0.8701  |  RMSE: 0.4261

Diagnostic plots

The plot() method returns ggplot2 objects — pipe them, save them, or patch them together with patchwork.

plot(m_lm, type = "fit")

plot(m_lm, type = "residuals")

plot(m_lm, type = "qq")

Full residual panel

residual_check(m_lm)

Mixed-effects model

Add a random intercept per block to account for field heterogeneity:

m_lmer <- fit_linear(wheat_clean,
                     formula = "yield ~ nitrogen + phosphorus + variety",
                     random  = "(1|block)")
summary(m_lmer)
#> ==============================
#>  agriReg: Linear Model Summary
#> ==============================
#> Engine : lmer 
#> 
#> Formula: yield ~ nitrogen + phosphorus + variety 
#> Random : (1|block) 
#> 
#> Fixed effects:
#>   (Intercept)      nitrogen phosphoruslow     varietyV2 
#>     4.2722167     0.0167675    -0.4274167     0.3035833 
#> 
#> Random effects (variance components):
#>       grp        var1     vcov  sdcor
#>     block (Intercept) 0.000000 0.0000
#>  Residual        <NA> 0.172902 0.4158
#> 
#> Number of obs: 48 | Groups: block, Residual 
#> 
#> --- Goodness-of-fit ---
#> R2       : 0.8938
#> Adj-R2   : 0.8915
#> RMSE     : 0.3981
#> MAE      : 0.3771
#> AIC      : 80.77
#> BIC      : 92.00

Polynomial selection

Automatically test degrees 1–3 and return the best-fitting model:

m_poly <- fit_polynomial(wheat_clean,
                          x_col      = "nitrogen",
                          y_col      = "yield",
                          max_degree = 3)
coef(m_poly)
#>   (Intercept) I(nitrogen^1) I(nitrogen^2) 
#>  3.8360916667  0.0354779167 -0.0001039468

3 · Nonlinear models

Load the maize growth dataset:

maize <- load_example_data("maize_growth")
# Use well-watered plants only
maize_ww <- maize[maize$treatment == "WW", ]
head(maize_ww)
#>   plant_id days biomass_g treatment
#> 1        1    1      0.00        WW
#> 2        1    2     10.98        WW
#> 3        1    3      0.00        WW
#> 4        1    4     14.10        WW
#> 5        1    5      0.00        WW
#> 6        1    6      0.28        WW

Logistic growth

The 3-parameter logistic is the most common growth curve in agronomy. Parameter interpretation:

Asym: upper asymptote (maximum biomass)
xmid: inflection point (day of maximum growth rate)
scal: scale parameter (related to the steepness)

m_log <- fit_nonlinear(maize_ww,
                        x_col = "days",
                        y_col = "biomass_g",
                        model = "logistic")
summary(m_log)
#> ==================================
#>  agriReg: Nonlinear Model Summary
#> ==================================
#> Model : logistic 
#> 
#> 
#> Formula: biomass_g ~ Asym/(1 + exp((xmid - days)/scal))
#> 
#> Parameters:
#>      Estimate Std. Error t value Pr(>|t|)    
#> Asym 497.8621     2.1803  228.34   <2e-16 ***
#> xmid  47.0653     0.1869  251.81   <2e-16 ***
#> scal   8.5507     0.1569   54.52   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 23.27 on 497 degrees of freedom
#> 
#> Number of iterations to convergence: 5 
#> Achieved convergence tolerance: 2.749e-07
#> 
#> 
#> --- Goodness-of-fit ---
#> R2   : 0.9869
#> RMSE : 23.1983
#> MAE  : 17.2210
#> AIC  : 4571.02
#> BIC  : 4587.88

plot(m_log)

Gompertz curve

The Gompertz is often a better fit for biological growth that is asymmetric around the inflection point:

m_gomp <- fit_nonlinear(maize_ww, "days", "biomass_g", "gompertz")
coef(m_gomp)

Asymptotic (Mitscherlich) model

Classic model for nutrient-response data where yield approaches a plateau:

m_asym <- fit_nonlinear(wheat_clean,
                         x_col = "nitrogen",
                         y_col = "yield",
                         model = "asymptotic")
plot(m_asym)

Quadratic and optimum dose

The quadratic polynomial is widely used to find the economic optimum nitrogen rate:

m_quad <- fit_nonlinear(wheat_clean,
                         x_col = "nitrogen",
                         y_col = "yield",
                         model = "quadratic")

opt <- optimum_dose(m_quad)
cat(sprintf("Optimum nitrogen rate : %.1f kg/ha\n", opt["x_opt"]))
#> Optimum nitrogen rate : 170.7 kg/ha
cat(sprintf("Predicted max yield   : %.2f t/ha\n",  opt["y_max"]))
#> Predicted max yield   : 6.86 t/ha

Linear-plateau model

Suitable when yield increases linearly up to a critical input level and then plateaus (common for phosphorus response):

m_lp <- fit_nonlinear(wheat_clean,
                       x_col = "nitrogen",
                       y_col = "yield",
                       model = "linear_plateau")
cat("Critical point (plateau onset):", optimum_dose(m_lp)["x_opt"], "kg/ha\n")
#> Critical point (plateau onset): 94.86183 kg/ha

4 · Dose-response models

Load the herbicide dataset:

herb <- load_example_data("herbicide_trial")
# Subset to one species
amaranth <- herb[herb$species == "Amaranth", ]

Fit a 4-parameter log-logistic curve

The LL.4 model is the industry standard for herbicide dose-response:

\[y = c + \frac{d - c}{1 + \exp(b(\log(x) - \log(e)))}\]

where b = slope, c = lower asymptote, d = upper asymptote, e = ED50.

m_dr <- fit_dose_response(amaranth,
                           dose_col = "dose_g_ha",
                           resp_col = "weed_control_pct",
                           fct      = "LL.4")
summary(m_dr)
#> ==================================
#>  agriReg: Dose-Response Summary
#> ==================================
#> Model   : LL.4 
#> Dose    : dose_g_ha 
#> Response: weed_control_pct 
#> 
#> 
#> Model fitted: Log-logistic (ED50 as parameter) (4 parms)
#> 
#> Parameter estimates:
#> 
#>                Estimate Std. Error  t-value p-value    
#> b:(Intercept)  1.905897   0.095867  19.8807 < 2e-16 ***
#> c:(Intercept)  1.576654   0.799879   1.9711 0.05603 .  
#> d:(Intercept) 97.986892   0.875390 111.9351 < 2e-16 ***
#> e:(Intercept) 82.259133   2.334523  35.2360 < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error:
#> 
#>  2.74034 (38 degrees of freedom)
#> 
#> ED10 / ED50 / ED90:
#> 
#> Estimated effective doses
#> 
#>        Estimate Std. Error    Lower    Upper
#> e:1:10  25.9720     1.7769  22.3749  29.5691
#> e:1:50  82.2591     2.3345  77.5331  86.9851
#> e:1:90 260.5332    15.7523 228.6443 292.4221
#>         Estimate Std. Error     Lower     Upper
#> e:1:10  25.97199   1.776870  22.37490  29.56907
#> e:1:50  82.25913   2.334523  77.53314  86.98513
#> e:1:90 260.53321  15.752300 228.64435 292.42208

plot(m_dr, log_dose = TRUE)

Effective dose estimates

ed_estimates(m_dr, respLev = c(10, 50, 90))
#> 
#> Estimated effective doses
#> 
#>        Estimate Std. Error    Lower    Upper
#> e:1:10  25.9720     1.7769  22.3749  29.5691
#> e:1:50  82.2591     2.3345  77.5331  86.9851
#> e:1:90 260.5332    15.7523 228.6443 292.4221
#>         Estimate Std. Error     Lower     Upper
#> e:1:10  25.97199   1.776870  22.37490  29.56907
#> e:1:50  82.25913   2.334523  77.53314  86.98513
#> e:1:90 260.53321  15.752300 228.64435 292.42208

5 · Model comparison

Compare all models fitted to the wheat data:

cmp <- compare_models(
  ols        = m_lm,
  mixed      = m_lmer,
  asymptotic = m_asym,
  quadratic  = m_quad,
  lp         = m_lp
)
print(cmp)
#>        model         engine n_par     AIC     BIC   RMSE    MAE      R2
#> 1  quadratic      quadratic     3  27.105  34.590 0.2953 0.2419  0.9416
#> 2         lp linear_plateau     3  33.296  40.781 0.3149 0.2568  0.9335
#> 3        ols             lm     3  62.315  69.800 0.4261 0.3753  0.8784
#> 4      mixed           lmer     1  80.769  91.997 0.3981 0.3771  0.8938
#> 5 asymptotic     asymptotic     2 205.438 211.051 1.9320 1.1392 -1.5010
#>   delta_AIC
#> 1     0.000
#> 2     6.191
#> 3    35.210
#> 4    53.664
#> 5   178.333

The table is ranked by AIC. delta_AIC shows the difference from the best model — values below 2 are considered equivalent; above 10 is strong evidence against the weaker model.

6 · Extracting results for reports

All agriReg objects are compatible with broom for tidy output:

library(broom)
tidy(m_lm$fit)
#> # A tibble: 3 × 5
#>   term          estimate std.error statistic  p.value
#>   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     4.42    0.124        35.7  1.13e-34
#> 2 nitrogen        0.0168  0.000947     17.7  6.59e-22
#> 3 phosphoruslow  -0.427   0.127        -3.36 1.57e- 3
glance(m_lm$fit)
#> # A tibble: 1 × 12
#>   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
#>       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1     0.878         0.873 0.440      162. 2.60e-21     2  -27.2  62.3  69.8
#> # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Saving plots

p <- plot(m_log)
ggplot2::ggsave("logistic_growth.png", p, width = 7, height = 4.5, dpi = 300)

Session information

sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 26200)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] broom_1.0.12  ggplot2_4.0.2 emmeans_2.0.2 agriReg_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.1.1         mvtnorm_1.3-6      lattice_0.20-45    tidyr_1.3.2       
#>  [5] gtools_3.9.5       reformulas_0.4.4   zoo_1.8-15         utf8_1.2.6        
#>  [9] digest_0.6.39      R6_2.6.1           backports_1.5.0    evaluate_1.0.5    
#> [13] coda_0.19-4.1      pillar_1.11.1      Rdpack_2.6.6       rlang_1.1.7       
#> [17] otel_0.2.0         multcomp_1.4-30    rstudioapi_0.18.0  minqa_1.2.8       
#> [21] car_3.1-5          nloptr_2.2.1       jquerylib_0.1.4    Matrix_1.6-5      
#> [25] rmarkdown_2.30     labeling_0.4.3     splines_4.2.1      lme4_2.0-1        
#> [29] S7_0.2.1           compiler_4.2.1     xfun_0.57          pkgconfig_2.0.3   
#> [33] mgcv_1.8-40        htmltools_0.5.9    tidyselect_1.2.1   tibble_3.3.1      
#> [37] codetools_0.2-18   dplyr_1.2.0        withr_3.0.2        MASS_7.3-57       
#> [41] rbibutils_2.4.1    grid_4.2.1         nlme_3.1-168       jsonlite_2.0.0    
#> [45] xtable_1.8-8       gtable_0.3.6       lifecycle_1.0.5    magrittr_2.0.3    
#> [49] scales_1.4.0       estimability_1.5.1 cli_3.6.5          cachem_1.1.0      
#> [53] carData_3.0-6      farver_2.1.2       bslib_0.10.0       drc_3.0-1         
#> [57] generics_0.1.4     vctrs_0.7.2        boot_1.3-28        sandwich_3.1-1    
#> [61] Formula_1.2-5      TH.data_1.1-5      RColorBrewer_1.1-3 tools_4.2.1       
#> [65] glue_1.7.0         purrr_1.0.2        plotrix_3.8-14     abind_1.4-8       
#> [69] pbkrtest_0.5.5     parallel_4.2.1     fastmap_1.2.0      survival_3.3-1    
#> [73] yaml_2.3.12        knitr_1.51         patchwork_1.3.2    sass_0.4.10

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.