
Type: Package
Title: Detecting Extremal Values in a Normal Linear Model
Version: 1.1.0
Date: 2026-04-21
Description: Provides a method to detect values poorly explained by a Gaussian linear model. The procedure is based on the maximum of the absolute value of the studentized residuals, which is a parameter-free statistic. This approach generalizes several procedures used to detect abnormal values during longitudinal monitoring of biological markers. For methodological details, see: Berthelot G., Saulière G., Dedecker J. (2025). "DEViaN-LM An R Package for Detecting Abnormal Values in the Gaussian Linear Model". HAL Id: hal-05230549. https://hal.science/hal-05230549.
License: GPL-3
Encoding: UTF-8
Imports: Rcpp
LinkingTo: Rcpp, RcppArmadillo
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
RoxygenNote: 7.3.3
Depends: R (≥ 2.10)
LazyData: true
SystemRequirements: OpenMP (optional, for parallel execution)
NeedsCompilation: yes
Packaged: 2026-04-30 08:50:39 UTC; root
Repository: CRAN
Date/Publication: 2026-04-30 09:40:02 UTC
Author: Guillaume Sauliere [aut], Geoffroy Berthelot [aut, cre], Jérôme Dedecker [aut]
Maintainer: Geoffroy Berthelot <geoffroy.berthelot@insep.fr>

devianLM: Outlier detection via studentized residuals

Description

Provides a method to detect values poorly explained by a Gaussian linear model. The procedure is based on the maximum of the absolute value of the studentized residuals (conditional on the design matrix), which is a parameter-free statistic. This approach generalizes several procedures used to detect abnormal values during longitudinal monitoring of biological markers.
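The statistic described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `max_abs_studentized` is hypothetical, and it assumes internally studentized residuals (the variant returned by `rstandard()`), which the documentation does not specify.

```r
# Hypothetical helper (not part of devianLM): maximum absolute
# internally studentized residual for a given design matrix x.
# x must already contain a column of ones if an intercept is wanted.
max_abs_studentized <- function(y, x) {
  x <- as.matrix(x)
  fit <- lm.fit(x, y)                       # least squares, no intercept added
  n <- nrow(x)
  p <- fit$rank
  q <- qr.Q(fit$qr)[, seq_len(p), drop = FALSE]
  h <- rowSums(q^2)                         # leverages (diagonal of the hat matrix)
  s <- sqrt(sum(fit$residuals^2) / (n - p)) # residual standard error
  r <- fit$residuals / (s * sqrt(1 - h))    # internally studentized residuals
  list(residuals = r, statistic = max(abs(r)))
}

set.seed(1)
x <- cbind(1, rnorm(30))
y <- drop(x %*% c(2, 0.5)) + rnorm(30)
out <- max_abs_studentized(y, x)

# Shifting y by any linear combination of the columns of x and rescaling it
# leaves the statistic unchanged, which is what "parameter-free" means here.
out_shifted <- max_abs_studentized(10 * y + drop(x %*% c(5, -3)), x)
```

Because the statistic depends only on the design matrix and on standardized noise, its distribution can be simulated without knowing the regression coefficients or the error variance.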

Details

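The main R functions are:

devianlm_stats(): identifies outliers from the studentized residuals.

get_devianlm_threshold(): estimates the threshold value via Monte-Carlo simulations.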

Monte Carlo simulations are parallelized using OpenMP when available.

Author(s)

Maintainer: Geoffroy Berthelot <geoffroy.berthelot@insep.fr>

Authors: Guillaume Sauliere, Jérôme Dedecker

References

Sauliere, G., Berthelot, G., and Dedecker, J. (2025). DEViaN-LM An R Package for Detecting Abnormal Values in the Gaussian Linear Model. https://doi.org/10.48550/arXiv.2509.02202

Berthelot, G., Gelein, B., Meinadier, E., Orhant, E., and Dedecker, J. (2025). Z-scores-based methods and their application to biological monitoring: An extended analysis of professional soccer players and cyclists athletes. https://arxiv.org/abs/2510.01810, doi:10.48550/arXiv.2510.01810


Identify outliers using the devianLM method

Description

This function determines whether the maximum of the absolute values of the studentized residuals of a Gaussian regression is abnormally high. Outliers are detected by comparing the absolute values of the studentized residuals to a threshold (depending on the design matrix), which can be supplied or estimated via n_sims Monte-Carlo simulations.

Usage

devianlm_stats(
  y,
  x,
  threshold = NULL,
  n_sims = 50000,
  verbose = TRUE,
  nthreads = detectCores() - 1,
  quant = 0.95,
  ...
)

Arguments

y

Numeric. Response vector.

x

Either a numeric variable or several numeric variables (explanatory variables) concatenated in a data frame. Note: devianLM does not add an intercept automatically; include a column of ones in x if an intercept is desired.

threshold

Numeric or NULL. If NULL, the threshold value is computed using get_devianlm_threshold().

n_sims

Integer. Number of Monte-Carlo simulations used to estimate the threshold when threshold is NULL. Default is 50,000.

verbose

Logical. If TRUE (default), informative messages are printed during execution (e.g., when ties are detected and handled).

nthreads

Integer. Number of threads to use. Default is parallel::detectCores() - 1.

quant

Numeric. Order of the quantile of interest. Default is 0.95 (this corresponds to a risk level of 0.05).

...

Additional arguments passed to get_devianlm_threshold().
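Since no intercept is added automatically, the design matrix has to carry its own column of ones. The sketch below shows two base R ways to build such a matrix; the data frame and its column names are made up for illustration, and whether devianlm_stats() accepts the `model.matrix()` result directly is an assumption not confirmed by this documentation.

```r
# Illustrative data; column names are invented for this sketch.
df <- data.frame(age = c(25, 32, 47, 51), educ = c(12, 16, 14, 18))

# Explicit column of ones, in the style of the package example:
x_manual <- cbind(1, df$age, df$educ)

# The same numeric matrix via model.matrix(), which adds the
# "(Intercept)" column of ones for the formula ~ age + educ:
x_mm <- model.matrix(~ age + educ, data = df)

# Both matrices hold identical values (only attributes differ).
same_values <- isTRUE(all.equal(x_manual, x_mm, check.attributes = FALSE))
```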

Details

When ties are present in y, a small random perturbation is added to avoid numerical issues. The "Ties were detected in the data, they have been randomly broken" message is displayed when this occurs.
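One way to break ties with a small random perturbation is sketched below. The helper `break_ties` and the perturbation scale (a tiny fraction of the smallest gap between distinct values) are assumptions made for illustration; the package's actual perturbation is not documented here.

```r
# Hypothetical helper (not part of devianLM): break ties in y by adding
# uniform noise much smaller than the smallest gap between distinct values.
break_ties <- function(y, verbose = TRUE) {
  if (anyDuplicated(y) == 0) {
    return(y)                               # no ties: y is returned unchanged
  }
  gaps <- diff(sort(unique(y)))
  eps <- if (length(gaps) > 0) min(gaps) * 1e-6 else 1e-8
  if (verbose) {
    message("Ties were detected in the data, they have been randomly broken")
  }
  y + runif(length(y), -eps, eps)
}

set.seed(42)
y <- c(1.2, 3.4, 3.4, 5.0, 1.2)
y_new <- break_ties(y)
```

Because the perturbation is random, calling set.seed() beforehand keeps results reproducible when ties are present.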

Value

devianlm_stats returns a list with the following components:

reg_residuals

Numeric vector. The studentized residuals from the linear model.

outliers

Integer vector. The indices (positions in the original data) of observations identified as outliers based on the threshold.

threshold

Numeric value. The cutoff applied to the absolute value of the studentized residuals to flag outliers. If not provided, it is estimated using get_devianlm_threshold().

is_outliers

Integer vector. A binary vector (0 or 1) of the same length as reg_residuals, indicating whether each observation is considered an outlier (1) or not (0).
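The relationship between the returned components can be illustrated with made-up numbers. This is a sketch under the assumption that an observation is flagged when its absolute studentized residual strictly exceeds the threshold; the documentation does not say whether the comparison is strict.

```r
# Made-up studentized residuals and cutoff, for illustration only.
reg_residuals <- c(-0.4, 1.1, -3.9, 0.2, 3.6)
threshold <- 3.2

# Binary flag per observation (assumed strict comparison):
is_outliers <- as.integer(abs(reg_residuals) > threshold)

# Positions of the flagged observations in the original data:
outliers <- which(is_outliers == 1L)
```

Here `outliers` and `is_outliers` carry the same information in two forms: the first is convenient for subsetting, the second for colouring points in a plot.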

Examples

library(devianLM)
set.seed(123)
y <- salary$hourly_earnings_log
x <- cbind(1, salary$age, salary$educational_attainment, salary$children_number)

test_salary <- devianlm_stats(y, x, n_sims = 100, quant = 0.95)

plot(test_salary$reg_residuals,
  pch = 16, cex = .8,
  ylim = c(-1 * max(abs(test_salary$reg_residuals)), max(abs(test_salary$reg_residuals))),
  xlab = "", ylab = "Studentized residuals",
  col = ifelse(test_salary$is_outliers, "red", "black"))

# Add the threshold lines:
abline(h = c(-test_salary$threshold, test_salary$threshold), col = "chartreuse2", lwd = 2)
 

Estimate threshold value via Monte-Carlo simulations.

Description

Estimates the threshold for the maximum absolute studentized residual in a Gaussian linear model, conditional on the design matrix. The threshold is the quantile of order quant, estimated from n_sims Monte-Carlo simulations.

Usage

get_devianlm_threshold(
  x,
  n_sims = 50000,
  nthreads = detectCores() - 1,
  quant = 0.95
)

Arguments

x

Either a numeric variable or several numeric variables (explanatory variables) concatenated in a data frame. Note: devianLM does not add an intercept automatically; include a column of ones in x if an intercept is desired.

n_sims

Integer. Number of Monte-Carlo simulations. Default is 50,000.

nthreads

Integer. Number of threads to use. Default is parallel::detectCores() - 1.

quant

Numeric. Order of the quantile of interest. Default is 0.95 (this corresponds to a risk level of 0.05).
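A caller who prefers to pass nthreads explicitly can compute a safe value first. `parallel::detectCores()` is documented to return NA when the number of cores cannot be determined, and it returns 1 on a single-core machine, so the expression `detectCores() - 1` can evaluate to NA or 0. How the package handles those cases is not stated here; the guard below is only a defensive sketch.

```r
# Defensive computation of a thread count that is always at least 1.
cores <- parallel::detectCores()
nthreads <- if (is.na(cores)) 1L else max(1L, cores - 1L)

# The risk level associated with a quantile order is 1 - quant:
quant <- 0.95
risk_level <- 1 - quant
```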

Details

Monte-Carlo simulations are parallelized using OpenMP when available.
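The estimation principle can be sketched serially in base R. This is not the package's compiled, OpenMP-parallel code: the helper `mc_threshold` is hypothetical, it assumes internally studentized residuals, and it uses far fewer simulations than the documented default to keep the run short.

```r
# Hypothetical helper (not part of devianLM): Monte-Carlo estimate of the
# quantile of order `quant` of the maximum absolute studentized residual,
# conditional on the design matrix x.
mc_threshold <- function(x, n_sims = 2000, quant = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)
  qrx <- qr(x)
  p <- qrx$rank
  q <- qr.Q(qrx)[, seq_len(p), drop = FALSE]  # orthonormal basis of col(x)
  h <- rowSums(q^2)                           # leverages
  sims <- replicate(n_sims, {
    e <- rnorm(n)                             # pure Gaussian noise suffices:
                                              # the statistic is parameter-free
    r <- e - drop(q %*% crossprod(q, e))      # residuals after projection on x
    s <- sqrt(sum(r^2) / (n - p))
    max(abs(r) / (s * sqrt(1 - h)))
  })
  unname(quantile(sims, probs = quant))
}

set.seed(123)
x <- cbind(1, seq_len(30))                    # intercept plus one covariate
thr <- mc_threshold(x, n_sims = 2000, quant = 0.95)
```

The threshold depends on x only, so it can be computed once for a given design and reused across several response vectors by passing it through the threshold argument of devianlm_stats().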

Value

Numeric value.

threshold

The quantile of order quant of the distribution of the maximum of the absolute values of the studentized residuals (depending on the design matrix) is computed via Monte-Carlo simulations (with n_sims simulations).


Salary dataset

Description

A random sample from the 2012 Current Population Survey (CPS). The CPS is the primary source of labor force statistics for the US population.

Usage

salary

Format

A data frame with 599 rows and 10 variables

See Also

Original data are available from https://webapps.ilo.org/surveyLib/index.php/catalog/7379.

The data dictionary is available from https://www2.census.gov/programs-surveys/cps/datasets/2022/march/asec2022_ddl_pub_full.pdf.
