| Type: | Package |
| Title: | Leave One Out Kernel Density Estimates for Outlier Detection |
| Version: | 2.0.0 |
| Maintainer: | Sevvandi Kandanaarachchi <sevvandik@gmail.com> |
| Description: | Outlier detection using leave-one-out kernel density estimates and extreme value theory. The bandwidth for kernel density estimates is computed using persistent homology, a technique in topological data analysis. Using the peak-over-threshold method, a generalized Pareto distribution is fitted to the log of the leave-one-out KDE values to identify outliers. |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| BugReports: | https://github.com/sevvandi/lookout/issues |
| Imports: | evd, ggplot2, RANN, robustbase, stats, TDAstats, tidyr |
| Suggests: | knitr, rmarkdown |
| URL: | https://sevvandi.github.io/lookout/, https://github.com/sevvandi/lookout |
| NeedsCompilation: | no |
| Packaged: | 2026-01-19 01:20:37 UTC; hyndman |
| Author: | Sevvandi Kandanaarachchi [maintainer], Rob Hyndman, Chris Fraley [contributor] |
| Repository: | CRAN |
| Date/Publication: | 2026-01-19 06:50:25 UTC |
lookout: Leave One Out Kernel Density Estimates for Outlier Detection
Description
Outlier detection using leave-one-out kernel density estimates and extreme value theory. The bandwidth for kernel density estimates is computed using persistent homology, a technique in topological data analysis. Using the peak-over-threshold method, a generalized Pareto distribution is fitted to the log of the leave-one-out KDE values to identify outliers.
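As a rough illustration of the leave-one-out idea, the following base R sketch computes ordinary and leave-one-out Gaussian kernel density estimates for a univariate sample. The toy data and the fixed bandwidth `h = 0.5` are assumptions made for illustration only; the package selects its bandwidth via persistent homology, and its internal computation may differ.

```r
set.seed(42)
x <- c(rnorm(200), 8)   # 200 typical points plus one isolated point at 8
h <- 0.5                # assumed bandwidth (not chosen by persistent homology)
n <- length(x)

# n x n matrix of Gaussian kernel contributions K((x_i - x_j) / h) / h
K <- dnorm(outer(x, x, "-") / h) / h

# Ordinary KDE at each observation: every point contributes, including itself
kde <- rowSums(K) / n

# Leave-one-out KDE: drop each point's own contribution before averaging
diag(K) <- 0
lookde <- rowSums(K) / (n - 1)

# The isolated point has by far the smallest leave-one-out density
which.min(lookde)
```

An isolated point props up its own ordinary density estimate, but it gets no such support in the leave-one-out estimate, which is why the leave-one-out values separate outliers more sharply.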
Author(s)
Maintainer: Sevvandi Kandanaarachchi sevvandik@gmail.com (ORCID)
Authors:
Rob Hyndman rob.hyndman@monash.edu (ORCID)
Other contributors:
Chris Fraley fraley@u.washington.edu [contributor]
See Also
Useful links:
Report bugs at https://github.com/sevvandi/lookout/issues
Plots outliers identified by lookout algorithm.
Description
Scatterplot of two columns from the data set with outliers highlighted.
Usage
## S3 method for class 'lookoutliers'
autoplot(object, columns = 1:2, ...)
Arguments
object |
The output of the function lookout(). |
columns |
Which columns of the original data to plot (specified as either numbers or strings) |
... |
Other arguments currently ignored. |
Value
A ggplot object.
Examples
X <- rbind(
data.frame(
x = rnorm(500),
y = rnorm(500)
),
data.frame(
x = rnorm(5, mean = 10, sd = 0.2),
y = rnorm(5, mean = 10, sd = 0.2)
)
)
lo <- lookout(X)
autoplot(lo)
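The `columns` argument matters when the data have more than two variables. The sketch below assumes a three-column data frame (the column names `x`, `y`, `z` are made up for illustration) and selects columns first by position and then by name, as the argument description allows.

```r
library(lookout)

set.seed(2024)
# Hypothetical three-column data set: 500 typical points and 5 shifted points
X3 <- data.frame(
  x = c(rnorm(500), rnorm(5, mean = 10, sd = 0.2)),
  y = c(rnorm(500), rnorm(5, mean = 10, sd = 0.2)),
  z = c(rnorm(500), rnorm(5, mean = 10, sd = 0.2))
)
lo3 <- lookout(X3)

# Plot the first and third columns, selected by position
p_num <- autoplot(lo3, columns = c(1, 3))

# The same pair of columns, selected by name
p_chr <- autoplot(lo3, columns = c("x", "z"))
```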
Plots outlier persistence for a range of significance levels.
Description
This function plots outlier persistence for a range of significance levels using the algorithm lookout, an outlier detection method that uses leave-one-out kernel density estimates and generalized Pareto distributions to find outliers.
Usage
## S3 method for class 'persistingoutliers'
autoplot(object, alpha = object$alpha, ...)
Arguments
object |
The output of the function persisting_outliers(). |
alpha |
The significance levels to plot. |
... |
Other arguments currently ignored. |
Value
A ggplot object.
Examples
X <- rbind(
data.frame(
x = rnorm(500),
y = rnorm(500)
),
data.frame(
x = rnorm(5, mean = 10, sd = 0.2),
y = rnorm(5, mean = 10, sd = 0.2)
)
)
plot(X, pch = 19)
outliers <- persisting_outliers(X, scale = FALSE)
autoplot(outliers)
Identifies bandwidth for outlier detection.
Description
This function identifies the bandwidth that is used in the kernel density estimate computation. The function uses topological data analysis (TDA) to find the bandwidth.
Usage
find_tda_bw(X, fast = TRUE, gamma = 0.97, use_differences = FALSE)
Arguments
X |
The numerical input data in a data.frame, matrix or tibble format. |
fast |
If |
gamma |
Parameter for bandwidth calculation giving the quantile of the
Rips death radii to use for the bandwidth. Default is 0.97. |
use_differences |
If TRUE, the bandwidth is set to the lower point of the maximum Rips death radii differences. If FALSE, the gamma quantile of the Rips death radii is used. Default is FALSE. |
Value
The bandwidth
Examples
X <- rbind(
data.frame(
x = rnorm(500),
y = rnorm(500)
),
data.frame(
x = rnorm(5, mean = 10, sd = 0.2),
y = rnorm(5, mean = 10, sd = 0.2)
)
)
find_tda_bw(X, fast = TRUE)
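The two bandwidth rules described under `gamma` and `use_differences` can be sketched in base R. This is a sketch under a stated assumption: single-linkage merge heights from `hclust()` are used as a stand-in for the 0-dimensional Rips death radii (the two coincide up to the radius/diameter convention). The package itself relies on TDAstats and may rescale or transform the radii, so the numbers need not match `find_tda_bw()`.

```r
set.seed(1)
X <- rbind(
  matrix(rnorm(200), ncol = 2),
  matrix(rnorm(10, mean = 10, sd = 0.2), ncol = 2)
)

# Assumption: single-linkage merge heights stand in for the Rips death radii
death_radii <- sort(hclust(dist(X), method = "single")$height)

# Rule 1 (use_differences = FALSE): the gamma quantile of the death radii
gamma <- 0.97
bw_quantile <- unname(quantile(death_radii, probs = gamma))

# Rule 2 (use_differences = TRUE): the lower end of the largest gap
# between consecutive sorted death radii
gaps <- diff(death_radii)
bw_gap <- death_radii[which.max(gaps)]

c(quantile_rule = bw_quantile, gap_rule = bw_gap)
```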
Identifies outliers using the algorithm lookout.
Description
This function identifies outliers using the algorithm lookout, an outlier detection method that uses leave-one-out kernel density estimates and generalized Pareto distributions to find outliers.
Usage
lookout(
X,
alpha = 0.01,
beta = 0.9,
gamma = 0.97,
bw = NULL,
gpd = NULL,
scale = TRUE,
fast = NROW(X) > 1000,
old_version = FALSE
)
Arguments
X |
The numerical input data in a data.frame, matrix or tibble format. |
alpha |
The level of significance. Default is 0.01. |
beta |
The quantile threshold used in the GPD estimation. Default is 0.9. |
gamma |
Parameter for bandwidth calculation giving the quantile of the
Rips death radii to use for the bandwidth. Default is 0.97. |
bw |
Bandwidth parameter. If NULL (the default), the bandwidth is selected using persistent homology. |
gpd |
Generalized Pareto distribution parameters. If NULL (the default), the parameters are estimated from the data. |
scale |
If |
fast |
If |
old_version |
Logical indicator of which version of the algorithm to use. Default is FALSE, meaning the newer version is used. |
Value
A list with the following components:
outliers |
The set of outliers. |
outlier_probability |
The GPD probability of the data. |
outlier_scores |
The outlier scores of the data. |
bandwidth |
The bandwidth selected using persistent homology. |
kde |
The kernel density estimate values. |
lookde |
The leave-one-out kde values. |
gpd |
The fitted GPD parameters. |
References
Kandanaarachchi, S, and Hyndman, RJ (2022) Leave-one-out kernel density estimates for outlier detection, J Computational & Graphical Statistics, 31(2), 586-599. https://robjhyndman.com/publications/lookout/.
Hyndman, RJ, Kandanaarachchi, S, and Turner, K (2026) When lookout meets crackle: Anomaly detection using kernel density estimation, in preparation. https://robjhyndman.com/publications/lookout2.html
Examples
X <- rbind(
data.frame(
x = rnorm(500),
y = rnorm(500)
),
data.frame(
x = rnorm(5, mean = 10, sd = 0.2),
y = rnorm(5, mean = 10, sd = 0.2)
)
)
lo <- lookout(X)
lo
autoplot(lo)
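The peak-over-threshold step can be sketched with the evd package, which is listed in Imports. The sketch assumes univariate data, a fixed bandwidth `h = 0.5`, and the default `alpha` and `beta` values from Usage. It is meant to show how a GPD fitted to the negative log leave-one-out densities yields tail probabilities, and is not a reproduction of the internals of `lookout()`.

```r
library(evd)  # provides fpot() and pgpd()

set.seed(10)
x <- c(rnorm(500), 7)   # toy data with one isolated point
h <- 0.5                # assumed bandwidth
n <- length(x)

# Leave-one-out Gaussian KDE at each observation
K <- dnorm(outer(x, x, "-") / h) / h
diag(K) <- 0
lookde <- rowSums(K) / (n - 1)

alpha <- 0.01           # significance level
beta <- 0.9             # quantile used as the GPD threshold

# Low density corresponds to a large value of -log(density)
log_dens <- -log(lookde)
threshold <- unname(quantile(log_dens, probs = beta))

# Fit a generalized Pareto distribution to the exceedances over the threshold
fit <- fpot(log_dens, threshold, std.err = FALSE)
gpd_scale <- unname(fit$estimate["scale"])
gpd_shape <- unname(fit$estimate["shape"])

# Upper-tail probability of each observation under the fitted GPD
prob <- pgpd(log_dens, loc = threshold, scale = gpd_scale,
             shape = gpd_shape, lower.tail = FALSE)

# Observations with a tail probability below alpha are flagged
flagged <- which(prob < alpha)
```

A single extreme value can inflate the fitted shape parameter and so raise its own tail probability, so which points end up in `flagged` depends on the data, the bandwidth, and `alpha`.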
Identifies outliers in univariate time series using the algorithm lookout.
Description
This is the time series implementation of lookout, which identifies outliers in the double-differenced time series.
Usage
lookout_ts(x, scale = FALSE, ...)
Arguments
x |
The input univariate time series. |
scale |
If |
... |
Other arguments are passed to lookout(). |
Value
A lookout object.
See Also
lookout()
Examples
set.seed(1)
x <- arima.sim(list(order = c(1, 1, 0), ar = 0.8), n = 200)
x[50] <- x[50] + 10
plot(x)
lo <- lookout_ts(x)
lo
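To see why double differencing helps, the base R sketch below applies `diff()` twice to the same simulated series. An additive spike at position 50 shows up in the double-differenced series as the pattern +10, -20, +10 at positions 48 to 50, so the largest absolute value is expected at position 49. How `lookout_ts()` maps detected positions back to the original time index is not shown here.

```r
set.seed(1)
x <- arima.sim(list(order = c(1, 1, 0), ar = 0.8), n = 200)
x[50] <- x[50] + 10     # additive outlier at position 50

# Second differences: d2[k] = x[k + 2] - 2 * x[k + 1] + x[k]
d2 <- diff(x, differences = 2)

# The series is two observations shorter after double differencing
length(x) - length(d2)

# x[50] enters d2[49] with coefficient -2, giving the largest deviation there
which.max(abs(d2))
```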
Compute robust multivariate scaled data
Description
A multivariate version of base::scale() that takes account
of the covariance matrix of the data, and uses robust estimates
of center, scale and covariance by default. The centers are removed using medians, the
scale is estimated using the robust Qn estimator (robustbase::s_Qn), and the covariance matrix is estimated using a
robust OGK estimate. The data are scaled using the Cholesky decomposition of
the inverse covariance, and the scaled data are returned.
Usage
mvscale(
object,
center = stats::median,
scale = robustbase::s_Qn,
cov = robustbase::covOGK,
warning = TRUE
)
Arguments
object |
A vector, matrix, or data frame containing some numerical data. |
center |
A function to compute the center of each numerical variable. Set to NULL if no centering is required. |
scale |
A function to scale each numerical variable. When
|
cov |
A function to compute the covariance matrix. Set to NULL if no rotation required. |
warning |
Should a warning be issued if non-numeric columns are ignored? |
Details
Optionally, the centering and scaling can be done for each variable
separately, so there is no rotation of the data, by setting cov = NULL.
Non-robust methods can also be used by specifying center = mean,
scale = stats::sd, and cov = stats::cov. Any non-numeric columns are retained
with a warning.
Value
A vector, matrix or data frame of the same size and class as object,
but with numerical variables replaced by scaled versions.
Author(s)
Rob J Hyndman
See Also
base::scale(), stats::sd(), stats::cov(), robustbase::covOGK(), robustbase::s_Qn()
Examples
# Univariate z-scores (no rotation)
z <- mvscale(faithful, center = mean, scale = sd, cov = NULL, warning = FALSE)
# Non-robust scaling with rotation
z <- mvscale(faithful, center = mean, cov = stats::cov, warning = FALSE)
# Robust scaling and rotation
z <- mvscale(faithful, warning = FALSE)
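The rotation step can be illustrated with a non-robust base R sketch: center by the mean, estimate the covariance with `stats::cov()`, and multiply by the Cholesky factor of the inverse covariance. The orientation of the factor used here is an assumption and may differ from what `mvscale()` does internally. The point is that the transformed data have an identity covariance matrix.

```r
X <- as.matrix(faithful)

# Center each column (non-robust: column means instead of medians)
Xc <- sweep(X, 2, colMeans(X))

# Non-robust covariance estimate and the Cholesky factor of its inverse:
# chol() returns an upper triangular U with t(U) %*% U = solve(S)
S <- cov(Xc)
U <- chol(solve(S))

# Rotate and scale so that cov(Z) = U %*% S %*% t(U) = identity
Z <- Xc %*% t(U)

round(cov(Z), 10)
```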
Computes outlier persistence for a range of significance values.
Description
This function computes outlier persistence for a range of significance values, using the algorithm lookout, an outlier detection method that uses leave-one-out kernel density estimates and generalized Pareto distributions to find outliers.
Usage
persisting_outliers(
X,
alpha = seq(0.01, 0.1, by = 0.01),
st_qq = 0.9,
scale = TRUE,
num_steps = 20,
old_version = FALSE
)
Arguments
X |
The input data in a matrix, data.frame, or tibble format. All columns should be numeric. |
alpha |
Grid of significance levels. |
st_qq |
The starting quantile for death radii sequence. This will be used to compute the starting bandwidth value. |
scale |
If |
num_steps |
The length of the bandwidth sequence. |
old_version |
Logical indicator of which version of the algorithm to use. |
Value
A list with the following components:
out |
A 3D array of |
bw |
The set of bandwidth values. |
gpdparas |
The GPD parameters used. |
lookoutbw |
The bandwidth chosen by the algorithm |
Examples
X <- rbind(
data.frame(
x = rnorm(500),
y = rnorm(500)
),
data.frame(
x = rnorm(5, mean = 10, sd = 0.2),
y = rnorm(5, mean = 10, sd = 0.2)
)
)
plot(X, pch = 19)
outliers <- persisting_outliers(X, scale = FALSE)
outliers
autoplot(outliers)
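The notion of persistence can be sketched in base R by checking, for each bandwidth in a grid, whether a point's leave-one-out density is unusually low. Note the substitution: the flagging rule below is a simple density-ratio cutoff (1e-3 times the median), used here as a stand-in for the package's GPD-based test, and the bandwidth grid is hypothetical rather than derived from death radii.

```r
set.seed(7)
x <- c(rnorm(300), 9)                   # one isolated point at 9
bws <- seq(0.2, 1.5, length.out = 20)   # hypothetical grid of 20 bandwidths

flagged <- sapply(bws, function(h) {
  K <- dnorm(outer(x, x, "-") / h) / h
  diag(K) <- 0
  lookde <- rowSums(K) / (length(x) - 1)
  # Stand-in rule (not the package's GPD test): far below the typical density
  lookde < 1e-3 * median(lookde)
})

# Persistence: proportion of bandwidths at which each point is flagged
persistence <- rowMeans(flagged)
persistence[301]
max(persistence[-301])
```

Points that are flagged only at the smallest bandwidths have low persistence, while a genuinely isolated point stays flagged across the whole grid.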
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- ggplot2