Functional Data Analysis in Rust - A high-performance R package for functional data analysis with a Rust backend.
Functional Data Analysis (FDA) is a branch of statistics that deals with data where each observation is a function, curve, or surface rather than a single number or vector.
Traditional statistical methods treat each time point as a separate variable, losing the inherent smoothness and continuity of the data. FDA treats the entire curve as a single observation, enabling more powerful and interpretable analyses.
fdars is a comprehensive toolkit for functional data analysis with a high-performance Rust backend providing 10-200x speedups over pure R implementations.
Measure how “central” or “typical” each curve is:
- Fraiman-Muniz (FM), Band depth (BD), Modified band depth (MBD)
- Modal depth, Random projection (RP, RT, RPD)
- Functional spatial depth (FSD, KFSD)
- Depth-based median, trimmed mean, trimmed variance
Multiple approaches to identify anomalous curves:
- Depth-based trimming and weighting
- Likelihood ratio test (LRT)
- Functional boxplot
- Magnitude-Shape plot (magnitude vs shape outliers)
- Outliergram (MEI vs MBD)
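To make the functional boxplot idea concrete, here is a minimal base R sketch. It is not the fdars implementation: the function names `mbd_sketch` and `fbplot_outliers_sketch` are hypothetical, curves are assumed to share a common grid, and the 1.5 inflation factor is the usual boxplot convention rather than something this README specifies.

```r
# Modified band depth (bands formed by pairs of curves), computed from
# pointwise ranks. X: n x m matrix, rows = curves, columns = grid points.
mbd_sketch <- function(X) {
  n <- nrow(X)
  r <- apply(X, 2, rank)                     # pointwise ranks, n x m
  # Number of curve pairs whose band contains curve i at each grid point
  inside <- (r - 1) * (n - r) + (n - 1)
  rowMeans(inside) / choose(n, 2)
}

# Functional boxplot rule: the envelope of the deepest 50% of curves is
# inflated by `factor` times its width; curves leaving the fences are flagged.
fbplot_outliers_sketch <- function(X, factor = 1.5) {
  d <- mbd_sketch(X)
  central_idx <- order(d, decreasing = TRUE)[seq_len(ceiling(nrow(X) / 2))]
  central <- X[central_idx, , drop = FALSE]
  lower <- apply(central, 2, min)
  upper <- apply(central, 2, max)
  width <- upper - lower
  lo_fence <- lower - factor * width
  hi_fence <- upper + factor * width
  which(apply(X, 1, function(x) any(x < lo_fence | x > hi_fence)))
}

# Toy data: 19 vertically shifted sine curves plus one curve shifted far up
grid <- seq(0, 1, length.out = 100)
shifts <- c(seq(-0.2, 0.2, length.out = 19), 2)
X <- t(sapply(shifts, function(a) a + sin(2 * pi * grid)))
flagged <- fbplot_outliers_sketch(X)         # indices of flagged curves
```

On this toy data the only curve outside the fences is the one shifted by 2, so `flagged` contains just its index. This rule targets magnitude outliers; shape outliers are what the Magnitude-Shape plot and the outliergram are meant for.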
Quantify differences between curves:
- Lp distances (L1, L2, L∞)
- Hausdorff distance
- Dynamic time warping (DTW)
- PCA-based and derivative-based semimetrics
Predict scalar outcomes from functional predictors:
- Principal component regression (fregre.pc)
- Basis expansion regression (fregre.basis)
- Nonparametric kernel regression (fregre.np)
- Cross-validation for model selection
Group similar curves together:
- K-means clustering with K-means++ initialization
- Fuzzy C-means with soft membership
- Automatic selection of optimal k (silhouette, CH, elbow)
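As an illustration of K-means++ initialization, the sketch below seeds centers on discretized curves in base R and hands them to `stats::kmeans`. It is an assumption-laden example, not the fdars code: `kmeanspp_init_sketch` is a hypothetical name, and squared Euclidean distance on a common grid stands in for the squared L2 distance between curves.

```r
# K-means++ seeding: the first center is drawn uniformly, each later center
# with probability proportional to the squared distance to the nearest
# center chosen so far. Returns row indices of the chosen curves.
kmeanspp_init_sketch <- function(X, k) {
  n <- nrow(X)
  centers <- integer(k)
  centers[1] <- sample.int(n, 1)
  d2 <- rowSums(sweep(X, 2, X[centers[1], ])^2)
  for (j in seq_len(k)[-1]) {
    centers[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[centers[j], ])^2))
  }
  centers
}

# Toy data: 10 noisy sine curves followed by 10 noisy cosine curves
set.seed(7)
grid <- seq(0, 1, length.out = 50)
X <- rbind(
  t(replicate(10, sin(2 * pi * grid) + rnorm(50, sd = 0.1))),
  t(replicate(10, cos(2 * pi * grid) + rnorm(50, sd = 0.1)))
)

idx <- kmeanspp_init_sketch(X, k = 2)
fit <- kmeans(X, centers = X[idx, , drop = FALSE])
fit$cluster                                  # cluster label per curve
```

Because already chosen curves have zero distance to themselves, they cannot be drawn again, so the seeds are always distinct.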
Generate synthetic functional data:
- Multiple covariance kernels (Gaussian, Matérn, Exponential, Periodic)
- Kernel composition (addition, multiplication)
- Brownian motion and Ornstein-Uhlenbeck processes
# Install remotes if needed
install.packages("remotes")
# Install fdars (with documentation)
remotes::install_github("sipemu/fdars-r", build_vignettes = TRUE)

Note: On Windows, you may need Rtools installed.
Download the pre-built binary from GitHub Releases:
# macOS
install.packages("path/to/fdars_x.y.z.tgz", repos = NULL, type = "mac.binary")
# Windows
install.packages("path/to/fdars_x.y.z.zip", repos = NULL, type = "win.binary")

# Clone the repository
git clone https://github.com/sipemu/fdars-r.git
cd fdars-r
# Build and install
R CMD build .
R CMD INSTALL fdars_*.tar.gz

library(fdars)
# Create functional data from a matrix (rows = observations, cols = time points)
t <- seq(0, 1, length.out = 100)
X <- matrix(0, 20, 100)
for (i in 1:20) {
X[i, ] <- sin(2 * pi * t) + rnorm(100, sd = 0.1)
}
fd <- fdata(X, argvals = t)
# Compute depth - measures how "central" each curve is
depths <- depth(fd) # default: FM method
depths <- depth(fd, method = "mode") # or specify method
# Find the functional median (most central curve)
median_curve <- median(fd) # default: FM method
# Detect outliers
outliers <- outliers.depth.trim(fd, trim = 0.1)
# Functional regression: predict scalar y from functional X
y <- rowMeans(X) + rnorm(20, sd = 0.1)
model <- fregre.pc(fd, y, ncomp = 3)
predictions <- predict(model, fd)
# Cluster curves into groups
clusters <- cluster.kmeans(fd, ncl = 2)
# Smooth noisy curves
S <- S.NW(t, h = 0.1) # Nadaraya-Watson smoother
smoothed <- S %*% X[1, ]

The fdata class stores functional data as a matrix where rows are observations and columns are evaluation points:
fd <- fdata(data_matrix, argvals = time_points, rangeval = c(0, 1))

You can attach identifiers and metadata (covariates) to functional data objects:
# Create fdata with IDs and metadata
meta <- data.frame(
group = factor(c("control", "treatment", ...)),
age = c(25, 32, ...),
response = c(0.5, 0.8, ...)
)
fd <- fdata(X, id = paste0("patient_", 1:n), metadata = meta)
# Access fields
fd$id # Character vector of identifiers
fd$metadata$group # Access metadata columns
# Subsetting preserves metadata
fd_sub <- fd[1:10, ] # id and metadata are also subsetted
# View metadata info
print(fd) # Shows metadata columns
summary(fd) # Shows metadata types and ranges

Note: If metadata contains an id column or has non-default row names, they must match the fdata identifiers. An error is thrown on mismatch.
Depth measures how “central” or “typical” a curve is relative to a sample. Higher depth = more central.
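For intuition about what a depth value is, the following base R sketch computes a Fraiman-Muniz style depth on a common grid. It is a simplified illustration and not the package's Rust-backed code: `fm_depth_sketch` is a hypothetical name, the empirical CDF is taken as rank / n, and the integral over the domain is approximated by a plain average over grid points.

```r
# Fraiman-Muniz depth: average over the grid of the univariate depth
# 1 - |1/2 - F_n(x_i(t))|, where F_n is the pointwise empirical CDF.
# X: n x m matrix, rows = curves, columns = evaluation points.
fm_depth_sketch <- function(X) {
  n <- nrow(X)
  Fn <- apply(X, 2, rank, ties.method = "max") / n
  rowMeans(1 - abs(0.5 - Fn))
}

set.seed(1)
grid <- seq(0, 1, length.out = 100)
X <- t(replicate(20, sin(2 * pi * grid) + rnorm(100, sd = 0.1)))
X[20, ] <- X[20, ] + 2               # shift one curve far above the rest

d <- fm_depth_sketch(X)
least_central <- which.min(d)        # index of the shifted curve
deepest <- X[which.max(d), ]         # depth-based median: the deepest curve
```

The shifted curve sits above every other curve at every grid point, so its pointwise CDF value is always 1 and its depth is the minimum possible, 0.5. Curves that stay near the middle of the sample get values closer to 1.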
Use the unified depth() function with a
method parameter:
depth(fd, method = "FM") # Fraiman-Muniz depth (default)
depth(fd, method = "BD") # Band depth
depth(fd, method = "MBD") # Modified band depth
depth(fd, method = "mode") # Modal depth (kernel density)
depth(fd, method = "RP") # Random projection depth
depth(fd, method = "RT") # Random Tukey depth
depth(fd, method = "FSD") # Functional spatial depth
depth(fd, method = "KFSD") # Kernel functional spatial depth
depth(fd, method = "RPD") # Random projection with derivatives

Predict a scalar response from functional predictors:
- fregre.pc - Principal component regression
- fregre.basis - Basis expansion regression
- fregre.np - Nonparametric kernel regression

All models support predict() for new data.
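The idea behind principal component regression can be sketched with base R alone: project the discretized curves onto a few principal components and regress the scalar response on the scores. This is an illustration under simplifying assumptions (common grid, no quadrature weights, simulated data) and is not how fregre.pc is implemented internally; all variable names are made up for the example.

```r
set.seed(42)
grid <- seq(0, 1, length.out = 100)
n <- 50

# Simulated curves with random amplitude; the response depends on amplitude
amp <- rnorm(n, mean = 1, sd = 0.3)
X <- t(sapply(amp, function(a) a * sin(2 * pi * grid) + rnorm(100, sd = 0.1)))
y <- 2 * amp + rnorm(n, sd = 0.05)

# Step 1: principal components of the centered curves
ncomp <- 3
pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:ncomp, drop = FALSE]

# Step 2: ordinary least squares on the leading scores
fit <- lm(y ~ scores)

# Prediction: project new curves onto the same components, apply the fit
X_new <- X[1:5, , drop = FALSE]
scores_new <- predict(pca, newdata = X_new)[, 1:ncomp, drop = FALSE]
y_hat <- drop(cbind(1, scores_new) %*% coef(fit))
```

Reusing the training centering and rotation for new curves is the important step: predictions for the first five training curves then agree with the fitted values from `lm`.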
Measure similarity between curves using metric() with a
method parameter:
metric(fd, method = "lp") # Lp distance (default, L2 = Euclidean)
metric(fd, method = "hausdorff") # Hausdorff distance
metric(fd, method = "dtw") # Dynamic time warping
metric(fd, method = "pca") # PCA-based semimetric
metric(fd, method = "deriv") # Derivative-based semimetric

Individual functions are also available: metric.lp, metric.hausdorff, metric.DTW, semimetric.pca, semimetric.deriv.
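As a reference point for what an Lp distance between curves means, here is a small base R sketch that approximates the integral with the trapezoidal rule. The function name `lp_dist_sketch` is hypothetical and the quadrature rule is an assumption; the package may use a different numerical integration scheme.

```r
# Pairwise Lp distances: ( integral |f_i(t) - f_j(t)|^p dt )^(1/p),
# with the integral approximated by the trapezoidal rule on `grid`.
# X: n x m matrix, rows = curves; grid: the m evaluation points.
lp_dist_sketch <- function(X, grid, p = 2) {
  n <- nrow(X)
  w <- diff(grid)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      f <- abs(X[i, ] - X[j, ])^p
      integral <- sum(w * (f[-length(f)] + f[-1]) / 2)
      D[i, j] <- D[j, i] <- integral^(1 / p)
    }
  }
  D
}

# Three simple curves on [0, 1]: f(t) = 0, f(t) = 1, f(t) = t
grid <- seq(0, 1, length.out = 101)
X <- rbind(rep(0, 101), rep(1, 101), grid)
D <- lp_dist_sketch(X, grid, p = 2)
```

For these curves the exact L2 distances are known in closed form: 1 between the two constants and sqrt(1/3) between either constant and the identity, which gives a quick check of the numerical result.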
Identify unusual curves:
- outliers.depth.trim - Trimmed depth-based detection
- outliers.depth.pond - Weighted depth-based detection
- outliers.lrt - Likelihood ratio test
- outliers.boxplot - Functional boxplot-based detection
- magnitudeshape - Magnitude-Shape outlier detection
- outliergram - Outliergram (MEI vs MBD plot)

Both magnitudeshape and outliergram support labeling points by ID or metadata columns:
# Create fdata with IDs and metadata
fd <- fdata(X, id = paste0("patient_", 1:n),
metadata = data.frame(subject_id = paste0("S", 1:n)))
# Outliergram with custom labels
og <- outliergram(fd)
plot(og, label = "id") # Label outliers with patient IDs
plot(og, label = "subject_id") # Label with metadata column
plot(og, label_all = TRUE) # Label ALL points, not just outliers
# magnitudeshape with custom labels
magnitudeshape(fd, label = "id") # Label outliers with patient IDs
magnitudeshape(fd, label = NULL) # No labels

- mean(fd) - Functional mean
- var(fd) - Functional variance
- sd(fd) - Functional standard deviation
- cov(fd) - Functional covariance
- gmed(fd) - Geometric median (L1 median via Weiszfeld algorithm)

Generate synthetic functional data from Gaussian processes with various covariance kernels:
# Smooth samples with Gaussian (squared exponential) kernel
fd_smooth <- make_gaussian_process(n = 20, t = seq(0, 1, length.out = 100),
cov = kernel_gaussian(length_scale = 0.2))
# Rough samples with Matern kernel
fd_rough <- make_gaussian_process(n = 20, t = seq(0, 1, length.out = 100),
cov = kernel_matern(nu = 1.5))
# Periodic samples
fd_periodic <- make_gaussian_process(n = 10, t = seq(0, 2, length.out = 200),
cov = kernel_periodic(period = 0.5))
# Combine kernels: signal + noise
cov_total <- kernel_add(kernel_gaussian(variance = 1), kernel_whitenoise(variance = 0.1))
fd_noisy <- make_gaussian_process(n = 10, t = seq(0, 1, length.out = 100), cov = cov_total)

Available covariance functions:
- kernel_gaussian - Squared exponential (RBF) kernel, infinitely smooth
- kernel_exponential - Exponential kernel (Matérn ν=0.5), rough
- kernel_matern - Matérn family with smoothness parameter ν
- kernel_brownian - Brownian motion covariance (1D only)
- kernel_linear - Linear kernel
- kernel_polynomial - Polynomial kernel
- kernel_whitenoise - Independent noise at each point
- kernel_periodic - Periodic kernel (1D only)
- kernel_add - Combine kernels by addition
- kernel_mult - Combine kernels by multiplication
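To show what sampling from a Gaussian process with a composed kernel involves, the sketch below builds a squared exponential covariance plus white noise and draws paths through a Cholesky factor, using base R only. It is independent of the package: `sq_exp_kernel` is a hypothetical helper, and its `variance` and `length_scale` arguments only mirror the parameter names used in the examples above.

```r
# Squared exponential covariance between two sets of points
sq_exp_kernel <- function(s, t, variance = 1, length_scale = 0.2) {
  variance * exp(-outer(s, t, "-")^2 / (2 * length_scale^2))
}

set.seed(123)
grid <- seq(0, 1, length.out = 100)

# Kernel addition corresponds to summing covariance matrices:
# a smooth signal plus independent noise at each grid point
K_signal <- sq_exp_kernel(grid, grid, variance = 1, length_scale = 0.2)
K_noise <- diag(0.1, length(grid))
K <- K_signal + K_noise

# If K = t(R) %*% R, then each row of Z %*% R is a draw from N(0, K)
R <- chol(K)
n <- 10
Z <- matrix(rnorm(n * length(grid)), nrow = n)
samples <- Z %*% R                   # n x 100 matrix, one sample path per row
```

The white noise term also keeps the covariance matrix well conditioned. A pure squared exponential matrix on a fine grid is often numerically singular, in which case `chol` fails unless a small diagonal jitter is added.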
Use the unified functions with a method parameter:
# Median (curve with maximum depth)
median(fd) # default: FM method
median(fd, method = "mode") # modal depth-based median
# Trimmed mean (mean of deepest curves)
trimmed(fd, trim = 0.1) # default: FM method
trimmed(fd, trim = 0.1, method = "RP") # RP depth-based trimmed mean
# Trimmed variance
trimvar(fd, trim = 0.1) # default: FM method
trimvar(fd, trim = 0.1, method = "mode")

- plot(fd, color = ...) - Plot curves with coloring by numeric or categorical variables
  - show.mean = TRUE - Overlay group mean curves
  - show.ci = TRUE - Show confidence interval ribbons per group
- boxplot.fdata - Functional boxplot with depth-based envelopes
- magnitudeshape - Magnitude-Shape outlier detection and visualization
- outliergram - Outliergram for shape outlier detection (MEI vs MBD plot)
- plot.fdata2pc - FPCA visualization (components, variance, scores)
- group.distance - Compute distances between groups (centroid, Hausdorff, depth-based)
- group.test - Permutation test for significant group differences
- plot.group.distance - Visualize group distances (heatmap, dendrogram)
- cluster.kmeans - K-means clustering for functional data
- cluster.optim - Optimal k selection using silhouette, CH, or elbow
- cluster.fcm - Fuzzy C-means clustering with soft membership
- cluster.init - K-means++ center initialization
- register.fd - Shift registration using cross-correlation
- localavg.fdata - Extract local average features from curves

fdars supports 2D functional data (surfaces/images). The following functions have full 2D support:
| Category | Functions |
|---|---|
| Depth | depth (methods: FM, mode, RP, RT, FSD, KFSD) |
| Distance | metric.lp, metric.hausdorff, semimetric.pca, semimetric.deriv |
| Statistics | mean, var, sd, cov, gmed, deriv |
| Centrality | median, trimmed, trimvar (all methods except BD, MBD, RPD) |
| Regression | fregre.np (nonparametric) |
| Visualization | plot (heatmap + contours) |
Note: Band depths (BD, MBD), RPD, and DTW do not support 2D data.
# Create 2D functional data (e.g., 10 surfaces on a 20x30 grid)
n <- 10
m1 <- 20
m2 <- 30
s <- seq(0, 1, length.out = m1)
t <- seq(0, 1, length.out = m2)
# Generate surfaces: f(s,t) = sin(2*pi*s) * cos(2*pi*t) + noise
X <- array(0, dim = c(n, m1, m2))
for (i in 1:n) {
for (si in 1:m1) {
for (ti in 1:m2) {
X[i, si, ti] <- sin(2*pi*s[si]) * cos(2*pi*t[ti]) + rnorm(1, sd = 0.1)
}
}
}
fd2d <- fdata(X, argvals = list(s, t), fdata2d = TRUE)
# All these work with 2D data:
mean_surface <- mean(fd2d) # Mean surface
var_surface <- var(fd2d) # Pointwise variance
depths <- depth(fd2d) # Depth values
median_surface <- median(fd2d) # Depth-based median
gmed_surface <- gmed(fd2d) # Geometric median
# Plot 2D data (heatmap + contours)
plot(fd2d)

Use df_to_fdata2d() to convert long-format data frames to 2D functional data:
# DataFrame structure: id column, s-index column, t-value columns
df <- data.frame(
id = rep(c("surf1", "surf2"), each = 5),
s = rep(1:5, 2),
t1 = rnorm(10), t2 = rnorm(10), t3 = rnorm(10)
)
# Convert to 2D fdata
fd2d <- df_to_fdata2d(df, id_col = 1, s_col = 2)
# With metadata (must have one row per surface)
meta <- data.frame(group = c("A", "B"), value = c(1.5, 2.3))
fd2d <- df_to_fdata2d(df, id_col = 1, s_col = 2, metadata = meta)

Wine Quality Analysis with Andrews Curves: A comprehensive walkthrough using the UCI Wine dataset (178 wines, 13 chemicals, 3 cultivars) demonstrating outlier detection, clustering, hypothesis testing, FPCA, and process monitoring. Render with quarto render examples/medium-andrews-wine.qmd.

Predictive Truck Maintenance with Andrews Curves: Applying the full FDA pipeline to the Scania APS Failure dataset (76,000 trucks, 170 anonymized sensors, binary failure classification) for fleet health monitoring, outlier triage, and sensor-level diagnostics. Render with quarto render examples/scania-aps-failure.qmd.
License: MIT
Author: Simon Mueller