Main Functions
The package provides four main estimation functions:
adapdiscom() - Main adaptive DISCOM method
discom() - Original DISCOM method
fast_adapdiscom() - Fast version of adaptive DISCOM
fast_discom() - Fast version of original DISCOM
The AdapDiscom
package provides implementations of
AdapDISCOM, a novel adaptive direct sparse regression
method for high-dimensional multimodal data with block-wise missingness
and measurement errors. Building on the DISCOM framework, AdapDISCOM
introduces modality-specific weighting schemes to account for
heterogeneity in data structures and error magnitudes across modalities.
Unlike existing methods that are limited to a fixed number of blocks,
AdapDISCOM supports any number of blocks K, making it
highly flexible for diverse multimodal data applications. The package
includes robust variants (AdapDISCOM-Huber) and computationally
efficient versions (Fast-AdapDISCOM) to handle heavy-tailed
distributions and large-scale datasets. AdapDISCOM has been shown to
consistently outperform existing methods such as DISCOM, SCOM, and
imputation-based approaches, particularly under heterogeneous contamination
scenarios, providing a scalable framework for realistic multimodal data
analysis with missing data and measurement errors.
The package handles different covariance structures:
# Load the package
library(AdapDiscom)
# Number of variables
p <- 10
# AR(1) structure
Sigma_ar1 <- generate.cov(p, example = 1)
print(Sigma_ar1[1:3, 1:3])
# Block diagonal structure
Sigma_block <- generate.cov(p, example = 2)
print(Sigma_block[1:3, 1:3])
# Kronecker product structure
Sigma_kron <- generate.cov(p, example = 3)
print(dim(Sigma_kron))
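For intuition, the AR(1) structure in `example = 1` has entries \(\Sigma_{ij} = \rho^{|i-j|}\). A minimal stand-alone construction is sketched below; \(\rho = 0.5\) is an illustrative assumption, and the package's `generate.cov()` may use a different value internally.

```r
# Stand-alone AR(1) covariance matrix: Sigma[i, j] = rho^|i - j|
# (rho = 0.5 is an illustrative choice, not necessarily the value
#  used internally by generate.cov(p, example = 1))
ar1_cov <- function(p, rho = 0.5) {
  idx <- seq_len(p)
  rho^abs(outer(idx, idx, "-"))
}
S <- ar1_cov(5)
S[1, 1:3]  # 1.00 0.50 0.25
```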
This simulation is a simple replication of scenario 2 from the AdapDISCOM paper. It is a case study with 300 variables distributed across 3 equal blocks (100 variables each), where each block follows a different covariance structure: the first block uses an AR(1) structure, the second a block-diagonal structure, and the third a Kronecker product structure. The training data (n = 440) exhibits a structured missing-data pattern in which the first quarter of the sample is fully observed and each remaining quarter is completely missing one of the three variable blocks, creating heterogeneity in data availability that reflects real-world situations where different subgroups of subjects may have access to different sets of measurements. The true coefficient vector \(\beta\) follows a sparse pattern with only \(5\%\) of variables having non-zero effects (each equal to \(0.5\)), allowing evaluation of the method’s ability to correctly identify relevant variables in a high-dimensional context with heterogeneous missing data.
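The quarter-by-block availability pattern described above can be encoded compactly (1 = observed, 0 = missing). This matrix simply restates the design for reference; it is not a package object.

```r
# Block availability per training quarter (rows: quarters of n = 440,
# 110 rows each; cols: the three variable blocks). Quarter 1 is fully
# observed; quarters 2-4 each miss one block.
avail <- matrix(c(1, 1, 1,
                  1, 1, 0,
                  1, 0, 1,
                  0, 1, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("Q", 1:4), paste0("block", 1:3)))
avail
```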
# mvrnorm() below requires the MASS package
library(MASS)
set.seed(123)
# Set parameters
p <- 300
n <- 440
n.tuning <- 200
n.test <- 400
p1 <- p%/%3
p2 <- p%/%3
p3 <- p - p1 - p2
pp <- c(p1, p2, p3) # Block sizes
sigma <- 1
# Generate different covariance matrices for each block
cov.mat1 <- generate.cov(p1, 1) # AR(1) structure for block 1
cov.mat2 <- generate.cov(p2, 2) # Block diagonal for block 2
cov.mat3 <- generate.cov(p3, 3) # Kronecker product for block 3
# Generate true beta coefficients
beta1 <- rep(c(rep(0.5, 5), rep(0, 95)), p/100)
beta.true <- beta1
# Generate complete data for all samples
pre.x1 <- mvrnorm(n + n.tuning + n.test, rep(0, p%/%3), cov.mat1)
pre.x2 <- mvrnorm(n + n.tuning + n.test, rep(0, p%/%3), cov.mat2)
pre.x3 <- mvrnorm(n + n.tuning + n.test, rep(0, p%/%3), cov.mat3)
pre.x <- cbind(pre.x1, pre.x2, pre.x3)
pre.ep <- rnorm(n + n.tuning + n.test, 0, sigma)
# Training data
n.com <- n/4
n1 <- n2 <- n3 <- n4 <- n.com
X.train <- pre.x[1:n, ]
ep <- pre.ep[1:n]
y.train <- X.train %*% beta1 + ep
colnames(X.train) <- paste0("X", 1:p)
# Introduce missing data pattern
X.train[(n1 + 1):(n1 + n2), (p1 + p2 + 1):(p1 + p2 + p3)] <- NA
X.train[(n1 + n2 + 1):(n1 + n2 + n3), (p1 + 1):(p1 + p2)] <- NA
X.train[(n1 + n2 + n3 + 1):(n1 + n2 + n3 + n4), (1:p1)] <- NA
# Tuning data
X.tuning <- pre.x[(n + 1):(n + n.tuning), ]
ep.tuning <- pre.ep[(n + 1):(n + n.tuning)]
y.tuning <- X.tuning %*% beta1 + ep.tuning
colnames(X.tuning) <- paste0("X", 1:p)
# Test data
X.test <- pre.x[(n + n.tuning + 1):(n + n.tuning + n.test), ]
ep.test <- pre.ep[(n + n.tuning + 1):(n + n.tuning + n.test)]
y.test <- X.test %*% beta1 + ep.test
colnames(X.test) <- paste0("X", 1:p)
# Run AdapDiscom with default parameters
result <- adapdiscom(
beta = beta.true, # True coefficients (optional, for evaluation)
x = X.train, # Training data
y = y.train, # Training response
x.tuning = X.tuning, # Tuning data
y.tuning = y.tuning, # Tuning response
x.test = X.test, # Test data
y.test = y.test, # Test response
nlambda = 20, # Number of lambda values
nalpha = 10, # Number of alpha values
pp = pp, # Block sizes
robust = 0, # Classical estimation
standardize = TRUE, # Standardize data
itcp = TRUE # Include intercept
)
# View results
print(paste("R-squared:", round(result$R2, 4)))
The functions return a list with the following components:
# Optimal parameters
cat("Optimal lambda:", result$lambda, "\n")
cat("Optimal alpha:", paste(round(result$alpha, 4), collapse = ", "), "\n")
# Performance metrics
cat("Training error:", round(result$train.error, 4), "\n")
cat("Test error:", round(result$test.error, 4), "\n")
cat("R-squared:", round(result$R2, 4), "\n")
# Model selection
cat("Number of selected variables:", result$select, "\n")
# If true beta provided, evaluation metrics
if (!is.null(beta.true)) {
cat("Estimation error:", round(result$est.error, 4), "\n")
cat("False Positive Rate:", round(result$fpr, 4), "\n")
cat("False Negative Rate:", round(result$fnr, 4), "\n")
}
# Estimated coefficients
cat("First 6 estimated coefficients:\n")
print(round(head(result$a1), 4))
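The FPR and FNR above compare the support of the estimated coefficients with the true support. A self-contained sketch of how such rates can be computed is shown below; this is not the package's internal code, and `tol` is an assumed numerical zero-threshold.

```r
# FPR: share of truly-zero coefficients estimated as non-zero
# FNR: share of truly non-zero coefficients estimated as zero
# (tol is an assumed numerical threshold for "zero")
selection_rates <- function(beta_hat, beta_true, tol = 1e-8) {
  sel  <- abs(beta_hat)  > tol
  true <- abs(beta_true) > tol
  c(fpr = mean(sel[!true]), fnr = mean(!sel[true]))
}
selection_rates(c(0.4, 0, 0.1, 0, 0), c(0.5, 0.5, 0, 0, 0))
# fpr = 1/3, fnr = 1/2
```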
# Compare different methods
methods <- c("adapdiscom", "discom", "fast_adapdiscom", "fast_discom")
results <- list()
# AdapDiscom
results$adapdiscom <- result # Already computed above
# DISCOM
results$discom <- discom(
beta = beta.true, x = X.train, y = y.train,
x.tuning = X.tuning, y.tuning = y.tuning,
x.test = X.test, y.test = y.test,
nlambda = 20, nalpha = 10, pp = pp
)
# Fast AdapDiscom
results$fast_adapdiscom <- fast_adapdiscom(
beta = beta.true, x = X.train, y = y.train,
x.tuning = X.tuning, y.tuning = y.tuning,
x.test = X.test, y.test = y.test,
nlambda = 20, nalpha = 10, pp = pp
)
# Fast DISCOM
results$fast_discom <- fast_discom(
beta = beta.true, x = X.train, y = y.train,
x.tuning = X.tuning, y.tuning = y.tuning,
x.test = X.test, y.test = y.test,
nlambda = 20, pp = pp
)
# Compare performance
comparison <- data.frame(
Method = names(results),
Test_Error = round(sapply(results, function(x) x$test.error), 4),
R_Squared = round(sapply(results, function(x) x$R2), 4),
Selected_Vars = sapply(results, function(x) x$select),
Est_Error = round(sapply(results, function(x) x$est.error), 4),
FPR = round(sapply(results, function(x) x$fpr), 4),
FNR = round(sapply(results, function(x) x$fnr), 4),
Time = round(sapply(results, function(x) x$time), 2)
)
print(comparison)
Note: When the missing blocks are of equal size within each modality and are disjoint across modalities (i.e., no observations have missing data in multiple modalities simultaneously), Fast-AdapDISCOM reduces to Fast-DISCOM since the adaptive weighting scheme becomes uniform across all modalities.
The package supports robust estimation using Huber’s M-estimator (set `robust = 1`; `k.value` controls the Huber tuning constant):
# Advanced usage: robust (Huber) estimation with custom parameters
result_advanced <- adapdiscom(
beta = beta.true,
x = X.train, y = y.train,
x.tuning = X.tuning, y.tuning = y.tuning,
x.test = X.test, y.test = y.test,
nlambda = 30, # More lambda values
nalpha = 15, # More alpha values
pp = pp,
robust = 1, # Huber's M-estimator (AdapDISCOM-Huber)
standardize = TRUE,
itcp = TRUE,
lambda.min.ratio = 0.001, # Custom lambda range
k.value = 1.5 # Huber tuning constant
)
Key arguments to keep in mind:
pp - vector of block sizes; must satisfy sum(pp) == p (total number of variables)
standardize = TRUE - standardize the data (recommended)
robust = 1 - use Huber's M-estimator (AdapDISCOM-Huber)
lambda.min.ratio - adjust if needed for a better regularization path
For more details on the methodology, please refer to the original paper, A multimodal high dimensional method accounting for missing data and measurement error heterogeneity.
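For context, `lambda.min.ratio` typically sets the ratio of the smallest to the largest penalty on a geometric grid. The sketch below follows the common glmnet-style convention; it is an assumption about the grid construction, shown only for intuition, and the package's internals may differ.

```r
# Geometric grid of nlambda values from lambda_max down to
# lambda_max * lambda.min.ratio (common convention; an assumption,
# not necessarily the package's exact internal grid)
lambda_grid <- function(lambda_max, nlambda = 20, lambda.min.ratio = 0.001) {
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}
g <- lambda_grid(1)
range(g)  # 0.001 1.000
```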