Department of Biostatistics, Fielding School of Public Health,
University of California, Los Angeles, CA, USA
Department of Computational Biomedicine,
Cedars-Sinai Medical Center, Los Angeles, CA,
USA
The TemporalForest package provides a reproducible method for feature selection in high-dimensional longitudinal data. It combines network analysis, mixed-effects models, and stability selection to identify robust predictors over time. This vignette offers a quick start guide to using the package.
Longitudinal ’omics studies, where subjects are measured repeatedly
over time, present unique challenges for feature selection: high
dimensionality, temporal dependence, and complex correlations. The
TemporalForest algorithm addresses these by creating a
robust, multi-stage pipeline that identifies features which are both
predictive and stable across resamples.
Since the package is not yet on CRAN, you can install the development version from GitHub:
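A minimal sketch of that install step. It assumes the repository path given in the citation URL later in this vignette (SisiShao/TemporalForest) and uses the remotes package as the installer, which is an assumption rather than something the source states. The `interactive()` guard is only there so that sourcing the snippet in a script has no side effects.

```r
# Repository path taken from the citation URL in this vignette.
repo <- "SisiShao/TemporalForest"

# Install only in an interactive session, and only if the package is missing.
if (interactive() && !requireNamespace("TemporalForest", quietly = TRUE)) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")  # assumption: remotes is the installer used
  }
  remotes::install_github(repo)
}
```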
This example walks you through a complete analysis with a small, simulated dataset.
This tiny demo is designed to always return all true signals quickly
(1–3s). We will simulate a dataset with 60 subjects, 2 time points, and
20 potential predictors. We will inject 3 true signals
into the outcome \(Y\), coming from
predictors V1, V2, and V3. To
ensure the example is fast and reliable for CRAN, we will pass a
precomputed dissimilarity matrix to skip Stage 1
(WGCNA/TOM).
set.seed(11) # For reproducibility
n_subjects <- 60; n_timepoints <- 2; p <- 20
# Build X (two time points) with matching colnames
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)
# Long view and IDs
X_long <- do.call(rbind, X)
id <- rep(seq_len(n_subjects), each = n_timepoints)
time <- rep(seq_len(n_timepoints), times = n_subjects)
# Strong signal on V1, V2, V3 + modest subject random effect + small noise
u_subj <- rnorm(n_subjects, 0, 0.7)
eps <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[, "V1"] + 3.5*X_long[, "V2"] + 3.2*X_long[, "V3"] +
  rep(u_subj, each = n_timepoints) + eps
# Lightweight dissimilarity to skip Stage 1 (fast on CRAN)
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))

We call the main function, passing our precomputed
dissimilarity_matrix = A and asking for 3 features.
library(TemporalForest)

# Run TemporalForest with minimal settings for vignette
tf_result <- temporal_forest(
  X = X, Y = Y, id = id, time = time,
  dissimilarity_matrix = A, # skip WGCNA/TOM (Stage 1)
  n_features_to_select = 3,
  n_boot_screen = 4,        # Very low for quick demo
  n_boot_select = 8,        # Very low for quick demo
  keep_fraction_screen = 1, # Permissive screening
  min_module_size = 2,
  alpha_screen = 0.5,       # Permissive screening
  alpha_select = 0.6
)
#> ..cutHeight not given, setting it to 0.951 ===> 99% of the (truncated) height range in dendro.
#> ..done.

Examine the selected features and check if the true predictors were found.
print(tf_result)
#> --- Temporal Forest Results ---
#>
#> Top 3 feature(s) selected:
#> V1
#> V3
#> V2
#>
#> 5 feature(s) were candidates in the final stage.

# Validate against ground truth
true_predictors <- c("V1", "V2", "V3")
cat("True predictors found:", sum(true_predictors %in% tf_result$top_features),
    "out of", length(true_predictors), "\n")
#> True predictors found: 3 out of 3

The algorithm successfully identified all three true predictors in this high signal-to-noise example.
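When the ground truth is known, as in a simulation, the comparison can be summarized as recall and precision. The sketch below uses a hypothetical helper, `recovery_metrics`, that is not part of TemporalForest; the selected feature names are copied from the printed result above.

```r
# Hypothetical helper (not part of TemporalForest): compare a selection
# against a known set of true predictors.
recovery_metrics <- function(selected, truth) {
  hits <- intersect(selected, truth)
  c(recall    = length(hits) / length(truth),     # share of true signals found
    precision = length(hits) / length(selected))  # share of selections that are true
}

# Feature names copied from the printed result in this vignette.
selected <- c("V1", "V3", "V2")
truth    <- c("V1", "V2", "V3")

metrics <- recovery_metrics(selected, truth)
print(metrics)
```

In a live session, `tf_result$top_features` would take the place of the hard-coded `selected` vector.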
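TemporalForest operates in three stages:
Stage 1 (network construction): a dissimilarity matrix is built across features (WGCNA/TOM) and used to group them into modules. Supplying dissimilarity_matrix skips the WGCNA/TOM computation.
Stage 2 (screening): within each module, trees fitted on bootstrap samples screen the features, and a fraction of each module (keep_fraction_screen) is passed on.
Stage 3 (selection): trees fitted on bootstrap samples of the remaining candidates are used to choose the final n_features_to_select features.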
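The resampling idea behind the screening and selection stages can be illustrated with base R alone. The sketch below is not the package's algorithm: it swaps the tree-based models for a plain absolute-correlation ranking, resamples whole subjects, and counts how often each feature lands in the top k. The names `n_boot`, `top_k`, `counts`, and `freq` are illustrative.

```r
set.seed(1)
n_subjects <- 60; n_timepoints <- 2; p <- 20

# Long-format data: rows grouped by subject, columns V1..V20
id <- rep(seq_len(n_subjects), each = n_timepoints)
X_long <- matrix(rnorm(length(id) * p), ncol = p,
                 dimnames = list(NULL, paste0("V", seq_len(p))))
u_subj <- rnorm(n_subjects, 0, 0.7)  # subject-level random intercept
Y <- 4 * X_long[, "V1"] + 3.5 * X_long[, "V2"] + 3.2 * X_long[, "V3"] +
  u_subj[id] + rnorm(length(id), 0, 0.08)

n_boot <- 50  # number of bootstrap resamples (illustrative)
top_k  <- 3   # features kept per resample (illustrative)
counts <- setNames(integer(p), colnames(X_long))

rows_by_subject <- split(seq_along(id), id)
for (b in seq_len(n_boot)) {
  # Cluster bootstrap: resample subjects and keep all of their rows together
  subj <- sample(seq_len(n_subjects), replace = TRUE)
  rows <- unlist(rows_by_subject[subj], use.names = FALSE)
  # Stand-in importance score: absolute marginal correlation with Y
  score <- abs(cor(X_long[rows, , drop = FALSE], Y[rows]))[, 1]
  keep  <- names(sort(score, decreasing = TRUE))[seq_len(top_k)]
  counts[keep] <- counts[keep] + 1L
}

freq   <- sort(counts / n_boot, decreasing = TRUE)  # selection frequency
stable <- names(freq)[seq_len(top_k)]
print(head(freq, 5))
```

Features that are selected in nearly every resample are the ones a stability-based procedure would report; raising `n_boot` makes the frequencies less noisy at the cost of run time.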
- n_features_to_select: Final number of features to return (default: 10).
- n_boot_screen, n_boot_select: Number of bootstrap samples for the screening and selection stages. Increase for more stable results (defaults: 50, 100).
- keep_fraction_screen: Proportion of features from each module passed to final selection (default: 0.25). Increase if too few features are selected.
- min_module_size: Minimum size for network modules (default: 4).
- alpha_screen, alpha_select: Significance levels for splitting in screening and selection trees (defaults: 0.2, 0.05).

| Symptom | Likely Cause | Solution |
|---|---|---|
| No features selected | Screening too strict | Increase keep_fraction_screen or alpha_screen |
| Too many features selected | Selection too liberal | Decrease keep_fraction_screen or alpha_select |
| Long computation time | Data too large | Reduce bootstrap numbers or pre-filter features |
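The last row suggests pre-filtering features. One possible way to do that is sketched below under two assumptions: variance is used as the filter criterion (the source does not name one), and `prefilter_by_variance` is a hypothetical helper, not a package function. The same column subset is applied to every time-point matrix so that the list stays consistent.

```r
# Hypothetical helper: keep the `k` most variable features across all
# time points, applying the same column subset to every matrix in `X`.
prefilter_by_variance <- function(X, k) {
  stacked <- do.call(rbind, X)        # pool rows from all time points
  v <- apply(stacked, 2, stats::var)  # per-feature variance
  top <- names(sort(v, decreasing = TRUE))[seq_len(min(k, length(v)))]
  keep <- colnames(X[[1]])[colnames(X[[1]]) %in% top]  # preserve column order
  lapply(X, function(m) m[, keep, drop = FALSE])
}

set.seed(42)
p <- 20
X <- replicate(2, matrix(rnorm(60 * p), 60, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", seq_len(p))

# Inflate the variance of three features so the filter has something to find
wide <- c("V4", "V9", "V15")
for (tp in seq_along(X)) X[[tp]][, wide] <- 5 * X[[tp]][, wide]

X_small <- prefilter_by_variance(X, k = 3)
print(colnames(X_small[[1]]))
```

If a precomputed dissimilarity matrix is also supplied, it would need to be subset to the same feature names.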
The package includes checks for proper data formatting. Here’s an example of the error message for inconsistent inputs:
# This will produce a clear error message
mat1 <- matrix(1:4, nrow=2, dimnames=list(NULL, c("A", "B")))
mat2 <- matrix(1:4, nrow=2, dimnames=list(NULL, c("A", "C")))
bad_X <- list(mat1, mat2)
TemporalForest::check_temporal_consistency(bad_X)
#> Error: Inconsistent data format: The column names of the matrix for time point 2 do not match the column names of the first time point.

TemporalForest provides an end-to-end solution for reproducible
feature selection in longitudinal high-dimensional data. For detailed
information on all function parameters and advanced usage, see the
package documentation (?TemporalForest).
To cite TemporalForest in publications, please use:
citation("TemporalForest")
#> To cite package 'TemporalForest' in publications use:
#>
#> Shao S, Moore J, Ramirez C (2025). _TemporalForest: A package for
#> reproducible feature selection in high-dimensional longitudinal
#> data_. R package version 0.1.0,
#> <https://github.com/SisiShao/TemporalForest>.
#>
#> Shao S, Moore J, Ramirez C (2025). "Network-Guided TemporalForest for
#> Feature Selection in High-Dimensional Longitudinal Data." Manuscript
#> submitted for publication.,
#> <https://github.com/SisiShao/TemporalForest>.
#>
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/Los_Angeles
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] TemporalForest_0.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] Rdpack_2.6.4 DBI_1.2.3 gridExtra_2.3
#> [4] rlang_1.1.6 magrittr_2.0.4 matrixStats_1.5.0
#> [7] compiler_4.4.1 RSQLite_2.4.3 png_0.1-8
#> [10] vctrs_0.6.5 stringr_1.5.2 pkgconfig_2.0.3
#> [13] crayon_1.5.3 fastmap_1.2.0 backports_1.5.0
#> [16] XVector_0.44.0 inum_1.0-5 rmarkdown_2.30
#> [19] UCSC.utils_1.0.0 nloptr_2.2.1 preprocessCore_1.66.0
#> [22] bit_4.6.0 xfun_0.53 zlibbioc_1.50.0
#> [25] cachem_1.1.0 flashClust_1.01-2 GenomeInfoDb_1.40.1
#> [28] jsonlite_2.0.0 blob_1.2.4 parallel_4.4.1
#> [31] cluster_2.1.8.1 R6_2.6.1 glmertree_0.2-6
#> [34] bslib_0.9.0 stringi_1.8.7 RColorBrewer_1.1-3
#> [37] boot_1.3-32 rpart_4.1.24 jquerylib_0.1.4
#> [40] Rcpp_1.1.0 iterators_1.0.14 knitr_1.50
#> [43] WGCNA_1.73 base64enc_0.1-3 IRanges_2.38.1
#> [46] Matrix_1.7-4 splines_4.4.1 nnet_7.3-20
#> [49] tidyselect_1.2.1 rstudioapi_0.17.1 yaml_2.3.10
#> [52] partykit_1.2-24 doParallel_1.0.17 codetools_0.2-20
#> [55] lattice_0.22-7 tibble_3.3.0 Biobase_2.64.0
#> [58] KEGGREST_1.44.1 S7_0.2.0 evaluate_1.0.5
#> [61] foreign_0.8-90 survival_3.8-3 Biostrings_2.72.1
#> [64] pillar_1.11.1 checkmate_2.3.3 foreach_1.5.2
#> [67] stats4_4.4.1 reformulas_0.4.1 generics_0.1.4
#> [70] S4Vectors_0.42.1 ggplot2_4.0.0 scales_1.4.0
#> [73] minqa_1.2.8 glue_1.8.0 Hmisc_5.2-4
#> [76] tools_4.4.1 data.table_1.17.8 lme4_1.1-37
#> [79] mvtnorm_1.3-3 fastcluster_1.3.0 grid_4.4.1
#> [82] impute_1.78.0 libcoin_1.0-10 rbibutils_2.3
#> [85] AnnotationDbi_1.66.0 colorspace_2.1-2 nlme_3.1-168
#> [88] GenomeInfoDbData_1.2.12 htmlTable_2.4.3 Formula_1.2-5
#> [91] cli_3.6.5 dplyr_1.1.4 gtable_0.3.6
#> [94] dynamicTreeCut_1.63-1 sass_0.4.10 digest_0.6.37
#> [97] BiocGenerics_0.50.0 htmlwidgets_1.6.4 farver_2.1.2
#> [100] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
#> [103] httr_1.4.7 GO.db_3.19.1 bit64_4.6.0-1
#> [106] MASS_7.3-65