README

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

tlars

Description: It computes the solution path of the Terminating-LARS (T-LARS) algorithm. The T-LARS algorithm appends dummy predictors to the original predictor matrix and terminates the forward-selection process after a pre-defined number of dummy variables has been selected.

J. Machkour, M. Muma, and D. P. Palomar, “The terminating-random experiments selector: Fast high-dimensional variable selection with false discovery rate control,” arXiv preprint arXiv:2110.06048, 2022. (https://doi.org/10.48550/arXiv.2110.06048)

Note: The T-LARS algorithm is a major building block of the T-Rex selector (Paper and R package). The T-Rex selector performs terminated-random experiments (T-Rex) using the T-LARS algorithm and fuses the selected active sets of all random experiments to obtain a final set of selected variables. The T-Rex selector provably controls the false discovery rate (FDR), i.e., the expected fraction of selected false positives among all selected variables, at the user-defined target level while maximizing the number of selected variables.

In the following, we show how to use the package and give you an idea of why terminating the solution path early is a reasonable approach in high-dimensional and sparse variable selection: In many applications, most active variables enter the solution path early!

Installation

install.packages("tlars")
library(tlars)

install.packages("devtools")
devtools::install_github("jasinmachkour/tlars")

library(tlars)
help(package = "tlars")
?tlars
?tlars_model
?tlars_cpp
?plot.Rcpp_tlars_cpp
?print.Rcpp_tlars_cpp
?Gauss_data

citation("tlars")

Quick Start

In the following, we illustrate the basic usage of the ‘tlars’ package to perform variable selection in sparse and high-dimensional regression settings using the T-LARS algorithm.

library(tlars)

# Setup
n <- 75 # Number of observations
p <- 150 # Number of variables
num_act <- 3 # Number of true active variables
beta <- c(rep(1, times = num_act), rep(0, times = p - num_act)) # Coefficient vector
true_actives <- which(beta > 0) # Indices of true active variables
num_dummies <- p # Number of dummy predictors (or dummies)

# Generate Gaussian data
set.seed(123)
X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
y <- X %*% beta + stats::rnorm(n)

set.seed(1234)
dummies <- matrix(stats::rnorm(n * num_dummies), nrow = n, ncol = num_dummies)
XD <- cbind(X, dummies)

mod_tlars <- tlars_model(X = XD, y = y, num_dummies = num_dummies)
#> Created an object of class tlars_cpp... 
#>       The first p = 150 predictors in 'XD' are the original predictors and 
#>       the last num_dummies = 150 predictors are dummies

tlars(model = mod_tlars, T_stop = 3, early_stop = TRUE) # Perform three T-LARS steps on object "mod_tlars"
#> Executing T-LARS step by reference...
#>       Finished T-LARS step(s)... 
#>           - The results are stored in the C++ object 'mod_tlars'.
#>           - New value of T_stop: 3.
#>           - Time elapsed: 0.001 sec.
print(mod_tlars) # Print information about the results of the performed T-LARS steps
#> 'mod_tlars' is a C++ object of class 'tlars_cpp' ... 
#>   - Number of dummies: 150.
#>   - Number of included dummies: 3.
#>   - Selected variables: 2, 3, 1, 148, 130, 89, 82, 12, 16, 132, 147, 123.
plot(mod_tlars) # Plot the terminated solution path

tlars(model = mod_tlars, early_stop = FALSE) # Compute the whole solution path
#> 'T_stop' is ignored. Computing the entire solution path...
#> Executing T-LARS step by reference...
#>       Finished T-LARS step(s). No early stopping! 
#>           - The results are stored in the C++ object 'mod_tlars'.
#>           - Time elapsed: 0.004 sec.
print(mod_tlars) # Print information about the results
#> 'mod_tlars' is a C++ object of class 'tlars_cpp' ... 
#>   - Number of dummies: 150.
#>   - Number of included dummies: 30.
#>   - Selected variables: 2, 3, 1, 148, 130, 89, 82, 12, 16, 132, 147, 123, 122, 65, 47, 107, 79, 54, 39, 62, 46, 116, 86, 102, 81, 129, 24, 41, 26, 83, 20, 28, 72, 63, 60, 94, 135, 6, 27, 7, 66, 98, 112, 111, 74.
plot(mod_tlars) # Plot the whole solution path

Outlook

The T-LARS algorithm is a major building block of the T-Rex selector (Paper and R package). The T-Rex selector performs terminated-random experiments (T-Rex) using the T-LARS algorithm and fuses the selected active sets of all random experiments to obtain a final set of selected variables. The T-Rex selector provably controls the FDR at the user-defined target level while maximizing the number of selected variables. If you are working in genomics, financial engineering, or any other field that requires a fast and FDR-controlling variable/feature selection method for large-scale high-dimensional settings, then this is for you. Check it out!

Documentation

Links

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.