In this vignette, we demonstrate FORD, the forward stepwise variable selection algorithm introduced in A New Measure Of Dependence: Integrated R2. FORD is based on the integrated \(R^2\) dependence measure (irdc) and is designed for variable ranking in both linear and nonlinear multivariate regression settings.
FORD closely follows the structure of FOCI, introduced in A Simple Measure Of Conditional Dependence, but replaces FOCI's core dependence measure with irdc.
Let \(Y\) be the response variable and \(\mathbf{X} = (X_1, \dots, X_p)\) the predictor variables. Given \(n\) i.i.d. samples of \((Y, \mathbf{X})\), FORD proceeds as follows:
1. Select \(j_1 = \arg\max_j \nu_n(Y, X_j)\).
2. If \(\nu_n(Y, X_{j_1}) \leq 0\), return \(\hat{V} = \emptyset\).
3. Iteratively add the feature that gives the maximum increase in irdc: $$ j_{k+1} = \arg\max_{j \notin \{j_1, \ldots, j_k\}} \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}, X_j)) $$
4. Stop at the first \(k\) for which the irdc no longer increases: $$ \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}, X_{j_{k+1}})) \leq \nu_n(Y, (X_{j_1}, \ldots, X_{j_k})) $$
If no such \(k\) exists, select all variables.
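The greedy loop described above can be sketched in a few lines of base R. This is only an illustration of the selection and stopping logic, not the package's implementation: `forward_select()` and `adj_r2()` are hypothetical helpers, and the adjusted \(R^2\) of a linear fit is used as a stand-in for \(\nu_n\) so that the example runs without any extra packages.

```r
# Hypothetical helper: greedy forward selection with a pluggable dependence
# measure dep_fun(Y, Xsub) that returns a single number (larger = stronger).
forward_select <- function(Y, X, dep_fun) {
  selected  <- integer(0)
  current   <- -Inf
  remaining <- seq_len(ncol(X))
  while (length(remaining) > 0) {
    # Score every candidate jointly with the variables selected so far
    scores <- vapply(
      remaining,
      function(j) dep_fun(Y, X[, c(selected, j), drop = FALSE]),
      numeric(1)
    )
    best <- which.max(scores)
    # First step: stop if the best score is <= 0.
    # Later steps: stop if the best score does not exceed the current one.
    threshold <- if (length(selected) == 0) 0 else current
    if (scores[best] <= threshold) break
    selected  <- c(selected, remaining[best])
    current   <- scores[best]
    remaining <- remaining[-best]
  }
  selected  # if the loop never breaks, every variable has been selected
}

# Stand-in measure (NOT irdc): adjusted R^2 of a linear regression
adj_r2 <- function(Y, Xsub) summary(lm(Y ~ Xsub))$adj.r.squared

set.seed(1)
X_demo <- matrix(rnorm(500 * 6), ncol = 6)
Y_demo <- 2 * X_demo[, 1] - X_demo[, 3] + rnorm(500, sd = 0.5)
sel_demo <- forward_select(Y_demo, X_demo, adj_r2)
sel_demo
```

With this toy signal, columns 1 and 3 should be picked first. Because the stand-in measure is linear, it would miss the kind of nonlinear dependence that irdc is meant to capture; swapping in a different `dep_fun` changes the measure without touching the loop.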
Here, \(Y\) depends only on the first 4 features of \(X\) in a nonlinear way.
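# Assumption: foci() comes from the FOCI package and ford() from the FORD package
library(FOCI)
library(FORD)

set.seed(42)
n <- 2000   # number of samples
p <- 100    # number of candidate predictors
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("X", seq_len(p))
# Only X1 to X4 enter the response; the other 96 columns are pure noise
Y <- X[, 1] * X[, 2] + sin(X[, 1] * X[, 3]) + X[, 4]^2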
result_foci_1 <- foci(Y, X, numCores = 1)
result_foci_1
#> $selectedVar
#> index names
#> <num> <char>
#> 1: 4 X4
#> 2: 1 X1
#> 3: 2 X2
#> 4: 3 X3
#>
#> $stepT
#> [1] 0.3356423 0.4027284 0.6226254 0.7619649
#>
#> attr(,"class")
#> [1] "foci"
result_ford_1 <- ford(Y, X, numCores = 1)
result_ford_1
#> $selectedVar
#> index names
#> <num> <char>
#> 1: 4 X4
#> 2: 1 X1
#> 3: 2 X2
#> 4: 3 X3
#>
#> $step_nu
#> [1] 0.3198165 0.4026348 0.6324854 0.7668089
#>
#> attr(,"class")
#> [1] "ford"
We can force both FOCI and FORD to select a specific number of variables instead of using an automatic stopping rule.
result_foci_2 <- foci(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_foci_2
#> $selectedVar
#> index names
#> <num> <char>
#> 1: 4 X4
#> 2: 1 X1
#> 3: 2 X2
#> 4: 3 X3
#> 5: 66 X66
#>
#> $stepT
#> [1] 0.3356423 0.4027284 0.6226254 0.7619649 0.6900384
#>
#> attr(,"class")
#> [1] "foci"
result_ford_2 <- ford(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_ford_2
#> $selectedVar
#> index names
#> <num> <char>
#> 1: 4 X4
#> 2: 1 X1
#> 3: 2 X2
#> 4: 3 X3
#> 5: 31 X31
#>
#> $step_nu
#> [1] 0.3198165 0.4026348 0.6324854 0.7668089 0.6988827
#>
#> attr(,"class")
#> [1] "ford"
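The `step_nu` values printed above also show why the automatic stopping rule ends at four variables: the fifth step lowers the value rather than raising it. A small check, using the printed values copied by hand (the variable names here are illustrative only):

```r
# step_nu values copied from the forced five-variable FORD run above
step_nu <- c(0.3198165, 0.4026348, 0.6324854, 0.7668089, 0.6988827)

# Change in the measure when moving from k to k + 1 selected variables
gains <- diff(step_nu)

# First k at which adding another variable does not increase the measure;
# the automatic rule keeps exactly these k variables
n_keep <- which(gains <= 0)[1]
n_keep
```

Here `n_keep` is 4, matching the four variables returned by the default call.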
FORD provides an interpretable, irdc-based alternative to FOCI for variable selection in regression tasks. It offers a principled forward selection framework that can detect complex nonlinear relationships and be adapted for fixed-size feature subsets.
For further theoretical details, see our paper:
Azadkia and Roudaki (2025), A New Measure Of Dependence: Integrated R2