
Controlled Variable Selection with Fixed-X Knockoffs

2022-08-12

This vignette illustrates the basic usage of the knockoff package with Fixed-X knockoffs. Here we make no assumptions on the distribution of the predictors (which can be considered fixed), but we assume a homoscedastic linear regression model for the response. In this setting, knockoffs control the false discovery rate (FDR) only if they are used with statistics that satisfy the “sufficiency” property. In particular, the default statistic, which is based on the cross-validated lasso, is not valid.
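For reference, the setting and the sufficiency property can be written out as follows. This is a sketch in the notation commonly used for fixed-X knockoffs. The symbols ($\tilde{X}$ for the knockoff matrix, $W$ for the vector of statistics, $f$ for an arbitrary function, $\sigma^2$ for the noise variance) and the Gaussian form of the noise are assumptions introduced here; they do not appear in the vignette itself.

```latex
% Homoscedastic linear model with fixed design X (Gaussian noise assumed)
y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)

% Sufficiency: W depends on the data only through the Gram matrix of the
% augmented design and its inner products with the response
W = f\left( [X \ \tilde{X}]^\top [X \ \tilde{X}], \; [X \ \tilde{X}]^\top y \right)
```

Read this way, a statistic is sufficient if it never looks at individual rows of the data. Cross-validation splits the observations into folds, which is one way to see why a cross-validated statistic falls outside this class.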

For simplicity, we will use synthetic data constructed from a linear model such that the response only depends on a small fraction of the variables.

set.seed(1234)
# Problem parameters
n = 500           # number of observations
p = 100           # number of variables
k = 30            # number of variables with nonzero coefficients
amplitude = 4.5   # signal amplitude (for noise level = 1)

# Generate the variables from a multivariate normal distribution
mu = rep(0,p)
rho = 0.25
Sigma = toeplitz(rho^(0:(p-1)))
X = matrix(rnorm(n*p),n) %*% chol(Sigma)

# Generate the response from a linear model
nonzero = sample(p, k)
beta = amplitude * (1:p %in% nonzero) / sqrt(n)
y.sample = function(X) X %*% beta + rnorm(n)
y = y.sample(X)

First examples

To create fixed-design knockoffs, we call knockoff.filter with the parameter knockoffs equal to create.fixed. Moreover, since not all statistics are valid with fixed-design knockoffs, we use stat.glmnet_lambdasmax instead of the default statistic (which is based on cross-validation).

library(knockoff)
result = knockoff.filter(X, y, knockoffs = create.fixed, statistic = stat.glmnet_lambdasmax)

We can display the results with

print(result)
## Call:
## knockoff.filter(X = X, y = y, knockoffs = create.fixed, statistic = stat.glmnet_lambdasmax)
## 
## Selected variables:
##  [1]  3  5  8 10 14 18 19 21 27 28 30 31 32 33 36 38 39 40 43 44 46 48 49 52 54
## [26] 55 57 64 67 76 80 83 94 96
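The requirement that the statistic be compatible with fixed-design knockoffs can be made concrete with a hand-written statistic. The sketch below defines marginal_stat, a hypothetical helper that is not part of the package, which compares the absolute inner product of each variable with the response against that of its knockoff. It touches the data only through inner products with y, so it is an instance of a statistic with the sufficiency property. The check that follows uses a random matrix as a stand-in for the knockoffs, purely to exercise the function; the final call assumes the knockoff package is installed and that knockoff.filter accepts a user-defined function of (X, X_k, y) as its statistic.

```r
# Hypothetical statistic (not part of the package): W_j = |X_j' y| - |Xk_j' y|.
# It uses X and the knockoffs X_k only through their inner products with y.
marginal_stat = function(X, X_k, y) {
  as.vector(abs(t(X) %*% y) - abs(t(X_k) %*% y))
}

# Small simulated inputs. X_k is only a random stand-in for a knockoff matrix.
set.seed(1)
n = 50
p = 5
X = matrix(rnorm(n * p), n)
X_k = matrix(rnorm(n * p), n)
y = 2 * X[, 1] + rnorm(n)
W = marginal_stat(X, X_k, y)

# Swapping the first variable with its stand-in knockoff flips the sign of W[1]
# and leaves the other entries unchanged.
X_swap = X
Xk_swap = X_k
X_swap[, 1] = X_k[, 1]
Xk_swap[, 1] = X[, 1]
W_swap = marginal_stat(X_swap, Xk_swap, y)

# With the knockoff package available, pass the statistic to the filter.
if (requireNamespace("knockoff", quietly = TRUE)) {
  result_custom = knockoff::knockoff.filter(X, y,
                                            knockoffs = knockoff::create.fixed,
                                            statistic = marginal_stat)
}
```

A statistic this simple is unlikely to match the power of the lasso-based one; it is meant only to show the shape of a valid statistic.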

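The default value for the target false discovery rate (FDR) is 0.1. The FDR is the expected false discovery proportion (FDP), so the FDP of a single run may fall above or below the target. For the selection stored in result, the FDP is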

fdp = function(selected) sum(beta[selected] == 0) / max(1, length(selected))
fdp(result$selected)
## [1] 0.1176471
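Because the guarantee concerns the average FDP rather than any single run, one way to probe it is to keep the design fixed, resample the response several times, and average the resulting FDPs. The sketch below regenerates the data so that it is self-contained and assumes the knockoff package is installed. The number of repetitions n_rep is an arbitrary illustrative choice, and no particular output is claimed.

```r
# Regenerate the simulated problem (same parameters as above)
set.seed(1234)
n = 500; p = 100; k = 30; amplitude = 4.5; rho = 0.25
X = matrix(rnorm(n * p), n) %*% chol(toeplitz(rho^(0:(p - 1))))
nonzero = sample(p, k)
beta = amplitude * (1:p %in% nonzero) / sqrt(n)
y.sample = function(X) X %*% beta + rnorm(n)
fdp = function(selected) sum(beta[selected] == 0) / max(1, length(selected))

n_rep = 10          # illustrative number of repetitions (assumption)
fdp_values = NULL   # stays NULL if the knockoff package is not available

if (requireNamespace("knockoff", quietly = TRUE)) {
  fdp_values = replicate(n_rep, {
    # Fresh noise each time; the design X stays fixed
    y_new = y.sample(X)
    res = knockoff::knockoff.filter(X, y_new,
                                    knockoffs = knockoff::create.fixed,
                                    statistic = knockoff::stat.glmnet_lambdasmax)
    fdp(res$selected)
  })
  # Empirical estimate of the FDR over the repetitions
  print(mean(fdp_values))
}
```

With only a handful of repetitions the average is itself noisy, so it should be read as a rough check rather than a verification of FDR control.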

See also

For basic usage of the knockoff filter, see the introductory vignette. To look inside the knockoff filter, see the advanced vignette.
