This vignette illustrates the basic and advanced usage of knockoff.filter. For simplicity, we will use synthetic data constructed such that the true coefficient vector \(\beta\) has very few nonzero entries.
# Problem parameters
n = 600 # number of observations
p = 200 # number of variables
k = 30 # number of variables with nonzero coefficients
amplitude = 3.5 # signal amplitude (for noise level = 1)
# Problem data
X = matrix(rnorm(n*p), nrow=n, ncol=p)
nonzero = sample(p, k)
beta = amplitude * (1:p %in% nonzero)
y.sample <- function() X %*% beta + rnorm(n)
To begin, we call knockoff.filter with all the default settings.
library(knockoff)
y = y.sample()
result = knockoff.filter(X, y)
print(result)
## Call:
## knockoff.filter(X = X, y = y)
##
## Selected variables:
## [1] 7 13 16 17 18 23 34 43 50 52 53 58 60 61 63 65 67
## [18] 76 85 86 87 88 96 97 101 116 128 131 140 152 165 170 177 179
## [35] 187 194
The false discovery proportion is
fdp <- function(selected) sum(beta[selected] == 0) / max(1, length(selected))
fdp(result$selected)
## [1] 0.1666667
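This is below the default FDR target of 0.20. Note that the FDR is the expected value of the FDP, so the FDP of any single run may fall above or below the target.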
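To make the FDP definition concrete, here is a small hand-checkable sketch. The names beta_toy, selected_toy and fdp_toy are hypothetical and exist only for this illustration; fdp_toy uses the same formula as fdp above but takes the coefficient vector as an argument.

```r
# Toy coefficient vector: variables 1-3 are true signals, variables 4-6 are nulls.
beta_toy <- c(2, 2, 2, 0, 0, 0)

# Same formula as fdp() above, with the coefficients passed in explicitly.
fdp_toy <- function(selected, beta) {
  sum(beta[selected] == 0) / max(1, length(selected))
}

selected_toy <- c(1, 2, 3, 5)       # one null (variable 5) among four selections
fdp_toy(selected_toy, beta_toy)     # 1 false discovery / 4 selections = 0.25
fdp_toy(integer(0), beta_toy)       # empty selection: max(1, 0) avoids 0/0, giving 0
```

The max(1, ...) in the denominator is what makes the FDP well defined when nothing is selected.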
By default, the knockoff filter uses a test statistic based on the lasso. Specifically, it uses the statistic knockoff.stat.lasso_signed_max, which computes \[
W_j = \max(Z_j, \tilde{Z}_j) \cdot \mathrm{sgn}(Z_j - \tilde{Z}_j),
\] where \(Z_j\) and \(\tilde{Z}_j\) are the maximum values of the regularization parameter \(\lambda\) at which the \(j\)th variable and its knockoff, respectively, enter the lasso model.
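As a rough illustration of the formula alone (not of how the package computes the lasso path), the following sketch applies it to made-up entry points. The vectors Z and Z_tilde are hypothetical values chosen by hand.

```r
# Hypothetical entry points: the largest lambda at which each variable (Z)
# and its knockoff (Z_tilde) enters the lasso model.
Z       <- c(3.0, 0.5, 1.2, 0.0)
Z_tilde <- c(1.0, 2.0, 1.2, 0.4)

# Signed-max statistic: the magnitude is the larger of the two entry points,
# and the sign records whether the variable (+) or its knockoff (-) entered first.
W <- pmax(Z, Z_tilde) * sign(Z - Z_tilde)
W   # 3.0 -2.0 0.0 -0.4
```

A large positive \(W_j\) is evidence that variable \(j\) is a true signal; a tie between the variable and its knockoff gives \(W_j = 0\).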
The knockoff package includes several other test statistics, all of which have names prefixed with knockoff.stat. In the next snippet, we use a statistic based on forward selection. We also set a lower target FDR of 0.10.
result = knockoff.filter(X, y, fdr = 0.10, statistic = knockoff.stat.fs)
fdp(result$selected)
## [1] 0.1176471
In addition to using the predefined test statistics, it is also possible to define your own test statistics. To illustrate this possibility, we implement one of the simplest test statistics from the knockoff filter paper, namely \[ W_j = \left|X_j^\top \cdot y\right| - \left|\tilde{X}_j^\top \cdot y\right|. \]
my_knockoff_stat <- function(X, X_ko, y) {
  abs(t(X) %*% y) - abs(t(X_ko) %*% y)
}
result = knockoff.filter(X, y, statistic = my_knockoff_stat)
fdp(result$selected)
## [1] 0.2307692
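When writing a custom statistic, one useful sanity check is the antisymmetry property required by the knockoff filter paper: swapping a variable with its knockoff should flip the sign of the corresponding \(W_j\) and leave the others unchanged. The sketch below checks this for the statistic above using base R only. All names ending in _toy are hypothetical, and X_ko_toy is just a random stand-in matrix, not a valid set of knockoffs.

```r
set.seed(1)
n_toy <- 50
p_toy <- 4
X_toy    <- matrix(rnorm(n_toy * p_toy), n_toy, p_toy)
X_ko_toy <- matrix(rnorm(n_toy * p_toy), n_toy, p_toy)  # stand-in, NOT valid knockoffs
y_toy    <- rnorm(n_toy)

# Same statistic as my_knockoff_stat, written with crossprod() and returned as a vector.
stat_toy <- function(X, X_ko, y) {
  as.vector(abs(crossprod(X, y)) - abs(crossprod(X_ko, y)))
}

W_before <- stat_toy(X_toy, X_ko_toy, y_toy)

# Swap column 2 of the design matrix with column 2 of the stand-in knockoffs.
X_swap    <- X_toy
X_ko_swap <- X_ko_toy
X_swap[, 2]    <- X_ko_toy[, 2]
X_ko_swap[, 2] <- X_toy[, 2]
W_after <- stat_toy(X_swap, X_ko_swap, y_toy)

# Only the sign of the second entry should change.
all.equal(W_after, W_before * c(1, -1, 1, 1))
```

The check only exercises the algebra of the statistic; it says nothing about FDR control, which also depends on the knockoffs being constructed correctly.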
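The function knockoff.filter is a wrapper around several simpler functions that
1. construct the knockoff variables (knockoff.create),
2. compute the test statistics \(W\) (the functions prefixed with knockoff.stat), and
3. compute the threshold for variable selection (knockoff.threshold).
These functions may be called directly if desired. For more information, see the documentation for the individual functions.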
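To give a feel for the thresholding step, here is a minimal base-R sketch of the data-dependent threshold described in the knockoff filter paper. It is an illustration under stated assumptions, not the package's own implementation: the function name threshold_sketch, its offset argument, and the vector W_toy are all hypothetical.

```r
# Sketch of the threshold from the knockoff filter paper:
#   T = min{ t in {|W_j| : W_j != 0} :
#            (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q }
# offset = 0 corresponds to the "knockoff" threshold, offset = 1 to the
# more conservative "knockoff+" threshold.
threshold_sketch <- function(W, q = 0.20, offset = 0) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (thr in candidates) {
    ratio <- (offset + sum(W <= -thr)) / max(1, sum(W >= thr))
    if (ratio <= q) return(thr)
  }
  Inf  # no candidate meets the target, so nothing is selected
}

W_toy <- c(5, 4, -1, 3, 2, -0.5, 6)
T_toy <- threshold_sketch(W_toy, q = 0.25)
T_toy                   # 1
which(W_toy >= T_toy)   # 1 2 4 5 7
```

The negative statistics act as an estimate of the number of false positives among the positive ones: at the candidate 0.5 the ratio is 2/5, which exceeds 0.25, while at the candidate 1 it drops to 1/5, so the threshold is 1.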
Warning. The high-level function knockoff.filter will automatically normalize the columns of the input matrix (unless this behavior is explicitly disabled). However, all other functions in this package assume that the columns of the input matrix have unit Euclidean norm. Please be aware of these conventions.
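If you call the lower-level functions yourself, one way to satisfy the unit-norm assumption is to rescale the columns beforehand. The sketch below uses base R only; normalize_columns is a hypothetical helper written for this example, not a function exported by the package, and it assumes no column is identically zero.

```r
# Scale every column of a matrix to unit Euclidean norm.
# Assumes no column is all zeros (that would divide by zero).
normalize_columns <- function(X) {
  norms <- sqrt(colSums(X^2))
  sweep(X, 2, norms, "/")
}

set.seed(2)
X_raw  <- matrix(rnorm(20 * 3, sd = 5), nrow = 20, ncol = 3)
X_unit <- normalize_columns(X_raw)
colSums(X_unit^2)   # each entry should equal 1 up to floating-point error
```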