This vignette provides an introduction to the R package ILSE, where the function ilse implements the ILSE model: Linear Regression by Iterative Least Square Estimation (ILSE) When Covariates Include Missing Values. The package can be installed from GitHub with the commands:
library(remotes)
remotes::install_github("feiyoung/ILSE")
or installed from CRAN:
install.packages("ILSE")
The package can be loaded with the command:
library(ILSE)
First, we generate the data with homogeneous normal variables.
n <- 100
p <- 6
X <- MASS::mvrnorm(n, rep(0, p), cor.mat(p, rho=0.5))
beta0 <- rep(c(1,-1), times=3)
Y <- -2+ X %*% beta0 + rnorm(n, sd=1)
Then, we fit the linear regression model with ILSE on the data without missing values. Here the response and covariates are passed to ilse as a data.frame.
dat <- data.frame(Y=Y, X=X)
ilse1 <- ilse(Y~., data=dat)
print(ilse1)
Coef(ilse1) # access the coefficients
Fitted.values(ilse1)
Residuals(ilse1)
Check which variables are significant using the bootstrap.
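To illustrate the idea behind this check, the following sketch runs a plain nonparametric bootstrap in base R, with lm standing in for the estimator. It is not ILSE's internal implementation: the Toeplitz correlation matrix (used in place of cor.mat), the number of replicates B, the normal-approximation p-values, and the helper fit_coef are all assumptions made for this example.

```r
set.seed(1)
n <- 100
p <- 6
# Assumption: Toeplitz correlation 0.5^|i-j| as a stand-in for cor.mat(p, rho = 0.5)
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
Y <- drop(-2 + X %*% rep(c(1, -1), times = 3) + rnorm(n, sd = 1))

# Hypothetical helper: refit the model on the rows indexed by idx
fit_coef <- function(idx) {
  Yb <- Y[idx]
  Xb <- X[idx, , drop = FALSE]
  coef(lm(Yb ~ Xb))
}

est <- fit_coef(seq_len(n))          # estimate on the original sample
B <- 200                             # number of bootstrap replicates (assumed)
boot <- replicate(B, fit_coef(sample.int(n, n, replace = TRUE)))  # (p+1) x B matrix

se <- apply(boot, 1, sd)             # bootstrap standard errors
z <- est / se
pval <- 2 * pnorm(-abs(z))           # two-sided p-values, normal approximation
boot_tab <- cbind(Estimate = est, StdErr = se, Z = z, Pvalue = pval)
boot_tab
```

Resampling whole rows keeps each response paired with its covariates, so the spread of the refitted coefficients reflects their sampling variability.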
First, we randomly remove some entries in X.
mis_rate <- 0.3
set.seed(1)
na_id <- sample(1:(n*p), n*p*mis_rate)
Xmis <- X
Xmis[na_id] <- NA
ncomp <- sum(complete.cases(Xmis))
message("Number of complete cases is ", ncomp, '\n')
Second, we use lm to fit the linear regression model on the complete cases only, i.e., complete-case (CC) analysis. No covariate is detected as significant.
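The complete-case fit described above can be sketched in base R as follows. The object names lm1 and s_cc are chosen to match the plotting code later in this vignette; the Toeplitz correlation matrix is an assumption standing in for cor.mat, so the numbers will differ from those of the original run.

```r
set.seed(1)
n <- 100
p <- 6
# Assumption: Toeplitz correlation 0.5^|i-j| as a stand-in for cor.mat(p, rho = 0.5)
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
Y <- drop(-2 + X %*% rep(c(1, -1), times = 3) + rnorm(n, sd = 1))

# Remove 30% of the entries of X at random
Xmis <- X
Xmis[sample(1:(n * p), n * p * 0.3)] <- NA

# lm drops every row containing an NA (default na.action = na.omit),
# so only the complete cases enter the fit
lm1 <- lm(Y ~ Xmis)
s_cc <- summary(lm1)
s_cc$coefficients   # columns: estimate, standard error, t value, p-value
```

With about 30% of entries missing across six covariates, only a small fraction of rows are complete, which is why the CC analysis has little power here.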
Third, we use ILSE to fit the linear regression model based on all data.
Fourth, the bootstrap is applied to evaluate the standard error and p-value of each coefficient estimated by ILSE. We observe four significant coefficients.
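The third and fourth steps might look like the following sketch. It assumes that the ILSE and MASS packages are installed, that ilse accepts a formula whose covariate matrix contains NA entries, and that summary on the fitted object runs the bootstrap and returns a coefficient matrix with p-values in its fourth column. The names ilse2 and s2 are chosen to match the plotting code later in this vignette.

```r
library(ILSE)

set.seed(1)
n <- 100
p <- 6
X <- MASS::mvrnorm(n, rep(0, p), cor.mat(p, rho = 0.5))
beta0 <- rep(c(1, -1), times = 3)
Y <- -2 + X %*% beta0 + rnorm(n, sd = 1)

# Remove 30% of the entries of X at random
Xmis <- X
Xmis[sample(1:(n * p), n * p * 0.3)] <- NA

# Fit ILSE on all rows, including those with missing covariates
ilse2 <- ilse(Y ~ Xmis)
print(ilse2)

# Assumed: summary() bootstraps standard errors and p-values
s2 <- summary(ilse2)
s2
```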
The ILSE package also provides Full Information Maximum Likelihood (FIML) for linear regression through the function fimlreg. We show how to use it to handle the missing data above.
We again use the bootstrap to evaluate the standard error and p-value of each coefficient, this time estimated by FIML. We observe only one significant coefficient.
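A sketch of the FIML fit and its bootstrap summary is given below. It assumes that the ILSE and MASS packages are installed, that fimlreg takes the same formula interface as ilse, and that summary returns a coefficient matrix with p-values in its fourth column. The names fimllm and s_fiml are illustrative, with s_fiml chosen to match the plotting code that follows.

```r
library(ILSE)

set.seed(1)
n <- 100
p <- 6
X <- MASS::mvrnorm(n, rep(0, p), cor.mat(p, rho = 0.5))
beta0 <- rep(c(1, -1), times = 3)
Y <- -2 + X %*% beta0 + rnorm(n, sd = 1)

# Remove 30% of the entries of X at random
Xmis <- X
Xmis[sample(1:(n * p), n * p * 0.3)] <- NA

# Fit the linear regression by full information maximum likelihood
fimllm <- fimlreg(Y ~ Xmis)
print(fimllm)

# Assumed: summary() bootstraps standard errors and p-values
s_fiml <- summary(fimllm)
s_fiml
```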
We visualize the p-values of each method, where the red line marks 0.05 and the blue line marks 0.1 on the y-axis.
pMat <- cbind(CC=s_cc$coefficients[,4], ILSE=s2[,4], FIML=s_fiml[,4])
library(ggplot2)
df1 <- data.frame(Pval = as.vector(pMat[-1, ]),
                  Method = factor(rep(c("CC", "ILSE", "FIML"), each = p)),
                  covariate = factor(rep(paste0("X", 1:p), times = 3)))
ggplot(data = df1, aes(x = covariate, y = Pval, fill = Method)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.5) +
  geom_hline(yintercept = 0.05, color = "red") +
  geom_hline(yintercept = 0.1, color = "blue")
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=Chinese (Simplified)_China.936
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.936
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ILSE_1.1.6
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.28 R6_2.5.1 jsonlite_1.7.2 magrittr_2.0.1
#> [5] evaluate_0.14 rlang_0.4.11 stringi_1.7.5 jquerylib_0.1.4
#> [9] bslib_0.3.1 rmarkdown_2.7 tools_4.0.3 stringr_1.4.0
#> [13] xfun_0.28 yaml_2.2.1 fastmap_1.1.0 compiler_4.0.3
#> [17] htmltools_0.5.2 knitr_1.36 sass_0.4.0