Introduction
This page describes the psfmi package. The package contains functions to apply pooling or backward selection (BS) to logistic or Cox regression prediction models in multiple imputed datasets.
The basic pooling method is Rubin’s Rules. In addition, several methods are available to derive pooled p-values for categorical predictors: the total covariance matrix (D1 method), pooling Chi-square values (D2 method), pooling likelihood ratio statistics (D3 method, of Meng and Rubin) or taking the median of the p-values (median P rule, MPR). Two-way interaction terms between continuous, dichotomous and categorical predictors are allowed during BS. All types of predictors, interaction terms, or a combination of both can also be forced into the model during BS. For very large datasets combined with a large number of imputed datasets, the D1 and D3 methods may be computationally less efficient than the D2 and MPR methods.
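As a rough illustration of the two simplest pooling ideas mentioned above (Rubin’s Rules and the median P rule), the base R sketch below pools a single regression coefficient over five imputed datasets. The function name pool_rubin, the input values and the use of a Wald z-test for the per-dataset p-values are assumptions made for this example; they are not part of psfmi, whose own functions may differ in detail.

```r
# Minimal sketch of Rubin's Rules for one coefficient (hypothetical helper,
# not a psfmi function).
pool_rubin <- function(est, se) {
  m           <- length(est)            # number of imputed datasets
  pooled_est  <- mean(est)              # pooled coefficient
  within_var  <- mean(se^2)             # average within-imputation variance
  between_var <- var(est)               # between-imputation variance
  total_var   <- within_var + (1 + 1 / m) * between_var
  # Large-sample degrees of freedom (Rubin, 1987)
  r  <- (1 + 1 / m) * between_var / within_var
  df <- (m - 1) * (1 + 1 / r)^2
  t_stat <- pooled_est / sqrt(total_var)
  list(estimate = pooled_est,
       se       = sqrt(total_var),
       df       = df,
       p_value  = 2 * pt(-abs(t_stat), df = df))
}

# Made-up estimates and standard errors from m = 5 imputed datasets
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)
se  <- c(0.20, 0.21, 0.19, 0.20, 0.22)

pooled <- pool_rubin(est, se)

# Median P rule: take the median of the per-dataset p-values
# (here Wald z-test p-values, an assumption of this sketch)
p_per_dataset <- 2 * pnorm(-abs(est / se))
p_median      <- median(p_per_dataset)

print(pooled)
print(p_median)
```

The median P rule needs only one p-value per imputed dataset, which is one reason it scales well when the number of imputed datasets is large.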
The package also contains functions to generate apparent model performance measures over imputed datasets, such as ROC/AUC, Nagelkerke R-squared, Hosmer and Lemeshow test values and calibration plots. A wrapper function around Frank Harrell’s validate function is available: bootstrap internal validation is performed in each imputed dataset and the results are pooled. Including BS in the internal validation is optional but recommended. The function mivalext_lr can be used to externally validate prediction models in multiple imputed datasets. It provides the following information about the externally validated model: the pooled ROC/AUC, the (Nagelkerke) R-squared value, the Hosmer and Lemeshow test, the pooled coefficients when the model is freely estimated in the imputed datasets, and the pooled linear predictor (LP) with information about miscalibration in intercept and slope.
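To make "miscalibration in intercept and slope" concrete, the sketch below shows one common way to assess it in base R: compute the linear predictor (LP) of an existing model in each imputed dataset, regress the observed outcome on the LP with logistic regression, and average the resulting intercept and slope. A perfectly calibrated model would give an intercept near 0 and a slope near 1. The data are simulated, the model coefficients b0 and b1 are hypothetical, and each "imputed" dataset is drawn independently for simplicity; this is not a description of how mivalext_lr is implemented.

```r
set.seed(1)

m <- 5      # number of imputed datasets (assumed)
n <- 500    # observations per dataset (assumed)

# Hypothetical coefficients of a previously developed logistic model
b0 <- -1
b1 <- 0.8

# For each dataset: simulate data, compute the LP, refit outcome on LP
cal <- t(sapply(seq_len(m), function(i) {
  x  <- rnorm(n)                              # stand-in predictor
  y  <- rbinom(n, 1, plogis(-0.5 + 0.6 * x))  # simulated external outcome
  lp <- b0 + b1 * x                           # linear predictor of the model
  coef(glm(y ~ lp, family = binomial))        # calibration intercept and slope
}))
colnames(cal) <- c("intercept", "slope")

# Pool the calibration intercept and slope by averaging over datasets
pooled_cal <- colMeans(cal)
print(pooled_cal)
```

Because the simulated outcomes follow a weaker predictor effect than the hypothetical model assumes, the pooled slope should come out below 1, which is the pattern typically read as overfitting of the original model.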
Installing the psfmi package
The package can be installed from GitHub by running the following code in the R console:
install.packages("devtools")
library(devtools)
devtools::install_github("mwheymans/psfmi")
library(psfmi)
Main functions
The main functions that are available in the psfmi package are:
psfmi_lr: pooling and selection of logistic regression models in multiple imputed datasets
psfmi_coxr: pooling and selection of Cox regression models in multiple imputed datasets
miperform_lr: for performance and internal validation of logistic regression models in multiple imputed datasets
mivalext_lr: external validation in multiple imputed datasets