smcfcs was originally developed to create multiple imputations of missing covariate values in regression models. As of 2025, it can also impute unobserved values of factor variables which are ‘coarsened’, based on the developments in van der Burg et al (2025). By coarsened, we mean that for some of the missing values, partial information about the value is known: we know that the value belongs to some subset of the possible values. In this vignette we demonstrate the functionality of smcfcs for imputing such variables.
We illustrate using the dataset ex_coarsening, which is included in the smcfcs package:
library(smcfcs)
summary(ex_coarsening)
#> x xobs z y
#> a :15 Length:100 Min. :-2.08089 Min. :-3.013
#> b :10 Class :character 1st Qu.:-0.73366 1st Qu.:-0.179
#> c :11 Mode :character Median :-0.11602 Median : 1.169
#> NA's:64 Mean :-0.09867 Mean : 1.081
#> 3rd Qu.: 0.61186 3rd Qu.: 2.564
#> Max. : 2.06083 Max. : 4.293
head(ex_coarsening)
#> x xobs z y
#> 1 <NA> a/c -0.5898450 2.3921826
#> 2 a a -1.5314078 -3.0128176
#> 3 c c 1.3189317 3.0480379
#> 4 <NA> a/c -0.3832246 0.5695512
#> 5 <NA> b/c 0.6129756 3.1124292
#> 6 <NA> a/c -0.3664974 -2.4805336
The variable x is a factor variable which has 64 missing values. The variable xobs gives the known information about (some of) the missing values:
table(ex_coarsening$x,ex_coarsening$xobs,useNA = "ifany")
#>
#> NA a a/c b b/c c
#> a 0 15 0 0 0 0
#> b 0 0 0 10 0 0
#> c 0 0 0 0 0 11
#> <NA> 25 0 22 0 17 0
From this we can see that, among the 64 missing values in x, for 22 individuals we know that their value for x was either a or c, as indicated by the string ‘a/c’; for 17 individuals we know that their value for x was either b or c, as indicated by the string ‘b/c’; and for the remaining 25 we have no further information, as indicated by the character string “NA”.
Note: the variable xobs is a character variable, and for rows where x is (plain) missing, xobs takes the character value “NA”, rather than R’s missing value indicator NA. This is important, since if we used the missing value indicator NA, smcfcs would refuse to run, as we have not told it how to impute the missing values in xobs.
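As a minimal sketch of this convention, the following base R code builds a small data frame in the same shape as ex_coarsening. The vector recorded and the data frame toy are hypothetical names introduced only for illustration; they are not part of the smcfcs package.

```r
# Hypothetical raw recording: an exact category, a subset such as "a/c",
# or nothing at all (R's missing value indicator NA)
recorded <- c("a", "a/c", NA, "c", "a/c", "b/c")

# x is observed only where a single category was recorded
levels_x <- c("a", "b", "c")
x <- factor(ifelse(recorded %in% levels_x, recorded, NA), levels = levels_x)

# xobs must contain no real NA values: use the character string "NA" instead
xobs <- ifelse(is.na(recorded), "NA", recorded)

toy <- data.frame(x = x, xobs = xobs, stringsAsFactors = FALSE)

# Confirm that xobs is a character variable with no missing values
is.character(toy$xobs)
anyNA(toy$xobs)
table(toy$x, toy$xobs, useNA = "ifany")
```

The final cross-tabulation has the same layout as the one shown above for ex_coarsening, with the uninformative rows collected under the string “NA”.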
In order to impute the missing values in x using smcfcs we have to define a value for the restrictions argument. For this we must pass a list of length equal to the number of variables in the data frame. For the element in this list corresponding to x we must give a vector of formula-type expressions specifying the possible values for x when xobs equals a/c or b/c. To achieve this we use:
restrictionsX = c("xobs = a/c ~ a + c",
"xobs = b/c ~ b + c")
restrictions = append(list(restrictionsX), as.list(c("", "", "")))
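The same list can also be built by looking up the position of x by name, which may be less error-prone when x is not the first column. This is only a sketch: the vector vars is a hypothetical helper holding the column names of ex_coarsening in order, and the empty strings for the other variables simply mirror the construction above.

```r
# Column names of the data frame, in order (copied from ex_coarsening)
vars <- c("x", "xobs", "z", "y")

# Start with an empty string for every variable, as in the code above
restrictions <- as.list(rep("", length(vars)))

# Fill in the element for x by name rather than by position
restrictions[[match("x", vars)]] <- c("xobs = a/c ~ a + c",
                                      "xobs = b/c ~ b + c")

# One list element per variable in the data frame
length(restrictions) == length(vars)
str(restrictions)
```

With the data loaded, vars could be replaced by names(ex_coarsening).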
We can then impute the missing values accounting for the partial information with:
set.seed(68204812)
imps <- smcfcs(originaldata=ex_coarsening,
smtype="lm",
smformula = "y~z+x",
method = c("mlogit","", "", ""),
restrictions = restrictions
)
To check that smcfcs has correctly used the partial information about the missing values in x, we first inspect the first few rows of the first imputed dataset:
head(imps$impDatasets[[1]])
#> x xobs z y
#> 1 c a/c -0.5898450 2.3921826
#> 2 a a -1.5314078 -3.0128176
#> 3 c c 1.3189317 3.0480379
#> 4 a a/c -0.3832246 0.5695512
#> 5 c b/c 0.6129756 3.1124292
#> 6 a a/c -0.3664974 -2.4805336
This looks fine - when xobs=a/c we have imputed values of either a or c, whereas when xobs=b/c we have imputed values of either b or c. To check properly, we can repeat the earlier cross-tabulation:
table(imps$impDatasets[[1]]$x,imps$impDatasets[[1]]$xobs,useNA = "ifany")
#>
#> NA a a/c b b/c c
#> a 7 15 9 0 0 0
#> b 9 0 0 10 6 0
#> c 9 0 13 0 11 11
This shows that (at least in the first imputed dataset) the imputed values respect the partial information contained in xobs, as desired.
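The cross-tabulation covers one imputed dataset at a time. A programmatic check is sketched below. The helper respects_coarsening is hypothetical (it is not part of smcfcs) and assumes the convention used in this vignette: subsets are written with “/” as the separator, and the string “NA” means no information.

```r
# Returns TRUE if every imputed value is compatible with its xobs entry.
# Assumes "/"-separated subsets (e.g. "a/c") and "NA" for no information.
respects_coarsening <- function(x_imp, xobs) {
  allowed <- strsplit(xobs, "/", fixed = TRUE)
  ok <- mapply(
    function(value, set, raw) raw == "NA" || value %in% set,
    as.character(x_imp), allowed, xobs
  )
  all(ok)
}

# Toy check with hypothetical values
xobs_toy <- c("a/c", "a", "b/c", "NA")
good <- respects_coarsening(factor(c("c", "a", "b", "b")), xobs_toy)
bad  <- respects_coarsening(factor(c("b", "a", "b", "b")), xobs_toy)
good  # all values compatible with xobs_toy
bad   # first value "b" is not in the subset a/c
```

Applied to the imputations above, all(sapply(imps$impDatasets, function(d) respects_coarsening(d$x, d$xobs))) would extend the check from the first imputed dataset to all of them.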
The restrictions argument can also be used for ordered factor variables in the same way.