Warning message with perccalc package

Jorge Cimentada

2019-12-17

While the other vignette shows you how to use perccalc appropriately, there are instances where there are simply too few categories to estimate percentiles properly. Imagine estimating a distribution of percentiles from 1 to 100 with only three ordered categories: it sounds far-fetched.

Let’s load our packages.

library(perccalc)
library(dplyr)
library(ggplot2)

For example, take the survey data on smoking habits.

smoking_data <-
  MASS::survey %>% # you will need to install the MASS package
  as_tibble() %>%
  select(Sex, Smoke, Pulse) %>%
  rename(
    gender = Sex,
    smoke = Smoke,
    pulse_rate = Pulse
  )

The final result is this dataset:

## # A tibble: 237 x 3
##    gender smoke pulse_rate
##    <fct>  <fct>      <int>
##  1 Male   Never         35
##  2 Female Never         40
##  3 Female Never         48
##  4 Male   Never         48
##  5 Female Never         50
##  6 Female Regul         50
##  7 Male   Regul         54
##  8 Male   Never         55
##  9 Male   Never         56
## 10 Male   Never         59
## # … with 227 more rows
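
Before estimating anything, it can help to check how many categories the variable actually has. The following is a minimal base R sketch, assuming the MASS package is installed; it reads the Smoke column directly from MASS::survey rather than from smoking_data so that it runs on its own.

```r
# Smoke is the original column name in MASS::survey
smoke <- MASS::survey$Smoke

n_categories <- nlevels(smoke)   # number of distinct categories
n_categories

levels(smoke)                    # order in which the levels are stored
is.ordered(smoke)                # whether R treats the factor as ordered
table(smoke, useNA = "ifany")    # observations per category, NAs included
```

The stored level order is not necessarily the substantive one, which is why the example in this vignette relevels the variable into an ordered factor before calling perc_diff.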

Note that there are only four categories in the smoke variable. Let’s try to estimate the percentile difference.

smoking_data <-
  smoking_data %>%
  mutate(smoke = factor(smoke,
                        levels = c("Never", "Occas", "Regul", "Heavy"),
                        ordered = TRUE))

perc_diff(smoking_data, smoke, pulse_rate)

## Warning in perc_diff_(data_model = data_model, categorical_var =
## categorical_var, : Too few categories in categorical variable to estimate the
## variance-covariance matrix and standard errors. Proceeding without estimated
## standard errors but perhaps you should increase the number of categories

## difference         se 
##   390.6092         NA

perc_diff still returns the estimated difference, but it warns you that it cannot estimate the standard error, which is reported as NA. The same happens with perc_dist.

perc_dist(smoking_data, smoke, pulse_rate) %>%
  head()

## Warning in perc_dist(smoking_data, smoke, pulse_rate): Too few categories in
## categorical variable to estimate the variance-covariance matrix and standard
## errors. Proceeding without estimated standard errors but perhaps you should
## increase the number of categories

## # A tibble: 6 x 2
##   percentile estimate
##        <int>    <dbl>
## 1          1     24.5
## 2          2     48.4
## 3          3     71.7
## 4          4     94.3
## 5          5    116. 
## 6          6    138.
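
One plausible reading of both warnings is a degrees-of-freedom problem. Assuming perccalc follows Reardon's approach of regressing category means on a cubic polynomial of the categories' percentile midpoints (an assumption about the internals, not something this vignette states), four categories give four data points for four coefficients, leaving nothing from which to estimate a residual variance. The base R sketch below illustrates this with made-up numbers; midpoint4, mean4, midpoint6 and mean6 are hypothetical, and the unweighted lm() call is a stand-in rather than perccalc's actual code.

```r
# Hypothetical data: one row per category (NOT taken from MASS::survey)
midpoint4 <- c(0.125, 0.375, 0.625, 0.875)  # assumed percentile midpoints
mean4     <- c(70, 73, 72, 78)              # assumed mean outcome per category

# Cubic polynomial: intercept + 3 polynomial terms = 4 coefficients
fit4 <- lm(mean4 ~ poly(midpoint4, 3))
df4  <- df.residual(fit4)                   # 4 points - 4 coefficients = 0
se4  <- suppressWarnings(summary(fit4)$coefficients[, "Std. Error"])

df4
all(!is.finite(se4))                        # no usable standard errors

# Same model with six hypothetical categories: 2 residual degrees of freedom
midpoint6 <- c(0.05, 0.20, 0.40, 0.60, 0.80, 0.95)
mean6     <- c(70, 72, 71, 74, 75, 79)

fit6 <- lm(mean6 ~ poly(midpoint6, 3))
df6  <- df.residual(fit6)                   # 6 points - 4 coefficients = 2
se6  <- summary(fit6)$coefficients[, "Std. Error"]

df6
all(is.finite(se6))                         # standard errors can be computed
```

Under that assumption, the advice in the warning to increase the number of categories amounts to giving the model more data points than coefficients.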
