library(tidyverse)   # Data wrangling
library(tidyfit)     # Model fitting

The combination of .cv = "bootstraps" and .return_slices = TRUE in tidyfit::regress or tidyfit::classify makes it very easy to calculate bootstrap confidence intervals for estimated coefficients. As an additional convenience, coef.tidyfit.models can add percentile bootstrap intervals directly. In this short example, I will calculate and compare bootstrap confidence bands for a partial least squares regression (PLSR) and a principal components regression (PCR) using the Boston house price data (MASS::Boston):

data <- MASS::Boston %>% 
  scale %>% 
  as_tibble

tidyfit handles data scaling internally (i.e., PLSR and PCR are always fitted on scaled data). However, scaling the data manually here gives us standardized coefficients, which are easier to visualize and compare.
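
Why scaling yields standardized coefficients can be checked with a small base-R sketch. It uses the built-in mtcars data and ordinary least squares as stand-ins (assumptions, not part of the original example): a regression fitted on scale()d data returns slopes equal to the raw slopes multiplied by sd(x) / sd(y), and an intercept of essentially zero.

```r
# Stand-in data (mtcars) and model (OLS), chosen only for illustration.
raw_fit <- lm(mpg ~ wt + hp, data = mtcars)                       # raw units
std_fit <- lm(mpg ~ wt + hp, data = as.data.frame(scale(mtcars)))  # all columns standardized

# Convert the raw slopes to the standardized scale by hand: beta * sd(x) / sd(y)
sd_x <- sapply(mtcars[, c("wt", "hp")], sd)
manual_std <- coef(raw_fit)[c("wt", "hp")] * sd_x / sd(mtcars$mpg)

# Both columns should agree up to floating-point error
print(cbind(from_scaled_data  = coef(std_fit)[c("wt", "hp")],
            converted_by_hand = manual_std))
```

Because every predictor is measured in standard deviations, the bars in the final plot share a common scale across terms.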

Fit the model

Instead of selecting an optimal number of latent components, I fix it at five (ncomp = 5), which keeps things a little simpler. Note that dropping the ncomp = 5 argument results in the optimal number of components being selected using bootstrap resampling.
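
To make "number of components" concrete, here is a textbook PCR sketch in base R (not tidyfit's internal code): standardize the predictors, keep the first ncomp principal components, regress the response on their scores, and map the component coefficients back to the original predictors. The helper pcr_coefs and the mtcars columns are hypothetical choices for illustration.

```r
# Hypothetical helper: principal components regression on standardized data.
pcr_coefs <- function(X, y, ncomp) {
  X <- scale(X)                                  # standardize predictors
  y <- as.numeric(scale(y))                      # standardize response
  pca <- prcomp(X, center = FALSE, scale. = FALSE)
  V <- pca$rotation[, seq_len(ncomp), drop = FALSE]  # loadings of the kept components
  scores <- X %*% V                              # n x ncomp component scores
  # Least-squares fit of y on the scores (no intercept: everything is centered)
  gamma <- solve(crossprod(scores), crossprod(scores, y))
  drop(V %*% gamma)                              # coefficients on the original predictors
}

X <- as.matrix(mtcars[, c("cyl", "disp", "hp", "drat", "wt", "qsec")])
y <- mtcars$mpg

pcr_coefs(X, y, ncomp = 2)   # a truncated (regularized) fit
```

Keeping all components reproduces the OLS coefficients on standardized data; keeping fewer shrinks the fit toward the dominant directions of the predictors, which is why the choice of ncomp matters.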

model_frame <- data %>% 
  regress(medv ~ ., m("plsr", ncomp = 5), m("pcr", ncomp = 5),
          .cv = "bootstraps", .cv_args = list(times = 100), 
          .return_slices = TRUE)
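
Conceptually, .cv = "bootstraps" with times = 100 and .return_slices = TRUE means the model is refitted on 100 resamples drawn with replacement, and the results of every resample ("slice") are kept. The base-R sketch below mimics that idea under stated assumptions: OLS on mtcars stands in for PLSR/PCR on Boston, and the helper bootstrap_slices and its long-format output (slice, term, estimate) are hypothetical, not tidyfit's actual return structure.

```r
# Hypothetical helper: refit a model on `times` bootstrap resamples and keep
# each slice's coefficients in long format.
bootstrap_slices <- function(data, formula, times = 100, seed = 1) {
  set.seed(seed)                                  # reproducible resamples
  slices <- lapply(seq_len(times), function(i) {
    idx <- sample(nrow(data), replace = TRUE)     # draw rows with replacement
    fit <- lm(formula, data = data[idx, ])
    data.frame(slice = i,
               term = names(coef(fit)),
               estimate = unname(coef(fit)))
  })
  do.call(rbind, slices)                          # stack all slices
}

std_cars <- as.data.frame(scale(mtcars))
slices <- bootstrap_slices(std_cars, mpg ~ wt + hp, times = 100)
head(slices)
```

Each term then has 100 bootstrap estimates, which is exactly the raw material a percentile interval needs.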

By default (.add_bootstrap_interval = FALSE), coef returns the coefficients for each slice; see coef(model_frame). To obtain bootstrap intervals instead, I pass .add_bootstrap_interval = TRUE to coef, together with .bootstrap_alpha = 0.05 for 95% intervals:

estimates <- coef(model_frame, 
                  .add_bootstrap_interval = TRUE, 
                  .bootstrap_alpha = 0.05)
estimates
#> # A tibble: 28 × 4
#> # Groups:   model [2]
#>    model term        estimate model_info      
#>    <chr> <chr>          <dbl> <list>          
#>  1 plsr  (Intercept) -0.00153 <tibble [1 × 2]>
#>  2 plsr  crim        -0.0755  <tibble [1 × 2]>
#>  3 plsr  zn           0.104   <tibble [1 × 2]>
#>  4 plsr  indus       -0.0332  <tibble [1 × 2]>
#>  5 plsr  chas         0.0775  <tibble [1 × 2]>
#>  6 plsr  nox         -0.200   <tibble [1 × 2]>
#>  7 plsr  rm           0.288   <tibble [1 × 2]>
#>  8 plsr  age         -0.0261  <tibble [1 × 2]>
#>  9 plsr  dis         -0.346   <tibble [1 × 2]>
#> 10 plsr  rad          0.171   <tibble [1 × 2]>
#> # … with 18 more rows

The intervals are nested in model_info:

estimates <- estimates %>% 
  unnest(model_info)
estimates
#> # A tibble: 28 × 5
#> # Groups:   model [2]
#>    model term        estimate  .upper  .lower
#>    <chr> <chr>          <dbl>   <dbl>   <dbl>
#>  1 plsr  (Intercept) -0.00153  0.0460 -0.0404
#>  2 plsr  crim        -0.0755   0.0284 -0.136 
#>  3 plsr  zn           0.104    0.191   0.0388
#>  4 plsr  indus       -0.0332   0.0369 -0.112 
#>  5 plsr  chas         0.0775   0.148   0.0133
#>  6 plsr  nox         -0.200   -0.107  -0.306 
#>  7 plsr  rm           0.288    0.430   0.133 
#>  8 plsr  age         -0.0261   0.0800 -0.131 
#>  9 plsr  dis         -0.346   -0.250  -0.450 
#> 10 plsr  rad          0.171    0.243   0.0898
#> # … with 18 more rows
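
A percentile bootstrap interval is simply the alpha/2 and 1 - alpha/2 quantiles of the per-slice estimates of each term. The sketch below computes .lower and .upper that way in base R. It again assumes OLS on mtcars as a stand-in, and it uses quantile()'s default type, which may differ from the exact quantile definition tidyfit applies.

```r
set.seed(42)
std_cars <- as.data.frame(scale(mtcars))

# 100 x 3 matrix: one row of coefficients per bootstrap slice
boot_coefs <- t(replicate(100, {
  idx <- sample(nrow(std_cars), replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = std_cars[idx, ]))
}))

alpha <- 0.05  # 95% interval, matching .bootstrap_alpha = 0.05
intervals <- data.frame(
  term   = colnames(boot_coefs),
  .lower = apply(boot_coefs, 2, quantile, probs = alpha / 2),
  .upper = apply(boot_coefs, 2, quantile, probs = 1 - alpha / 2),
  row.names = NULL
)
intervals
```

Percentile intervals need no normality assumption, but they can be asymmetric around the point estimate, as several rows above show.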

Plot the results

Thus, in a concise workflow, we obtain 95% bootstrap confidence intervals for the coefficients of the PCR and PLS regressions:

estimates %>% 
  ggplot(aes(term, estimate, color = model)) +
  geom_hline(yintercept = 0) +
  geom_errorbar(aes(ymin = .lower, ymax = .upper), position = position_dodge()) +
  theme_bw(8)

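
The horizontal line at zero in the plot makes it easy to see which error bars do not cross zero. The same check can be done numerically. In this sketch the helper excludes_zero is hypothetical, and the bounds are copied from the PLSR rows printed earlier (intercept omitted).

```r
# Hypothetical helper: TRUE when an interval lies entirely above or below zero
excludes_zero <- function(lower, upper) lower > 0 | upper < 0

# PLSR interval bounds taken from the unnested coef() output shown earlier
plsr_ci <- data.frame(
  term   = c("crim", "zn", "indus", "chas", "nox", "rm", "age", "dis", "rad"),
  .lower = c(-0.136, 0.0388, -0.112, 0.0133, -0.306, 0.133, -0.131, -0.450, 0.0898),
  .upper = c(0.0284, 0.191, 0.0369, 0.148, -0.107, 0.430, 0.0800, -0.250, 0.243)
)
plsr_ci$excludes_zero <- excludes_zero(plsr_ci$.lower, plsr_ci$.upper)
plsr_ci
```

Applied to the full estimates table, the same check could be written with dplyr as mutate(estimates, excludes_zero = .lower > 0 | .upper < 0).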