Get started with mixqr

Kailas Venkitasubramanian

mixqr is an extensible framework for finite mixtures of quantile (and expectile) regressions: at its core it finds hidden subgroups in your data and fits a separate quantile regression in each. This page is a five-minute tour of that core; the Tutorial is the full guide, and the Extensions article covers the expectile/M-quantile families, penalized selection, and non-crossing multi-quantile estimation built on the same platform.
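For readers new to quantile regression, the objective behind each component's fit is the check (pinball) loss. The sketch below is plain base R and does not call mixqr; the names check_loss and fit_q are illustrative only. It shows the textbook fact that minimising the check loss over a constant recovers the sample quantile, here the median.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y <- rexp(501)  # skewed toy sample; odd n so the sample median is unique

# Minimise the total check loss over a constant location q
fit_q <- function(tau) {
  optimize(function(q) sum(check_loss(y - q, tau)), interval = range(y))$minimum
}

# The two values should agree up to the optimiser's tolerance
c(check_loss_minimiser = fit_q(0.5), sample_median = median(y))
```

Replacing the constant q with a linear predictor gives an ordinary quantile regression; a mixture fits one such regression per hidden subgroup.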

library(mixqr)

A two-regime example

The engine data (Brinkman 1981) record the equivalence ratio (the richness of the air/fuel mix) against the concentration of nitrogen oxides (NOx) for a test engine. A single regression line fits these data badly because the observations follow two distinct regimes.

fit <- mixqr(equivalence ~ nox, data = engine, tau = 0.5, m = 2,
             variance = "stochEM")
fit
#> Mixture of quantile regressions (mixqr)
#>   engine: ald   tau = 0.5   components m = 2   n = 88
#>   converged: TRUE in 19 iterations
#> 
#> Mixing probabilities (pi):
#>  comp1  comp2 
#> 0.5081 0.4919 
#> 
#> Component coefficients (beta):
#>               comp1  comp2
#> (Intercept)  1.2428 0.5568
#> nox         -0.0835 0.0909
#> 
#> logLik = 113.995   AIC = -213.99   BIC = -196.65
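The information criteria in the last line of the printout can be reproduced by hand from the log-likelihood. The sketch below assumes k = 7 free parameters (four regression coefficients, two component scales, one free mixing probability). That count is inferred from the printed numbers and is not stated on this page.

```r
# Values copied from the printed fit
loglik <- 113.995
n      <- 88

# ASSUMED parameter count (inferred, not documented here):
# 2 x 2 regression coefficients + 2 scale parameters + 1 free mixing probability
k <- 2 * 2 + 2 + 1

aic <- -2 * loglik + 2 * k        # Akaike information criterion
bic <- -2 * loglik + log(n) * k   # Bayesian information criterion

round(c(AIC = aic, BIC = bic), 2)
```

Under that assumption both values agree with the printed AIC and BIC to two decimals.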

mixqr() has jointly (i) split the observations into two groups and (ii) estimated a median regression in each. summary() adds standard errors:

summary(fit)
#> Mixture of quantile regressions (mixqr) -- summary
#>   engine: ald   tau = 0.5   m = 2   n = 88
#> 
#> Component 1  (pi = 0.5081):
#>              Estimate   Std.Err z value Pr(>|z|)    
#> (Intercept)  1.242800  0.012233  101.59   <2e-16 ***
#> nox         -0.083498  0.006812  -12.26   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Component 2  (pi = 0.4919):
#>             Estimate Std.Err z value Pr(>|z|)    
#> (Intercept)  0.55682 0.02401  23.191   <2e-16 ***
#> nox          0.09091 0.01044   8.706   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Missing-information fraction (separability): 0.156
#> Responsibility overlap (0 = separated, 1 = overlapping): 0.129
#> 
#> logLik = 113.995 (ALD working likelihood)   AIC = -213.99   BIC = -196.65
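The z value column in the table is consistent with a Wald statistic, that is, the estimate divided by its standard error. The sketch below recomputes it from the printed numbers and adds normal-approximation 95% intervals. The intervals are an illustration in base R, not output of a mixqr function, and small differences from the printed z values come from rounding in the displayed standard errors.

```r
# Estimates and standard errors copied from the summary table
est <- c("comp1:(Intercept)" = 1.242800, "comp1:nox" = -0.083498,
         "comp2:(Intercept)" = 0.55682,  "comp2:nox" =  0.09091)
se  <- c(0.012233, 0.006812, 0.02401, 0.01044)

z <- est / se                 # Wald statistic
p <- 2 * pnorm(-abs(z))       # two-sided p-value under a standard normal

# Normal-approximation 95% confidence intervals (an assumption, not mixqr output)
ci <- cbind(lower = est - qnorm(0.975) * se,
            upper = est + qnorm(0.975) * se)

round(cbind(z = z, ci), 4)
```

The two nox intervals lie on opposite sides of zero, which matches the opposite slopes of the two regimes.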

A first picture

A little ggplot2 shows the two recovered regimes and their median lines.

library(ggplot2)

# Label each observation with its predicted component
dat <- transform(engine, regime = factor(predict(fit, type = "class")))

# Evaluate each component's fitted median line over the observed range of nox
grid <- data.frame(nox = seq(min(engine$nox), max(engine$nox), length.out = 100))
lines <- do.call(rbind, lapply(1:2, function(j) {
  data.frame(nox = grid$nox,
             equivalence = drop(cbind(1, grid$nox) %*% fit$beta[, j]),
             regime = factor(j))
}))

ggplot(dat, aes(nox, equivalence, colour = regime)) +
  geom_point(size = 2, alpha = 0.8) +
  geom_line(data = lines, linewidth = 1.1) +
  scale_colour_manual(values = c("#1b6ca8", "#e07b39")) +
  labs(x = "Nitrogen oxides (NOx)", y = "Equivalence ratio",
       title = "Two median regimes recovered by mixqr") +
  theme_minimal(base_size = 12)

Engine data coloured by recovered regime with two median regression lines.
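The colouring above relies on a hard classification of each observation. The printed output names an ALD (asymmetric Laplace) working likelihood, and under that likelihood membership probabilities take the usual mixture form: prior weight times component density, normalised across components. The sketch below applies that formula to the printed estimates. The scale values and the two observations are hypothetical, dald is an illustrative helper, and whether predict(fit, type = "class") is computed exactly this way is an assumption.

```r
# Asymmetric Laplace density with location mu, scale sigma and skewness tau
dald <- function(y, mu, sigma, tau) {
  u <- (y - mu) / sigma
  tau * (1 - tau) / sigma * exp(-u * (tau - (u < 0)))
}

tau   <- 0.5
pi_k  <- c(0.5081, 0.4919)                    # mixing probabilities from the printed fit
beta  <- cbind(comp1 = c(1.2428, -0.0835),    # coefficients from the printed fit
               comp2 = c(0.5568,  0.0909))
sigma <- c(0.05, 0.05)                        # HYPOTHETICAL scales, not reported on this page

# Two made-up observations, one near each median line
new <- data.frame(nox = c(1, 3), equivalence = c(1.15, 0.80))

mu   <- cbind(1, new$nox) %*% beta            # fitted medians, one column per component
dens <- sapply(1:2, function(j) {
  pi_k[j] * dald(new$equivalence, mu[, j], sigma[j], tau)
})
resp <- dens / rowSums(dens)                  # posterior membership probabilities
cls  <- apply(resp, 1, which.max)             # hard classification

round(resp, 3)
cls
```

With these hypothetical scales the first point is assigned to component 1 and the second to component 2.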

Where to next

citation("mixqr")

References

Brinkman, N. D. 1981. “Ethanol Fuel – a Single-Cylinder Engine Study of Efficiency and Exhaust Emissions.” SAE Transactions 90: 1410–27.