
Countable Histograms with gf_squareplot()

library(coursekata)

Overview

gf_squareplot() creates histograms where individual data points are visible as stacked unit rectangles. Instead of abstract bars, each observation becomes a countable square, making sample size and distribution shape tangible.

This is particularly useful for teaching statistical concepts like sampling distributions and hypothesis testing, where students benefit from seeing that “n = 47” means 47 actual squares.

Basic Usage

Pass a one-sided formula and a data frame, just as with other gf_* functions:

gf_squareplot(~Thumb, data = Fingers)
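As a quick sanity check of the "one square per observation" idea, the sketch below counts the observations that gf_squareplot() should draw. It uses base R's hist(plot = FALSE) only to tabulate counts. Those default bins are our assumption and may not match the bins gf_squareplot() picks, but the total is the same either way.

```r
library(coursekata)  # attaches gf_squareplot() and the Fingers data used above

# Every non-missing Thumb value is drawn as one square, so the total
# number of squares equals the number of non-missing observations.
n_thumb <- sum(!is.na(Fingers$Thumb))
n_thumb

# hist(plot = FALSE) returns bin counts without drawing anything. Its default
# breaks are not necessarily the bins gf_squareplot() uses, but the counts
# still add up to the same total.
thumb_bins <- hist(Fingers$Thumb, plot = FALSE)
sum(thumb_bins$counts) == n_thumb  # TRUE
```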

Display Modes

The bars parameter controls how the histogram is displayed:

gf_squareplot(~Thumb, data = Fingers, bars = "outline")

Customizing Appearance

You can customize the fill color (fill), the bin width (binwidth), and the x-axis limits (xrange):

gf_squareplot(~Thumb, data = Fingers,
              fill = "coral",
              binwidth = 5,
              xrange = c(30, 90))
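To see how binwidth and xrange interact, the sketch below works out the bin layout by hand with base R's cut(). The bin edges are an assumption: it supposes the bins start at the lower xrange limit and are closed on the left. gf_squareplot() may align its bins differently.

```r
library(coursekata)  # for the Fingers data

# Hypothetical bin layout, assuming bins start at the lower xrange limit:
# xrange = c(30, 90) with binwidth = 5 gives (90 - 30) / 5 = 12 bins.
breaks <- seq(30, 90, by = 5)
n_bins <- length(breaks) - 1
n_bins  # 12

# Column heights under that assumption; right = FALSE makes the bins
# [30, 35), [35, 40), ... Values outside [30, 90) become NA and table() drops them.
table(cut(Fingers$Thumb, breaks = breaks, right = FALSE))
```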

Integer Data

For integer-valued data with a small range, gf_squareplot() automatically selects a binwidth of 1, so each integer gets its own column:

int_data <- data.frame(rolls = sample(1:6, 30, replace = TRUE))
gf_squareplot(~rolls, data = int_data)
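With a binwidth of 1, each column's height is simply the number of times that value occurs. This base R sketch tabulates those counts directly and does not call gf_squareplot(). The seed is our addition, so the counts are reproducible.

```r
# Seed added so the counts are reproducible (the example above is unseeded).
set.seed(1)
int_data <- data.frame(rolls = sample(1:6, 30, replace = TRUE))

# One column per face: the column heights are just the counts of each face.
# levels = 1:6 keeps any face that never came up, with a count of 0.
face_counts <- table(factor(int_data$rolls, levels = 1:6))
face_counts
sum(face_counts)  # 30, one square per roll
```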

Large Samples

When any bin contains more than 75 observations, the function automatically switches to solid bars to keep the display readable. To keep the rectangles countable instead, set auto_subdivide = TRUE, which splits wide bins into sub-columns. The example below uses the default behavior:

large_data <- data.frame(x = rnorm(500, mean = 50, sd = 10))
gf_squareplot(~x, data = large_data)
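The sketch below roughly checks whether a sample crosses the 75-per-bin threshold and then opts into subdivision with auto_subdivide = TRUE. The threshold check relies on base R's default hist() bins, which is an assumption. gf_squareplot() chooses its own binwidth, so treat the result as a guide only. The seed is also our addition.

```r
library(coursekata)

set.seed(2)  # seed added for reproducibility
large_data <- data.frame(x = rnorm(500, mean = 50, sd = 10))

# Approximate check of the 75-per-bin threshold using base R's default bins.
# gf_squareplot() picks its own bins, so this is only a rough guide.
max_bin <- max(hist(large_data$x, plot = FALSE)$counts)
max_bin > 75

# Opt into sub-columns instead of solid bars
gf_squareplot(~x, data = large_data, auto_subdivide = TRUE)
```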

Teaching Features

Mean Line

Show a dashed line at the sample mean:

gf_squareplot(~Thumb, data = Fingers, show_mean = TRUE)
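Computing the mean directly lets you confirm where the dashed line should land. The sketch also adds one extra check of its own: it counts the squares on each side of the line as a rough look at symmetry.

```r
library(coursekata)

# The dashed line sits at the sample mean; na.rm = TRUE guards against
# missing values.
thumb_mean <- mean(Fingers$Thumb, na.rm = TRUE)
thumb_mean

# Number of squares to the left and right of the line
sum(Fingers$Thumb < thumb_mean, na.rm = TRUE)
sum(Fingers$Thumb > thumb_mean, na.rm = TRUE)
```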

DGP Overlay

The show_dgp = TRUE option adds a teaching overlay for hypothesis testing contexts. It shows:

set.seed(42)
samp_dist <- do(100) * b1(Thumb ~ Height, data = sample(Fingers, 30))
gf_squareplot(~b1, data = samp_dist,
              show_dgp = TRUE,
              show_mean = TRUE,
              xrange = c(-0.5, 1.5),
              xbreaks = seq(-0.5, 1.5, by = 0.25))
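Each row of samp_dist is one simulated sample, so each becomes one square in the plot above. The sketch below summarizes that simulated sampling distribution numerically and compares it with the slope from the full Fingers data. It does not reproduce the DGP overlay itself.

```r
library(coursekata)

set.seed(42)
samp_dist <- do(100) * b1(Thumb ~ Height, data = sample(Fingers, 30))

# One row, and therefore one square, per simulated sample
nrow(samp_dist)  # 100

# Center and spread of the simulated sampling distribution of b1
mean(samp_dist$b1)
sd(samp_dist$b1)

# Slope estimated from the full Fingers data, for comparison
b1(Thumb ~ Height, data = Fingers)
```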

Factor Input

When the input is a factor with numeric levels, all levels are displayed on the x-axis even if some have zero counts:

ratings <- factor(sample(1:5, 20, replace = TRUE, prob = c(1, 2, 4, 2, 1)),
                  levels = 1:5)
df <- data.frame(rating = ratings)
gf_squareplot(~rating, data = df)
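The zero-count behavior follows from how R factors keep their declared levels. This base R sketch uses a hypothetical ratings_gap factor in which two levels never occur. It contrasts counting by factor level with counting the underlying numbers.

```r
# A factor whose declared levels include values that never occur
ratings_gap <- factor(c(2, 3, 3, 4, 3, 2), levels = 1:5)

# table() on the factor reports every level, including zero counts for 1 and 5,
# mirroring how those levels still get a spot on the x-axis.
table(ratings_gap)

# Converting to numbers first loses the unused levels: only 2, 3 and 4 remain
table(as.numeric(as.character(ratings_gap)))
```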
