Tutorial: Adaptive versus regular histograms

Joe Song

Updated 2020-11-07

Adaptive histograms can reveal patterns in data more effectively than regular (equal-bin-width) histograms. The “discontinuous” style of adaptive histogram is recommended because it adapts to the data without attempting to assign a non-zero density to every bin. When bins are not of equal width, the vertical axis of the histogram shows density instead of frequency.
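
To see why density replaces frequency on the vertical axis, here is a minimal sketch using only base R's hist(). The toy data and break points are made up for illustration and are not part of the package examples. With unequal bin widths, each bar's height is the bin count divided by (n times the bin width), so the bar areas, not the bar heights, add up to 1.

```r
# Toy data (hypothetical): six observations in three bins of unequal width.
x <- c(0.1, 0.2, 0.3, 0.4, 1.5, 3.0)
breaks <- c(0, 0.5, 2, 4)               # bin widths: 0.5, 1.5, 2

# Compute the histogram without drawing it.
h <- hist(x, breaks = breaks, plot = FALSE)

width <- diff(h$breaks)
density_by_hand <- h$counts / (length(x) * width)

print(h$counts)                          # frequencies per bin
print(h$density)                         # densities reported by hist()
print(all.equal(h$density, density_by_hand))
print(sum(h$density * width))            # total bar area
```

The narrow first bin holds most of the points, so its density is much higher than its share of the horizontal axis would suggest; plotting raw counts over unequal widths would hide that.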

Example 1: Data from a Gaussian mixture model using one bin per cluster

Plot an adaptive histogram from data generated by a Gaussian mixture model with three components:

require("Ckmeans.1d.dp")
x <- c(rnorm(40, mean=-2, sd=0.3),
       rnorm(45, mean=1, sd=0.1),
       rnorm(70, mean=3, sd=0.2))
ahist(x, col="lightblue", sub=paste("n =", length(x)),
      col.stick="darkblue", lwd=2, xlim=c(-4,4),
      main="Example 1. Gaussian mixture model with 3 components\n(one bin per component)\nAdaptive histogram")

When breaks is specified, ahist() calls hist(), the regular histogram function in R:

ahist(x, breaks=3, col="lightgreen", sub=paste("n =", length(x)),
      col.stick="forestgreen", lwd=2,
      main="Example 1. Regular histogram")
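
As a rough check of what "regular" means here, the sketch below calls base R's hist() directly with plot=FALSE and inspects the bins. The seed is an arbitrary assumption added so the output is reproducible. Note that in hist() a single number given as breaks is only a suggestion: the break points are placed at "pretty" values, so the actual number of bins may differ from 3, but the bins always have equal width.

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility
x <- c(rnorm(40, mean=-2, sd=0.3),
       rnorm(45, mean=1, sd=0.1),
       rnorm(70, mean=3, sd=0.2))

# Compute the regular histogram without drawing it.
h <- hist(x, breaks=3, plot=FALSE)

print(h$breaks)        # equally spaced break points chosen by hist()
print(diff(h$breaks))  # every bin has the same width
print(h$counts)        # number of observations falling in each bin
```

Because the break points ignore where the three mixture components sit, a single bin can mix points from different components or split one component across bins.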

Example 2: Data from a Gaussian mixture model using three bins per cluster

Plot an adaptive histogram of the same Gaussian mixture data, this time with a given number of bins (k=9):

ahist(x, k=9, col="lavender", col.stick="navy",
      sub=paste("n =", length(x)), lwd=2,
      main="Example 2. Gaussian mixture model with 3 components\n(on average 3 bins per component)\nAdaptive histogram")

As before, specifying breaks makes ahist() call hist() and draw a regular histogram:

ahist(x, breaks=9, col="lightgreen", col.stick="forestgreen",
      sub=paste("n =", length(x)), lwd=2,
      main="Example 2. Regular histogram")

Example 3: Adaptive histogram of protein DNase

The DNase data frame has 176 rows and 3 columns of data obtained during development of an ELISA assay for the recombinant protein DNase in rat serum:

data(DNase)
res <- Ckmeans.1d.dp(DNase$density)
kopt <- length(res$size)
ahist(res, data=DNase$density, col=rainbow(kopt), col.stick=rainbow(kopt)[res$cluster],
      sub=paste("n =", length(DNase$density)), border="transparent",
      xlab="Optical density of protein DNase",
      main="Example 3. ELISA assay of DNase in rat serum\nAdaptive histogram")

Using the same DNase data, the regular histogram below demonstrates the inadequacy of equal-bin-width histograms: the third bin gives a false sense of the sample distribution.

We can specify breaks="Sturges" in the ahist() function to draw an equal-bin-width histogram. The difference is that ahist() adds sticks to the histogram, whereas the base R hist() function does not.

ahist(DNase$density, breaks="Sturges", col="palegreen",
      add.sticks=TRUE, col.stick="darkgreen",
      main="Example 3. ELISA assay of DNase in rat serum\nRegular histogram (equal bin width)",
      xlab="Optical density of protein DNase")
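
For context on what "Sturges" asks for, the following sketch uses only base R. Sturges' rule suggests ceiling(log2(n) + 1) bins, which base R exposes as nclass.Sturges(). As with any bin count passed to hist(), this is a suggestion: the final break points are rounded to "pretty" values, so the number of bins actually drawn may differ, which is why the sketch prints it rather than assuming it.

```r
data(DNase)
n <- length(DNase$density)

# Suggested number of bins under Sturges' rule.
k_sturges <- nclass.Sturges(DNase$density)
print(n)
print(k_sturges)
print(ceiling(log2(n) + 1))   # the same rule written out by hand

# What hist() actually does with that suggestion.
h <- hist(DNase$density, breaks="Sturges", plot=FALSE)
print(length(h$counts))                   # number of bins actually used
print(unique(round(diff(h$breaks), 8)))   # the single, shared bin width
```

The rule depends only on the sample size n, not on where the observations lie, which is one reason equal-width bins can misrepresent clustered data such as these.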

Example 4: Repetitive data

Cluster data with repetitive elements:

x <- c(1,1,1,1, 3,4,4, 6,6,6)
ahist(x, k=c(2,4), col="gray",
      lwd=2, lwd.stick=6, col.stick="chocolate",
      main="Example 4. Adaptive histogram of repetitive elements")
ahist(x, breaks=3, col="lightgreen",
      lwd=2, lwd.stick=6, col.stick="forestgreen",
      main="Example 4. Regular histogram")
## Warning in cluster.1d.dp(x, k, y, method, estimate.k, "L2", deparse(substitute(x)), : Max number of clusters is greater than the unique number of
## elements in the input vector, and k.max is set to the number of
## unique number of input values.
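
The warning above refers to the number of unique values in the input. A quick base R check of how many distinct values x holds and how often each repeats is sketched below; it is only a diagnostic and makes no claim about how the package chooses its clusters.

```r
x <- c(1,1,1,1, 3,4,4, 6,6,6)

# Number of distinct values, which the warning says caps the cluster count.
n_unique <- length(unique(x))
print(n_unique)

# Multiplicity of each distinct value.
print(table(x))
```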
