The Histogram

Bins

How should the frequencies of the data values be displayed for a continuous variable such as Age, Salary, MPG, Gallons, or Height? In contrast to the relatively few unique values of a categorical variable, a continuous variable has many potential unique values, generally too many to plot each individually on a single visualization. The solution is to form bins, a sequence of adjacent, non-overlapping intervals, each generally of the same width. Each bin groups together data values of similar magnitude.
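The idea of bins can be sketched in a few lines of Python. This is only an illustration of the concept, not how Histogram() computes its bins: the function names bin_edges and bin_index, the sample ages, and the choice of intervals that include their lower edge but not their upper edge are all assumptions made for the example.

```python
import math

def bin_edges(values, bin_width, bin_start=None):
    """Return the edges of equal-width bins that cover all values."""
    if bin_start is None:
        # Assumption: start at the multiple of bin_width at or below the minimum.
        bin_start = math.floor(min(values) / bin_width) * bin_width
    n_bins = math.floor((max(values) - bin_start) / bin_width) + 1
    return [bin_start + i * bin_width for i in range(n_bins + 1)]

def bin_index(value, bin_width, bin_start):
    """Index of the bin [lower, upper) that contains value."""
    return math.floor((value - bin_start) / bin_width)

ages = [23, 27, 31, 34, 35, 38, 41, 42, 47, 58]   # made-up data
edges = bin_edges(ages, bin_width=10)
print(edges)                         # [20, 30, 40, 50, 60]
print(bin_index(35, 10, edges[0]))   # 1, the bin from 30 up to 40
```

Because the intervals are adjacent and non-overlapping, every data value falls into exactly one bin.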

Construct the Histogram

Place each data value for a continuous variable into its corresponding bin. Construct the histogram as a set of adjacent bars, one bar per bin. The height of each bar is proportional to the frequency of the data values in its bin. Adjacent bars of a histogram share a common side, with no gaps between the bars, to indicate the underlying continuity.
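As a rough sketch of the construction just described, the following Python counts the data values in each bin and prints one text bar per bin. The helper histogram_counts and the sample data are hypothetical, the bins are assumed to include their lower edge only, and bin_start is assumed to be no larger than the smallest data value.

```python
import math

def histogram_counts(values, bin_width, bin_start):
    """Count the data values that fall into each bin [lower, upper)."""
    n_bins = math.floor((max(values) - bin_start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - bin_start) / bin_width)] += 1
    return counts

ages = [23, 27, 31, 34, 35, 38, 41, 42, 47, 58]   # made-up data
counts = histogram_counts(ages, bin_width=10, bin_start=20)
print(counts)   # [2, 4, 3, 1]

# One bar per bin, with bar length proportional to the frequency.
for i, count in enumerate(counts):
    lower = 20 + i * 10
    print(f"{lower:>3} to {lower + 10:<3} {'#' * count}")
```

The counts always sum to the number of data values, since each value lands in exactly one bin.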

Artifacts

The final choice of bin width is subjective, so explore different bin widths beyond whatever default the software provides. An efficient way to set the bin width manually is to first obtain a histogram with the default bin width, then modify that width. Select a bin width that displays as much detail as the sample size supports without excessive random noise.

The bin width artifact occurs when changing the bin width of a histogram changes its shape. The bin shift artifact occurs when changing the starting point of the first bin changes the shape of the histogram.
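Both artifacts can be seen with a small, contrived data set. The Python below is a sketch under stated assumptions: the helper histogram_counts is hypothetical, the data values were chosen specifically to make the effect visible, and bins include their lower edge only.

```python
import math

def histogram_counts(values, bin_width, bin_start):
    """Count the data values that fall into each bin [lower, upper)."""
    n_bins = math.floor((max(values) - bin_start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - bin_start) / bin_width)] += 1
    return counts

values = [4, 5, 6, 14, 15, 16, 24, 25, 26]   # contrived data

# Baseline: width 10, first bin starts at 0.
baseline = histogram_counts(values, bin_width=10, bin_start=0)
# Bin width artifact: same start, narrower bins.
narrower = histogram_counts(values, bin_width=5, bin_start=0)
# Bin shift artifact: same width, first bin starts at -5.
shifted = histogram_counts(values, bin_width=10, bin_start=-5)

print(baseline)   # [3, 3, 3]
print(narrower)   # [1, 2, 1, 2, 1, 2]
print(shifted)    # [1, 3, 3, 2]
```

The same nine values look uniform, jagged, or hump-shaped depending only on the binning, which is why exploring several bin widths and starting points is worthwhile.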

Parameters

For the full list of Histogram() parameters, see the manual, obtained by entering ?Histogram. The parameters listed here are those available in the interactive session started with interact("Histogram").

x-variable

The numerical variable from which to compute the bins and construct the histogram.

Bins

bin_width
width of the bins
bin_start
starting value of the first bin

Colors

fill
interior color of a bar
color
exterior color of a bar, i.e., its border
transparency
transparency level of 0 (none) to 1 (completely transparent)

Values, Cumulate

values
type of value displayed, "%" for percent, "input" for entered data values, "off" for none
cumulate
specifies whether to display the cumulative histogram: "off" for the regular histogram only, "on" for the cumulative histogram, "both" for the cumulative and regular histograms together
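To make the percent and cumulative displays concrete, here is a minimal Python sketch of the arithmetic involved. It is not the Histogram() implementation, and the bin frequencies are made-up numbers chosen for illustration.

```python
from itertools import accumulate

counts = [2, 4, 3, 1]   # made-up frequencies, one per bin
total = sum(counts)

# Percent of all data values that fall in each bin.
percents = [100 * c / total for c in counts]

# Cumulative frequency: each bin's count plus the counts of all bins before it.
cumulative = list(accumulate(counts))

print(percents)     # [20.0, 40.0, 30.0, 10.0]
print(cumulative)   # [2, 6, 9, 10]
```

The last cumulative value equals the total number of data values, so the bars of a cumulative histogram never decrease from left to right.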

Smooth

density
smoothed density curve, initially at a default bandwidth
show_histogram
if TRUE, show the histogram behind the density curve
rug
if TRUE, show a rug, a narrow band of tick marks beneath the density curve, one per data value, as a direct display of density
type
general density, normal density, or both
bandwidth
determines the smoothness of the resulting density curve; larger values yield smoother curves
fill_general
interior color of the general density curve
fill_normal
interior color of the normal density curve
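The effect of bandwidth on a smoothed density curve can be illustrated with a basic Gaussian kernel density estimate. This Python sketch is an assumption-laden example, not the estimator that Histogram() uses: the function gaussian_kde, the sample data, and the two bandwidth values are all chosen for illustration.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

ages = [23, 27, 31, 34, 35, 38, 41, 42, 47, 58]   # made-up data
rough = gaussian_kde(ages, bandwidth=2)     # small bandwidth, bumpier curve
smooth = gaussian_kde(ages, bandwidth=10)   # large bandwidth, smoother curve

grid = list(range(0, 81))
rough_curve = [rough(x) for x in grid]
smooth_curve = [smooth(x) for x in grid]

# Print the two estimates at a few points for comparison.
for x in range(20, 61, 10):
    print(x, round(rough(x), 4), round(smooth(x), 4))
```

With the larger bandwidth each data value spreads its weight over a wider range, so the peaks are lower and the curve is smoother, while the area under either curve stays close to 1.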

Save

width
width of saved PDF file, in inches
height
height of saved PDF file, in inches