How can the frequencies of the data values be displayed for a continuous variable such as Age, Salary, MPG, Gallons, or Height? In contrast to the relatively few unique values of a categorical variable, a continuous variable has many potential unique values, generally too many to plot each individually on a single visualization. The solution is to form bins, a sequence of adjacent, non-overlapping intervals, each generally of the same width. Each bin contains data values of approximately the same size.
Place each data value for a continuous variable into its corresponding bin. Construct the histogram as a set of adjacent bars, one bar per bin. The height of each bar is proportional to the frequency of the values in its bin. Adjacent bars of a histogram share a common side, with no gaps between bars, to indicate the underlying continuity.
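The binning just described can be sketched in base R. This is a minimal illustration rather than the lessR implementation: the `age` values are made up, and the bin edges (width 10, starting at 20) are an assumption chosen only to make the counts easy to follow.

```r
# Hypothetical sample of a continuous variable (ages in years)
age <- c(21, 23, 27, 31, 34, 35, 38, 42, 45, 47, 52, 58)

# Bin edges: adjacent, non-overlapping intervals of equal width 10
breaks <- seq(20, 60, by = 10)

# Place each value into its bin; right = FALSE gives [20,30), [30,40), ...
bins <- cut(age, breaks = breaks, right = FALSE)

# Frequency of each bin, which sets the height of the corresponding bar
freq <- table(bins)
print(freq)   # counts per bin: 3 4 3 2

# Draw the histogram from the same bins; the bars touch, with no gaps
hist(age, breaks = breaks, right = FALSE, main = "Age", xlab = "Age (years)")
```

Every value falls into exactly one bin, so the bin frequencies sum to the sample size.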
The final choice of bin width is subjective, so different bin widths should generally be explored beyond whatever default bin width is provided by the computer. The most efficient way to set bin width manually is to first obtain a histogram with the default bin width, then manually modify the bin width. Select a bin width to display as much detail as possible for the sample size without excessive random noise.
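As a sketch of that workflow with lessR, assuming a recent release in which the parameters are named `bin_start` and `bin_width` (confirm with `?Histogram`): the example uses the Employee data set included with lessR and its Salary variable. The starting point of 35000 and the three candidate widths are illustrative choices, not recommendations.

```r
# Candidate bin widths to compare (illustrative values)
widths <- c(8000, 14000, 20000)

# Run only when lessR is installed
if (requireNamespace("lessR", quietly = TRUE)) {
  library(lessR)

  # Employee is a data set included with lessR; d is the default data frame name
  d <- Read("Employee")

  # Step 1: histogram with the default bin width
  Histogram(Salary)

  # Step 2: repeat with each candidate width, holding the starting point fixed
  for (w in widths) {
    Histogram(Salary, bin_start = 35000, bin_width = w)
  }
}
```

Holding the starting point fixed while varying the width isolates the effect of bin width from the effect of where the bins begin.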
The bin width artifact occurs when a change in the bin width of a histogram changes the shape of the histogram. The bin shift artifact occurs when a change in the starting point of the bins changes the shape of the histogram, even though the bin width stays the same.
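Both artifacts can be seen numerically with a small base R sketch. The data are made up, with two clusters near 2 and 4, and `bin_counts()` is a hypothetical helper written for this illustration, not a lessR or base R function.

```r
# Hypothetical data with two clusters, one near 2 and one near 4
x <- c(1.6, 1.8, 2.0, 2.2, 2.4, 3.6, 3.8, 4.0, 4.2, 4.4)

# Illustrative helper: bin counts for a given starting point and bin width,
# with bins closed on the left, such as [1.5, 2.5)
bin_counts <- function(x, start, width) {
  breaks <- seq(start, max(x) + width, by = width)
  hist(x, breaks = breaks, right = FALSE, plot = FALSE)$counts
}

# Reference: start 1.5, width 1 shows two separated clusters
bin_counts(x, start = 1.5, width = 1)    # 5 0 5

# Bin shift artifact: same width, start moved to 1, and the gap disappears
bin_counts(x, start = 1.0, width = 1)    # 2 3 2 3

# Bin width artifact: same start, width 1.5, and the shape looks uniform
bin_counts(x, start = 1.5, width = 1.5)  # 5 5
```

The same ten values look bimodal, nearly flat, or uniform depending only on the binning, which is why more than one bin width and starting point should be explored.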
For the full list of Histogram() parameters, see the manual obtained by entering ?Histogram. These listed parameters are those provided in the interactive session from interact("Histogram").
x-variable
0 (none) to 1 (completely transparent)
"%" for percent, "input" for entered data values, "off" for none
"off", "on", and "both"
TRUE, show the histogram behind the density curve
TRUE, show a direct display of density as a narrow band beneath the density curve