The Scatterplot

Concepts

A two-dimensional scatterplot displays the relationship between two variables in terms of their joint values, as a set of points in a two-dimensional coordinate system. The coordinates of each point are the values of the two variables for a single observation (row of data). The classic scatterplot is of two continuous variables, but any combination of continuous and categorical variables can be plotted.

If both variables are categorical, many pairs of values may be repeated in the data because each variable has only a limited number of values. In that case the more informative plot is a bubble plot. Each point in the scatterplot is replaced by a circle, or bubble, the size of which reflects the joint frequency of that combination of data values. Alternatively, the size of the bubble can represent the magnitude of a third, continuous variable.
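
The joint frequencies behind a bubble plot can be sketched in a few lines. The following is a minimal, language-neutral illustration in Python, not the code the package runs; the data values and the names `pairs`, `joint_freq`, and `bubble_area` are hypothetical. It assumes that bubble area, rather than radius, is made proportional to the count.

```python
from collections import Counter

# Hypothetical categorical data: one (x, y) pair per observation.
pairs = [
    ("Small", "Low"), ("Small", "Low"), ("Small", "High"),
    ("Large", "Low"), ("Large", "High"), ("Large", "High"),
    ("Large", "High"),
]

# Joint frequency of each distinct (x, y) combination.
joint_freq = Counter(pairs)

# Assumption: scale bubble area (not radius) to the count, so a
# combination that occurs twice as often covers twice the area.
max_count = max(joint_freq.values())
bubble_area = {pair: count / max_count for pair, count in joint_freq.items()}

for pair, count in sorted(joint_freq.items()):
    print(pair, count, round(bubble_area[pair], 2))
```

Each distinct pair is plotted once, so seven observations collapse to four bubbles here.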

The data values in a scatterplot can also be grouped according to their values on a third, categorical variable. Differentiate the groups of points by a characteristic such as color. Usually three or four levels of this categorical variable are all that can be meaningfully plotted.
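
As a rough illustration of grouping by a third variable, the sketch below splits observations by level and assigns one color per level. It is a language-neutral Python sketch under stated assumptions, not the package's implementation: the data, the `palette`, and the hard limit of four levels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (x, y, level of the grouping variable).
rows = [
    (1.0, 2.1, "A"), (2.0, 2.9, "B"), (3.0, 4.2, "A"),
    (4.0, 4.8, "C"), (5.0, 6.1, "B"),
]

# Hypothetical palette; four entries mirrors the practical limit on
# how many levels remain distinguishable in one plot.
palette = ["black", "red", "blue", "green"]

# Collect the (x, y) points that belong to each level.
groups = defaultdict(list)
for x, y, level in rows:
    groups[level].append((x, y))

levels = sorted(groups)
if len(levels) > len(palette):
    raise ValueError("too many levels to distinguish by color")

# One color per level, assigned in sorted order of the levels.
color_of = dict(zip(levels, palette))

for level in levels:
    print(level, color_of[level], groups[level])
```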

Parameters

For the full list of ScatterPlot() parameters, see the manual obtained by entering ?Plot. The parameters listed here are those provided in the interactive session from interact("ScatterPlot").

x-variable

The numerical variable for the x-axis from which to construct the scatterplot.

y-variable

The numerical variable for the y-axis from which to construct the scatterplot.

by-variable

Optional categorical variable that groups the points, with each level of the variable plotted in its own color.

Points

fill
interior color of a point
color
exterior color of a point, i.e., its border
transparency
transparency level of 0 (none) to 1 (completely transparent)
size
size of the plotted points
shape
shape of the plotted points

Line, Ellipse, Outliers

enhance
if TRUE, automatically add the 0.95 data ellipse, labeling of outliers beyond a Mahalanobis distance of 6 from the ellipse center, the best-fitting least squares line of all the data, the best-fitting least squares line of the regular data without the outliers, and a horizontal and vertical line to represent the mean of each of the two variables.
add="means"
add a vertical line for the mean of x and a horizontal line for the mean of y
fit
fit line. Default value is "off", with options "loess" for non-linear fit, "lm" for linear model least squares, "null" for the null model, "exp" for exponential growth or decay, "power" for the general power model, and "quad" for an increasing or decreasing function for the specific power value of 2. If potential outliers are identified according to out_cut, a second (dashed) fit line is displayed calculated without the outliers.
ellipse
add a data ellipse with the specified confidence level
MD_cut
Mahalanobis distance that defines an outlier
ID
name of the variable whose values label the outliers; its values must be unique
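
The outlier logic described above (flag points by Mahalanobis distance, then fit a second line without them) can be sketched as follows. This is a minimal, language-neutral Python illustration, not the package's code: the data, the helper names `mahalanobis_2d` and `least_squares`, and the cutoff of 2.5 are hypothetical. The cutoff is deliberately smaller than the value of 6 mentioned for enhance because, with only ten points and a covariance matrix estimated from the same data, no distance can reach 6.

```python
import math

def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    dists = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(d2))
    return dists

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical data: nine points on the line y = 2x plus one outlier.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
ys = [2, 4, 6, 8, 10, 12, 14, 16, 18, 40]

md_cut = 2.5  # hypothetical cutoff for this tiny sample
dists = mahalanobis_2d(xs, ys)
outliers = [i for i, d in enumerate(dists) if d > md_cut]

# Fit once with all the data, once with the flagged points removed.
fit_all = least_squares(xs, ys)
keep = [i for i in range(len(xs)) if i not in outliers]
fit_regular = least_squares([xs[i] for i in keep], [ys[i] for i in keep])

print("outlier rows:", outliers)
print("all data (intercept, slope):", fit_all)
print("without outliers (intercept, slope):", fit_regular)
```

In this sketch the single outlier sits at the mean of x, so removing it leaves the slope unchanged and shifts only the intercept.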

Jitter

jitter_x
jitter applied to the x-variable coordinates
jitter_y
jitter applied to the y-variable coordinates
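
Jitter adds a small random displacement to each coordinate so that points with identical values no longer plot on top of one another. The sketch below is a minimal, language-neutral Python illustration; the uniform noise, the `amount` argument, and the data are assumptions, and the scaling the package applies to jitter_x and jitter_y may differ.

```python
import random

def jitter(values, amount, rng):
    """Add uniform random noise in [-amount, amount] to each value."""
    return [v + rng.uniform(-amount, amount) for v in values]

rng = random.Random(1)       # fixed seed so the sketch is reproducible
x = [1, 1, 1, 2, 2, 3]       # hypothetical integer ratings with ties
x_jittered = jitter(x, 0.2, rng)

for original, moved in zip(x, x_jittered):
    print(original, round(moved, 3))
```

Only the plotted positions change; the underlying data values and any statistics computed from them stay as they were.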

Rotate Labels

rotate_x
rotation in degrees applied to the x-axis value labels
rotate_y
rotation in degrees applied to the y-axis value labels
offset
amount of space by which the value labels are shifted away from the corresponding axis

Save

width
width of the saved PDF file, in inches
height
height of the saved PDF file, in inches