A two-dimensional scatterplot displays the relationship between two distributions in terms of their joint values, as a set of points in a two-dimensional coordinate system. The coordinates of each point are the values of two variables for a single observation (row of data). The classic scatterplot is of two continuous variables, but any combination of continuous and categorical variables can be plotted.
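As a language-neutral illustration (Python, not lessR code), the following sketch turns rows of data into one (x, y) point per observation. It assumes, for illustration only, that categorical values are placed at integer axis positions 1, 2, 3, ... in order of first appearance; the helper names and the made-up Dept/Salary data are hypothetical.

```python
def to_positions(values):
    """Map each value to a numeric axis position.

    Numbers pass through unchanged; any non-numeric value is treated as a
    categorical level and assigned the next integer position (1, 2, 3, ...).
    """
    levels = {}
    positions = []
    for v in values:
        if isinstance(v, (int, float)):
            positions.append(float(v))
        else:
            if v not in levels:
                levels[v] = len(levels) + 1
            positions.append(float(levels[v]))
    return positions, levels


def scatter_points(rows, x_name, y_name):
    """Return one (x, y) point per observation (row of data)."""
    xs, x_levels = to_positions([row[x_name] for row in rows])
    ys, y_levels = to_positions([row[y_name] for row in rows])
    return list(zip(xs, ys)), x_levels, y_levels


# Made-up data: a categorical x-variable and a continuous y-variable.
rows = [
    {"Dept": "ACCT", "Salary": 61.2},
    {"Dept": "SALE", "Salary": 55.0},
    {"Dept": "ACCT", "Salary": 70.4},
]
points, x_levels, _ = scatter_points(rows, "Dept", "Salary")
print(points)    # [(1.0, 61.2), (2.0, 55.0), (1.0, 70.4)]
print(x_levels)  # {'ACCT': 1, 'SALE': 2}
```

Real plotting functions may order categorical levels differently (for example alphabetically or by a factor's declared levels); first-appearance order is simply the easiest rule to show.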
If both variables are categorical, many pairs of values may be repeated in the data because of the limited number of values for each of the two variables. In that case, a bubble plot is more informative. Each point in the scatterplot is replaced by a circle, or bubble, the size of which reflects the joint frequency of that combination of data values. Alternatively, the size of each bubble can represent the magnitude of a third, continuous variable.
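The collapsing of repeated pairs into bubbles can be sketched as follows. This is a minimal Python illustration, not lessR's implementation: it counts each (x, y) combination and, as an assumed convention, scales the radius so that bubble area is proportional to the joint frequency, with the most frequent pair drawn at a hypothetical `max_radius`. The survey-style responses are made up.

```python
import math
from collections import Counter


def bubble_data(xs, ys, max_radius=1.0):
    """Collapse repeated (x, y) pairs into (x, y, count, radius) bubbles.

    Area is made proportional to count, so radius grows with sqrt(count).
    """
    counts = Counter(zip(xs, ys))
    biggest = max(counts.values())
    bubbles = []
    for (x, y), n in sorted(counts.items()):
        radius = max_radius * math.sqrt(n / biggest)
        bubbles.append((x, y, n, radius))
    return bubbles


# Two categorical variables with made-up responses on a 1-3 scale.
q1 = [1, 1, 2, 2, 2, 3, 1, 2]
q2 = [1, 1, 2, 3, 2, 3, 1, 2]
bubbles = bubble_data(q1, q2)
for x, y, n, r in bubbles:
    print(x, y, n, round(r, 3))
# 1 1 3 1.0
# 2 2 3 1.0
# 2 3 1 0.577
# 3 3 1 0.577
```

Scaling by area rather than radius keeps a pair that occurs three times from looking nine times as large as a pair that occurs once. To size bubbles by a third continuous variable instead, the count would be replaced by that variable's value for each point.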
The data values in a scatterplot can also be grouped according to their values on a third, categorical variable. The groups of points are then differentiated by a characteristic such as color. Usually, no more than three or four levels of this categorical variable can be meaningfully plotted.
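A minimal sketch of grouping by a third, categorical variable, in Python rather than lessR: each level gets its own color, and the function refuses to plot more levels than a readable limit. The palette, the `max_levels` default of 4, and the M/W data are illustrative assumptions.

```python
# Hypothetical palette; any set of clearly distinguishable colors would do.
PALETTE = ["steelblue", "darkorange", "seagreen", "firebrick"]


def group_points(xs, ys, groups, max_levels=4):
    """Return {level: {"color": ..., "points": [(x, y), ...]}}.

    Raises ValueError when the grouping variable has more levels than can be
    shown readably (max_levels, capped at the palette size).
    """
    levels = list(dict.fromkeys(groups))  # unique levels in first-seen order
    limit = min(max_levels, len(PALETTE))
    if len(levels) > limit:
        raise ValueError(f"{len(levels)} groups; at most {limit} can be shown clearly")
    out = {lvl: {"color": PALETTE[i], "points": []} for i, lvl in enumerate(levels)}
    for x, y, g in zip(xs, ys, groups):
        out[g]["points"].append((x, y))
    return out


# Made-up data grouped by a two-level categorical variable.
g = group_points([1, 2, 3, 4], [2, 4, 5, 8], ["M", "W", "M", "W"])
print(g["M"])  # {'color': 'steelblue', 'points': [(1, 2), (3, 5)]}
```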
For the full list of ScatterPlot() parameters, see the manual obtained by entering ?Plot. The parameters listed here are those provided in the interactive session started with interact("ScatterPlot").
- x: The x-variable, plotted along the x-axis, from which to construct the scatterplot.
- y: The y-variable, plotted along the y-axis, from which to construct the scatterplot.
- by: The by-variable, a categorical variable whose values define groups of points.
- Transparency, from 0 (none) to 1 (completely transparent).
- If TRUE, automatically add the 0.95 data ellipse, labels for outliers beyond a Mahalanobis distance of 6 from the ellipse center, the best-fitting least squares line of all the data, the best-fitting least squares line of the regular data without the outliers, and a horizontal and a vertical line to represent the mean of each of the two variables.
- Add a vertical line for the mean of x and a horizontal line for the mean of y.
- Fit line, "off" by default, with options "loess" for a non-linear fit, "lm" for a linear model least squares fit, "null" for the null model, "exp" for exponential growth or decay, "power" for the general power model, and "quad" for an increasing or decreasing function with the specific power value of 2. If potential outliers are identified according to out_cut, a second (dashed) fit line, calculated without the outliers, is displayed.
- x-variable coordinates
- y-variable coordinates
- x-axis value labels
- y-axis value labels
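The enhanced display described in the parameter list (mean lines, outliers beyond a Mahalanobis distance of 6, and least squares lines with and without those outliers) rests on a few computations that can be sketched as follows. This is a minimal Python illustration under stated assumptions, not lessR's code: it uses the sample covariance matrix of all points (outliers included), a strict "greater than 6" rule, and a hypothetical parameter name `md_cut`; drawing the 0.95 data ellipse itself is omitted.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) point from the centroid."""
    mx, my = mean(xs), mean(ys)
    sxx, syy, sxy = cov(xs, xs), cov(ys, ys), cov(xs, ys)
    det = sxx * syy - sxy ** 2  # positive unless the points are collinear
    dists = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # [dx dy] S^-1 [dx dy]^T via the closed-form inverse of a 2x2 matrix
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(max(d2, 0.0)))  # guard against rounding below 0
    return dists


def least_squares(xs, ys):
    """Intercept and slope of the ordinary least squares line of y on x."""
    slope = cov(xs, ys) / cov(xs, xs)
    return mean(ys) - slope * mean(xs), slope


def enhance_summary(xs, ys, md_cut=6.0):
    """Means, outliers beyond md_cut, and fit lines with and without them."""
    dists = mahalanobis_2d(xs, ys)
    outliers = [i for i, d in enumerate(dists) if d > md_cut]
    flagged = set(outliers)
    keep = [i for i in range(len(xs)) if i not in flagged]
    return {
        "means": (mean(xs), mean(ys)),  # where the vertical/horizontal lines go
        "outliers": outliers,
        "fit_all": least_squares(xs, ys),
        "fit_regular": least_squares([xs[i] for i in keep], [ys[i] for i in keep]),
    }


# Made-up data: 100 points close to y = 2x, plus one point far off the line.
xs = list(range(100)) + [50]
ys = [2 * x + (1 if x % 2 else -1) for x in range(100)] + [200]
summary = enhance_summary(xs, ys)
print(summary["outliers"])  # [100]
```

The made-up example uses 101 points on purpose: when a point is itself included in the covariance estimate, its Mahalanobis distance cannot exceed (n - 1)/sqrt(n), so with only a handful of observations no point could reach a cutoff of 6. Comparing `fit_all` with `fit_regular` shows how much a single off-line point shifts the least squares line, which is what the second (dashed) line in the plot makes visible.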