Ordinal and Mixed PCA

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Patrick Mair, Jan De Leeuw

This vignette shows applications of principal component analysis (PCA) on ordinal and mixed input data which in Gifi slang is called princals. To be more precise, princals is a combination of multiple correspondence analysis and PCA (with splinical and ordinal options).

Ordinal PCA in a Nutshell

We use a dataset from Kennett and Salini (2012) on customer satisfaction that is included in the Gifi package.

library("Gifi")
ABC6 <- ABC[,6:11]            ## we use 6 items from the dataset
head(ABC6)
#>   equipment sales technical training purchase pricing
#> 1         1     1         1        2        2       1
#> 2         4     3         3        3        3       3
#> 3         4     3         4        4        4       3
#> 4         3     4         2        3        4       2
#> 5         4     5         4        3        4       3
#> 6         4     4         5        3        5       4

We have a sample size of 208. Each item is scaled on a 5-point rating scale (1 … very low, 5 … very high). A more detailed description can be found in the corresponding ?ABC help file. We are now going to fit an PCA with all items taken at an ordinal scale level:

fit_ordpca <- princals(ABC6, ndim = 2, levels = "ordinal")  
fit_ordpca
#> Call:
#> princals(data = ABC6, ndim = 2, levels = "ordinal")
#> 
#> Loss value: 0.683 
#> Number of iterations: 61 
#> 
#> Eigenvalues: 2.943 0.857

We obtain a loss value of 0.683. The results are prepared in a way so that they are close to the classical PCA output involving eigenvalues, loadings, and variance accounted for (VAF):

summary(fit_ordpca)
#> 
#> Loadings (cutoff = 0.1):
#>           Comp1  Comp2 
#> equipment  0.784       
#> sales      0.639  0.612
#> technical  0.674 -0.294
#> training   0.673 -0.465
#> purchase   0.681 -0.233
#> pricing    0.741  0.354
#> 
#> Importance (Variance Accounted For):
#>                  Comp1   Comp2
#> Eigenvalues     2.9433  0.8567
#> VAF            49.0555 14.2782
#> Cumulative VAF 49.0600 63.3300

We can plot the loadings, or combine them with the PC scores (which Gifi calls object scores) in a biplot.

plot(fit_ordpca, plot.type = "loadplot")

plot(fit_ordpca, plot.type = "biplot", col.loadings = "coral3", col.scores = "lightgrey")
abline(h = 0, v = 0, lty = 2)

Similar to standard PCA, one can create a scree plot (plot.type = "screeplot") to assess the number of dimensions needed.

If the user wants to have a closer insight into the optimal scaling transformations under the hood, transformation plots can be produced:

plot(fit_ordpca, plot.type = "transplot")

These plots nicely show the transformations of the original 1-5 scores into the optimally scaled category quantifications. Since in the princals() fit above we defined levels = "ordinal", a monotone regression transformation is applied resulting in these step functions. The data frame with the quantifications on the first dimension can be extracted as follows:

head(fit_ordpca$scoremat)
#>     equipment        sales    technical    training     purchase     pricing
#> 1 -0.13910817 -0.126634470 -0.117350005 -0.06629510 -0.087181534 -0.12611164
#> 2  0.02996926  0.007670437 -0.007215740 -0.01317558  0.009995573  0.01436101
#> 3  0.02996926  0.007670437  0.008750445  0.03120505  0.024097448  0.01436101
#> 4 -0.01818965  0.020671274 -0.078541390 -0.01317558  0.024097448 -0.05945442
#> 5  0.02996926  0.046994884  0.008750445 -0.01317558  0.024097448  0.01436101
#> 6  0.02996926  0.020671274  0.047727045 -0.01317558  0.059488573  0.04289496

This data frame can replace the original AB6 object for subsequent analyses.

Standard PCA with Gifi

One can also mimick a standard PCA using princals() by setting the levels argument accordingly:

fit_metpca <- princals(ABC6, ndim = 2, levels = "metric")
fit_metpca
#> Call:
#> princals(data = ABC6, ndim = 2, levels = "metric")
#> 
#> Loss value: 0.693 
#> Number of iterations: 21 
#> 
#> Eigenvalues: 2.871 0.816

The results are highly similar to the classical prcomp(ABC6, scale = TRUE) fit, and the same plots as above can be produced. Here we look again at the transformation plots, showing the linear transformation (simply a z-standardization) induced by the metric measurement level assumption.

plot(fit_metpca, plot.type = "transplot")

One can compare the loss value of 0.683 from the ordinal solution with 0.693 from the metric solution and, in conjunction with the transformation plots, judge whether these Likert items can be treated as metric.

Mixed PCA

To demonstrate PCA with mixed input scale levels, we use a subset of the data based on the Wilson-Patterson conservatism scale from MPsychoR package. For illustration purposes we create age groups so that we have an ordinal variable in the dataset.

data("WilPat2")
WilPat2$Age <- cut(WilPat2$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)      
head(WilPat2)
#>   GayMarriage SexualFreedom GayAdoption GenderQuotas AffirmativeAction
#> 1           0             0           2            1                 0
#> 2           0             0           0            0                 0
#> 3           0             0           0            1                 0
#> 4           0             1           1            0                 0
#> 5           2             1           2            1                 2
#> 6           0             1           0            0                 1
#>   LegalizedMarijuana Country LibCons LeftRight Gender Age
#> 1                  2   India       4         8   Male   3
#> 2                  0   India       4         5   Male   5
#> 3                  0   India       1         6   Male   3
#> 4                  1   India       4         4 Female   4
#> 5                  2   India       2         4   Male   5
#> 6                  0   India       5         6 Female   4

The first 6 items are Wilson-Patterson items (1 = “approve”, 0 = “disapprove”, and 2 = “don’t know”) which we declare as “nominal”. This is followed by Country which is also “nominal”, and two self-reported liberalism/conservatism and left/right identification which enter as “metric”. Finally we use an “ordinal” transformation for age. The corresponding levelvec object is then shipped to princals().

levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")    
wen_prin <- princals(WilPat2, levels = levelvec)                                          ## fit 2D princals
wen_prin
#> Call:
#> princals(data = WilPat2, levels = levelvec)
#> 
#> Loss value: 0.818 
#> Number of iterations: 45 
#> 
#> Eigenvalues: 2.348 1.647
summary(wen_prin)
#> 
#> Loadings (cutoff = 0.1):
#>                    Comp1  Comp2 
#> GayMarriage         0.697 -0.458
#> SexualFreedom       0.543 -0.219
#> GayAdoption         0.594 -0.514
#> Country            -0.724 -0.436
#> Age                -0.641 -0.387
#> LibCons                    0.564
#> GenderQuotas        0.327  0.401
#> AffirmativeAction   0.163  0.398
#> LegalizedMarijuana  0.240 -0.249
#> LeftRight          -0.273 -0.205
#> Gender              0.115 -0.209
#> 
#> Importance (Variance Accounted For):
#>                  Comp1   Comp2
#> Eigenvalues     2.3477  1.6471
#> VAF            21.3431 14.9735
#> Cumulative VAF 21.3400 36.3200

Let us look at the transformation plots of some variables:

plot(wen_prin, plot.type = "transplot", var.subset = c(1, 8, 9, 11))

Let’s produce the joint plot as the main output which combines the loadings for the non-nominal variables with the category quantifications of the nominal variables. For aesthetic reasons we re-scale the loading arrows by a factor of 0.08.

plot(wen_prin, plot.type = "jointplot", expand = 0.08)

In case we are interested in the object scores, we can ask for an object plot:

plot(wen_prin, "objplot", cex.scores = 0.6)

A biplot could also be produced that combines the loadings plot with the object plot.

Note that princals is a restricted version of homals: in princals, the category quantifications on the second dimension are linearly related to the ones from dimension 1, whereas in homals they are unrestricted. In the multiple correspondence analysis vignette we call homals() on the same data. Whether to use princals or homals on datasets like these, depends on what we are mainly interest in: category quantifications of the nominal variables (homals) or loadings of the non-nominal variables (princals). This said, as shown in Sec. 8.3.3 in Mair (2018), using the concept of copies one can obtain the same results in princals as with a homals fit. Eventually, it’s all the same (Gifi) monster.

Further technical details can be found in vignette("gifi_theory"). Elaborations on optimal scaling, further applied aspects, and examples are given in Mair (2018) and in the excellent paper by Linting et al. (2007).

Kennett, R. S., and S. Salini. 2012. Modern Analysis of Customer Surveys with Applications in R. New York: Wiley.

Linting, M., J. J. Meulman, P. J. F. Groenen, and A. J. Van Der Koojj. 2007. “Nonlinear Principal Components Analysis: Introduction and Application.” Psychological Methods 12 (3): 336–58. https://doi.org/10.1037/1082-989X.12.3.336.

Mair, P. 2018. Modern Psychometrics with R. New York: Springer-Verlag.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.