Multiple Correspondence Analysis


Patrick Mair, Jan De Leeuw

This vignette shows an application of multiple correspondence analysis (MCA) to nominal and mixed data. In Gifi slang, MCA is called homals. To be more precise, homals is MCA with splines and ordinal restrictions.

Multiple Correspondence Analysis in a Nutshell

We start with 6 items from the Wilson-Patterson scale (gay marriage, sexual freedom, gay adoption, gender quotas, affirmative action, and legalized marijuana), each with response categories 1 = “approve”, 0 = “disapprove”, and 2 = “don’t know”, plus the country of the participant (India, Hungary). The full version of this dataset is available in the MPsychoR package (Mair 2018).

library("Gifi")
data("WilPat2")
WP6 <- WilPat2[, 1:7]   # the six attitude items plus Country
head(WP6)
#>   GayMarriage SexualFreedom GayAdoption GenderQuotas AffirmativeAction
#> 1           0             0           2            1                 0
#> 2           0             0           0            0                 0
#> 3           0             0           0            1                 0
#> 4           0             1           1            0                 0
#> 5           2             1           2            1                 2
#> 6           0             1           0            0                 1
#>   LegalizedMarijuana Country
#> 1                  2   India
#> 2                  0   India
#> 3                  0   India
#> 4                  1   India
#> 5                  2   India
#> 6                  0   India

We have a sample size of 493 participants. To fit a two-dimensional MCA we can simply say

fit_homwp <- homals(WP6)
fit_homwp
#> Call:
#> homals(data = WP6)
#> 
#> Loss value: 0.717 
#> Number of iterations: 14 
#> 
#> Eigenvalues: 2.097 1.867

By default, ndim = 2 and levels = "nominal". The main output of an MCA is the symmetric map (aka joint plot) that shows the category quantifications in a 2D space.

plot(fit_homwp, plot.type = "jointplot")

Each item category gets a score in the 2D space. Categories belonging to the same item are presented in the same color. If we are mostly interested in how the 2 countries are related to each other and to the item responses, we can color the plot accordingly.

colvec <- c(rep("gray", 6), "coral4")
plot(fit_homwp, plot.type = "jointplot", col.points = colvec)
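
The color vector above has one entry per variable, in column order. As a minimal sketch, the same vector can be built from the variable names instead of positions, which keeps working if columns are reordered. The names in vars are copied from the head(WP6) output above; colvec_by_name is a hypothetical helper name.

```r
# Variable names as shown in the head(WP6) output
vars <- c("GayMarriage", "SexualFreedom", "GayAdoption", "GenderQuotas",
          "AffirmativeAction", "LegalizedMarijuana", "Country")

# Highlight Country, keep every other variable gray
colvec_by_name <- ifelse(vars == "Country", "coral4", "gray")
names(colvec_by_name) <- vars
colvec_by_name
```

With the real data, names(WP6) can replace the hand-typed vars, and unname(colvec_by_name) is the same vector as colvec.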

Other plotting options are scree plots (plot.type = "screeplot"), biplots (plot.type = "biplot"), as well as the object plot (plot.type = "objplot") for the object scores.

To get a more nuanced insight into the optimal scaling transformations of each variable, we can produce a transformation plot which shows the category transformations on both dimensions (D1 in “black”, D2 in “red”).

plot(fit_homwp, plot.type = "transplot")

As opposed to princals, these transformations are not linearly restricted. Gifi calls this multiple nominal.
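
For intuition about where category scores come from, the following base R sketch runs the classical indicator-matrix formulation of MCA on a small toy dataset. It is not the algorithm used by homals, which relies on alternating least squares and its own normalization, so the numbers are not comparable with the output above. The toy data and all object names (toy, indicator, G, cat_scores) are assumptions made for illustration.

```r
# Toy data: three nominal items (hypothetical values, not taken from WilPat2)
toy <- data.frame(
  item1 = factor(c("a", "a", "b", "b", "c", "c")),
  item2 = factor(c("x", "y", "x", "y", "x", "y")),
  item3 = factor(c("u", "u", "u", "v", "v", "v"))
)

# Indicator matrix: one 0/1 column per category of every item
indicator <- function(f, name) {
  m <- outer(as.integer(f), seq_len(nlevels(f)), "==") * 1
  colnames(m) <- paste(name, levels(f), sep = ".")
  m
}
G <- do.call(cbind, Map(indicator, toy, names(toy)))

# Correspondence analysis of G: SVD of the standardized residuals
P <- G / sum(G)            # correspondence matrix
rmass <- rowSums(P)        # row masses (one per person)
cmass <- colSums(P)        # column masses (one per category)
E <- outer(rmass, cmass)   # expected proportions under independence
S <- (P - E) / sqrt(E)
dec <- svd(S)

# Principal inertias; for an indicator matrix with J categories and
# Q items they sum to J / Q - 1
inertia <- dec$d^2

# Standard coordinates of the categories on the first two dimensions
cat_scores <- (dec$v / sqrt(cmass))[, 1:2]
rownames(cat_scores) <- colnames(G)
cat_scores
```

Each row of cat_scores is one category, which corresponds to the idea that every item category gets its own point in the joint plot.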

Mixed Input Data

In the Gifi system, MCA and PCA (and everything in-between) are essentially the same thing. In the princals vignette we demonstrate how to fit a mixed PCA. We now fit a mixed MCA on the same data. The only difference between these two Gifi incarnations is that transformations in princals are, by default, linearly restricted, whereas in homals they are not. In princals this restriction can be relaxed by the concept of copies so that both methods give the same results (see Mair 2018, Sec. 8.3.3).

To demonstrate MCA with mixed input scale levels, we use the same data as above but add two self-reported scales, liberalism/conservatism and left/right identification, which enter as “metric”, along with gender (“nominal”) and age (“ordinal”).

data("WilPat2")
# bin age into 5 groups so that it can enter as an ordinal variable
WilPat2$Age <- cut(WilPat2$Age, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)
head(WilPat2)
#>   GayMarriage SexualFreedom GayAdoption GenderQuotas AffirmativeAction
#> 1           0             0           2            1                 0
#> 2           0             0           0            0                 0
#> 3           0             0           0            1                 0
#> 4           0             1           1            0                 0
#> 5           2             1           2            1                 2
#> 6           0             1           0            0                 1
#>   LegalizedMarijuana Country LibCons LeftRight Gender Age
#> 1                  2   India       4         8   Male   3
#> 2                  0   India       4         5   Male   5
#> 3                  0   India       1         6   Male   3
#> 4                  1   India       4         4 Female   4
#> 5                  2   India       2         4   Male   5
#> 6                  0   India       5         6 Female   4
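
The cut() call above uses intervals that are open on the left and closed on the right, so the age groups are (17, 20], (20, 23], (23, 30], (30, 40], and (40, 100]. A small sketch with hypothetical ages (not taken from WilPat2) shows how boundary values are assigned:

```r
# Hypothetical ages, chosen to sit on and around the break points
ages <- c(18, 20, 21, 23, 24, 30, 31, 40, 41, 65)
age_cat <- cut(ages, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)

# Ages paired with their assigned group
data.frame(age = ages, group = age_cat)

# Values outside (17, 100] are not assigned to any group
outside <- cut(17, breaks = c(17, 20, 23, 30, 40, 100), labels = 1:5)
is.na(outside)
```

An age exactly on a break point falls into the lower group, and an age of 17 or below would become NA with these breaks.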

levelvec <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")
wen_hom <- homals(WilPat2, levels = levelvec)
wen_hom
#> Call:
#> homals(data = WilPat2, levels = levelvec)
#> 
#> Loss value: 0.793 
#> Number of iterations: 60 
#> 
#> Eigenvalues: 2.577 1.972

This result is different from the corresponding princals(WilPat2, levels = levelvec) fit. The usual plots can be produced, but they become cluttered because every category gets its own score, even for variables such as LibCons, LeftRight, and Age.
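
Because levelvec is matched to the columns by position, it is easy to misalign when a dataset has many variables. As a minimal sketch, the scale levels can be assigned by variable name and then checked against the positional version. The column names are copied from the head(WilPat2) output above; levels_by_name is a hypothetical helper name.

```r
# Column names as shown in the head(WilPat2) output
vars <- c("GayMarriage", "SexualFreedom", "GayAdoption", "GenderQuotas",
          "AffirmativeAction", "LegalizedMarijuana", "Country",
          "LibCons", "LeftRight", "Gender", "Age")

# Start with "nominal" everywhere, then override by name
levels_by_name <- setNames(rep("nominal", length(vars)), vars)
levels_by_name[c("LibCons", "LeftRight")] <- "metric"
levels_by_name["Age"] <- "ordinal"
levels_by_name

# Same content as the positional levelvec used in the fit
positional <- c(rep("nominal", 6), "nominal", "metric", "metric", "nominal", "ordinal")
identical(unname(levels_by_name), positional)
```

With the real data, names(WilPat2) can replace the hand-typed vars, and unname(levels_by_name) can be passed as the levels argument.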

Mair, P. 2018. Modern Psychometrics with R. New York: Springer-Verlag.
