Introduction to CALMs

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction

One of the most basic analyses in social sciences is the ability to compare groups. However, this seemingly straightforward task is often complicated by two significant challenges: ensuring group equivalence and assessing measurement invariance. If not adequately addressed, these challenges can lead to misleading conclusions and undermine the validity of research findings. This application aims to make the process accessible by integrating several advanced statistical techniques including:

Propensity score analysis (group equivalence checking and propensity score matching)
Measurement invariance tests (full and partial)
Structural invariance tests (full and partial)

Launching CALMs

You can launch the CALMs in one of two ways:

1. Web Access (No Installation Required)

Simply visit: evaluent.shinyapps.io/CALMs

No setup or installation is needed. The application runs directly in your browser.

2. Local Access (Installation Required)

A. Open R or RStudio

Make sure R or R studio is installed on your system. Open the application to begin.

B. Install the CALMs Package

Run the following in the R Console:

install.packages("calms",dependencies=TRUE)

This will install the CALMs package from a local tarball file rather than a CRAN repository to maintain author anonymity while under peer-review.

C. Run the CALMs Package

After installing the CALMs package, run the application locally via R with run_calms():

calms::run_calms()

Built-in Dataset

Users can run CALMs analyses using the dataset built into the application. This built-in dataset is a subset of data from Work Orientations IV – ISSP 2015 (ISSP Research Group, 2017) and is included with permission from the ISSP Research Group. The subset and modifications applied to the original dataset were generated using the following code:

###Load necessary packages
library(foreign)
library(haven)

### Read in data set without labels
dso <- 
  read.spss("ZA6770_v2-1-0.sav", 
  use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)
nrow(dso)
names(dso)

### Read in data set with labels
dsoa <- 
  read.spss("ZA6770_v2-1-0.sav", 
  use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
nrow(dsoa)
names(dsoa)

### Select only needed columns
#quality of job content (JC: v22-v24) and quality of work environment (WE: v25-v27)
#demographics:SEX,EMPREL,TYPORG2,DEGREE
ds<-subset(dso,select=c(country,v22:v27,SEX,DEGREE,EMPREL,TYPORG2))
names(ds)

ds[,c("country","SEX","DEGREE","EMPREL","TYPORG2")]<-dsoa[,c("country","SEX","DEGREE","EMPREL","TYPORG2")]

###Get data for the groups (i.e., countries)
#country numerical codes in SPSS: UK = 826, US = 840
table(ds$country)
ds<-subset(ds,(country=="GB-Great Britain and/or United Kingdom" | country=="US-United States"))
ds$country<-factor(ds$country)
table(ds$country)
nrow(ds)

###getting rid of missing values 
nrow(ds)
ds<-na.omit(ds)
nrow(ds)

###check values
table(ds$SEX)
table(ds$DEGREE)
table(ds$EMPREL)
table(ds$TYPORG2)
table(ds$country)

levels(ds$EMPREL)<-c("Employee","Self-employed","Self-employed",NA)
levels (ds$DEGREE)<-c(rep("no univ",5),rep("univ",2))

###getting rid of missing values 
nrow(ds)
ds<-na.omit(ds)
nrow(ds)

ds$SEX
levels(ds$SEX)
levels(ds$SEX)<-c(1,0)     #Set "Male" to 1
 
levels(ds$EMPREL)
levels(ds$EMPREL)<-c(0,1)  #Set "Employee" to 1

levels(ds$TYPORG2)
levels(ds$TYPORG2)<-c(0,1) #Set "Private employer" to 1
 
levels(ds$DEGREE)
levels(ds$DEGREE)<-c(0,1)  #Set "univ" to 1

levels(ds$country)
levels(ds$country)<-c(1,0) #Set "US-United States" to 1
 
ds$SEX<-as.numeric(ds$SEX)-1
ds$EMPREL<-as.numeric(ds$EMPREL)-1
ds$TYPORG2<-as.numeric(ds$TYPORG2)-1
ds$DEGREE<-as.numeric(ds$DEGREE)-1
ds$country<-as.numeric(ds$country)-1

nrow(ds)
names(ds)

write_sav(ds,"WosDemo.sav")

Providing Your Own Data

Users can run CALMs analyses on their own datasets. To do so, they must upload two files simultaneously from the same directory:

A data file containing the dataset to be analyzed.
A corresponding meta file that provides information about the dataset.

The CALMs application supports data files in .csv, .dat, and .sav formats.

The meta file must be a .csv and be named such that the last four characters of the file name are “Meta” (e.g., My_Meta.csv, Meta.csv). The meta file must have the column names of itemo, item, type, scale, ds, and missing.

The column labeled itemo identifies the original variable names in the file containing the data to be analyzed.
The column labeled item identifies corresponding new variable names that will be used when creating the cleaned dataset.
The column labeled type identifies whether the item is a scale item, covariate, or grouping variable.
The column labeled scale identifies the name of the corresponding scale for scale items.
The column labeled ds identifies the name of the dataset uploaded that will be cleaned and subsequently analyzed. Note that the user can upload a dataset with any number of other variables in addition to those identified in the meta file.
The column labeled missing identifies a numeric value that denotes how missing values are coded.

A sample meta file is provided below that corresponds to a subset of the 2015 Work Orientations dataset (ISSP Research Group, 2017) that is built into the CALMs application for demonstration purposes.

     itemo       item  type scale          ds missing
1  country        USA group       WosDemo.sav      NA
2      v22        JC1  item    JC                  NA
3      v23        JC2  item    JC                  NA
4      v24        JC3  item    JC                  NA
5      v25        WE1  item    WE                  NA
6      v26        WE2  item    WE                  NA
7      v27        WE3  item    WE                  NA
8      SEX       Male   cov                        NA
9   DEGREE UnivDegree   cov                        NA
10  EMPREL    SelfEmp   cov                        NA
11 TYPORG2 PrivateOrg   cov                        NA

Features Overview

The CALMs Shiny application is organized with multiple tabs that each serve a specific purpose:

Read Me - Displays user manual for the CALMs application.
View Me - Displays demonstration video of the CALMs application.
View Data – Displays the dataset selected.
Downloads – Contains links to files that the user can download.
Check Group Equivalency – Checks group equivalence of selected covariates.
Propensity Score Analysis Setup – Provides default call to matchit which the user can override and create custom call.
Propensity Score Analysis Results – Upon execution of analysis, provides results of propensity score analysis.
Measurement Invariance – Upon execution of analysis, provides results of measurement invariance tests for the grouping variable and items to analyze selected. User may also override the default to use matched data for invariance tests.
Metric Invariance – Upon execution of analysis, provides results of metric invariance tests for the grouping variable, items to analyze, and scale to analyze. User may also override the default to use matched data for invariance tests and modify the alpha for model comparison decisions.
Scalar Invariance - Upon execution of analysis, provides results of scalar invariance tests for the grouping variable, items to analyze, and scale to analyze. User may also override the default to use matched data for invariance tests, modify the alpha for model comparison decisions, and select items with loadings to freely estimate.
Structural Invariance - Upon execution of analysis, provides results of structural invariance tests for the grouping variable and items to analyze. User may also override the default to use matched data for invariance tests, select items with loadings to freely estimate, select items with intercepts to freely estimate, and select scales with means to freely estimate.

Example Workflow

This section walks through the CALMs application interface using screenshots for illustrative purposes. Specifically, we analyze data from the 2015 Work Orientations Survey that includes responses from the United States (USA) and the United Kingdom (UK; ISSP Research Group, 2017). The 2015 Work Orientations dataset is from an international project that began in 1984 and was collected across 37 countries (ISSP Research Group, 2017).

The portion of the 2015 Work Orientations dataset used for the demonstration includes 1,477 responses from the USA and 1,793 responses from the UK. We specifically chose the stated two countries because full scalar invariance was not supported in previous measurement invariance studies using the constructs quality of job context (JX), quality of job content (JC), and quality of work environment (WE) in the measurement model using the 1989 Work Orientations dataset (Cheung & Lau, 2012; Cheung & Rensvold, 1999).

The 2015 Work Orientations dataset provided data for two of these previously utilized constructs, JC and WE (ISSP Research Group, 2017). Each construct is measured by three items, scored on a five-point Likert-type scale ranging from 1 (strongly agree) to 5 (strongly disagree). Figure 1 depicts the 2-factor measurement model used in the illustrative example. What follows next is a recommend set of steps to comprehensively analyze the latent means of JC and WE by country, where country is either USA or UK. Note that researchers may choose to use the application in a different way that the example workflow and skip tests if that fits their research scenario.

Figure 1. Measurement Model

Step 1: Load Data

Users can either use the built-in dataset by leaving Use 2015 Work Orientations Survey Data selected, or upload their own data by deselecting this option.

To upload your own dataset and accompanying *Meta.csv file, follow the steps shown in the GIF below.

Step 2: View Data

The labeling of the items in the original dataset (ISSP Research Group, 2017) was not intuitive for our illustrative example; hence, the original items were renamed as previously described and as depicted in Figure 2.

Figure 2. View Data Tab

Step 3: Check Group Equivalency

CALMs uses the MatchIt package in R (Ho et al., 2011) for propensity score analyis including checking for group equivalency. The comparison groups for the demonstration with the 2015 Work Orientations Survey data are the USA and the UK. Hence, USA was selected as the Grouping Variable. All possible covariates were selected as Covariates to Check.

Figure 3. Check Group Equivalency Tab

The output in Figure 3 shows that there are statistically (p < .05) and practically significant (Cramer’s V > .10) differences in employment type and degree by country. Specifically, employment type (SelfEmp) was found to be statistically significant while organization type (PrivateOrg) was found to be both statistically and practically significant.

Step 4: Propensity Score Analysis Setup

CALMs offers users the flexibility to use either a default call to MatchIt or to define a custom call. A link to the matchIt documentation is included within the application.

To customize the call, deselect Use Default call to matchit and edit the arugments following data=dpsm in the provided code box.

Two propensity score matching (PSM) methods, nearest neighbor and genetic matching, are the most common.

Nearest neighbor matching requires the input of all demographic variables and has been recommended as the most straightforward PSM method (Caliendo & Kopeinig, 2008; Keiffer & Lane, 2016). Although nearest neighbor is computationally effective with large datasets, the method pairs each treated unit with its nearest matching control without considering fully optimizing matches. This may lead to covariates not being optimally balanced with less equivalent groups, as compared to more stringent or robust matching methods.
Genetic matching is recommended when PSM output is required to have highly equivalent groups as it effectively achieves good matching balance even with highly complex data (Randolph et al., 2014). Genetic matching requires the input of all demographic variables (e.g., gender, age group, race/ethnicity, and educational level) into an algorithm that results in statistically (e.g., p ≤ .05) and practically (e.g., Cramer’s V ≥ .10) significant differences.

By default, CALMs uses the nearest neighbor method.

Figure 4. Propensity Score Analysis Setup Tab

Step 5: Propensity Score Analysis

Figure 5 presents the result of the propensity score analysis using the default call previously described.

Figure 5. Propensity Score Analysis Results Tab

The nearest neighbor method yielded two equivalent groups with 769 responses in each country. We observed that there were statistically significant differences in gender (Male) by country and elected to use the results of the nearest neighbor method since there was not a practically significant difference by country (all Cramer’s V < .10).

Step 6: Measuremenent Invariance Tests

When conducting measurement invariance tests, the application defaults to using the matched dataset.

To change this, users can deselect Use matched data for invariance tests.

Users can also select the Grouping Variable and Items to Analyze. By default, the application include all items identified in the *Meta.csv file as type item.

Measurement invariance tests include configural, metric, and scalar. Omnibus and scale-level tests are provided for both metric and scale invariance tests. Commonly recommended fit indices criteria include: (a) comparative fit index (CFI) ≥ .95; (b) standardized root-mean-square residuals (SRMR) ≤ .05; and (c) root-mean-squared-error of approximation (RMSEA) .05 to .08 (Kline, 2016; Schumacker & Lomax, 2016).

Statistically significant model noninvariance is determined based on the p-value of the χ² difference test at p ≤ .05 (Cheung & Rensvold, 1999; van de Schoot et al., 2012). Guidelines have been provided to evaluate the ΔCFI for practical model (non)invariance, namely: (a) practical model invariance for ΔCFI ≥ -.01; (b) potential practical model noninvariance for ΔCFI between -.01 and -.02; and (c) practical model noninvariance for ΔCFI ≤ -.02 (Cheung & Rensvold, 2002).

Figure 6. Measurement Invariance Tab

The results of measurement invariance tests (see Figure 6) indicated that the configural model showed good fit. This model indicated good fit with a SRMR = .031 and CFI = .956. The metric model was compared to the configural model and met criteria for both statistical and practical invariance (Δχ²[4] = 8.484, p = .075; ΔCFI = -.004). However, the data did not reach the thresholds for scalar invariance (Δχ²[4] = 55.772, p < .001; ΔCFI = -.042). Both the JC (Δχ²[2] = 24.996, p < .001; = -.019) and WE (Δχ²[2] = 30.796, p < .001; ΔCFI = -.023) scales demonstrated evidence of scalar non-invariance.

Step 7: Metric Invariance Tests

In our illustrative example, it was not necessary to conduct follow-up tests for metric invariance as neither the omnibus test nor the scale-level tests for JC and WE indicated evidence of metric non-invariance.

However, for demonstration purposes, we conducted metric invariance tests specifically on the JC scale.

Note that the application uses the p-value of the χ² difference test when determining invariant subsets of items (Cheung & Rensvold, 1999). The default significance level (alpha) is set to .05, but users may adjust this threshold as needed. In this example, we set the alpha to .01 (see Figure 7).

Figure 7. Metric Invariance Tab

The factor ratio test (see Figure 7) confirmed that all JC items were metric invariant. Similarly, all WE items were metric invariant (tests not shown).

Step 8: Scalar Invariance Tests

Because full scalar invariance was not demonstrated, we conducted partial MI testing on each scale. Had we determined that the factor loadings were non-invariant at the metric invariance assessment, we could have allowed a set of loadings to be freely estimated to allow for a partial scalar invariance assessment.

Figure 8. Scalar Invariance Tab

The factor ratio test (see Figure 8) identified JC2 and JC3 as a invariant subset of JC items (p > .01). Similarly, WE1 and WE2 were identified (tests not shown) as a invariant subset of WE items (p > .01).

Based on the results of the scalar invariance assessment, the intercepts for WE3 and JC1 should be freely estimated to account for the partial scalar invariance.

Step 9: Structural Invariance Tests

Building on the results of the scalar invariance testing, we allowed the intercepts for WE3 and JC1 to be freely estimated. Structural invariance is given when the comparison between an unconstrained and a constrained structural model yields a non-significant χ² difference (p > .05) and a non-significant CFI difference (Cheung & Rensvold, 1999; Cheung & Rensvold, 2002; Kline, 2016; Schumacker & Lomax, 2016).

Figure 9. Structural Invariance Tab

The results indicate that the set of scales met the criteria for structural invariance. Although the structural invariance model was statistically significantly different from the scalar model (Δχ²[2] = 6.374, p = .041), the difference was not practically significant (ΔCFI = -.004; see Figure 9). However, considering only JC, a statistically significant latent mean difference was observed (-.077, p = .013). Given that the latent mean for the USA was constrained to zero, the negative estimate indicates that the latent mean for the UK is lower in JC. There was no significant latent mean difference for WE across the two countries (-.047, p = .326).

References

Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31–72. https://doi.org/10.1111/j.1467-6419.2007.00527.x

Cheung, G. W., & Lau, R. S. (2012). A direct comparison approach for testing measurement invariance. Organizational Research Methods, 15(2), 167–198. https://doi.org/10.1177/1094428111421987

Cheung, G. W., & Rensvold, R. B. (1999). Testing factorial invariance across groups: A reconceptualization and proposed new method. Journal of Management, 25(1), 1–27. https://doi.org/10.1177/014920639902500101

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5

Ho, D., Imai, K., King, G., & Stuart, E. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, 42(8), 1–28. https://doi.org/10.18637/jss.v042.i08

ISSP Research Group (2017). International social survey programme: Work orientations IV – ISSP 2015. GESIS data archive, Cologne. ZA6770 data file version 2.1.0, https://doi.org/10.4232/1.12848

Keiffer, G. L., & Lane, F. C. (2016). Propensity score analysis: An alternative statistical approach for HRD researchers. European Journal of Training and Development, 40(8/9), 660–675. https://doi.org/10.1108/EJTD-06-2015-0046

Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). New York: The Guilford Press.

Randolph, J. J., Falbe, K., Manuel, A., & Balloun, J. (2014). A step-by-step guide to propensity score matching in R. Practical Assessment, Research & Evaluation, 19, 1–6. https://doi.org/10.7275/n3pv-tx27

Schumacker, R. E., & Lomax, R. G. (2016). A beginner’s guide to structural equation modeling (4th ed.). New York: Routledge.

van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. European Journal of Developmental Psychology, 9(4), 486–492. https://doi.org/10.1080/17405629.2012.686740

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.