In the social sciences, we often ask locational questions, such as:
These questions make no mention of the specific distance between groups and instead focus on the order of outcome magnitudes. While the statistics applied to these questions are usually variants of the general linear model, there is no reason to impose the assumption of linearity on the reality underlying these tests. One alternative is to apply the general monotone model (GeMM) as proposed by Dougherty and Thomas (2012).
GeMM uses a search-and-scale procedure: it first finds the optimal relative weights for a set of predictors, then scales these weights to minimize the order-constrained squared error. The first, computationally intensive step is accomplished by using a genetic algorithm to optimize some fit criterion (e.g., Kendall’s \(\tau\)) between an observed outcome and a weighted set of predictors. Use of \(\tau\) in this case assures relative weights that maximally reflect the monotone relationship between the outcome and model predictions. Other fit criteria penalize for complexity, but are based on transformations of \(\tau\). In the second step, we regress the original outcome onto the relative-weighted model predictions to compute an intercept and scaling factor that minimize squared error conditioned on this ordinal constraint.
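The two steps can be sketched in base R. This is only an illustration under stated assumptions, not the gemmR implementation: a plain random search stands in for the genetic algorithm, the built-in mtcars data stand in for a real application, and every object name is hypothetical.

```r
# Illustrative search-and-scale sketch (not gemmR's code).
# Assumption: random search replaces the genetic algorithm described in the text.
set.seed(1)

y <- mtcars$mpg
X <- scale(as.matrix(mtcars[, c("wt", "hp", "qsec")]))  # standardized predictors

# Each row is one candidate weight vector drawn uniformly from [-1, 1].
n_candidates <- 2000
candidates <- matrix(runif(n_candidates * ncol(X), -1, 1), ncol = ncol(X))

# Step 1 (search): score each candidate by Kendall's tau between the outcome
# and the weighted sum of predictors, then keep the best one.
tau <- apply(candidates, 1, function(w) {
  cor(y, drop(X %*% w), method = "kendall")
})
best_w <- candidates[which.max(tau), ]
best_tau <- max(tau)

# Step 2 (scale): regress the outcome on the weighted composite to obtain an
# intercept and a single scaling factor. A positive scaling factor leaves the
# rank order of the predictions, and therefore tau, unchanged.
composite <- drop(X %*% best_w)
scaling <- lm(y ~ composite)

# Rescaled weights on the standardized predictors.
final_weights <- coef(scaling)[["composite"]] * best_w
names(final_weights) <- colnames(X)
```

Because only the order of the weighted predictions matters in the first step, the search determines the relative weights, and the regression in the second step fixes their overall scale and the intercept.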
We implement GeMM with the gemmR package, which uses Rcpp to speed up repeated calculation of Kendall’s \(\tau\) for use in the genetic search process. As GeMM serves as a functional replacement for the linear model, a similar syntax is used to fit a GeMM model.
library(gemmR)
data(culture)
mod <- gemm(murder.rate ~ pasture + gini + gnp, data = culture, n.chains = 3,
n.gens = 10, n.beta = 200, check.convergence = TRUE)
This produces a gemm object, which is modeled after the lm object.
The gemmR package includes a number of S3 methods and a few novel functions to help extract information from gemm objects.
summary displays some helpful information about the fitted gemm object.
summary(mod)
## Call:
## gemm.formula(formula = murder.rate ~ pasture + gini + gnp, data = culture,
## n.chains = 3, n.gens = 10, n.beta = 200, check.convergence = TRUE)
##
## Coefficients:
##       intercept pasture      gini           gnp
## [1,]  0.5478463       0 0.2485879 -0.0001917586
## [2,]  0.2353015       0 0.2556735 -0.0001893033
## [3,] -2.7045577       0 0.3204726 -0.0001619418
##
## bic
## [1] -45.56397 -45.08970 -43.91595
GeMM is a stochastic process, so multiple replications are advisable to ensure stability of parameter estimates. gemm runs four replications by default (the call above requests three with n.chains = 3), and summary displays all of them, ordered from best to worst on the fit criterion.
Below the coefficients for each chain are the corresponding values of the optimized fit criterion. While all fit criteria are calculated and contained in the gemm object, only the criterion used for selection is displayed with summary.
Though no method exists for verifying the results of a random search process on empirical data, one way to check the suitability of a solution is to demonstrate convergent results across starting conditions. A quick check of genetic algorithm performance for a given dataset is to plot the best criterion value across generations and chains.
plot(mod)
The predict function for gemm serves two roles. The first is to generate model predictions based on the best chain of a given model. With tie.struct = TRUE, predict will also generate the counts of concordances, discordances, outcome ties and predictor ties for a given model, attached to the predictions as an attribute.
yhat <- predict(mod, tie.struct = TRUE)
head(yhat)
## [1] 6.5033527 7.7697543 11.0312726 8.8763036 2.1071913 0.5268963
attr(yhat, "tie.struct")
##       tau.a     tau.b n.pairs n.ties.1 n.ties.2 n.ties.both n.dis n.con
## 1 0.4878165 0.4879331    4186        2        0           0  1071  3113
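To make the tie structure concrete, the following base R sketch counts pair types for two vectors and derives \(\tau_a\) and \(\tau_b\) from them. The helper pair_counts is hypothetical and not part of gemmR, and the mapping of its tie counts onto n.ties.1 and n.ties.2 is an assumption; the standard definitions of \(\tau_a\) and \(\tau_b\) do, however, reproduce the values reported above.

```r
# pair_counts() is a hypothetical helper written for illustration;
# it is not part of gemmR.
pair_counts <- function(y, yhat) {
  idx <- combn(length(y), 2)                   # every unordered pair (i, j)
  dy <- sign(y[idx[1, ]] - y[idx[2, ]])        # pair direction on the outcome
  dp <- sign(yhat[idx[1, ]] - yhat[idx[2, ]])  # pair direction on the prediction

  n_pairs <- ncol(idx)
  n_con <- sum(dy * dp > 0)                    # concordant pairs
  n_dis <- sum(dy * dp < 0)                    # discordant pairs
  ties_y_only <- sum(dy == 0 & dp != 0)        # tied on the outcome only
  ties_p_only <- sum(dy != 0 & dp == 0)        # tied on the prediction only
  ties_both <- sum(dy == 0 & dp == 0)          # tied on both

  # tau-a ignores ties; tau-b shrinks the denominator by the tied pairs.
  tau_a <- (n_con - n_dis) / n_pairs
  tau_b <- (n_con - n_dis) /
    sqrt((n_pairs - ties_y_only - ties_both) *
         (n_pairs - ties_p_only - ties_both))

  c(tau.a = tau_a, tau.b = tau_b, n.pairs = n_pairs,
    ties.y.only = ties_y_only, ties.p.only = ties_p_only,
    ties.both = ties_both, n.dis = n_dis, n.con = n_con)
}

# Toy vectors with one tie in each variable.
toy <- pair_counts(y = c(1, 2, 2, 4, 5, 6), yhat = c(1, 3, 2, 2, 6, 5))

# The same formulas applied to the counts reported in the output above.
reported_tau_a <- (3113 - 1071) / 4186
reported_tau_b <- (3113 - 1071) / sqrt((4186 - 2) * (4186 - 0))
```

The tau.b entry for the toy vectors agrees with cor(y, yhat, method = "kendall"), which also corrects for ties.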
The information criteria calculated by gemmR are based on ordinal statistics and cannot be directly compared with likelihood-based criteria. gemmR includes a number of methods so that traditional information criteria can be easily extracted for comparison with other models.
logLik(mod)
## 'log Lik.' -330.7949 (df=4)
AIC(mod)
## [1] 669.5899
BIC(mod)
## [1] 679.677
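These values follow from the usual definitions, which can be checked by hand. The sketch below assumes the sample size is the one implied by the 4186 pairs in the tie structure, since \(n(n-1)/2 = 4186\) gives \(n = 92\); the variable names are illustrative only.

```r
# Recompute AIC and BIC from the reported log-likelihood.
log_lik <- -330.7949  # reported by logLik(mod)
k <- 4                # df reported by logLik(mod)
n_pairs <- 4186       # n.pairs from the tie structure

# Assumption: solve n * (n - 1) / 2 = n_pairs for the number of observations.
n <- (1 + sqrt(1 + 8 * n_pairs)) / 2

aic <- -2 * log_lik + 2 * k       # matches AIC(mod) up to rounding
bic <- -2 * log_lik + k * log(n)  # matches BIC(mod) up to rounding
```

Because these criteria are computed from a log-likelihood, they are on the same footing as AIC and BIC from lm or glm fits to the same outcome, unlike the ordinal criteria shown by summary.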
Chrabaszcz, Anna, and Nan Jiang. 2014. “The Role of the Native Language in the Use of the English Nongeneric Definite Article by L2 Learners: A Cross-Linguistic Comparison.” Second Language Research 30 (3). SAGE Publications: 351–79.
Dougherty, Michael R, and Rick P Thomas. 2012. “Robust Decision Making in a Nonlinear World.” Psychological Review 119 (2). American Psychological Association: 321.
Dougherty, Michael R, Rick P Thomas, Ryan P Brown, Jeffrey S Chrabaszcz, and Joe W Tidwell. 2014. “An Introduction to the General Monotone Model with Application to Two Problematic Datasets.”
Tidwell, Joe W, Michael R Dougherty, Jeffrey R Chrabaszcz, Rick P Thomas, and Jorge L Mendoza. 2014. “What Counts as Evidence for Working Memory Training? Problems with Correlated Gains and Dichotomization.” Psychonomic Bulletin & Review 21 (3). Springer: 620–28.