These functions interpret an expression written in SOP (sum of products) form, or in canonical
DNF (disjunctive normal form), for both crisp and multivalue QCA. The function compute()
calculates set membership scores based on a SOP expression applied to a calibrated data set.
In crisp set notation, upper case letters denote the presence of a causal condition, and lower case letters denote its absence. A tilde is recognized as a negation, even in combination with upper or lower case letters.
A function similar to compute() was initially written by Lewandowski (2015) but the actual code in these functions has been completely re-written and
expanded with more extensive functionality (see details and examples below).
The function simplify() transforms any expression (most notably a POS, product of sums) into a simpler sum of products, minimizing it to the simplest
equivalent logical expression. It provides a software implementation of the intersection
examples presented by Ragin (1987: 144-147), extended to multi-value sets.
compute(expression = "", data, separate = FALSE)
simplify(expression = "", snames = "", noflevels, use.tilde = FALSE)
expression | String: a QCA expression written in sum of products form.
snames | A string containing the sets' names, separated by commas.
noflevels | Numerical vector containing the number of levels for each set.
use.tilde | Logical, use tilde to negate bivalent conditions.
data | A dataset with binary cs, mv and fs data.
separate | Logical, perform computations on individual, separate paths.
An expression written in SOP (sum of products) form is a "union of intersections", for example A*B + B*c.
The DNF (disjunctive normal form) is also a sum of products, with the restriction
that each product has to contain all literals. The equivalent DNF expression is: A*B*C + A*B*c + a*B*c.
The same expression can be written in multivalue notation: A{1}*B{1} + B{1}*C{0}.
Both types of expressions are valid, and yield the same result on the same dataset.
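As a minimal sketch of how such an expression maps onto data, the base R snippet below evaluates A*B + B*c on a small made-up crisp dataset. It assumes the conventional set operators (intersection as minimum, union as maximum, negation as 1 minus the score). The data frame dat and the objects sop, dnf and mv are hypothetical and not part of the package; compute() itself may be implemented differently.

```r
# Hypothetical crisp dataset with three causal conditions (not from the package)
dat <- data.frame(
  A = c(1, 1, 0, 0, 1),
  B = c(1, 1, 1, 0, 0),
  C = c(1, 0, 0, 0, 1)
)

# Assumed operators: intersection = pmin, union = pmax, negation = 1 - x
# Crisp notation: A*B + B*c
sop <- with(dat, pmax(pmin(A, B), pmin(B, 1 - C)))

# The same expression in canonical DNF: A*B*C + A*B*c + a*B*c
dnf <- with(dat, pmax(pmin(A, B, C), pmin(A, B, 1 - C), pmin(1 - A, B, 1 - C)))

# Multivalue notation A{1}*B{1} + B{1}*C{0}: each literal tests for a value
mv <- with(dat, pmax(
  pmin(as.numeric(A == 1), as.numeric(B == 1)),
  pmin(as.numeric(B == 1), as.numeric(C == 0))
))

print(cbind(dat, sop, dnf, mv))
```

On this crisp data the three vectors are identical, which illustrates that the notations describe the same set.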
For multivalue notation, causal conditions are expected as upper case letters, and they will be
converted to upper case by default. Expressions can contain multiple values to translate, separated
by a comma. If B was a multivalue causal condition, an expression could be: A{1} + B{1,2}*C{0}.
In this example, all values in B equal to either 1 or 2 will be converted to 1, and the rest of the (multi)values will be converted to 0.
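The recoding described above can be sketched in base R as follows. The vectors A, B and C are made-up illustration data, and the use of pmin/pmax for intersection and union is an assumption about the usual set operators, not a description of the package internals.

```r
# Hypothetical conditions: A and C are binary, B is multivalue (levels 0, 1, 2)
A <- c(0, 0, 1, 0, 1, 0)
B <- c(0, 1, 2, 1, 0, 2)
C <- c(0, 1, 0, 0, 1, 1)

# B{1,2}: values equal to 1 or 2 become 1, every other value becomes 0
B_12 <- as.numeric(B %in% c(1, 2))

# A{1} and C{0} are recoded the same way
A_1 <- as.numeric(A == 1)
C_0 <- as.numeric(C == 0)

# A{1} + B{1,2}*C{0}, assuming intersection = pmin and union = pmax
result <- pmax(A_1, pmin(B_12, C_0))

print(B_12)
print(result)
```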
These functions automatically detect the use of a tilde "~" as a negation for a particular
causal condition. ~A does two things: it identifies the presence of causal
condition A (because it was specified in upper case) and it recognizes that it
must be negated, because of the tilde. It works even combined with lower case names:
~a is interpreted as A.
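One way to picture this rule is that the tilde and the lower case spelling each negate once, so two negations cancel out. The helper literal_present() below is a hypothetical illustration written for this page; it is not a function of the package and only handles single crisp literals.

```r
# Hypothetical helper: TRUE if a crisp literal denotes the presence of a condition
literal_present <- function(literal) {
  has_tilde <- startsWith(literal, "~")
  name <- sub("^~", "", literal)
  # a name counts as "lower case" only if it contains no upper case letters
  is_lower <- name == tolower(name)
  # each marker negates once; two negations cancel each other
  !xor(has_tilde, is_lower)
}

literals <- c("A", "a", "~A", "~a")
present <- literal_present(literals)
print(setNames(present, literals))
```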
To negate a multivalue condition using a tilde, the number of levels should be supplied
(see examples below). Improvements in version 2.5 allow for intersections
between multiple levels of the same condition. For a causal condition with 3 levels (0, 1 and 2)
the expression ~A{0,2}*A{1,2} is equivalent to A{1},
while A{0}*A{1} results in the empty set.
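The level arithmetic in this paragraph can be checked with base R set operations. This is only a sketch of the reasoning, treating each literal as the set of levels it admits; the helper negate_levels() is hypothetical and not part of the package.

```r
# Levels of the hypothetical three-level condition A
levels_A <- 0:2

# Negation keeps the levels that are NOT listed; this is why the
# number of levels must be known before a tilde can be applied
negate_levels <- function(values, levels) setdiff(levels, values)

# ~A{0,2} admits only level 1
not_02 <- negate_levels(c(0, 2), levels_A)

# ~A{0,2}*A{1,2}: intersection of the admitted levels, i.e. A{1}
res1 <- intersect(not_02, c(1, 2))

# A{0}*A{1}: no level satisfies both, i.e. the empty set
res2 <- intersect(0, 1)

print(res1)
print(length(res2))
```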
The number of levels, as well as the set names, can be automatically detected from a dataset via
the argument data. Arguments snames and noflevels have
precedence over data, when specified.
The use of the product operator * is redundant when the set names are single
letters (for example AD + Bc), and is also redundant for multivalue data, where
product terms can be separated by using the curly brackets notation.
When conditions are binary and their names have multiple letters (for example
AA + CC*bb), the use of the product operator * is preferable but the
function manages to translate an expression even without it (AA + CCbb) by searching
deep in the space of the conditions' names, at the cost of slowing down for a high number of causal
conditions. For this reason, an arbitrary limit of 7 causal conditions (snames) is imposed
on expressions written without the product operator.
For the function simplify(), if a tilde is present in the expression,
the argument use.tilde is automatically activated. For Boolean expressions, the simplest
equivalent logical expression can result in the empty set, if the conditions cancel each other out.
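The equivalence between a product of sums and its simplified form can be verified with a truth table. The base R sketch below does not perform any minimization; it only checks that (A + B)(A + ~B) and A agree on every combination of two crisp conditions, and that a contradiction such as A*~A is false everywhere. The objects tt, pos, sop and empty are names chosen for this illustration.

```r
# All combinations of two binary conditions
tt <- expand.grid(A = c(FALSE, TRUE), B = c(FALSE, TRUE))

# Product of sums (A + B)(A + ~B), evaluated row by row
pos <- with(tt, (A | B) & (A | !B))

# The simpler sum of products it reduces to: A
sop <- tt$A

# A contradiction such as A*~A holds in no row: the empty set
empty <- with(tt, A & !A)

print(cbind(tt, pos, sop, empty))
```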
For function compute(), a vector of set membership values.
For function simplify(), a character expression.
Ragin, C.C. (1987) The Comparative Method: Moving beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.
Lewandowski, J. (2015) QCAtools: Helper functions for QCA in R. R package version 0.1
# for compute()
compute("DEV*ind + URB*STB", data = LF)
 [1] 0.27 0.89 0.91 0.16 0.58 0.19 0.31 0.09 0.13 0.72 0.34 0.99 0.02 0.01 0.03
[16] 0.20 0.33 0.98

data(CVF)
compute("DEV*ind + URB*STB", data = LF, separate = TRUE)
   DEV*ind URB*STB
1     0.27    0.12
2     0.00    0.89
3     0.10    0.91
4     0.16    0.07
5     0.58    0.03
6     0.19    0.03
7     0.04    0.31
8     0.04    0.09
9     0.07    0.13
10    0.72    0.05
11    0.34    0.10
12    0.06    0.99
13    0.02    0.00
14    0.01    0.01
15    0.01    0.03
16    0.03    0.20
17    0.33    0.13
18    0.00    0.98

# for simplify()
simplify("(A + B)(A + ~B)")
S1: A

# to force a certain order of the set names
simplify("(URB + LIT*~DEV)(~LIT + ~DEV)", snames = "DEV, URB, LIT")
S1: ~DEV*LIT + URB*~LIT

# multilevel conditions can also be specified (and negated)
simplify("(A{1} + ~B{0})(B{1} + C{0})", snames = "A, B, C", noflevels = c(2, 3, 2))
S1: B{1} + A{1}C{0} + B{2}C{0}

# in Ragin's (1987) book, the equation E = SG + LW is the result
# of the Boolean minimization for the ethnic political mobilization.

# intersecting the reactive ethnicity perspective (R = lw)
# with the equation E (page 144)
simplify("lw(SG + LW)", snames = "S, L, W, G")
S1: SlwG

# resources for size and wealth (C = SW) with E (page 145)
simplify("SW(SG + LW)", snames = "S, L, W, G")
S1: SLW + SWG

# and factorized
factorize(simplify("SW(SG + LW)", snames = "S, L, W, G"))
F1: SW(G + L)

# developmental perspective (D = Lg) and E (page 146)
simplify("Lg(SG + LW)", snames = "S, L, W, G", use.tilde = TRUE)
S1: LW~G

# subnations that exhibit ethnic political mobilization (E) but were
# not hypothesized by any of the three theories (page 147)
# ~H = ~(lw + SW + Lg) = GLs + GLw + GsW + lsW
simplify("(GLs + GLw + GsW + lsW)(SG + LW)", snames = "S, L, W, G")
S1: sLWG + SLwG