Calculate parameters of fit

Description

This function returns the parameters of fit for a set relation: inclusion (consistency) and coverage, together with PRI for sufficiency and RoN for necessity.

Usage

pof(setms, outcome, data, relation = "necessity", inf.test = "", incl.cut = c(0.75, 0.5), add = NULL, ...)

Arguments

setms A data frame of (calibrated) set memberships, or a matrix of implicants, or a vector of row numbers from the implicant matrix, or a character expression in sum of products (SOP) form.
outcome The name of the outcome column from a calibrated data frame, or the actual numerical column from the data frame, representing the outcome.
data The calibrated data frame, in case the outcome is a name.
relation The set relation to outcome, either "necessity" or "sufficiency", partial words like "suf" being accepted (see examples).
inf.test Specifies the statistical inference test to be performed (currently only "binom") and the critical significance level. It can be either a vector of length 2, or a single string containing both, separated by a comma.
incl.cut The inclusion cutoff(s): either a single value for the presence of the output, or a vector of length 2, the second for the absence of the output.
add A function, or a list containing functions, to add more parameters of fit.
... Other arguments (mainly for backward compatibility).

Details

This is one of the most flexible functions in the QCA package. Depending on particular situations, its arguments can be provided in various formats which are automatically recognized and treated accordingly.

When specified as a data frame, the argument setms contains any kind of set membership scores:

- calibrated causal conditions from the original data,

- membership scores from the resulting combinations (component coms) of function superSubset(),

- prime implicant membership scores (component pims) from function minimize(),

- any other, custom created combinations of set memberships.

When specified as a matrix, setms contains the crisp causal combinations similar to those found in the truth table. If some of the causal conditions have been minimized, they can be replaced by the numerical value -1 (see examples section). The number of columns in the matrix should be equal to the number of causal conditions in the original data.
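The matrix coding described above can be illustrated with a minimal base R sketch. The data frame dat, the matrix combs and the helper row_membership() are hypothetical names invented for this illustration; they are not part of the QCA package, and the package's internal computation may differ.

```r
# Hypothetical crisp data with three causal conditions (illustration only)
dat <- data.frame(A = c(1, 1, 0, 0, 1),
                  B = c(1, 0, 1, 0, 1),
                  C = c(0, 1, 1, 0, 1))

# Two causal combinations, one per row, one column per condition:
# row 1 is A*~B (C minimized), row 2 is B*C (A minimized)
combs <- matrix(c( 1, 0, -1,
                  -1, 1,  1),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, names(dat)))

# Membership of every case in one combination (assumes at least one
# condition is retained): 1 keeps the condition, 0 negates it,
# -1 drops it from the conjunction
row_membership <- function(row, data) {
  keep <- which(row >= 0)
  scores <- lapply(keep, function(j) {
    if (row[j] == 1) data[[j]] else 1 - data[[j]]
  })
  do.call(pmin, scores)   # conjunction = minimum across retained conditions
}

# One column of membership scores per combination
memberships <- apply(combs, 1, row_membership, data = dat)
print(memberships)
```

The same helper also works on fuzzy scores, because the conjunction is taken as the minimum and negation as subtraction from 1.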

More generally, setms can be a numerical vector of line numbers from the implicant matrix (see function createMatrix()), which are automatically transformed into their corresponding set membership scores.

The argument setms can also be a string expression, written in sum of products (SOP) form.
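As a rough illustration of what a SOP expression stands for, the sketch below evaluates "natpride + GEOCON" by hand in base R. The membership scores are made-up values (not the CVF data), and the variable names are assumptions for this example only.

```r
# Hypothetical fuzzy membership scores (illustration only)
NATPRIDE <- c(0.9, 0.2, 0.6, 0.1)
GEOCON   <- c(0.3, 0.8, 0.4, 0.05)

# Lower case marks the negation of a condition: subtraction from 1
natpride <- 1 - NATPRIDE

# "+" is the logical OR, taken as the maximum of the terms;
# a product such as A*B would use pmin() instead
sop <- pmax(natpride, GEOCON)
print(sop)
```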

In all situations where setms is anything other than a data frame, the original data is required to generate the set memberships.

If character, the argument outcome is the name of the column to be explained from the original data (it is good practice to specify it using upper case letters, although it is converted to upper case by default anyway).

If the outcome column is multi-value, the argument outcome should use the standard curly-bracket notation X{value}. Multiple values are allowed, separated by a comma (for example X{1,2}). Negation of the outcome can also be performed using the tilde ~ operator, for example ~X{1,2}, which is interpreted as: "all values in X except 1 and 2" and it becomes the new outcome to be explained.
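The curly-bracket notation can be read as a recoding of the multi-value column into a crisp set. The following base R sketch shows that reading on a hypothetical column X; it is an illustration of the interpretation given above, not the package's own recoding code.

```r
# Hypothetical multi-value outcome with values 0, 1 and 2 (illustration only)
X <- c(0, 1, 2, 2, 0, 1)

# X{1,2}: cases whose value is 1 or 2 are members of the outcome set
X_12 <- as.numeric(X %in% c(1, 2))

# ~X{1,2}: all values in X except 1 and 2
not_X_12 <- 1 - X_12

print(rbind(X, X_12, not_X_12))
```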

The argument outcome can also be a numerical vector of set membership values, either directly from the original data frame, or a recoded version (if originally multi-value).

The argument inf.test provides the possibility to perform statistical inference tests, comparing the calculated inclusion score with a pair of thresholds (ic1 and ic0) specified in the argument incl.cut. Currently, it can only perform binomial tests ("binom"), which means that the data should be binary crisp (neither multi-value nor fuzzy).

If the critical significance level is not provided, the default level of 0.05 is taken.

The resulting object will contain the calculated p-values (pval1 and pval0) from two separate, one-tailed tests with the alternative hypothesis that the true inclusion score is:

- greater than ic1 (the inclusion cutoff for an output value of 1)

- greater than ic0 (the inclusion cutoff for an output value of 0)

It should be noted that statistical tests perform well only when the number of cases is large; otherwise they are usually not significant.
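The two one-tailed tests can be approximated in base R with binom.test(). This is a sketch for the sufficiency case on hypothetical binary crisp data; the variable names are invented here, and the exact counts the package passes to the test are an assumption.

```r
# Hypothetical binary crisp condition X and outcome Y (illustration only)
X <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1)
Y <- c(1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1)

ic1 <- 0.75   # inclusion cutoff for an output value of 1
ic0 <- 0.5    # inclusion cutoff for an output value of 0

n_x  <- sum(X == 1)            # cases where the condition is present
n_xy <- sum(X == 1 & Y == 1)   # of those, cases where the outcome is present
incl <- n_xy / n_x             # crisp sufficiency inclusion score

# One-tailed tests: is the true inclusion score greater than each cutoff?
pval1 <- binom.test(n_xy, n_x, p = ic1, alternative = "greater")$p.value
pval0 <- binom.test(n_xy, n_x, p = ic0, alternative = "greater")$p.value

print(c(incl = incl, pval1 = pval1, pval0 = pval0))
```

With only ten cases in the condition, an inclusion score of 0.9 is not significantly greater than the 0.75 cutoff at the 0.05 level, which is the small-sample behaviour described above.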

For the necessity relation, the standard measures of inclusion and coverage are supplemented with the RoN (Relevance of Necessity) measure, as suggested by Schneider & Wagemann (2012).
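For orientation, the three necessity measures can be written out in a few lines of base R, using the standard fuzzy-set definitions. The function necessity_fit() and the scores below are hypothetical, and the package's implementation may differ in details such as rounding or missing-value handling.

```r
# Necessity parameters of fit for one condition x and an outcome y
# (hypothetical helper, standard textbook formulas)
necessity_fit <- function(x, y) {
  overlap <- sum(pmin(x, y))                  # intersection of x and y
  c(inclN = overlap / sum(y),                 # how much of y lies within x
    RoN   = sum(1 - x) / sum(1 - pmin(x, y)), # relevance of necessity
    covN  = overlap / sum(x))                 # how much of x is covered by y
}

# Hypothetical fuzzy membership scores (illustration only)
x <- c(0.8, 0.6, 0.9, 0.3, 0.7)
y <- c(0.7, 0.4, 1.0, 0.2, 0.5)

fit <- necessity_fit(x, y)
print(round(fit, 3))
```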

The negation of both setms and outcome is accepted and recognized using the Boolean subtraction from 1. If the names of the conditions are provided via an optional (undocumented) argument conditions, the colnames of the setms object are negated using negate().
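Negation by subtraction from 1 can be tried out on the sufficiency measures with a small base R sketch. The helper sufficiency_fit() uses the standard definitions of inclusion, PRI and coverage; it is a hypothetical function written for this illustration (it does not compute covU), and the scores are made up.

```r
# Sufficiency parameters of fit for one condition x and an outcome y
# (hypothetical helper, standard textbook formulas)
sufficiency_fit <- function(x, y) {
  xy   <- sum(pmin(x, y))          # overlap of x with the outcome
  xyny <- sum(pmin(x, y, 1 - y))   # overlap with the outcome and its negation
  c(inclS = xy / sum(x),
    PRI   = (xy - xyny) / (sum(x) - xyny),
    covS  = xy / sum(y))
}

# Hypothetical fuzzy membership scores (illustration only)
x <- c(0.8, 0.6, 0.9, 0.3, 0.7)
y <- c(0.9, 0.4, 1.0, 0.2, 0.8)

fit_pos   <- sufficiency_fit(x, y)       # x sufficient for y
fit_neg_y <- sufficiency_fit(x, 1 - y)   # x sufficient for the negated outcome
fit_neg_x <- sufficiency_fit(1 - x, y)   # negated x sufficient for y

print(round(rbind(fit_pos, fit_neg_y, fit_neg_x), 3))
```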

The logical argument neg.out is deprecated but still supported for backwards compatibility. Specifying neg.out = TRUE and a tilde ~ in the outcome name do not cancel each other out: either one (or both) signals that the outcome should be negated.

A SOP expression in the argument setms is the only situation in which everything (including the outcome) can be negated using lower case letters, with or without a tilde. Lower case letters and a tilde do cancel each other out: for example, ~X is interpreted as x, while ~x is interpreted as X.
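The cancelling rule amounts to an exclusive OR between "has a tilde" and "is lower case". A minimal base R sketch, using a hypothetical helper is_negated() that is not part of the QCA package:

```r
# TRUE when a single literal from a SOP expression denotes a negated set
# (hypothetical helper, illustration only)
is_negated <- function(literal) {
  has_tilde <- startsWith(literal, "~")
  name      <- sub("^~", "", literal)
  is_lower  <- name == tolower(name)
  xor(has_tilde, is_lower)   # a tilde on a lower case name cancels out
}

print(is_negated(c("X", "x", "~X", "~x")))
```

Here x and ~X are both reported as negated, while X and ~x are not.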

References

Cebotari, V.; Vink, M.P. (2013) “A Configurational Analysis of Ethnic Protest in Europe”. International Journal of Comparative Sociology vol.54, no.4, pp.298-324.

Schneider, C. and Wagemann, C. (2012) Set-Theoretic Methods for the Social Sciences. A Guide to Qualitative Comparative Analysis. Cambridge: Cambridge University Press.

Examples

# -----
# Cebotari & Vink (2013) fuzzy data
data(CVF)

conds <- CVF[, 1:5]
PROTEST <- CVF$PROTEST

# parameters of fit (default is necessity)
pof(conds, PROTEST)

             inclN  RoN    covN
--------------------------------
1  DEMOC     0.741  0.758  0.713
2  ETHFRACT  0.680  0.830  0.755
3  GEOCON    0.904  0.492  0.624
4  POLDIS    0.518  0.878  0.744
5  NATPRIDE  0.708  0.523  0.539
--------------------------------

# parameters of fit negating the conditions
pof(1 - conds, PROTEST)

              inclN  RoN    covN
---------------------------------
1  ~DEMOC     0.564  0.736  0.601
2  ~ETHFRACT  0.661  0.684  0.614
3  ~GEOCON    0.317  0.873  0.601
4  ~POLDIS    0.631  0.517  0.493
5  ~NATPRIDE  0.597  0.952  0.899
---------------------------------

# negating the outcome
pof(conds, 1 - PROTEST)

             inclN  RoN    covN
--------------------------------
1  DEMOC     0.618  0.682  0.580
2  ETHFRACT  0.574  0.760  0.623
3  GEOCON    0.784  0.436  0.529
4  POLDIS    0.335  0.776  0.470
5  NATPRIDE  0.932  0.622  0.693
--------------------------------

# parameters of fit for sufficiency
pof(conds, PROTEST, relation = "suf")

             inclS  PRI    covS   covU
---------------------------------------
1  DEMOC     0.713  0.508  0.741  0.000
2  ETHFRACT  0.755  0.578  0.680  0.002
3  GEOCON    0.624  0.449  0.904  0.052
4  POLDIS    0.744  0.624  0.518  0.000
5  NATPRIDE  0.539  0.279  0.708  0.024
---------------------------------------

# also negating the outcome
pof(conds, 1 - PROTEST, relation = "suf")

             inclS  PRI    covS   covU
---------------------------------------
1  DEMOC     0.580  0.281  0.618  0.001
2  ETHFRACT  0.623  0.349  0.574  0.000
3  GEOCON    0.529  0.309  0.784  0.000
4  POLDIS    0.470  0.221  0.335  0.000
5  NATPRIDE  0.693  0.520  0.932  0.086
---------------------------------------

# -----
# standard analysis of necessity
# using the "coms" component from superSubset()
nCVF <- superSubset(CVF, outcome = "PROTEST", incl.cut = 0.90, cov.cut = 0.6)

# also checking their necessity inclusion score in the negated outcome
pof(nCVF$coms, 1 - PROTEST)

                                     inclN  RoN    covN
--------------------------------------------------------
 1  GEOCON                           0.784  0.436  0.529
 2  DEMOC+ETHFRACT+geocon            0.881  0.440  0.579
 3  DEMOC+ethfract+POLDIS            0.804  0.450  0.545
 4  DEMOC+ETHFRACT+POLDIS            0.821  0.458  0.558
 5  DEMOC+ethfract+natpride          0.809  0.476  0.560
 6  DEMOC+ETHFRACT+natpride          0.801  0.463  0.550
 7  DEMOC+geocon+POLDIS              0.792  0.474  0.550
 8  DEMOC+geocon+natpride            0.779  0.513  0.562
 9  DEMOC+POLDIS+natpride            0.708  0.492  0.515
10  ethfract+POLDIS+natpride         0.815  0.501  0.575
11  democ+ETHFRACT+POLDIS+natpride   0.843  0.491  0.584
12  ETHFRACT+geocon+POLDIS+natpride  0.768  0.533  0.567
--------------------------------------------------------

# -----
# standard analysis of sufficiency
# using the "pims" component from minimize()

# conservative solution
cCVF <- minimize(CVF, outcome = "PROTEST", incl.cut = 0.8, details = TRUE)

# check the parameters of fit for their negations
# (relation is not specified, so the default necessity is reported)
pof(1 - cCVF$pims, PROTEST)

                                          inclN  RoN    covN
-------------------------------------------------------------
1  democ+ethfract+geocon                  0.841  0.452  0.575
2  ethfract+geocon+poldis                 0.790  0.354  0.508
3  democ+ethfract+poldis+NATPRIDE         0.892  0.259  0.526
4  democ+geocon+poldis+natpride           0.893  0.304  0.542
5  DEMOC+ETHFRACT+geocon+POLDIS+NATPRIDE  0.945  0.299  0.567
-------------------------------------------------------------

# -----
# using a SOP expression, translated using the function translate()

# notice that lower case letters mean the absence of a causal condition
pof("natpride + GEOCON => PROTEST", data = CVF)

               inclS  PRI    covS   covU
-----------------------------------------
1  natpride    0.899  0.807  0.597  0.042
2  GEOCON      0.624  0.449  0.904  0.349
3  expression  0.633  0.462  0.946    -
-----------------------------------------

# same for the negation of the outcome
pof("natpride + GEOCON => ~PROTEST", data = CVF)

               inclS  PRI    covS   covU
-----------------------------------------
1  natpride    0.561  0.156  0.381  0.018
2  GEOCON      0.529  0.309  0.784  0.421
3  expression  0.524  0.303  0.803    -
-----------------------------------------

# same using lower letters for the negation
pof("natpride + GEOCON => protest", data = CVF)

               inclS  PRI    covS   covU
-----------------------------------------
1  natpride    0.561  0.156  0.381  0.018
2  GEOCON      0.529  0.309  0.784  0.421
3  expression  0.524  0.303  0.803    -
-----------------------------------------

# necessity is indicated by the reverse arrow
pof("natpride + GEOCON <= PROTEST", data = CVF)

               inclN  RoN    covN
----------------------------------
1  natpride    0.597  0.952  0.899
2  GEOCON      0.904  0.492  0.624
3  expression  0.946  0.468  0.633
----------------------------------

# -----
# more parameters of fit, for instance Haesebrouck's consistency
inclH <- function(x, y) {
    sum(fuzzyand(x, y)) / sum(fuzzyand(x, y) + sqrt(fuzzyor(x - y, 0) * x))
}

pof("natpride + GEOCON => protest", data = CVF, add = inclH)

               inclS  PRI    covS   covU   inclH
------------------------------------------------
1  natpride    0.561  0.156  0.381  0.018  0.517
2  GEOCON      0.529  0.309  0.784  0.421  0.471
3  expression  0.524  0.303  0.803    -    0.467
------------------------------------------------

Author

Adrian Dusa

See also

minimize, superSubset, translate