findTh(x, n = 1, hclustm = "ward.D2", distm = "canberra", ...)
This function automatically finds calibration thresholds for a numerical causal condition, so that its values can be split into separate groups.
Calibration into crisp sets assumes expert knowledge about the best threshold(s) that separate the raw data into the most meaningful groups.
In the absence of such knowledge, an automatic procedure might help group the raw data using statistical clustering techniques.
The number of groups depends on the number of thresholds: one threshold splits the data into two groups, two thresholds split it into three groups, and so on.
Previous versions of this function had an argument named groups instead of argument n, but the function remains backwards compatible.
For more details about how many groups can be formed with how many thresholds, see ?cutree.
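As an illustration of the relation between thresholds and groups, the following minimal sketch uses only base R. The GDP values and the two thresholds are copied from the examples on this page; the object names th and groups are chosen here for illustration only.

```r
# GDP values and thresholds taken from the examples on this page
gdp <- c(460, 500, 900, 2000, 2100, 2400, 15000, 16000, 20000)
th <- c(1450, 8700)

# findInterval() counts how many thresholds each value has reached,
# so two thresholds produce the group codes 0, 1 and 2 (three groups)
groups <- findInterval(gdp, th)

# number of cases falling in each group
table(groups)
```

With n thresholds the codes run from 0 to n, which is why n thresholds always correspond to n + 1 groups.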
More details about the clustering techniques used in this function are found using ?hclust, and more details about the different distance measures can be found with ?dist. Before version 2.2, this function used their default values.
Starting with version 2.2, this function changed the default values for arguments hclustm and distm. Previously, they were left at their defaults from the original functions hclust() and dist(), but now they have changed to better identify groupings for QCA analysis, see examples.
The method "ward.D2" implements Ward's original minimum variance clustering criterion, which finds compact, spherical clusters (well suited to QCA analyses). The distance "canberra" is also well suited to finding compact clusters, with the caveat that it needs non-negative values (which is also the most common case in QCA research).
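The way clustering can lead to thresholds may be sketched with base R alone. The helper below, sketchThresholds(), is hypothetical and not part of the package: it assumes that each threshold is placed midway between the two neighbouring values that fall into different clusters, which is consistent with the values reported in the examples but is not claimed to be the actual implementation of findTh().

```r
# Hypothetical helper for illustration only, not the package's own code
sketchThresholds <- function(x, n = 1, hclustm = "ward.D2", distm = "canberra") {
    # sort the values so that cluster members sit next to each other
    xs <- sort(x)

    # hierarchical clustering on the chosen distance measure
    tree <- hclust(dist(xs, method = distm), method = hclustm)

    # n thresholds require n + 1 groups
    groups <- cutree(tree, k = n + 1)

    # positions where the cluster membership changes between neighbours
    breaks <- which(diff(groups) != 0)

    # assumption: a threshold is the midpoint between the two neighbours
    (xs[breaks] + xs[breaks + 1]) / 2
}

gdp <- c(460, 500, 900, 2000, 2100, 2400, 15000, 16000, 20000)
sketchThresholds(gdp, n = 2)
```

Passing hclustm = "complete" and distm = "euclidean" to the same helper gives a way to compare the behaviour of the old defaults on the same data.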
The function returns a numeric vector of length n.
# hypothetical list of country GDPs, clearly separated
# into either two or three groups
gdp <- c(460, 500, 900, 2000, 2100, 2400, 15000, 16000, 20000)

# find one threshold to separate into two groups
findTh(gdp)
# 8700

# find two thresholds to separate into three groups
findTh(gdp, n = 2)
# 1450 8700

# using the (old) defaults from the original functions
findTh(gdp, n = 2, hclustm = "complete", distm = "euclidean")
# 8700 18000