Find calibration thresholds

Usage

findTh(x, n = 1, hclustm = "ward.D2", distm = "canberra", ...)

Arguments

x
A numerical causal condition.
n
The number of thresholds to find.
hclustm
The agglomeration (clustering) method to be used.
distm
The distance measure to be used.
...
Other arguments (mainly for backwards compatibility).

Description

This function automatically finds calibration thresholds that split a numerical causal condition into separate groups.

Details

Calibration into crisp sets assumes expert knowledge about the best threshold(s) that separate the raw data into the most meaningful groups.

In the absence of such knowledge, an automatic procedure can help group the raw data using statistical clustering techniques.

The number of groups depends on the number of thresholds: one threshold splits the data into two groups, two thresholds split it into three groups, and so on.
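As a minimal illustration of this relationship, the base R function findInterval() can assign each value to a group given a vector of thresholds. The data and the thresholds below are taken from the Examples section; the use of findInterval() is only an assumption made for illustration, not a description of how this package calibrates.

```r
gdp <- c(460, 500, 900, 2000, 2100, 2400, 15000, 16000, 20000)

# one threshold: two groups, coded 0 and 1
one <- findInterval(gdp, 8700)

# two thresholds: three groups, coded 0, 1 and 2
two <- findInterval(gdp, c(1450, 8700))

# the number of distinct groups is the number of thresholds plus one
length(unique(one))
length(unique(two))
```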

Previous versions of this function had an argument named groups instead of n; the function remains backwards compatible.

For more details about how many groups can be formed with how many thresholds, see ?cutree.

More details about the clustering techniques used in this function can be found with ?hclust, and more details about the different distance measures with ?dist.

Starting with version 2.2, the default values of the arguments hclustm and distm have changed. Previously they were left at the defaults of the original functions hclust() and dist() ("complete" and "euclidean", respectively); the new defaults better identify groupings for QCA, see the Examples.
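The effect of the two sets of defaults can be inspected directly with the base R functions dist(), hclust() and cutree(). The sketch below reuses the GDP data from the Examples section and asks for three groups; the object names old_groups and new_groups are hypothetical, and the code only illustrates the underlying clustering, not the internals of findTh().

```r
gdp <- c(460, 500, 900, 2000, 2100, 2400, 15000, 16000, 20000)

# old defaults: complete linkage on Euclidean distances
old_tree <- hclust(dist(gdp, method = "euclidean"), method = "complete")
old_groups <- cutree(old_tree, k = 3)

# current defaults: Ward's criterion on Canberra distances
new_tree <- hclust(dist(gdp, method = "canberra"), method = "ward.D2")
new_groups <- cutree(new_tree, k = 3)

# group membership of each value under the two settings
data.frame(gdp, old_groups, new_groups)
```

With the old settings the six smallest values end up in a single group and the largest value forms a group of its own, whereas the current settings separate the data into three groups of three values each.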

The method "ward.D2" implements Ward's original minimum variance clustering criterion, which finds compact, spherical clusters (well suited to QCA). The "canberra" distance is also well suited to finding compact clusters, with the caveat that it requires non-negative values (the most common case in QCA research).
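One way to turn such a clustering into thresholds is to sort the values, cut the tree into n + 1 groups, and take the midpoint between neighbouring values that fall into different groups. The function below is a minimal sketch of that idea using only base R; sketch_thresholds() is a hypothetical name, the midpoint rule is an assumption that happens to agree with the results shown in the Examples section, and the code is not presented as the actual implementation of findTh(). It also assumes that every group covers a contiguous range of the sorted values.

```r
# hypothetical helper, not part of the package
sketch_thresholds <- function(x, n = 1, hclustm = "ward.D2", distm = "canberra") {
    x <- sort(x)  # sorting also drops missing values
    tree <- hclust(dist(x, method = distm), method = hclustm)
    groups <- cutree(tree, k = n + 1)

    # positions where group membership changes between neighbours
    breaks <- which(diff(groups) != 0)

    # assumed rule: a threshold is the midpoint between the two neighbours
    (x[breaks] + x[breaks + 1]) / 2
}

gdp <- c(460, 500, 900, 2000, 2100, 2400, 15000, 16000, 20000)

sketch_thresholds(gdp)         # 8700
sketch_thresholds(gdp, n = 2)  # 1450 8700
sketch_thresholds(gdp, n = 2, hclustm = "complete", distm = "euclidean")  # 8700 18000
```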

Value

A numeric vector of length n.

Examples

# hypothetical list of country GDPs, clearly separated
# into either two or three groups
gdp <- c(460, 500, 900, 2000, 2100, 2400, 15000, 16000, 20000)

# find one threshold to separate into two groups
findTh(gdp)
[1] 8700

# find two thresholds to separate into three groups
findTh(gdp, n = 2)
[1] 1450 8700

# using the (old) defaults from the original functions
findTh(gdp, n = 2, hclustm = "complete", distm = "euclidean")
[1] 8700 18000
# 8700 18000 (?)

See also

cutree, hclust, dist

Author

Adrian Dusa