
Title: VALID Inference for Clusters Separation Testing
Version: 0.1.0
Author: Benjamin Hivert
Maintainer: Benjamin Hivert <benjamin.hivert@u-bordeaux.fr>
Description: Given a partition resulting from any clustering algorithm, the implemented tests allow valid post-clustering inference by testing if a given variable significantly separates two of the estimated clusters. Methods are detailed in: Hivert B, Agniel D, Thiebaut R & Hejblum BP (2022). "Post-clustering difference testing: valid inference and practical considerations", <doi:10.48550/arXiv.2210.13172>.
Encoding: UTF-8
RoxygenNote: 7.2.2
Imports: diptest, dplyr
Depends: R (≥ 3.6)
License: MIT + file LICENSE
NeedsCompilation: no
Packaged: 2022-11-30 15:06:52 UTC; benjaminhivert
Repository: CRAN
Date/Publication: 2022-12-01 08:20:02 UTC

Merged version of the selective test

Description

Merged version of the selective test

Usage

merge_selective_inference(X, k1, k2, g, ndraws = 2000, cl_fun, cl)

Arguments

X

The data matrix on which the clustering is applied

k1

The first cluster of interest

k2

The second cluster of interest

g

The variables for which the test is applied

ndraws

The number of Monte-Carlo samples

cl_fun

The clustering function used to build clusters

cl

The cluster labels of the observations, as returned by the cl_fun function

Value

A list with the following elements

Examples

X <- matrix(rnorm(200),ncol = 2)
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 4))
}
cl <- hcl_fun(X)
plot(X, col=cl)
# Note that in practice the value of ndraws (the number of Monte-Carlo simulations) must be higher
test_var1 <- merge_selective_inference(X, k1 = 1, k2 = 4, g = 1, ndraws = 100, cl_fun = hcl_fun, cl = cl)
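
Before choosing k1 and k2, it can help to see how the estimated clusters are arranged along the variable being tested. The following is a minimal sketch using base R only; the names cluster_sizes and cluster_means are illustrative and are not part of the package.

```r
# Illustrative sketch (base R only): summarise the clusters along variable g
set.seed(1)  # fixed seed so the sketch is reproducible
X <- matrix(rnorm(200), ncol = 2)
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 4))
}
cl <- hcl_fun(X)

g <- 1
cluster_sizes <- table(cl)                 # number of observations per cluster
cluster_means <- tapply(X[, g], cl, mean)  # mean of variable g in each cluster
print(cluster_sizes)
print(sort(cluster_means))                 # clusters ordered along variable g
```

Ordering the cluster means this way only gives a descriptive view of the partition; it is not a test and does not replace the functions documented here.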

Multimodality test for post clustering variable involvement

Description

Multimodality test for post clustering variable involvement

Usage

test_multimod(X, g, cl, k1, k2)

Arguments

X

The data matrix on which the clustering is applied

g

The variable on which the test is applied

cl

The cluster labels of the observations, obtained from a clustering algorithm

k1

The first cluster of interest

k2

The second cluster of interest

Value

A list with the following elements

Examples

X <- matrix(rnorm(200),ncol = 2)
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2))
}
cl <- hcl_fun(X)
plot(X, col=cl)
test_var1 <- test_multimod(X, g=1, k1=1, k2=2, cl = cl)
test_var2 <- test_multimod(X, g=2, k1=1, k2=2, cl = cl)
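
To give some intuition for what a multimodality check on a single variable can look like, the sketch below pools variable g over the two clusters of interest and applies Hartigan's dip test from the diptest package (listed under Imports). This is an assumption made for illustration, not the package's implementation: test_multimod() may proceed differently, and the names in_pair and x_pooled are hypothetical.

```r
# Illustrative sketch, NOT the package's implementation: apply Hartigan's dip
# test to variable g, pooled over the two clusters of interest.
set.seed(1)  # fixed seed so the sketch is reproducible
X <- matrix(rnorm(200), ncol = 2)
cl <- as.factor(cutree(hclust(dist(X), method = "ward.D2"), k = 2))

g <- 1
k1 <- 1
k2 <- 2
in_pair <- cl %in% c(k1, k2)  # observations belonging to either cluster
x_pooled <- X[in_pair, g]     # variable g restricted to those observations

if (requireNamespace("diptest", quietly = TRUE)) {
  # dip.test() tests the null hypothesis that the sample is unimodal
  print(diptest::dip.test(x_pooled))
}
```

The call is guarded with requireNamespace() so that the sketch still runs, without printing a test result, when diptest is not installed.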

Selective inference for post-clustering variable involvement

Description

Selective inference for post-clustering variable involvement

Usage

test_selective_inference(
  X,
  k1,
  k2,
  g,
  ndraws = 2000,
  cl_fun,
  cl = NULL,
  sig = NULL
)

Arguments

X

The data matrix on which the clustering is applied

k1

The first cluster of interest

k2

The second cluster of interest

g

The variables for which the test is applied

ndraws

The number of Monte-Carlo samples

cl_fun

The clustering function used to build clusters

cl

The cluster labels of the observations, as returned by the cl_fun function

sig

The estimated standard deviation. Defaults to NULL, in which case the standard deviation is estimated using only the observations in the two clusters of interest
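
Judging from the examples in this manual, cl_fun takes a data matrix and returns one cluster label per row, as a factor. The sketch below writes such a function around k-means from base R and checks that contract. The name km_fun is hypothetical, and whether the selective test behaves well with a randomly initialised algorithm such as k-means is not stated here, so treat this as an assumption to verify.

```r
# Hypothetical alternative clustering function based on k-means (base R).
# It follows the same contract as hcl_fun in the examples: matrix in,
# factor of cluster labels (one per row) out.
set.seed(1)  # fixed seed so the sketch is reproducible
X <- matrix(rnorm(200), ncol = 2)

km_fun <- function(x) {
  as.factor(kmeans(x, centers = 2, nstart = 10)$cluster)
}

cl_km <- km_fun(X)
print(table(cl_km))  # cluster sizes
```

Note that k-means numbers its clusters arbitrarily, so the labels 1 and 2 may swap between runs unless the seed is fixed.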

Value

A list with the following elements

Note

This function is adapted from clusterpval::test_clusters_approx() by Gao et al. (2022), available on GitHub: https://github.com/lucylgao/clusterpval

References

Gao, L. L., Bien, J., & Witten, D. (2022). Selective inference for hierarchical clustering. Journal of the American Statistical Association, (just-accepted), 1-27.

Examples

X <- matrix(rnorm(200),ncol = 2)
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2))
}
cl <- hcl_fun(X)
plot(X, col=cl)
# Note that in practice the value of ndraws (the number of Monte-Carlo simulations) must be higher
test_var1 <- test_selective_inference(X, k1=1, k2=2, g=1, ndraws =100, cl_fun = hcl_fun, cl = cl)
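
The example above tests a single variable. A possible way to repeat the test for every column of X is sketched below. It assumes the package is installed under the name VALIDICLUST, which does not appear on this page and should be treated as an assumption. Because the elements of the returned list are not described here, the sketch only inspects the results with str() rather than accessing named fields.

```r
# Sketch: run the selective test on each variable in turn.
set.seed(1)  # fixed seed so the sketch is reproducible
X <- matrix(rnorm(200), ncol = 2)
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2))
}
cl <- hcl_fun(X)

results <- list()
# Assumption: the package name is VALIDICLUST; skip the calls if it is absent
if (requireNamespace("VALIDICLUST", quietly = TRUE)) {
  for (g in seq_len(ncol(X))) {
    results[[g]] <- VALIDICLUST::test_selective_inference(
      X, k1 = 1, k2 = 2, g = g, ndraws = 100, cl_fun = hcl_fun, cl = cl
    )
  }
  str(results, max.level = 2)  # inspect the structure of each result
}
```

As in the example above, ndraws = 100 only keeps the run short; a higher value is needed in practice.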


