We propose a new dependence measure \(\nu(Y, \mathbf{X})\), introduced in A New Measure of Dependence: Integrated R2 (Azadkia and Roudaki, 2025), to assess how much a random vector \(\mathbf{X}\) explains a univariate response \(Y\). Let \(Y\) be a random variable and \(\mathbf{X} = (X_1, \ldots, X_p)\) a random vector defined on the same probability space. Let \(\mu\) be the probability law of \(Y\), and let \(S\) be the support of \(\mu\). Define:
$$ \tilde{S} = \begin{cases} S \setminus \{s_{\max}\} & \text{if } S \text{ has a maximum } s_{\max} \\ S & \text{otherwise} \end{cases} $$
We define the measure \(\tilde{\mu}\) on \(S\) as:
$$ \tilde{\mu}(A) = \frac{\mu(A \cap \tilde{S})}{\mu(\tilde{S})}, \quad \text{for measurable } A \subseteq S $$
Then the irdc dependence coefficient is defined as:
$$ \nu(Y, \mathbf{X}) := \int \frac{\mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y > t\} \mid \mathbf{X}])}{\mathrm{Var}(\mathbf{1}\{Y > t\})} \, d\tilde{\mu}(t) $$
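For example, if \(Y\) is Bernoulli with success probability \(p \in (0, 1)\), then \(S = \{0, 1\}\), \(\tilde{S} = \{0\}\), and \(\tilde{\mu}\) is a point mass at \(0\); removing the maximum of the support avoids the degenerate term \(t = 1\), where numerator and denominator both vanish. The definition then reduces to
$$ \nu(Y, \mathbf{X}) = \frac{\mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y > 0\} \mid \mathbf{X}])}{\mathrm{Var}(\mathbf{1}\{Y > 0\})} = \frac{\mathrm{Var}(\mathbb{P}(Y = 1 \mid \mathbf{X}))}{p(1 - p)}. $$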
In contrast, A Simple Measure of Conditional Dependence (Azadkia and Chatterjee, implemented by FOCI::codec) considers:
$$ T(Y, \mathbf{X}) = \frac{\int \mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y \ge t\} \mid \mathbf{X}]) \, d\mu(t)}{\int \mathrm{Var}(\mathbf{1}\{Y \ge t\}) \, d\mu(t)} $$
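To make the contrast concrete, here is a minimal Monte Carlo sketch of the two population quantities (illustrative only, not package code; the Gaussian model and the names sigma, nu_mc, and T_mc are our own choices). It assumes \(X \sim N(0, 1)\) and \(Y = X + \varepsilon\) with \(\varepsilon \sim N(0, \sigma^2)\) independent of \(X\), so that \(\mathbb{E}[\mathbf{1}\{Y > t\} \mid X]\) has a closed form.

# Minimal Monte Carlo sketch (illustrative model, not part of the package):
# X ~ N(0, 1), Y = X + eps with eps ~ N(0, sigma^2), so
# E[1{Y > t} | X] = pnorm((X - t) / sigma) and Y ~ N(0, 1 + sigma^2)
set.seed(1)
sigma <- 0.5
m <- 5000
x <- rnorm(m)                                # Monte Carlo draws of X
t_draws <- rnorm(m, sd = sqrt(1 + sigma^2))  # draws of t from mu, the law of Y
vals <- sapply(t_draws, function(ti) {
  num <- var(pnorm((x - ti) / sigma))                      # approx. Var(E[1{Y > t} | X])
  p <- pnorm(ti / sqrt(1 + sigma^2), lower.tail = FALSE)   # P(Y > t)
  c(num = num, den = p * (1 - p))                          # den = Var(1{Y > t})
})
nu_mc <- mean(vals["num", ] / vals["den", ])        # nu: integrate the pointwise ratio
T_mc <- mean(vals["num", ]) / mean(vals["den", ])   # T: ratio of the integrated terms
c(nu = nu_mc, T = T_mc)                             # rough estimates with Monte Carlo error

The difference is one of weighting: \(\nu\) averages the pointwise \(R^2\)-type ratio with respect to \(\tilde{\mu}\), whereas \(T\) integrates numerator and denominator separately, which amounts to weighting the pointwise ratio by \(\mathrm{Var}(\mathbf{1}\{Y \ge t\})\). The two therefore generally differ unless that ratio is constant in \(t\).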
library(irdc)  # assumes the package is installed under this name
n <- 1000
x <- matrix(runif(n * 3), nrow = n)
# y is uniform on [0, 1) and independent of each x[, j] individually,
# although it is a deterministic function of (x[, 1], x[, 2]) jointly
y <- (x[, 1] + x[, 2]) %% 1
irdc(y, x[, 1])
#> [1] 0.001002072
irdc(y, x[, 2])
#> [1] 0.04123161
irdc(y, x[, 3])
#> [1] 0.003291506
n <- 10000
s <- 0.1  # proportion of exact zeros: x1 has an atom at 0
x1 <- c(rep(0, n * s), runif(n * (1 - s)))
x2 <- runif(n)  # independent of y
y <- x1         # y is exactly a function of x1
irdc(y, x1, dist.type.X = "discrete")
#> [1] 0.9441587
irdc(y, x2)
#> [1] -0.01085533
n <- 10000
x1 <- runif(n)
y1 <- rbinom(n, 1, 0.5)      # binary, independent of x1
y2 <- as.numeric(x1 >= 0.5)  # binary, a deterministic function of x1
irdc(y1, x1, dist.type.X = "discrete")
#> [1] -0.4999146
irdc(y2, x1, dist.type.X = "discrete")
#> [1] 0.003289474
FOCI::codec(y1, x1)
#> [1] -0.006410306
FOCI::codec(y2, x1)
#> [1] 1
# Sample from a hurdle Poisson model: zero with probability p_zero,
# otherwise a draw from a zero-truncated Poisson(lambda)
r_hurdle_poisson <- function(n, p_zero = 0.3, lambda = 2) {
  is_zero <- rbinom(n, 1, p_zero)
  # zero-truncated Poisson sampler via rejection: redraw until a positive value
  rztpois <- function(m, lambda) {
    samples <- numeric(m)
    for (i in seq_len(m)) {
      repeat {
        x <- rpois(1, lambda)
        if (x > 0) {
          samples[i] <- x
          break
        }
      }
    }
    samples
  }
  result <- numeric(n)
  result[is_zero == 0] <- rztpois(sum(is_zero == 0), lambda)
  result
}
set.seed(123)
n <- 1000
p_zero <- 0.4
lambda <- 10
hurdle <- r_hurdle_poisson(n, p_zero, lambda)
# continuous analogue with the same zero proportion: Gamma(shape = lambda, rate = 1)
# draws (mean lambda) replace the zero-truncated Poisson part
gamma_mix <- c(rep(0, round(p_zero * n)), rgamma(round((1 - p_zero) * n), shape = lambda, rate = 1))
library(ggplot2)
df <- data.frame(
  value = c(hurdle, gamma_mix),
  source = rep(c("Hurdle Poisson", "Gamma Mixture"), each = n)
)
ggplot(df, aes(x = value, fill = source)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 40) +
  labs(title = "Comparison: Hurdle Poisson vs Gamma Mixture",
       x = "Value", y = "Count", fill = "Distribution") +
  theme_bw()
# sort both samples so the pairing is comonotone: y2 is (up to ties) a
# non-decreasing function of x1, while y1 is independent noise
x1 <- sort(gamma_mix)
y1 <- rbinom(n, 1, 0.5)
y2 <- sort(hurdle)
irdc(y1, x1, dist.type.X = "discrete")
#> [1] -0.5095727
irdc(y2, x1, dist.type.X = "discrete")
#> [1] 0.5443523
FOCI::codec(y1, x1)
#> [1] 0.04361745
FOCI::codec(y2, x1)
#> [1] 0.9969469
# swap the roles of the two samples: x1 is now the hurdle Poisson sample
x1 <- sort(hurdle)
y1 <- rbinom(n, 1, 0.5)
y2 <- sort(gamma_mix)
irdc(y1, x1, dist.type.X = "discrete")
#> [1] -0.5030198
irdc(y2, x1, dist.type.X = "discrete")
#> [1] 0.6265961
FOCI::codec(y1, x1)
#> [1] -0.02403687
FOCI::codec(y2, x1)
#> [1] 0.9450425
irdc provides a flexible and theoretically grounded dependence measure that works for both continuous and discrete predictors.
For further theoretical details, see our paper:
Azadkia and Roudaki (2025), A New Measure Of Dependence: Integrated R2