We propose a new dependence measure \(\nu(Y, \mathbf{X})\), introduced in A New Measure of Dependence: Integrated R2 (Azadkia and Roudaki, 2025), to assess how much a random vector \(\mathbf{X}\) explains a univariate response \(Y\). Let \(Y\) be a random variable and \(\mathbf{X} = (X_1, \ldots, X_p)\) a random vector defined on the same probability space. Let \(\mu\) be the probability law of \(Y\), and let \(S\) be the support of \(\mu\). Define:
$$ \tilde{S} = \begin{cases} S \setminus \{s_{\max}\} & \text{if } S \text{ has a maximum } s_{\max} \\ S & \text{otherwise} \end{cases} $$
We define the measure \(\tilde{\mu}\) on \(S\) as:
$$ \tilde{\mu}(A) = \frac{\mu(A \cap \tilde{S})}{\mu(\tilde{S})}, \quad \text{for measurable } A \subseteq S $$
Then the irdc dependence coefficient is defined as:
$$ \nu(Y, \mathbf{X}) := \int \frac{\mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y > t\} \mid \mathbf{X}])}{\mathrm{Var}(\mathbf{1}\{Y > t\})} \, d\tilde{\mu}(t) $$
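For example, if \(Y\) is Bernoulli with success probability \(p \in (0, 1)\), then \(S = \{0, 1\}\), \(\tilde{S} = \{0\}\), and \(\tilde{\mu}\) is a point mass at \(0\); removing the maximum of the support avoids the degenerate term \(t = 1\), where numerator and denominator both vanish. The definition then reduces to
$$ \nu(Y, \mathbf{X}) = \frac{\mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y > 0\} \mid \mathbf{X}])}{\mathrm{Var}(\mathbf{1}\{Y > 0\})} = \frac{\mathrm{Var}(\mathbb{P}(Y = 1 \mid \mathbf{X}))}{p(1 - p)}. $$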
In contrast, A Simple Measure of Conditional Dependence (Azadkia and Chatterjee, implemented by FOCI::codec) considers:
$$ T(Y, \mathbf{X}) = \frac{\int \mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y \ge t\} \mid \mathbf{X}]) \, d\mu(t)}{\int \mathrm{Var}(\mathbf{1}\{Y \ge t\}) \, d\mu(t)} $$
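To make the contrast concrete, here is a minimal Monte Carlo sketch of the two population quantities (illustrative only, not package code; the Gaussian model and the names sigma, nu_mc, and T_mc are our own choices). It assumes \(X \sim N(0, 1)\) and \(Y = X + \varepsilon\) with \(\varepsilon \sim N(0, \sigma^2)\) independent of \(X\), so that \(\mathbb{E}[\mathbf{1}\{Y > t\} \mid X]\) has a closed form.

# Minimal Monte Carlo sketch (illustrative model, not part of the package):
# X ~ N(0, 1), Y = X + eps with eps ~ N(0, sigma^2), so
# E[1{Y > t} | X] = pnorm((X - t) / sigma) and Y ~ N(0, 1 + sigma^2)
set.seed(1)
sigma <- 0.5
m <- 5000
x <- rnorm(m)                                # Monte Carlo draws of X
t_draws <- rnorm(m, sd = sqrt(1 + sigma^2))  # draws of t from mu, the law of Y
vals <- sapply(t_draws, function(ti) {
  num <- var(pnorm((x - ti) / sigma))                      # approx. Var(E[1{Y > t} | X])
  p <- pnorm(ti / sqrt(1 + sigma^2), lower.tail = FALSE)   # P(Y > t)
  c(num = num, den = p * (1 - p))                          # den = Var(1{Y > t})
})
nu_mc <- mean(vals["num", ] / vals["den", ])        # nu: integrate the pointwise ratio
T_mc <- mean(vals["num", ]) / mean(vals["den", ])   # T: ratio of the integrated terms
c(nu = nu_mc, T = T_mc)                             # rough estimates with Monte Carlo error

The difference is one of weighting: \(\nu\) averages the pointwise \(R^2\)-type ratio with respect to \(\tilde{\mu}\), whereas \(T\) integrates numerator and denominator separately, which amounts to weighting the pointwise ratio by \(\mathrm{Var}(\mathbf{1}\{Y \ge t\})\). The two therefore generally differ unless that ratio is constant in \(t\).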
library(irdc)  # assumes the package is installed under this name
n <- 1000
x <- matrix(runif(n * 3), nrow = n)
# y is uniform on [0, 1) and independent of each x[, j] individually,
# although it is a deterministic function of (x[, 1], x[, 2]) jointly
y <- (x[, 1] + x[, 2]) %% 1
irdc(y, x[, 1])
#> [1] 0.001002072
irdc(y, x[, 2])
#> [1] 0.04123161
irdc(y, x[, 3])
#> [1] 0.003291506
n <- 10000
s <- 0.1  # proportion of exact zeros: x1 has an atom at 0
x1 <- c(rep(0, n * s), runif(n * (1 - s)))
x2 <- runif(n)  # independent of y
y <- x1         # y is exactly a function of x1
irdc(y, x1, dist.type.X = "discrete")
#> [1] 0.9441587
irdc(y, x2)
#> [1] -0.01085533
n <- 10000
x1 <- runif(n)
y1 <- rbinom(n, 1, 0.5)      # binary, independent of x1
y2 <- as.numeric(x1 >= 0.5)  # binary, a deterministic function of x1
irdc(y1, x1, dist.type.X = "discrete")
#> [1] -0.4999146
irdc(y2, x1, dist.type.X = "discrete")
#> [1] 0.003289474
FOCI::codec(y1, x1)
#> [1] -0.006410306
FOCI::codec(y2, x1)
#> [1] 1
# Sample from a hurdle Poisson model: zero with probability p_zero,
# otherwise a draw from a zero-truncated Poisson(lambda)
r_hurdle_poisson <- function(n, p_zero = 0.3, lambda = 2) {
  is_zero <- rbinom(n, 1, p_zero)
  # zero-truncated Poisson sampler via rejection: redraw until a positive value
  rztpois <- function(m, lambda) {
    samples <- numeric(m)
    for (i in seq_len(m)) {
      repeat {
        x <- rpois(1, lambda)
        if (x > 0) {
          samples[i] <- x
          break
        }
      }
    }
    samples
  }
  result <- numeric(n)
  result[is_zero == 0] <- rztpois(sum(is_zero == 0), lambda)
  result
}
set.seed(123)
n <- 1000
p_zero <- 0.4
lambda <- 10
hurdle <- r_hurdle_poisson(n, p_zero, lambda)
# continuous analogue with the same zero proportion: Gamma(shape = lambda, rate = 1)
# draws (mean lambda) replace the zero-truncated Poisson part
gamma_mix <- c(rep(0, round(p_zero * n)), rgamma(round((1 - p_zero) * n), shape = lambda, rate = 1))
library(ggplot2)
df <- data.frame(
  value = c(hurdle, gamma_mix),
  source = rep(c("Hurdle Poisson", "Gamma Mixture"), each = n)
)
ggplot(df, aes(x = value, fill = source)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 40) +
  labs(title = "Comparison: Hurdle Poisson vs Gamma Mixture",
       x = "Value", y = "Count", fill = "Distribution") +
  theme_bw()
# sort both samples so the pairing is comonotone: y2 is (up to ties) a
# non-decreasing function of x1, while y1 is independent noise
x1 <- sort(gamma_mix)
y1 <- rbinom(n, 1, 0.5)
y2 <- sort(hurdle)
irdc(y1, x1, dist.type.X = "discrete")
#> [1] -0.5095727
irdc(y2, x1, dist.type.X = "discrete")
#> [1] 0.5443523
FOCI::codec(y1, x1)
#> [1] 0.04361745
FOCI::codec(y2, x1)
#> [1] 0.9969469
# swap the roles of the two samples: x1 is now the hurdle Poisson sample
x1 <- sort(hurdle)
y1 <- rbinom(n, 1, 0.5)
y2 <- sort(gamma_mix)
irdc(y1, x1, dist.type.X = "discrete")
#> [1] -0.5030198
irdc(y2, x1, dist.type.X = "discrete")
#> [1] 0.6265961
FOCI::codec(y1, x1)
#> [1] -0.02403687
FOCI::codec(y2, x1)
#> [1] 0.9450425
irdc provides a flexible and theoretically grounded dependence measure that works for both continuous and discrete predictors.
For further theoretical details, see our paper:
Azadkia and Roudaki (2025), A New Measure Of Dependence: Integrated R2