Getting Started with citestR

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Overview

citestR is a lightweight R client for the citest Python package. It lets you run the conditional-independence-of-missingness test without using reticulate at runtime — all communication happens over a local HTTP connection to a FastAPI server that wraps the Python package.

1. A small executable example

The chunk below runs during vignette build. It uses mocked HTTP responses, so it exercises the package interface without requiring a live Python backend.

mock_json_response <- function(body, status = 200L) {
  function(req) {
    httr2::response(
      status_code = status,
      headers = list("Content-Type" = "application/json"),
      body = charToRaw(jsonlite::toJSON(body, auto_unbox = TRUE))
    )
  }
}

pkg_env <- citestR:::.pkg_env
old_process <- pkg_env$process
old_port <- pkg_env$port
old_base_url <- pkg_env$base_url
on.exit({
  pkg_env$process <- old_process
  pkg_env$port <- old_port
  pkg_env$base_url <- old_base_url
}, add = TRUE)

pkg_env$process <- list(is_alive = function() TRUE)
pkg_env$port <- 9999L
pkg_env$base_url <- "http://127.0.0.1:9999"

# Mock a ci_test response
mock_fit <- mock_json_response(list(
  model_id = "test-001",
  dataset_id = "ds-001",
  results = list(
    m = 0.12,
    B = 0.04,
    W_bar = 0.01,
    T = 0.05,
    t_k = 2.5,
    p_k = 0.017,
    p_2s = 0.034,
    df = 9
  )
))

example_data <- data.frame(
  Y  = c(1.2, -0.4, 0.7, 0.3, -1.1),
  X1 = c(NA, 0.5, -1.1, 0.8, NA),
  X2 = c(0.3, 1.4, -0.2, 0.6, 0.9)
)

result <- httr2::with_mocked_responses(mock_fit, {
  ci_test(example_data, y = "Y", m = 2L, n_folds = 2L)
})

result
#> 
#>   Conditional Independence of Missingness Test
#> 
#>   Mean diff in BCE: 0.12 
#>   t-statistic:     2.5 
#>   df:              9 
#>   p-value:         0.034 (two-sided)
#> 
#>   Result: Evidence against conditional independence (p < 0.05)

2. Install the Python backend

If you don’t already have a Python environment with citest installed, the package provides a helper:

library(citestR)

# Creates a virtualenv called "citest_env" and installs the citest API backend
install_backend(method = "pip")

You only need to do this once.

3. Run a test

library(citestR)

# Example data frame with some missing values
set.seed(42)
n <- 500
df <- data.frame(
  Y  = rnorm(n),
  X1 = rnorm(n),
  X2 = rnorm(n),
  X3 = rnorm(n)
)
# Introduce MAR missingness on X2
df$X2[df$X1 > 0.5] <- NA

# Run the CI test (server starts automatically)
result <- ci_test(
  data       = df,
  y          = "Y",
  imputer    = "iterative",
  classifier = "rf",
  m          = 5L,
  n_folds    = 5L
)

result$results

The first call starts the Python server in the background; subsequent calls reuse the running process.

4. Retrieve a summary

summary_info <- get_summary(result$model_id)
summary_info

5. Imputer diagnostics

r2 <- imputer_r2(result$model_id, mask_frac = 0.2, m_eval = 1L)
r2$mean_r2
r2$per_variable

6. Sensitivity calibration

# Single kappa value
compute_kappa(r2_x_z = 0.5, beta_yx = 0.3, gamma_x = 0.2)

# Full calibration table
cal <- kappa_calibration_table()
head(cal)

# Pivot for a fixed beta
calibration_pivot(beta_yx = 0.3)

7. Simulated datasets

sim <- simulate_data("single_mar", n = 300, ci = TRUE)
sim$dataset_id
sim$pct_missing

8. Stopping the server

The server shuts down automatically when the R session ends. To stop it manually:

stop_server()

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.