
Hennepin example

library(shellgame)
library(geoDeltaAudit)
library(dplyr)
library(stringr)
library(janitor)

# vignette-only dependency; keep in Suggests
if (!requireNamespace("readr", quietly = TRUE)) {
  stop("Package 'readr' is required to run this vignette. Install it with install.packages('readr').")
}

Introduction

This vignette demonstrates a complete transformation audit using Hennepin County, Minnesota, as an example. We'll track total population through the transformation chain:

ZCTA → ZIP → COUNTY

Along the way, we'll reveal the shell game: the same column name ("population") ends up holding a different underlying quantity (observed → imputed).

The Workflow

Step 1: Prepare the Data

For this example, we'll use small toy extracts bundled with geoDeltaAudit that stand in for the data you would typically prepare:

acs_path <- system.file("extdata", "toy_acs_zcta_hennepin.csv", package = "geoDeltaAudit")
hud_path <- system.file("extdata", "toy_zip_county_hud_hennepin.csv", package = "geoDeltaAudit")

stopifnot(nchar(acs_path) > 0, nchar(hud_path) > 0)

acs <- readr::read_csv(acs_path, show_col_types = FALSE) |>
  janitor::clean_names() |>
  dplyr::mutate(zcta = stringr::str_pad(as.character(.data$zcta), 5, pad = "0"))

hud <- readr::read_csv(hud_path, show_col_types = FALSE) |>
  janitor::clean_names()

# Toy assoc: 1:1 ZCTA -> ZIP so the example always runs
assoc <- acs |>
  dplyr::distinct(.data$zcta) |>
  dplyr::transmute(zcta = .data$zcta, zip = .data$zcta) |>
  dplyr::distinct()

list(
  acs_rows = nrow(acs),
  assoc_rows = nrow(assoc),
  hud_rows = nrow(hud)
)

Step 2: Run the Audit

# example only (not executed during vignette build)
result <- shellgame::evaluate_transformation(
  data = acs,
  zip_zcta_map = assoc,
  hud_crosswalk = hud,
  geo_col = "zcta",
  var_col = "pop"
)

Step 3: View Results

# Print summary
summary(result)

Membership Visualization

Note: The following graphics are pre-rendered from the configured Hennepin County example dataset to illustrate the spatial relationships being audited.

=== The Shell Game: Transformation Audit ===


Variable: population 
Target County: 27053 

--- Baseline (Observed Data) ---
  Units: 74 ZCTAs
  Total: 1,391,557 

--- After Transformation (Imputed Data) ---
  Intermediate:  98  ZIPs
  Recovered: 1,216,874 

--- The Shell Game Result ---
  Perturbation: -174,683 (-12.6%)

  Same column name.
  Different underlying quantity.
  That's the shell game.

--- Pre-Allocation Expansion ---
  74 ZCTAs → 98 ZIPs (+32.4%)
  This happens BEFORE any allocation or weighting.
  The analytical surface has already shifted.

--- Top Counties Receiving Perturbed Population ---
  27003: 30,535
  27139: 25,268
  27123: 21,835
  27171: 14,391
  27059: 9,526

Baseline: 74 ZCTAs

The analysis begins with the 74 ZCTAs associated with Hennepin County under relationship-based membership. These are the ZCTAs used by the Census Bureau in ACS tabulations.

Total population: 1,391,557 (directly observed from ACS)

The First Hop: ZCTA → ZIP

When we associate these 74 ZCTAs with ZIP codes:

Result: 74 ZCTAs become 98 ZIPs (+32.4%)

This happens before any allocation. The analytical surface has already shifted.
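A quick way to see the expansion is to count distinct units on each side of the association. The sketch below is a minimal base R illustration on a made-up table (`assoc_toy` and its codes are hypothetical, not package data); it assumes the expansion figure is simply distinct ZIPs over distinct ZCTAs, which is consistent with the 74 → 98 (+32.4%) result.

```r
# Hypothetical toy ZCTA -> ZIP association (made-up codes, not real data).
# One ZCTA can be associated with more than one ZIP.
assoc_toy <- data.frame(
  zcta = c("00001", "00001", "00002", "00003", "00003"),
  zip  = c("10001", "10002", "10003", "10004", "10005")
)

n_zcta <- length(unique(assoc_toy$zcta))  # baseline units
n_zip  <- length(unique(assoc_toy$zip))   # units after the first hop

# Percent change in the number of units, before any allocation or weighting
expansion_pct <- round(100 * (n_zip / n_zcta - 1), 1)
expansion_pct

# Same arithmetic with the Hennepin counts: 74 ZCTAs -> 98 ZIPs
round(100 * (98 / 74 - 1), 1)
```

No population has moved yet; only the set of units has changed.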

The Second Hop: ZIP → County

Using the TOT_RATIO column of HUD's ZIP-to-county crosswalk, we allocate each ZIP's population across the counties it overlaps.
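Mechanically, this allocation is a join followed by a weighted sum. Here is a minimal base R sketch with made-up numbers; `zip_pop` and `hud_toy` are hypothetical tables, and the lowercase column names assume HUD's ZIP, COUNTY and TOT_RATIO columns after `janitor::clean_names()`. It illustrates the idea only and is not shellgame's internal implementation.

```r
# Hypothetical ZIP-level population (made-up values)
zip_pop <- data.frame(
  zip = c("10001", "10002", "10003"),
  pop = c(1000, 500, 2000)
)

# Hypothetical HUD-style ZIP -> county crosswalk (made-up rows).
# In this toy table, each ZIP's tot_ratio values sum to 1 across counties.
hud_toy <- data.frame(
  zip       = c("10001", "10002", "10002", "10003", "10003"),
  county    = c("27053", "27053", "27003", "27053", "27123"),
  tot_ratio = c(1.00, 0.60, 0.40, 0.75, 0.25)
)

# Join, then split each ZIP's population in proportion to tot_ratio
alloc <- merge(zip_pop, hud_toy, by = "zip")
alloc$pop_alloc <- alloc$pop * alloc$tot_ratio

# Sum the allocated population by county
county_pop <- aggregate(pop_alloc ~ county, data = alloc, FUN = sum)
county_pop
```

Because each toy ZIP's ratios sum to 1, the county totals add back up to the ZIP total; what the target county loses is the share of its ZIPs' population that the ratios assign to other counties.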

Result: Population recovered for Hennepin County: 1,216,874

The Perturbation

174,683 people (12.6% of the observed baseline) disappeared from Hennepin County in the transformation.
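The headline figures follow directly from the two totals in the summary. A minimal check in R, with the numbers copied from the Hennepin output:

```r
# Totals reported in the Hennepin summary
baseline_total <- 1391557  # observed: ACS population of the 74 member ZCTAs
final_total    <- 1216874  # imputed: population recovered for county 27053

# Signed perturbation and its size relative to the observed baseline
perturbation <- final_total - baseline_total
pct <- round(100 * perturbation / baseline_total, 1)

perturbation  # -174683
pct           # -12.6
```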

Where did they go? To nearby counties:

extract_perturbed_population(result, top_n = 5)
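`extract_perturbed_population()` is the package's function for this step. As a hedged illustration of the underlying idea only (not its implementation), the sketch below ranks receiving counties from a made-up allocation table; `alloc_target`, `target` and `top_n` are hypothetical names.

```r
# Hypothetical allocation of the target county's ZIP population (made-up values)
alloc_target <- data.frame(
  county    = c("27053", "27003", "27123", "27053", "27139", "27003"),
  pop_alloc = c(5000, 300, 120, 2500, 80, 150)
)
target <- "27053"
top_n  <- 2

# Population allocated away from the target, summed per receiving county
leaked <- aggregate(pop_alloc ~ county,
                    data = alloc_target[alloc_target$county != target, ],
                    FUN = sum)

# Largest receivers first, keep the top_n
leaked <- leaked[order(-leaked$pop_alloc), ]
top <- head(leaked, top_n)
top
```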

Geometric vs Relationship Membership

If we used geometric intersection instead of relationship-based membership, we would have 94 ZCTAs, not 74.

This is Decision #1: How do we define membership?
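Comparing the two membership definitions amounts to a set difference. A minimal sketch with made-up codes (`relationship_members` and `geometric_members` are hypothetical vectors, not package data), assuming the relationship-based members are a subset of the geometric ones, as the "20 extra ZCTAs" described here imply:

```r
# Hypothetical membership lists (made-up codes)
relationship_members <- c("00001", "00002", "00003")
geometric_members    <- c("00001", "00002", "00003", "00004", "00005")

# ZCTAs that appear only under geometric intersection
geometric_only <- setdiff(geometric_members, relationship_members)
geometric_only
length(geometric_only)
```

For Hennepin, the same difference is 94 - 74 = 20 ZCTAs.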

The 20 extra ZCTAs (shown in grey) intersect the county boundary geometrically but are not included in the relationship-based membership used by ACS.

Visualizing the Difference

The baseline: 74 ZCTAs with relationship-based membership.

The difference: Grey areas show ZCTAs that appear only under geometric intersection.

The Shell Game Revealed

# audit_result is assumed to be the object returned by
# geoDeltaAudit::audit_transform(); normalize its expected fields
baseline_total <- as.numeric(audit_result$baseline_total)
final_total <- as.numeric(audit_result$final_total)

# Use the delta stored in audit_result if present; otherwise compute it
delta <- if (!is.null(audit_result$delta)) {
  as.numeric(audit_result$delta)
} else {
  final_total - baseline_total
}

# Size of the perturbation, ignoring its direction
absolute_perturbation <- abs(delta)

Same column name: “population”
Different underlying quantity: observed → imputed

That’s the shell game.

Why This Matters

This error is agnostic to the tool used and to the variable being transformed: the transformation itself is the cause.

Next Steps

See vignette("data-preparation") for how to prepare your own data. See vignette("conceptual_framework-shell-game") for the conceptual explanation.
