Hennepin example

library(shellgame)
library(geoDeltaAudit)
library(dplyr)
library(stringr)
library(janitor)

# vignette-only dependency; keep in Suggests
if (!requireNamespace("readr", quietly = TRUE)) {
  stop("Package 'readr' is required to run this vignette. Install it with install.packages('readr').")
}

Introduction

This vignette demonstrates a complete transformation audit, using Hennepin County, Minnesota, as the example. We’ll track total population through the transformation chain:

ZCTA → ZIP → COUNTY

The audit reveals the shell game: the same column name (“population”) ends up carrying a different underlying quantity (observed → imputed).

The Workflow

Step 1: Prepare the Data

For this example, we load the toy ACS and HUD files bundled with geoDeltaAudit. They stand in for the data you would typically prepare yourself:

acs_path <- system.file("extdata", "toy_acs_zcta_hennepin.csv", package = "geoDeltaAudit")
hud_path <- system.file("extdata", "toy_zip_county_hud_hennepin.csv", package = "geoDeltaAudit")

stopifnot(nchar(acs_path) > 0, nchar(hud_path) > 0)

acs <- readr::read_csv(acs_path, show_col_types = FALSE) |>
  janitor::clean_names() |>
  dplyr::mutate(zcta = stringr::str_pad(as.character(.data$zcta), 5, pad = "0"))

hud <- readr::read_csv(hud_path, show_col_types = FALSE) |>
  janitor::clean_names()

# Toy assoc: 1:1 ZCTA -> ZIP so the example always runs
assoc <- acs |>
  dplyr::distinct(.data$zcta) |>
  dplyr::transmute(zcta = .data$zcta, zip = .data$zcta) |>
  dplyr::distinct()

list(
  acs_rows = nrow(acs),
  assoc_rows = nrow(assoc),
  hud_rows = nrow(hud)
)
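Before running the audit, it can help to confirm that every ZIP in the association table has at least one row in the HUD crosswalk. The following is a minimal sketch on made-up data frames. The column names `zip`, `county`, and `tot_ratio` are assumptions about the cleaned HUD file, not something this vignette confirms.

```r
# Hypothetical stand-ins for `assoc` and `hud`; real column names may differ.
assoc_toy <- data.frame(
  zcta = c("A1", "A2", "A3"),
  zip  = c("A1", "A2", "A3"),
  stringsAsFactors = FALSE
)
hud_toy <- data.frame(
  zip       = c("A1", "A1", "A2"),
  county    = c("27053", "27003", "27053"),
  tot_ratio = c(0.9, 0.1, 1.0),
  stringsAsFactors = FALSE
)

# ZIPs present in the association table but absent from the crosswalk
missing_zips <- setdiff(assoc_toy$zip, hud_toy$zip)
missing_zips
```

A non-empty `missing_zips` means some population cannot be allocated to any county in the second hop.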

Step 2: Run the Audit

# example only (not executed during vignette build)
result <- shellgame::evaluate_transformation(
  data = acs,
  zip_zcta_map = assoc,
  hud_crosswalk = hud,
  geo_col = "zcta",
  var_col = "pop"
)

Step 3: View Results

# Print summary
summary(result)

Membership Visualization

Note: The following graphics are pre-rendered from the configured Hennepin County example dataset to illustrate the spatial relationships being audited.

=== The Shell Game: Transformation Audit ===


Variable: population 
Target County: 27053 

--- Baseline (Observed Data) ---
  Units: 74 ZCTAs
  Total: 1,391,557 

--- After Transformation (Imputed Data) ---
  Intermediate:  98  ZIPs
  Recovered: 1,216,874 

--- The Shell Game Result ---
  Perturbation: -174,683 (-12.6%)

  Same column name.
  Different underlying quantity.
  That's the shell game.

--- Pre-Allocation Expansion ---
  74 ZCTAs → 98 ZIPs (+32.4%)
  This happens BEFORE any allocation or weighting.
  The analytical surface has already shifted.

--- Top Counties Receiving Perturbed Population ---
  27003: 30,535
  27139: 25,268
  27123: 21,835
  27171: 14,391
  27059: 9,526

Baseline: 74 ZCTAs

The analysis begins with 74 ZCTAs that have a relationship-based membership with Hennepin County. These are the ZCTAs used by the Census Bureau in ACS tabulations.

Total population: 1,391,557 (directly observed from ACS)

The First Hop: ZCTA → ZIP

When we associate these 74 ZCTAs with ZIP codes:

Result: 74 ZCTAs become 98 ZIPs (+32.4%)

This happens before any allocation. The analytical surface has already shifted.
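The expansion is a count of distinct units before and after the hop. The sketch below uses a made-up one-to-many association table (not the bundled toy data, which is 1:1) and then applies the same formula to the counts reported in this vignette.

```r
# Made-up association: one ZCTA can map to several ZIPs
assoc_toy <- data.frame(
  zcta = c("A", "A", "B", "C", "C", "C"),
  zip  = c("z1", "z2", "z3", "z4", "z5", "z6"),
  stringsAsFactors = FALSE
)

n_zcta <- length(unique(assoc_toy$zcta))  # units before the hop
n_zip  <- length(unique(assoc_toy$zip))   # units after the hop
expansion_pct <- 100 * (n_zip - n_zcta) / n_zcta

# Same formula applied to the reported counts: 74 ZCTAs, 98 ZIPs
reported_pct <- 100 * (98 - 74) / 74
round(reported_pct, 1)
```

No population value is touched in this calculation; only the number of analytical units changes.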

The Second Hop: ZIP → County

Using HUD’s TOT_RATIO, we allocate ZIP-level population to counties.

Result: Population recovered for Hennepin County: 1,216,874
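A ratio-weighted allocation of this kind can be sketched in base R as follows. This is an illustration under assumptions, not the package’s implementation: the data are made up, the column names are hypothetical, and the rule shown (multiply each ZIP’s population by its ratio, then sum by county) is the usual way such crosswalk ratios are applied.

```r
# Made-up ZIP populations and crosswalk rows (column names are assumed)
zip_pop <- data.frame(
  zip = c("z1", "z2"),
  pop = c(1000, 500),
  stringsAsFactors = FALSE
)
crosswalk <- data.frame(
  zip       = c("z1", "z1", "z2"),
  county    = c("27053", "27003", "27053"),
  tot_ratio = c(0.8, 0.2, 1.0),
  stringsAsFactors = FALSE
)

# Attach each ZIP's population to its crosswalk rows, then weight by the ratio
alloc <- merge(crosswalk, zip_pop, by = "zip")
alloc$pop_alloc <- alloc$pop * alloc$tot_ratio

# Sum the allocated shares within each county
county_pop <- tapply(alloc$pop_alloc, alloc$county, sum)
county_pop
```

In this toy case the ZIP `z1` splits across two counties, so the target county 27053 receives 1,300 of the 1,500 people rather than all of them. The allocated values are estimates, not observations.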

The Perturbation

In the transformation, 174,683 people (-12.6%) disappeared from Hennepin County’s total.

Where did they go? To neighboring counties:

extract_perturbed_population(result, top_n = 5)
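The reported figures can be checked by hand. The sketch below only re-does the arithmetic on the numbers printed in the summary; the variable names are illustrative and it does not call the package.

```r
baseline  <- 1391557   # observed ZCTA total reported in the summary
recovered <- 1216874   # Hennepin total after ZCTA -> ZIP -> COUNTY

delta_reported <- recovered - baseline
delta_pct      <- 100 * delta_reported / baseline

# Top five receiving counties as printed in the summary
top5 <- c("27003" = 30535, "27139" = 25268, "27123" = 21835,
          "27171" = 14391, "27059" = 9526)
top5_share <- 100 * sum(top5) / abs(delta_reported)

c(delta = delta_reported,
  delta_pct = round(delta_pct, 1),
  top5_share = round(top5_share, 1))
```

The top five counties account for 101,555 of the 174,683 people, about 58.1% of the perturbation. The printed summary does not itemize the rest.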

Geometric vs Relationship Membership

If we used geometric intersection instead of relationship-based membership, we would have 94 ZCTAs, not 74.

This is Decision #1: How do we define membership?

The 20 extra ZCTAs (shown in grey) intersect the county boundary geometrically but are not included in the relationship-based membership used by ACS.

Visualizing the Difference

The baseline: 74 ZCTAs with relationship-based membership.

The difference: Grey areas show ZCTAs that appear only under geometric intersection.
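The two membership definitions can be compared as sets. The sketch below uses made-up ZCTA labels, and it assumes that every relationship-based ZCTA also intersects the county geometrically, as the “20 extra” wording suggests.

```r
# Made-up ZCTA labels for the two membership definitions
relationship_zctas <- c("A", "B", "C")
geometric_zctas    <- c("A", "B", "C", "D", "E")

# ZCTAs that intersect the boundary but lack relationship-based membership
geometric_only <- setdiff(geometric_zctas, relationship_zctas)
geometric_only

# Under the subset assumption, the reported counts follow the same subtraction
extra_reported <- 94 - 74
extra_reported
```

Whichever set is chosen becomes the baseline for every later step, so this choice should be recorded alongside the audit result.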

The Shell Game Revealed

# `audit_result` is assumed to be the object returned by geoDeltaAudit::audit_transform();
# coerce its expected fields to numeric
baseline_total <- as.numeric(audit_result$baseline_total)
final_total <- as.numeric(audit_result$final_total)

# Use the delta supplied by the audit if present; otherwise compute final - baseline
delta <- if (!is.null(audit_result$delta)) {
  as.numeric(audit_result$delta)
} else {
  final_total - baseline_total
}

absolute_perturbation <- abs(delta)

Same column name: “population”
Different underlying quantity: observed → imputed

That’s the shell game.

Why This Matters

This error is agnostic to the tool and to the variable being transformed.

Transformation is the cause, not the tool or variable.

Next Steps

See vignette("data-preparation") for how to prepare your own data. See vignette("conceptual_framework-shell-game") for the conceptual explanation.