| Title: | Optimal Pairing and Matching via Linear Assignment |
| Version: | 1.0.6 |
| Description: | Solves optimal pairing and matching problems using linear assignment algorithms. Provides implementations of the Hungarian method (Kuhn 1955) <doi:10.1002/nav.3800020109>, Jonker-Volgenant shortest path algorithm (Jonker and Volgenant 1987) <doi:10.1007/BF02278710>, Auction algorithm (Bertsekas 1988) <doi:10.1007/BF02186476>, cost-scaling (Goldberg and Kennedy 1995) <doi:10.1007/BF01585996>, scaling algorithms (Gabow and Tarjan 1989) <doi:10.1137/0218069>, push-relabel (Goldberg and Tarjan 1988) <doi:10.1145/48014.61051>, and Sinkhorn entropy-regularized transport (Cuturi 2013) <doi:10.48550/arxiv.1306.0895>. Designed for matching plots, sites, samples, or any pairwise optimization problem. Supports rectangular matrices, forbidden assignments, data frame inputs, batch solving, k-best solutions, and pixel-level image morphing for visualization. Includes automatic preprocessing with variable health checks, multiple scaling methods (standardized, range, robust), greedy matching algorithms, and comprehensive balance diagnostics for assessing match quality using standardized differences and distribution comparisons. |
| License: | MIT + file LICENSE |
| Language: | en-US |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | Rcpp (≥ 1.0.0), tibble (≥ 3.0.0), dplyr (≥ 1.0.0), rlang (≥ 0.4.0), purrr (≥ 0.3.0), magrittr (≥ 2.0.0), methods |
| Suggests: | testthat (≥ 3.0.0), xml2, e1071, R.utils, microbenchmark, withr, knitr, rmarkdown, bench, parallel, future (≥ 1.20.0), future.apply (≥ 1.8.0), ggplot2, ggraph, tidygraph, magick, OpenImageR, farver, av, reticulate, png, combinat |
| LinkingTo: | Rcpp, RcppEigen, testthat |
| SystemRequirements: | C++17 |
| LazyData: | true |
| VignetteBuilder: | knitr |
| URL: | https://gillescolling.com/couplr/, https://github.com/gcol33/couplr |
| BugReports: | https://github.com/gcol33/couplr/issues |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| NeedsCompilation: | yes |
| Packaged: | 2026-01-14 21:35:36 UTC; Gilles Colling |
| Author: | Gilles Colling [aut, cre, cph] |
| Maintainer: | Gilles Colling <gilles.colling051@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-20 10:30:13 UTC |
couplr: Optimal Pairing and Matching via Linear Assignment
Description
Solves optimal pairing and matching problems using linear assignment algorithms. Designed for matching plots, sites, samples, or any pairwise optimization problem. Provides modern, tidy implementations of 'Hungarian', 'Jonker-Volgenant', 'Auction', and other LAP solvers.
Main functions
lap_solve: Solve single assignment problems
lap_solve_batch: Solve multiple problems efficiently
lap_solve_kbest: Find k-best optimal solutions
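A minimal sketch of the core workflow (the cost values are purely illustrative):
library(couplr)
cost <- matrix(c(4, 2, 5,
                 3, 3, 6,
                 7, 5, 4), nrow = 3, byrow = TRUE)   # rows = sources, columns = targets
lap_solve(cost)                          # optimal one-to-one assignment
lap_solve_kbest(cost, k = 3)             # the three best assignments
lap_solve_batch(list(cost, cost + 1))    # solve several problems at once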
Author(s)
Maintainer: Gilles Colling gilles.colling051@gmail.com [copyright holder]
See Also
Useful links:
https://gillescolling.com/couplr/
https://github.com/gcol33/couplr
Report bugs at https://github.com/gcol33/couplr/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Large value for forbidden pairs
Description
A numeric constant used to mark forbidden pairs in cost matrices.
Usage
BIG_COST
Format
Numeric value (half of .Machine$double.xmax).
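A one-line check of the documented definition, assuming the constant is accessible as shown in the Usage section above:
BIG_COST == .Machine$double.xmax / 2   # TRUE, per the Format note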
Apply all constraints to cost matrix
Description
Main entry point for applying constraints.
Usage
apply_all_constraints(
cost_matrix,
left,
right,
vars,
max_distance = Inf,
calipers = NULL,
forbidden = NULL
)
Value
Modified cost matrix with all constraints applied.
Apply caliper constraints
Description
Calipers impose per-variable maximum absolute differences.
Usage
apply_calipers(cost_matrix, left, right, calipers, vars)
Value
Modified cost matrix with forbidden pairs marked.
Apply maximum distance constraint
Description
Apply maximum distance constraint
Usage
apply_max_distance(cost_matrix, max_distance = Inf)
Value
Modified cost matrix with forbidden pairs marked.
Apply scaling to matching variables
Description
Apply scaling to matching variables
Usage
apply_scaling(left_mat, right_mat, method = "standardize")
Value
List with scaled left/right matrices and scaling parameters.
Apply weights to matching variables
Description
Apply weights to matching variables
Usage
apply_weights(mat, weights)
Value
Numeric matrix with columns weighted.
Convert assignment result to a binary matrix
Description
Turns a tidy assignment result back into a 0/1 assignment matrix.
Usage
as_assignment_matrix(x, n_sources = NULL, n_targets = NULL)
Arguments
x |
An assignment result object of class lap_solve_result. |
n_sources |
Number of source nodes, optional |
n_targets |
Number of target nodes, optional |
Value
Integer matrix with 0 and 1 entries
Assign blocks using clustering
Description
Assign blocks using clustering
Usage
assign_blocks_cluster(left, right, block_vars, method, n_blocks, ...)
Value
List with modified left/right data frames (with block_id) and n_blocks_initial.
Assign blocks based on grouping variable(s)
Description
Assign blocks based on grouping variable(s)
Usage
assign_blocks_group(left, right, block_by)
Value
List with modified left/right data frames (with block_id) and n_blocks_initial.
Linear assignment solver
Description
Solve the linear assignment problem (minimum- or maximum-cost matching)
using several algorithms. Forbidden edges can be marked as NA or Inf.
Usage
assignment(
cost,
maximize = FALSE,
method = c("auto", "jv", "hungarian", "auction", "auction_gs", "auction_scaled", "sap",
"ssp", "csflow", "hk01", "bruteforce", "ssap_bucket", "cycle_cancel", "gabow_tarjan",
"lapmod", "csa", "ramshaw_tarjan", "push_relabel", "orlin", "network_simplex"),
auction_eps = NULL,
eps = NULL
)
Arguments
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if TRUE, maximize total cost instead of minimizing (default: FALSE). |
method |
Character string indicating the algorithm to use. Options include general-purpose solvers (e.g. "jv", "hungarian"), auction-based solvers ("auction", "auction_gs", "auction_scaled"), specialized solvers, and advanced solvers; see the method argument in Usage for the complete list of names. |
auction_eps |
Optional numeric epsilon for the 'Auction'/'Auction-GS' methods. If NULL (the default), a suitable value is chosen automatically. |
eps |
Deprecated. Use auction_eps instead. |
Details
method = "auto" selects an algorithm based on problem size/shape and data
characteristics:
Very small (n <= 8): "bruteforce" — exact enumeration
Binary/constant costs: "hk01" — specialized for 0/1 costs
Large sparse (n > 100, more than 50% of entries forbidden): a sparsity-aware solver (e.g. "lapmod")
Sparse or very rectangular: "sap" — handles sparsity well
Small-medium (8 < n <= 50): "hungarian" — provides exact dual solutions
Medium (50 < n <= 75): "jv" — fast general-purpose solver
Large (n > 75): "auction_scaled" — fastest for large dense problems
Benchmarks show 'Auction-scaled' and 'JV' are 100-1500x faster than 'Hungarian' at n=500.
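A short sketch for inspecting which algorithm "auto" picked (random costs for illustration):
set.seed(1)
small <- matrix(runif(36), nrow = 6)            # n <= 8, so "bruteforce" is expected
large <- matrix(runif(100 * 100), nrow = 100)   # large dense problem
assignment(small)$method_used
assignment(large)$method_used
assignment(large, method = "jv")$total_cost     # or force a specific solver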
Value
An object of class lap_solve_result, a list with elements:
match — integer vector of length min(nrow(cost), ncol(cost)) giving the assigned column for each row (0 if unassigned).
total_cost — numeric scalar, the objective value.
status — character scalar, e.g. "optimal".
method_used — character scalar, the algorithm actually used.
See Also
lap_solve() — Tidy interface returning tibbles
lap_solve_kbest() — Find k-best assignments ('Murty' algorithm)
assignment_duals() — Extract dual variables for sensitivity analysis
bottleneck_assignment() — Minimize maximum edge cost (minimax)
sinkhorn() — Entropy-regularized optimal transport
Examples
cost <- matrix(c(4,2,5, 3,3,6, 7,5,4), nrow = 3, byrow = TRUE)
res <- assignment(cost)
res$match; res$total_cost
Solve assignment problem and return dual variables
Description
Solves the linear assignment problem and returns dual potentials (u, v) in addition to the optimal matching. The dual variables provide an optimality certificate and enable sensitivity analysis.
Usage
assignment_duals(cost, maximize = FALSE)
Arguments
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if TRUE, maximize total cost instead of minimizing (default: FALSE). |
Details
The dual variables satisfy the complementary slackness conditions:
For minimization: u[i] + v[j] <= cost[i,j] for all (i, j)
For any assigned pair (i, j): u[i] + v[j] = cost[i,j]
This implies that sum(u) + sum(v) = total_cost (strong duality).
Applications of dual variables:
Optimality verification: Check that duals satisfy constraints
Sensitivity analysis: Reduced cost c[i,j] - u[i] - v[j] shows how much an edge cost must decrease before it enters the solution
Pricing in column generation: Use duals to price new columns
Warm starting: Reuse duals when costs change slightly
Value
A list with class "assignment_duals_result" containing:
match - integer vector of column assignments (1-based)
total_cost - optimal objective value
u - numeric vector of row dual variables (length n)
v - numeric vector of column dual variables (length m)
status - character, e.g. "optimal"
See Also
assignment() for standard assignment without duals
Examples
cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3, byrow = TRUE)
result <- assignment_duals(cost)
# Check optimality: u + v should equal cost for assigned pairs
for (i in 1:3) {
j <- result$match[i]
cat(sprintf("Row %d -> Col %d: u + v = %.2f, cost = %.2f\n",
i, j, result$u[i] + result$v[j], cost[i, j]))
}
# Verify strong duality
cat("sum(u) + sum(v) =", sum(result$u) + sum(result$v), "\n")
cat("total_cost =", result$total_cost, "\n")
# Reduced costs (how much must cost decrease to enter solution)
reduced <- outer(result$u, result$v, "+")
reduced_cost <- cost - reduced
print(round(reduced_cost, 2))
Generic Augment Function
Description
S3 generic for augmenting model results with original data.
Usage
augment(x, ...)
Arguments
x |
An object to augment |
... |
Additional arguments passed to methods |
Value
Augmented data (depends on method)
Augment Matching Results with Original Data (broom-style)
Description
S3 method for augmenting matching results following the broom package
conventions. This is a thin wrapper around join_matched() with
sensible defaults for quick exploration.
Usage
## S3 method for class 'matching_result'
augment(x, left, right, ...)
Arguments
x |
A matching_result object |
left |
The original left dataset |
right |
The original right dataset |
... |
Additional arguments passed to join_matched(). |
Details
This method follows the augment() convention from the broom package,
making it easy to integrate couplr into tidymodels workflows. It's
equivalent to calling join_matched() with default parameters.
If the broom package is not loaded, you can use couplr::augment()
to access this function.
Value
A tibble with matched pairs and original data (see join_matched())
Examples
left <- data.frame(
id = 1:5,
treatment = 1,
age = c(25, 30, 35, 40, 45)
)
right <- data.frame(
id = 6:10,
treatment = 0,
age = c(24, 29, 36, 41, 44)
)
result <- match_couples(left, right, vars = "age")
couplr::augment(result, left, right)
Automatically encode categorical variables
Description
Converts categorical variables to numeric representations suitable for matching. Currently supports binary variables (0/1) and ordered factors.
Usage
auto_encode_categorical(left, right, var)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
var |
Variable name to encode |
Value
List with encoded left and right columns, plus encoding metadata
Balance Diagnostics for Matched Pairs
Description
Computes comprehensive balance statistics comparing the distribution of matching variables between left and right units in the matched sample.
Usage
balance_diagnostics(
result,
left,
right,
vars = NULL,
left_id = "id",
right_id = "id"
)
Arguments
result |
A matching result object from match_couples() or greedy_couples(). |
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names to check balance for. Defaults to the variables used in matching (if available in result). |
left_id |
Character, name of ID column in left data (default: "id") |
right_id |
Character, name of ID column in right data (default: "id") |
Details
This function computes several balance metrics:
Standardized Difference: The difference in means divided by the pooled standard deviation. Values less than 0.1 indicate excellent balance, 0.1-0.25 good balance.
Variance Ratio: The ratio of standard deviations (left/right). Values close to 1 are ideal.
KS Statistic: Kolmogorov-Smirnov test statistic comparing distributions. Lower values indicate more similar distributions.
Overall Metrics include mean absolute standardized difference across all variables, proportion of variables with large imbalance (|std diff| > 0.25), and maximum standardized difference.
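As a point of reference, here is a minimal sketch of the standardized difference described above, assuming a pooled standard deviation in the denominator (the package's internal formula may differ in detail):
std_diff <- function(left_vals, right_vals) {
  pooled_sd <- sqrt((var(left_vals) + var(right_vals)) / 2)   # pooled SD of the two groups
  (mean(left_vals) - mean(right_vals)) / pooled_sd
}
std_diff(c(25, 30, 35), c(24, 31, 36))   # small absolute value indicates good balance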
Value
An S3 object of class balance_diagnostics containing:
- var_stats
Tibble with per-variable balance statistics
- overall
List with overall balance metrics
- pairs
Tibble of matched pairs with variables
- n_matched
Number of matched pairs
- n_unmatched_left
Number of unmatched left units
- n_unmatched_right
Number of unmatched right units
- method
Matching method used
- has_blocks
Whether blocking was used
- block_stats
Per-block statistics (if blocking used)
Examples
# Create sample data
set.seed(123)
left <- data.frame(
id = 1:10,
age = rnorm(10, 45, 10),
income = rnorm(10, 50000, 15000)
)
right <- data.frame(
id = 11:30,
age = rnorm(20, 47, 10),
income = rnorm(20, 52000, 15000)
)
# Match
result <- match_couples(left, right, vars = c("age", "income"))
# Get balance diagnostics
balance <- balance_diagnostics(result, left, right, vars = c("age", "income"))
print(balance)
# Get balance table
balance_table(balance)
Create Balance Table
Description
Formats balance diagnostics into a clean table for display or export.
Usage
balance_table(balance, digits = 3)
Arguments
balance |
A balance_diagnostics object from balance_diagnostics(). |
digits |
Number of decimal places for rounding (default: 3) |
Value
A tibble with formatted balance statistics
Solve the Bottleneck Assignment Problem
Description
Finds an assignment that minimizes (or maximizes) the maximum edge cost in a perfect matching. Unlike standard LAP which minimizes the sum of costs, BAP minimizes the maximum (bottleneck) cost.
Usage
bottleneck_assignment(cost, maximize = FALSE)
Arguments
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if TRUE, maximize the minimum edge cost (maximin) instead of minimizing the maximum (default: FALSE). |
Details
The Bottleneck Assignment Problem (BAP) is a variant of the Linear Assignment Problem where instead of minimizing the sum of assignment costs, we minimize the maximum cost among all assignments (minimax objective).
Algorithm: Uses binary search on the sorted unique costs combined with 'Hopcroft-Karp' bipartite matching to find the minimum threshold that allows a perfect matching.
Complexity: O(E * sqrt(V) * log(unique costs)) where E = edges, V = vertices.
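A conceptual sketch of the feasibility test behind the binary search: forbid every edge above a candidate threshold and check whether a complete assignment still exists. The helper name feasible_at is hypothetical and it reuses a full LAP solve for simplicity; the packaged solver uses 'Hopcroft-Karp' matching instead, and the final edge check is a defensive assumption about how forbidden entries are handled.
feasible_at <- function(cost, threshold) {
  masked <- cost
  masked[masked > threshold] <- NA                   # edges above the threshold become forbidden
  res <- try(assignment(masked), silent = TRUE)
  if (inherits(res, "try-error") || any(res$match == 0)) return(FALSE)
  all(cost[cbind(seq_along(res$match), res$match)] <= threshold)  # every chosen edge respects the threshold
}
cost <- matrix(c(1, 5, 3, 2, 4, 6, 7, 1, 2), nrow = 3, byrow = TRUE)
feasible_at(cost, 2)   # FALSE: rows 1 and 2 would both need column 1
feasible_at(cost, 3)   # TRUE: the bottleneck value of this matrix is 3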
Applications:
Task scheduling with deadline constraints (minimize latest completion)
Resource allocation (minimize maximum load/distance)
Network routing (minimize maximum link utilization)
Fair division problems (minimize maximum disparity)
Value
A list with class "bottleneck_result" containing:
match - integer vector of length nrow(cost) giving the assigned column for each row (1-based indexing)
bottleneck - numeric scalar, the bottleneck (max/min edge) value
status - character scalar, e.g. "optimal"
See Also
assignment() for standard LAP (sum objective), lap_solve() for
tidy LAP interface
Examples
# Simple example: minimize max cost
cost <- matrix(c(1, 5, 3,
2, 4, 6,
7, 1, 2), nrow = 3, byrow = TRUE)
result <- bottleneck_assignment(cost)
result$bottleneck # Maximum edge cost in optimal assignment
# Maximize minimum (fair allocation)
profits <- matrix(c(10, 5, 8,
6, 12, 4,
3, 7, 11), nrow = 3, byrow = TRUE)
result <- bottleneck_assignment(profits, maximize = TRUE)
result$bottleneck # Minimum profit among all assignments
# With forbidden assignments
cost <- matrix(c(1, NA, 3,
2, 4, Inf,
5, 1, 2), nrow = 3, byrow = TRUE)
result <- bottleneck_assignment(cost)
Build cost matrix for matching
Description
This is the main entry point for distance computation.
Usage
build_cost_matrix(
left,
right,
vars,
distance = "euclidean",
weights = NULL,
scale = FALSE
)
Value
Numeric matrix of distances with optional scaling/weights applied.
Calculate Variable-Level Balance Statistics
Description
Calculate Variable-Level Balance Statistics
Usage
calculate_var_balance(left_vals, right_vals, var_name)
Arguments
left_vals |
Numeric vector of values from left group |
right_vals |
Numeric vector of values from right group |
var_name |
Character, name of the variable |
Value
List with balance statistics for this variable
Check if parallel processing is available
Description
Check if parallel processing is available
Usage
can_parallelize()
Value
Logical indicating if future package is available
Check cost distribution for problems
Description
Examines the distance matrix for common issues and provides helpful warnings.
Usage
check_cost_distribution(cost_matrix, threshold_zero = 1e-10, warn = TRUE)
Arguments
cost_matrix |
Numeric matrix of distances |
threshold_zero |
Threshold for considering distance "zero" (default: 1e-10) |
warn |
If TRUE, issue warnings for problems found |
Value
List with diagnostic information
Check if full matching was achieved
Description
Check if full matching was achieved
Usage
check_full_matching(result)
Value
No return value; throws error if unmatched units exist.
Check variable health for matching
Description
Analyzes variables for common problems that can affect matching quality: constant columns, high missingness, extreme skewness, and outliers.
Usage
check_variable_health(
left,
right,
vars,
high_missingness_threshold = 0.5,
low_variance_threshold = 1e-06
)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names to check |
high_missingness_threshold |
Threshold for high missingness warning (default: 0.5) |
low_variance_threshold |
Threshold for nearly-constant variables (default: 1e-6) |
Value
A list with class "variable_health" containing:
summary: Tibble with per-variable diagnostics
issues: List of detected issues with severity levels
exclude_vars: Variables that should be excluded
warnings: Human-readable warnings
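A small usage sketch (the data are illustrative; 'flag' is deliberately constant so it should be reported as problematic):
left  <- data.frame(id = 1:20,  age = rnorm(20, 40, 5), flag = 1)
right <- data.frame(id = 21:40, age = rnorm(20, 42, 5), flag = 1)
health <- check_variable_health(left, right, vars = c("age", "flag"))
health$summary        # per-variable diagnostics
health$exclude_vars   # variables flagged for exclusion (likely "flag" here)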
Compute pairwise distance matrix
Description
Compute pairwise distance matrix
Usage
compute_distance_matrix(left_mat, right_mat, distance = "euclidean")
Value
Numeric matrix of pairwise distances (n_left x n_right).
Compute and Cache Distance Matrix for Reuse
Description
Precomputes a distance matrix between left and right datasets, allowing it to be reused across multiple matching operations with different constraints. This is particularly useful when exploring different matching parameters (max_distance, calipers, methods) without recomputing distances.
Usage
compute_distances(
left,
right,
vars,
distance = "euclidean",
weights = NULL,
scale = FALSE,
auto_scale = FALSE,
left_id = "id",
right_id = "id",
block_id = NULL
)
Arguments
left |
Left dataset (data frame) |
right |
Right dataset (data frame) |
vars |
Character vector of variable names to use for distance computation |
distance |
Distance metric (default: "euclidean") |
weights |
Optional numeric vector of variable weights |
scale |
Scaling method: FALSE, "standardize", "range", or "robust" |
auto_scale |
Apply automatic preprocessing (default: FALSE) |
left_id |
Name of ID column in left (default: "id") |
right_id |
Name of ID column in right (default: "id") |
block_id |
Optional block ID column name for blocked matching |
Details
This function computes distances once and stores them in a reusable object.
The resulting distance_object can be passed to match_couples() or
greedy_couples() instead of providing datasets and variables.
Benefits:
Performance: Avoid recomputing distances when trying different constraints
Exploration: Quickly test max_distance, calipers, or methods
Consistency: Ensures same distances used across comparisons
Memory efficient: Can use sparse matrices when many pairs are forbidden
The distance_object stores the original datasets, allowing downstream
functions like join_matched() to work seamlessly.
Value
An S3 object of class "distance_object" containing:
cost_matrix: Numeric matrix of distances
left_ids: Character vector of left IDs
right_ids: Character vector of right IDs
block_id: Block ID column name (if specified)
metadata: List with computation details (vars, distance, scale, etc.)
original_left: Original left dataset (for later joining)
original_right: Original right dataset (for later joining)
Examples
# Compute distances once
left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45), income = c(45, 52, 48, 61, 55) * 1000)
right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44), income = c(46, 51, 47, 60, 54) * 1000)
dist_obj <- compute_distances(
left, right,
vars = c("age", "income"),
scale = "standardize"
)
# Reuse for different matching strategies
result1 <- match_couples(dist_obj, max_distance = 0.5)
result2 <- match_couples(dist_obj, max_distance = 1.0)
result3 <- greedy_couples(dist_obj, strategy = "sorted")
# All use the same precomputed distances
Count valid pairs in cost matrix
Description
Count valid pairs in cost matrix
Usage
count_valid_pairs(cost_matrix)
Value
Integer count of valid (non-forbidden) pairs.
Get a themed emoji
Description
Get a themed emoji
Usage
couplr_emoji(
type = c("error", "warning", "info", "success", "heart", "broken", "sparkles",
"search", "chart", "warning_sign", "stop", "check")
)
Value
Character string with the emoji (or empty string if emoji disabled).
Info message with emoji
Description
Info message with emoji
Usage
couplr_inform(...)
Value
No return value, called for side effects (issues a message).
Couplr message helpers with emoji and humor
Description
Light, fun error/warning messages inspired by testthat, themed around coupling and matching. Makes errors less intimidating and more memorable.
Stop with a fun, themed error message
Description
Stop with a fun, themed error message
Usage
couplr_stop(..., call. = FALSE)
Value
No return value, throws an error.
Success message with emoji
Description
Success message with emoji
Usage
couplr_success(...)
Value
No return value, called for side effects (issues a message).
Warn with a fun, themed warning message
Description
Warn with a fun, themed warning message
Usage
couplr_warn(..., call. = FALSE)
Value
No return value, called for side effects (issues a warning).
Detect and validate blocking
Description
Detect and validate blocking
Usage
detect_blocking(left, right, block_id, ignore_blocks)
Value
List with use_blocking (logical) and block_col (character or NULL).
Diagnose distance matrix and suggest fixes
Description
Comprehensive diagnostics for a distance matrix with actionable suggestions.
Usage
diagnose_distance_matrix(
cost_matrix,
left = NULL,
right = NULL,
vars = NULL,
warn = TRUE
)
Arguments
cost_matrix |
Numeric matrix of distances |
left |
Left dataset (for variable checking) |
right |
Right dataset (for variable checking) |
vars |
Variables used for matching |
warn |
If TRUE, issue warnings |
Value
List with diagnostic results and suggestions
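A brief usage sketch, assuming the cost_matrix component of a distance_object can be extracted with $ as its documented structure suggests:
left  <- data.frame(id = 1:4, x = c(1, 1, 2, 3))
right <- data.frame(id = 5:8, x = c(1, 2, 2, 9))
cm <- compute_distances(left, right, vars = "x")$cost_matrix
diag_res <- diagnose_distance_matrix(cm, left = left, right = right, vars = "x")
str(diag_res, max.level = 1)   # diagnostic results and suggestions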
Invalid parameter error
Description
Invalid parameter error
Usage
err_invalid_param(param, value, expected)
Value
No return value, throws an error.
Missing data error
Description
Missing data error
Usage
err_missing_data(dataset = "left")
Value
No return value, throws an error.
Missing variables error
Description
Missing variables error
Usage
err_missing_vars(vars, dataset = "left")
Value
No return value, throws an error.
All pairs forbidden error
Description
All pairs forbidden error
Usage
err_no_valid_pairs(reason = NULL)
Value
No return value, throws an error.
Example cost matrices for assignment problems
Description
Small example datasets for demonstrating couplr functionality across different assignment problem types: square, rectangular, sparse, and binary.
Usage
example_costs
Format
A list containing four example cost matrices:
- simple_3x3
A 3x3 cost matrix with costs ranging from 2-7. Optimal assignment: row 1 -> col 2 (cost 2), row 2 -> col 1 (cost 3), row 3 -> col 3 (cost 4). Total optimal cost: 9.
- rectangular_3x5
A 3x5 rectangular cost matrix demonstrating assignment when rows < columns. Each of 3 rows is assigned to one of 5 columns; 2 columns remain unassigned. Costs range 1-6.
- sparse_with_na
A 3x3 matrix with NA values indicating forbidden assignments. Use this to test algorithms' handling of constraints. Position (1,3), (2,2), and (3,1) are forbidden.
- binary_costs
A 3x3 matrix with binary (0/1) costs, suitable for testing the HK01 algorithm. Diagonal entries are 0 (preferred), off-diagonal entries are 1 (penalty).
Details
These matrices are designed to test different aspects of LAP solvers:
simple_3x3: Basic functionality test. Any correct solver should find total cost = 9.
rectangular_3x5: Tests handling of non-square problems. The optimal solution assigns all 3 rows with minimum total cost.
sparse_with_na: Tests constraint handling. Algorithms must avoid NA positions while finding an optimal assignment among valid entries.
binary_costs: Tests specialized binary cost algorithms. The optimal assignment uses all diagonal entries (total cost = 0).
See Also
Examples
# Simple 3x3 assignment
result <- lap_solve(example_costs$simple_3x3)
print(result)
# Optimal: sources 1,2,3 -> targets 2,1,3 with cost 9
# Rectangular problem (3 sources, 5 targets)
result <- lap_solve(example_costs$rectangular_3x5)
print(result)
# All 3 sources assigned; 2 targets unassigned
# Sparse problem with forbidden assignments
result <- lap_solve(example_costs$sparse_with_na)
print(result)
# Avoids NA positions
# Binary costs - test HK01 algorithm
result <- lap_solve(example_costs$binary_costs, method = "hk01")
print(result)
# Finds diagonal assignment (cost = 0)
Example assignment problem data frame
Description
A tidy data frame representation of assignment problems, suitable for use with grouped workflows and batch solving. Contains two independent 3x3 assignment problems in long format.
Usage
example_df
Format
A tibble with 18 rows and 4 columns:
- sim
Simulation/problem identifier. Integer with values 1 or 2, distinguishing two independent assignment problems. Use with
group_by(sim) for grouped solving.
Source node index. Integer 1-3 representing the row (source) in each 3x3 cost matrix.
- target
Target node index. Integer 1-3 representing the column (target) in each 3x3 cost matrix.
- cost
Cost of assigning source to target. Numeric values ranging from 1-7. Each source-target pair has exactly one cost entry.
Details
This dataset demonstrates couplr's data frame interface for LAP solving. The long format (one row per source-target pair) is converted internally to a cost matrix for solving.
Simulation 1: Costs from example_costs$simple_3x3
Optimal assignment: (1->2, 2->1, 3->3)
Total cost: 9
Simulation 2: Different cost structure
Optimal assignment: multiple equivalent optima exist
Total cost: 4
See Also
lap_solve, lap_solve_batch,
example_costs
Examples
library(dplyr)
# Solve both problems with grouped workflow
example_df |>
group_by(sim) |>
lap_solve(source, target, cost)
# Batch solving for efficiency
example_df |>
group_by(sim) |>
lap_solve_batch(source, target, cost)
# Inspect the data structure
example_df |>
group_by(sim) |>
summarise(
n_pairs = n(),
min_cost = min(cost),
max_cost = max(cost)
)
Extract and standardize IDs from data frames
Description
Extract and standardize IDs from data frames
Usage
extract_ids(df, prefix = "id")
Value
Character vector of IDs.
Extract matching variables from data frame
Description
Extract matching variables from data frame
Usage
extract_matching_vars(df, vars)
Value
Numeric matrix of matching variables.
Filter blocks based on size and balance criteria
Description
Filter blocks based on size and balance criteria
Usage
filter_blocks(
left,
right,
min_left,
min_right,
drop_imbalanced,
imbalance_threshold
)
Value
List with filtered left/right data frames and dropped block info.
Standardize block ID column name
Description
Standardize block ID column name
Usage
get_block_id_column(df)
Value
Character string with column name, or NULL if not found.
Extract method used from assignment result
Description
Extract method used from assignment result
Usage
get_method_used(x)
Arguments
x |
An assignment result object |
Value
Character string indicating method used
Extract total cost from assignment result
Description
Extract total cost from assignment result
Usage
get_total_cost(x)
Arguments
x |
An assignment result object |
Value
Numeric total cost
Greedy match blocks in parallel
Description
Greedy match blocks in parallel
Usage
greedy_blocks_parallel(
blocks,
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
strategy,
parallel = FALSE
)
Arguments
blocks |
Vector of block IDs |
left |
Left dataset with block_col |
right |
Right dataset with block_col |
left_ids |
IDs from left |
right_ids |
IDs from right |
block_col |
Name of blocking column |
vars |
Variables for matching |
distance |
Distance metric |
weights |
Variable weights |
scale |
Scaling method |
max_distance |
Maximum distance |
calipers |
Caliper constraints |
strategy |
Greedy strategy |
parallel |
Whether to use parallel processing |
Value
List with combined results from all blocks
Fast approximate matching using greedy algorithm
Description
Performs fast one-to-one matching using greedy strategies. Does not guarantee
optimal total distance but is much faster than match_couples() for large
datasets. Supports blocking, distance constraints, and various distance metrics.
Usage
greedy_couples(
left,
right = NULL,
vars = NULL,
distance = "euclidean",
weights = NULL,
scale = FALSE,
auto_scale = FALSE,
max_distance = Inf,
calipers = NULL,
block_id = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
strategy = c("row_best", "sorted", "pq"),
return_unmatched = TRUE,
return_diagnostics = FALSE,
parallel = FALSE,
check_costs = TRUE
)
Arguments
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
vars |
Variable names to use for distance computation |
distance |
Distance metric: "euclidean", "manhattan", "mahalanobis", or a custom function |
weights |
Optional named vector of variable weights |
scale |
Scaling method: FALSE (none), "standardize", "range", or "robust" |
auto_scale |
If TRUE, automatically check variable health and select scaling method (default: FALSE) |
max_distance |
Maximum allowed distance (pairs exceeding this are forbidden) |
calipers |
Named list of per-variable maximum absolute differences |
block_id |
Column name containing block IDs (for stratified matching) |
ignore_blocks |
If TRUE, ignore block_id even if present |
require_full_matching |
If TRUE, error if any units remain unmatched |
strategy |
Greedy strategy: "row_best" (default), "sorted", or "pq"; see Details. |
return_unmatched |
Include unmatched units in output |
return_diagnostics |
Include detailed diagnostics in output |
parallel |
Enable parallel processing for blocked matching (default: FALSE). Requires the 'future' and 'future.apply' packages. |
check_costs |
If TRUE, check distance distribution for potential problems and provide helpful warnings before matching (default: TRUE) |
Details
Greedy strategies do not guarantee optimal total distance but are much faster:
"row_best": O(n*m) time, simple and often produces good results
"sorted": O(nmlog(n*m)) time, better quality but slower
"pq": O(nmlog(n*m)) time, memory-efficient for large problems
Use greedy_couples when:
Dataset is very large (> 10,000 x 10,000)
Approximate solution is acceptable
Speed is more important than optimality
Value
A list with class "matching_result" (same structure as match_couples)
Examples
# Basic greedy matching
left <- data.frame(id = 1:100, x = rnorm(100))
right <- data.frame(id = 101:200, x = rnorm(100))
result <- greedy_couples(left, right, vars = "x")
# Compare to optimal
result_opt <- match_couples(left, right, vars = "x")
result_greedy <- greedy_couples(left, right, vars = "x")
result_greedy$info$total_distance / result_opt$info$total_distance # Quality ratio
Greedy matching with blocking
Description
Greedy matching with blocking
Usage
greedy_couples_blocked(
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
strategy,
parallel = FALSE
)
Value
List with pairs tibble and matching info.
Greedy Matching from Precomputed Distance Object
Description
Internal function to handle greedy matching when a distance_object is provided
Usage
greedy_couples_from_distance(
dist_obj,
max_distance = Inf,
calipers = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
strategy = "row_best",
return_unmatched = TRUE,
return_diagnostics = FALSE
)
Value
A matching_result object with pairs, info, and optional diagnostics.
Greedy matching without blocking
Description
Greedy matching without blocking
Usage
greedy_couples_single(
left,
right,
left_ids,
right_ids,
vars,
distance,
weights,
scale,
max_distance,
calipers,
strategy
)
Value
List with pairs tibble and matching info.
Re-export of dplyr::group_by
Description
Re-export of dplyr::group_by
Value
See group_by.
Check if data frame has blocking information
Description
Check if data frame has blocking information
Usage
has_blocks(df)
Value
Logical indicating whether data frame has block ID column.
Check if any valid pairs exist
Description
Check if any valid pairs exist
Usage
has_valid_pairs(cost_matrix)
Value
Logical indicating whether any valid pairs exist.
Hospital staff scheduling example dataset
Description
A comprehensive example dataset for demonstrating couplr functionality across vignettes. Contains hospital staff scheduling data with nurses, shifts, costs, and preference scores suitable for assignment problems, as well as nurse characteristics for matching workflows.
Usage
hospital_staff
Format
A list containing eight related datasets:
- basic_costs
A 10x10 numeric cost matrix for assigning 10 nurses to 10 shifts. Values range from approximately 1-15, where lower values indicate better fit (less overtime, matches skills, respects preferences). Use with
lap_solve() for basic assignment.
- preferences
A 10x10 numeric preference matrix on a 0-10 scale, where higher values indicate stronger nurse preference for a shift. Use with
lap_solve(..., maximize = TRUE) to optimize preferences rather than minimize costs.
- schedule_df
A tibble with 100 rows (10 nurses x 10 shifts) in long format for data frame workflows:
- nurse_id
Integer 1-10. Unique identifier for each nurse.
- shift_id
Integer 1-10. Unique identifier for each shift.
- cost
Numeric. Assignment cost (same values as basic_costs).
- preference
Numeric 0-10. Nurse preference score.
- skill_match
Integer 0/1. Binary indicator: 1 if nurse skills match shift requirements, 0 otherwise.
- nurses
A tibble with 10 rows describing nurse characteristics:
- nurse_id
Integer 1-10. Links to schedule_df and basic_costs rows.
- experience_years
Numeric 1-20. Years of nursing experience.
- department
Character. Primary department: "ICU", "ER", "General", or "Pediatrics".
- shift_preference
Character. Preferred shift type: "day", "evening", or "night".
- certification_level
Integer 1-3. Certification level where 3 is highest (e.g., 1=RN, 2=BSN, 3=MSN).
- shifts
A tibble with 10 rows describing shift requirements:
- shift_id
Integer 1-10. Links to schedule_df and basic_costs cols.
- department
Character. Department needing coverage.
- shift_type
Character. Shift type: "day", "evening", or "night".
- min_experience
Numeric. Minimum years of experience required.
- min_certification
Integer 1-3. Minimum certification level.
- weekly_df
A tibble for batch solving with 500 rows (5 days x 10 nurses x 10 shifts):
- day
Character. Day of week: "Mon", "Tue", "Wed", "Thu", "Fri".
- nurse_id
Integer 1-10. Nurse identifier.
- shift_id
Integer 1-10. Shift identifier.
- cost
Numeric. Daily assignment cost (varies by day).
- preference
Numeric 0-10. Daily preference score.
Use with group_by(day) for solving each day's schedule.
- nurses_extended
A tibble with 200 nurses for matching examples, representing a treatment group (e.g., full-time nurses):
- nurse_id
Integer 1-200. Unique identifier.
- age
Numeric 22-65. Nurse age in years.
- experience_years
Numeric 0-40. Years of nursing experience.
- hourly_rate
Numeric 25-75. Hourly wage in dollars.
- department
Character. Primary department assignment.
- certification_level
Integer 1-3. Certification level.
- is_fulltime
Logical. TRUE for full-time status.
- controls_extended
A tibble with 300 potential control nurses (e.g., part-time or registry nurses) for matching. Same structure as nurses_extended. Designed to have systematic differences from nurses_extended (older, less experience on average) to demonstrate matching's ability to create comparable groups.
Details
This dataset is used throughout the couplr documentation to provide a consistent, realistic example that evolves in complexity. It supports three use cases: (1) basic LAP solving with cost matrices, (2) batch solving across multiple days, and (3) matching workflows comparing nurse groups.
The dataset is designed to demonstrate progressively complex scenarios:
Basic LAP (vignette("getting-started")):
basic_costs: Simple 10x10 assignment
preferences: Maximization problem
schedule_df: Data frame input, grouped workflows
weekly_df: Batch solving across days
Algorithm comparison (vignette("algorithms")):
Use basic_costs to compare algorithm behavior
Modify with NA values for sparse scenarios
Matching workflows (vignette("matching-workflows")):
nurses_extended: Treatment group (full-time nurses)
controls_extended: Control pool (part-time/registry nurses)
Match on age, experience, and department for causal analysis
See Also
lap_solve for basic assignment solving,
lap_solve_batch for batch solving,
match_couples for matching workflows,
vignette("getting-started") for introductory tutorial
Examples
# Basic assignment: assign nurses to shifts minimizing cost
lap_solve(hospital_staff$basic_costs)
# Maximize preferences instead
lap_solve(hospital_staff$preferences, maximize = TRUE)
# Data frame workflow
library(dplyr)
hospital_staff$schedule_df |>
lap_solve(nurse_id, shift_id, cost)
# Batch solve weekly schedule
hospital_staff$weekly_df |>
group_by(day) |>
lap_solve(nurse_id, shift_id, cost)
# Matching workflow: match full-time to part-time nurses
match_couples(
left = hospital_staff$nurses_extended,
right = hospital_staff$controls_extended,
vars = c("age", "experience_years", "certification_level"),
auto_scale = TRUE
)
Low match rate info
Description
Low match rate info
Usage
info_low_match_rate(n_matched, n_left, pct)
Value
No return value, called for side effects (issues a message or warning).
Check if Object is a Distance Object
Description
Check if Object is a Distance Object
Usage
is_distance_object(x)
Arguments
x |
Object to check |
Value
Logical: TRUE if x is a distance_object
Examples
left <- data.frame(id = 1:3, x = c(1, 2, 3))
right <- data.frame(id = 4:6, x = c(1.1, 2.1, 3.1))
dist_obj <- compute_distances(left, right, vars = "x")
is_distance_object(dist_obj) # TRUE
is_distance_object(list()) # FALSE
Check if object is a batch assignment result
Description
Check if object is a batch assignment result
Usage
is_lap_solve_batch_result(x)
Arguments
x |
Object to test |
Value
Logical indicating if x is a batch assignment result
Check if object is a k-best assignment result
Description
Check if object is a k-best assignment result
Usage
is_lap_solve_kbest_result(x)
Arguments
x |
Object to test |
Value
Logical indicating if x is a k-best assignment result
Check if object is an assignment result
Description
Check if object is an assignment result
Usage
is_lap_solve_result(x)
Arguments
x |
Object to test |
Value
Logical indicating if x is an assignment result
Join Matched Pairs with Original Data
Description
Creates an analysis-ready dataset by joining matched pairs with variables from the original left and right datasets. This eliminates the need for manual joins and provides a convenient format for downstream analysis.
Usage
join_matched(
result,
left,
right,
left_vars = NULL,
right_vars = NULL,
left_id = "id",
right_id = "id",
suffix = c("_left", "_right"),
include_distance = TRUE,
include_pair_id = TRUE,
include_block_id = TRUE
)
Arguments
result |
A matching_result object from match_couples() or greedy_couples(). |
left |
The original left dataset |
right |
The original right dataset |
left_vars |
Character vector of variable names to include from left. If NULL (default), includes all variables except the ID column. |
right_vars |
Character vector of variable names to include from right. If NULL (default), includes all variables except the ID column. |
left_id |
Name of the ID column in left dataset (default: "id") |
right_id |
Name of the ID column in right dataset (default: "id") |
suffix |
Character vector of length 2 specifying suffixes for left and right variables (default: c("_left", "_right")) |
include_distance |
Include the matching distance in output (default: TRUE) |
include_pair_id |
Include pair_id column (default: TRUE) |
include_block_id |
Include block_id if blocking was used (default: TRUE) |
Details
This function simplifies the common workflow of joining matched pairs
with original data. Instead of manually merging result$pairs with left
and right datasets, join_matched() handles the joins automatically
and applies consistent naming conventions.
When variables appear in both left and right datasets, suffixes are appended to distinguish them (e.g., "age_left" and "age_right"). This makes it easy to compute differences or use both values in models.
Value
A tibble with one row per matched pair, containing:
pair_id: Sequential pair identifier (if include_pair_id = TRUE)
left_id: ID from left dataset
right_id: ID from right dataset
distance: Matching distance (if include_distance = TRUE)
block_id: Block identifier (if blocking used and include_block_id = TRUE)
Variables from left dataset (with left suffix)
Variables from right dataset (with right suffix)
Examples
# Basic usage
left <- data.frame(
id = 1:5,
treatment = 1,
age = c(25, 30, 35, 40, 45),
income = c(45000, 52000, 48000, 61000, 55000)
)
right <- data.frame(
id = 6:10,
treatment = 0,
age = c(24, 29, 36, 41, 44),
income = c(46000, 51500, 47500, 60000, 54000)
)
result <- match_couples(left, right, vars = c("age", "income"))
matched_data <- join_matched(result, left, right)
head(matched_data)
# Specify which variables to include
matched_data <- join_matched(
result, left, right,
left_vars = c("treatment", "age", "income"),
right_vars = c("age", "income"),
suffix = c("_treated", "_control")
)
# Without distance or pair_id
matched_data <- join_matched(
result, left, right,
include_distance = FALSE,
include_pair_id = FALSE
)
Solve linear assignment problems
Description
Provides a tidy interface for solving the linear assignment problem using 'Hungarian' or 'Jonker-Volgenant' algorithms. Supports rectangular matrices, NA/Inf masking, and data frame inputs.
Usage
lap_solve(
x,
source = NULL,
target = NULL,
cost = NULL,
maximize = FALSE,
method = "auto",
forbidden = NA
)
Arguments
x |
Cost matrix, data frame, or tibble. If a data frame/tibble,
must include the columns specified by source, target, and cost. |
source |
Column name for source/row indices (if x is a data frame). |
target |
Column name for target/column indices (if x is a data frame). |
cost |
Column name for costs (if x is a data frame). |
maximize |
Logical; if TRUE, maximizes total cost instead of minimizing (default: FALSE) |
method |
Algorithm to use (default: "auto"); see assignment() for the available methods. |
forbidden |
Value to mark forbidden assignments (default: NA). Can also use Inf. |
Value
A tibble with columns:
source: row/source indices
target: column/target indices
cost: cost of each assignment
total_cost: total cost (attribute)
Examples
# Matrix input
cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3)
lap_solve(cost)
# Data frame input
library(dplyr)
df <- tibble(
source = rep(1:3, each = 3),
target = rep(1:3, times = 3),
cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4)
)
lap_solve(df, source, target, cost)
# With NA masking (forbidden assignments)
cost[1, 3] <- NA
lap_solve(cost)
# Grouped data frames
df <- tibble(
sim = rep(1:2, each = 9),
source = rep(1:3, times = 6),
target = rep(1:3, each = 3, times = 2),
cost = runif(18, 1, 10)
)
df |> group_by(sim) |> lap_solve(source, target, cost)
Solve multiple assignment problems efficiently
Description
Solve many independent assignment problems at once. Supports lists of matrices,
3D arrays, or grouped data frames. Optional parallel execution via n_threads.
Usage
lap_solve_batch(
x,
source = NULL,
target = NULL,
cost = NULL,
maximize = FALSE,
method = "auto",
n_threads = 1,
forbidden = NA
)
Arguments
x |
One of: List of cost matrices, 3D array, or grouped data frame |
source |
Column name for source indices (if x is a data frame). |
target |
Column name for target indices (if x is a data frame). |
cost |
Column name for costs (if x is a data frame). |
maximize |
Logical; if TRUE, maximizes total cost (default: FALSE) |
method |
Algorithm to use (default: "auto"). See assignment() for available methods. |
n_threads |
Number of threads for parallel execution (default: 1). Set to NULL to use all available cores. |
forbidden |
Value to mark forbidden assignments (default: NA) |
Value
A tibble with columns:
-
problem_id: identifier for each problem -
source: source indices for assignments -
target: target indices for assignments -
cost: cost of each assignment -
total_cost: total cost for each problem -
method_used: algorithm used for each problem
Examples
# List of matrices
costs <- list(
matrix(c(1, 2, 3, 4), 2, 2),
matrix(c(5, 6, 7, 8), 2, 2)
)
lap_solve_batch(costs)
# 3D array
arr <- array(runif(2 * 2 * 10), dim = c(2, 2, 10))
lap_solve_batch(arr)
# Grouped data frame
library(dplyr)
df <- tibble(
sim = rep(1:5, each = 9),
source = rep(1:3, times = 15),
target = rep(1:3, each = 3, times = 5),
cost = runif(45, 1, 10)
)
df |> group_by(sim) |> lap_solve_batch(source, target, cost)
# Parallel execution (requires n_threads > 1)
lap_solve_batch(costs, n_threads = 2)
Find k-best optimal assignments
Description
Returns the top k optimal (or near-optimal) assignments using 'Murty' algorithm. Useful for exploring alternative optimal solutions or finding robust assignments.
Usage
lap_solve_kbest(
x,
k = 3,
source = NULL,
target = NULL,
cost = NULL,
maximize = FALSE,
method = "murty",
single_method = "jv",
forbidden = NA
)
Arguments
x |
Cost matrix, data frame, or tibble. If a data frame/tibble,
must include the columns specified by source, target, and cost. |
k |
Number of best solutions to return (default: 3) |
source |
Column name for source/row indices (if x is a data frame). |
target |
Column name for target/column indices (if x is a data frame). |
cost |
Column name for costs (if x is a data frame). |
maximize |
Logical; if TRUE, finds k-best maximizing assignments (default: FALSE) |
method |
Algorithm for each sub-problem (default: "murty"). Future versions may support additional methods. |
single_method |
Algorithm used for solving each node in the search tree (default: "jv") |
forbidden |
Value to mark forbidden assignments (default: NA) |
Value
A tibble with columns:
rank: ranking of solutions (1 = best, 2 = second best, etc.)
solution_id: unique identifier for each solution
source: source indices
target: target indices
cost: cost of each edge in the assignment
total_cost: total cost of the complete solution
Examples
# Matrix input - find 5 best solutions
cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3)
lap_solve_kbest(cost, k = 5)
# Data frame input
library(dplyr)
df <- tibble(
source = rep(1:3, each = 3),
target = rep(1:3, times = 3),
cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4)
)
lap_solve_kbest(df, k = 3, source, target, cost)
# With maximization
lap_solve_kbest(cost, k = 3, maximize = TRUE)
Solve 1-D Line Assignment Problem
Description
Solves the linear assignment problem when both sources and targets are ordered points on a line. Uses efficient O(n*m) dynamic programming for rectangular problems and O(n log n) sorting for square problems.
Usage
lap_solve_line_metric(x, y, cost = "L1", maximize = FALSE)
Arguments
x |
Numeric vector of source positions (will be sorted internally) |
y |
Numeric vector of target positions (will be sorted internally) |
cost |
Cost function for distance. Either "L1" or "L2" (default: "L1"). |
maximize |
Logical; if TRUE, maximizes total cost instead of minimizing (default: FALSE) |
Details
This is a specialized solver that exploits the structure of 1-dimensional assignment problems where costs depend only on the distance between points on a line. It is much faster than general LAP solvers for this special case.
The algorithm works as follows:
Square case (n == m):
Both vectors are sorted and matched in order: x[1] -> y[1], x[2] -> y[2], etc.
This is optimal for any metric cost function on a line.
Rectangular case (n < m): Uses dynamic programming to find the optimal assignment that matches all n sources to a subset of the m targets, minimizing total distance. The DP recurrence is:
dp[i][j] = min(dp[i][j-1], dp[i-1][j-1] + cost(x[i], y[j]))
This finds the minimum cost to match the first i sources to the first j targets.
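A minimal sketch of this recurrence in plain R with L1 costs (the helper name line_dp_cost is hypothetical and it is simplified relative to the packaged solver, which also recovers the matching itself):
line_dp_cost <- function(x, y) {
  x <- sort(x); y <- sort(y)
  n <- length(x); m <- length(y)
  dp <- matrix(Inf, n + 1, m + 1)    # dp[i+1, j+1] = cost of matching the first i sources to the first j targets
  dp[1, ] <- 0                       # matching zero sources costs nothing
  for (i in 1:n) {
    for (j in i:m) {
      skip <- dp[i + 1, j]                   # leave target j unused
      take <- dp[i, j] + abs(x[i] - y[j])    # match source i to target j
      dp[i + 1, j + 1] <- min(skip, take)
    }
  }
  dp[n + 1, m + 1]
}
line_dp_cost(c(1, 3, 5), c(0.5, 2, 3.5, 4.5, 6))   # total L1 cost of the optimal matching (1.5)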
Complexity:
Time: O(n*m) for rectangular, O(n log n) for square
Space: O(n*m) for DP table
Value
A list with components:
match: Integer vector of length n with 1-based column indices
total_cost: Total cost of the assignment
Examples
# Square case: equal number of sources and targets
x <- c(1.5, 3.2, 5.1)
y <- c(2.0, 3.0, 5.5)
result <- lap_solve_line_metric(x, y, cost = "L1")
print(result)
# Rectangular case: more targets than sources
x <- c(1.0, 3.0, 5.0)
y <- c(0.5, 2.0, 3.5, 4.5, 6.0)
result <- lap_solve_line_metric(x, y, cost = "L2")
print(result)
# With unsorted inputs (will be sorted internally)
x <- c(5.0, 1.0, 3.0)
y <- c(4.5, 0.5, 6.0, 2.0, 3.5)
result <- lap_solve_line_metric(x, y, cost = "L1")
print(result)
Mark forbidden pairs
Description
Generic function to mark specific pairs as forbidden.
Usage
mark_forbidden_pairs(cost_matrix, forbidden_indices)
Value
Modified cost matrix with forbidden pairs marked.
Match blocks in parallel
Description
Match blocks in parallel
Usage
match_blocks_parallel(
blocks,
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
method,
parallel = FALSE
)
Arguments
blocks |
Vector of block IDs |
left |
Left dataset with block_col |
right |
Right dataset with block_col |
left_ids |
IDs from left |
right_ids |
IDs from right |
block_col |
Name of blocking column |
vars |
Variables for matching |
distance |
Distance metric |
weights |
Variable weights |
scale |
Scaling method |
max_distance |
Maximum distance |
calipers |
Caliper constraints |
method |
LAP method |
parallel |
Whether to use parallel processing |
Value
List with combined results from all blocks
Optimal matching using linear assignment
Description
Performs optimal one-to-one matching between two datasets using linear assignment problem (LAP) solvers. Supports blocking, distance constraints, and various distance metrics.
Usage
match_couples(
left,
right = NULL,
vars = NULL,
distance = "euclidean",
weights = NULL,
scale = FALSE,
auto_scale = FALSE,
max_distance = Inf,
calipers = NULL,
block_id = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
method = "auto",
return_unmatched = TRUE,
return_diagnostics = FALSE,
parallel = FALSE,
check_costs = TRUE
)
Arguments
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
vars |
Variable names to use for distance computation |
distance |
Distance metric: "euclidean", "manhattan", "mahalanobis", or a custom function |
weights |
Optional named vector of variable weights |
scale |
Scaling method: FALSE (none), "standardize", "range", or "robust" |
auto_scale |
If TRUE, automatically check variable health and select scaling method (default: FALSE) |
max_distance |
Maximum allowed distance (pairs exceeding this are forbidden) |
calipers |
Named list of per-variable maximum absolute differences |
block_id |
Column name containing block IDs (for stratified matching) |
ignore_blocks |
If TRUE, ignore block_id even if present |
require_full_matching |
If TRUE, error if any units remain unmatched |
method |
LAP solver: "auto", "hungarian", "jv", "gabow_tarjan", etc. |
return_unmatched |
Include unmatched units in output |
return_diagnostics |
Include detailed diagnostics in output |
parallel |
Enable parallel processing for blocked matching (default: FALSE). Requires the 'future' and 'future.apply' packages. |
check_costs |
If TRUE, check distance distribution for potential problems and provide helpful warnings before matching (default: TRUE) |
Details
This function finds the matching that minimizes total distance among all
feasible matchings, subject to constraints. Use greedy_couples() for
faster approximate matching on large datasets.
Value
A list with class "matching_result" containing:
pairs: Tibble of matched pairs with distances
unmatched: List of unmatched left and right IDs
info: Matching diagnostics and metadata
Examples
# Basic matching
left <- data.frame(id = 1:5, x = c(1, 2, 3, 4, 5), y = c(2, 4, 6, 8, 10))
right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1), y = c(2.1, 4.1, 6.2, 8.1, 10.1))
result <- match_couples(left, right, vars = c("x", "y"))
print(result$pairs)
# With constraints
result <- match_couples(left, right, vars = c("x", "y"),
max_distance = 1,
calipers = list(x = 0.5))
# With blocking
left$region <- c("A", "A", "B", "B", "B")
right$region <- c("A", "A", "B", "B", "B")
blocks <- matchmaker(left, right, block_type = "group", block_by = "region")
result <- match_couples(blocks$left, blocks$right, vars = c("x", "y"))
Match with blocking (multiple problems)
Description
Match with blocking (multiple problems)
Usage
match_couples_blocked(
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
method,
parallel = FALSE
)
Value
List with pairs tibble and matching info.
Match from Precomputed Distance Object
Description
Internal function to handle matching when a distance_object is provided
Usage
match_couples_from_distance(
dist_obj,
max_distance = Inf,
calipers = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
method = "auto",
return_unmatched = TRUE,
return_diagnostics = FALSE,
check_costs = TRUE
)
Value
A matching_result object with pairs, info, and optional diagnostics.
Match without blocking (single problem)
Description
Match without blocking (single problem)
Usage
match_couples_single(
left,
right,
left_ids,
right_ids,
vars,
distance,
weights,
scale,
max_distance,
calipers,
method,
check_costs = TRUE
)
Value
List with pairs tibble and matching info.
Create blocks for stratified matching
Description
Constructs blocks (strata) for matching, using either grouping variables or clustering algorithms. Returns the input data frames with block IDs assigned, along with block summary statistics.
Usage
matchmaker(
left,
right,
block_type = c("none", "group", "cluster"),
block_by = NULL,
block_vars = NULL,
block_method = "kmeans",
n_blocks = NULL,
min_left = 1,
min_right = 1,
drop_imbalanced = FALSE,
imbalance_threshold = Inf,
return_dropped = TRUE,
...
)
Arguments
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
block_type |
Type of blocking to use: "none" (no blocking, the default), "group" (block on grouping variables given in block_by), or "cluster" (create blocks by clustering on block_vars). |
block_by |
Variable name(s) for grouping (if block_type = "group") |
block_vars |
Variable names for clustering (if block_type = "cluster") |
block_method |
Clustering method (if block_type = "cluster"); default "kmeans". |
n_blocks |
Target number of blocks (for clustering) |
min_left |
Minimum number of left units per block |
min_right |
Minimum number of right units per block |
drop_imbalanced |
Drop blocks with extreme imbalance |
imbalance_threshold |
Maximum allowed |n_left - n_right| / max(n_left, n_right) |
return_dropped |
Include dropped blocks in output |
... |
Additional arguments passed to clustering function |
Details
This function does NOT perform matching - it only creates the block structure.
Use match_couples() or greedy_couples() to perform matching within blocks.
Value
A list with class "matchmaker_result" containing:
- left: Left data frame with block_id column added
- right: Right data frame with block_id column added
- block_summary: Summary statistics for each block
- dropped: Information about dropped blocks (if any)
- info: Metadata about the blocking process
Examples
# Group blocking
left <- data.frame(id = 1:10, region = rep(c("A", "B"), each = 5), x = rnorm(10))
right <- data.frame(id = 11:20, region = rep(c("A", "B"), each = 5), x = rnorm(10))
blocks <- matchmaker(left, right, block_type = "group", block_by = "region")
print(blocks$block_summary)
# Clustering
blocks <- matchmaker(left, right, block_type = "cluster",
block_vars = "x", n_blocks = 3)
Parallel lapply using future
Description
Parallel lapply using future
Usage
parallel_lapply(X, FUN, ..., parallel = FALSE)
Arguments
X |
Vector to iterate over |
FUN |
Function to apply |
... |
Additional arguments to FUN |
parallel |
Whether parallel processing is enabled |
Value
List of results
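Examples
# Illustrative sketch of the documented interface. With parallel = FALSE
# this behaves like a plain lapply(); parallel = TRUE assumes the 'future'
# and 'future.apply' packages are available.
squares <- parallel_lapply(1:4, function(i) i^2, parallel = FALSE)
unlist(squares)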
Pixel-level image morphing (final frame only)
Description
Computes optimal pixel assignment from A to B and returns the final transported frame (without intermediate animation frames).
Usage
pixel_morph(
imgA,
imgB,
n_frames = 16L,
mode = c("color_walk", "exact", "recursive"),
lap_method = "jv",
maximize = FALSE,
quantize_bits = 5L,
downscale_steps = 0L,
alpha = 1,
beta = 0,
patch_size = 1L,
upscale = 1,
show = interactive()
)
Arguments
imgA |
Source image (file path or magick image object) |
imgB |
Target image (file path or magick image object) |
n_frames |
Internal parameter for rendering (default: 16) |
mode |
Assignment algorithm: "color_walk" (default), "exact", or "recursive" |
lap_method |
LAP solver method (default: "jv") |
maximize |
Logical, maximize instead of minimize cost (default: FALSE) |
quantize_bits |
Color quantization for "color_walk" mode (default: 5) |
downscale_steps |
Number of 2x reductions before computing assignment (default: 0) |
alpha |
Weight for color distance in cost function (default: 1) |
beta |
Weight for spatial distance in cost function (default: 0) |
patch_size |
Tile size for tiled modes (default: 1) |
upscale |
Post-rendering upscaling factor (default: 1) |
show |
Logical, display result in viewer (default: interactive()) |
Details
Transport-Only Semantics
This function returns a SHARP, pixel-perfect transport of A's pixels to positions determined by the assignment to B.
Key Points:
- Assignment computed using: cost = alpha * color_dist + beta * spatial_dist
- B's COLORS influence assignment but DO NOT appear in output
- Result has A's colors arranged to match B's layout
- No motion blur (unlike intermediate frames in animation)
See pixel_morph_animate for detailed explanation of
assignment vs rendering semantics.
Permutation Warnings
Assignment is guaranteed to be a bijection (permutation) ONLY when:
- downscale_steps = 0 (no resolution changes)
- mode = "exact" with patch_size = 1
With downscaling or tiled modes, the assignment may contain:
- Overlaps: multiple source pixels map to the same destination (last write wins)
- Holes: some destinations are never filled (remain transparent)
If the assignment is not a bijection (due to downscaling or tiling), a warning is issued.
For guaranteed pixel-perfect results, use:
pixel_morph(A, B, mode = "exact", downscale_steps = 0)
Value
magick image object of the final transported frame
See Also
pixel_morph_animate for animated version
Examples
if (requireNamespace("magick", quietly = TRUE)) {
imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr")
imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr")
if (nzchar(imgA) && nzchar(imgB)) {
result <- pixel_morph(imgA, imgB, n_frames = 4, show = FALSE)
}
}
Pixel-level image morphing (animation)
Description
Creates an animated morph by computing optimal pixel assignment from image A to image B, then rendering intermediate frames showing the transport.
Usage
pixel_morph_animate(
imgA,
imgB,
n_frames = 16L,
fps = 10L,
format = c("gif", "webp", "mp4"),
outfile = NULL,
show = interactive(),
mode = c("color_walk", "exact", "recursive"),
lap_method = "jv",
maximize = FALSE,
quantize_bits = 5L,
downscale_steps = 0L,
alpha = 1,
beta = 0,
patch_size = 1L,
upscale = 1
)
Arguments
imgA |
Source image (file path or magick image object) |
imgB |
Target image (file path or magick image object) |
n_frames |
Integer number of animation frames (default: 16) |
fps |
Frames per second for playback (default: 10) |
format |
Output format: "gif", "webp", or "mp4" |
outfile |
Optional output file path |
show |
Logical, display animation in viewer (default: interactive()) |
mode |
Assignment algorithm: "color_walk" (default), "exact", or "recursive" |
lap_method |
LAP solver method (default: "jv") |
maximize |
Logical, maximize instead of minimize cost (default: FALSE) |
quantize_bits |
Color quantization for "color_walk" mode (default: 5) |
downscale_steps |
Number of 2x reductions before computing assignment (default: 0) |
alpha |
Weight for color distance in cost function (default: 1) |
beta |
Weight for spatial distance in cost function (default: 0) |
patch_size |
Tile size for tiled modes (default: 1) |
upscale |
Post-rendering upscaling factor (default: 1) |
Details
Assignment vs Rendering Semantics
CRITICAL: This function has two separate phases with different semantics:
Phase 1 - Assignment Computation:
The assignment is computed by minimizing:
cost(i,j) = alpha * color_distance(A[i], B[j]) +
beta * spatial_distance(pos_i, pos_j)
This means B's COLORS influence which pixels from A map to which positions.
Phase 2 - Rendering (Transport-Only):
The renderer uses ONLY A's colors:
- Intermediate frames: A's pixels move along paths with motion blur
- Final frame: A's pixels at their assigned positions (sharp, no blur)
- B's colors NEVER appear in the output
Result: You get A's colors rearranged to match B's geometry/layout.
What This Means
- B influences WHERE pixels go (via similarity in the cost function)
- B does NOT determine WHAT COLORS appear in the output
- The final image has A's palette arranged to mimic B's structure
Parameter Guidance
For pure spatial rearrangement (ignore B's colors in assignment):
pixel_morph_animate(A, B, alpha = 0, beta = 1)
For color-similarity matching (default):
pixel_morph_animate(A, B, alpha = 1, beta = 0)
For hybrid (color + spatial):
pixel_morph_animate(A, B, alpha = 1, beta = 0.2)
Permutation Guarantees
Assignment is guaranteed to be a bijection (permutation) ONLY when:
- downscale_steps = 0 (no resolution changes)
- mode = "exact" with patch_size = 1
With downscaling or tiled modes, the assignment may contain:
- Overlaps: multiple source pixels map to the same destination (last write wins)
- Holes: some destinations are never filled (remain transparent)
A warning is issued if overlaps/holes are detected in the final frame.
Value
Invisibly returns a list with animation object and metadata:
animation |
magick animation object |
width |
Image width in pixels |
height |
Image height in pixels |
assignment |
Integer vector of 1-based assignment indices (R convention) |
n_pixels |
Total number of pixels |
mode |
Mode used for matching |
upscale |
Upscaling factor applied |
Examples
if (requireNamespace("magick", quietly = TRUE)) {
imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr")
imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr")
if (nzchar(imgA) && nzchar(imgB)) {
outfile <- tempfile(fileext = ".gif")
pixel_morph_animate(imgA, imgB, outfile = outfile, n_frames = 4, show = FALSE)
}
}
Plot method for balance diagnostics
Description
Produces a Love plot (dot plot) of standardized differences.
Usage
## S3 method for class 'balance_diagnostics'
plot(x, type = c("love", "histogram", "variance"), threshold = 0.1, ...)
Arguments
x |
A balance_diagnostics object |
type |
Type of plot: "love" (default), "histogram", or "variance" |
threshold |
Threshold line for standardized differences (default: 0.1) |
... |
Additional arguments passed to plotting functions |
Value
The balance_diagnostics object (invisibly)
Plot method for matching results
Description
Produces a histogram of pairwise distances from a matching result.
Usage
## S3 method for class 'matching_result'
plot(x, type = c("histogram", "density", "ecdf"), ...)
Arguments
x |
A matching_result object |
type |
Type of plot: "histogram" (default), "density", or "ecdf" |
... |
Additional arguments passed to plotting functions |
Value
The matching_result object (invisibly)
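Examples
# Illustrative sketch using the match_couples() example data; plot types
# follow the documented "histogram", "density", and "ecdf" options.
left <- data.frame(id = 1:5, x = 1:5, y = seq(2, 10, by = 2))
right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1),
                    y = c(2.1, 4.1, 6.2, 8.1, 10.1))
result <- match_couples(left, right, vars = c("x", "y"))
plot(result)                  # histogram of pairwise distances (default)
plot(result, type = "ecdf")   # empirical CDF of pairwise distances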
Preprocess matching variables with automatic checks and scaling
Description
Main preprocessing function that orchestrates variable health checks, categorical encoding, and automatic scaling selection.
Usage
preprocess_matching_vars(
left,
right,
vars,
auto_scale = TRUE,
scale_method = "auto",
check_health = TRUE,
remove_problematic = TRUE,
verbose = TRUE
)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names |
auto_scale |
Logical, whether to perform automatic preprocessing (default: TRUE) |
scale_method |
Scaling method: "auto", "standardize", "range", "robust", or FALSE |
check_health |
Logical, whether to check variable health (default: TRUE) |
remove_problematic |
Logical, automatically exclude constant/all-NA variables (default: TRUE) |
verbose |
Logical, whether to print warnings (default: TRUE) |
Value
A list with class "preprocessing_result" containing:
- left: Preprocessed left data frame
- right: Preprocessed right data frame
- vars: Final variable names (after exclusions)
- health: Variable health diagnostics
- scaling_method: Selected scaling method
- excluded_vars: Variables that were excluded
- warnings: List of warnings issued
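Examples
# Illustrative sketch with hypothetical data; "flag" is constant, so with
# remove_problematic = TRUE (the default) it may be excluded automatically.
left <- data.frame(age = c(25, 30, 35), income = c(30000, 45000, 60000), flag = 1)
right <- data.frame(age = c(26, 31, 34), income = c(32000, 44000, 58000), flag = 1)
prep <- preprocess_matching_vars(left, right,
                                 vars = c("age", "income", "flag"),
                                 scale_method = "auto")
prep$vars            # final variable names after exclusions
prep$scaling_method  # scaling method selected by the "auto" heuristic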
Print Method for Balance Diagnostics
Description
Print Method for Balance Diagnostics
Usage
## S3 method for class 'balance_diagnostics'
print(x, ...)
Arguments
x |
A balance_diagnostics object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print Method for Distance Objects
Description
Print Method for Distance Objects
Usage
## S3 method for class 'distance_object'
print(x, ...)
Arguments
x |
A distance_object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for batch assignment results
Description
Prints a summary and the table of results for a batch of assignment
problems solved with lap_solve_batch().
Usage
## S3 method for class 'lap_solve_batch_result'
print(x, ...)
Arguments
x |
A lap_solve_batch_result object |
... |
Additional arguments passed on to other methods |
Value
Invisibly returns the input object x.
Print method for k-best assignment results
Description
Print method for k-best assignment results
Usage
## S3 method for class 'lap_solve_kbest_result'
print(x, ...)
Arguments
x |
A lap_solve_kbest_result object |
... |
Additional arguments passed on to other methods |
Value
Invisibly returns the input object x.
Print method for assignment results
Description
Nicely prints a lap_solve_result object, including the assignments,
total cost, and method used.
Usage
## S3 method for class 'lap_solve_result'
print(x, ...)
Arguments
x |
A lap_solve_result object |
... |
Additional arguments passed on to other methods |
Value
Invisibly returns the input object x.
Print method for matching results
Description
Print method for matching results
Usage
## S3 method for class 'matching_result'
print(x, ...)
Arguments
x |
A matching_result object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for matchmaker results
Description
Print method for matchmaker results
Usage
## S3 method for class 'matchmaker_result'
print(x, ...)
Arguments
x |
A matchmaker_result object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for preprocessing result
Description
Print method for preprocessing result
Usage
## S3 method for class 'preprocessing_result'
print(x, ...)
Arguments
x |
A preprocessing_result object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for variable health
Description
Print method for variable health
Usage
## S3 method for class 'variable_health'
print(x, ...)
Arguments
x |
A variable_health object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Restore original parallel plan
Description
Restore original parallel plan
Usage
restore_parallel(parallel_state)
Arguments
parallel_state |
State from setup_parallel() |
Value
No return value, called for side effects (restores parallel plan).
Setup parallel processing with future
Description
Setup parallel processing with future
Usage
setup_parallel(parallel = FALSE, n_workers = NULL)
Arguments
parallel |
Logical or plan specification |
n_workers |
Number of workers (NULL for auto-detect) |
Value
List with original plan and whether we set up parallelization
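Examples
# Illustrative sketch of the intended setup/restore pairing, based on the
# documented signatures; requires the 'future' packages when parallel = TRUE.
run_blocked <- function() {
  state <- setup_parallel(parallel = TRUE, n_workers = 2)
  on.exit(restore_parallel(state), add = TRUE)
  parallel_lapply(1:4, function(i) i^2, parallel = TRUE)
}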
'Sinkhorn-Knopp' optimal transport solver
Description
Compute an entropy-regularized optimal transport plan using the 'Sinkhorn-Knopp' algorithm. Unlike other LAP solvers that return a hard 1-to-1 assignment, this returns a soft assignment (doubly stochastic matrix).
Usage
sinkhorn(
cost,
lambda = 10,
tol = 1e-09,
max_iter = 1000,
r_weights = NULL,
c_weights = NULL
)
Arguments
cost |
Numeric matrix of transport costs. |
lambda |
Regularization parameter (default 10). Higher values produce sharper (more deterministic) transport plans; lower values produce smoother distributions. Typical range: 1-100. |
tol |
Convergence tolerance (default 1e-9). |
max_iter |
Maximum iterations (default 1000). |
r_weights |
Optional numeric vector of row marginals (source distribution). Default is uniform. Will be normalized to sum to 1. |
c_weights |
Optional numeric vector of column marginals (target distribution). Default is uniform. Will be normalized to sum to 1. |
Details
The 'Sinkhorn-Knopp' algorithm solves the entropy-regularized optimal transport problem:
P^* = \arg\min_P \langle C, P \rangle - \frac{1}{\lambda} H(P)
subject to row sums = r_weights and column sums = c_weights.
The entropy term H(P) encourages spread in the transport plan. As lambda -> Inf, the solution approaches the standard (unregularized) optimal transport.
Key differences from standard LAP solvers:
- Returns a soft assignment (probabilities), not a hard 1-to-1 matching
- Supports unequal marginals (weighted distributions)
- Differentiable, making it useful in ML pipelines
- Very fast: O(n^2) per iteration with typically O(1/tol^2) iterations
Use sinkhorn_to_assignment() to round the soft assignment to a hard matching.
Value
A list with elements:
- transport_plan: numeric matrix, the optimal transport plan P. Row sums approximate r_weights, column sums approximate c_weights.
- cost: the transport cost <C, P> (without the entropy term).
- u, v: scaling vectors (P = diag(u) * K * diag(v), where K = exp(-lambda * C)).
- converged: logical, whether the algorithm converged.
- iterations: number of iterations used.
- lambda: the regularization parameter used.
References
Cuturi, M. (2013). 'Sinkhorn Distances': Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems, 26.
See Also
assignment() for hard 1-to-1 matching, sinkhorn_to_assignment()
to round soft assignments.
Examples
cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
# Soft assignment with default parameters
result <- sinkhorn(cost)
print(round(result$transport_plan, 3))
# Sharper assignment (higher lambda)
result_sharp <- sinkhorn(cost, lambda = 50)
print(round(result_sharp$transport_plan, 3))
# With custom marginals (more mass from row 1)
result_weighted <- sinkhorn(cost, r_weights = c(0.5, 0.25, 0.25))
print(round(result_weighted$transport_plan, 3))
# Round to hard assignment
hard_match <- sinkhorn_to_assignment(result)
print(hard_match)
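# The plan's marginals approximately match the requested distributions
# (see Value); a quick check with base R:
round(rowSums(result_weighted$transport_plan), 3)  # close to c(0.5, 0.25, 0.25)
round(colSums(result$transport_plan), 3)           # close to uniform (1/3 each)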
Round 'Sinkhorn' transport plan to hard assignment
Description
Convert a soft transport plan from sinkhorn() to a hard 1-to-1 assignment
using greedy rounding.
Usage
sinkhorn_to_assignment(result)
Arguments
result |
Either a result from sinkhorn() or a transport plan matrix |
Details
Greedy rounding iteratively assigns each row to its most probable column,
ensuring no column is assigned twice. This may not give the globally optimal
hard assignment; for that, use the transport plan as a cost matrix with
assignment().
Value
Integer vector of column assignments (1-based), same format as
assignment().
See Also
sinkhorn() for computing soft transport plans; assignment() for an optimal hard 1-to-1 matching.
Examples
cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
result <- sinkhorn(cost, lambda = 20)
hard_match <- sinkhorn_to_assignment(result)
print(hard_match)
Calculate Standardized Difference
Description
Computes the standardized mean difference between two groups. This is a key metric for assessing balance in matched samples.
Usage
standardized_difference(x1, x2, pooled = TRUE)
Arguments
x1 |
Numeric vector for group 1 |
x2 |
Numeric vector for group 2 |
pooled |
Logical, if TRUE use pooled standard deviation (default), if FALSE use group 1 standard deviation |
Details
Standardized difference = (mean1 - mean2) / pooled_sd where pooled_sd = sqrt((sd1^2 + sd2^2) / 2)
Common thresholds: less than 0.1 is excellent balance, 0.1-0.25 is good balance, 0.25-0.5 is acceptable balance, and greater than 0.5 is poor balance.
Value
Numeric value representing the standardized difference
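Examples
# Illustrative check against the formula given in Details (hypothetical data)
x1 <- c(5.1, 6.3, 7.2, 5.8)
x2 <- c(5.0, 6.1, 7.5, 6.0)
standardized_difference(x1, x2)
# Manual computation of the documented formula
(mean(x1) - mean(x2)) / sqrt((sd(x1)^2 + sd(x2)^2) / 2)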
Perfect balance success message
Description
Perfect balance success message
Usage
success_good_balance(mean_std_diff)
Value
No return value, called for side effects (issues a message).
Suggest scaling method based on variable characteristics
Description
Analyzes variable distributions and suggests appropriate scaling methods.
Usage
suggest_scaling(left, right, vars)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names |
Value
A character string with the suggested scaling method: "standardize", "range", "robust", or "none"
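Examples
# Illustrative call with hypothetical data; the suggestion returned
# depends on the variable distributions.
left <- data.frame(age = c(25, 30, 35, 40), income = c(30000, 52000, 48000, 250000))
right <- data.frame(age = c(26, 29, 36, 41), income = c(31000, 50000, 47000, 61000))
suggest_scaling(left, right, vars = c("age", "income"))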
Summarize block structure
Description
Summarize block structure
Usage
summarize_blocks(left, right, block_vars = NULL)
Value
Tibble with block_id, n_left, n_right, and optional variable means.
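Examples
# Illustrative sketch, assuming block assignments created by matchmaker()
# (which adds a block_id column to both data frames).
left <- data.frame(id = 1:6, region = rep(c("A", "B"), each = 3), x = rnorm(6))
right <- data.frame(id = 7:12, region = rep(c("A", "B"), each = 3), x = rnorm(6))
blocks <- matchmaker(left, right, block_type = "group", block_by = "region")
summarize_blocks(blocks$left, blocks$right, block_vars = "x")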
Summary method for balance diagnostics
Description
Summary method for balance diagnostics
Usage
## S3 method for class 'balance_diagnostics'
summary(object, ...)
Arguments
object |
A balance_diagnostics object |
... |
Additional arguments (ignored) |
Value
A list containing summary statistics (invisibly)
Summary Method for Distance Objects
Description
Summary Method for Distance Objects
Usage
## S3 method for class 'distance_object'
summary(object, ...)
Arguments
object |
A distance_object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object.
Get summary of k-best results
Description
Extract summary information from k-best assignment results.
Usage
## S3 method for class 'lap_solve_kbest_result'
summary(object, ...)
Arguments
object |
An object of class |
... |
Additional arguments (unused). |
Value
A tibble with one row per solution containing:
- rank: solution rank
- solution_id: solution identifier
- total_cost: total cost of the solution
- n_assignments: number of assignments in the solution
Summary method for matching results
Description
Summary method for matching results
Usage
## S3 method for class 'matching_result'
summary(object, ...)
Arguments
object |
A matching_result object |
... |
Additional arguments (ignored) |
Value
A list containing summary statistics (invisibly)
Update Constraints on Distance Object
Description
Apply new constraints to a precomputed distance object without recomputing the underlying distances. This is useful for exploring different constraint scenarios quickly.
Usage
update_constraints(dist_obj, max_distance = Inf, calipers = NULL)
Arguments
dist_obj |
A distance_object from compute_distances() |
max_distance |
Maximum allowed distance (pairs with distance > max_distance become Inf) |
calipers |
Named list of per-variable calipers |
Details
This function creates a new distance_object with modified constraints applied to the cost matrix. The original distance_object is not modified.
Constraints:
- max_distance: Sets cost to Inf for pairs exceeding this threshold
- calipers: Per-variable restrictions (e.g., calipers = list(age = 5))
The function returns a new object rather than modifying in place, following R's copy-on-modify semantics.
Value
A new distance_object with updated cost_matrix
Examples
left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45))
right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44))
dist_obj <- compute_distances(left, right, vars = "age")
# Apply constraints
constrained <- update_constraints(dist_obj, max_distance = 2)
result <- match_couples(constrained)
Check if emoji should be used
Description
Check if emoji should be used
Usage
use_emoji()
Value
Logical indicating whether emoji should be used.
Validate calipers parameter
Description
Validate calipers parameter
Usage
validate_calipers(calipers, vars)
Value
Validated calipers (list or named numeric), or NULL if none.
Validate and prepare cost data
Description
Internal helper that ensures a numeric, non-empty cost matrix.
Usage
validate_cost_data(x, forbidden = NA)
Arguments
x |
Cost matrix or data frame |
forbidden |
Value representing forbidden assignments (use NA or Inf) |
Value
Numeric cost matrix
Validate matching inputs
Description
Validate matching inputs
Usage
validate_matching_inputs(left, right, vars = NULL)
Value
Invisibly returns TRUE if validation passes; otherwise throws an error.
Validate weights parameter
Description
Validate weights parameter
Usage
validate_weights(weights, vars)
Value
Numeric vector of validated weights.
All distances identical warning
Description
All distances identical warning
Usage
warn_constant_distance(value)
Value
No return value, called for side effects (issues a warning).
Constant variable warning
Description
Constant variable warning
Usage
warn_constant_var(var)
Value
No return value, called for side effects (issues a warning).
Extreme cost ratio warning
Description
Extreme cost ratio warning
Usage
warn_extreme_costs(p95, p99, ratio, problem_vars = NULL)
Value
No return value, called for side effects (issues a warning).
Many forbidden pairs warning
Description
Many forbidden pairs warning
Usage
warn_many_forbidden(pct_forbidden, n_valid, n_left)
Value
No return value, called for side effects (issues a warning).
Too many zeros warning
Description
Too many zeros warning
Usage
warn_many_zeros(pct, n_zeros)
Value
No return value, called for side effects (issues a warning).
Parallel package missing warning (reuse from matching_parallel.R)
Description
Parallel package missing warning (reuse from matching_parallel.R)
Usage
warn_parallel_unavailable()
Value
No return value, called for side effects (issues a warning).
High distance matches warning
Description
High distance matches warning
Usage
warn_poor_quality(pct_poor, threshold)
Value
No return value, called for side effects (issues a warning).