| Title: | Optimal Pairing and Matching via Linear Assignment |
| Version: | 1.0.6 |
| Description: | Solves optimal pairing and matching problems using linear assignment algorithms. Provides implementations of the Hungarian method (Kuhn 1955) <doi:10.1002/nav.3800020109>, Jonker-Volgenant shortest path algorithm (Jonker and Volgenant 1987) <doi:10.1007/BF02278710>, Auction algorithm (Bertsekas 1988) <doi:10.1007/BF02186476>, cost-scaling (Goldberg and Kennedy 1995) <doi:10.1007/BF01585996>, scaling algorithms (Gabow and Tarjan 1989) <doi:10.1137/0218069>, push-relabel (Goldberg and Tarjan 1988) <doi:10.1145/48014.61051>, and Sinkhorn entropy-regularized transport (Cuturi 2013) <doi:10.48550/arxiv.1306.0895>. Designed for matching plots, sites, samples, or any pairwise optimization problem. Supports rectangular matrices, forbidden assignments, data frame inputs, batch solving, k-best solutions, and pixel-level image morphing for visualization. Includes automatic preprocessing with variable health checks, multiple scaling methods (standardized, range, robust), greedy matching algorithms, and comprehensive balance diagnostics for assessing match quality using standardized differences and distribution comparisons. |
| License: | MIT + file LICENSE |
| Language: | en-US |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | Rcpp (≥ 1.0.0), tibble (≥ 3.0.0), dplyr (≥ 1.0.0), rlang (≥ 0.4.0), purrr (≥ 0.3.0), magrittr (≥ 2.0.0), methods |
| Suggests: | testthat (≥ 3.0.0), xml2, e1071, R.utils, microbenchmark, withr, knitr, rmarkdown, bench, parallel, future (≥ 1.20.0), future.apply (≥ 1.8.0), ggplot2, ggraph, tidygraph, magick, OpenImageR, farver, av, reticulate, png, combinat |
| LinkingTo: | Rcpp, RcppEigen, testthat |
| SystemRequirements: | C++17 |
| LazyData: | true |
| VignetteBuilder: | knitr |
| URL: | https://gillescolling.com/couplr/, https://github.com/gcol33/couplr |
| BugReports: | https://github.com/gcol33/couplr/issues |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| NeedsCompilation: | yes |
| Packaged: | 2026-01-14 21:35:36 UTC; Gilles Colling |
| Author: | Gilles Colling [aut, cre, cph] |
| Maintainer: | Gilles Colling <gilles.colling051@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-20 10:30:13 UTC |
couplr: Optimal Pairing and Matching via Linear Assignment
Description
Solves optimal pairing and matching problems using linear assignment algorithms. Designed for matching plots, sites, samples, or any pairwise optimization problem. Provides modern, tidy implementations of 'Hungarian', 'Jonker-Volgenant', 'Auction', and other LAP solvers.
Main functions
lap_solve: Solve single assignment problems
lap_solve_batch: Solve multiple problems efficiently
lap_solve_kbest: Find k-best optimal solutions
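A minimal sketch of the core workflow (the cost values are purely illustrative):
library(couplr)
cost <- matrix(c(4, 2, 5,
                 3, 3, 6,
                 7, 5, 4), nrow = 3, byrow = TRUE)   # rows = sources, columns = targets
lap_solve(cost)                          # optimal one-to-one assignment
lap_solve_kbest(cost, k = 3)             # the three best assignments
lap_solve_batch(list(cost, cost + 1))    # solve several problems at once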
Author(s)
Maintainer: Gilles Colling gilles.colling051@gmail.com [copyright holder]
See Also
Useful links:
https://gillescolling.com/couplr/
https://github.com/gcol33/couplr
Report bugs at https://github.com/gcol33/couplr/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Large value for forbidden pairs
Description
A numeric constant used to mark forbidden pairs in cost matrices.
Usage
BIG_COST
Format
Numeric value (half of .Machine$double.xmax).
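A one-line check of the documented definition, assuming the constant is accessible as shown in the Usage section above:
BIG_COST == .Machine$double.xmax / 2   # TRUE, per the Format note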
Apply all constraints to cost matrix
Description
Main entry point for applying constraints.
Usage
apply_all_constraints(
cost_matrix,
left,
right,
vars,
max_distance = Inf,
calipers = NULL,
forbidden = NULL
)
Value
Modified cost matrix with all constraints applied.
Apply caliper constraints
Description
Calipers impose per-variable maximum absolute differences.
Usage
apply_calipers(cost_matrix, left, right, calipers, vars)
Value
Modified cost matrix with forbidden pairs marked.
Apply maximum distance constraint
Description
Apply maximum distance constraint
Usage
apply_max_distance(cost_matrix, max_distance = Inf)
Value
Modified cost matrix with forbidden pairs marked.
Apply scaling to matching variables
Description
Apply scaling to matching variables
Usage
apply_scaling(left_mat, right_mat, method = "standardize")
Value
List with scaled left/right matrices and scaling parameters.
Apply weights to matching variables
Description
Apply weights to matching variables
Usage
apply_weights(mat, weights)
Value
Numeric matrix with columns weighted.
Convert assignment result to a binary matrix
Description
Turns a tidy assignment result back into a 0/1 assignment matrix.
Usage
as_assignment_matrix(x, n_sources = NULL, n_targets = NULL)
Arguments
x |
An assignment result object of class lap_solve_result. |
n_sources |
Number of source nodes, optional |
n_targets |
Number of target nodes, optional |
Value
Integer matrix with 0 and 1 entries
Assign blocks using clustering
Description
Assign blocks using clustering
Usage
assign_blocks_cluster(left, right, block_vars, method, n_blocks, ...)
Value
List with modified left/right data frames (with block_id) and n_blocks_initial.
Assign blocks based on grouping variable(s)
Description
Assign blocks based on grouping variable(s)
Usage
assign_blocks_group(left, right, block_by)
Value
List with modified left/right data frames (with block_id) and n_blocks_initial.
Linear assignment solver
Description
Solve the linear assignment problem (minimum- or maximum-cost matching)
using several algorithms. Forbidden edges can be marked as NA or Inf.
Usage
assignment(
cost,
maximize = FALSE,
method = c("auto", "jv", "hungarian", "auction", "auction_gs", "auction_scaled", "sap",
"ssp", "csflow", "hk01", "bruteforce", "ssap_bucket", "cycle_cancel", "gabow_tarjan",
"lapmod", "csa", "ramshaw_tarjan", "push_relabel", "orlin", "network_simplex"),
auction_eps = NULL,
eps = NULL
)
Arguments
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if TRUE, maximize total cost instead of minimizing (default: FALSE). |
method |
Character string indicating the algorithm to use. Options include general-purpose solvers (e.g. "jv", "hungarian"), auction-based solvers ("auction", "auction_gs", "auction_scaled"), specialized solvers, and advanced solvers; see the method argument in Usage for the complete list of names. |
auction_eps |
Optional numeric epsilon for the 'Auction'/'Auction-GS' methods. If NULL (the default), a suitable value is chosen automatically. |
eps |
Deprecated. Use auction_eps instead. |
Details
method = "auto" selects an algorithm based on problem size/shape and data
characteristics:
Very small (n <= 8): "bruteforce" — exact enumeration
Binary/constant costs: "hk01" — specialized for 0/1 costs
Large sparse (n > 100, more than 50% of entries forbidden): a sparsity-aware solver (e.g. "lapmod")
Sparse or very rectangular: "sap" — handles sparsity well
Small-medium (8 < n <= 50): "hungarian" — provides exact dual solutions
Medium (50 < n <= 75): "jv" — fast general-purpose solver
Large (n > 75): "auction_scaled" — fastest for large dense problems
Benchmarks show 'Auction-scaled' and 'JV' are 100-1500x faster than 'Hungarian' at n=500.
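A short sketch for inspecting which algorithm "auto" picked (random costs for illustration):
set.seed(1)
small <- matrix(runif(36), nrow = 6)            # n <= 8, so "bruteforce" is expected
large <- matrix(runif(100 * 100), nrow = 100)   # large dense problem
assignment(small)$method_used
assignment(large)$method_used
assignment(large, method = "jv")$total_cost     # or force a specific solver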
Value
An object of class lap_solve_result, a list with elements:
match — integer vector of length min(nrow(cost), ncol(cost)) giving the assigned column for each row (0 if unassigned).
total_cost — numeric scalar, the objective value.
status — character scalar, e.g. "optimal".
method_used — character scalar, the algorithm actually used.
See Also
lap_solve() — Tidy interface returning tibbles
lap_solve_kbest() — Find k-best assignments ('Murty' algorithm)
assignment_duals() — Extract dual variables for sensitivity analysis
bottleneck_assignment() — Minimize maximum edge cost (minimax)
sinkhorn() — Entropy-regularized optimal transport
Examples
cost <- matrix(c(4,2,5, 3,3,6, 7,5,4), nrow = 3, byrow = TRUE)
res <- assignment(cost)
res$match; res$total_cost
Solve assignment problem and return dual variables
Description
Solves the linear assignment problem and returns dual potentials (u, v) in addition to the optimal matching. The dual variables provide an optimality certificate and enable sensitivity analysis.
Usage
assignment_duals(cost, maximize = FALSE)
Arguments
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if TRUE, maximize total cost instead of minimizing (default: FALSE). |
Details
The dual variables satisfy the complementary slackness conditions:
For minimization: u[i] + v[j] <= cost[i,j] for all (i, j)
For any assigned pair (i, j): u[i] + v[j] = cost[i,j]
This implies that sum(u) + sum(v) = total_cost (strong duality).
Applications of dual variables:
Optimality verification: Check that duals satisfy constraints
Sensitivity analysis: Reduced cost c[i,j] - u[i] - v[j] shows how much an edge cost must decrease before it enters the solution
Pricing in column generation: Use duals to price new columns
Warm starting: Reuse duals when costs change slightly
Value
A list with class "assignment_duals_result" containing:
match - integer vector of column assignments (1-based)
total_cost - optimal objective value
u - numeric vector of row dual variables (length n)
v - numeric vector of column dual variables (length m)
status - character, e.g. "optimal"
See Also
assignment() for standard assignment without duals
Examples
cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3, byrow = TRUE)
result <- assignment_duals(cost)
# Check optimality: u + v should equal cost for assigned pairs
for (i in 1:3) {
j <- result$match[i]
cat(sprintf("Row %d -> Col %d: u + v = %.2f, cost = %.2f\n",
i, j, result$u[i] + result$v[j], cost[i, j]))
}
# Verify strong duality
cat("sum(u) + sum(v) =", sum(result$u) + sum(result$v), "\n")
cat("total_cost =", result$total_cost, "\n")
# Reduced costs (how much must cost decrease to enter solution)
reduced <- outer(result$u, result$v, "+")
reduced_cost <- cost - reduced
print(round(reduced_cost, 2))
Generic Augment Function
Description
S3 generic for augmenting model results with original data.
Usage
augment(x, ...)
Arguments
x |
An object to augment |
... |
Additional arguments passed to methods |
Value
Augmented data (depends on method)
Augment Matching Results with Original Data (broom-style)
Description
S3 method for augmenting matching results following the broom package
conventions. This is a thin wrapper around join_matched() with
sensible defaults for quick exploration.
Usage
## S3 method for class 'matching_result'
augment(x, left, right, ...)
Arguments
x |
A matching_result object |
left |
The original left dataset |
right |
The original right dataset |
... |
Additional arguments passed to join_matched(). |
Details
This method follows the augment() convention from the broom package,
making it easy to integrate couplr into tidymodels workflows. It's
equivalent to calling join_matched() with default parameters.
If the broom package is not loaded, you can use couplr::augment()
to access this function.
Value
A tibble with matched pairs and original data (see join_matched())
Examples
left <- data.frame(
id = 1:5,
treatment = 1,
age = c(25, 30, 35, 40, 45)
)
right <- data.frame(
id = 6:10,
treatment = 0,
age = c(24, 29, 36, 41, 44)
)
result <- match_couples(left, right, vars = "age")
couplr::augment(result, left, right)
Automatically encode categorical variables
Description
Converts categorical variables to numeric representations suitable for matching. Currently supports binary variables (0/1) and ordered factors.
Usage
auto_encode_categorical(left, right, var)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
var |
Variable name to encode |
Value
List with encoded left and right columns, plus encoding metadata
Balance Diagnostics for Matched Pairs
Description
Computes comprehensive balance statistics comparing the distribution of matching variables between left and right units in the matched sample.
Usage
balance_diagnostics(
result,
left,
right,
vars = NULL,
left_id = "id",
right_id = "id"
)
Arguments
result |
A matching result object from match_couples() or greedy_couples(). |
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names to check balance for. Defaults to the variables used in matching (if available in result). |
left_id |
Character, name of ID column in left data (default: "id") |
right_id |
Character, name of ID column in right data (default: "id") |
Details
This function computes several balance metrics:
Standardized Difference: The difference in means divided by the pooled standard deviation. Values less than 0.1 indicate excellent balance, 0.1-0.25 good balance.
Variance Ratio: The ratio of standard deviations (left/right). Values close to 1 are ideal.
KS Statistic: Kolmogorov-Smirnov test statistic comparing distributions. Lower values indicate more similar distributions.
Overall Metrics include mean absolute standardized difference across all variables, proportion of variables with large imbalance (|std diff| > 0.25), and maximum standardized difference.
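As a point of reference, here is a minimal sketch of the standardized difference described above, assuming a pooled standard deviation in the denominator (the package's internal formula may differ in detail):
std_diff <- function(left_vals, right_vals) {
  pooled_sd <- sqrt((var(left_vals) + var(right_vals)) / 2)   # pooled SD of the two groups
  (mean(left_vals) - mean(right_vals)) / pooled_sd
}
std_diff(c(25, 30, 35), c(24, 31, 36))   # small absolute value indicates good balance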
Value
An S3 object of class balance_diagnostics containing:
- var_stats
Tibble with per-variable balance statistics
- overall
List with overall balance metrics
- pairs
Tibble of matched pairs with variables
- n_matched
Number of matched pairs
- n_unmatched_left
Number of unmatched left units
- n_unmatched_right
Number of unmatched right units
- method
Matching method used
- has_blocks
Whether blocking was used
- block_stats
Per-block statistics (if blocking used)
Examples
# Create sample data
set.seed(123)
left <- data.frame(
id = 1:10,
age = rnorm(10, 45, 10),
income = rnorm(10, 50000, 15000)
)
right <- data.frame(
id = 11:30,
age = rnorm(20, 47, 10),
income = rnorm(20, 52000, 15000)
)
# Match
result <- match_couples(left, right, vars = c("age", "income"))
# Get balance diagnostics
balance <- balance_diagnostics(result, left, right, vars = c("age", "income"))
print(balance)
# Get balance table
balance_table(balance)
Create Balance Table
Description
Formats balance diagnostics into a clean table for display or export.
Usage
balance_table(balance, digits = 3)
Arguments
balance |
A balance_diagnostics object from balance_diagnostics(). |
digits |
Number of decimal places for rounding (default: 3) |
Value
A tibble with formatted balance statistics
Solve the Bottleneck Assignment Problem
Description
Finds an assignment that minimizes (or maximizes) the maximum edge cost in a perfect matching. Unlike standard LAP which minimizes the sum of costs, BAP minimizes the maximum (bottleneck) cost.
Usage
bottleneck_assignment(cost, maximize = FALSE)
Arguments
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if TRUE, maximize the minimum edge cost (maximin) instead of minimizing the maximum (default: FALSE). |
Details
The Bottleneck Assignment Problem (BAP) is a variant of the Linear Assignment Problem where instead of minimizing the sum of assignment costs, we minimize the maximum cost among all assignments (minimax objective).
Algorithm: Uses binary search on the sorted unique costs combined with 'Hopcroft-Karp' bipartite matching to find the minimum threshold that allows a perfect matching.
Complexity: O(E * sqrt(V) * log(unique costs)) where E = edges, V = vertices.
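A conceptual sketch of the feasibility test behind the binary search: forbid every edge above a candidate threshold and check whether a complete assignment still exists. The helper name feasible_at is hypothetical and it reuses a full LAP solve for simplicity; the packaged solver uses 'Hopcroft-Karp' matching instead, and the final edge check is a defensive assumption about how forbidden entries are handled.
feasible_at <- function(cost, threshold) {
  masked <- cost
  masked[masked > threshold] <- NA                   # edges above the threshold become forbidden
  res <- try(assignment(masked), silent = TRUE)
  if (inherits(res, "try-error") || any(res$match == 0)) return(FALSE)
  all(cost[cbind(seq_along(res$match), res$match)] <= threshold)  # every chosen edge respects the threshold
}
cost <- matrix(c(1, 5, 3, 2, 4, 6, 7, 1, 2), nrow = 3, byrow = TRUE)
feasible_at(cost, 2)   # FALSE: rows 1 and 2 would both need column 1
feasible_at(cost, 3)   # TRUE: the bottleneck value of this matrix is 3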
Applications:
Task scheduling with deadline constraints (minimize latest completion)
Resource allocation (minimize maximum load/distance)
Network routing (minimize maximum link utilization)
Fair division problems (minimize maximum disparity)
Value
A list with class "bottleneck_result" containing:
match - integer vector of length nrow(cost) giving the assigned column for each row (1-based indexing)
bottleneck - numeric scalar, the bottleneck (max/min edge) value
status - character scalar, e.g. "optimal"
See Also
assignment() for standard LAP (sum objective), lap_solve() for
tidy LAP interface
Examples
# Simple example: minimize max cost
cost <- matrix(c(1, 5, 3,
2, 4, 6,
7, 1, 2), nrow = 3, byrow = TRUE)
result <- bottleneck_assignment(cost)
result$bottleneck # Maximum edge cost in optimal assignment
# Maximize minimum (fair allocation)
profits <- matrix(c(10, 5, 8,
6, 12, 4,
3, 7, 11), nrow = 3, byrow = TRUE)
result <- bottleneck_assignment(profits, maximize = TRUE)
result$bottleneck # Minimum profit among all assignments
# With forbidden assignments
cost <- matrix(c(1, NA, 3,
2, 4, Inf,
5, 1, 2), nrow = 3, byrow = TRUE)
result <- bottleneck_assignment(cost)
Build cost matrix for matching
Description
This is the main entry point for distance computation.
Usage
build_cost_matrix(
left,
right,
vars,
distance = "euclidean",
weights = NULL,
scale = FALSE
)
Value
Numeric matrix of distances with optional scaling/weights applied.
Calculate Variable-Level Balance Statistics
Description
Calculate Variable-Level Balance Statistics
Usage
calculate_var_balance(left_vals, right_vals, var_name)
Arguments
left_vals |
Numeric vector of values from left group |
right_vals |
Numeric vector of values from right group |
var_name |
Character, name of the variable |
Value
List with balance statistics for this variable
Check if parallel processing is available
Description
Check if parallel processing is available
Usage
can_parallelize()
Value
Logical indicating if future package is available
Check cost distribution for problems
Description
Examines the distance matrix for common issues and provides helpful warnings.
Usage
check_cost_distribution(cost_matrix, threshold_zero = 1e-10, warn = TRUE)
Arguments
cost_matrix |
Numeric matrix of distances |
threshold_zero |
Threshold for considering distance "zero" (default: 1e-10) |
warn |
If TRUE, issue warnings for problems found |
Value
List with diagnostic information
Check if full matching was achieved
Description
Check if full matching was achieved
Usage
check_full_matching(result)
Value
No return value; throws error if unmatched units exist.
Check variable health for matching
Description
Analyzes variables for common problems that can affect matching quality: constant columns, high missingness, extreme skewness, and outliers.
Usage
check_variable_health(
left,
right,
vars,
high_missingness_threshold = 0.5,
low_variance_threshold = 1e-06
)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names to check |
high_missingness_threshold |
Threshold for high missingness warning (default: 0.5) |
low_variance_threshold |
Threshold for nearly-constant variables (default: 1e-6) |
Value
A list with class "variable_health" containing:
summary: Tibble with per-variable diagnostics
issues: List of detected issues with severity levels
exclude_vars: Variables that should be excluded
warnings: Human-readable warnings
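A small usage sketch (the data are illustrative; 'flag' is deliberately constant so it should be reported as problematic):
left  <- data.frame(id = 1:20,  age = rnorm(20, 40, 5), flag = 1)
right <- data.frame(id = 21:40, age = rnorm(20, 42, 5), flag = 1)
health <- check_variable_health(left, right, vars = c("age", "flag"))
health$summary        # per-variable diagnostics
health$exclude_vars   # variables flagged for exclusion (likely "flag" here)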
Compute pairwise distance matrix
Description
Compute pairwise distance matrix
Usage
compute_distance_matrix(left_mat, right_mat, distance = "euclidean")
Value
Numeric matrix of pairwise distances (n_left x n_right).
Compute and Cache Distance Matrix for Reuse
Description
Precomputes a distance matrix between left and right datasets, allowing it to be reused across multiple matching operations with different constraints. This is particularly useful when exploring different matching parameters (max_distance, calipers, methods) without recomputing distances.
Usage
compute_distances(
left,
right,
vars,
distance = "euclidean",
weights = NULL,
scale = FALSE,
auto_scale = FALSE,
left_id = "id",
right_id = "id",
block_id = NULL
)
Arguments
left |
Left dataset (data frame) |
right |
Right dataset (data frame) |
vars |
Character vector of variable names to use for distance computation |
distance |
Distance metric (default: "euclidean") |
weights |
Optional numeric vector of variable weights |
scale |
Scaling method: FALSE, "standardize", "range", or "robust" |
auto_scale |
Apply automatic preprocessing (default: FALSE) |
left_id |
Name of ID column in left (default: "id") |
right_id |
Name of ID column in right (default: "id") |
block_id |
Optional block ID column name for blocked matching |
Details
This function computes distances once and stores them in a reusable object.
The resulting distance_object can be passed to match_couples() or
greedy_couples() instead of providing datasets and variables.
Benefits:
Performance: Avoid recomputing distances when trying different constraints
Exploration: Quickly test max_distance, calipers, or methods
Consistency: Ensures same distances used across comparisons
Memory efficient: Can use sparse matrices when many pairs are forbidden
The distance_object stores the original datasets, allowing downstream
functions like join_matched() to work seamlessly.
Value
An S3 object of class "distance_object" containing:
cost_matrix: Numeric matrix of distances
left_ids: Character vector of left IDs
right_ids: Character vector of right IDs
block_id: Block ID column name (if specified)
metadata: List with computation details (vars, distance, scale, etc.)
original_left: Original left dataset (for later joining)
original_right: Original right dataset (for later joining)
Examples
# Compute distances once
left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45), income = c(45, 52, 48, 61, 55) * 1000)
right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44), income = c(46, 51, 47, 60, 54) * 1000)
dist_obj <- compute_distances(
left, right,
vars = c("age", "income"),
scale = "standardize"
)
# Reuse for different matching strategies
result1 <- match_couples(dist_obj, max_distance = 0.5)
result2 <- match_couples(dist_obj, max_distance = 1.0)
result3 <- greedy_couples(dist_obj, strategy = "sorted")
# All use the same precomputed distances
Count valid pairs in cost matrix
Description
Count valid pairs in cost matrix
Usage
count_valid_pairs(cost_matrix)
Value
Integer count of valid (non-forbidden) pairs.
Get a themed emoji
Description
Get a themed emoji
Usage
couplr_emoji(
type = c("error", "warning", "info", "success", "heart", "broken", "sparkles",
"search", "chart", "warning_sign", "stop", "check")
)
Value
Character string with the emoji (or empty string if emoji disabled).
Info message with emoji
Description
Info message with emoji
Usage
couplr_inform(...)
Value
No return value, called for side effects (issues a message).
Couplr message helpers with emoji and humor
Description
Light, fun error/warning messages inspired by testthat, themed around coupling and matching. Makes errors less intimidating and more memorable.
Stop with a fun, themed error message
Description
Stop with a fun, themed error message
Usage
couplr_stop(..., call. = FALSE)
Value
No return value, throws an error.
Success message with emoji
Description
Success message with emoji
Usage
couplr_success(...)
Value
No return value, called for side effects (issues a message).
Warn with a fun, themed warning message
Description
Warn with a fun, themed warning message
Usage
couplr_warn(..., call. = FALSE)
Value
No return value, called for side effects (issues a warning).
Detect and validate blocking
Description
Detect and validate blocking
Usage
detect_blocking(left, right, block_id, ignore_blocks)
Value
List with use_blocking (logical) and block_col (character or NULL).
Diagnose distance matrix and suggest fixes
Description
Comprehensive diagnostics for a distance matrix with actionable suggestions.
Usage
diagnose_distance_matrix(
cost_matrix,
left = NULL,
right = NULL,
vars = NULL,
warn = TRUE
)
Arguments
cost_matrix |
Numeric matrix of distances |
left |
Left dataset (for variable checking) |
right |
Right dataset (for variable checking) |
vars |
Variables used for matching |
warn |
If TRUE, issue warnings |
Value
List with diagnostic results and suggestions
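A brief usage sketch, assuming the cost_matrix component of a distance_object can be extracted with $ as its documented structure suggests:
left  <- data.frame(id = 1:4, x = c(1, 1, 2, 3))
right <- data.frame(id = 5:8, x = c(1, 2, 2, 9))
cm <- compute_distances(left, right, vars = "x")$cost_matrix
diag_res <- diagnose_distance_matrix(cm, left = left, right = right, vars = "x")
str(diag_res, max.level = 1)   # diagnostic results and suggestions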
Invalid parameter error
Description
Invalid parameter error
Usage
err_invalid_param(param, value, expected)
Value
No return value, throws an error.
Missing data error
Description
Missing data error
Usage
err_missing_data(dataset = "left")
Value
No return value, throws an error.
Missing variables error
Description
Missing variables error
Usage
err_missing_vars(vars, dataset = "left")
Value
No return value, throws an error.
All pairs forbidden error
Description
All pairs forbidden error
Usage
err_no_valid_pairs(reason = NULL)
Value
No return value, throws an error.
Example cost matrices for assignment problems
Description
Small example datasets for demonstrating couplr functionality across different assignment problem types: square, rectangular, sparse, and binary.
Usage
example_costs
Format
A list containing four example cost matrices:
- simple_3x3
A 3x3 cost matrix with costs ranging from 2-7. Optimal assignment: row 1 -> col 2 (cost 2), row 2 -> col 1 (cost 3), row 3 -> col 3 (cost 4). Total optimal cost: 9.
- rectangular_3x5
A 3x5 rectangular cost matrix demonstrating assignment when rows < columns. Each of 3 rows is assigned to one of 5 columns; 2 columns remain unassigned. Costs range 1-6.
- sparse_with_na
A 3x3 matrix with NA values indicating forbidden assignments. Use this to test algorithms' handling of constraints. Position (1,3), (2,2), and (3,1) are forbidden.
- binary_costs
A 3x3 matrix with binary (0/1) costs, suitable for testing the HK01 algorithm. Diagonal entries are 0 (preferred), off-diagonal entries are 1 (penalty).
Details
These matrices are designed to test different aspects of LAP solvers:
simple_3x3: Basic functionality test. Any correct solver should find total cost = 9.
rectangular_3x5: Tests handling of non-square problems. The optimal solution assigns all 3 rows with minimum total cost.
sparse_with_na: Tests constraint handling. Algorithms must avoid NA positions while finding an optimal assignment among valid entries.
binary_costs: Tests specialized binary cost algorithms. The optimal assignment uses all diagonal entries (total cost = 0).
See Also
Examples
# Simple 3x3 assignment
result <- lap_solve(example_costs$simple_3x3)
print(result)
# Optimal: sources 1,2,3 -> targets 2,1,3 with cost 9
# Rectangular problem (3 sources, 5 targets)
result <- lap_solve(example_costs$rectangular_3x5)
print(result)
# All 3 sources assigned; 2 targets unassigned
# Sparse problem with forbidden assignments
result <- lap_solve(example_costs$sparse_with_na)
print(result)
# Avoids NA positions
# Binary costs - test HK01 algorithm
result <- lap_solve(example_costs$binary_costs, method = "hk01")
print(result)
# Finds diagonal assignment (cost = 0)
Example assignment problem data frame
Description
A tidy data frame representation of assignment problems, suitable for use with grouped workflows and batch solving. Contains two independent 3x3 assignment problems in long format.
Usage
example_df
Format
A tibble with 18 rows and 4 columns:
- sim
Simulation/problem identifier. Integer with values 1 or 2, distinguishing two independent assignment problems. Use with
group_by(sim) for grouped solving.
Source node index. Integer 1-3 representing the row (source) in each 3x3 cost matrix.
- target
Target node index. Integer 1-3 representing the column (target) in each 3x3 cost matrix.
- cost
Cost of assigning source to target. Numeric values ranging from 1-7. Each source-target pair has exactly one cost entry.
Details
This dataset demonstrates couplr's data frame interface for LAP solving. The long format (one row per source-target pair) is converted internally to a cost matrix for solving.
Simulation 1: Costs from example_costs$simple_3x3
Optimal assignment: (1->2, 2->1, 3->3)
Total cost: 9
Simulation 2: Different cost structure
Optimal assignment: multiple equivalent optima exist
Total cost: 4
See Also
lap_solve, lap_solve_batch,
example_costs
Examples
library(dplyr)
# Solve both problems with grouped workflow
example_df |>
group_by(sim) |>
lap_solve(source, target, cost)
# Batch solving for efficiency
example_df |>
group_by(sim) |>
lap_solve_batch(source, target, cost)
# Inspect the data structure
example_df |>
group_by(sim) |>
summarise(
n_pairs = n(),
min_cost = min(cost),
max_cost = max(cost)
)
Extract and standardize IDs from data frames
Description
Extract and standardize IDs from data frames
Usage
extract_ids(df, prefix = "id")
Value
Character vector of IDs.
Extract matching variables from data frame
Description
Extract matching variables from data frame
Usage
extract_matching_vars(df, vars)
Value
Numeric matrix of matching variables.
Filter blocks based on size and balance criteria
Description
Filter blocks based on size and balance criteria
Usage
filter_blocks(
left,
right,
min_left,
min_right,
drop_imbalanced,
imbalance_threshold
)
Value
List with filtered left/right data frames and dropped block info.
Standardize block ID column name
Description
Standardize block ID column name
Usage
get_block_id_column(df)
Value
Character string with column name, or NULL if not found.
Extract method used from assignment result
Description
Extract method used from assignment result
Usage
get_method_used(x)
Arguments
x |
An assignment result object |
Value
Character string indicating method used
Extract total cost from assignment result
Description
Extract total cost from assignment result
Usage
get_total_cost(x)
Arguments
x |
An assignment result object |
Value
Numeric total cost
Greedy match blocks in parallel
Description
Greedy match blocks in parallel
Usage
greedy_blocks_parallel(
blocks,
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
strategy,
parallel = FALSE
)
Arguments
blocks |
Vector of block IDs |
left |
Left dataset with block_col |
right |
Right dataset with block_col |
left_ids |
IDs from left |
right_ids |
IDs from right |
block_col |
Name of blocking column |
vars |
Variables for matching |
distance |
Distance metric |
weights |
Variable weights |
scale |
Scaling method |
max_distance |
Maximum distance |
calipers |
Caliper constraints |
strategy |
Greedy strategy |
parallel |
Whether to use parallel processing |
Value
List with combined results from all blocks
Fast approximate matching using greedy algorithm
Description
Performs fast one-to-one matching using greedy strategies. Does not guarantee
optimal total distance but is much faster than match_couples() for large
datasets. Supports blocking, distance constraints, and various distance metrics.
Usage
greedy_couples(
left,
right = NULL,
vars = NULL,
distance = "euclidean",
weights = NULL,
scale = FALSE,
auto_scale = FALSE,
max_distance = Inf,
calipers = NULL,
block_id = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
strategy = c("row_best", "sorted", "pq"),
return_unmatched = TRUE,
return_diagnostics = FALSE,
parallel = FALSE,
check_costs = TRUE
)
Arguments
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
vars |
Variable names to use for distance computation |
distance |
Distance metric: "euclidean", "manhattan", "mahalanobis", or a custom function |
weights |
Optional named vector of variable weights |
scale |
Scaling method: FALSE (none), "standardize", "range", or "robust" |
auto_scale |
If TRUE, automatically check variable health and select scaling method (default: FALSE) |
max_distance |
Maximum allowed distance (pairs exceeding this are forbidden) |
calipers |
Named list of per-variable maximum absolute differences |
block_id |
Column name containing block IDs (for stratified matching) |
ignore_blocks |
If TRUE, ignore block_id even if present |
require_full_matching |
If TRUE, error if any units remain unmatched |
strategy |
Greedy strategy: "row_best" (default), "sorted", or "pq"; see Details. |
return_unmatched |
Include unmatched units in output |
return_diagnostics |
Include detailed diagnostics in output |
parallel |
Enable parallel processing for blocked matching (default: FALSE). Requires the 'future' and 'future.apply' packages. |
check_costs |
If TRUE, check distance distribution for potential problems and provide helpful warnings before matching (default: TRUE) |
Details
Greedy strategies do not guarantee optimal total distance but are much faster:
"row_best": O(n*m) time, simple and often produces good results
"sorted": O(nmlog(n*m)) time, better quality but slower
"pq": O(nmlog(n*m)) time, memory-efficient for large problems
Use greedy_couples when:
Dataset is very large (> 10,000 x 10,000)
Approximate solution is acceptable
Speed is more important than optimality
Value
A list with class "matching_result" (same structure as match_couples)
Examples
# Basic greedy matching
left <- data.frame(id = 1:100, x = rnorm(100))
right <- data.frame(id = 101:200, x = rnorm(100))
result <- greedy_couples(left, right, vars = "x")
# Compare to optimal
result_opt <- match_couples(left, right, vars = "x")
result_greedy <- greedy_couples(left, right, vars = "x")
result_greedy$info$total_distance / result_opt$info$total_distance # Quality ratio
Greedy matching with blocking
Description
Greedy matching with blocking
Usage
greedy_couples_blocked(
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
strategy,
parallel = FALSE
)
Value
List with pairs tibble and matching info.
Greedy Matching from Precomputed Distance Object
Description
Internal function to handle greedy matching when a distance_object is provided
Usage
greedy_couples_from_distance(
dist_obj,
max_distance = Inf,
calipers = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
strategy = "row_best",
return_unmatched = TRUE,
return_diagnostics = FALSE
)
Value
A matching_result object with pairs, info, and optional diagnostics.
Greedy matching without blocking
Description
Greedy matching without blocking
Usage
greedy_couples_single(
left,
right,
left_ids,
right_ids,
vars,
distance,
weights,
scale,
max_distance,
calipers,
strategy
)
Value
List with pairs tibble and matching info.
Re-export of dplyr::group_by
Description
Re-export of dplyr::group_by
Value
See group_by.
Check if data frame has blocking information
Description
Check if data frame has blocking information
Usage
has_blocks(df)
Value
Logical indicating whether data frame has block ID column.
Check if any valid pairs exist
Description
Check if any valid pairs exist
Usage
has_valid_pairs(cost_matrix)
Value
Logical indicating whether any valid pairs exist.
Hospital staff scheduling example dataset
Description
A comprehensive example dataset for demonstrating couplr functionality across vignettes. Contains hospital staff scheduling data with nurses, shifts, costs, and preference scores suitable for assignment problems, as well as nurse characteristics for matching workflows.
Usage
hospital_staff
Format
A list containing eight related datasets:
- basic_costs
A 10x10 numeric cost matrix for assigning 10 nurses to 10 shifts. Values range from approximately 1-15, where lower values indicate better fit (less overtime, matches skills, respects preferences). Use with
lap_solve() for basic assignment.
- preferences
A 10x10 numeric preference matrix on a 0-10 scale, where higher values indicate stronger nurse preference for a shift. Use with
lap_solve(..., maximize = TRUE) to optimize preferences rather than minimize costs.
- schedule_df
A tibble with 100 rows (10 nurses x 10 shifts) in long format for data frame workflows:
- nurse_id
Integer 1-10. Unique identifier for each nurse.
- shift_id
Integer 1-10. Unique identifier for each shift.
- cost
Numeric. Assignment cost (same values as basic_costs).
- preference
Numeric 0-10. Nurse preference score.
- skill_match
Integer 0/1. Binary indicator: 1 if nurse skills match shift requirements, 0 otherwise.
- nurses
A tibble with 10 rows describing nurse characteristics:
- nurse_id
Integer 1-10. Links to schedule_df and basic_costs rows.
- experience_years
Numeric 1-20. Years of nursing experience.
- department
Character. Primary department: "ICU", "ER", "General", or "Pediatrics".
- shift_preference
Character. Preferred shift type: "day", "evening", or "night".
- certification_level
Integer 1-3. Certification level where 3 is highest (e.g., 1=RN, 2=BSN, 3=MSN).
- shifts
A tibble with 10 rows describing shift requirements:
- shift_id
Integer 1-10. Links to schedule_df and basic_costs cols.
- department
Character. Department needing coverage.
- shift_type
Character. Shift type: "day", "evening", or "night".
- min_experience
Numeric. Minimum years of experience required.
- min_certification
Integer 1-3. Minimum certification level.
- weekly_df
A tibble for batch solving with 500 rows (5 days x 10 nurses x 10 shifts):
- day
Character. Day of week: "Mon", "Tue", "Wed", "Thu", "Fri".
- nurse_id
Integer 1-10. Nurse identifier.
- shift_id
Integer 1-10. Shift identifier.
- cost
Numeric. Daily assignment cost (varies by day).
- preference
Numeric 0-10. Daily preference score.
Use with group_by(day) for solving each day's schedule.
- nurses_extended
A tibble with 200 nurses for matching examples, representing a treatment group (e.g., full-time nurses):
- nurse_id
Integer 1-200. Unique identifier.
- age
Numeric 22-65. Nurse age in years.
- experience_years
Numeric 0-40. Years of nursing experience.
- hourly_rate
Numeric 25-75. Hourly wage in dollars.
- department
Character. Primary department assignment.
- certification_level
Integer 1-3. Certification level.
- is_fulltime
Logical. TRUE for full-time status.
- controls_extended
A tibble with 300 potential control nurses (e.g., part-time or registry nurses) for matching. Same structure as nurses_extended. Designed to have systematic differences from nurses_extended (older, less experience on average) to demonstrate matching's ability to create comparable groups.
Details
This dataset is used throughout the couplr documentation to provide a consistent, realistic example that evolves in complexity. It supports three use cases: (1) basic LAP solving with cost matrices, (2) batch solving across multiple days, and (3) matching workflows comparing nurse groups.
The dataset is designed to demonstrate progressively complex scenarios:
Basic LAP (vignette("getting-started")):
basic_costs: Simple 10x10 assignment
preferences: Maximization problem
schedule_df: Data frame input, grouped workflows
weekly_df: Batch solving across days
Algorithm comparison (vignette("algorithms")):
Use basic_costs to compare algorithm behavior
Modify with NA values for sparse scenarios
Matching workflows (vignette("matching-workflows")):
nurses_extended: Treatment group (full-time nurses)
controls_extended: Control pool (part-time/registry nurses)
Match on age, experience, and department for causal analysis
See Also
lap_solve for basic assignment solving,
lap_solve_batch for batch solving,
match_couples for matching workflows,
vignette("getting-started") for introductory tutorial
Examples
# Basic assignment: assign nurses to shifts minimizing cost
lap_solve(hospital_staff$basic_costs)
# Maximize preferences instead
lap_solve(hospital_staff$preferences, maximize = TRUE)
# Data frame workflow
library(dplyr)
hospital_staff$schedule_df |>
lap_solve(nurse_id, shift_id, cost)
# Batch solve weekly schedule
hospital_staff$weekly_df |>
group_by(day) |>
lap_solve(nurse_id, shift_id, cost)
# Matching workflow: match full-time to part-time nurses
match_couples(
left = hospital_staff$nurses_extended,
right = hospital_staff$controls_extended,
vars = c("age", "experience_years", "certification_level"),
auto_scale = TRUE
)
Low match rate info
Description
Low match rate info
Usage
info_low_match_rate(n_matched, n_left, pct)
Value
No return value, called for side effects (issues a message or warning).
Check if Object is a Distance Object
Description
Check if Object is a Distance Object
Usage
is_distance_object(x)
Arguments
x |
Object to check |
Value
Logical: TRUE if x is a distance_object
Examples
left <- data.frame(id = 1:3, x = c(1, 2, 3))
right <- data.frame(id = 4:6, x = c(1.1, 2.1, 3.1))
dist_obj <- compute_distances(left, right, vars = "x")
is_distance_object(dist_obj) # TRUE
is_distance_object(list()) # FALSE
Check if object is a batch assignment result
Description
Check if object is a batch assignment result
Usage
is_lap_solve_batch_result(x)
Arguments
x |
Object to test |
Value
Logical indicating if x is a batch assignment result
Check if object is a k-best assignment result
Description
Check if object is a k-best assignment result
Usage
is_lap_solve_kbest_result(x)
Arguments
x |
Object to test |
Value
Logical indicating if x is a k-best assignment result
Check if object is an assignment result
Description
Check if object is an assignment result
Usage
is_lap_solve_result(x)
Arguments
x |
Object to test |
Value
Logical indicating if x is an assignment result
Join Matched Pairs with Original Data
Description
Creates an analysis-ready dataset by joining matched pairs with variables from the original left and right datasets. This eliminates the need for manual joins and provides a convenient format for downstream analysis.
Usage
join_matched(
result,
left,
right,
left_vars = NULL,
right_vars = NULL,
left_id = "id",
right_id = "id",
suffix = c("_left", "_right"),
include_distance = TRUE,
include_pair_id = TRUE,
include_block_id = TRUE
)
Arguments
result |
A matching_result object from match_couples() or greedy_couples(). |
left |
The original left dataset |
right |
The original right dataset |
left_vars |
Character vector of variable names to include from left. If NULL (default), includes all variables except the ID column. |
right_vars |
Character vector of variable names to include from right. If NULL (default), includes all variables except the ID column. |
left_id |
Name of the ID column in left dataset (default: "id") |
right_id |
Name of the ID column in right dataset (default: "id") |
suffix |
Character vector of length 2 specifying suffixes for left and right variables (default: c("_left", "_right")) |
include_distance |
Include the matching distance in output (default: TRUE) |
include_pair_id |
Include pair_id column (default: TRUE) |
include_block_id |
Include block_id if blocking was used (default: TRUE) |
Details
This function simplifies the common workflow of joining matched pairs
with original data. Instead of manually merging result$pairs with left
and right datasets, join_matched() handles the joins automatically
and applies consistent naming conventions.
When variables appear in both left and right datasets, suffixes are appended to distinguish them (e.g., "age_left" and "age_right"). This makes it easy to compute differences or use both values in models.
Value
A tibble with one row per matched pair, containing:
pair_id: Sequential pair identifier (if include_pair_id = TRUE)
left_id: ID from left dataset
right_id: ID from right dataset
distance: Matching distance (if include_distance = TRUE)
block_id: Block identifier (if blocking used and include_block_id = TRUE)
Variables from left dataset (with left suffix)
Variables from right dataset (with right suffix)
Examples
# Basic usage
left <- data.frame(
id = 1:5,
treatment = 1,
age = c(25, 30, 35, 40, 45),
income = c(45000, 52000, 48000, 61000, 55000)
)
right <- data.frame(
id = 6:10,
treatment = 0,
age = c(24, 29, 36, 41, 44),
income = c(46000, 51500, 47500, 60000, 54000)
)
result <- match_couples(left, right, vars = c("age", "income"))
matched_data <- join_matched(result, left, right)
head(matched_data)
# Specify which variables to include
matched_data <- join_matched(
result, left, right,
left_vars = c("treatment", "age", "income"),
right_vars = c("age", "income"),
suffix = c("_treated", "_control")
)
# Without distance or pair_id
matched_data <- join_matched(
result, left, right,
include_distance = FALSE,
include_pair_id = FALSE
)
Solve linear assignment problems
Description
Provides a tidy interface for solving the linear assignment problem using 'Hungarian' or 'Jonker-Volgenant' algorithms. Supports rectangular matrices, NA/Inf masking, and data frame inputs.
Usage
lap_solve(
x,
source = NULL,
target = NULL,
cost = NULL,
maximize = FALSE,
method = "auto",
forbidden = NA
)
Arguments
x |
Cost matrix, data frame, or tibble. If a data frame/tibble,
must include the columns specified by source, target, and cost. |
source |
Column name for source/row indices (if x is a data frame). |
target |
Column name for target/column indices (if x is a data frame). |
cost |
Column name for costs (if x is a data frame). |
maximize |
Logical; if TRUE, maximizes total cost instead of minimizing (default: FALSE) |
method |
Algorithm to use (default: "auto"); see assignment() for the available methods. |
forbidden |
Value to mark forbidden assignments (default: NA). Can also use Inf. |
Value
A tibble with columns:
source: row/source indices
target: column/target indices
cost: cost of each assignment
total_cost: total cost (attribute)
Examples
# Matrix input
cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3)
lap_solve(cost)
# Data frame input
library(dplyr)
df <- tibble(
source = rep(1:3, each = 3),
target = rep(1:3, times = 3),
cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4)
)
lap_solve(df, source, target, cost)
# With NA masking (forbidden assignments)
cost[1, 3] <- NA
lap_solve(cost)
# Grouped data frames
df <- tibble(
sim = rep(1:2, each = 9),
source = rep(1:3, times = 6),
target = rep(1:3, each = 3, times = 2),
cost = runif(18, 1, 10)
)
df |> group_by(sim) |> lap_solve(source, target, cost)
Solve multiple assignment problems efficiently
Description
Solve many independent assignment problems at once. Supports lists of matrices,
3D arrays, or grouped data frames. Optional parallel execution via n_threads.
Usage
lap_solve_batch(
x,
source = NULL,
target = NULL,
cost = NULL,
maximize = FALSE,
method = "auto",
n_threads = 1,
forbidden = NA
)
Arguments
x |
One of: List of cost matrices, 3D array, or grouped data frame |
source |
Column name for source indices (if x is a data frame). |
target |
Column name for target indices (if x is a data frame). |
cost |
Column name for costs (if x is a data frame). |
maximize |
Logical; if TRUE, maximizes total cost (default: FALSE) |
method |
Algorithm to use (default: "auto"). See assignment() for available methods. |
n_threads |
Number of threads for parallel execution (default: 1). Set to NULL to use all available cores. |
forbidden |
Value to mark forbidden assignments (default: NA) |
Value
A tibble with columns:
-
problem_id: identifier for each problem -
source: source indices for assignments -
target: target indices for assignments -
cost: cost of each assignment -
total_cost: total cost for each problem -
method_used: algorithm used for each problem
Examples
# List of matrices
costs <- list(
matrix(c(1, 2, 3, 4), 2, 2),
matrix(c(5, 6, 7, 8), 2, 2)
)
lap_solve_batch(costs)
# 3D array
arr <- array(runif(2 * 2 * 10), dim = c(2, 2, 10))
lap_solve_batch(arr)
# Grouped data frame
library(dplyr)
df <- tibble(
sim = rep(1:5, each = 9),
source = rep(1:3, times = 15),
target = rep(1:3, each = 3, times = 5),
cost = runif(45, 1, 10)
)
df |> group_by(sim) |> lap_solve_batch(source, target, cost)
# Parallel execution (requires n_threads > 1)
lap_solve_batch(costs, n_threads = 2)
Find k-best optimal assignments
Description
Returns the top k optimal (or near-optimal) assignments using 'Murty' algorithm. Useful for exploring alternative optimal solutions or finding robust assignments.
Usage
lap_solve_kbest(
x,
k = 3,
source = NULL,
target = NULL,
cost = NULL,
maximize = FALSE,
method = "murty",
single_method = "jv",
forbidden = NA
)
Arguments
x |
Cost matrix, data frame, or tibble. If a data frame/tibble,
must include the columns specified by source, target, and cost. |
k |
Number of best solutions to return (default: 3) |
source |
Column name for source/row indices (if x is a data frame). |
target |
Column name for target/column indices (if x is a data frame). |
cost |
Column name for costs (if x is a data frame). |
maximize |
Logical; if TRUE, finds k-best maximizing assignments (default: FALSE) |
method |
Algorithm for each sub-problem (default: "murty"). Future versions may support additional methods. |
single_method |
Algorithm used for solving each node in the search tree (default: "jv") |
forbidden |
Value to mark forbidden assignments (default: NA) |
Value
A tibble with columns:
rank: ranking of solutions (1 = best, 2 = second best, etc.)
solution_id: unique identifier for each solution
source: source indices
target: target indices
cost: cost of each edge in the assignment
total_cost: total cost of the complete solution
Examples
# Matrix input - find 5 best solutions
cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3)
lap_solve_kbest(cost, k = 5)
# Data frame input
library(dplyr)
df <- tibble(
source = rep(1:3, each = 3),
target = rep(1:3, times = 3),
cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4)
)
lap_solve_kbest(df, k = 3, source, target, cost)
# With maximization
lap_solve_kbest(cost, k = 3, maximize = TRUE)
Solve 1-D Line Assignment Problem
Description
Solves the linear assignment problem when both sources and targets are ordered points on a line. Uses efficient O(n*m) dynamic programming for rectangular problems and O(n log n) sorting for square problems.
Usage
lap_solve_line_metric(x, y, cost = "L1", maximize = FALSE)
Arguments
x |
Numeric vector of source positions (will be sorted internally) |
y |
Numeric vector of target positions (will be sorted internally) |
cost |
Cost function for distance. Either "L1" or "L2" (default: "L1"). |
maximize |
Logical; if TRUE, maximizes total cost instead of minimizing (default: FALSE) |
Details
This is a specialized solver that exploits the structure of 1-dimensional assignment problems where costs depend only on the distance between points on a line. It is much faster than general LAP solvers for this special case.
The algorithm works as follows:
Square case (n == m):
Both vectors are sorted and matched in order: x[1] -> y[1], x[2] -> y[2], etc.
This is optimal for any metric cost function on a line.
Rectangular case (n < m): Uses dynamic programming to find the optimal assignment that matches all n sources to a subset of the m targets, minimizing total distance. The DP recurrence is:
dp[i][j] = min(dp[i][j-1], dp[i-1][j-1] + cost(x[i], y[j]))
This finds the minimum cost to match the first i sources to the first j targets.
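A minimal sketch of this recurrence in plain R with L1 costs (the helper name line_dp_cost is hypothetical and it is simplified relative to the packaged solver, which also recovers the matching itself):
line_dp_cost <- function(x, y) {
  x <- sort(x); y <- sort(y)
  n <- length(x); m <- length(y)
  dp <- matrix(Inf, n + 1, m + 1)    # dp[i+1, j+1] = cost of matching the first i sources to the first j targets
  dp[1, ] <- 0                       # matching zero sources costs nothing
  for (i in 1:n) {
    for (j in i:m) {
      skip <- dp[i + 1, j]                   # leave target j unused
      take <- dp[i, j] + abs(x[i] - y[j])    # match source i to target j
      dp[i + 1, j + 1] <- min(skip, take)
    }
  }
  dp[n + 1, m + 1]
}
line_dp_cost(c(1, 3, 5), c(0.5, 2, 3.5, 4.5, 6))   # total L1 cost of the optimal matching (1.5)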
Complexity:
Time: O(n*m) for rectangular, O(n log n) for square
Space: O(n*m) for DP table
Value
A list with components:
match: Integer vector of length n with 1-based column indices
total_cost: Total cost of the assignment
Examples
# Square case: equal number of sources and targets
x <- c(1.5, 3.2, 5.1)
y <- c(2.0, 3.0, 5.5)
result <- lap_solve_line_metric(x, y, cost = "L1")
print(result)
# Rectangular case: more targets than sources
x <- c(1.0, 3.0, 5.0)
y <- c(0.5, 2.0, 3.5, 4.5, 6.0)
result <- lap_solve_line_metric(x, y, cost = "L2")
print(result)
# With unsorted inputs (will be sorted internally)
x <- c(5.0, 1.0, 3.0)
y <- c(4.5, 0.5, 6.0, 2.0, 3.5)
result <- lap_solve_line_metric(x, y, cost = "L1")
print(result)
Mark forbidden pairs
Description
Generic function to mark specific pairs as forbidden.
Usage
mark_forbidden_pairs(cost_matrix, forbidden_indices)
Value
Modified cost matrix with forbidden pairs marked.
Match blocks in parallel
Description
Match blocks in parallel
Usage
match_blocks_parallel(
blocks,
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
method,
parallel = FALSE
)
Arguments
blocks |
Vector of block IDs |
left |
Left dataset with block_col |
right |
Right dataset with block_col |
left_ids |
IDs from left |
right_ids |
IDs from right |
block_col |
Name of blocking column |
vars |
Variables for matching |
distance |
Distance metric |
weights |
Variable weights |
scale |
Scaling method |
max_distance |
Maximum distance |
calipers |
Caliper constraints |
method |
LAP method |
parallel |
Whether to use parallel processing |
Value
List with combined results from all blocks
Optimal matching using linear assignment
Description
Performs optimal one-to-one matching between two datasets using linear assignment problem (LAP) solvers. Supports blocking, distance constraints, and various distance metrics.
Usage
match_couples(
left,
right = NULL,
vars = NULL,
distance = "euclidean",
weights = NULL,
scale = FALSE,
auto_scale = FALSE,
max_distance = Inf,
calipers = NULL,
block_id = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
method = "auto",
return_unmatched = TRUE,
return_diagnostics = FALSE,
parallel = FALSE,
check_costs = TRUE
)
Arguments
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
vars |
Variable names to use for distance computation |
distance |
Distance metric: "euclidean", "manhattan", "mahalanobis", or a custom function |
weights |
Optional named vector of variable weights |
scale |
Scaling method: FALSE (none), "standardize", "range", or "robust" |
auto_scale |
If TRUE, automatically check variable health and select scaling method (default: FALSE) |
max_distance |
Maximum allowed distance (pairs exceeding this are forbidden) |
calipers |
Named list of per-variable maximum absolute differences |
block_id |
Column name containing block IDs (for stratified matching) |
ignore_blocks |
If TRUE, ignore block_id even if present |
require_full_matching |
If TRUE, error if any units remain unmatched |
method |
LAP solver: "auto", "hungarian", "jv", "gabow_tarjan", etc. |
return_unmatched |
Include unmatched units in output |
return_diagnostics |
Include detailed diagnostics in output |
parallel |
Enable parallel processing for blocked matching (default: FALSE). Requires the 'future' and 'future.apply' packages. |
check_costs |
If TRUE, check distance distribution for potential problems and provide helpful warnings before matching (default: TRUE) |
Details
This function finds the matching that minimizes total distance among all
feasible matchings, subject to constraints. Use greedy_couples() for
faster approximate matching on large datasets.
Value
A list with class "matching_result" containing:
pairs: Tibble of matched pairs with distances
unmatched: List of unmatched left and right IDs
info: Matching diagnostics and metadata
Examples
# Basic matching
left <- data.frame(id = 1:5, x = c(1, 2, 3, 4, 5), y = c(2, 4, 6, 8, 10))
right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1), y = c(2.1, 4.1, 6.2, 8.1, 10.1))
result <- match_couples(left, right, vars = c("x", "y"))
print(result$pairs)
# With constraints
result <- match_couples(left, right, vars = c("x", "y"),
max_distance = 1,
calipers = list(x = 0.5))
# With blocking
left$region <- c("A", "A", "B", "B", "B")
right$region <- c("A", "A", "B", "B", "B")
blocks <- matchmaker(left, right, block_type = "group", block_by = "region")
result <- match_couples(blocks$left, blocks$right, vars = c("x", "y"))
Match with blocking (multiple problems)
Description
Match with blocking (multiple problems)
Usage
match_couples_blocked(
left,
right,
left_ids,
right_ids,
block_col,
vars,
distance,
weights,
scale,
max_distance,
calipers,
method,
parallel = FALSE
)
Value
List with pairs tibble and matching info.
Match from Precomputed Distance Object
Description
Internal function to handle matching when a distance_object is provided
Usage
match_couples_from_distance(
dist_obj,
max_distance = Inf,
calipers = NULL,
ignore_blocks = FALSE,
require_full_matching = FALSE,
method = "auto",
return_unmatched = TRUE,
return_diagnostics = FALSE,
check_costs = TRUE
)
Value
A matching_result object with pairs, info, and optional diagnostics.
Match without blocking (single problem)
Description
Match without blocking (single problem)
Usage
match_couples_single(
left,
right,
left_ids,
right_ids,
vars,
distance,
weights,
scale,
max_distance,
calipers,
method,
check_costs = TRUE
)
Value
List with pairs tibble and matching info.
Create blocks for stratified matching
Description
Constructs blocks (strata) for matching, using either grouping variables or clustering algorithms. Returns the input data frames with block IDs assigned, along with block summary statistics.
Usage
matchmaker(
left,
right,
block_type = c("none", "group", "cluster"),
block_by = NULL,
block_vars = NULL,
block_method = "kmeans",
n_blocks = NULL,
min_left = 1,
min_right = 1,
drop_imbalanced = FALSE,
imbalance_threshold = Inf,
return_dropped = TRUE,
...
)
Arguments
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
block_type |
Type of blocking to use: "none" (no blocking, the default), "group" (block on grouping variables given in block_by), or "cluster" (create blocks by clustering on block_vars). |
block_by |
Variable name(s) for grouping (if block_type = "group") |
block_vars |
Variable names for clustering (if block_type = "cluster") |
block_method |
Clustering method (if block_type = "cluster"); default "kmeans". |
n_blocks |
Target number of blocks (for clustering) |
min_left |
Minimum number of left units per block |
min_right |
Minimum number of right units per block |
drop_imbalanced |
Drop blocks with extreme imbalance |
imbalance_threshold |
Maximum allowed |n_left - n_right| / max(n_left, n_right) |
return_dropped |
Include dropped blocks in output |
... |
Additional arguments passed to clustering function |
Details
This function does NOT perform matching - it only creates the block structure.
Use match_couples() or greedy_couples() to perform matching within blocks.
Value
A list with class "matchmaker_result" containing:
- left: Left data frame with block_id column added
- right: Right data frame with block_id column added
- block_summary: Summary statistics for each block
- dropped: Information about dropped blocks (if any)
- info: Metadata about the blocking process
Examples
# Group blocking
left <- data.frame(id = 1:10, region = rep(c("A", "B"), each = 5), x = rnorm(10))
right <- data.frame(id = 11:20, region = rep(c("A", "B"), each = 5), x = rnorm(10))
blocks <- matchmaker(left, right, block_type = "group", block_by = "region")
print(blocks$block_summary)
# Clustering
blocks <- matchmaker(left, right, block_type = "cluster",
block_vars = "x", n_blocks = 3)
Parallel lapply using future
Description
Parallel lapply using future
Usage
parallel_lapply(X, FUN, ..., parallel = FALSE)
Arguments
X |
Vector to iterate over |
FUN |
Function to apply |
... |
Additional arguments to FUN |
parallel |
Whether parallel processing is enabled |
Value
List of results
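Examples
# Illustrative sketch of the documented interface. With parallel = FALSE
# this behaves like a plain lapply(); parallel = TRUE assumes the 'future'
# and 'future.apply' packages are available.
squares <- parallel_lapply(1:4, function(i) i^2, parallel = FALSE)
unlist(squares)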
Pixel-level image morphing (final frame only)
Description
Computes optimal pixel assignment from A to B and returns the final transported frame (without intermediate animation frames).
Usage
pixel_morph(
imgA,
imgB,
n_frames = 16L,
mode = c("color_walk", "exact", "recursive"),
lap_method = "jv",
maximize = FALSE,
quantize_bits = 5L,
downscale_steps = 0L,
alpha = 1,
beta = 0,
patch_size = 1L,
upscale = 1,
show = interactive()
)
Arguments
imgA |
Source image (file path or magick image object) |
imgB |
Target image (file path or magick image object) |
n_frames |
Internal parameter for rendering (default: 16) |
mode |
Assignment algorithm: "color_walk" (default), "exact", or "recursive" |
lap_method |
LAP solver method (default: "jv") |
maximize |
Logical, maximize instead of minimize cost (default: FALSE) |
quantize_bits |
Color quantization for "color_walk" mode (default: 5) |
downscale_steps |
Number of 2x reductions before computing assignment (default: 0) |
alpha |
Weight for color distance in cost function (default: 1) |
beta |
Weight for spatial distance in cost function (default: 0) |
patch_size |
Tile size for tiled modes (default: 1) |
upscale |
Post-rendering upscaling factor (default: 1) |
show |
Logical, display result in viewer (default: interactive()) |
Details
Transport-Only Semantics
This function returns a SHARP, pixel-perfect transport of A's pixels to positions determined by the assignment to B.
Key Points:
- Assignment computed using: cost = alpha * color_dist + beta * spatial_dist
- B's COLORS influence assignment but DO NOT appear in output
- Result has A's colors arranged to match B's layout
- No motion blur (unlike intermediate frames in animation)
See pixel_morph_animate for detailed explanation of
assignment vs rendering semantics.
Permutation Warnings
Assignment is guaranteed to be a bijection (permutation) ONLY when:
- downscale_steps = 0 (no resolution changes)
- mode = "exact" with patch_size = 1
With downscaling or tiled modes, the assignment may contain:
- Overlaps: multiple source pixels map to the same destination (last write wins)
- Holes: some destinations are never filled (remain transparent)
If the assignment is not a bijection (due to downscaling or tiling), a warning is issued.
For guaranteed pixel-perfect results, use:
pixel_morph(A, B, mode = "exact", downscale_steps = 0)
Value
magick image object of the final transported frame
See Also
pixel_morph_animate for animated version
Examples
if (requireNamespace("magick", quietly = TRUE)) {
imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr")
imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr")
if (nzchar(imgA) && nzchar(imgB)) {
result <- pixel_morph(imgA, imgB, n_frames = 4, show = FALSE)
}
}
Pixel-level image morphing (animation)
Description
Creates an animated morph by computing optimal pixel assignment from image A to image B, then rendering intermediate frames showing the transport.
Usage
pixel_morph_animate(
imgA,
imgB,
n_frames = 16L,
fps = 10L,
format = c("gif", "webp", "mp4"),
outfile = NULL,
show = interactive(),
mode = c("color_walk", "exact", "recursive"),
lap_method = "jv",
maximize = FALSE,
quantize_bits = 5L,
downscale_steps = 0L,
alpha = 1,
beta = 0,
patch_size = 1L,
upscale = 1
)
Arguments
imgA |
Source image (file path or magick image object) |
imgB |
Target image (file path or magick image object) |
n_frames |
Integer number of animation frames (default: 16) |
fps |
Frames per second for playback (default: 10) |
format |
Output format: "gif", "webp", or "mp4" |
outfile |
Optional output file path |
show |
Logical, display animation in viewer (default: interactive()) |
mode |
Assignment algorithm: "color_walk" (default), "exact", or "recursive" |
lap_method |
LAP solver method (default: "jv") |
maximize |
Logical, maximize instead of minimize cost (default: FALSE) |
quantize_bits |
Color quantization for "color_walk" mode (default: 5) |
downscale_steps |
Number of 2x reductions before computing assignment (default: 0) |
alpha |
Weight for color distance in cost function (default: 1) |
beta |
Weight for spatial distance in cost function (default: 0) |
patch_size |
Tile size for tiled modes (default: 1) |
upscale |
Post-rendering upscaling factor (default: 1) |
Details
Assignment vs Rendering Semantics
CRITICAL: This function has two separate phases with different semantics:
Phase 1 - Assignment Computation:
The assignment is computed by minimizing:
cost(i,j) = alpha * color_distance(A[i], B[j]) +
beta * spatial_distance(pos_i, pos_j)
This means B's COLORS influence which pixels from A map to which positions.
Phase 2 - Rendering (Transport-Only):
The renderer uses ONLY A's colors:
- Intermediate frames: A's pixels move along paths with motion blur
- Final frame: A's pixels at their assigned positions (sharp, no blur)
- B's colors NEVER appear in the output
Result: You get A's colors rearranged to match B's geometry/layout.
What This Means
- B influences WHERE pixels go (via similarity in the cost function)
- B does NOT determine WHAT COLORS appear in the output
- The final image has A's palette arranged to mimic B's structure
Parameter Guidance
For pure spatial rearrangement (ignore B's colors in assignment):
pixel_morph_animate(A, B, alpha = 0, beta = 1)
For color-similarity matching (default):
pixel_morph_animate(A, B, alpha = 1, beta = 0)
For hybrid (color + spatial):
pixel_morph_animate(A, B, alpha = 1, beta = 0.2)
Permutation Guarantees
Assignment is guaranteed to be a bijection (permutation) ONLY when:
- downscale_steps = 0 (no resolution changes)
- mode = "exact" with patch_size = 1
With downscaling or tiled modes, the assignment may contain:
- Overlaps: multiple source pixels map to the same destination (last write wins)
- Holes: some destinations are never filled (remain transparent)
A warning is issued if overlaps/holes are detected in the final frame.
Value
Invisibly returns a list with animation object and metadata:
animation |
magick animation object |
width |
Image width in pixels |
height |
Image height in pixels |
assignment |
Integer vector of 1-based assignment indices (R convention) |
n_pixels |
Total number of pixels |
mode |
Mode used for matching |
upscale |
Upscaling factor applied |
Examples
if (requireNamespace("magick", quietly = TRUE)) {
imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr")
imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr")
if (nzchar(imgA) && nzchar(imgB)) {
outfile <- tempfile(fileext = ".gif")
pixel_morph_animate(imgA, imgB, outfile = outfile, n_frames = 4, show = FALSE)
}
}
Plot method for balance diagnostics
Description
Produces a Love plot (dot plot) of standardized differences.
Usage
## S3 method for class 'balance_diagnostics'
plot(x, type = c("love", "histogram", "variance"), threshold = 0.1, ...)
Arguments
x |
A balance_diagnostics object |
type |
Type of plot: "love" (default), "histogram", or "variance" |
threshold |
Threshold line for standardized differences (default: 0.1) |
... |
Additional arguments passed to plotting functions |
Value
The balance_diagnostics object (invisibly)
Plot method for matching results
Description
Produces a histogram of pairwise distances from a matching result.
Usage
## S3 method for class 'matching_result'
plot(x, type = c("histogram", "density", "ecdf"), ...)
Arguments
x |
A matching_result object |
type |
Type of plot: "histogram" (default), "density", or "ecdf" |
... |
Additional arguments passed to plotting functions |
Value
The matching_result object (invisibly)
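Examples
# Illustrative sketch using the match_couples() example data; plot types
# follow the documented "histogram", "density", and "ecdf" options.
left <- data.frame(id = 1:5, x = 1:5, y = seq(2, 10, by = 2))
right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1),
                    y = c(2.1, 4.1, 6.2, 8.1, 10.1))
result <- match_couples(left, right, vars = c("x", "y"))
plot(result)                  # histogram of pairwise distances (default)
plot(result, type = "ecdf")   # empirical CDF of pairwise distances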
Preprocess matching variables with automatic checks and scaling
Description
Main preprocessing function that orchestrates variable health checks, categorical encoding, and automatic scaling selection.
Usage
preprocess_matching_vars(
left,
right,
vars,
auto_scale = TRUE,
scale_method = "auto",
check_health = TRUE,
remove_problematic = TRUE,
verbose = TRUE
)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names |
auto_scale |
Logical, whether to perform automatic preprocessing (default: TRUE) |
scale_method |
Scaling method: "auto", "standardize", "range", "robust", or FALSE |
check_health |
Logical, whether to check variable health (default: TRUE) |
remove_problematic |
Logical, automatically exclude constant/all-NA variables (default: TRUE) |
verbose |
Logical, whether to print warnings (default: TRUE) |
Value
A list with class "preprocessing_result" containing:
- left: Preprocessed left data frame
- right: Preprocessed right data frame
- vars: Final variable names (after exclusions)
- health: Variable health diagnostics
- scaling_method: Selected scaling method
- excluded_vars: Variables that were excluded
- warnings: List of warnings issued
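Examples
# Illustrative sketch with hypothetical data; "flag" is constant, so with
# remove_problematic = TRUE (the default) it may be excluded automatically.
left <- data.frame(age = c(25, 30, 35), income = c(30000, 45000, 60000), flag = 1)
right <- data.frame(age = c(26, 31, 34), income = c(32000, 44000, 58000), flag = 1)
prep <- preprocess_matching_vars(left, right,
                                 vars = c("age", "income", "flag"),
                                 scale_method = "auto")
prep$vars            # final variable names after exclusions
prep$scaling_method  # scaling method selected by the "auto" heuristic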
Print Method for Balance Diagnostics
Description
Print Method for Balance Diagnostics
Usage
## S3 method for class 'balance_diagnostics'
print(x, ...)
Arguments
x |
A balance_diagnostics object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print Method for Distance Objects
Description
Print Method for Distance Objects
Usage
## S3 method for class 'distance_object'
print(x, ...)
Arguments
x |
A distance_object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for batch assignment results
Description
Prints a summary and the table of results for a batch of assignment
problems solved with lap_solve_batch().
Usage
## S3 method for class 'lap_solve_batch_result'
print(x, ...)
Arguments
x |
A lap_solve_batch_result object |
... |
Additional arguments passed on to other methods |
Value
Invisibly returns the input object x.
Print method for k-best assignment results
Description
Print method for k-best assignment results
Usage
## S3 method for class 'lap_solve_kbest_result'
print(x, ...)
Arguments
x |
A lap_solve_kbest_result object |
... |
Additional arguments passed on to other methods |
Value
Invisibly returns the input object x.
Print method for assignment results
Description
Nicely prints a lap_solve_result object, including the assignments,
total cost, and method used.
Usage
## S3 method for class 'lap_solve_result'
print(x, ...)
Arguments
x |
A lap_solve_result object |
... |
Additional arguments passed on to other methods |
Value
Invisibly returns the input object x.
Print method for matching results
Description
Print method for matching results
Usage
## S3 method for class 'matching_result'
print(x, ...)
Arguments
x |
A matching_result object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for matchmaker results
Description
Print method for matchmaker results
Usage
## S3 method for class 'matchmaker_result'
print(x, ...)
Arguments
x |
A matchmaker_result object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for preprocessing result
Description
Print method for preprocessing result
Usage
## S3 method for class 'preprocessing_result'
print(x, ...)
Arguments
x |
A preprocessing_result object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Print method for variable health
Description
Print method for variable health
Usage
## S3 method for class 'variable_health'
print(x, ...)
Arguments
x |
A variable_health object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object x.
Restore original parallel plan
Description
Restore original parallel plan
Usage
restore_parallel(parallel_state)
Arguments
parallel_state |
State from setup_parallel() |
Value
No return value, called for side effects (restores parallel plan).
Setup parallel processing with future
Description
Setup parallel processing with future
Usage
setup_parallel(parallel = FALSE, n_workers = NULL)
Arguments
parallel |
Logical or plan specification |
n_workers |
Number of workers (NULL for auto-detect) |
Value
List with original plan and whether we set up parallelization
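Examples
# Illustrative sketch of the intended setup/restore pairing, based on the
# documented signatures; requires the 'future' packages when parallel = TRUE.
run_blocked <- function() {
  state <- setup_parallel(parallel = TRUE, n_workers = 2)
  on.exit(restore_parallel(state), add = TRUE)
  parallel_lapply(1:4, function(i) i^2, parallel = TRUE)
}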
'Sinkhorn-Knopp' optimal transport solver
Description
Compute an entropy-regularized optimal transport plan using the 'Sinkhorn-Knopp' algorithm. Unlike other LAP solvers that return a hard 1-to-1 assignment, this returns a soft assignment (doubly stochastic matrix).
Usage
sinkhorn(
cost,
lambda = 10,
tol = 1e-09,
max_iter = 1000,
r_weights = NULL,
c_weights = NULL
)
Arguments
cost |
Numeric matrix of transport costs. |
lambda |
Regularization parameter (default 10). Higher values produce sharper (more deterministic) transport plans; lower values produce smoother distributions. Typical range: 1-100. |
tol |
Convergence tolerance (default 1e-9). |
max_iter |
Maximum iterations (default 1000). |
r_weights |
Optional numeric vector of row marginals (source distribution). Default is uniform. Will be normalized to sum to 1. |
c_weights |
Optional numeric vector of column marginals (target distribution). Default is uniform. Will be normalized to sum to 1. |
Details
The 'Sinkhorn-Knopp' algorithm solves the entropy-regularized optimal transport problem:
P^* = \arg\min_P \langle C, P \rangle - \frac{1}{\lambda} H(P)
subject to row sums = r_weights and column sums = c_weights.
The entropy term H(P) encourages spread in the transport plan. As lambda -> Inf, the solution approaches the standard (unregularized) optimal transport.
Key differences from standard LAP solvers:
- Returns a soft assignment (probabilities), not a hard 1-to-1 matching
- Supports unequal marginals (weighted distributions)
- Differentiable, making it useful in ML pipelines
- Very fast: O(n^2) per iteration with typically O(1/tol^2) iterations
Use sinkhorn_to_assignment() to round the soft assignment to a hard matching.
Value
A list with elements:
- transport_plan: numeric matrix, the optimal transport plan P. Row sums approximate r_weights, column sums approximate c_weights.
- cost: the transport cost <C, P> (without the entropy term).
- u, v: scaling vectors (P = diag(u) * K * diag(v), where K = exp(-lambda * C)).
- converged: logical, whether the algorithm converged.
- iterations: number of iterations used.
- lambda: the regularization parameter used.
References
Cuturi, M. (2013). 'Sinkhorn Distances': Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems, 26.
See Also
assignment() for hard 1-to-1 matching, sinkhorn_to_assignment()
to round soft assignments.
Examples
cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
# Soft assignment with default parameters
result <- sinkhorn(cost)
print(round(result$transport_plan, 3))
# Sharper assignment (higher lambda)
result_sharp <- sinkhorn(cost, lambda = 50)
print(round(result_sharp$transport_plan, 3))
# With custom marginals (more mass from row 1)
result_weighted <- sinkhorn(cost, r_weights = c(0.5, 0.25, 0.25))
print(round(result_weighted$transport_plan, 3))
# Round to hard assignment
hard_match <- sinkhorn_to_assignment(result)
print(hard_match)
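# The plan's marginals approximately match the requested distributions
# (see Value); a quick check with base R:
round(rowSums(result_weighted$transport_plan), 3)  # close to c(0.5, 0.25, 0.25)
round(colSums(result$transport_plan), 3)           # close to uniform (1/3 each)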
Round 'Sinkhorn' transport plan to hard assignment
Description
Convert a soft transport plan from sinkhorn() to a hard 1-to-1 assignment
using greedy rounding.
Usage
sinkhorn_to_assignment(result)
Arguments
result |
Either a result from sinkhorn() or a transport plan matrix |
Details
Greedy rounding iteratively assigns each row to its most probable column,
ensuring no column is assigned twice. This may not give the globally optimal
hard assignment; for that, use the transport plan as a cost matrix with
assignment().
Value
Integer vector of column assignments (1-based), same format as
assignment().
See Also
sinkhorn() for computing soft transport plans; assignment() for an optimal hard 1-to-1 matching.
Examples
cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
result <- sinkhorn(cost, lambda = 20)
hard_match <- sinkhorn_to_assignment(result)
print(hard_match)
Calculate Standardized Difference
Description
Computes the standardized mean difference between two groups. This is a key metric for assessing balance in matched samples.
Usage
standardized_difference(x1, x2, pooled = TRUE)
Arguments
x1 |
Numeric vector for group 1 |
x2 |
Numeric vector for group 2 |
pooled |
Logical, if TRUE use pooled standard deviation (default), if FALSE use group 1 standard deviation |
Details
Standardized difference = (mean1 - mean2) / pooled_sd where pooled_sd = sqrt((sd1^2 + sd2^2) / 2)
Common thresholds: less than 0.1 is excellent balance, 0.1-0.25 is good balance, 0.25-0.5 is acceptable balance, and greater than 0.5 is poor balance.
Value
Numeric value representing the standardized difference
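Examples
# Illustrative check against the formula given in Details (hypothetical data)
x1 <- c(5.1, 6.3, 7.2, 5.8)
x2 <- c(5.0, 6.1, 7.5, 6.0)
standardized_difference(x1, x2)
# Manual computation of the documented formula
(mean(x1) - mean(x2)) / sqrt((sd(x1)^2 + sd(x2)^2) / 2)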
Perfect balance success message
Description
Perfect balance success message
Usage
success_good_balance(mean_std_diff)
Value
No return value, called for side effects (issues a message).
Suggest scaling method based on variable characteristics
Description
Analyzes variable distributions and suggests appropriate scaling methods.
Usage
suggest_scaling(left, right, vars)
Arguments
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names |
Value
A character string with the suggested scaling method: "standardize", "range", "robust", or "none"
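Examples
# Illustrative call with hypothetical data; the suggestion returned
# depends on the variable distributions.
left <- data.frame(age = c(25, 30, 35, 40), income = c(30000, 52000, 48000, 250000))
right <- data.frame(age = c(26, 29, 36, 41), income = c(31000, 50000, 47000, 61000))
suggest_scaling(left, right, vars = c("age", "income"))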
Summarize block structure
Description
Summarize block structure
Usage
summarize_blocks(left, right, block_vars = NULL)
Value
Tibble with block_id, n_left, n_right, and optional variable means.
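Examples
# Illustrative sketch, assuming block assignments created by matchmaker()
# (which adds a block_id column to both data frames).
left <- data.frame(id = 1:6, region = rep(c("A", "B"), each = 3), x = rnorm(6))
right <- data.frame(id = 7:12, region = rep(c("A", "B"), each = 3), x = rnorm(6))
blocks <- matchmaker(left, right, block_type = "group", block_by = "region")
summarize_blocks(blocks$left, blocks$right, block_vars = "x")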
Summary method for balance diagnostics
Description
Summary method for balance diagnostics
Usage
## S3 method for class 'balance_diagnostics'
summary(object, ...)
Arguments
object |
A balance_diagnostics object |
... |
Additional arguments (ignored) |
Value
A list containing summary statistics (invisibly)
Summary Method for Distance Objects
Description
Summary Method for Distance Objects
Usage
## S3 method for class 'distance_object'
summary(object, ...)
Arguments
object |
A distance_object |
... |
Additional arguments (ignored) |
Value
Invisibly returns the input object.
Get summary of k-best results
Description
Extract summary information from k-best assignment results.
Usage
## S3 method for class 'lap_solve_kbest_result'
summary(object, ...)
Arguments
object |
An object of class |
... |
Additional arguments (unused). |
Value
A tibble with one row per solution containing:
- rank: solution rank
- solution_id: solution identifier
- total_cost: total cost of the solution
- n_assignments: number of assignments in the solution
Summary method for matching results
Description
Summary method for matching results
Usage
## S3 method for class 'matching_result'
summary(object, ...)
Arguments
object |
A matching_result object |
... |
Additional arguments (ignored) |
Value
A list containing summary statistics (invisibly)
Update Constraints on Distance Object
Description
Apply new constraints to a precomputed distance object without recomputing the underlying distances. This is useful for exploring different constraint scenarios quickly.
Usage
update_constraints(dist_obj, max_distance = Inf, calipers = NULL)
Arguments
dist_obj |
A distance_object from compute_distances() |
max_distance |
Maximum allowed distance (pairs with distance > max_distance become Inf) |
calipers |
Named list of per-variable calipers |
Details
This function creates a new distance_object with modified constraints applied to the cost matrix. The original distance_object is not modified.
Constraints:
- max_distance: Sets cost to Inf for pairs exceeding this threshold
- calipers: Per-variable restrictions (e.g., calipers = list(age = 5))
The function returns a new object rather than modifying in place, following R's copy-on-modify semantics.
Value
A new distance_object with updated cost_matrix
Examples
left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45))
right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44))
dist_obj <- compute_distances(left, right, vars = "age")
# Apply constraints
constrained <- update_constraints(dist_obj, max_distance = 2)
result <- match_couples(constrained)
Check if emoji should be used
Description
Check if emoji should be used
Usage
use_emoji()
Value
Logical indicating whether emoji should be used.
Validate calipers parameter
Description
Validate calipers parameter
Usage
validate_calipers(calipers, vars)
Value
Validated calipers (list or named numeric), or NULL if none.
Validate and prepare cost data
Description
Internal helper that ensures a numeric, non-empty cost matrix.
Usage
validate_cost_data(x, forbidden = NA)
Arguments
x |
Cost matrix or data frame |
forbidden |
Value representing forbidden assignments (use NA or Inf) |
Value
Numeric cost matrix
Validate matching inputs
Description
Validate matching inputs
Usage
validate_matching_inputs(left, right, vars = NULL)
Value
Invisibly returns TRUE if validation passes; otherwise throws an error.
Validate weights parameter
Description
Validate weights parameter
Usage
validate_weights(weights, vars)
Value
Numeric vector of validated weights.
All distances identical warning
Description
All distances identical warning
Usage
warn_constant_distance(value)
Value
No return value, called for side effects (issues a warning).
Constant variable warning
Description
Constant variable warning
Usage
warn_constant_var(var)
Value
No return value, called for side effects (issues a warning).
Extreme cost ratio warning
Description
Extreme cost ratio warning
Usage
warn_extreme_costs(p95, p99, ratio, problem_vars = NULL)
Value
No return value, called for side effects (issues a warning).
Many forbidden pairs warning
Description
Many forbidden pairs warning
Usage
warn_many_forbidden(pct_forbidden, n_valid, n_left)
Value
No return value, called for side effects (issues a warning).
Too many zeros warning
Description
Too many zeros warning
Usage
warn_many_zeros(pct, n_zeros)
Value
No return value, called for side effects (issues a warning).
Parallel package missing warning (reuse from matching_parallel.R)
Description
Parallel package missing warning (reuse from matching_parallel.R)
Usage
warn_parallel_unavailable()
Value
No return value, called for side effects (issues a warning).
High distance matches warning
Description
High distance matches warning
Usage
warn_poor_quality(pct_poor, threshold)
Value
No return value, called for side effects (issues a warning).