Type:

Package

Title:

Asymmetric Smoothed-Association Matrices via GAM Fits

Version:

0.1.0

Description:

Render a pairwise, asymmetric smoothed-association matrix of continuous variables. Each cell shows the fitted spline from an 'mgcv' generalised additive model, with the upper triangle displaying 'gam(x_j ~ s(x_i))' and the lower triangle 'gam(x_i ~ s(x_j))'. Unlike Pearson's correlation matrix, the visualisation is intentionally asymmetric, revealing heteroscedasticity, leverage, and directional non-linearity that a single scalar correlation hides. An asymmetry index and a 24-category shape taxonomy quantify the directional difference and qualitative form of each fitted smooth.

License:

GPL (≥ 3)

URL:

https://github.com/max578/janusplot, https://max578.github.io/janusplot/

BugReports:

https://github.com/max578/janusplot/issues

Encoding:

UTF-8

Language:

en-AU

Depends:

R (≥ 4.3.0)

Imports:

mgcv (≥ 1.9.0), ggplot2 (≥ 3.5.0), patchwork (≥ 1.1.0), grid, stats, cli (≥ 3.6.0), lifecycle, rlang (≥ 1.1.0)

Suggests:

agridat, future, future.apply, knitr, MASS, palmerpenguins, rmarkdown, testthat (≥ 3.0.0), vdiffr (≥ 1.0.0), withr

VignetteBuilder:

knitr

RoxygenNote:

7.3.3

Config/testthat/edition:

Config/testthat/parallel:

true

Config/Needs/website:

pkgdown

LazyData:

true

NeedsCompilation:

Packaged:

2026-04-23 14:06:24 UTC; a1222812

Author:

Max Moldovan [aut, cre, cph]

Maintainer:

Max Moldovan <max.moldovan@adelaide.edu.au>

Repository:

CRAN

Date/Publication:

2026-04-28 18:30:08 UTC

janusplot: Asymmetric Smoothed-Association Matrices via GAM Fits

Description

janusplot renders pairwise, asymmetric smoothed-association matrices of continuous variables. Each cell shows the fitted spline from an mgcv::gam() model, with upper and lower triangles encoding the two directional regressions y ~ s(x) and x ~ s(y) respectively.

Unlike a Pearson correlation matrix (one scalar per pair, symmetric), a smoothed-association matrix gives two curves per pair and is intentionally asymmetric. Heteroscedasticity, leverage, and directional non-linearity become visually evident.

Main functions

janusplot() — returns a ggplot of the matrix.
janusplot_data() — returns the raw GAM fits + per-cell metrics (for custom plotting or downstream analysis).

Asymmetry index

For each pair, the asymmetry index ⁠A_ij = |EDF_yx - EDF_xy| / (EDF_yx + EDF_xy)⁠ is bounded in [0, 1]. Values near 0 indicate symmetric complexity; values near 1 indicate the two directional fits differ sharply in effective degrees of freedom.

Under the additive noise model (Hoyer et al. 2009; Peters et al. 2014), the two directional regressions are generally asymmetric when the data-generating process is non-linear, and this asymmetry identifies the causal direction under mild conditions. The asymmetry index is offered here as a visual pre-discovery diagnostic rather than a causal inference procedure; see the package vignette and accompanying paper for full scope and limitations (in particular the failure modes under heteroscedasticity, confounding, and Gaussian-linear DGPs).
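
As a minimal sketch of the formula above (not the package's internal code), the index can be computed from two directional mgcv fits. The helper name asymmetry_index() and the use of summary(fit)$edf as the EDF of the single smooth are assumptions made for illustration.

```r
# Hypothetical helper: asymmetry index from two effective degrees of freedom.
asymmetry_index <- function(edf_yx, edf_xy) {
  abs(edf_yx - edf_xy) / (edf_yx + edf_xy)
}

set.seed(1L)
n <- 200L
x <- stats::runif(n, 0, 10)
y <- sin(x) + stats::rnorm(n, sd = 0.3)
d <- data.frame(x = x, y = y)

# Two directional regressions: y ~ s(x) and x ~ s(y).
fit_yx <- mgcv::gam(y ~ s(x), data = d, method = "REML")
fit_xy <- mgcv::gam(x ~ s(y), data = d, method = "REML")

# summary()$edf holds the EDF of each smooth term (one term per model here).
A <- asymmetry_index(summary(fit_yx)$edf, summary(fit_xy)$edf)
A
```

Because both EDFs are positive, the ratio cannot leave [0, 1]; equal EDFs give exactly 0.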

Author(s)

Maintainer: Max Moldovan <max.moldovan@adelaide.edu.au> [copyright holder]

Asymmetric smoothed-association matrix

Description

Render a pairwise, asymmetric matrix of smoothed associations between numeric variables. Each cell [i, j] where i != j shows the fitted spline from mgcv::gam():

Upper triangle (i < j): ⁠gam(x_j ~ s(x_i) + <adjust>)⁠.
Lower triangle (i > j): ⁠gam(x_i ~ s(x_j) + <adjust>)⁠.
Diagonal: blank panel when labels live on the border (default), or a variable-name label when labels = "diagonal".

The two triangles intentionally differ — the asymmetry reveals heteroscedasticity, leverage, and directional non-linearity that a single scalar correlation hides.

Usage

janusplot(
  data,
  vars = NULL,
  adjust = NULL,
  method = "REML",
  k = -1L,
  bs = "tp",
  order = c("original", "hclust", "alphabetical"),
  show_data = TRUE,
  show_ci = TRUE,
  display = c("fit", "d1", "d2"),
  derivative_ci = c("none", "pointwise", "simultaneous"),
  derivative_ci_nsim = 1000L,
  n_grid = NULL,
  colour_by = c("pearson", "spearman", "kendall", "edf", "deviance_gap", "none"),
  fill_by = NULL,
  palette = NULL,
  annotations = c("edf", "A"),
  shape_cutoffs = janusplot_shape_cutoffs(),
  show_shape_legend = TRUE,
  glyph_style = c("ascii", "unicode"),
  labels = c("border", "diagonal", "none"),
  diagonal = c("auto", "blank", "name", "density"),
  label_srt = 45,
  label_cex = 1,
  signif_glyph = TRUE,
  show_asymmetry = NULL,
  na_action = c("pairwise", "complete"),
  parallel = FALSE,
  with_data = FALSE,
  text_scale_diag = 1,
  text_scale_off_diag = 1,
  show_glossary = TRUE,
  glossary_scale = 1,
  ...
)

Arguments

data

A data frame with numeric columns to include.

vars

Character vector of column names to use. NULL (default) uses all numeric columns in data. Non-numeric columns trigger an error listing offenders.

adjust

A one-sided formula RHS giving additional covariates and/or random effects to include in every pairwise GAM. For example, adjust = ~ s(age) + s(site, bs = "re") fits gam(y ~ s(x) + s(age) + s(site, bs = "re")) for each pair. Default NULL fits unadjusted pairwise smooths.

method

Smoothing-parameter estimation method passed to mgcv::gam(). Default "REML" per mgcv recommendation.

k

Integer, or named list mapping variable names to integers. Basis dimension for s(). Default -1L (mgcv's automatic choice).

bs

Basis type for s(). Default "tp" (thin plate).

order

One of "original" (default), "hclust" (reorder by hierarchical clustering of Pearson correlations), or "alphabetical".
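
A rough sketch of what an "hclust" ordering could look like in base R. The dissimilarity 1 - r and the default linkage are assumptions for illustration; the source does not state which the package uses.

```r
vars <- c("mpg", "hp", "wt", "qsec")

# Pearson correlation matrix of the selected columns.
r <- stats::cor(mtcars[, vars], method = "pearson")

# Assumption: convert correlation to a dissimilarity via 1 - r, then cluster.
hc <- stats::hclust(stats::as.dist(1 - r))

# Variables in dendrogram leaf order, ready to be used as the matrix order.
ordered_vars <- vars[hc$order]
ordered_vars
```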

show_data

Logical. If TRUE (default), overlay raw data points (low alpha) behind each spline. Only applies when display = "fit"; derivative panels never overlay raw data.

show_ci

Logical. If TRUE (default), overlay the 95% confidence envelope from predict(gam, se.fit = TRUE) on the fit panel (i.e. when display = "fit"). CI rendering on derivative panels is controlled separately by derivative_ci.
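
The envelope described here can be sketched directly with mgcv. The normal-quantile multiplier qnorm(0.975) is an assumption; the source only says the envelope comes from predict(gam, se.fit = TRUE).

```r
set.seed(1L)
d <- data.frame(x = stats::runif(150L, 0, 10))
d$y <- log1p(d$x) + stats::rnorm(150L, sd = 0.3)

fit  <- mgcv::gam(y ~ s(x), data = d, method = "REML")
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100L))

# Fitted values and standard errors on the evaluation grid.
p <- predict(fit, newdata = grid, se.fit = TRUE)

# Assumption: 95% envelope as fit +/- z * se with a normal quantile.
z   <- stats::qnorm(0.975)
env <- data.frame(
  x   = grid$x,
  fit = as.numeric(p$fit),
  lo  = as.numeric(p$fit - z * p$se.fit),
  hi  = as.numeric(p$fit + z * p$se.fit)
)
head(env)
```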

display

One of "fit" (default), "d1", or "d2". Selects which single quantity is rendered in every off-diagonal cell of the matrix.

"fit" — the fitted smooth \hat f(x); default, behaviour identical to the pre-derivative release.
"d1" — the first derivative \hat f'(x) of the fitted smooth. Zero crossings localise turning points of \hat f.
"d2" — the second derivative \hat f''(x). Zero crossings localise inflection points of \hat f.

A single matrix shows a single quantity by design: stacked multi-panel cells crowd the matrix at any realistic variable count. To compare fit against derivative, render two or three janusplot() calls side-by-side; each call keeps its own with_data = TRUE summary table tagged with the display column.

Orders k \ge 3 are not exposed — higher-order derivatives of penalised regression splines amplify noise and rarely carry usable signal at realistic sample sizes. See vignette("janusplot") for the theoretical justification and applied use-cases.
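
To illustrate how zero crossings of the derivatives localise turning and inflection points, here is a base-R sketch on a noiseless sine curve. It uses plain sign-change counts on finite differences; the helper count_sign_changes() is hypothetical and is simpler than whatever weighting the package applies.

```r
# Hypothetical helper: number of sign changes along a numeric vector.
count_sign_changes <- function(v) sum(diff(sign(v)) != 0)

x <- seq(-3, 3, length.out = 200L)
f <- sin(x)

# Numerical first and second derivatives on the equally spaced grid.
h  <- diff(x)[1L]
d1 <- diff(f) / h
d2 <- diff(f, differences = 2L) / h^2

n_turning <- count_sign_changes(d1)  # 2: extrema near x = -pi/2 and x = pi/2
n_inflect <- count_sign_changes(d2)  # 1: inflection at x = 0
c(n_turning = n_turning, n_inflect = n_inflect)
```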

derivative_ci

One of "none" (default), "pointwise", or "simultaneous". Controls whether, and how, a 95% confidence ribbon is drawn underneath the derivative curve when display %in% c("d1", "d2"). Ignored when display = "fit".

"none" — no ribbon. The curve and the zero reference line are all you see. Default, because pointwise ribbons fall short of nominal coverage when read as a joint region and can invite over-reading of local features.
"pointwise" — 95% pointwise ribbon from \sqrt{\mathrm{diag}(D V_p D^\top)} (Wood 2017 §7.2.4). Valid marginally; not a simultaneous statement.
"simultaneous" — 95% simultaneous band via the Monte Carlo construction of Ruppert, Wand & Carroll (2003) popularised for GAMs by Simpson (2018, Frontiers Ecol. Evol. 6:149): draw B samples \tilde{\boldsymbol\beta} \sim \mathcal{N}(\hat{\boldsymbol\beta}, V_p), compute \max_x |D_i(\tilde{\boldsymbol\beta} - \hat{\boldsymbol\beta})| / \mathrm{se}_i, and use the (1-\alpha) quantile as a critical multiplier on the pointwise SE. Valid for feature localisation ("where is \hat f'(x) significantly non-zero").

derivative_ci_nsim

Integer. Number of Monte Carlo samples used when derivative_ci = "simultaneous". Default 1000L — a compromise between coverage accuracy (Simpson 2018 uses 10000) and CPU budget across every pair in a medium-sized matrix. Ignored for any other derivative_ci.
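
The Monte Carlo construction behind the "simultaneous" option can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the derivative operator D is built by central finite differences of the mgcv linear-predictor matrix, the step h and the grid are arbitrary choices, and B plays the role of derivative_ci_nsim.

```r
set.seed(2026L)
n <- 300L
d <- data.frame(x = stats::runif(n, -3, 3))
d$y <- sin(d$x) + stats::rnorm(n, sd = 0.3)
fit <- mgcv::gam(y ~ s(x), data = d, method = "REML")

# Finite-difference derivative operator D acting on the coefficients.
grid <- seq(min(d$x), max(d$x), length.out = 200L)
h    <- 1e-4
X_hi <- predict(fit, newdata = data.frame(x = grid + h), type = "lpmatrix")
X_lo <- predict(fit, newdata = data.frame(x = grid - h), type = "lpmatrix")
D    <- (X_hi - X_lo) / (2 * h)

Vp <- stats::vcov(fit)                   # Bayesian posterior covariance V_p
se <- sqrt(rowSums((D %*% Vp) * D))      # sqrt(diag(D V_p D^T)), pointwise SE

# Draw B coefficient deviations ~ N(0, V_p) via the Cholesky factor.
B   <- 1000L
p   <- length(stats::coef(fit))
dev <- matrix(stats::rnorm(B * p), B, p) %*% chol(Vp)

# Max standardised deviation over the grid for each draw, then its quantile.
max_abs <- apply(abs(D %*% t(dev)) / se, 2L, max)
crit    <- unname(stats::quantile(max_abs, 0.95))

d1_hat <- as.numeric(D %*% stats::coef(fit))
band   <- data.frame(x = grid, fit = d1_hat,
                     lo = d1_hat - crit * se, hi = d1_hat + crit * se)
crit
```

The critical multiplier exceeds the pointwise normal quantile, which is why the simultaneous band is wider than the pointwise ribbon everywhere on the grid.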

n_grid

Integer or NULL. Number of equally-spaced points used to evaluate each fitted smooth (and its derivatives). Default NULL resolves to 100 when display = "fit" and 200 otherwise, because finite-difference second derivatives visibly degrade below \sim 150 points on moderate-k smooths. Supplying n_grid directly overrides both defaults. Larger grids shift the numerical shape-metric values (M, C, turning / inflection counts) slightly because they are computed on this same grid. Shapes and asymmetry are the primary reading; M, C and the counts are secondary diagnostics and the grid-induced drift is tolerable.

colour_by

One of "pearson" (default), "spearman", "kendall", "edf", "deviance_gap", or "none". Encodes the per-cell fill colour by the chosen scalar. Correlation choices use a diverging palette with limits c(-1, 1) and a shared corr colour-bar title; "edf" and "deviance_gap" use a sequential palette labelled by the metric.

fill_by

Deprecated alias for colour_by. If supplied, it emits a single soft deprecation warning and is forwarded to colour_by.

palette

Character. Colour palette for the cell fill scale. Defaults to "RdBu" when colour_by is a correlation and "viridis" otherwise. Sequential choices: "viridis", "magma", "inferno", "plasma", "cividis", "mako", "rocket", "turbo" (not CB-safe), "YlOrRd", "YlGnBu", "Blues", "Greens". Diverging choices: "RdYlBu", "RdBu", "PuOr", "Spectral" (not CB-safe). Passing a sequential palette while colour_by is a correlation silently upgrades to the default diverging palette.

annotations

Character vector, a subset of c("edf", "A", "shape", "code"). Controls which corner annotations appear on each off-diagonal cell:

"code" — 2-letter ASCII shape code, top-left corner.
"A" and "edf" — asymmetry index and effective degrees of freedom, stacked bottom-left.
"shape" — shape glyph (Unicode or ASCII per glyph_style), bottom-right corner.

Default c("edf", "A"). "code" and "shape" occupy distinct corners so both can be requested together. See janusplot_shape_hierarchy() for the full code list.

shape_cutoffs

Named list of classification thresholds used to map the continuous shape indices into discrete shape_category labels; see janusplot_shape_cutoffs().

show_shape_legend

Logical. If TRUE (default), attach a standing shape-types legend plate below the matrix that illustrates every category in the taxonomy as a canonical thumbnail spline. Independent of annotations.

glyph_style

One of "ascii" (default) or "unicode". Controls how cell shape glyphs render when "shape" is included in annotations. Default is "ascii" for maximum portability across typesetting pipelines; switch to "unicode" only when the target font is known to cover the curve glyph set.

labels

One of "border" (default), "diagonal", or "none". Controls where variable names are rendered:

"border" — names along the top (rotated per label_srt) and left margins of the matrix; diagonal cells are left blank. Mirrors corrplot's tl.pos = "lt" convention.
"diagonal" — names centred on the diagonal cells (the pre-0.1 layout).
"none" — labels suppressed entirely; diagonal cells blank.

diagonal

One of "auto" (default), "blank", "name", or "density". Controls what is rendered in the diagonal cells of the matrix.

"auto" — preserves the historical behaviour: variable name when labels = "diagonal", blank otherwise.
"blank" — empty bordered panel (uniform grid reading).
"name" — variable name centred in the cell, bold.
"density" — kernel density of the variable filled in translucent grey, with a rug of raw values along the bottom edge. Mirrors the GGally::ggpairs convention; surfaces tail weight, bimodality, and support clipping that the pairwise smooths alone cannot reveal. Variable names should come from the border (labels = "border", the default) when this mode is on.

label_srt

Numeric. Rotation (degrees) of top labels when labels = "border". Default 45; set to 0 for horizontal or 90 for vertical. Ignored when labels != "border".

label_cex

Positive numeric multiplier on the border-label font size. Default 1. Ignored when labels = "none".

signif_glyph

Logical. If TRUE (default), annotate cells with ⁠·⁠ / * / ⁠**⁠ reflecting the smooth's F-test p-value.

show_asymmetry

Deprecated. Use annotations instead ("A" %in% annotations). When supplied, a soft deprecation warning fires and the argument is merged into annotations.

na_action

One of "pairwise" (default; per-cell complete observations) or "complete" (listwise; all cells use the same rows).
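
The difference between the two settings can be seen in a small base-R sketch; the toy data frame is made up for illustration.

```r
df <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2, NA, 3, 4, 6),
  c = c(NA, 1, 2, 3, 4)
)

# "pairwise": the (a, b) cell keeps every row where both a and b are observed.
rows_pairwise_ab <- stats::complete.cases(df[, c("a", "b")])

# "complete": every cell uses only rows with no missing value in any column.
rows_complete <- stats::complete.cases(df)

sum(rows_pairwise_ab)  # 3 rows available to the (a, b) cell
sum(rows_complete)     # 2 rows shared by all cells
```

Pairwise deletion keeps more data per cell, at the cost of each cell being fitted on a potentially different subset of rows.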

parallel

Logical. If TRUE, use future.apply::future_mapply() to fit pairs in parallel. Requires the future.apply package and a user-configured future::plan(). Default FALSE.

with_data

Logical. If TRUE, return a two-element list list(plot, data) where data is a flat per-cell summary (one row per off-diagonal cell) of everything the plot displays. The data element is always a plain data.frame (base R — no data.table dependency). Default FALSE — in which case only the ggplot is returned.

text_scale_diag

Positive numeric multiplier applied to the diagonal variable-name labels. Default 1. Diagonal labels additionally auto-shrink for long variable names (nchar(var) > 10) so they fit the cell regardless of this value.

text_scale_off_diag

Positive numeric multiplier applied to all off-diagonal annotations (n / EDF readouts, significance glyphs, asymmetry-index labels). Default 1. Use ⁠< 1⁠ when cells are small and the annotations crowd the fit line; use ⁠> 1⁠ for presentation plots.

show_glossary

Logical. If TRUE (default), attach a multi-line caption below the matrix describing the on-plot abbreviations (n, EDF, A, fill encoding, significance glyphs). Only keys actually displayed are listed.

glossary_scale

Positive numeric multiplier on the glossary caption font size. Default 1.

...

Additional arguments passed to mgcv::gam().

Value

If with_data = FALSE (default), a ggplot2::ggplot object (via patchwork::wrap_plots()) carrying a top-of-matrix title that names the displayed quantity ("Direct fit", "First derivative f'", or "Second derivative f''"). If with_data = TRUE, a list with two elements: plot (the ggplot) and data (a tidy table with columns var_x, var_y, position, n_used, edf, pvalue, signif, dev_exp, asymmetry_index, cor_pearson, cor_spearman, cor_kendall, tie_ratio, monotonicity_index, convexity_index, n_turning_points, n_inflections, flat_range_ratio, shape_category, colour_value, display, one row per off-diagonal cell). The display column tags which quantity the call rendered, so separate calls for fit / d1 / d2 yield comparable, stackable tables. Derivative curves themselves (grid of x, fitted \hat f^{(k)}, SE) live on janusplot_data() — see there.

Examples

# Minimal runnable example — 3 variables, 6 asymmetric pairwise GAM fits.
janusplot(mtcars[, c("mpg", "hp", "wt")])


# Heteroscedastic DGP: Pearson r is ~ 0.9 but the inverse fit is
# clearly non-linear, yielding asymmetry index > 0.5.
set.seed(2026L)
n  <- 200L
x1 <- stats::runif(n, 0, 10)
x2 <- x1 + stats::rnorm(n, sd = 0.2 * x1)
janusplot(data.frame(x1 = x1, x2 = x2, x3 = stats::rnorm(n)))

# A single matrix renders a single quantity. To compare the fit
# against its derivatives, render three calls and place them
# side-by-side; each call's title makes the quantity explicit.
set.seed(2026L)
xs <- stats::runif(300L, -3, 3)
df <- data.frame(
  x  = xs,
  y1 = sin(xs)  + stats::rnorm(300L, sd = 0.3),
  y2 = xs^2     + stats::rnorm(300L, sd = 0.6)
)
janusplot(df, display = "fit")
janusplot(df, display = "d1")
janusplot(df, display = "d2")

# Simultaneous CI bands on a derivative panel, per Simpson (2018).
janusplot(df, display = "d1", derivative_ci = "simultaneous")

Raw GAM fits and per-cell metrics for a smoothed-association matrix

Description

Companion to janusplot() returning the raw list of GAM fits plus per-cell metrics (EDF, F-test p-value, deviance explained, asymmetry index, pairwise correlations, shape descriptors) without constructing the ggplot. Useful for custom rendering or downstream analysis.

Usage

janusplot_data(
  data,
  vars = NULL,
  adjust = NULL,
  method = "REML",
  k = -1L,
  bs = "tp",
  na_action = c("pairwise", "complete"),
  parallel = FALSE,
  keep_fits = FALSE,
  derivatives = integer(),
  derivative_ci = c("pointwise", "none", "simultaneous"),
  derivative_ci_nsim = 1000L,
  n_grid = NULL,
  shape_cutoffs = janusplot_shape_cutoffs(),
  ...
)

Arguments

data

A data frame with numeric columns to include.

vars

Character vector of column names to use. NULL (default) uses all numeric columns in data. Non-numeric columns trigger an error listing offenders.

adjust

A one-sided formula RHS giving additional covariates and/or random effects to include in every pairwise GAM. For example, adjust = ~ s(age) + s(site, bs = "re") fits gam(y ~ s(x) + s(age) + s(site, bs = "re")) for each pair. Default NULL fits unadjusted pairwise smooths.

method

Smoothing-parameter estimation method passed to mgcv::gam(). Default "REML" per mgcv recommendation.

k

Integer, or named list mapping variable names to integers. Basis dimension for s(). Default -1L (mgcv's automatic choice).

bs

Basis type for s(). Default "tp" (thin plate).

na_action

One of "pairwise" (default; per-cell complete observations) or "complete" (listwise; all cells use the same rows).

parallel

Logical. If TRUE, use future.apply::future_mapply() to fit pairs in parallel. Requires the future.apply package and a user-configured future::plan(). Default FALSE.

keep_fits

Logical. If TRUE, retain full mgcv::gam() model objects in the return (large memory footprint for k above ~15). Default FALSE — retains summary metrics and prediction grids only.

derivatives

Integer vector of derivative orders to compute on every pair (subset of 1:2). Default integer() — no derivatives. Unlike janusplot(), the data companion can return multiple orders from a single call for programmatic analysis; pass c(1L, 2L) to surface both.

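derivative_ci

One of "pointwise" (default), "none", or "simultaneous". Controls how the lo / hi bounds of each derivative frame are computed when derivatives is non-empty; the ci_type column of each frame records the choice.

"none" — no interval. This is the default in janusplot(), where pointwise ribbons can invite over-reading of local features; janusplot_data() defaults to "pointwise" instead, as the Usage signature shows.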
"pointwise" — 95% pointwise ribbon from \sqrt{\mathrm{diag}(D V_p D^\top)} (Wood 2017 §7.2.4). Valid marginally; not a simultaneous statement.
"simultaneous" — 95% simultaneous band via the Monte Carlo construction of Ruppert, Wand & Carroll (2003) popularised for GAMs by Simpson (2018, Frontiers Ecol. Evol. 6:149): draw B samples \tilde{\boldsymbol\beta} \sim \mathcal{N}(\hat{\boldsymbol\beta}, V_p), compute \max_x |D_i(\tilde{\boldsymbol\beta} - \hat{\boldsymbol\beta})| / \mathrm{se}_i, and use the (1-\alpha) quantile as a critical multiplier on the pointwise SE. Valid for feature localisation ("where is \hat f'(x) significantly non-zero").

derivative_ci_nsim

Integer. Number of Monte Carlo samples used when derivative_ci = "simultaneous". Default 1000L. Ignored for any other derivative_ci.

n_grid

Integer or NULL. Number of equally-spaced points used to evaluate each fitted smooth (and its derivatives). Default NULL; see the n_grid argument of janusplot() for how the grid size affects the shape metrics.

shape_cutoffs

Named list of classification thresholds used to map the continuous shape indices (monotonicity_index, convexity_index) into discrete shape_category labels. Defaults from janusplot_shape_cutoffs().

...

Additional arguments passed to mgcv::gam().

Value

A list with components:

vars: Character vector of variables used, in plotted order.
pairs: List of per-pair results. Each element has i, j, var_i, var_j, fit_yx, fit_xy (NULL if keep_fits = FALSE), pred_yx, pred_xy (data frames with x, fit, se, lo, hi), edf_yx, edf_xy, pvalue_yx, pvalue_xy, dev_exp_yx, dev_exp_xy, n_used, asymmetry_index, plus Pearson / Spearman / Kendall correlations (cor_pearson, cor_spearman, cor_kendall), the maximum tie ratio across x and y (tie_ratio), and per-direction shape descriptors (monotonicity_index_yx, convexity_index_yx, monotonicity_index_xy, convexity_index_xy, n_turning_yx, n_inflect_yx, n_turning_xy, n_inflect_xy, shape_yx, shape_xy). When derivatives is non-empty, each pair additionally carries deriv_yx and deriv_xy, each a named list keyed by order ("1", "2") whose entries are data frames with columns x, fit, se, lo, hi, ci_type matching the schema of pred_yx / pred_xy. The ci_type column records whether the lo / hi columns are "pointwise" (default), "simultaneous" (Ruppert–Wand–Carroll / Simpson 2018 critical-multiplier bands), or "none". When derivative_ci = "simultaneous", each derivative frame also carries a "crit_multiplier" attribute giving the MC-derived critical multiplier used. See janusplot_shape_metrics() for the definition of the monotonicity and convexity indices.
call: Match call.

Examples

# Per-pair fits + metrics on a small mtcars slice
out <- janusplot_data(mtcars[, c("mpg", "hp", "wt")])
out$pairs[[1L]]$asymmetry_index
out$pairs[[1L]]$cor_spearman
out$pairs[[1L]]$shape_yx

Default cutoff thresholds for `shape_category` classification

Description

Returns the named list of thresholds used to map the continuous monotonicity (M) and convexity (C) indices (plus inflection counts) into a discrete shape_category. Exposed so callers can override individual thresholds or pass a fully custom list to janusplot() / janusplot_shape_metrics().

Usage

janusplot_shape_cutoffs(...)

Arguments

...

Optional named overrides to merge into the defaults.

Value

A named list with numeric thresholds:

mono_strong: ⁠|M|⁠ threshold for a strictly monotone smooth (default 0.9).
mono_mod: ⁠|M|⁠ threshold for a curved-but-monotone smooth (default 0.5).
mono_nonmono: ⁠|M|⁠ below this is considered non-monotone (default 0.3).
mono_s: ⁠|M|⁠ threshold for labelling an S-shape (default 0.5).
curv_low: ⁠|C|⁠ below this is considered near-linear curvature (default 0.2).
curv_mod: ⁠|C|⁠ threshold for a clearly curved monotone (default 0.5).
curv_strong: ⁠|C|⁠ threshold for a U-shape / inverted-U shape (default 0.5).
flat: range(fit) / sd(y) below this is called flat (default 0.05).
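
The override-and-merge behaviour can be sketched in base R with the default values listed above. The helper merge_cutoffs() is hypothetical, and its check for unknown names is an assumption; the source does not say how janusplot_shape_cutoffs() treats them.

```r
# Defaults copied from the list above.
defaults <- list(
  mono_strong = 0.9, mono_mod = 0.5, mono_nonmono = 0.3, mono_s = 0.5,
  curv_low = 0.2, curv_mod = 0.5, curv_strong = 0.5, flat = 0.05
)

# Hypothetical stand-in for the merge step: named overrides replace defaults.
merge_cutoffs <- function(...) {
  overrides <- list(...)
  unknown <- setdiff(names(overrides), names(defaults))
  if (length(unknown) > 0L) {
    stop("Unknown cutoff(s): ", paste(unknown, collapse = ", "))
  }
  utils::modifyList(defaults, overrides)
}

custom <- merge_cutoffs(curv_mod = 0.6, flat = 0.02)
str(custom)
```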

Examples

janusplot_shape_cutoffs()
janusplot_shape_cutoffs(curv_mod = 0.6, flat = 0.02)

Shape-category taxonomy table

Description

Return the full janusplot shape taxonomy as a data frame with four hierarchy columns plus presentation fields. The taxonomy is the single source of truth consumed by the classifier, the cell renderer, the legend plate, and the janusplot_data() output.

Hierarchy columns (finest → coarsest):

category: 24-way fine label (linear_up, skewed_peak, bimodal, …). Computed per cell by janusplot().
code: Unique two-letter ASCII shorthand (safe on any font or typesetting pipeline) — e.g. lu for linear_up.
archetype: Seven-family grouping: monotone_linear, monotone_curved, unimodal, wave, multimodal, chaotic, degenerate.
monotonic: Three-way coarse classification: monotone / non_monotone / degenerate.
linear: Two-way split with a separate degenerate level: linear / non_linear / degenerate.

The broader tiers (linear/non-linear, monotone/non-monotone) are textbook calculus; the archetype layer maps cleanly to shape-constrained regression vocabulary (Pya & Wood 2015; Meyer 2008) and to dose-response shape categories (Calabrese 2008; Calabrese & Baldwin 2001). The ⁠(T, I)⁠ dispatch underlying each fine category is a coarsened Morse-theoretic critical-point classification (Milnor 1963).

Usage

janusplot_shape_hierarchy()

Value

A data frame with 24 rows and columns category, code, archetype, monotonic, linear, glyph, ascii, label, gloss.

References

Calabrese, E. J. (2008). Hormesis: why it is important to toxicology and toxicologists. Environmental Toxicology and Chemistry, 27(7), 1451–1474.

Meyer, M. C. (2008). Inference using shape-restricted regression splines. Annals of Applied Statistics, 2(3), 1013–1033.

Milnor, J. (1963). Morse Theory. Princeton University Press.

Pya, N., & Wood, S. N. (2015). Shape constrained additive models. Statistics and Computing, 25(3), 543–559.

Examples

tax <- janusplot_shape_hierarchy()
head(tax[, c("category", "code", "archetype", "monotonic", "linear")])
# Count how many categories live in each archetype
table(tax$archetype)

Shape metrics for a fitted univariate smooth

Description

Compute the continuous monotonicity and convexity indices, inflection and turning-point counts, and rule-based shape category for a fitted univariate smooth. Works on either a per-pair fit object returned from the janusplot internal machinery or a freshly fitted mgcv::gam() with a single s() term.

Both indices are bounded in ⁠[-1, 1]⁠ and weighted by the empirical density of the predictor:

monotonicity_index (paper symbol M). Let f be the fitted smooth evaluated on a dense grid of n_grid equally-spaced points across the predictor range, ⁠f'⁠ its numerical first derivative, and w the empirical density of the predictor on the same grid with sum(w) = 1. Then ⁠monotonicity_index = sum(w * f') / sum(w * |f'|) in [-1, 1]⁠. +1 is strictly increasing, -1 strictly decreasing, 0 non-monotone.
convexity_index (paper symbol C). With ⁠f''⁠ the numerical second derivative on the same grid, ⁠convexity_index = sum(w * f'') / sum(w * |f''|) in [-1, 1]⁠. +1 is globally convex (bowl-up), -1 globally concave (bowl-down), 0 inflection-dominated (S-curve, sine, flat).

Both indices are scale-invariant (replacing y -> a*y + b with a > 0 leaves them unchanged; a < 0 flips their signs) and density-weighted so they describe the smooth where the data actually live, not extrapolated tails.
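
The two index definitions can be written out in a few lines of base R. This is a sketch under simplifying assumptions: shape_indices() is a hypothetical helper, derivatives are plain finite differences on an equally spaced grid (so the grid step cancels in each ratio), and the weights default to uniform rather than an estimated predictor density.

```r
# Hypothetical helper: weighted monotonicity (M) and convexity (C) indices
# of a curve f evaluated on an equally spaced grid with weights w.
shape_indices <- function(f, w = rep(1, length(f))) {
  w  <- w / sum(w)
  d1 <- diff(f)                         # proportional to f'
  d2 <- diff(f, differences = 2L)       # proportional to f''
  w1 <- (w[-1L] + w[-length(w)]) / 2    # weights at interval midpoints
  w2 <- w[-c(1L, length(w))]            # weights at interior grid points
  c(M = sum(w1 * d1) / sum(w1 * abs(d1)),
    C = sum(w2 * d2) / sum(w2 * abs(d2)))
}

x <- seq(0, 1, length.out = 200L)
bowl     <- shape_indices(x^2)              # increasing and convex
wave     <- shape_indices(sin(2 * pi * x))  # one full period: M near 0
rescaled <- shape_indices(3 * x^2 + 2)      # positive affine transform of y
rbind(bowl, wave, rescaled)
```

The bowl and rescaled rows agree, which is the invariance property stated above for a positive scale factor.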

Usage

janusplot_shape_metrics(
  fit,
  x_name = NULL,
  newdata = NULL,
  n_grid = 200L,
  cutoffs = janusplot_shape_cutoffs()
)

Arguments

fit

Either a list returned by a janusplot pair-fit helper (must contain pred and raw), or a fitted mgcv::gam() with a single s(x) term.

x_name

Character. Column name of the predictor when fit is a mgcv::gam() object. Ignored for pair-fit lists.

newdata

Optional data frame supplying the raw predictor values used for density weighting when fit is a mgcv::gam() object. If NULL, the model frame is used.

n_grid

Integer. Prediction grid length when fit is a mgcv::gam() object. Default 200L.

cutoffs

Named list of classification thresholds; see janusplot_shape_cutoffs(). Default uses package defaults.

Value

A named list with components:

monotonicity_index: M in ⁠[-1, 1]⁠. See Description.
convexity_index: C in ⁠[-1, 1]⁠. See Description.
n_turning_points: Integer count of lobe-mass-weighted sign changes of ⁠f'⁠. Equals the number of interior extrema.
n_inflections: Integer count of lobe-mass-weighted sign changes of ⁠f''⁠.
flat_range_ratio: range(f) / sd(y) — small values indicate a degenerate flat smooth.
shape_category: One of 24 labels from janusplot_shape_hierarchy() dispatched on ⁠(n_turning_points, n_inflections)⁠ with ⁠(monotonicity_index, convexity_index)⁠ disambiguation for the monotone case.

Examples

# On a fitted gam
set.seed(2026L)
n  <- 200L
x  <- stats::runif(n, 0, 10)
y  <- log1p(x) + stats::rnorm(n, sd = 0.3)
d  <- data.frame(x = x, y = y)
fit <- mgcv::gam(y ~ s(x), data = d, method = "REML")
janusplot_shape_metrics(fit, x_name = "x", newdata = d)

Shape-recognition sensitivity study

Description

Run a full-factorial sensitivity sweep for the janusplot 24-category shape classifier. For each combination of ground-truth shape, sample size n, noise level sigma, and replicate, the sweep:

Generates n points from the noiseless canonical curve on ⁠[0, 1]⁠ + Gaussian noise with SD = sigma (fraction of the y-range, so signal-to-noise is comparable across shapes).
Fits mgcv::gam(y ~ s(x), method = "REML").
Runs janusplot_shape_metrics() to classify the fitted smooth.
Records correctness at both the fine (24-category) and archetype (7-family) levels.

The function is the package-native implementation of simulation/scripts/scenario_4_shape_recognition.R. A small precomputed dataset is shipped as shape_sensitivity_demo for downstream examples without requiring users to re-run the sweep.
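
One replicate of the generate-and-fit steps in this Description could look like the sketch below. The helper simulate_replicate(), the uniform design on [0, 1], and the quadratic curve are assumptions for illustration; they are not the package's own generators.

```r
# Hypothetical one-replicate generator: noise SD is sigma times the y-range
# of the noiseless curve, as described in the first step of the sweep.
simulate_replicate <- function(curve, n, sigma, seed) {
  set.seed(seed)
  x <- stats::runif(n)  # assumption: uniform design on [0, 1]
  y_range <- diff(range(curve(seq(0, 1, length.out = 1001L))))
  y <- curve(x) + stats::rnorm(n, sd = sigma * y_range)
  data.frame(x = x, y = y)
}

u_curve <- function(x) (x - 0.5)^2  # illustrative U-shaped curve
d <- simulate_replicate(u_curve, n = 200L, sigma = 0.1, seed = 2026L)

# Second step of the sweep: fit the smooth by REML.
fit <- mgcv::gam(y ~ s(x), data = d, method = "REML")
summary(fit)$edf
```

The fitted object can then be classified with janusplot_shape_metrics(fit, x_name = "x", newdata = d), following the call pattern shown in that function's Examples.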

Usage

janusplot_shape_sensitivity(
  shapes = NULL,
  n_grid = c(50L, 100L, 200L, 500L),
  sigma_grid = c(0.02, 0.05, 0.1, 0.2, 0.4),
  n_rep = 200L,
  cutoffs = janusplot_shape_cutoffs(),
  parallel = FALSE,
  seed = 2026L,
  verbose = interactive()
)

Arguments

shapes

Character vector of ground-truth names from janusplot_shape_sensitivity_shapes(). Default NULL → all 14.

n_grid

Integer vector of sample sizes. Default c(50L, 100L, 200L, 500L).

sigma_grid

Numeric vector of noise levels (fraction of the y-range). Default c(0.02, 0.05, 0.10, 0.20, 0.40).

n_rep

Integer. Replicates per cell. Default 200L.

cutoffs

Named list of classification thresholds; see janusplot_shape_cutoffs().

parallel

Logical. If TRUE and future.apply is installed, dispatch replicates in parallel. The caller is responsible for configuring future::plan() (e.g. future::plan(future::multisession)).

seed

Integer. Base seed — each fit uses seed + row_index so results are reproducible and cell-permutation-invariant.
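
A sketch of how a full-factorial design with per-row seeds might be laid out in base R. The column order inside expand.grid() is an assumption; only the seed + row_index rule comes from the text above.

```r
base_seed <- 2026L

# Full-factorial design: every combination of shape, n, sigma and replicate.
design <- expand.grid(
  rep   = seq_len(5L),
  sigma = c(0.05, 0.20),
  n     = c(100L, 200L),
  truth = c("linear_up", "u_shape", "wave"),
  stringsAsFactors = FALSE
)

# Each fit gets its own seed, derived from the base seed and the row index.
design$seed <- base_seed + seq_len(nrow(design))

nrow(design)  # 3 shapes x 2 n x 2 sigma x 5 reps = 60 fits
head(design)
```

With the documented defaults (14 shapes, 4 sample sizes, 5 noise levels, 200 replicates) the same layout yields 14 * 4 * 5 * 200 = 56000 fits, which is why a reduced design is used for quick runs.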

verbose

Logical. Print progress messages to the console. Default is interactive().

Value

A data frame with one row per fit. Columns:

truth: Ground-truth shape name.
n: Sample size for this fit.
sigma: Noise level for this fit.
seed: RNG seed used.
predicted: Classifier output at the fine (24-category) level.
correct: Logical — does predicted == truth?
archetype_truth: Expected archetype for truth.
archetype_pred: Archetype of predicted.
archetype_correct: Logical — archetype-level correctness.
monotonicity_index: Monotonicity index M (see janusplot_shape_metrics()).
convexity_index: Convexity index C (see janusplot_shape_metrics()).
n_turn, n_inflect: Recovered turning-point and inflection counts.
error: "gam_fit_failed" when mgcv::gam() errored; NA otherwise.

Examples

# Tiny-run smoke test (< 2 seconds): 3 shapes x 2 n x 2 sigma x 5 reps.
res <- janusplot_shape_sensitivity(
  shapes     = c("linear_up", "u_shape", "wave"),
  n_grid     = c(100L, 200L),
  sigma_grid = c(0.05, 0.20),
  n_rep      = 5L,
  verbose    = FALSE
)
head(res)
janusplot_shape_sensitivity_summary(res, level = "archetype")

Visualise a shape-sensitivity sweep

Description

Produce one of four diagnostic plots from the raw data frame returned by janusplot_shape_sensitivity():

"confusion_fine": 24 x (|shapes|) confusion matrix at the fine category level — rows = ground truth, columns = predicted, cells coloured by P(pred | truth).
"confusion_archetype": 7 x 7 confusion matrix at the archetype level.
"accuracy_grid": per-shape heatmap of archetype-level accuracy across the ⁠(n, sigma)⁠ design.
"recovery_curves": accuracy as a function of sigma, one line per sample size, faceted by shape.

Usage

janusplot_shape_sensitivity_plot(
  results,
  type = c("confusion_fine", "confusion_archetype", "accuracy_grid", "recovery_curves")
)

Arguments

results

Data frame from janusplot_shape_sensitivity() or the precomputed shape_sensitivity_demo.

type

One of "confusion_fine", "confusion_archetype", "accuracy_grid", or "recovery_curves".

Value

A ggplot2::ggplot object.
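
The quantity plotted in the confusion-matrix types, P(pred | truth), is a row-normalised cross-tabulation. A base-R sketch on made-up results (the toy rows below are not real sweep output):

```r
res <- data.frame(
  truth     = c("u_shape", "u_shape", "u_shape", "wave", "wave", "wave"),
  predicted = c("u_shape", "u_shape", "wave",    "wave", "wave", "u_shape")
)

# Rows = ground truth, columns = predicted; each row is scaled to sum to 1.
conf <- prop.table(table(truth = res$truth, predicted = res$predicted),
                   margin = 1)
round(conf, 2)
```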

Examples

data("shape_sensitivity_demo", package = "janusplot")
janusplot_shape_sensitivity_plot(shape_sensitivity_demo,
                                 "recovery_curves")

Canonical ground-truth shapes for the sensitivity study

Description

Return the names of every canonical ground-truth shape that janusplot_shape_sensitivity() can simulate from. Fourteen shapes spanning five archetypes (monotone_linear, monotone_curved, unimodal, wave, multimodal). The chaotic and degenerate archetypes are out of scope (no realistic deterministic generator).

Usage

janusplot_shape_sensitivity_shapes()

Value

Character vector of length 14 — the generator names.

Examples

janusplot_shape_sensitivity_shapes()

Summarise a shape-sensitivity sweep

Description

Aggregate the raw output of janusplot_shape_sensitivity() into a per-cell mean-accuracy table at either the fine (24-category) or archetype (7-family) level.

Usage

janusplot_shape_sensitivity_summary(results, level = c("fine", "archetype"))

Arguments

results

Data frame returned by janusplot_shape_sensitivity().

level

One of "fine" (default) or "archetype".

Value

A data frame with columns truth, n, sigma, accuracy.
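
The aggregation amounts to a grouped mean of the correctness flag. A base-R sketch on made-up rows, using the correct column for the fine level (the archetype level would use archetype_correct in the same way):

```r
res <- data.frame(
  truth   = rep(c("linear_up", "wave"), each = 4L),
  n       = 100L,
  sigma   = rep(c(0.05, 0.05, 0.20, 0.20), times = 2L),
  correct = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Mean of the logical flag per (truth, n, sigma) cell = accuracy in [0, 1].
acc <- stats::aggregate(correct ~ truth + n + sigma, data = res, FUN = mean)
names(acc)[names(acc) == "correct"] <- "accuracy"
acc
```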

Examples

data("shape_sensitivity_demo", package = "janusplot")
head(janusplot_shape_sensitivity_summary(shape_sensitivity_demo,
                                         level = "archetype"))

Precomputed shape-recognition sensitivity results (demo)

Description

Raw output from a small-footprint invocation of janusplot_shape_sensitivity(). Shipped so users can explore the sensitivity API and regenerate every figure in the shape-recognition-sensitivity vignette without having to re-run the sweep themselves. Regenerated via data-raw/shape_sensitivity_demo.R.

Design:

Shapes (6, one per non-degenerate archetype): linear_up, concave_up, u_shape, inverted_u, wave, bimodal.
Sample sizes (3): c(100, 200, 500).
Noise levels (4): c(0.05, 0.10, 0.20, 0.40) fraction of y-range.
Replicates: 30.
Total fits: 2160.
Seed: 2026.

Usage

shape_sensitivity_demo

Format

A data frame with 2160 rows and 14 columns — see the "Value" section of janusplot_shape_sensitivity() for the column schema.

Examples

data("shape_sensitivity_demo", package = "janusplot")
head(shape_sensitivity_demo)
janusplot_shape_sensitivity_plot(shape_sensitivity_demo,
                                 "recovery_curves")
