
Type: Package
Title: Asymmetric Smoothed-Association Matrices via GAM Fits
Version: 0.1.0
Description: Render a pairwise, asymmetric smoothed-association matrix of continuous variables. Each cell shows the fitted spline from an 'mgcv' generalised additive model, with the upper triangle displaying 'gam(x_j ~ s(x_i))' and the lower triangle 'gam(x_i ~ s(x_j))'. Unlike Pearson's correlation matrix, the visualisation is intentionally asymmetric, revealing heteroscedasticity, leverage, and directional non-linearity that a single scalar correlation hides. An asymmetry index and a 24-category shape taxonomy quantify the directional difference and qualitative form of each fitted smooth.
License: GPL (≥ 3)
URL: https://github.com/max578/janusplot, https://max578.github.io/janusplot/
BugReports: https://github.com/max578/janusplot/issues
Encoding: UTF-8
Language: en-AU
Depends: R (≥ 4.3.0)
Imports: mgcv (≥ 1.9.0), ggplot2 (≥ 3.5.0), patchwork (≥ 1.1.0), grid, stats, cli (≥ 3.6.0), lifecycle, rlang (≥ 1.1.0)
Suggests: agridat, future, future.apply, knitr, MASS, palmerpenguins, rmarkdown, testthat (≥ 3.0.0), vdiffr (≥ 1.0.0), withr
VignetteBuilder: knitr
RoxygenNote: 7.3.3
Config/testthat/edition: 3
Config/testthat/parallel: true
Config/Needs/website: pkgdown
LazyData: true
NeedsCompilation: no
Packaged: 2026-04-23 14:06:24 UTC; a1222812
Author: Max Moldovan [aut, cre, cph]
Maintainer: Max Moldovan <max.moldovan@adelaide.edu.au>
Repository: CRAN
Date/Publication: 2026-04-28 18:30:08 UTC

janusplot: Asymmetric Smoothed-Association Matrices via GAM Fits

Description

janusplot renders pairwise, asymmetric smoothed-association matrices of continuous variables. Each cell shows the fitted spline from an mgcv::gam() model, with upper and lower triangles encoding the two directional regressions y ~ s(x) and x ~ s(y) respectively.

Unlike a Pearson correlation matrix (one scalar per pair, symmetric), a smoothed-association matrix gives two curves per pair and is intentionally asymmetric. Heteroscedasticity, leverage, and directional non-linearity become visually evident.

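Main functions

  • janusplot(): render the asymmetric smoothed-association matrix.

  • janusplot_data(): return the raw GAM fits and per-cell metrics without building the plot.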

Asymmetry index

For each pair, the asymmetry index A_ij = |EDF_yx - EDF_xy| / (EDF_yx + EDF_xy) is bounded in [0, 1]. Values near 0 indicate symmetric complexity; values near 1 indicate the two directional fits differ sharply in effective degrees of freedom.
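As a quick arithmetic check of the formula above, the index can be computed by hand from two EDF values. The helper name asymmetry_index and the EDF values below are illustrative assumptions, not part of the package API.

```r
# Hypothetical helper: A = |EDF_yx - EDF_xy| / (EDF_yx + EDF_xy).
asymmetry_index <- function(edf_yx, edf_xy) {
  abs(edf_yx - edf_xy) / (edf_yx + edf_xy)
}

a_equal <- asymmetry_index(2.5, 2.5)  # identical complexity in both directions
a_mixed <- asymmetry_index(6, 2)      # |6 - 2| / (6 + 2)
```

Because both EDF values are positive, the ratio cannot leave [0, 1].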

Under the additive noise model (Hoyer et al. 2009; Peters et al. 2014), the two directional regressions are generally asymmetric when the data-generating process is non-linear, and this asymmetry identifies the causal direction under mild conditions. The asymmetry index is offered here as a visual pre-discovery diagnostic rather than a causal inference procedure; see the package vignette and accompanying paper for full scope and limitations (in particular the failure modes under heteroscedasticity, confounding, and Gaussian-linear DGPs).

Author(s)

Maintainer: Max Moldovan max.moldovan@adelaide.edu.au (ORCID) [copyright holder]

See Also

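Useful links:

  • https://github.com/max578/janusplot

  • https://max578.github.io/janusplot/

  • Report bugs at https://github.com/max578/janusplot/issues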


Asymmetric smoothed-association matrix

Description

[Experimental]

Render a pairwise, asymmetric matrix of smoothed associations between numeric variables. Each off-diagonal cell [i, j] (i != j) shows the fitted spline from mgcv::gam():

  • upper triangle: gam(x_j ~ s(x_i)).

  • lower triangle: gam(x_i ~ s(x_j)).

The two triangles intentionally differ — the asymmetry reveals heteroscedasticity, leverage, and directional non-linearity that a single scalar correlation hides.
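The two directional regressions behind one pair of cells can be reproduced with mgcv directly. This is a minimal sketch on two mtcars columns; reading the EDF from summary() is an assumption about how the per-cell EDF might be obtained, not a statement about the package internals.

```r
library(mgcv)

# Two directional regressions for one pair of variables.
fit_yx <- gam(mpg ~ s(hp), data = mtcars, method = "REML")  # mpg as response
fit_xy <- gam(hp ~ s(mpg), data = mtcars, method = "REML")  # hp as response

# Effective degrees of freedom of the single smooth term in each fit.
edf_yx <- summary(fit_yx)$edf
edf_xy <- summary(fit_xy)$edf
c(edf_yx = edf_yx, edf_xy = edf_xy)
```

The two EDF values generally differ, which is what makes the two triangles of the matrix carry different information.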

Usage

janusplot(
  data,
  vars = NULL,
  adjust = NULL,
  method = "REML",
  k = -1L,
  bs = "tp",
  order = c("original", "hclust", "alphabetical"),
  show_data = TRUE,
  show_ci = TRUE,
  display = c("fit", "d1", "d2"),
  derivative_ci = c("none", "pointwise", "simultaneous"),
  derivative_ci_nsim = 1000L,
  n_grid = NULL,
  colour_by = c("pearson", "spearman", "kendall", "edf", "deviance_gap", "none"),
  fill_by = NULL,
  palette = NULL,
  annotations = c("edf", "A"),
  shape_cutoffs = janusplot_shape_cutoffs(),
  show_shape_legend = TRUE,
  glyph_style = c("ascii", "unicode"),
  labels = c("border", "diagonal", "none"),
  diagonal = c("auto", "blank", "name", "density"),
  label_srt = 45,
  label_cex = 1,
  signif_glyph = TRUE,
  show_asymmetry = NULL,
  na_action = c("pairwise", "complete"),
  parallel = FALSE,
  with_data = FALSE,
  text_scale_diag = 1,
  text_scale_off_diag = 1,
  show_glossary = TRUE,
  glossary_scale = 1,
  ...
)

Arguments

data

A data frame with numeric columns to include.

vars

Character vector of column names to use. NULL (default) uses all numeric columns in data. Non-numeric columns trigger an error listing offenders.

adjust

A one-sided formula RHS giving additional covariates and/or random effects to include in every pairwise GAM. For example, adjust = ~ s(age) + s(site, bs = "re") fits gam(y ~ s(x) + s(age) + s(site, bs = "re")) for each pair. Default NULL fits unadjusted pairwise smooths.
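To make the formula composition concrete, the sketch below appends the right-hand side of a one-sided adjust formula to a pairwise formula using base R only. The variable names y, x, age and site are placeholders, and the string-pasting approach is an assumption chosen for illustration; the package may build its formulas differently.

```r
# Hypothetical illustration: append the RHS of `adjust` to a pairwise formula.
adjust <- ~ s(age) + s(site, bs = "re")

# adjust[[2]] is the right-hand side expression of a one-sided formula.
rhs_extra    <- paste(deparse(adjust[[2]]), collapse = " ")
pair_formula <- stats::as.formula(paste("y ~ s(x) +", rhs_extra))
pair_formula
```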

method

Smoothing-parameter estimation method passed to mgcv::gam(). Default "REML" per mgcv recommendation.

k

Integer, or named list mapping variable names to integers. Basis dimension for s(). Default -1L (mgcv's automatic choice).

bs

Basis type for s(). Default "tp" (thin plate).

order

One of "original" (default), "hclust" (reorder by hierarchical clustering of Pearson correlations), or "alphabetical".
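A rough sketch of what an "hclust" ordering involves, using base R only. The dissimilarity 1 - r and the default linkage of stats::hclust() are assumptions made for illustration; the text above only states that the clustering is based on Pearson correlations.

```r
vars <- c("mpg", "hp", "wt", "qsec")
r <- stats::cor(mtcars[, vars], method = "pearson")

# Assumption: dissimilarity 1 - r, so positively correlated variables
# end up adjacent in the reordered matrix.
hc <- stats::hclust(stats::as.dist(1 - r))
ordered_vars <- vars[hc$order]
ordered_vars
```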

show_data

Logical. If TRUE (default), overlay raw data points (low alpha) behind each spline. Only applies when display = "fit"; derivative panels never overlay raw data.

show_ci

Logical. If TRUE (default), overlay the 95% confidence envelope from predict(gam, se.fit = TRUE) on the fit panel (i.e. when display = "fit"). CI rendering on derivative panels is controlled separately by derivative_ci.

display

One of "fit" (default), "d1", or "d2". Selects which single quantity is rendered in every off-diagonal cell of the matrix.

  • "fit" — the fitted smooth \hat f(x); default, behaviour identical to the pre-derivative release.

  • "d1" — the first derivative \hat f'(x) of the fitted smooth. Zero crossings localise turning points of \hat f.

  • "d2" — the second derivative \hat f''(x). Zero crossings localise inflection points of \hat f.

A single matrix shows a single quantity by design: stacked multi-panel cells crowd the matrix at any realistic variable count. To compare fit against derivative, render two or three janusplot() calls side-by-side; each call keeps its own with_data = TRUE summary table tagged with the display column.

Orders k \ge 3 are not exposed — higher-order derivatives of penalised regression splines amplify noise and rarely carry usable signal at realistic sample sizes. See vignette("janusplot") for the theoretical justification and applied use-cases.

derivative_ci

One of "none" (default), "pointwise", or "simultaneous". Controls whether — and how — a 95% confidence ribbon is drawn underneath the derivative curve when display %in% c("d1", "d2"). Ignored when display = "fit".

  • "none": no ribbon. The curve and the zero reference line are all you see. Default, because pointwise ribbons fall short of nominal coverage when read as a joint region and can invite over-reading of local features.

  • "pointwise" — 95% pointwise ribbon from \sqrt{\mathrm{diag}(D V_p D^\top)} (Wood 2017 §7.2.4). Valid marginally; not a simultaneous statement.

  • "simultaneous" — 95% simultaneous band via the Monte Carlo construction of Ruppert, Wand & Carroll (2003) popularised for GAMs by Simpson (2018, Frontiers Ecol. Evol. 6:149): draw B samples \tilde{\boldsymbol\beta} \sim \mathcal{N}(\hat{\boldsymbol\beta}, V_p), compute \max_x |D_i(\tilde{\boldsymbol\beta} - \hat{\boldsymbol\beta})| / \mathrm{se}_i, and use the (1-\alpha) quantile as a critical multiplier on the pointwise SE. Valid for feature localisation ("where is \hat f'(x) significantly non-zero").
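The pointwise and simultaneous constructions described above can be sketched with mgcv alone. This is an illustrative implementation under stated assumptions: the derivative map D is built by a forward finite difference of the linear-predictor matrix, the step eps and the simulated data are arbitrary choices, and none of the object names come from the package.

```r
library(mgcv)

set.seed(1)
d <- data.frame(x = runif(300, -3, 3))
d$y <- sin(d$x) + rnorm(300, sd = 0.3)
fit <- gam(y ~ s(x), data = d, method = "REML")

# Finite-difference map D from coefficients to f'(x) on a grid.
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 200))
eps  <- 1e-5
X0 <- predict(fit, newdata = grid, type = "lpmatrix")
X1 <- predict(fit, newdata = data.frame(x = grid$x + eps), type = "lpmatrix")
D  <- (X1 - X0) / eps

Vp    <- vcov(fit)                       # Bayesian posterior covariance V_p
deriv <- drop(D %*% coef(fit))           # estimated f'(x)
se    <- sqrt(rowSums((D %*% Vp) * D))   # sqrt(diag(D V_p D^T))

# Pointwise 95% band.
pw_lo <- deriv - qnorm(0.975) * se
pw_hi <- deriv + qnorm(0.975) * se

# Simultaneous 95% band: simulate coefficient deviations, take the maximum
# absolute standardised deviation per draw, use its 95% quantile as multiplier.
B     <- 1000
draws <- rmvn(B, rep(0, length(coef(fit))), Vp)   # B x p matrix of deviations
max_z <- apply(abs(D %*% t(draws)) / se, 2, max)
crit  <- unname(quantile(max_z, 0.95))
sim_lo <- deriv - crit * se
sim_hi <- deriv + crit * se
```

The multiplier crit replaces the normal quantile used for the pointwise band, so the simultaneous band is wider wherever the fitted derivative genuinely varies along the grid.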

derivative_ci_nsim

Integer. Number of Monte Carlo samples used when derivative_ci = "simultaneous". Default 1000L — a compromise between coverage accuracy (Simpson 2018 uses 10000) and CPU budget across every pair in a medium-sized matrix. Ignored for any other derivative_ci.

n_grid

Integer or NULL. Number of equally-spaced points used to evaluate each fitted smooth (and its derivatives). Default NULL resolves to 100 when display = "fit" and 200 otherwise, because finite-difference second derivatives visibly degrade below \sim 150 points on moderate-k smooths. Supplying n_grid directly overrides both defaults. Larger grids shift the numerical shape-metric values (M, C, turning / inflection counts) slightly because they are computed on this same grid. Shapes and asymmetry are the primary reading; M, C and the counts are secondary diagnostics and the grid-induced drift is tolerable.

colour_by

One of "pearson" (default), "spearman", "kendall", "edf", "deviance_gap", or "none". Encodes the per-cell fill colour by the chosen scalar. Correlation choices use a diverging palette with limits c(-1, 1) and a shared corr colour-bar title; "edf" and "deviance_gap" use a sequential palette labelled by the metric.

fill_by

Deprecated alias for colour_by. If supplied, it emits a single soft deprecation warning and its value is forwarded to colour_by.

palette

Character. Colour palette for the cell fill scale. Defaults to "RdBu" when colour_by is a correlation and "viridis" otherwise. Sequential choices: "viridis", "magma", "inferno", "plasma", "cividis", "mako", "rocket", "turbo" (not CB-safe), "YlOrRd", "YlGnBu", "Blues", "Greens". Diverging choices: "RdYlBu", "RdBu", "PuOr", "Spectral" (not CB-safe). Passing a sequential palette while colour_by is a correlation silently upgrades to the default diverging palette.

annotations

Character vector, a subset of c("edf", "A", "shape", "code"). Controls which corner annotations appear on each off-diagonal cell:

  • "code" — 2-letter ASCII shape code, top-left corner.

  • "A" and "edf" — asymmetry index and effective degrees of freedom, stacked bottom-left.

  • "shape" — shape glyph (Unicode or ASCII per glyph_style), bottom-right corner.

Default c("edf", "A"). "code" and "shape" occupy distinct corners so both can be requested together. See janusplot_shape_hierarchy() for the full code list.

shape_cutoffs

Named list of classification thresholds used to map the continuous shape indices into discrete shape_category labels; see janusplot_shape_cutoffs().

show_shape_legend

Logical. If TRUE (default), attach a standing shape-types legend plate below the matrix that illustrates every category in the taxonomy as a canonical thumbnail spline. Independent of annotations.

glyph_style

One of "ascii" (default) or "unicode". Controls how cell shape glyphs render when "shape" is included in annotations. Default is "ascii" for maximum portability across typesetting pipelines; switch to "unicode" only when the target font is known to cover the curve glyph set.

labels

One of "border" (default), "diagonal", or "none". Controls where variable names are rendered:

  • "border" — names along the top (rotated per label_srt) and left margins of the matrix; diagonal cells are left blank. Mirrors corrplot's tl.pos = "lt" convention.

  • "diagonal" — names centred on the diagonal cells (the pre-0.1 layout).

  • "none" — labels suppressed entirely; diagonal cells blank.

diagonal

One of "auto" (default), "blank", "name", or "density". Controls what is rendered in the diagonal cells of the matrix.

  • "auto" — preserves the historical behaviour: variable name when labels = "diagonal", blank otherwise.

  • "blank" — empty bordered panel (uniform grid reading).

  • "name" — variable name centred in the cell, bold.

  • "density" — kernel density of the variable filled in translucent grey, with a rug of raw values along the bottom edge. Mirrors the GGally::ggpairs convention; surfaces tail weight, bimodality, and support clipping that the pairwise smooths alone cannot reveal. Variable names should come from the border (labels = "border", the default) when this mode is on.

label_srt

Numeric. Rotation (degrees) of top labels when labels = "border". Default 45; set to 0 for horizontal or 90 for vertical. Ignored when labels != "border".

label_cex

Positive numeric multiplier on the border-label font size. Default 1. Ignored when labels = "none".

signif_glyph

Logical. If TRUE (default), annotate cells with · / * / ** reflecting the smooth's F-test p-value.

show_asymmetry

Deprecated. Use annotations instead ("A" %in% annotations). When supplied, a soft deprecation warning fires and the argument is merged into annotations.

na_action

One of "pairwise" (default; per-cell complete observations) or "complete" (listwise; all cells use the same rows).
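The difference between the two settings comes down to which rows are eligible for a given cell. A small base-R sketch with a toy data frame (the data and object names are made up for illustration):

```r
df <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(1, NA, 3, 4, 5),
  c = c(NA, 2, 3, 4, 5)
)

# "pairwise": rows complete for the two variables of one cell only.
rows_ab_pairwise <- stats::complete.cases(df[, c("a", "b")])

# "complete": rows complete across every variable, shared by all cells.
rows_listwise <- stats::complete.cases(df)

sum(rows_ab_pairwise)  # rows available to the (a, b) cell
sum(rows_listwise)     # rows available to every cell
```

Pairwise deletion keeps more observations per cell, at the cost of cells being fitted on different row subsets.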

parallel

Logical. If TRUE, use future.apply::future_mapply() to fit pairs in parallel. Requires the future.apply package and a user-configured future::plan(). Default FALSE.

with_data

Logical. If TRUE, return a two-element list list(plot, data) where data is a flat per-cell summary (one row per off-diagonal cell) of everything the plot displays. The data element is always a plain data.frame (base R — no data.table dependency). Default FALSE — in which case only the ggplot is returned.

text_scale_diag

Positive numeric multiplier applied to the diagonal variable-name labels. Default 1. Diagonal labels additionally auto-shrink for long variable names (nchar(var) > 10) so they fit the cell regardless of this value.

text_scale_off_diag

Positive numeric multiplier applied to all off-diagonal annotations (n / EDF readouts, significance glyphs, asymmetry-index labels). Default 1. Use < 1 when cells are small and the annotations crowd the fit line; use > 1 for presentation plots.

show_glossary

Logical. If TRUE (default), attach a multi-line caption below the matrix describing the on-plot abbreviations (n, EDF, A, fill encoding, significance glyphs). Only keys actually displayed are listed.

glossary_scale

Positive numeric multiplier on the glossary caption font size. Default 1.

...

Additional arguments passed to mgcv::gam().

Value

If with_data = FALSE (default), a ggplot2::ggplot object (via patchwork::wrap_plots()) carrying a top-of-matrix title that names the displayed quantity ("Direct fit", "First derivative f'", or "Second derivative f''"). If with_data = TRUE, a list with two elements: plot (the ggplot) and data (a tidy table with columns var_x, var_y, position, n_used, edf, pvalue, signif, dev_exp, asymmetry_index, cor_pearson, cor_spearman, cor_kendall, tie_ratio, monotonicity_index, convexity_index, n_turning_points, n_inflections, flat_range_ratio, shape_category, colour_value, display, one row per off-diagonal cell). The display column tags which quantity the call rendered, so separate calls for fit / d1 / d2 yield comparable, stackable tables. Derivative curves themselves (grid of x, fitted \hat f^{(k)}, SE) live on janusplot_data() — see there.

See Also

janusplot_data() for the raw per-cell fits + metrics.

Other smooth-associations: janusplot_data()

Examples

# Minimal runnable example — 3 variables, 6 asymmetric pairwise GAM fits.
janusplot(mtcars[, c("mpg", "hp", "wt")])


# Heteroscedastic DGP: Pearson r is ~ 0.9 but the inverse fit is
# clearly non-linear, yielding asymmetry index > 0.5.
set.seed(2026L)
n  <- 200L
x1 <- stats::runif(n, 0, 10)
x2 <- x1 + stats::rnorm(n, sd = 0.2 * x1)
janusplot(data.frame(x1 = x1, x2 = x2, x3 = stats::rnorm(n)))

# A single matrix renders a single quantity. To compare the fit
# against its derivatives, render three calls and place them
# side-by-side; each call's title makes the quantity explicit.
set.seed(2026L)
xs <- stats::runif(300L, -3, 3)
df <- data.frame(
  x  = xs,
  y1 = sin(xs)  + stats::rnorm(300L, sd = 0.3),
  y2 = xs^2     + stats::rnorm(300L, sd = 0.6)
)
janusplot(df, display = "fit")
janusplot(df, display = "d1")
janusplot(df, display = "d2")

# Simultaneous CI bands on a derivative panel, per Simpson (2018).
janusplot(df, display = "d1", derivative_ci = "simultaneous")


Raw GAM fits and per-cell metrics for a smoothed-association matrix

Description

[Experimental]

Companion to janusplot() returning the raw list of GAM fits plus per-cell metrics (EDF, F-test p-value, deviance explained, asymmetry index, pairwise correlations, shape descriptors) without constructing the ggplot. Useful for custom rendering or downstream analysis.

Usage

janusplot_data(
  data,
  vars = NULL,
  adjust = NULL,
  method = "REML",
  k = -1L,
  bs = "tp",
  na_action = c("pairwise", "complete"),
  parallel = FALSE,
  keep_fits = FALSE,
  derivatives = integer(),
  derivative_ci = c("pointwise", "none", "simultaneous"),
  derivative_ci_nsim = 1000L,
  n_grid = NULL,
  shape_cutoffs = janusplot_shape_cutoffs(),
  ...
)

Arguments

data

A data frame with numeric columns to include.

vars

Character vector of column names to use. NULL (default) uses all numeric columns in data. Non-numeric columns trigger an error listing offenders.

adjust

A one-sided formula RHS giving additional covariates and/or random effects to include in every pairwise GAM. For example, adjust = ~ s(age) + s(site, bs = "re") fits gam(y ~ s(x) + s(age) + s(site, bs = "re")) for each pair. Default NULL fits unadjusted pairwise smooths.

method

Smoothing-parameter estimation method passed to mgcv::gam(). Default "REML" per mgcv recommendation.

k

Integer, or named list mapping variable names to integers. Basis dimension for s(). Default -1L (mgcv's automatic choice).

bs

Basis type for s(). Default "tp" (thin plate).

na_action

One of "pairwise" (default; per-cell complete observations) or "complete" (listwise; all cells use the same rows).

parallel

Logical. If TRUE, use future.apply::future_mapply() to fit pairs in parallel. Requires the future.apply package and a user-configured future::plan(). Default FALSE.

keep_fits

Logical. If TRUE, retain full mgcv::gam() model objects in the return (large memory footprint for k above ~15). Default FALSE — retains summary metrics and prediction grids only.

derivatives

Integer vector of derivative orders to compute on every pair (subset of 1:2). Default integer() — no derivatives. Unlike janusplot(), the data companion can return multiple orders from a single call for programmatic analysis; pass c(1L, 2L) to surface both.

derivative_ci

One of "pointwise" (the default for janusplot_data(), per the Usage signature above), "none", or "simultaneous". Controls whether, and how, the 95% confidence bounds (lo / hi) of each derivative frame requested through derivatives are computed; the choice is recorded in the ci_type column.

  • "none": no interval. This is the default in janusplot(), because pointwise ribbons fall short of nominal coverage when read as a joint region and can invite over-reading of local features.

  • "pointwise" — 95% pointwise ribbon from \sqrt{\mathrm{diag}(D V_p D^\top)} (Wood 2017 §7.2.4). Valid marginally; not a simultaneous statement.

  • "simultaneous" — 95% simultaneous band via the Monte Carlo construction of Ruppert, Wand & Carroll (2003) popularised for GAMs by Simpson (2018, Frontiers Ecol. Evol. 6:149): draw B samples \tilde{\boldsymbol\beta} \sim \mathcal{N}(\hat{\boldsymbol\beta}, V_p), compute \max_x |D_i(\tilde{\boldsymbol\beta} - \hat{\boldsymbol\beta})| / \mathrm{se}_i, and use the (1-\alpha) quantile as a critical multiplier on the pointwise SE. Valid for feature localisation ("where is \hat f'(x) significantly non-zero").

derivative_ci_nsim

Integer. Number of Monte Carlo samples used when derivative_ci = "simultaneous". Default 1000L — a compromise between coverage accuracy (Simpson 2018 uses 10000) and CPU budget across every pair in a medium-sized matrix. Ignored for any other derivative_ci.

n_grid

Integer or NULL. Number of equally-spaced points used to evaluate each fitted smooth (and its derivatives). Default NULL resolves to 100 when display = "fit" and 200 otherwise, because finite-difference second derivatives visibly degrade below \sim 150 points on moderate-k smooths. Supplying n_grid directly overrides both defaults. Larger grids shift the numerical shape-metric values (M, C, turning / inflection counts) slightly because they are computed on this same grid. Shapes and asymmetry are the primary reading; M, C and the counts are secondary diagnostics and the grid-induced drift is tolerable.

shape_cutoffs

Named list of classification thresholds used to map the continuous shape indices (monotonicity_index, convexity_index) into discrete shape_category labels. Defaults from janusplot_shape_cutoffs().

...

Additional arguments passed to mgcv::gam().

Value

A list with components:

vars

Character vector of variables used, in plotted order.

pairs

List of per-pair results. Each element has i, j, var_i, var_j, fit_yx, fit_xy (NULL if keep_fits = FALSE), pred_yx, pred_xy (data frames with x, fit, se, lo, hi), edf_yx, edf_xy, pvalue_yx, pvalue_xy, dev_exp_yx, dev_exp_xy, n_used, asymmetry_index, plus Pearson / Spearman / Kendall correlations (cor_pearson, cor_spearman, cor_kendall), the maximum tie ratio across x and y (tie_ratio), and per-direction shape descriptors (monotonicity_index_yx, convexity_index_yx, monotonicity_index_xy, convexity_index_xy, n_turning_yx, n_inflect_yx, n_turning_xy, n_inflect_xy, shape_yx, shape_xy). When derivatives is non-empty, each pair additionally carries deriv_yx and deriv_xy, each a named list keyed by order ("1", "2") whose entries are data frames with columns x, fit, se, lo, hi, ci_type matching the schema of pred_yx / pred_xy. The ci_type column records whether the lo / hi columns are "pointwise" (default), "simultaneous" (Ruppert–Wand–Carroll / Simpson 2018 critical-multiplier bands), or "none". When derivative_ci = "simultaneous", each derivative frame also carries a "crit_multiplier" attribute giving the MC-derived critical multiplier used. See janusplot_shape_metrics() for the definition of the monotonicity and convexity indices.
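Since pairs is a plain list, its scalar fields can be flattened into a data frame with base R. The sketch below runs on a mock object that only mimics the documented field names; every number in it is made up, and with the package installed out would instead come from janusplot_data().

```r
# Mock object using the documented field names; the values are invented.
out <- list(
  vars  = c("x1", "x2", "x3"),
  pairs = list(
    list(var_i = "x1", var_j = "x2", edf_yx = 1.0, edf_xy = 3.0,
         asymmetry_index = 0.5, cor_pearson = 0.90),
    list(var_i = "x1", var_j = "x3", edf_yx = 1.2, edf_xy = 1.2,
         asymmetry_index = 0.0, cor_pearson = 0.05)
  )
)

# Pick the scalar fields of interest and stack one row per pair.
fields <- c("var_i", "var_j", "edf_yx", "edf_xy",
            "asymmetry_index", "cor_pearson")
flat <- do.call(
  rbind,
  lapply(out$pairs, function(p) as.data.frame(p[fields]))
)
flat
```

Non-scalar entries such as pred_yx or deriv_yx are data frames or lists and would need separate handling.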

call

Match call.

See Also

janusplot() for the ggplot front-end, janusplot_shape_metrics() for the shape-metric primitives.

Other smooth-associations: janusplot()

Examples

# Per-pair fits + metrics on a small mtcars slice
out <- janusplot_data(mtcars[, c("mpg", "hp", "wt")])
out$pairs[[1L]]$asymmetry_index
out$pairs[[1L]]$cor_spearman
out$pairs[[1L]]$shape_yx

Default cutoff thresholds for shape_category classification

Description

[Experimental]

Returns the named list of thresholds used to map the continuous monotonicity (M) and convexity (C) indices (plus inflection counts) into a discrete shape_category. Exposed so callers can override individual thresholds or pass a fully custom list to janusplot() / janusplot_shape_metrics().

Usage

janusplot_shape_cutoffs(...)

Arguments

...

Optional named overrides to merge into the defaults.

Value

A named list with numeric thresholds:

mono_strong

⁠|M|⁠ threshold for a strictly monotone smooth (default 0.9).

mono_mod

⁠|M|⁠ threshold for a curved-but-monotone smooth (default 0.5).

mono_nonmono

⁠|M|⁠ below this is considered non-monotone (default 0.3).

mono_s

⁠|M|⁠ threshold for labelling an S-shape (default 0.5).

curv_low

⁠|C|⁠ below this is considered near-linear curvature (default 0.2).

curv_mod

⁠|C|⁠ threshold for a clearly curved monotone (default 0.5).

curv_strong

⁠|C|⁠ threshold for a U-shape / inverted-U shape (default 0.5).

flat

range(fit) / sd(y) below this is called flat (default 0.05).
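The override-and-merge behaviour can be pictured with utils::modifyList(). The stand-in below copies the default values listed above; the function name merge_cutoffs and the check for unknown names are assumptions added for illustration and may not match what janusplot_shape_cutoffs() does internally.

```r
# Defaults copied from the documented values.
default_cutoffs <- list(
  mono_strong = 0.9, mono_mod = 0.5, mono_nonmono = 0.3, mono_s = 0.5,
  curv_low = 0.2, curv_mod = 0.5, curv_strong = 0.5, flat = 0.05
)

# Hypothetical stand-in for janusplot_shape_cutoffs(...).
merge_cutoffs <- function(...) {
  overrides <- list(...)
  unknown <- setdiff(names(overrides), names(default_cutoffs))
  if (length(unknown) > 0L) {
    stop("Unknown cutoff(s): ", paste(unknown, collapse = ", "))
  }
  utils::modifyList(default_cutoffs, overrides)
}

custom <- merge_cutoffs(curv_mod = 0.6, flat = 0.02)
```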

See Also

Other shape-metrics: janusplot_shape_hierarchy(), janusplot_shape_metrics()

Examples

janusplot_shape_cutoffs()
janusplot_shape_cutoffs(curv_mod = 0.6, flat = 0.02)

Shape-category taxonomy table

Description

[Experimental]

Return the full janusplot shape taxonomy as a data frame with four hierarchy columns plus presentation fields. The taxonomy is the single source of truth consumed by the classifier, the cell renderer, the legend plate, and the janusplot_data() output.

Hierarchy columns (finest → coarsest):

category

24-way fine label (linear_up, skewed_peak, bimodal, …). Computed per cell by janusplot().

code

Unique two-letter ASCII shorthand (safe on any font or typesetting pipeline) — e.g. lu for linear_up.

archetype

Seven-family grouping: monotone_linear, monotone_curved, unimodal, wave, multimodal, chaotic, degenerate.

monotonic

Three-way coarse classification: monotone / non_monotone / degenerate.

linear

Two-way split with a third level for degenerate fits: linear / non_linear / degenerate.

The broader tiers (linear/non-linear, monotone/non-monotone) are textbook calculus; the archetype layer maps cleanly to shape-constrained regression vocabulary (Pya & Wood 2015; Meyer 2008) and to dose-response shape categories (Calabrese 2008; Calabrese & Baldwin 2001). The (T, I) dispatch underlying each fine category, where T and I are the turning-point and inflection counts, is a coarsened Morse-theoretic critical-point classification (Milnor 1963).

Usage

janusplot_shape_hierarchy()

Value

A data frame with 24 rows and columns category, code, archetype, monotonic, linear, glyph, ascii, label, gloss.

References

Calabrese, E. J. (2008). Hormesis: why it is important to toxicology and toxicologists. Environmental Toxicology and Chemistry, 27(7), 1451–1474.

Meyer, M. C. (2008). Inference using shape-restricted regression splines. Annals of Applied Statistics, 2(3), 1013–1033.

Milnor, J. (1963). Morse Theory. Princeton University Press.

Pya, N., & Wood, S. N. (2015). Shape constrained additive models. Statistics and Computing, 25(3), 543–559.

See Also

Other shape-metrics: janusplot_shape_cutoffs(), janusplot_shape_metrics()

Examples

tax <- janusplot_shape_hierarchy()
head(tax[, c("category", "code", "archetype", "monotonic", "linear")])
# Count how many categories live in each archetype
table(tax$archetype)

Shape metrics for a fitted univariate smooth

Description

[Experimental]

Compute the continuous monotonicity and convexity indices, inflection and turning-point counts, and rule-based shape category for a fitted univariate smooth. Works on either a per-pair fit object returned from the janusplot internal machinery or a freshly fitted mgcv::gam() with a single s() term.

Both indices are bounded in [-1, 1] and weighted by the empirical density of the predictor:

Both indices are scale-invariant (replacing y -> a*y + b leaves them unchanged) and density-weighted so they describe the smooth where the data actually live, not extrapolated tails.

Usage

janusplot_shape_metrics(
  fit,
  x_name = NULL,
  newdata = NULL,
  n_grid = 200L,
  cutoffs = janusplot_shape_cutoffs()
)

Arguments

fit

Either a list returned by a janusplot pair-fit helper (must contain pred and raw), or a fitted mgcv::gam() with a single s(x) term.

x_name

Character. Column name of the predictor when fit is a mgcv::gam() object. Ignored for pair-fit lists.

newdata

Optional data frame supplying the raw predictor values used for density weighting when fit is a mgcv::gam() object. If NULL, the model frame is used.

n_grid

Integer. Prediction grid length when fit is a mgcv::gam() object. Default 200L.

cutoffs

Named list of classification thresholds; see janusplot_shape_cutoffs(). Default uses package defaults.

Value

A named list with components:

monotonicity_index

M in [-1, 1]. See Description.

convexity_index

C in [-1, 1]. See Description.

n_turning_points

Integer count of lobe-mass-weighted sign changes of f'. Equals the number of interior extrema.

n_inflections

Integer count of lobe-mass-weighted sign changes of f''.

flat_range_ratio

range(f) / sd(y) — small values indicate a degenerate flat smooth.

shape_category

One of 24 labels from janusplot_shape_hierarchy() dispatched on (n_turning_points, n_inflections) with (monotonicity_index, convexity_index) disambiguation for the monotone case.
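The turning-point and inflection counts are, at heart, sign-change counts of the first and second derivative. The sketch below counts plain sign changes of finite differences on a noiseless curve; it deliberately omits the lobe-mass weighting mentioned above, and count_sign_changes is a hypothetical helper, not a package function.

```r
# Count sign changes in a numeric vector, ignoring exact zeros.
count_sign_changes <- function(v) {
  s <- sign(v)
  s <- s[s != 0]
  sum(diff(s) != 0)
}

x <- seq(0, 2 * pi, length.out = 200)
f <- sin(x)

n_turning <- count_sign_changes(diff(f))                   # sign changes of f'
n_inflect <- count_sign_changes(diff(f, differences = 2))  # sign changes of f''
c(n_turning = n_turning, n_inflect = n_inflect)
```

On a fitted smooth, small wiggles would each add a sign change under this naive count, which is presumably why a weighted count is used in practice.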

See Also

janusplot_shape_cutoffs(), janusplot(), janusplot_data().

Other shape-metrics: janusplot_shape_cutoffs(), janusplot_shape_hierarchy()

Examples

# On a fitted gam
set.seed(2026L)
n  <- 200L
x  <- stats::runif(n, 0, 10)
y  <- log1p(x) + stats::rnorm(n, sd = 0.3)
d  <- data.frame(x = x, y = y)
fit <- mgcv::gam(y ~ s(x), data = d, method = "REML")
janusplot_shape_metrics(fit, x_name = "x", newdata = d)

Shape-recognition sensitivity study

Description

[Experimental]

Run a full-factorial sensitivity sweep for the janusplot 24-category shape classifier. For each combination of ground-truth shape, sample size n, noise level sigma, and replicate, the sweep:

  1. Generates n points from the noiseless canonical curve on [0, 1] and adds Gaussian noise with SD = sigma (expressed as a fraction of the y-range, so signal-to-noise is comparable across shapes).

  2. Fits mgcv::gam(y ~ s(x), method = "REML").

  3. Runs janusplot_shape_metrics() to classify the fitted smooth.

  4. Records correctness at both the fine (24-category) and archetype (7-family) levels.
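Steps 1 and 2 of the sweep can be sketched for a single replicate as follows. The curve u_shape, the uniform sampling of x, and the way the y-range is evaluated are assumptions made for illustration; the package's own generators may differ.

```r
library(mgcv)

set.seed(2026)
n     <- 200
sigma <- 0.1   # noise SD as a fraction of the y-range

# Hypothetical canonical curve on [0, 1].
u_shape <- function(x) (x - 0.5)^2

x       <- runif(n)
signal  <- u_shape(x)
y_range <- diff(range(u_shape(seq(0, 1, length.out = 1001))))
y       <- signal + rnorm(n, sd = sigma * y_range)

# Step 2: fit the smooth exactly as described in the list above.
fit <- gam(y ~ s(x), data = data.frame(x = x, y = y), method = "REML")
```

Step 3 would then pass fit to janusplot_shape_metrics(fit, x_name = "x") and compare the returned shape_category with the ground-truth name.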

The function is the package-native implementation of simulation/scripts/scenario_4_shape_recognition.R. A small precomputed dataset is shipped as shape_sensitivity_demo for downstream examples without requiring users to re-run the sweep.

Usage

janusplot_shape_sensitivity(
  shapes = NULL,
  n_grid = c(50L, 100L, 200L, 500L),
  sigma_grid = c(0.02, 0.05, 0.1, 0.2, 0.4),
  n_rep = 200L,
  cutoffs = janusplot_shape_cutoffs(),
  parallel = FALSE,
  seed = 2026L,
  verbose = interactive()
)

Arguments

shapes

Character vector of ground-truth names from janusplot_shape_sensitivity_shapes(). Default NULL → all 14.

n_grid

Integer vector of sample sizes. Default c(50L, 100L, 200L, 500L).

sigma_grid

Numeric vector of noise levels (fraction of the y-range). Default c(0.02, 0.05, 0.10, 0.20, 0.40).

n_rep

Integer. Replicates per cell. Default 200L.

cutoffs

Named list of classification thresholds; see janusplot_shape_cutoffs().

parallel

Logical. If TRUE and future.apply is installed, dispatch replicates in parallel. The caller is responsible for configuring future::plan() (e.g. future::plan(future::multisession)).

seed

Integer. Base seed — each fit uses seed + row_index so results are reproducible and cell-permutation-invariant.
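A full-factorial design with one seed per row can be laid out with expand.grid(). The row ordering below is an assumption; only the seed + row_index rule is taken from the description above.

```r
# One row per (shape, n, sigma, replicate) combination.
design <- expand.grid(
  rep   = seq_len(5L),
  sigma = c(0.05, 0.20),
  n     = c(100L, 200L),
  truth = c("linear_up", "u_shape", "wave"),
  stringsAsFactors = FALSE
)

# Each fit gets its own seed: base seed plus the row index.
base_seed   <- 2026L
design$seed <- base_seed + seq_len(nrow(design))
head(design)
```

With the documented defaults (14 shapes, 4 sample sizes, 5 noise levels, 200 replicates) the same layout yields 14 x 4 x 5 x 200 = 56,000 fits, which is why the examples use a much smaller grid.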

verbose

Logical. Print progress messages to the console. Default is interactive().

Value

A data frame with one row per fit. Columns:

truth

Ground-truth shape name.

n

Sample size for this fit.

sigma

Noise level for this fit.

seed

RNG seed used.

predicted

Classifier output at the fine (24-category) level.

correct

Logical — does predicted == truth?

archetype_truth

Expected archetype for truth.

archetype_pred

Archetype of predicted.

archetype_correct

Logical — archetype-level correctness.

monotonicity_index

Monotonicity index M (see janusplot_shape_metrics()).

convexity_index

Convexity index C (see janusplot_shape_metrics()).

n_turn, n_inflect

Recovered turning-point and inflection counts.

error

"gam_fit_failed" when mgcv::gam() errored; NA otherwise.

See Also

janusplot_shape_sensitivity_summary(), janusplot_shape_sensitivity_plot(), janusplot_shape_sensitivity_shapes(), shape_sensitivity_demo.

Other shape-sensitivity: janusplot_shape_sensitivity_plot(), janusplot_shape_sensitivity_shapes(), janusplot_shape_sensitivity_summary()

Examples

# Tiny-run smoke test (< 2 seconds): 3 shapes x 2 n x 2 sigma x 5 reps.
res <- janusplot_shape_sensitivity(
  shapes     = c("linear_up", "u_shape", "wave"),
  n_grid     = c(100L, 200L),
  sigma_grid = c(0.05, 0.20),
  n_rep      = 5L,
  verbose    = FALSE
)
head(res)
janusplot_shape_sensitivity_summary(res, level = "archetype")

Visualise a shape-sensitivity sweep

Description

[Experimental]

Produce one of four diagnostic plots from the raw data frame returned by janusplot_shape_sensitivity():

"confusion_fine"

24 x (|shapes|) confusion matrix at the fine category level — rows = ground truth, columns = predicted, cells coloured by P(pred | truth).

"confusion_archetype"

7 x 7 confusion matrix at the archetype level.

"accuracy_grid"

per-shape heatmap of archetype-level accuracy across the ⁠(n, sigma)⁠ design.

"recovery_curves"

accuracy as a function of sigma, one line per sample size, faceted by shape.
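The quantity shown in the confusion plots, P(pred | truth), is a row-normalised contingency table. A minimal sketch on a mock results frame that uses the documented truth and predicted column names (the rows themselves are invented):

```r
# Mock results; real input would come from janusplot_shape_sensitivity().
res <- data.frame(
  truth     = c("linear_up", "linear_up", "u_shape", "u_shape", "u_shape", "wave"),
  predicted = c("linear_up", "u_shape",   "u_shape", "u_shape", "wave",    "wave")
)

counts <- table(truth = res$truth, predicted = res$predicted)

# Normalise within each ground-truth row so every row sums to 1.
p_pred_given_truth <- prop.table(counts, margin = 1)
round(p_pred_given_truth, 2)
```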

Usage

janusplot_shape_sensitivity_plot(
  results,
  type = c("confusion_fine", "confusion_archetype", "accuracy_grid", "recovery_curves")
)

Arguments

results

Data frame from janusplot_shape_sensitivity() or the precomputed shape_sensitivity_demo.

type

One of "confusion_fine", "confusion_archetype", "accuracy_grid", or "recovery_curves".

Value

A ggplot2::ggplot object.

See Also

Other shape-sensitivity: janusplot_shape_sensitivity(), janusplot_shape_sensitivity_shapes(), janusplot_shape_sensitivity_summary()

Examples

data("shape_sensitivity_demo", package = "janusplot")
janusplot_shape_sensitivity_plot(shape_sensitivity_demo,
                                 "recovery_curves")

Canonical ground-truth shapes for the sensitivity study

Description

[Experimental]

Return the names of every canonical ground-truth shape that janusplot_shape_sensitivity() can simulate from. Fourteen shapes spanning five archetypes (monotone_linear, monotone_curved, unimodal, wave, multimodal). The chaotic and degenerate archetypes are out of scope (no realistic deterministic generator).

Usage

janusplot_shape_sensitivity_shapes()

Value

Character vector of length 14 — the generator names.

See Also

janusplot_shape_sensitivity(), janusplot_shape_hierarchy().

Other shape-sensitivity: janusplot_shape_sensitivity(), janusplot_shape_sensitivity_plot(), janusplot_shape_sensitivity_summary()

Examples

janusplot_shape_sensitivity_shapes()

Summarise a shape-sensitivity sweep

Description

[Experimental]

Aggregate the raw output of janusplot_shape_sensitivity() into a per-cell mean-accuracy table at either the fine (24-category) or archetype (7-family) level.

Usage

janusplot_shape_sensitivity_summary(results, level = c("fine", "archetype"))

Arguments

results

Data frame returned by janusplot_shape_sensitivity().

level

One of "fine" (default) or "archetype".

Value

A data frame with columns truth, n, sigma, accuracy.
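The aggregation amounts to a grouped mean of the logical correct column. A base-R sketch on a mock results frame (invented rows, documented column names) that produces the same four output columns:

```r
# Mock results; real input would come from janusplot_shape_sensitivity().
res <- data.frame(
  truth   = rep(c("linear_up", "u_shape"), each = 4),
  n       = rep(c(100L, 100L, 200L, 200L), times = 2),
  sigma   = 0.05,
  correct = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Mean of a logical vector is the proportion of TRUE values.
acc <- stats::aggregate(correct ~ truth + n + sigma, data = res, FUN = mean)
names(acc)[names(acc) == "correct"] <- "accuracy"
acc
```

For the archetype level, the same grouped mean would presumably be taken over the archetype_correct column instead.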

See Also

Other shape-sensitivity: janusplot_shape_sensitivity(), janusplot_shape_sensitivity_plot(), janusplot_shape_sensitivity_shapes()

Examples

data("shape_sensitivity_demo", package = "janusplot")
head(janusplot_shape_sensitivity_summary(shape_sensitivity_demo,
                                         level = "archetype"))

Precomputed shape-recognition sensitivity results (demo)

Description

Raw output from a small-footprint invocation of janusplot_shape_sensitivity(). Shipped so users can explore the sensitivity API and regenerate every figure in the shape-recognition-sensitivity vignette without having to re-run the sweep themselves. Regenerated via data-raw/shape_sensitivity_demo.R.

Design:

Usage

shape_sensitivity_demo

Format

A data frame with 2160 rows and 14 columns — see the "Value" section of janusplot_shape_sensitivity() for the column schema.

See Also

janusplot_shape_sensitivity(), janusplot_shape_sensitivity_plot(), janusplot_shape_sensitivity_summary().

Examples

data("shape_sensitivity_demo", package = "janusplot")
head(shape_sensitivity_demo)
janusplot_shape_sensitivity_plot(shape_sensitivity_demo,
                                 "recovery_curves")
