
Asymmetric Smoothed-Association Matrices

Why Pearson is not enough

A Pearson correlation matrix gives one scalar per pair of variables. Two pieces of information are discarded in that collapse:

  1. The shape of the association — linear, monotone non-linear, U-shaped, or irregular.
  2. The direction — whether \(y\) is a smooth function of \(x\) differs in general from whether \(x\) is a smooth function of \(y\), because leverage and noise are directional.

janusplot() renders both recoveries visually for every pair in a matrix layout, using proper mgcv::gam() fits (not loess) so EDF, F-tests, and random effects are available.

Quick start

library(janusplot)

# Four numeric columns from mtcars (32 rows: small but illustrative)
janusplot(mtcars[, c("mpg", "hp", "wt", "qsec")])

Asymmetric smoothed-association matrix produced by janusplot(); diagonal cells hold variable labels, off-diagonal cells show the fitted mgcv::gam spline with a 95% confidence envelope, raw-data scatter, and per-cell annotations for n, EDF, and smooth significance glyph.

Each off-diagonal cell shows:

  1. The fitted mgcv::gam() spline with a 95% confidence envelope.
  2. The raw-data scatter.
  3. Per-cell annotations for n, EDF, and a smooth-significance glyph.

The cell fill is keyed to EDF: darker = more non-linear.

Non-linear detection

A synthetic quadratic + sinusoidal example. The matrix makes it obvious which variables are genuinely non-linearly related to which.

n <- 300
x1 <- runif(n, -3, 3)
x2 <- x1^2 + rnorm(n, sd = 0.6)   # quadratic on x1
x3 <- sin(x1) + rnorm(n, sd = 0.4) # sinusoidal on x1
x4 <- rnorm(n)                     # independent
d  <- data.frame(x1 = x1, x2 = x2, x3 = x3, x4 = x4)

janusplot(d)

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

EDF for x2 ~ s(x1) and x3 ~ s(x1) should clearly exceed 1; the cell fills reflect that. Cells involving x4 should be close to EDF = 1 (linear / flat).
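
This is easy to check outside the plot. A minimal mgcv sketch that recomputes the EDFs directly (not janusplot internals), under the same data-generating process with an assumed seed:

```r
# Recreate the synthetic variables and inspect the per-smooth EDFs directly
library(mgcv)
set.seed(1)  # assumed seed, for illustration only
n  <- 300
x1 <- runif(n, -3, 3)
x2 <- x1^2 + rnorm(n, sd = 0.6)   # quadratic on x1
x4 <- rnorm(n)                     # independent

edf_quad <- summary(gam(x2 ~ s(x1)))$edf  # well above 1: genuinely non-linear
edf_null <- summary(gam(x4 ~ s(x1)))$edf  # near 1: effectively linear / flat
```

The same numbers drive the cell fills in the matrix above.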

Asymmetry — a heteroscedastic example

When the noise scale depends on a predictor, the two directional smooths diverge: \(y \sim s(x)\) recovers the mean relationship; \(x \sim s(y)\) is distorted by the variance asymmetry.

n <- 400
x <- runif(n, 0, 5)
y <- 0.5 * x + rnorm(n, sd = 0.3 + 0.4 * x)   # variance grows with x
d <- data.frame(x = x, y = y, z = rnorm(n))

janusplot(d)

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

The A = ... label per cell reports the asymmetry index \(A_{ij} = |EDF_{y|x} - EDF_{x|y}| / (EDF_{y|x} + EDF_{x|y}) \in [0, 1]\), shown by default in the bottom-left corner alongside EDF = ....
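
A sketch of that computation, where asymmetry_index() is a hypothetical helper written from the definition above (janusplot computes this internally) and the two EDFs come from plain mgcv fits to the heteroscedastic example:

```r
# Hypothetical helper mirroring the definition A_ij above (not package API)
asymmetry_index <- function(edf_yx, edf_xy) {
  abs(edf_yx - edf_xy) / (edf_yx + edf_xy)
}

library(mgcv)
set.seed(1)  # assumed seed, for illustration only
n <- 400
x <- runif(n, 0, 5)
y <- 0.5 * x + rnorm(n, sd = 0.3 + 0.4 * x)   # variance grows with x

edf_yx <- summary(gam(y ~ s(x)))$edf  # mean of y is linear in x
edf_xy <- summary(gam(x ~ s(y)))$edf  # inverse fit distorted, typically well above 1
asymmetry_index(edf_yx, edf_xy)       # large when the pair is strongly directional
```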

Partial smooths (controlling for covariates)

Pass a one-sided formula via adjust = to include fixed covariates and/or random effects in every pairwise GAM.

library(palmerpenguins)
#> 
#> Attaching package: 'palmerpenguins'
#> The following objects are masked from 'package:datasets':
#> 
#>     penguins, penguins_raw
pp <- na.omit(penguins)

# Without covariate
janusplot(pp[, c("bill_length_mm", "bill_depth_mm",
                 "flipper_length_mm", "body_mass_g")])

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.


# With species as a fixed effect — resolves Simpson's-paradox geometry
janusplot(pp, vars = c("bill_length_mm", "bill_depth_mm",
                       "flipper_length_mm", "body_mass_g"),
          adjust = ~ species)

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

Changing the palette

The cell fill encodes the EDF (or deviance-explained) of the smooth and is accompanied by a shared colourbar legend. Choose a palette with palette =.

d <- data.frame(
  x1 = runif(200, -3, 3),
  x2 = rnorm(200),
  x3 = rnorm(200)
)
d$x2 <- d$x1^2 + rnorm(200, sd = 0.8)  # non-linear

janusplot(d, palette = "viridis")  # default, colourblind-safe

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

janusplot(d, palette = "RdYlBu")   # diverging, colourblind-safe

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

janusplot(d, palette = "turbo")    # high-contrast, NOT colourblind-safe

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

Colourblind-safe choices: viridis (the default) and RdYlBu.

High-contrast but not colourblind-safe: turbo and Spectral.

Handling missing data

# airquality has genuine NAs in Ozone and Solar.R
janusplot(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")],
          na_action = "pairwise")

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

na_action = "pairwise" uses all rows for which that pair is complete; "complete" restricts to rows complete across every variable (matching listwise deletion).
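
The per-cell consequences are easy to preview in base R: complete.cases() gives the n that "complete" and "pairwise" would use for any cell.

```r
# airquality ships with R; Ozone and Solar.R contain genuine NAs
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
aq   <- airquality[, vars]

sum(complete.cases(aq))                        # "complete": one shared n, 111 rows
sum(complete.cases(aq[, c("Ozone", "Wind")]))  # "pairwise" n for this cell: 116 rows
```

Pairwise deletion recovers rows per cell at the cost of each cell being fitted on a slightly different sample.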

Scaling up — order = "hclust"

For a large number of variables k, reorder the axes by hierarchical clustering on |correlation|:

data(Boston, package = "MASS")
janusplot(Boston[, c("medv", "lstat", "rm", "age",
                     "indus", "nox", "dis")],
          order = "hclust")

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

Programmatic access — janusplot_data()

Returns raw GAM fits and per-cell metrics without constructing a ggplot — useful for custom rendering or downstream analysis.

# Re-create the heteroscedastic example
n <- 400
het <- data.frame(
  x = runif(n, 0, 5),
  y = NA_real_
)
het$y <- 0.5 * het$x + rnorm(n, sd = 0.3 + 0.4 * het$x)

out <- janusplot_data(het, vars = c("x", "y"))
out$pairs[[1L]]$edf_yx
#> [1] 1.00021
out$pairs[[1L]]$edf_xy
#> [1] 6.157621
out$pairs[[1L]]$asymmetry_index
#> [1] 0.720527

Shape metrics explained

Every fitted smooth is summarised by two continuous indices and two discrete counts. These drive the 24-category classifier and appear as columns in janusplot(..., with_data = TRUE)$data and as fields on each entry of janusplot_data()$pairs.

Let \(f(x)\) be the fitted smooth on a dense grid of 200 equally-spaced points across the predictor range, with \(f'\) and \(f''\) its numerical first and second derivatives. Let \(w(x)\) be the empirical density of the predictor on the same grid, normalised so that \(\sum w = 1\).

Both indices are density-weighted so they describe the smooth where the data actually live, not extrapolated tails, and are scale-invariant: replacing y with a * y + b leaves them unchanged.

Together the pair (n_turning_points, n_inflections) drives the primary shape_category dispatch; (monotonicity_index, convexity_index) disambiguate within the monotone (0, 0) and single-extremum (1, 0) cells. The full taxonomy with 2-letter codes, archetypes, and thumbnail curves is available from janusplot_shape_hierarchy() and is rendered as the standing legend below every janusplot() call.
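
One way such counts can be computed is by counting sign changes of the numerical derivatives on the evaluation grid. This is a generic sketch, not the package's implementation (which applies the tunable cutoffs from janusplot_shape_cutoffs() rather than raw sign changes):

```r
# Turning points: sign changes of f'; inflections: sign changes of f''
count_sign_changes <- function(v) sum(diff(sign(v)) != 0)

xg <- seq(-2.7, 3.3, length.out = 200)  # asymmetric grid avoids an exact zero
f  <- xg^2                              # parabola: one minimum, no inflection
d1 <- diff(f) / diff(xg)                # numerical first derivative
d2 <- diff(d1) / diff(xg)[-1]           # numerical second derivative

count_sign_changes(d1)  # 1 turning point
count_sign_changes(d2)  # 0 inflections
```

A parabola therefore lands in the single-extremum (1, 0) cell of the dispatch, where the convexity index then disambiguates the shape.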

Tune the thresholds applied to these indices via janusplot_shape_cutoffs(). See the shape-recognition-sensitivity vignette for how faithfully the classifier recovers ground-truth shapes across sample-size and noise regimes.

Derivative views: theoretical justification and applied use

Each matrix renders one quantity. display = "fit" (default) shows the fitted smooth; display = "d1" shows the first derivative \(\hat f'(x)\); display = "d2" shows the second derivative \(\hat f''(x)\). A top-of-matrix title names the mode, so side-by-side calls compare unambiguously. Orders beyond two are not exposed (see Noise amplification below). Derivative CI rendering is off by default; opt in with derivative_ci = "pointwise" or "simultaneous".

set.seed(2026L)
n  <- 300L
xs <- runif(n, -pi, pi)
df <- data.frame(
  x  = xs,
  y1 = xs + sin(3 * xs) + rnorm(n, sd = 0.15),
  y2 = 0.5 * xs^2       + rnorm(n, sd = 0.8)
)
janusplot(df, display = "fit", show_shape_legend = FALSE)

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

janusplot(df, display = "d1", show_shape_legend = FALSE)

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

janusplot(df, display = "d2", show_shape_legend = FALSE)

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

Turning on simultaneous bands applies the Monte Carlo critical multiplier of Simpson (2018) in a single call:

janusplot(df, display = "d1",
          derivative_ci = "simultaneous",
          derivative_ci_nsim = 2000L,
          show_shape_legend = FALSE)

Asymmetric smoothed-association matrix for this example; cell anatomy as in the first figure.

What derivatives reveal that the fit hides

The fitted smooth \(\hat f(x) = \mathbb{E}[y\mid x]\) is a level description. Its derivatives are different statistical objects with their own interpretations: \(\hat f'(x)\) is the local marginal effect (the gain at an operating point), and \(\hat f''(x)\) is the local curvature (where that gain is itself changing).

The asymmetric matrix layout sharpens this. janusplot() fits both \(\hat f_{y\mid x}(x)\) and \(\hat f_{x\mid y}(y)\), so derivative panels on the two triangles answer genuinely different questions: the upper triangle is “how steeply does \(y\) respond to a nudge in \(x\) at this operating point” (forward gain); the lower triangle is “how steeply must \(x\) change to induce a unit change in \(y\)” (inverse sensitivity). For an asymmetric process these do not transpose into each other, and the directional asymmetry is a diagnostic the symmetric correlation matrix cannot expose (Janzing & Schölkopf, 2010).

Estimation — the LP matrix

Let \(X_p = X_p(\mathbf{x}_g)\) denote the design (linear predictor) matrix of the fitted GAM evaluated on the plotting grid \(\mathbf{x}_g\), obtained from predict(gam_fit, newdata = ..., type = "lpmatrix") (Wood, 2017, §7.2.4). With penalised posterior mean \(\hat{\boldsymbol\beta} = \mathtt{coef(gam\_fit)}\) and posterior covariance \(V_p = \mathtt{gam\_fit\$Vp}\), we construct a finite-difference operator \(D^{(k)}\) on the rows of \(X_p\) (central differences in the interior, second-order forward / backward stencils at the endpoints) and read off

\[ \hat f^{(k)}(x_i) = \bigl[D^{(k)} X_p \hat{\boldsymbol\beta}\bigr]_i, \qquad \widehat{\mathrm{Var}}\bigl(\hat f^{(k)}(x_i)\bigr) = \bigl[D^{(k)} X_p V_p X_p^{\!\top} \bigl(D^{(k)}\bigr)^{\!\top}\bigr]_{ii}. \]

Pointwise \(95\%\) intervals are \(\hat f^{(k)}(x) \pm 1.96\,\sqrt{\widehat{\mathrm{Var}}}\). This is the standard Wood (2017) construction, and is what gratia::derivatives() implements in its default mode (Simpson, 2014; Simpson, 2018). Columns of \(X_p\) corresponding to adjust terms held at typical values are constant across the grid rows, so their finite differences vanish and they drop out of both \(\hat f^{(k)}\) and its variance: the derivative in the panel is therefore the derivative of the partial smooth actually shown in the fit panel, as expected.
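
The construction can be reproduced outside the package. A minimal self-contained mgcv sketch on a known smooth (\(y = \sin x\) plus noise), using only the interior central-difference rows (janusplot also handles the endpoint stencils):

```r
library(mgcv)
set.seed(1)
n <- 300
x <- runif(n, -3, 3)
y <- sin(x) + rnorm(n, sd = 0.3)
fit <- gam(y ~ s(x))

xg <- seq(-3, 3, length.out = 200)                  # plotting grid
Xp <- predict(fit, newdata = data.frame(x = xg), type = "lpmatrix")
h  <- xg[2] - xg[1]

m <- length(xg) - 2                                 # interior points only
D <- matrix(0, m, length(xg))                       # central-difference operator
for (i in seq_len(m)) {
  D[i, i]     <- -1 / (2 * h)
  D[i, i + 2] <-  1 / (2 * h)
}

fd <- drop(D %*% Xp %*% coef(fit))                  # hat f'(x) on the interior
se <- sqrt(diag(D %*% Xp %*% fit$Vp %*% t(Xp) %*% t(D)))
lower <- fd - 1.96 * se                             # pointwise 95% band
upper <- fd + 1.96 * se
```

Here fd should track \(\cos x\), the true derivative, across the interior of the grid.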

For simultaneous intervals over the full grid (a stricter question than pointwise, and what you want for formal feature localisation), janusplot() implements the Monte Carlo construction of Ruppert, Wand & Carroll (2003, §6.5), popularised for GAMs by Simpson (2018): draw \(\tilde{\boldsymbol\beta}_b \sim \mathcal{N}(\hat{\boldsymbol\beta}, V_p)\) for \(b = 1, \ldots, B\) and take the \((1-\alpha)\) quantile of \(\max_i \bigl|\bigl[D^{(k)} X_p (\tilde{\boldsymbol\beta}_b - \hat{\boldsymbol\beta})\bigr]_i\bigr| / \mathtt{se}_i\) across the plotting grid as a critical multiplier \(c_\alpha\) on the pointwise SE; the simultaneous band is then \(\hat f^{(k)}(x) \pm c_\alpha\,\mathtt{se}(x)\). Opt in via derivative_ci = "simultaneous" on either janusplot() or janusplot_data(); the default is derivative_ci = "none", because derivative ribbons invite over-reading of local features and should be a deliberate choice rather than a default. The implementation uses \(B = 1000\) (see derivative_ci_nsim); Simpson (2018) uses \(10\,000\), which is affordable if you need tighter quantile estimation.
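
A self-contained sketch of the multiplier for the first derivative, repeating the pointwise setup (interior central differences only; a sketch of the construction, not janusplot internals):

```r
library(mgcv)
set.seed(1)
n <- 300
x <- runif(n, -3, 3)
y <- sin(x) + rnorm(n, sd = 0.3)
fit <- gam(y ~ s(x))

xg <- seq(-3, 3, length.out = 200)
Xp <- predict(fit, newdata = data.frame(x = xg), type = "lpmatrix")
h  <- xg[2] - xg[1]
m  <- length(xg) - 2
D  <- matrix(0, m, length(xg))
for (i in seq_len(m)) {
  D[i, i]     <- -1 / (2 * h)
  D[i, i + 2] <-  1 / (2 * h)
}

L  <- D %*% Xp                           # maps coefficients to f' on the grid
se <- sqrt(diag(L %*% fit$Vp %*% t(L)))  # pointwise SEs

B    <- 1000L
beta <- rmvn(B, coef(fit), fit$Vp)       # draws from N(beta_hat, Vp)
dev  <- abs(sweep(beta, 2, coef(fit)) %*% t(L)) / matrix(se, B, m, byrow = TRUE)
c_alpha <- unname(quantile(apply(dev, 1, max), 0.95))
c_alpha                                  # exceeds the pointwise 1.96
```

The simultaneous band is then fd ± c_alpha * se, strictly wider than the pointwise band.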

Noise amplification and why we cap at \(k = 2\)

Finite differencing of raw data amplifies noise; penalised splines do not eliminate that amplification, they trade it against bias via the REML-selected smoothing parameter. mgcv’s default thin-plate penalty is on \(\int (f'')^2\), which directly regularises \(\hat f''\) (and hence \(\hat f'\)); higher derivatives are bounded only via the basis rank, not penalised (Wood, 2017, §5.3; Eilers & Marx, 1996). In practice we find \(\hat f^{(3)}\) is dominated by noise for \(n < 10^4\) at moderate basis dimension \(k\), and so janusplot refuses \(k \ge 3\) by design. If you have a domain-specific reason to need a higher-order derivative, specify a matching-order P-spline penalty explicitly (Eilers, Marx & Durbán, 2015) and extract it yourself from janusplot_data().
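
To see the amplification concretely, difference pure noise (base R, not janusplot code): the standard deviation of the \(k\)-th difference of unit white noise is \(\sqrt{\binom{2k}{k}}\), so the implied derivative noise scales like \(\sqrt{\binom{2k}{k}}/h^k\).

```r
set.seed(1)
e <- rnorm(1e4)   # pure noise sampled on a grid with spacing h
h <- 0.01

# Implied noise level of the k-th derivative estimate, k = 1, 2, 3
amp <- sapply(1:3, function(k) sd(diff(e, differences = k)) / h^k)
amp   # roughly 1.4e2, 2.4e4, 4.5e6: orders of magnitude per derivative order
```

Smoothing shrinks all three, but the relative explosion with \(k\) persists, which is why the third derivative is unusable at realistic sample sizes.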

Applied use: gain estimation and dose–response

Two strands in which the asymmetric derivative view is not a cosmetic add-on but the analytical primitive the practitioner actually wants: gain scheduling in control engineering, where the local slope \(\hat f'(x)\) is the plant gain to be scheduled at each operating point (Leith & Leithead, 2000; Rugh & Shamma, 2000) or linearised for predictive control (Korda & Mezić, 2018); and dose–response analysis, where the derivative of the response curve is the local treatment effect of interest (Zhang & Chen, 2025).

References cited in this section

Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655

Eilers, P. H. C., Marx, B. D., & Durbán, M. (2015). Twenty years of P-splines. SORT, 39(2), 149–186.

Janzing, D., & Schölkopf, B. (2010). Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10), 5168–5194. https://doi.org/10.1109/TIT.2010.2060095

Korda, M., & Mezić, I. (2018). Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica, 93, 149–160. https://doi.org/10.1016/j.automatica.2018.03.046

Leith, D. J., & Leithead, W. E. (2000). Survey of gain-scheduling analysis and design. International Journal of Control, 73(11), 1001–1025. https://doi.org/10.1080/002071700411304

Rugh, W. J., & Shamma, J. S. (2000). Research on gain scheduling. Automatica, 36(10), 1401–1425. https://doi.org/10.1016/S0005-1098(00)00058-3

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press.

Simpson, G. L. (2014). Simultaneous confidence intervals for derivatives of smooth terms in a GAM. From the Bottom of the Heap (blog post).

Simpson, G. L. (2018). Modelling palaeoecological time series using generalised additive models. Frontiers in Ecology and Evolution, 6, 149. https://doi.org/10.3389/fevo.2018.00149

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781315370279

Zhang, Y., & Chen, Y.-C. (2025). Doubly robust inference on causal derivative effects for continuous treatments. arXiv preprint arXiv:2501.06969.

Limitations

Citation

citation("janusplot")
#> To cite janusplot in publications use:
#> 
#>   Moldovan M (2026). _janusplot: Asymmetric Smoothed-Association
#>   Matrices via GAM Fits_. R package version 0.0.0.9001,
#>   <https://github.com/max578/janusplot>.
#> 
#>   Moldovan M (2026). "Beyond Pearson: Visualising Asymmetric Non-linear
#>   Associations with Generalised Additive Models." _The R Journal_. In
#>   preparation.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.
