Shape-recognition sensitivity study

janusplot() assigns every fitted smooth to one of 24 shape categories by dispatching on the pair (n_turning_points, n_inflections), with (monotonicity_index, convexity_index) used to disambiguate the monotone cases (see the janusplot vignette for the full definition of the indices). How reliably does this classifier recover the ground-truth shape of a noisy sample? This vignette answers the question with a full-factorial sensitivity sweep.

Design

For each combination of ground-truth shape, sample size n, and noise level sigma, the sweep:

  1. Generates n points from the canonical curve on x ∈ [0, 1] and adds Gaussian noise. The noiseless y is normalised to [0, 1], so sigma expresses the noise as a fraction of the y-range and is comparable across shapes (an SNR-like scale).
  2. Fits mgcv::gam(y ~ s(x), method = "REML").
  3. Classifies the fit via janusplot_shape_metrics().
  4. Records correctness at the fine (24-category) and archetype (7-family) levels.
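
The first two steps can be sketched with mgcv alone. This is a minimal illustration, not the package's internal code: the inverted-U generator, the seed, and the grid size are assumptions chosen for the example, and the turning-point count is a crude stand-in for janusplot_shape_metrics(), whose indices are not reproduced here.

```r
library(mgcv)

set.seed(1)      # illustrative seed, not one used by the sweep
n     <- 200L
sigma <- 0.10

# Hypothetical canonical curve: an inverted U on [0, 1], rescaled to [0, 1]
x <- seq(0, 1, length.out = n)
f <- 4 * x * (1 - x)
f <- (f - min(f)) / (max(f) - min(f))

# Gaussian noise whose standard deviation is a fraction of the unit y-range
dat <- data.frame(x = x, y = f + rnorm(n, sd = sigma))

# Step 2 of the design: penalised spline fit with REML smoothness selection
fit <- mgcv::gam(y ~ s(x), data = dat, method = "REML")

# Evaluate the fitted smooth on a regular grid
grid <- data.frame(x = seq(0, 1, length.out = 201))
yhat <- as.numeric(predict(fit, newdata = grid))

# Crude turning-point count: sign changes in the first differences
d1     <- diff(yhat)
n_turn <- sum(diff(sign(d1[d1 != 0])) != 0)
n_turn
```

Counting sign changes on a grid is sensitive to small wiggles in the fit, which is exactly the failure mode the sweep is designed to quantify.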

The design factors are orthogonal and replicated. See ?janusplot_shape_sensitivity for the function surface. The 14 canonical ground-truth shapes cover five of the seven archetypes (chaotic and degenerate have no realistic deterministic generator).

library(janusplot)
library(ggplot2)

janusplot_shape_sensitivity_shapes()
#>  [1] "linear_up"    "linear_down"  "convex_up"    "concave_up"   "convex_down" 
#>  [6] "concave_down" "s_shape"      "u_shape"      "inverted_u"   "skewed_peak" 
#> [11] "broad_peak"   "wave"         "bimodal"      "bi_wave"

Pre-registered hypotheses

The sweep’s hypotheses are pinned in simulation/PLAN.md (Scenario 4).

Precomputed demo

The package ships a small-footprint precomputed sweep: 6 shapes (one per non-degenerate archetype) × 3 sample sizes × 4 noise levels × 30 replicates = 2160 fits. It lets you explore the API without running the full sweep yourself.
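
The fit count follows from the full-factorial layout. The sketch below rebuilds a design of the same size with expand.grid(). The six shape names are the ones that appear in the demo summary; the sample-size and noise levels are index placeholders, because the actual grid values are not all listed in this vignette.

```r
# Shape names as they appear in the demo summary output
shapes <- c("linear_up", "concave_up", "u_shape",
            "inverted_u", "wave", "bimodal")

# Placeholder indices stand in for the 3 sample sizes and 4 noise levels
design <- expand.grid(
  truth       = shapes,
  n_level     = 1:3,
  sigma_level = 1:4,
  rep         = 1:30,
  stringsAsFactors = FALSE
)

nrow(design)   # total number of fits: 6 * 3 * 4 * 30

# Orthogonal and replicated: every cell holds the same number of replicates
cells <- aggregate(rep ~ truth + n_level + sigma_level,
                   data = design, FUN = length)
nrow(cells)          # number of design cells
unique(cells$rep)    # replicates per cell
```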

data("shape_sensitivity_demo")
str(shape_sensitivity_demo, vec.len = 2)
#> 'data.frame':    2160 obs. of  14 variables:
#>  $ truth             : chr  "linear_up" "concave_up" ...
#>  $ n                 : int  100 100 100 100 100 ...
#>  $ sigma             : num  0.05 0.05 0.05 0.05 0.05 ...
#>  $ seed              : int  2027 2028 2029 2030 2031 ...
#>  $ predicted         : chr  "linear_up" "concave_up" ...
#>  $ correct           : logi  TRUE TRUE TRUE ...
#>  $ archetype_truth   : chr  "monotone_linear" "monotone_curved" ...
#>  $ archetype_pred    : chr  "monotone_linear" "monotone_curved" ...
#>  $ archetype_correct : logi  TRUE TRUE TRUE ...
#>  $ monotonicity_index: num  1 1 ...
#>  $ convexity_index   : num  0 -0.847 ...
#>  $ n_turn            : int  0 0 1 1 1 ...
#>  $ n_inflect         : int  0 0 0 0 2 ...
#>  $ error             : chr  NA NA ...

Recovery curves (headline figure)

janusplot_shape_sensitivity_plot(shape_sensitivity_demo,
                                 "recovery_curves")

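Most shapes are recovered near-perfectly at low noise, though not all: in the archetype-level numerical summary below, wave at n = 100 and sigma = 0.05 has accuracy 0 and linear_up about 0.67. The informative picture is where each shape’s curve falls off as sigma grows. The unimodal and monotone-curved families tolerate more noise than the multimodal ones.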

Archetype confusion

janusplot_shape_sensitivity_plot(shape_sensitivity_demo,
                                 "confusion_archetype")

The off-diagonals reveal the classifier’s failure modes. A unimodal truth misclassified as wave or multimodal means the spline invented extra turning points under noise.
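
The same off-diagonal reading can be done numerically with a cross-tabulation. The sketch below uses a small hand-made data frame, not package output; it borrows only the column names archetype_truth and archetype_pred from the demo data, and the archetype labels are illustrative.

```r
# Toy rows, invented for illustration
toy <- data.frame(
  archetype_truth = c("unimodal", "unimodal", "unimodal", "unimodal",
                      "wave", "wave", "multimodal", "multimodal"),
  archetype_pred  = c("unimodal", "unimodal", "wave", "multimodal",
                      "wave", "wave", "multimodal", "wave"),
  stringsAsFactors = FALSE
)

# Shared factor levels keep the table square even if a label is never predicted
lv <- sort(unique(c(toy$archetype_truth, toy$archetype_pred)))
conf <- table(
  truth     = factor(toy$archetype_truth, levels = lv),
  predicted = factor(toy$archetype_pred,  levels = lv)
)

# Row-normalise so each row reads as P(predicted | truth)
row_share <- prop.table(conf, margin = 1)
round(row_share, 2)

# Overall archetype accuracy is the diagonal mass
sum(diag(conf)) / sum(conf)
```

In this toy table, half of the unimodal rows land off the diagonal, split between wave and multimodal, which is the pattern described above.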

Archetype-level accuracy grid

janusplot_shape_sensitivity_plot(shape_sensitivity_demo,
                                 "accuracy_grid")

Per-shape heatmap of P(archetype correct) across the (n, sigma) design. Reading across a row shows the noise-tolerance profile of one sample size; reading up a column shows the sample-size sensitivity at one noise level.

Numerical summary

head(janusplot_shape_sensitivity_summary(shape_sensitivity_demo,
                                         level = "archetype"), 10)
#>         truth   n sigma  accuracy
#> 1     bimodal 100  0.05 1.0000000
#> 2  concave_up 100  0.05 1.0000000
#> 3  inverted_u 100  0.05 1.0000000
#> 4   linear_up 100  0.05 0.6666667
#> 5     u_shape 100  0.05 1.0000000
#> 6        wave 100  0.05 0.0000000
#> 7     bimodal 200  0.05 1.0000000
#> 8  concave_up 200  0.05 1.0000000
#> 9  inverted_u 200  0.05 1.0000000
#> 10  linear_up 200  0.05 0.7666667
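
A table of this form can also be built by hand, assuming the accuracy column is the per-cell mean of archetype_correct. That assumption is consistent with values such as 0.6666667 = 20/30 above, but it is not taken from the package source. The toy data frame below reuses the demo's column names with invented values.

```r
# Toy results: 2 shapes x 2 sample sizes x 1 noise level x 3 replicates
toy <- data.frame(
  truth = rep(c("linear_up", "wave"), each = 6),
  n     = rep(rep(c(100L, 200L), each = 3), times = 2),
  sigma = 0.05,
  archetype_correct = c(TRUE,  TRUE,  FALSE,   # linear_up, n = 100
                        TRUE,  TRUE,  TRUE,    # linear_up, n = 200
                        FALSE, FALSE, FALSE,   # wave,      n = 100
                        TRUE,  FALSE, FALSE)   # wave,      n = 200
)

# The mean of a logical vector is the proportion of TRUE values
acc <- aggregate(archetype_correct ~ truth + n + sigma,
                 data = toy, FUN = mean)
names(acc)[names(acc) == "archetype_correct"] <- "accuracy"
acc
```

Swapping archetype_correct for correct gives the corresponding fine-level table.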

Running your own sweep

The demo is a starting point. For a publication-grade figure, use the full default grid (14 shapes × 4 sample sizes × 5 noise levels × 200 reps = 56 000 fits):

# Configure parallel execution (optional) — you control the plan.
future::plan(future::multisession, workers = 4L)

res <- janusplot_shape_sensitivity(parallel = TRUE)

# Save for your paper
saveRDS(res, "shape_sensitivity_full.rds")
janusplot_shape_sensitivity_plot(res, "recovery_curves")
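
One way to gauge what 200 replicates per cell buy over the demo's 30 is the Monte Carlo standard error of an estimated accuracy. The helper mc_se() below is a hypothetical name introduced for this illustration; it applies the usual binomial formula sqrt(p (1 - p) / n_rep) and assumes independent replicates.

```r
# Standard error of an accuracy estimated from n_rep independent replicates
mc_se <- function(p, n_rep) sqrt(p * (1 - p) / n_rep)

# Worst case is p = 0.5; compare the demo (30 reps) with the full grid (200 reps)
round(mc_se(0.5, c(30, 200)), 3)
#> [1] 0.091 0.035
```

At 30 replicates, a cell's accuracy is uncertain by roughly ±0.09 in the worst case, so small differences between neighbouring cells of the demo should not be over-read.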

Custom shape subsets + cutoffs

Every argument is tunable. Below, we rerun only the bimodal/wave family under stricter monotonicity thresholds to see whether tightening mono_strong buys any fine-accuracy improvement for these categories.

strict <- janusplot_shape_cutoffs(mono_strong = 0.95, curv_low = 0.1)

res_strict <- janusplot_shape_sensitivity(
  shapes     = c("wave", "bimodal", "bi_wave"),
  n_grid     = c(200L, 500L),
  sigma_grid = c(0.05, 0.10, 0.20),
  n_rep      = 100L,
  cutoffs    = strict
)

janusplot_shape_sensitivity_summary(res_strict, level = "fine")

References

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Tahoe 26.3.1
#> 
#> Matrix products: default
#> BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Australia/Adelaide
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] palmerpenguins_0.1.1 janusplot_0.1.0      ggplot2_4.0.3       
#> 
#> loaded via a namespace (and not attached):
#>  [1] Matrix_1.7-4       gtable_0.3.6       jsonlite_2.0.0     dplyr_1.2.1       
#>  [5] compiler_4.5.2     tidyselect_1.2.1   dichromat_2.0-0.1  jquerylib_0.1.4   
#>  [9] splines_4.5.2      scales_1.4.0       yaml_2.3.12        fastmap_1.2.0     
#> [13] lattice_0.22-7     R6_2.6.1           labeling_0.4.3     patchwork_1.3.2   
#> [17] generics_0.1.4     knitr_1.51         MASS_7.3-65        tibble_3.3.1      
#> [21] bslib_0.10.0       pillar_1.11.1      RColorBrewer_1.1-3 rlang_1.2.0       
#> [25] cachem_1.1.0       xfun_0.57          sass_0.4.10        S7_0.2.2          
#> [29] otel_0.2.0         viridisLite_0.4.3  cli_3.6.6          withr_3.0.2       
#> [33] magrittr_2.0.5     mgcv_1.9-3         digest_0.6.39      grid_4.5.2        
#> [37] lifecycle_1.0.5    nlme_3.1-168       vctrs_0.7.3        evaluate_1.0.5    
#> [41] glue_1.8.1         farver_2.1.2       rmarkdown_2.31     tools_4.5.2       
#> [45] pkgconfig_2.0.3    htmltools_0.5.9
