Segment Profile Extraction via Pattern Analysis: A Workflow Guide

Se-Kang Kim

2026-03-23

Note. All code chunks in this vignette are set to eval = FALSE to keep CRAN check times within limits, because the bootstrap and permutation procedures are computationally intensive. All code is fully executable in an interactive R session. Precomputed results for all three pipelines are stored in inst/extdata/ (installed as extdata/) as results_bin.rds, results_ord.rds, and results_cont.rds, and can be loaded with, for example, readRDS(system.file("extdata", "results_bin.rds", package = "SEPA")). Full output and figures are reported in the accompanying manuscript (Kim and Grochowalski, 2019, doi:10.1007/s00357-018-9277-7).


1 Introduction

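The SEPA package implements the Segment Profile Extraction via Pattern Analysis method for row-mean-centered multivariate data. It provides three automated workflow functions: alsi_workflow() for binary data, alsi_workflow_ordinal() for ordinal (Likert-type) data, and calsi_workflow() for continuous data.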

All three pipelines share a common structure:

  1. Dimensionality assessment via parallel analysis
  2. Bootstrap Procrustes stability diagnostics using a simultaneous dual criterion (principal angles and Tucker congruence coefficients)
  3. Variance-weighted aggregation of stable dimensions into a person-level index
library("SEPA")

2 Example 1: Binary Data

This example illustrates the alsi_workflow() pipeline using binary diagnostic data from N = 1,261 individuals assessed for eating disorders.

2.1 Data

data("ANR2", package = "SEPA")
vars <- c("MDD", "DYS", "DEP", "PTSD", "OCD", "GAD", "ANX", "SOPH", "ADHD")
head(ANR2[, vars])

Diagnostic prevalence varies substantially: MDD is the most common diagnosis (44.3%), followed by DEP and ANX, while DYS is the least prevalent (4.7%).
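For a diagnosis coded 0/1, prevalence is simply the column mean. The sketch below uses made-up toy data (toy_dx is hypothetical). With the package data the equivalent call would be round(100 * colMeans(ANR2[, vars], na.rm = TRUE), 1), assuming the diagnoses in ANR2 are stored as numeric 0/1 columns.

```r
# Hypothetical 0/1 toy data standing in for ANR2[, vars]:
# rows are individuals, columns are diagnoses (1 = present, 0 = absent).
toy_dx <- data.frame(
  MDD  = c(1, 1, 0, 1, 0),
  DYS  = c(0, 0, 0, 1, 0),
  ADHD = c(0, 1, 0, 0, 0)
)

# Prevalence (%) of a 0/1 indicator is its column mean times 100;
# na.rm = TRUE drops missing codes from the denominator.
prevalence <- round(100 * colMeans(toy_dx, na.rm = TRUE), 1)

# Most to least prevalent (MDD 60, then DYS and ADHD at 20 each)
sort(prevalence, decreasing = TRUE)
```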

2.2 Full Workflow Call

The following chunk shows the exact call used to generate the precomputed results stored in inst/extdata/results_bin.rds.

results_bin <- alsi_workflow(
  data     = ANR2,
  vars     = vars,
  B_pa     = 2000,
  B_boot   = 2000,
  seed     = 20260123
)

2.3 Load and Inspect Precomputed Results

results_bin <- readRDS(system.file("extdata", "results_bin.rds",
                                    package = "SEPA"))

2.4 Parallel Analysis

print(results_bin$pa)

The first three observed eigenvalues exceed their permutation-based 95th-percentile reference values, supporting retention of a three-dimensional multiple correspondence analysis (MCA) subspace (K_PA = 3). These three dimensions account for approximately 48% of total inertia.
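The retention rule can be illustrated with a generic Horn-style parallel analysis. This is a sketch under stated assumptions: it compares covariance eigenvalues of a numeric matrix against column-permuted copies, whereas the binary pipeline applies the same comparison to MCA eigenvalues, which is not reproduced here. parallel_analysis() is a hypothetical helper, not a SEPA function.

```r
# Permutation-based parallel analysis (Horn-style) for a numeric matrix.
# Generic sketch: uses covariance eigenvalues, not SEPA's MCA eigenvalues.
parallel_analysis <- function(X, B = 200, q = 0.95, seed = 1) {
  set.seed(seed)  # note: resets the global RNG state
  X <- as.matrix(X)
  eig <- function(M) eigen(cov(M), symmetric = TRUE, only.values = TRUE)$values
  observed <- eig(X)
  # Permuting each column independently keeps the marginal distributions
  # but destroys associations between variables (the null model).
  null_eigs <- replicate(B, eig(apply(X, 2, sample)))  # p x B matrix
  reference <- apply(null_eigs, 1, quantile, probs = q)
  # Retain leading dimensions up to the first observed eigenvalue that
  # does not exceed its reference percentile.
  exceeds <- observed > reference
  K <- if (all(exceeds)) length(exceeds) else which(!exceeds)[1] - 1L
  list(observed = observed, reference = unname(reference), K = K)
}

# Example: four variables share one strong common factor, two are noise.
set.seed(123)
f <- rnorm(300)
X <- cbind(sapply(1:4, function(j) f + rnorm(300, sd = 0.5)),
           matrix(rnorm(600), ncol = 2))
pa <- parallel_analysis(X, B = 100)
pa$K
```

Retaining only the leading run of exceedances, rather than counting every exceedance, mirrors the "first three observed eigenvalues" wording above.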

2.5 Bootstrap Stability Diagnostics

print(results_bin$boot)
plot_subspace_stability(results_bin$boot)

Median principal angles are 2.77°, 6.94°, and 15.46° for Dimensions 1–3, all below the 20° threshold, and Tucker congruence coefficients range from 0.978 to 0.992. All three dimensions pass the dual criterion, yielding K* = 3.
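The two stability measures can be sketched in base R. The assumptions are as follows. Each bootstrap loading matrix is first rotated toward the reference solution by orthogonal Procrustes. Principal angles between the two subspaces come from the singular values of the product of their orthonormal bases. Tucker's phi is computed column by column after alignment. Medians over the B_boot resamples would then be taken. How SEPA maps principal angles to individual dimensions is not reproduced here, and the helpers are hypothetical, not SEPA's API.

```r
# Orthogonal Procrustes: rotate bootstrap loadings B (p x K) toward the
# reference loadings A (p x K). With t(B) %*% A = U D V', R = U V'.
procrustes_align <- function(B, A) {
  sv <- svd(crossprod(B, A))
  B %*% sv$u %*% t(sv$v)
}

# Principal angles (degrees) between the column spaces of A and B: the
# singular values of t(Qa) %*% Qb are the cosines of the angles.
principal_angles <- function(A, B) {
  Qa <- qr.Q(qr(A))
  Qb <- qr.Q(qr(B))
  s <- svd(crossprod(Qa, Qb), nu = 0, nv = 0)$d
  s <- pmin(pmax(s, 0), 1)  # guard against rounding just above 1
  acos(s) * 180 / pi        # smallest angle first
}

# Tucker congruence coefficient between matched columns of A and B
tucker_phi <- function(A, B) {
  colSums(A * B) / sqrt(colSums(A^2) * colSums(B^2))
}

# Example: B is an exact orthogonal rotation (with a sign flip) of A, so
# the subspaces coincide and Procrustes alignment recovers A.
set.seed(42)
A <- matrix(rnorm(27), nrow = 9, ncol = 3)
th <- pi / 6
R0 <- matrix(c(cos(th), sin(th), 0,
               -sin(th), cos(th), 0,
               0, 0, -1), nrow = 3)
B <- A %*% R0
principal_angles(A, B)                 # all approximately 0
tucker_phi(A, procrustes_align(B, A))  # all approximately 1
```

Principal angles are rotation-invariant, while Tucker's phi is not, which is why the alignment step precedes the congruence computation.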

2.6 ALSI Computation

print(results_bin$alsi)
summary(results_bin$alsi$alpha)

Variance weights are 0.4345, 0.2979, and 0.2676 for Dimensions 1–3. ALSI values range from 0.040 to 1.625 (M = 0.373, Mdn = 0.368).
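The reported weights sum to 1 (0.4345 + 0.2979 + 0.2676), which is consistent with each weight being its dimension's share of the retained eigenvalues. The sketch below assumes that definition. The aggregation step is harder to pin down from the text, so weighted_profile_index() is purely hypothetical: a weighted Euclidean norm of each person's scores, shown only to make "variance-weighted aggregation" concrete. It is not SEPA's published ALSI formula.

```r
# Variance weights: each retained dimension's share of the retained
# eigenvalues (assumed definition).
variance_weights <- function(lambda, K) {
  lambda <- lambda[seq_len(K)]
  lambda / sum(lambda)
}

# Hypothetical person-level index: weighted Euclidean norm of each row of
# an n x K score matrix. Illustrative only, not SEPA's ALSI formula.
weighted_profile_index <- function(scores, alpha) {
  sqrt(as.vector(scores^2 %*% alpha))
}

alpha  <- variance_weights(c(0.5, 0.3, 0.2, 0.1), K = 3)  # 0.5 0.3 0.2
scores <- rbind(c(1, 0, 0),
                c(0, 0, 0),
                c(2, 2, 2))
weighted_profile_index(scores, alpha)  # approx. 0.707, 0, 2
```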

2.7 Category Projections

plot_category_projections(
  results_bin$fit,
  K         = results_bin$K,
  alpha_vec = results_bin$alsi$alpha_vec,
  top_n     = 10
)

ADHD_1 carries the strongest projection (|p| = 2.07), followed by DYS_1, DEP_1, and PTSD_1.


3 Example 2: Ordinal Data

This example illustrates the alsi_workflow_ordinal() pipeline using the ten Extraversion items (E1–E10) from the Big Five Inventory (BFI; N = 500).

3.1 Data

BFI            <- read.csv(system.file("extdata",
                                        "BFI_Original_Ordinal_N500.csv",
                                        package = "SEPA"))
items          <- paste0("E", 1:10)
reversed_items <- c("E2", "E4", "E6", "E8", "E10")
head(BFI[, items])
freq_table <- sapply(BFI[, items], function(x) table(factor(x, 1:5)))
round(100 * freq_table / nrow(BFI), 1)

Response frequencies are well distributed across the 1–5 scale for all ten items, with no category falling below the 2% rare-category threshold.
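The 2% rare-category check can be automated. A minimal sketch follows. rare_categories() is a hypothetical helper, and toy_items stands in for BFI[, items]. Unlike the chunk above, which divides by nrow(BFI), this version divides by the number of non-missing responses per item; the two differ only when responses are missing.

```r
# Flag Likert response categories whose share of non-missing responses is
# below a threshold (2% in the text).
rare_categories <- function(df, levels = 1:5, threshold = 2) {
  pct <- sapply(df, function(x) {
    100 * table(factor(x, levels = levels)) / sum(!is.na(x))
  })  # categories x items matrix of percentages
  # (category, item) index pairs; zero rows means nothing is rare
  which(pct < threshold, arr.ind = TRUE)
}

toy_items <- data.frame(A = rep(1:5, each = 20),  # 20% per category
                        B = rep(1:4, each = 25))  # category 5 never used
rare_categories(toy_items)  # flags category 5 of item B
```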

3.2 Full Workflow Call

results_ord <- alsi_workflow_ordinal(
  data           = BFI,
  items          = items,
  reversed_items = reversed_items,
  scale_min      = 1L,
  scale_max      = 5L,
  n_permutations = 100,
  B_boot         = 1000,
  seed           = 12345
)
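The reversed_items, scale_min, and scale_max arguments indicate that reverse-keyed items are recoded before analysis. The standard rule on a scale from scale_min to scale_max is x' = (scale_min + scale_max) - x, so on a 1–5 scale 1 and 5 swap, 2 and 4 swap, and 3 stays. That alsi_workflow_ordinal() uses exactly this rule is an assumption. The helpers below are hypothetical and not part of SEPA.

```r
# Reverse-key an item on a scale_min..scale_max scale: x -> (min + max) - x
reverse_score <- function(x, scale_min = 1L, scale_max = 5L) {
  (scale_min + scale_max) - x
}

# Apply the recoding to the named reverse-keyed columns of a data frame
reverse_items <- function(df, reversed_items, scale_min = 1L, scale_max = 5L) {
  df[reversed_items] <- lapply(df[reversed_items], reverse_score,
                               scale_min = scale_min, scale_max = scale_max)
  df
}

toy <- data.frame(E1 = c(1L, 3L, 5L), E2 = c(1L, 3L, 5L))
reverse_items(toy, reversed_items = "E2")  # E2 becomes 5, 3, 1
```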

3.3 Load and Inspect Precomputed Results

results_ord <- readRDS(system.file("extdata", "results_ord.rds",
                                    package = "SEPA"))

3.4 Parallel Analysis

print(results_ord$pa_table)

The first four observed eigenvalues exceed their 95th-percentile reference values, supporting an initial K_PA = 4-dimensional solution.

3.5 Bootstrap Stability Diagnostics

print(results_ord$stability_table)
plot_subspace_stability(results_ord)

Dimensions 1–3 satisfy both stability thresholds simultaneously. Dimension 4 fails the angle criterion (median theta = 24.39° > 20°), yielding K* = 3. All 1,000 bootstrap resamples converged successfully (skipped = 0).
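The step from per-dimension diagnostics to K* can be written as a small rule. This sketch assumes the thresholds used in this vignette (median angle < 20°, Tucker phi ≥ 0.95). It also assumes that retention stops at the first dimension failing either criterion. select_k_star() is hypothetical, and in the example only the 24.39° value comes from the text; the other inputs are made up.

```r
# K*: number of leading dimensions passing both stability criteria.
# Assumption: retention stops at the first failing dimension.
select_k_star <- function(median_angle, phi, max_angle = 20, min_phi = 0.95) {
  pass <- median_angle < max_angle & phi >= min_phi
  if (all(pass)) length(pass) else which(!pass)[1] - 1L
}

# Illustrative inputs: Dimension 4 fails the angle criterion, so K* = 3
select_k_star(median_angle = c(3.1, 8.0, 16.5, 24.39),
              phi          = c(0.99, 0.99, 0.98, 0.97))
```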

3.6 Ordinal ALSI Computation

print(results_ord)
cat("oALSI summary:\n")
print(summary(results_ord$ALSI_index))
cat("\noALSI (z-scored) summary:\n")
print(summary(results_ord$ALSI_z))

Variance weights for K* = 3 are 0.4815, 0.3307, and 0.1878. The ordinal ALSI distribution is slightly positively skewed (the mean exceeds the median and the upper tail is longer), ranging from -0.014 to 0.025 (Mdn = -0.001, M = 0.000).


4 Example 3: Continuous Data

This example illustrates the calsi_workflow() pipeline using N = 900 individuals assessed on p = 9 domain scores from the WAIS-IV and WMS-IV cognitive batteries.

4.1 Data

wawm4   <- read.csv(system.file("extdata", "wawm4.csv", package = "SEPA"))
domains <- c("VC", "PR", "WO", "PS", "IM", "DM", "VWM", "VM", "AM")
X       <- wawm4[, domains]
cat("N =", nrow(X), " p =", ncol(X), "\n")

Domain means ranged from approximately 99 to 101 and standard deviations from approximately 14 to 16, consistent with the standard score metric (normative M = 100, SD = 15). Row-mean-centering is applied internally by calsi_workflow().
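Row-mean-centering subtracts each person's mean across the p domains. This removes overall level (elevation), so two people whose domain scores differ by a constant end up with identical centered profiles. calsi_workflow() does this internally; the base-R sketch below, using a hypothetical row_center() helper, only makes the operation concrete.

```r
# Row-mean-centering: subtract each person's mean across the p domains.
# X - rowMeans(X) works because R recycles the length-n vector down columns.
row_center <- function(X) {
  X <- as.matrix(X)
  X - rowMeans(X)
}

# Persons 1 and 2 share the same profile shape at different levels
scores <- rbind(c(100, 110,  90),
                c( 80,  90,  70),
                c( 85,  85, 100))
row_center(scores)
# row 1: 0 10 -10; row 2: 0 10 -10; row 3: -5 -5 10
```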

4.2 Full Workflow Call

results_cont <- calsi_workflow(
  data       = X,
  B_pa       = 2000,
  B_boot     = 2000,
  q          = 0.95,
  seed       = 20260206,
  K_override = 4
)

4.3 Load and Inspect Precomputed Results

results_cont <- readRDS(system.file("extdata", "results_cont.rds",
                                     package = "SEPA"))

4.4 Parallel Analysis

print(results_cont$pa)

Horn’s parallel analysis supported retention of four dimensions, which together account for 78.28% of total variance in the row-mean-centered solution.
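Because every row of a row-mean-centered matrix sums to zero, the matrix has rank at most p - 1. With p = 9 there are therefore at most eight nonzero eigenvalues, and "total variance" is spread over those eight. The sketch below computes the cumulative share for the first K dimensions from covariance eigenvalues, using simulated standard scores. variance_explained() is hypothetical, and whether SEPA computes the share this way (rather than, for example, from an SVD) is an assumption; both give the same proportions for column-centered data.

```r
# Cumulative share of variance for the first K dimensions of the
# row-mean-centered covariance matrix.
variance_explained <- function(X, K) {
  X  <- as.matrix(X)
  Xc <- X - rowMeans(X)
  lambda <- eigen(cov(Xc), symmetric = TRUE, only.values = TRUE)$values
  lambda <- pmax(lambda, 0)  # last eigenvalue is zero up to rounding
  list(lambda = lambda, cum_prop = cumsum(lambda)[K] / sum(lambda))
}

# Simulated standard scores (M = 100, SD = 15) for 9 domains
set.seed(2026)
sim <- matrix(rnorm(500 * 9, mean = 100, sd = 15), ncol = 9)
ve <- variance_explained(sim, K = 4)
round(100 * ve$cum_prop, 2)  # % of variance in the first four dimensions
```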

4.5 Bootstrap Stability Diagnostics

print(results_cont$stability_table)
plot_subspace_stability(results_cont)

All four dimensions satisfy both stability thresholds (median principal angles 0.13°–10.37°, all < 20°; Tucker congruence 0.987–0.999, all ≥ 0.95), yielding K* = 4.

4.6 Continuous ALSI Computation and Domain Contributions

print(results_cont)
print(results_cont$domain_contrib)

Variance weights for K* = 4 are 0.3833, 0.2481, 0.2222, and 0.1465. cALSI values range from 1.58 to 32.53 (M = 11.81, Mdn = 10.96, SD = 5.09). Processing Speed (PS, 21.5%) contributes most to the retained profile subspace.
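The vignette does not define domain_contrib. One simple definition that yields percentages summing to 100 weights each domain's squared loadings by the variance weights: contrib_j = 100 * sum_k alpha_k * v_jk^2. This works because each unit-length loading column contributes squared loadings summing to 1 and the weights sum to 1. The sketch below is hypothetical throughout. The loadings are random, so the resulting ordering illustrates the arithmetic only; the weights are the ones reported above, renormalized to remove rounding.

```r
# Hypothetical domain contribution: weighted squared loadings (in %).
# V: p x K matrix with unit-length columns; alpha: K weights summing to 1.
domain_contributions <- function(V, alpha) {
  contrib <- 100 * as.vector(V^2 %*% alpha)
  names(contrib) <- rownames(V)
  contrib
}

domains <- c("VC", "PR", "WO", "PS", "IM", "DM", "VWM", "VM", "AM")
set.seed(1)
V <- qr.Q(qr(matrix(rnorm(9 * 4), nrow = 9)))  # random orthonormal 9 x 4
rownames(V) <- domains
alpha <- c(0.3833, 0.2481, 0.2222, 0.1465)
alpha <- alpha / sum(alpha)  # reported weights sum to 1.0001 after rounding
round(sort(domain_contributions(V, alpha), decreasing = TRUE), 1)
```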

4.7 Comparison with SEPA Plane-Wise Summaries

sepa_comparison <- compare_sepa_calsi(
  fit = results_cont$boot$ref,
  K   = 4
)
print(sepa_comparison)

The correlation between cALSI and the SEPA combined index was r = 0.988, indicating that the two indices are nearly linearly equivalent and hence rank individuals in almost the same order.
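Pearson's r measures linear agreement. Spearman's rho, available through cor(method = "spearman"), measures agreement in rank order directly. The simulated sketch below shows both: two near-equivalent indices agree on either measure, while a monotone but nonlinear transform keeps Spearman's rho at exactly 1 and lowers Pearson's r. The indices are made up; which coefficient compare_sepa_calsi() reports is not assumed here beyond the r quoted above.

```r
set.seed(7)
index_a <- rexp(300)                        # skewed, nonnegative index
index_b <- index_a + rnorm(300, sd = 0.05)  # near-equivalent second index
c(pearson  = cor(index_a, index_b),
  spearman = cor(index_a, index_b, method = "spearman"))

# Monotone nonlinear transform: ordering preserved exactly, so
# Spearman's rho is 1 while Pearson's r falls below 1.
index_c <- index_a^3
c(pearson  = cor(index_a, index_c),
  spearman = cor(index_a, index_c, method = "spearman"))
```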


5 Session Information

sessionInfo()
