
Non-parametric Factorial Analysis with ART

library(pepdiff)
library(dplyr)

What is ART?

ART (Aligned Rank Transform) is a non-parametric method for factorial designs. It lets you analyse multi-factor experiments without assuming your data follows a particular distribution.

The problem it solves: Wilcoxon and other rank tests work great for two groups, but can’t handle interactions. Parametric ANOVA handles interactions but assumes normality. ART gives you both: non-parametric robustness with factorial capability.

How it works:

  1. Align: subtract out all effects except the one you're testing, so residuals reflect only that effect
  2. Rank: convert the aligned values to ranks
  3. ANOVA: run standard ANOVA on the ranks

The alignment step is crucial - it allows the test to isolate each effect (main effects, interactions) correctly.
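The three steps can be sketched by hand for the treatment main effect in a 2×2 design. This is a minimal base-R illustration of the technique, not pepdiff's internal implementation (the ARTool package provides a production version):

```r
# Hand-rolled ART for the treatment main effect in a 2x2 design.
# Toy data: trt roughly 3x higher than ctrl, heavy-tailed (exponential) noise.
set.seed(1)
df <- expand.grid(
  treatment = c("ctrl", "trt"),
  timepoint = c("early", "late"),
  rep = 1:5
)
df$value <- rexp(nrow(df), rate = 1 / ifelse(df$treatment == "trt", 30, 10))

cell_mean  <- ave(df$value, df$treatment, df$timepoint)  # mean of each cell
trt_mean   <- ave(df$value, df$treatment)                # marginal treatment means
grand_mean <- mean(df$value)

# Step 1 - align: keep only the residual plus the effect under test
aligned <- (df$value - cell_mean) + (trt_mean - grand_mean)

# Step 2 - rank the aligned values
r <- rank(aligned)

# Step 3 - ordinary ANOVA on the ranks; read off only the treatment row
summary(aov(r ~ treatment * timepoint, data = df))
```

Because the alignment strips out the timepoint and interaction effects, the timepoint marginal means of `aligned` are exactly zero in a balanced design - which is what lets the treatment row be tested in isolation.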

When to consider ART

Requirements: a complete factorial design (every combination of factor levels observed) and, ideally, balanced replication - see Limitations below.

Example: 2×2 factorial design

A 2×2 factorial design means two factors, each with two levels. Here: treatment (ctrl, trt) × timepoint (early, late). This gives four experimental conditions:

        early          late
ctrl    ctrl @ early   ctrl @ late
trt     trt @ early    trt @ late

Each peptide is measured in all four conditions (with replicates), letting us ask: does treatment have an effect? Does that effect depend on timepoint?

We’ll use 5 replicates per condition and simulate data with heavy tails (extreme values) - exactly the scenario where ART shines.

The simulated peptide groups

To illustrate how ART handles different effect patterns, we simulate 40 peptides in four groups:

Group  Peptides     Effect pattern         What ART should find
A      PEP_001–010  Treatment effect only  Main effect significant, same at both timepoints
B      PEP_011–020  Timepoint effect only  No treatment effect in the treatment comparison
C      PEP_021–030  Interaction            Treatment works at late but not at early
D      PEP_031–040  No effect              Nothing significant

What these patterns look like in raw data

Here’s what the mean abundances look like for each group across the four conditions (using a baseline of ~10 for illustration):

Group  ctrl @ early  ctrl @ late  trt @ early  trt @ late  Pattern
A      10            10           30           30          trt 3× higher at both timepoints
B      10            30           10           30          late 3× higher, same for ctrl and trt
C      10            10           10           40          trt effect only at late (interaction)
D      10            10           10           10          no differences

Notice how Group C is the tricky one: treatment has no effect at early (10 vs 10) but a 4-fold effect at late (10 vs 40). The main effect averages across timepoints, giving roughly a 2.5-fold change - which doesn’t represent either timepoint accurately.
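The dilution is simple arithmetic on the Group C cell means above:

```r
# Group C cell means: treatment effect only at late
ctrl <- c(early = 10, late = 10)
trt  <- c(early = 10, late = 40)

trt / ctrl              # per-timepoint fold change: 1 at early, 4 at late
mean(trt) / mean(ctrl)  # marginal (main-effect) fold change: 25 / 10 = 2.5
```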

We’ll refer back to these groups as we analyse the results.

set.seed(789)

n_peptides <- 40
n_reps <- 5  # 5 reps per condition = 20 observations per peptide

peptides <- paste0("PEP_", sprintf("%03d", 1:n_peptides))
genes <- paste0("GENE_", LETTERS[((1:n_peptides - 1) %% 26) + 1])

# Step 1: Create base abundance for each peptide
peptide_info <- tibble(
  peptide = peptides,
  gene_id = genes,
  pep_num = 1:n_peptides,
  base = rgamma(n_peptides, shape = 6, rate = 0.6)
)

# Step 2: Create the experimental design
design <- expand.grid(
  peptide = peptides,
  treatment = c("ctrl", "trt"),
  timepoint = c("early", "late"),
  bio_rep = 1:n_reps,
  stringsAsFactors = FALSE
)

# Step 3: Join design with peptide info and apply effects
# Using heavier-tailed noise to justify ART
sim_data <- design %>%
  left_join(peptide_info, by = "peptide") %>%
  mutate(
    # Group A (peptides 1-10): Treatment effect only
    # trt is 3-fold higher than ctrl, same at both timepoints
    trt_effect = ifelse(pep_num <= 10 & treatment == "trt", 3, 1),

    # Group B (peptides 11-20): Timepoint effect only
    # late is 3-fold higher than early, same for both treatments
    time_effect = ifelse(pep_num > 10 & pep_num <= 20 & timepoint == "late", 3, 1),

    # Group C (peptides 21-30): Interaction
    # Treatment works at late (4-fold) but NOT at early
    int_effect = case_when(
      pep_num > 20 & pep_num <= 30 & treatment == "trt" & timepoint == "late" ~ 4,
      TRUE ~ 1
    ),

    # Group D (peptides 31-40): No effect
    # All effect multipliers are 1

    # Add heavy-tailed noise (occasional extreme values)
    # This is why we might prefer ART over GLM
    extreme = ifelse(runif(n()) < 0.05, runif(n(), 2, 4), 1),

    # Final abundance with Gamma noise and occasional extremes
    value = rgamma(n(), shape = 10, rate = 10 / (base * trt_effect * time_effect * int_effect)) * extreme
  ) %>%
  select(peptide, gene_id, treatment, timepoint, bio_rep, value)

# Import
temp_file <- tempfile(fileext = ".csv")
write.csv(sim_data, temp_file, row.names = FALSE)

dat <- read_pepdiff(
  temp_file,
  id = "peptide",
  gene = "gene_id",
  value = "value",
  factors = c("treatment", "timepoint"),
  replicate = "bio_rep"
)

dat
#> pepdiff_data object
#> -------------------
#> Peptides: 40
#> Observations: 800
#> Factors: treatment, timepoint
#> 
#> Design:
#>   treatment=ctrl, timepoint=early: 5 reps
#>   treatment=ctrl, timepoint=late: 5 reps
#>   treatment=trt, timepoint=early: 5 reps
#>   treatment=trt, timepoint=late: 5 reps
#> 
#> No missing values

Running ART analysis

results_art <- compare(
  dat,
  compare = "treatment",
  ref = "ctrl",
  method = "art"
)

results_art
#> pepdiff_results object
#> ----------------------
#> Method: art
#> Peptides: 40
#> Comparisons: 1
#> Total tests: 40
#> 
#> Significant (FDR < 0.05): 20 (50.0%)
#> Marked significant: 20

The interface is identical to GLM - just change method = "art".

Did ART find the treatment effects?

Since we simulated this data, we know the truth: Group A has a genuine treatment effect, Groups B and D have none, and Group C has a treatment effect only at the late timepoint.

Let’s see what ART finds:

results_art$results %>%
  mutate(
    pep_num = as.numeric(gsub("PEP_", "", peptide)),
    group = case_when(
      pep_num <= 10 ~ "A: Treatment only",
      pep_num <= 20 ~ "B: Timepoint only",
      pep_num <= 30 ~ "C: Interaction",
      TRUE ~ "D: No effect"
    )
  ) %>%
  group_by(group) %>%
  summarise(
    n_significant = sum(significant),
    n_peptides = n(),
    median_fc = round(median(fold_change, na.rm = TRUE), 2)
  )
#> # A tibble: 4 × 4
#>   group             n_significant n_peptides median_fc
#>   <chr>                     <int>      <int>     <dbl>
#> 1 A: Treatment only            10         10      2.75
#> 2 B: Timepoint only             0         10      0.92
#> 3 C: Interaction               10         10      2.31
#> 4 D: No effect                  0         10      1.1
plot(results_art)

Stratified comparisons with within

Just like GLM, use within to get treatment effects at each timepoint separately:

results_strat <- compare(
  dat,
  compare = "treatment",
  ref = "ctrl",
  within = "timepoint",
  method = "art"
)

results_strat
#> pepdiff_results object
#> ----------------------
#> Method: art
#> Peptides: 40
#> Comparisons: 1
#> Total tests: 80
#> 
#> Significant (FDR < 0.05): 26 (32.5%)
#> Marked significant: 26

Stratified results by group

Let’s see how each group looks when we analyse timepoints separately:

results_strat$results %>%
  mutate(
    pep_num = as.numeric(gsub("PEP_", "", peptide)),
    group = case_when(
      pep_num <= 10 ~ "A: Treatment only",
      pep_num <= 20 ~ "B: Timepoint only",
      pep_num <= 30 ~ "C: Interaction",
      TRUE ~ "D: No effect"
    )
  ) %>%
  group_by(group, timepoint) %>%
  summarise(
    n_significant = sum(significant),
    median_fc = round(median(fold_change, na.rm = TRUE), 2),
    .groups = "drop"
  ) %>%
  tidyr::pivot_wider(
    names_from = timepoint,
    values_from = c(n_significant, median_fc)
  )
#> # A tibble: 4 × 5
#>   group    n_significant_early n_significant_late median_fc_early median_fc_late
#>   <chr>                  <int>              <int>           <dbl>          <dbl>
#> 1 A: Trea…                   7                  8            3.31           2.67
#> 2 B: Time…                   1                  0            0.87           0.96
#> 3 C: Inte…                   0                 10            0.95           3.63
#> 4 D: No e…                   0                  0            1.24           0.97

What stratified analysis reveals: Group A is significant at both timepoints (7/10 early, 8/10 late), Group B yields essentially nothing (one early false positive), Group C is significant only at late (0/10 early, 10/10 late), and Group D stays at zero.

For Group C, stratified analysis recovers the true 4-fold effect at late that was diluted in the main effect.

Handling interactions

Group C illustrates an interaction: the treatment effect depends on timepoint. ART handles this the same way as GLM - use within to get conditional effects at each level.

For more details on interpreting interactions, see vignette("glm_analysis").

ART vs GLM comparison

Let’s compare both methods on the same data:

results_glm <- compare(dat, compare = "treatment", ref = "ctrl", method = "glm")
# Compare significant calls
comparison <- tibble(
  peptide = results_glm$results$peptide,
  glm_sig = results_glm$results$significant,
  art_sig = results_art$results$significant,
  glm_pval = round(results_glm$results$p_value, 4),
  art_pval = round(results_art$results$p_value, 4)
)

# Agreement
cat("Both significant:", sum(comparison$glm_sig & comparison$art_sig), "\n")
#> Both significant: 19
cat("GLM only:", sum(comparison$glm_sig & !comparison$art_sig), "\n")
#> GLM only: 1
cat("ART only:", sum(!comparison$glm_sig & comparison$art_sig), "\n")
#> ART only: 1
cat("Neither:", sum(!comparison$glm_sig & !comparison$art_sig), "\n")
#> Neither: 19

When both methods agree, you can be confident in the result. When they disagree, investigate the peptide:

# Peptides where methods disagree
disagreements <- comparison %>%
  filter(glm_sig != art_sig) %>%
  arrange(pmin(glm_pval, art_pval))

if (nrow(disagreements) > 0) {
  print(disagreements)
} else {
  cat("No disagreements between methods\n")
}
#> # A tibble: 2 × 5
#>   peptide glm_sig art_sig glm_pval art_pval
#>   <chr>   <lgl>   <lgl>      <dbl>    <dbl>
#> 1 PEP_022 FALSE   TRUE      0.0363    0.004
#> 2 PEP_016 TRUE    FALSE     0.0128    0.158

Diagnostics

results_art$diagnostics %>%
  head()
#> # A tibble: 6 × 7
#>   peptide converged error deviance residuals    std_residuals fitted      
#>   <chr>   <lgl>     <chr>    <dbl> <named list> <named list>  <named list>
#> 1 PEP_001 TRUE      <NA>        NA <dbl [20]>   <NULL>        <NULL>      
#> 2 PEP_002 TRUE      <NA>        NA <dbl [20]>   <NULL>        <NULL>      
#> 3 PEP_003 TRUE      <NA>        NA <dbl [20]>   <NULL>        <NULL>      
#> 4 PEP_004 TRUE      <NA>        NA <dbl [20]>   <NULL>        <NULL>      
#> 5 PEP_005 TRUE      <NA>        NA <dbl [20]>   <NULL>        <NULL>      
#> 6 PEP_006 TRUE      <NA>        NA <dbl [20]>   <NULL>        <NULL>

The error column contains diagnostic messages only when models fail - NA means success. Check convergence:

n_converged <- sum(results_art$diagnostics$converged)
n_failed <- sum(!results_art$diagnostics$converged)

cat("Converged:", n_converged, "\n")
#> Converged: 40
cat("Failed:", n_failed, "\n")
#> Failed: 0

The same diagnostic plots as for GLM apply; see vignette("checking_fit").

Exporting results

The results are stored in a tidy tibble:

glimpse(results_art$results)
#> Rows: 40
#> Columns: 11
#> $ peptide     <chr> "PEP_001", "PEP_002", "PEP_003", "PEP_004", "PEP_005", "PE…
#> $ gene_id     <chr> "GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"…
#> $ comparison  <chr> "ctrl - trt", "ctrl - trt", "ctrl - trt", "ctrl - trt", "c…
#> $ fold_change <dbl> 2.2658580, 3.5678933, 3.5203738, 2.6765771, 4.3006451, 2.8…
#> $ log2_fc     <dbl> 1.18005748, 1.83507248, 1.81572864, 1.42038920, 2.10455309…
#> $ estimate    <dbl> -8.000000e+00, -1.000000e+01, -1.000000e+01, -8.600000e+00…
#> $ se          <dbl> 2.073644, 1.410674, 1.424781, 1.898684, 1.330413, 1.424781…
#> $ test        <chr> "art", "art", "art", "art", "art", "art", "art", "art", "a…
#> $ p_value     <dbl> 1.391822e-03, 2.565233e-06, 2.899704e-06, 3.420731e-04, 1.…
#> $ fdr         <dbl> 3.746732e-03, 1.812999e-05, 1.812999e-05, 1.192848e-03, 1.…
#> $ significant <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…

Stratified results include the timepoint column:

glimpse(results_strat$results)
#> Rows: 80
#> Columns: 13
#> $ peptide     <chr> "PEP_001", "PEP_002", "PEP_003", "PEP_004", "PEP_005", "PE…
#> $ gene_id     <chr> "GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"…
#> $ comparison  <chr> "ctrl - trt", "ctrl - trt", "ctrl - trt", "ctrl - trt", "c…
#> $ fold_change <dbl> 2.5351923, 4.3127474, 3.8037220, 3.6106241, 5.5918984, 3.0…
#> $ log2_fc     <dbl> 1.342095192, 2.108607216, 1.927411826, 1.852248235, 2.4833…
#> $ estimate    <dbl> -3.8, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -3.8, -3.0, -5.0…
#> $ se          <dbl> 1.523155, 1.000000, 1.000000, 1.000000, 1.000000, 1.000000…
#> $ test        <chr> "art", "art", "art", "art", "art", "art", "art", "art", "a…
#> $ p_value     <dbl> 0.037241306, 0.001052826, 0.001052826, 0.001052826, 0.0010…
#> $ fdr         <dbl> 0.148965224, 0.006016147, 0.006016147, 0.006016147, 0.0060…
#> $ significant <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, T…
#> $ stratum     <chr> "timepoint=early", "timepoint=early", "timepoint=early", "…
#> $ timepoint   <chr> "early", "early", "early", "early", "early", "early", "ear…

Save to CSV for downstream analysis:

# All results
write.csv(results_art$results, "art_results.csv", row.names = FALSE)

# Significant hits only
results_art$results %>%
  filter(significant) %>%
  write.csv("art_significant.csv", row.names = FALSE)

# Stratified results
write.csv(results_strat$results, "art_stratified.csv", row.names = FALSE)

Limitations of ART

Speed: ART is slower than GLM because it fits multiple aligned models per peptide. For large datasets, this matters.

Interpretation: Rank-based effects are less intuitive than fold changes. “Treatment increases ranks” is harder to communicate than “treatment doubles abundance”.

Balance: ART works best with balanced designs (equal n per cell). Unbalanced designs can give unreliable interaction tests.
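A quick way to confirm a design is balanced before reaching for ART (a generic base-R check, not a pepdiff function):

```r
# Count observations per cell of the design; equal counts = balanced
design <- expand.grid(
  treatment = c("ctrl", "trt"),
  timepoint = c("early", "late"),
  bio_rep = 1:5
)
tab <- table(design$treatment, design$timepoint)
tab                 # 5 observations in every cell
all(tab == tab[1])  # TRUE for a balanced design
```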

Not a magic fix: If your data has fundamental problems (outlier samples, batch effects), ART won’t save you. Fix the problems first.

Practical recommendations

  1. Start with GLM - it’s the default for good reason
  2. Check GLM fit - run plot_fit_diagnostics(results) (see vignette("checking_fit"))
  3. If diagnostics look poor, try ART on the same data
  4. Compare results - where do methods agree and disagree?
  5. For publication, report which method you used and why

Decision tree:

GLM diagnostics OK?
  → Yes: Use GLM
  → No: Try ART
         ↓
       ART diagnostics OK?
         → Yes: Use ART
         → No: You may need more data, or the effect isn't there

When both methods fail

If neither GLM nor ART gives sensible results:

  1. Check data quality: Outlier samples, batch effects, normalisation issues
  2. Consider sample size: May not have power to detect effects
  3. Simplify the model: Too many factors for the data?
  4. Accept the null: Sometimes there’s no effect to find

See peppwR for power analysis to understand what effect sizes your experiment can detect.
