The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction to easyLSEA

easyLSEA authors

2026-06-08

Overview

easyLSEA provides a complete pipeline for Lipid Set Enrichment Analysis (LSEA) in R. Starting from a table of differential lipid abundances, it annotates lipids into biological groups, tests whether those groups are systematically shifted between conditions, and produces publication-ready bubble plots, distribution plots, and fatty acid chain visualizations.

The package runs two complementary enrichment engines:

Running both engines and comparing their results (convergence analysis) gives a more complete picture of lipid remodeling than either method alone.


Installation

Install the stable version from CRAN:

install.packages("easyLSEA")

Or the development version from GitHub:

# install.packages("remotes")
remotes::install_github("DavidGO464/easyLSEA")

To enable the fgsea engine, install the optional Bioconductor dependency:

BiocManager::install("fgsea")

Input data

easyLSEA expects a data.frame with at least:

Column Description
LipidName Lipid identifier in standard shorthand notation (e.g. PC 36:4, TG 54:3)
logFC log2 fold-change (case vs reference)
P.Value Raw (unadjusted) p-value — used for the fgsea pi-value rank metric

Additional columns are used when present:

Column Used for
adj.P.Val Counting significantly changed lipids
Confidence_rank Annotation confidence filtering
Shorthand Alternative lipid name fallback

Column names are configurable via lipid_col, fc_col, and pval_col arguments — only the defaults are shown above.


Quick start

The entire pipeline runs in a single call to easyLSEA():

library(easyLSEA)

result <- easyLSEA(
  data      = my_lipid_data,
  lipid_col = "LipidName",
  fc_col    = "logFC",
  pval_col  = "P.Value",
  case_lbl  = "NASH",
  ref_lbl   = "Control",
  engine    = "both",   # run KS and fgsea
  min_rank  = "E"       # include all confidence ranks except P and NA (default)
)

This returns an easyLSEA_result object with five slots described in the next section.


Understanding the output

result$meta

A named list with run metadata: date, comparison labels, engine used, number of lipids, and the original function call.

result$meta$date
result$meta$case_lbl
result$meta$n_lipids

result$lsea

Contains the enrichment statistics. The key sub-elements are:

# KS results — one row per lipid set per grouping level
head(result$lsea$ks)

# fgsea results
head(result$lsea$fgsea)

# Combined table with Convergence column
head(result$lsea$combined)

Key columns in the KS results:

Column Description
Group Lipid set name (e.g. PC, Glycerolipids)
Level Grouping level (LipidClass, LipidCategory_LMAPS, LipidCategory_functional)
N_group Number of lipids in the set
DirectionalScore Standardized mean difference (Cohen’s d analog). Positive = up in case.
KS_pval Two-sided KS p-value
FDR_LSEA BH-adjusted FDR
DS_perm_pval Permutation p-value for the DirectionalScore
ContributingLipids_KS Lipids on the enriched side of the CDF divergence point

Key columns in the fgsea results:

Column Description
NES Normalized Enrichment Score. Positive = enriched toward top of ranked list (up in case).
FDR_fgsea BH-adjusted FDR from fgsea
N_leading Leading edge size
LeadingEdge Lipids in the leading edge
rank_metric Rank metric used (pi_value, logFC, or t_stat)

The Convergence column in the combined table classifies each set as:

Value Meaning
KS+fgsea [strongest] Significant by both engines — highest confidence
KS only [distributed effect] Moderate shift across many lipids
fgsea only [extreme-driven] A few strongly regulated lipids drive the signal
Neither Not significant by either engine

result$chains

Fatty acid chain analysis results, available when run_chains = TRUE:

# Long format — one row per acyl chain per lipid
head(result$chains$parsed)

# Parsing status — one row per lipid
head(result$chains$summary)

# Wide format — one row per lipid with sn positions and totals
head(result$chains$wide)

The wide table is the most convenient for reporting. Each row is one lipid, with columns sn1, sn2, sn3, sn4 containing the individual acyl chain positions (e.g. "18:1"), and total_carbons / total_unsat with the summed totals. The chain_type column clarifies how to interpret the sn columns:

chain_type sn1 sn2 sn3 sn4
sn2 (PC, PE, PS…) sn-1 chain sn-2 chain NA NA
nacyl (Cer, SM…) sphingoid base N-acyl chain NA NA
long_format (TG) chain 1 chain 2 chain 3 NA
long_format (CL) chain 1 chain 2 chain 3 chain 4
single (CAR, LPC) the chain NA NA NA

result$plots

A named list of ggplot2 objects, available when plots = TRUE:

# List all available plots
names(result$plots$lsea)
names(result$plots$chains)

Plot naming convention for LSEA bubble plots:

Name pattern Description
bubble_ks_01_Class KS bubble plot — lipid class level
bubble_ks_sig_01_Class KS bubble plot — significant sets only
bubble_fgsea_01_Class fgsea bubble plot — lipid class level
bubble_fgsea_sig_01_Class fgsea bubble plot — significant sets only
dist_01_Class Distribution (boxplot) — lipid class level

Levels: 01_Class (lipid class), 02_LMAPS (LIPID MAPS category), 03_Functional (functional category).

result$input

The annotated input data and the grouping columns used:

# Annotated data with LipidClass, LipidCategory_LMAPS, etc.
head(result$input$data)

# Grouping columns tested
result$input$group_cols

Viewing plots

Individual plots can be displayed directly:

# KS bubble plot — all lipid classes
result$plots$lsea$bubble_ks_01_Class

# fgsea bubble plot — significant sets only
result$plots$lsea$bubble_fgsea_sig_01_Class

# Distribution plot — lipid class level
result$plots$lsea$dist_01_Class

To customize bubble labels:

# Regenerate plots showing only FDR and n
plots <- plot_lsea(
  result$lsea,
  case_lbl     = "NASH",
  ref_lbl      = "Control",
  bubble_label = c("FDR", "n")
)

Exporting results

export_lsea() saves all results to a timestamped folder. Supply the output directory explicitly via dir (here a temporary directory; for a real analysis use a folder of your choice):

export_lsea(
  result,
  dir    = tempdir(),
  format = c("csv", "excel", "pdf")
)

This creates:

easyLSEA_NASH_vs_Control_2024-01-15_1430/
  tables/
    lsea_results_ks.csv
    lsea_results_fgsea.csv
    lsea_combined.csv
    chain_results.csv
  plots/
    lsea/
      01_Class/
        bubble_ks_01_Class.pdf
        bubble_ks_sig_01_Class.pdf
        bubble_fgsea_01_Class.pdf
        bubble_fgsea_sig_01_Class.pdf
        dist_01_Class.pdf
      02_LMAPS/  ...
      03_Functional/  ...
    chains/
      tile/  ...
      trend/  ...
  results.xlsx

Advanced usage

Running engines separately

# Step 1: annotate
annotated <- annotate_lipids(my_lipid_data, lipid_col = "LipidName")

# Step 2: run enrichment
lsea_res <- run_lsea(
  data      = annotated,
  fc_col    = "logFC",
  engine    = "both",
  case_lbl  = "NASH",
  ref_lbl   = "Control"
)

# Step 3: generate plots manually
plots <- plot_lsea(
  lsea_res,
  case_lbl     = "NASH",
  ref_lbl      = "Control",
  fdr_thresh   = 0.05,
  bubble_label = c("FDR", "DS", "NES", "n")
)

# Step 4: distribution plot for a specific level
p_dist <- plot_distribution(
  data        = annotated,
  lsea_result = lsea_res,
  group_col   = "LipidClass",
  case_lbl    = "NASH",
  ref_lbl     = "Control"
)

Changing the fgsea rank metric

# Default: pi-value = sign(logFC) * -log10(P.Value)
# Combines effect size and statistical evidence

# Alternative: logFC only
result_fc <- easyLSEA(
  data        = my_lipid_data,
  engine      = "fgsea",
  fgsea_rank  = "logFC"
)

# Alternative: LIMMA t-statistic (requires a 't' column)
result_t <- easyLSEA(
  data        = my_lipid_data,
  engine      = "fgsea",
  fgsea_rank  = "t_stat"
)

Adjusting significance thresholds

result <- easyLSEA(
  data      = my_lipid_data,
  min_n     = 5L,      # require at least 5 lipids per set
  n_perm    = 5000L,   # more permutations for DS_perm_pval
  fgsea_nperm = 20000L # more permutations for fgsea
)

Filtering by annotation confidence rank

When lipid annotations include a confidence rank (A > B > C > D > E > P), min_rank controls which lipids enter the chain analysis:

# Default: include all except P and NA
result_all <- easyLSEA(data = my_lipid_data, min_rank = "E")

# Strict: include only high-confidence annotations (A and B)
result_strict <- easyLSEA(data = my_lipid_data, min_rank = "B")

# Or apply directly to parse_lipid_chains
chains_strict <- parse_lipid_chains(annotated, min_rank = "B")
table(chains_strict$summary$status)

Lipids excluded by min_rank appear in result$chains$summary with status excluded_rank_below_X (where X is the chosen threshold), making it easy to audit which lipids were filtered.


Interpreting results

KS vs fgsea — which to trust?

Both engines test the same null hypothesis (no systematic enrichment) but are sensitive to different signal patterns:

Convergent results (significant by both) provide the strongest evidence for coordinated lipid class remodeling.

The DirectionalScore

The DirectionalScore is a standardized mean difference (Cohen’s d analog):

DS = (mean_logFC_set - mean_logFC_background) / SD_pooled

It quantifies the direction and magnitude of the shift, independent of the KS p-value. A set can have a significant KS p-value (distributional difference) with a near-zero DirectionalScore (no net direction) — this pattern suggests heterogeneous within-set regulation.

The pi-value rank metric

By default, fgsea ranks lipids by:

pi-value = sign(logFC) × −log10(P.Value)

This combines the direction of change with statistical confidence, giving more weight to lipids that are both strongly and significantly regulated. It is preferred over logFC alone when p-values are available.


Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Tahoe 26.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     R6_2.6.1          fastmap_1.2.0     xfun_0.57        
#>  [5] cachem_1.1.0      knitr_1.51        htmltools_0.5.9   rmarkdown_2.31   
#>  [9] lifecycle_1.0.5   cli_3.6.6         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.18.0 tools_4.5.1       evaluate_1.0.5   
#> [17] bslib_0.11.0      yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
#> [21] rlang_1.2.0

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.