Introduction to easyLSEA


easyLSEA authors

2026-06-08

Overview

easyLSEA provides a complete pipeline for Lipid Set Enrichment Analysis (LSEA) in R. Starting from a table of differential lipid abundances, it annotates lipids into biological groups, tests whether those groups are systematically shifted between conditions, and produces publication-ready bubble plots, distribution plots, and fatty acid chain visualizations.

The package runs two complementary enrichment engines:

Kolmogorov–Smirnov (KS) — tests whether the entire logFC distribution of a lipid set differs from the background. Sensitive to distributed moderate effects across many set members.
fgsea — tests whether set members cluster at the extreme ends of the ranked list. Sensitive to a small number of strongly regulated lipids.

Running both engines and comparing their results (convergence analysis) gives a more complete picture of lipid remodeling than either method alone.
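To make the KS engine's idea concrete, here is a minimal sketch using base R's ks.test() on simulated logFC values. It is not the package's internal code: the toy data, the set and background sizes, and the choice to compare the set against a separate background sample are all assumptions made for illustration.

```r
set.seed(1)

# Simulated logFC values (invented): a background of 200 lipids and a
# set of 30 lipids whose members are all shifted moderately upward.
background <- rnorm(200, mean = 0,   sd = 1)
lipid_set  <- rnorm(30,  mean = 0.8, sd = 1)

# Two-sided two-sample KS test: does the set's logFC distribution
# differ from the background distribution?
ks <- stats::ks.test(lipid_set, background)

ks$statistic  # maximum distance between the two empirical CDFs
ks$p.value    # two-sided p-value
```

Because the test compares whole distributions, a moderate shift shared by most set members can be detected even when no single lipid is extreme.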

Installation

Install the stable version from CRAN:

install.packages("easyLSEA")

Or the development version from GitHub:

# install.packages("remotes")
remotes::install_github("DavidGO464/easyLSEA")

To enable the fgsea engine, install the optional Bioconductor dependency:

BiocManager::install("fgsea")

Input data

easyLSEA expects a data.frame with at least:

Column	Description
`LipidName`	Lipid identifier in standard shorthand notation (e.g. `PC 36:4`, `TG 54:3`)
`logFC`	log2 fold-change (case vs reference)
`P.Value`	Raw (unadjusted) p-value — used for the fgsea pi-value rank metric

Additional columns are used when present:

Column	Used for
`adj.P.Val`	Counting significantly changed lipids
`Confidence_rank`	Annotation confidence filtering
`Shorthand`	Alternative lipid name fallback

Column names are configurable via lipid_col, fc_col, and pval_col arguments — only the defaults are shown above.
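As an illustration of the expected shape, the sketch below builds a toy input table with the three default column names and checks that they are present. All values are invented, and the column check is a hypothetical convenience, not a function provided by the package.

```r
# Toy input table with the three default column names (values are invented).
my_lipid_data <- data.frame(
  LipidName = c("PC 36:4", "PC 34:1", "TG 54:3", "TG 52:2", "PE 38:4"),
  logFC     = c(0.85, 0.40, -1.20, -0.65, 0.10),
  P.Value   = c(0.002, 0.04, 0.0005, 0.01, 0.70),
  stringsAsFactors = FALSE
)

# Check that the required columns are present before running the pipeline.
required     <- c("LipidName", "logFC", "P.Value")
missing_cols <- setdiff(required, names(my_lipid_data))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

str(my_lipid_data)
```

A real analysis would need far more lipids than this; the table only shows the layout.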

Quick start

The entire pipeline runs in a single call to easyLSEA():

library(easyLSEA)

result <- easyLSEA(
  data      = my_lipid_data,
  lipid_col = "LipidName",
  fc_col    = "logFC",
  pval_col  = "P.Value",
  case_lbl  = "NASH",
  ref_lbl   = "Control",
  engine    = "both",   # run KS and fgsea
  min_rank  = "E"       # include all confidence ranks except P and NA (default)
)

This returns an easyLSEA_result object with five slots described in the next section.

Understanding the output

`result$meta`

A named list with run metadata: date, comparison labels, engine used, number of lipids, and the original function call.

result$meta$date
result$meta$case_lbl
result$meta$n_lipids

`result$lsea`

Contains the enrichment statistics. The key sub-elements are:

# KS results — one row per lipid set per grouping level
head(result$lsea$ks)

# fgsea results
head(result$lsea$fgsea)

# Combined table with Convergence column
head(result$lsea$combined)

Key columns in the KS results:

Column	Description
`Group`	Lipid set name (e.g. `PC`, `Glycerolipids`)
`Level`	Grouping level (`LipidClass`, `LipidCategory_LMAPS`, `LipidCategory_functional`)
`N_group`	Number of lipids in the set
`DirectionalScore`	Standardized mean difference (Cohen’s d analog). Positive = up in case.
`KS_pval`	Two-sided KS p-value
`FDR_LSEA`	BH-adjusted FDR
`DS_perm_pval`	Permutation p-value for the DirectionalScore
`ContributingLipids_KS`	Lipids on the enriched side of the CDF divergence point

Key columns in the fgsea results:

Column	Description
`NES`	Normalized Enrichment Score. Positive = enriched toward top of ranked list (up in case).
`FDR_fgsea`	BH-adjusted FDR from fgsea
`N_leading`	Leading edge size
`LeadingEdge`	Lipids in the leading edge
`rank_metric`	Rank metric used (`pi_value`, `logFC`, or `t_stat`)

The Convergence column in the combined table classifies each set as:

Value	Meaning
`KS+fgsea [strongest]`	Significant by both engines — highest confidence
`KS only [distributed effect]`	Moderate shift across many lipids
`fgsea only [extreme-driven]`	A few strongly regulated lipids drive the signal
`Neither`	Not significant by either engine
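The classification logic can be sketched in base R as follows. This is an illustration only: the FDR values are invented, the 0.05 cutoff is an assumption, and the package may derive the Convergence column differently.

```r
# Hypothetical per-set FDR values from the two engines (invented numbers).
combined <- data.frame(
  Group     = c("PC", "TG", "Cer", "PE"),
  FDR_LSEA  = c(0.01, 0.03, 0.40, 0.60),
  FDR_fgsea = c(0.02, 0.30, 0.04, 0.70)
)

fdr_thresh <- 0.05  # assumed significance cutoff
ks_sig     <- combined$FDR_LSEA  < fdr_thresh
fgsea_sig  <- combined$FDR_fgsea < fdr_thresh

# Assign one of the four labels from the table above.
combined$Convergence <- ifelse(
  ks_sig & fgsea_sig, "KS+fgsea [strongest]",
  ifelse(ks_sig, "KS only [distributed effect]",
         ifelse(fgsea_sig, "fgsea only [extreme-driven]", "Neither"))
)

combined
```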

`result$chains`

Fatty acid chain analysis results, available when run_chains = TRUE:

# Long format — one row per acyl chain per lipid
head(result$chains$parsed)

# Parsing status — one row per lipid
head(result$chains$summary)

# Wide format — one row per lipid with sn positions and totals
head(result$chains$wide)

The wide table is the most convenient for reporting. Each row is one lipid, with columns sn1, sn2, sn3, sn4 containing the individual acyl chain positions (e.g. "18:1"), and total_carbons / total_unsat with the summed totals. The chain_type column clarifies how to interpret the sn columns:

`chain_type`	`sn1`	`sn2`	`sn3`	`sn4`
`sn2` (PC, PE, PS…)	sn-1 chain	sn-2 chain	NA	NA
`nacyl` (Cer, SM…)	sphingoid base	N-acyl chain	NA	NA
`long_format` (TG)	chain 1	chain 2	chain 3	NA
`long_format` (CL)	chain 1	chain 2	chain 3	chain 4
`single` (CAR, LPC)	the chain	NA	NA	NA
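To show how the sn columns and totals relate, here is a rough sketch of a parser for names with resolved acyl chains such as "PC 16:0_18:1". The helper parse_chains_sketch() is hypothetical and is not the package's parser; it assumes purely numeric carbons:double-bonds chains and would not handle sphingoid bases such as "d18:1".

```r
# Hypothetical helper, not the package's parser: split a name such as
# "PC 16:0_18:1" into acyl chains, then sum carbons and double bonds.
parse_chains_sketch <- function(lipid_name) {
  parts  <- strsplit(lipid_name, " ", fixed = TRUE)[[1]]
  chains <- strsplit(parts[2], "[_/]")[[1]]            # e.g. "16:0" "18:1"
  nums   <- do.call(rbind, lapply(strsplit(chains, ":", fixed = TRUE), as.integer))
  sn     <- c(chains, rep(NA_character_, 4 - length(chains)))  # pad to 4 positions
  data.frame(
    LipidName     = lipid_name,
    sn1 = sn[1], sn2 = sn[2], sn3 = sn[3], sn4 = sn[4],
    total_carbons = sum(nums[, 1]),
    total_unsat   = sum(nums[, 2]),
    stringsAsFactors = FALSE
  )
}

wide <- rbind(
  parse_chains_sketch("PC 16:0_18:1"),
  parse_chains_sketch("TG 16:0_18:1_18:2")
)
wide
```

Unused positions are padded with NA, which mirrors the layout of the table above.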

`result$plots`

A named list of ggplot2 objects, available when plots = TRUE:

# List all available plots
names(result$plots$lsea)
names(result$plots$chains)

Plot naming convention for LSEA bubble plots:

Name pattern	Description
`bubble_ks_01_Class`	KS bubble plot — lipid class level
`bubble_ks_sig_01_Class`	KS bubble plot — significant sets only
`bubble_fgsea_01_Class`	fgsea bubble plot — lipid class level
`bubble_fgsea_sig_01_Class`	fgsea bubble plot — significant sets only
`dist_01_Class`	Distribution (boxplot) — lipid class level

Levels: 01_Class (lipid class), 02_LMAPS (LIPID MAPS category), 03_Functional (functional category).

`result$input`

The annotated input data and the grouping columns used:

# Annotated data with LipidClass, LipidCategory_LMAPS, etc.
head(result$input$data)

# Grouping columns tested
result$input$group_cols

Viewing plots

Individual plots can be displayed directly:

# KS bubble plot — all lipid classes
result$plots$lsea$bubble_ks_01_Class

# fgsea bubble plot — significant sets only
result$plots$lsea$bubble_fgsea_sig_01_Class

# Distribution plot — lipid class level
result$plots$lsea$dist_01_Class

To customize bubble labels:

# Regenerate plots showing only FDR and n
plots <- plot_lsea(
  result$lsea,
  case_lbl     = "NASH",
  ref_lbl      = "Control",
  bubble_label = c("FDR", "n")
)

Exporting results

export_lsea() saves all results to a timestamped folder. Supply the output directory explicitly via dir (here a temporary directory; for a real analysis use a folder of your choice):

export_lsea(
  result,
  dir    = tempdir(),
  format = c("csv", "excel", "pdf")
)

This creates:

easyLSEA_NASH_vs_Control_2024-01-15_1430/
  tables/
    lsea_results_ks.csv
    lsea_results_fgsea.csv
    lsea_combined.csv
    chain_results.csv
  plots/
    lsea/
      01_Class/
        bubble_ks_01_Class.pdf
        bubble_ks_sig_01_Class.pdf
        bubble_fgsea_01_Class.pdf
        bubble_fgsea_sig_01_Class.pdf
        dist_01_Class.pdf
      02_LMAPS/  ...
      03_Functional/  ...
    chains/
      tile/  ...
      trend/  ...
  results.xlsx

Advanced usage

Running engines separately

# Step 1: annotate
annotated <- annotate_lipids(my_lipid_data, lipid_col = "LipidName")

# Step 2: run enrichment
lsea_res <- run_lsea(
  data      = annotated,
  fc_col    = "logFC",
  engine    = "both",
  case_lbl  = "NASH",
  ref_lbl   = "Control"
)

# Step 3: generate plots manually
plots <- plot_lsea(
  lsea_res,
  case_lbl     = "NASH",
  ref_lbl      = "Control",
  fdr_thresh   = 0.05,
  bubble_label = c("FDR", "DS", "NES", "n")
)

# Step 4: distribution plot for a specific level
p_dist <- plot_distribution(
  data        = annotated,
  lsea_result = lsea_res,
  group_col   = "LipidClass",
  case_lbl    = "NASH",
  ref_lbl     = "Control"
)

Changing the fgsea rank metric

# Default: pi-value = sign(logFC) * -log10(P.Value)
# Combines the direction of change with statistical evidence

# Alternative: logFC only
result_fc <- easyLSEA(
  data        = my_lipid_data,
  engine      = "fgsea",
  fgsea_rank  = "logFC"
)

# Alternative: limma moderated t-statistic (requires a 't' column)
result_t <- easyLSEA(
  data        = my_lipid_data,
  engine      = "fgsea",
  fgsea_rank  = "t_stat"
)

Adjusting set-size and permutation settings

result <- easyLSEA(
  data        = my_lipid_data,
  min_n       = 5L,      # require at least 5 lipids per set
  n_perm      = 5000L,   # more permutations for DS_perm_pval
  fgsea_nperm = 20000L   # more permutations for fgsea
)

Filtering by annotation confidence rank

When lipid annotations include a confidence rank (A > B > C > D > E > P), min_rank controls which lipids enter the chain analysis:

# Default: include all except P and NA
result_all <- easyLSEA(data = my_lipid_data, min_rank = "E")

# Strict: include only high-confidence annotations (A and B)
result_strict <- easyLSEA(data = my_lipid_data, min_rank = "B")

# Or apply directly to parse_lipid_chains
chains_strict <- parse_lipid_chains(annotated, min_rank = "B")
table(chains_strict$summary$status)

Lipids excluded by min_rank appear in result$chains$summary with status excluded_rank_below_X (where X is the chosen threshold), making it easy to audit which lipids were filtered.
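The threshold comparison can be pictured with a small base R sketch. The helper passes_min_rank() is hypothetical and only illustrates the ordering A > B > C > D > E > P described above; the package's own filtering may differ in detail.

```r
# Confidence ranks ordered from best (A) to worst (P).
rank_levels <- c("A", "B", "C", "D", "E", "P")

# Hypothetical helper: TRUE when a lipid's rank is at least as good as min_rank.
# Missing or unknown ranks never pass.
passes_min_rank <- function(rank, min_rank = "E") {
  pos <- match(rank, rank_levels)  # NA for missing or unknown ranks
  !is.na(pos) & pos <= match(min_rank, rank_levels)
}

ranks <- c("A", "B", "C", "E", "P", NA)
data.frame(
  rank   = ranks,
  keep_E = passes_min_rank(ranks, "E"),  # default: drops only P and NA
  keep_B = passes_min_rank(ranks, "B")   # strict: keeps only A and B
)
```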

Interpreting results

KS vs fgsea — which to trust?

Both engines test against a null hypothesis of no systematic enrichment, but they are sensitive to different signal patterns:

KS is powerful when many lipids in a class shift moderately in the same direction. A significant KS result with DirectionalScore ≈ 0 suggests a variance or shape difference rather than a net directional shift — interpret with caution.
fgsea is powerful when a small number of lipids drive an extreme signal. A significant fgsea result without KS support suggests the enrichment is carried by a few lipids rather than the class as a whole.

Convergent results (significant by both) provide the strongest evidence for coordinated lipid class remodeling.

The DirectionalScore

The DirectionalScore is a standardized mean difference (Cohen’s d analog):

DS = (mean_logFC_set - mean_logFC_background) / SD_pooled

It quantifies the direction and magnitude of the shift, independent of the KS p-value. A set can have a significant KS p-value (distributional difference) with a near-zero DirectionalScore (no net direction) — this pattern suggests heterogeneous within-set regulation.
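A worked sketch of the score and a permutation p-value, in base R, may help. Several details here are assumptions rather than the package's documented method: the pooled SD uses the usual two-sample formula, the background is taken to be all lipids outside the set, and the permutation scheme shuffles set membership and compares absolute scores.

```r
set.seed(42)

# Toy logFC values (invented): 20 set members, 180 background lipids.
logfc  <- c(rnorm(20, mean = 0.6), rnorm(180, mean = 0))
in_set <- rep(c(TRUE, FALSE), times = c(20, 180))

# Standardized mean difference with a pooled SD (assumed definition).
directional_score <- function(x, member) {
  a <- x[member]
  b <- x[!member]
  sd_pooled <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
                      (length(a) + length(b) - 2))
  (mean(a) - mean(b)) / sd_pooled
}

ds_obs <- directional_score(logfc, in_set)

# Permutation p-value: shuffle set membership and compare |DS| (assumed scheme).
n_perm  <- 1000
ds_null <- replicate(n_perm, directional_score(logfc, sample(in_set)))
p_perm  <- (sum(abs(ds_null) >= abs(ds_obs)) + 1) / (n_perm + 1)

c(DS = ds_obs, perm_p = p_perm)
```

Adding 1 to both numerator and denominator keeps the permutation p-value strictly above zero, a common convention for finite permutation counts.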

The pi-value rank metric

By default, fgsea ranks lipids by:

pi-value = sign(logFC) × −log10(P.Value)

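This combines the direction of change with statistical confidence. Because only the sign of logFC enters the formula, the p-value alone sets how far a lipid sits from the middle of the ranking, so lipids with the strongest statistical evidence land at the extremes of the list. It is preferred over logFC alone when p-values are available.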
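The metric as defined above can be computed in a few lines of base R. The table below is invented toy data, and the final named vector is shown only as the general form of input that preranked enrichment tools take; how the package passes it to fgsea internally is not specified here.

```r
# Toy differential-abundance table (invented values).
df <- data.frame(
  LipidName = c("PC 36:4", "PC 34:1", "TG 54:3", "TG 52:2"),
  logFC     = c(0.85, 0.40, -1.20, -0.65),
  P.Value   = c(0.01, 0.1, 0.001, 0.01)
)

# pi-value as defined above: sign(logFC) * -log10(P.Value)
df$pi_value <- sign(df$logFC) * -log10(df$P.Value)

# Named numeric vector, sorted from most up-regulated to most down-regulated.
ranks <- sort(setNames(df$pi_value, df$LipidName), decreasing = TRUE)
ranks
```

Note that "PC 36:4" and "TG 52:2" have the same p-value and therefore the same absolute rank score, even though their logFC magnitudes differ.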

Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Tahoe 26.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     R6_2.6.1          fastmap_1.2.0     xfun_0.57        
#>  [5] cachem_1.1.0      knitr_1.51        htmltools_0.5.9   rmarkdown_2.31   
#>  [9] lifecycle_1.0.5   cli_3.6.6         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.18.0 tools_4.5.1       evaluate_1.0.5   
#> [17] bslib_0.11.0      yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
#> [21] rlang_1.2.0
