Converting between single-cell formats with lstar

lstar is a lightweight interchange layer for single-cell and spatial omics. A dataset is a set of axes (labelled sets you index by — cells, genes, pca) and fields (typed data over a tuple of axes — counts, embeddings, graphs, labels), serialized to a portable Zarr store that R, Python, and C++ all read and write. Format conversion is then just write_Y(read_X(obj)) with the L* store as the universal intermediate, and what a target cannot hold is recorded in ds$dropped rather than silently lost.
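
The hub pattern described above can be sketched in a few lines of base R. Everything in this sketch is hypothetical: `read_toy_a()`, `write_toy_b()`, and the shape of the `dropped` element are stand-ins invented for illustration, not lstar's readers or writers and not the actual layout of `ds$dropped`.

```r
# Hypothetical stand-ins, NOT lstar functions: a toy reader that lifts a
# source object into an intermediate list, and a toy writer for a target
# format that can only hold matrices.
read_toy_a <- function(obj) {
  list(fields = obj, dropped = character(0))
}

write_toy_b <- function(ds) {
  can_hold <- vapply(ds$fields, is.matrix, logical(1))
  # Record what the target cannot represent instead of discarding it silently.
  ds$dropped <- c(ds$dropped, names(ds$fields)[!can_hold])
  list(target = ds$fields[can_hold], dropped = ds$dropped)
}

src <- list(
  counts = matrix(1:6, nrow = 3),  # representable in the toy target
  note   = "free text"             # not representable in the toy target
)

# Conversion is the composition write_Y(read_X(obj)).
out <- write_toy_b(read_toy_a(src))
names(out$target)  # fields that crossed
out$dropped        # fields that did not
```

With one shared intermediate, each format needs only one reader and one writer, rather than a dedicated converter for every pair of formats.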

The model in R, end to end

Everything below runs with only the base dependencies (Matrix); neither Seurat nor SingleCellExperiment is needed.

library(lstar)

cells <- paste0("c", 1:6); genes <- paste0("g", 1:4)
m <- as(matrix(as.numeric(1:24), 6, 4, dimnames = list(cells, genes)), "CsparseMatrix")  # cells x genes

ds <- list(
  kind = "sample",
  axes = list(
    cells = list(labels = cells, origin = "observed", role = "observation"),
    genes = list(labels = genes, origin = "observed", role = "feature")),
  fields = list(
    counts = list(role = "measure", span = c("cells", "genes"), state = "raw", values = m),
    cluster = list(role = "label", span = "cells", values = factor(c("a", "a", "b", "b", "a", "b")))))
class(ds) <- "lstar_dataset"

p <- tempfile(fileext = ".lstar.zarr")
lstar_write(ds, p)            # -> a portable Zarr store (also readable from Python and C++)
ds2 <- lstar_read(p)
ds2
#> lstar_dataset (sample): 2 axes, 2 fields
#>   axis  cells      6
#>   axis  genes      4
#>   field counts         measure    [cells x genes]
#>   field cluster        label      [cells]
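
The printed summary reflects a simple invariant of the model: each field's span names existing axes, and the field's dimensions match the lengths of those axes. The sketch below checks that invariant on a plain list shaped like `ds` above. `check_spans()` is a hypothetical helper written for this illustration, not an lstar function, and a dense matrix stands in for the sparse one so that the block needs only base R.

```r
cells <- paste0("c", 1:6); genes <- paste0("g", 1:4)
toy <- list(
  axes = list(
    cells = list(labels = cells),
    genes = list(labels = genes)),
  fields = list(
    counts  = list(span = c("cells", "genes"),
                   values = matrix(as.numeric(1:24), 6, 4,
                                   dimnames = list(cells, genes))),
    cluster = list(span = "cells",
                   values = factor(c("a", "a", "b", "b", "a", "b")))))

# Hypothetical helper: TRUE for each field whose shape matches its span.
check_spans <- function(ds) {
  vapply(ds$fields, function(f) {
    # Every axis named in the span must exist.
    if (!all(f$span %in% names(ds$axes))) return(FALSE)
    expected <- vapply(ds$axes[f$span], function(a) length(a$labels), integer(1))
    # Vectors and factors have no dim(); fall back to their length.
    actual <- if (is.null(dim(f$values))) length(f$values) else dim(f$values)
    identical(unname(expected), as.integer(actual))
  }, logical(1))
}

check_spans(toy)
```

Changing the span of `cluster` to `"genes"` would make its check fail, because a six-element factor cannot lie over a four-label axis.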

A categorical label over cells induces a factor axis whose labels are its categories, so independent per-group results align on one axis.
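
A minimal base R sketch of that idea, using the cluster labels from the example above. It does not call lstar; the names `cluster_axis`, `n_cells`, `mean_rank`, and `per_group` are illustrative, and the two per-cluster statistics are toy quantities chosen only to show the alignment.

```r
# Cluster labels over the six cells from the example above.
cells   <- paste0("c", 1:6)
cluster <- factor(c("a", "a", "b", "b", "a", "b"))

# The induced axis: one label per category of the factor.
cluster_axis <- levels(cluster)

# Two per-group results, each computed independently of the other.
n_cells   <- tapply(cells, cluster, length)           # cells per cluster
mean_rank <- tapply(seq_along(cells), cluster, mean)  # toy per-cluster statistic

# Both are indexed by the same category labels, so they line up directly.
per_group <- data.frame(
  n_cells   = as.integer(n_cells[cluster_axis]),
  mean_rank = as.numeric(mean_rank[cluster_axis]),
  row.names = cluster_axis
)
per_group
```

Because both results are keyed by the category labels rather than by position, they can be computed at different times and still be joined without ambiguity.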

Converting to and from Seurat / SingleCellExperiment

The profiles map the shared-vocabulary core — counts, normalized/scaled expression, PCA (scores and gene loadings), UMAP/t-SNE, clusterings, cell/gene metadata — between formats. (Not evaluated here, to keep the vignette dependency-free.)

so  <- write_seurat(ds)          # L* dataset  -> Seurat object
ds3 <- read_seurat(so)           # Seurat       -> L* dataset
sce <- write_sce(read_seurat(so))   # Seurat -> SingleCellExperiment, in one line

Cross-language conversions go through the on-disk store — write it on one side, read it on the other, no shared memory and no format re-implementation:

# Python:  lstar.write(read_anndata(ad.read_h5ad("pbmc.h5ad")), "pbmc.lstar.zarr")
ds_from_h5ad <- lstar_read("pbmc.lstar.zarr")
saveRDS(write_seurat(ds_from_h5ad), "pbmc.rds")

The `lstar convert` command line

The Python package ships a one-command CLI that detects formats by path, bridges R and Python through the store automatically, and reports what crossed (and what was dropped):

lstar convert pbmc.h5ad pbmc.rds --report        # AnnData -> Seurat, with a fidelity report
lstar convert pbmc.rds  pbmc.h5ad --check        # + open the result in its native library and smoke-test it

The `--backend auto|native|direct` option adds a package-free fallback: .h5ad converts with only h5py (no anndata), and a Seurat .rds reads and writes with base R + this package (no SeuratObject); an SCE .rds reads package-free. See vignette topics and the package website for the full conversion matrix.
