
Atomic Vectors

Atomic vectors are the fundamental data structure in R. They include numeric (integer and double), logical, character, complex, and raw vectors. This vignette explains how h5lite maps these R types to HDF5 datasets and provides guidance on controlling storage types and compression.

library(h5lite)
file <- tempfile(fileext = ".h5")

Basic Usage

Writing a vector to HDF5 is straightforward using h5_write(). The package automatically creates the necessary dataset and handles dimensions.

# Write a numeric vector
vec <- c(1.5, 2.3, 4.2, 5.1)
h5_write(vec, file, "data/numeric_vector")

# Read it back
res <- h5_read(file, "data/numeric_vector")
print(res)
#> [1] 1.5 2.3 4.2 5.1

Scalars vs. 1D Arrays

In R, a “scalar” is simply a vector of length 1. HDF5, however, distinguishes between a Scalar Dataspace (a single value with no dimensions) and a Simple Dataspace (an array), which for a single value has dimensions [1].

By default, h5lite treats length-1 vectors as 1D arrays to maintain consistency with R’s vector behavior. To write a true HDF5 scalar, you must wrap the value in I().

# 1. Default: 1D Array (Length 1)
h5_write(42, file, "structure/array_1d")

# 2. Explicit Scalar: Wrapped in I()
h5_write(I(42), file, "structure/scalar")

h5_str(file, "structure")
#> structure/
#> ├── array_1d <uint8 × 1>
#> └── scalar <uint8 scalar>

Note: When reading data back into R, both storage formats appear as standard R vectors of length 1.
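On the R side, I() does not change the value. It only adds the "AsIs" class, which a writer can inspect to choose between a scalar and a 1D dataspace. The sketch below uses a hypothetical helper, dataspace_for(), to illustrate that kind of dispatch. It is not h5lite's internal code, and the length-1 requirement is an assumption of the sketch.

```r
# Hypothetical helper (not part of h5lite): pick a dataspace following the
# rule described above. Assumption: only length-1 values can become scalars.
dataspace_for <- function(x) {
  if (inherits(x, "AsIs") && length(x) == 1) "scalar" else "simple"
}

plain   <- 42
wrapped <- I(42)

class(wrapped)                       # "AsIs": I() only adds a class marker
identical(unclass(wrapped), plain)   # TRUE: the underlying value is unchanged

dataspace_for(plain)     # "simple"
dataspace_for(wrapped)   # "scalar"
```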

Numeric and Logical Data

Automatic Type Selection

By default (as = "auto"), h5lite maps R types to the most compact suitable HDF5 types:

  1. Numeric: h5lite analyzes the range of your data and picks the smallest fitting HDF5 type (e.g., uint8, int16, int32, float64).
  2. Logicals: h5lite maps these to uint8 (0 or 1) in HDF5 to save space.
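The range-based selection described in this list can be sketched in a few lines of base R. The function smallest_type() below is hypothetical and only illustrates the idea. In particular, the candidate order (unsigned before signed at each width) and the NA rule (taken from the next section) are assumptions, and h5lite's actual logic may differ.

```r
# Hypothetical sketch of "auto" numeric type selection; NOT h5lite's code.
smallest_type <- function(x) {
  if (is.logical(x)) x <- as.integer(x)        # TRUE/FALSE -> 1/0
  if (anyNA(x)) return("float64")              # integers cannot hold NA
  if (!all(x == round(x))) return("float64")   # non-integer values
  lo <- min(x)
  hi <- max(x)
  candidates <- list(
    uint8  = c(0, 255),
    int8   = c(-128, 127),
    uint16 = c(0, 65535),
    int16  = c(-32768, 32767),
    uint32 = c(0, 4294967295),
    int32  = c(-2147483648, 2147483647)
  )
  for (type in names(candidates)) {
    bounds <- candidates[[type]]
    if (lo >= bounds[1] && hi <= bounds[2]) return(type)
  }
  "float64"                                    # too large for 32-bit types
}

smallest_type(c(1L, 2L, 3L))    # "uint8"
smallest_type(c(10, -5, 100))   # "int8"
smallest_type(c(0, 1000))       # "uint16"
smallest_type(c(1.5, 2.3))      # "float64"
smallest_type(c(TRUE, FALSE))   # "uint8"
```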

Handling Missing Values (NA)

A key challenge in HDF5 is that standard integer and boolean types do not have a native representation for NA (missing values).

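To ensure data safety, h5lite checks for missing values before choosing a type. An integer vector without NA gets the smallest fitting integer type, while one containing NA is promoted to float64, which can represent NA: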

# Integer vector with NO missing values -> Automatic optimal type (uint8)
h5_write(c(1L, 2L, 3L), file, "safe/ints")
h5_typeof(file, "safe/ints")
#> [1] "uint8"

# Integer vector WITH missing values -> Promoted to float64
h5_write(c(1L, NA, 3L), file, "safe/ints_na")
h5_typeof(file, "safe/ints_na")
#> [1] "float64"
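Promoting to float64 does not lose information: a 64-bit double has a 53-bit significand, so every value an R integer can hold is represented exactly, and R's NA survives the conversion. The base-R sketch below only simulates the promotion in memory (it does not call h5lite) to show that the round trip is exact.

```r
# Simulate the integer -> float64 promotion and convert back.
x <- c(1L, NA, 3L, .Machine$integer.max, -.Machine$integer.max)

promoted <- as.double(x)          # what a float64 dataset would hold
restored <- as.integer(promoted)  # converting back to R integers

is.na(promoted)           # the NA is preserved as a double NA
identical(restored, x)    # TRUE: the round trip is exact
```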

Forcing Specific Types

If you know your data range fits into a smaller type (e.g., int8, uint16), you can use the as argument to force a specific storage type.

Warning: If you force an integer type on data that contain NA or values outside that type’s range, h5lite throws an error.

# Store small integers as 8-bit signed integers
h5_write(c(10, -5, 100), file, "small_ints", as = "int8")

# Store logicals as 8-bit unsigned integers
h5_write(c(TRUE, FALSE), file, "bools", as = "uint8")
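The warning above amounts to a pre-write check: no NA, and every value inside the target type's range. A minimal sketch of such a check follows. The helper check_int_type() and the int_limits table are hypothetical, cover only a few type names, and h5lite's own validation and error messages may differ.

```r
# Hypothetical limits table; only a subset of HDF5 integer types.
int_limits <- list(
  int8   = c(-128, 127),
  uint8  = c(0, 255),
  int16  = c(-32768, 32767),
  uint16 = c(0, 65535)
)

# Hypothetical validator (not part of h5lite): stop() if x cannot be
# stored in the forced integer type, otherwise return TRUE invisibly.
check_int_type <- function(x, type) {
  bounds <- int_limits[[type]]
  if (is.null(bounds)) stop("unsupported type in this sketch: ", type)
  x <- as.numeric(x)                      # logicals become 0/1
  if (anyNA(x)) stop("cannot store NA in ", type)
  if (any(x < bounds[1] | x > bounds[2])) {
    stop("values outside the ", type, " range [",
         bounds[1], ", ", bounds[2], "]")
  }
  invisible(TRUE)
}

check_int_type(c(10, -5, 100), "int8")    # passes
check_int_type(c(TRUE, FALSE), "uint8")   # passes
try(check_int_type(c(10, NA), "int8"))    # error: NA present
try(check_int_type(200, "int8"))          # error: out of range
```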

Character Vectors (Strings)

HDF5 supports two primary methods for storing strings: Variable-Length and Fixed-Length.

Automatic Type Selection

By default (as = "auto"), h5lite chooses a string representation for you. You can also request either form explicitly, as described below.

Variable-Length

You can explicitly request variable-length storage using as = "utf8" or as = "ascii".

# Variable length strings (handles NA)
h5_write(c("apple", "banana", NA), file, "strings/var", as = "utf8")

Fixed-Length

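You can force fixed-length storage by appending [n] to the type name (for example, as = "ascii[10]"), where n is the number of bytes reserved for each string. Empty brackets ([]) let h5lite set the length from the longest string.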

# Fixed length strings (10 bytes per string)
h5_write(c("A", "B", "C"), file, "strings/fixed", as = "ascii[10]")

# Auto-detect max length (converts to fixed length based on longest string)
h5_write(c("short", "longer", "longest"), file, "strings/auto_fixed", as = "ascii[]")
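Because n counts bytes rather than characters, the width implied by empty brackets is the byte length of the longest string. The base-R sketch below computes that width with nchar(type = "bytes"). It assumes no extra byte is reserved for a terminator, which may not match exactly how h5lite sizes the dataset. It also shows that a single non-ASCII character can occupy more than one byte in UTF-8.

```r
x <- c("short", "longer", "longest")

# Byte length of each string; the longest one sets the fixed width.
widths <- nchar(x, type = "bytes")
fixed_width <- max(widths)
fixed_width                      # 7

# Characters vs. bytes: "é" is one character but two bytes in UTF-8.
e_acute <- "\u00e9"
nchar(e_acute, type = "chars")   # 1
nchar(e_acute, type = "bytes")   # 2
```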

Compression

Compression in HDF5 requires the dataset to be “chunked”. h5lite handles chunking parameters automatically when you enable compression.

You can enable compression using the compress argument:

# Write a large vector with compression
x <- rep(rnorm(100), 100)
h5_write(x, file, "compressed_data", compress = TRUE)
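The example builds x with rep() so that the data are highly repetitive, which is what makes compression worthwhile. The base-R sketch below illustrates the effect with memCompress() and gzip. It assumes that compress = TRUE uses a deflate-style filter comparable to gzip, and it compresses one in-memory buffer rather than HDF5 chunks, so the ratios are only indicative.

```r
set.seed(1)
repetitive <- rep(rnorm(100), 100)   # 100 values repeated 100 times
random     <- rnorm(10000)           # same length, no repetition

# Compressed size as a fraction of the raw float64 byte size.
ratio <- function(v) {
  bytes <- writeBin(v, raw())        # raw IEEE-754 doubles, 8 bytes each
  length(memCompress(bytes, type = "gzip")) / length(bytes)
}

ratio(repetitive)   # small: the repeated 800-byte pattern compresses well
ratio(random)       # close to 1: random doubles barely compress
```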

64-bit Integers

R does not natively support 64-bit integers, but the bit64 package provides an integer64 class. h5lite supports reading and writing these types directly to HDF5 int64.

if (requireNamespace("bit64", quietly = TRUE)) {
  val <- bit64::as.integer64(c("9223372036854775807", "-9223372036854775807"))
  
  h5_write(val, file, "huge_ints")
  
  in_val <- h5_read(file, "huge_ints")
  print(class(in_val))
}
#> [1] "numeric"
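In the run above, the values come back with class numeric, that is, as R doubles. A double has a 53-bit significand, so not every integer beyond 2^53 in magnitude can be represented exactly. Values near the int64 limits, like the ones written above, should therefore be handled with care when they are read back as numeric. The base-R check below shows where that precision runs out.

```r
# Doubles carry 53 bits of integer precision.
.Machine$double.digits   # 53

big <- 2^53
big + 1 == big           # TRUE: 2^53 + 1 rounds back to 2^53
big - 1 == big           # FALSE: integers below 2^53 are still exact
```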
