phinterval

Lifecycle: experimental

phinterval is a package for representing and manipulating time spans that may contain gaps. It implements the <phinterval> (think “potentially-holey-interval”) vector class, designed as an extension of the {lubridate} <Interval>, to represent spans of time that are contiguous, disjoint, empty, or missing.

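Functionality for manipulating these spans includes set operations (phint_union(), phint_intersect(), phint_setdiff()), flattening several spans into one element (phint_squash(), datetime_squash()), and finding the gaps within a span (phint_invert()).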

Installation

Install the released version from CRAN with:

install.packages("phinterval")

You can install the development version of phinterval from GitHub with:

# install.packages("pak")
pak::pak("EthanSansom/phinterval")

Usage

Each element of a <phinterval> vector is a set of non-overlapping and non-adjacent intervals. For scalar intervals (one span per element), phinterval() works like lubridate::interval():

library(phinterval)
library(lubridate, warn.conflicts = FALSE)

# Create scalar phintervals (equivalent to interval())
phinterval(
  start = ymd(c("2000-01-01", "2000-01-03", "2000-01-04")),
  end = ymd(c("2000-01-02", "2000-01-05", "2000-01-09"))
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-03--2000-01-05} {2000-01-04--2000-01-09}

To create phintervals with multiple disjoint spans per element, use the by argument to group intervals. Overlapping or adjacent spans within each group are automatically merged:

# Create a phinterval with disjoint spans using the by argument
phint <- phinterval(
  start = ymd(c("2000-01-03", "2000-01-01", "2000-01-04")),
  end = ymd(c("2000-01-05", "2000-01-02", "2000-01-09")),
  by = c(1, 2, 2)
)
phint
#> <phinterval<UTC>[2]>
#> [1] {2000-01-03--2000-01-05}                        
#> [2] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}
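The merging rule described above can be pictured with a short base R sketch. This is only an illustration under stated assumptions: merge_spans() is a hypothetical helper that is not part of phinterval, it works on plain Date vectors rather than the package's own data structure, and it says nothing about how the package implements merging internally.

```r
# Hypothetical helper (not a phinterval function): merge overlapping or
# adjacent spans given as parallel vectors of start and end dates.
merge_spans <- function(start, end) {
  # Sort spans by start date so each span only needs comparing to the last one kept
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]

  out_start <- start[1]
  out_end <- end[1]
  for (i in seq_along(start)[-1]) {
    last <- length(out_end)
    if (start[i] <= out_end[last]) {
      # Overlapping or adjacent: extend the most recent span
      out_end[last] <- max(out_end[last], end[i])
    } else {
      # A gap separates the spans: begin a new one
      out_start <- c(out_start, start[i])
      out_end <- c(out_end, end[i])
    }
  }
  data.frame(start = out_start, end = out_end)
}

merged <- merge_spans(
  start = as.Date(c("2000-01-03", "2000-01-01", "2000-01-04")),
  end = as.Date(c("2000-01-05", "2000-01-02", "2000-01-09"))
)
# Two spans remain: 2000-01-01 to 2000-01-02 and 2000-01-03 to 2000-01-09
merged
```

When all three spans are placed in one group they collapse to two, because the spans starting on 2000-01-03 and 2000-01-04 overlap.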

In phint, the first element is a single contiguous span, while the second holds two spans separated by a gap from 2000-01-02 to 2000-01-04.

In most cases, a <phinterval> vector will appear as the result of manipulating <Interval> vectors. For example, phint_squash() flattens a vector of time spans into a scalar <phinterval>.

jan_1_to_9 <- interval(ymd("2000-01-01"), ymd("2000-01-09"))
jan_1_to_2 <- interval(ymd("2000-01-01"), ymd("2000-01-02"))
jan_3_to_5 <- interval(ymd("2000-01-03"), ymd("2000-01-05"))
jan_4_to_9 <- interval(ymd("2000-01-04"), ymd("2000-01-09"))

ints <- c(jan_1_to_2, jan_3_to_5, jan_4_to_9)
phint_squash(ints)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-03--2000-01-09}

The squashed <phinterval> covers every instant that falls within at least one of the input intervals, counting overlapping time only once.
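To see that overlapping time is counted once, the arithmetic can be checked by hand in base R. The sketch below uses plain Date vectors and copies the squashed spans from the output above. It does not call phinterval, and the variable names are illustrative only.

```r
# The three input intervals (jan_1_to_2, jan_3_to_5, jan_4_to_9) as plain dates
start <- as.Date(c("2000-01-01", "2000-01-03", "2000-01-04"))
end <- as.Date(c("2000-01-02", "2000-01-05", "2000-01-09"))

# Summing the inputs separately counts 2000-01-04 to 2000-01-05 twice
days_separate <- sum(as.numeric(end - start))
days_separate
#> [1] 8

# The two spans shown in the squashed output
squashed_start <- as.Date(c("2000-01-01", "2000-01-03"))
squashed_end <- as.Date(c("2000-01-02", "2000-01-09"))

# The squashed spans cover one day fewer, because the overlap is counted once
days_squashed <- sum(as.numeric(squashed_end - squashed_start))
days_squashed
#> [1] 7
```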

Example: Employment History

The phinterval package is most useful when working with tabular data, such as a longitudinal employment panel.

library(dplyr, warn.conflicts = FALSE)

jobs <- tribble(
  ~name,   ~job_title,             ~start,        ~end,
  "Greg",  "Mascot",               "2018-01-01",  "2018-06-03",
  "Greg",  "Executive Assistant",  "2018-06-10",  "2020-04-01",
  "Shiv",  "Political Consultant", "2017-01-01",  "2019-04-01"
)

employment <- jobs |>
  # Squash overlapping/adjacent intervals into a single phinterval
  group_by(name) |>
  summarize(employed = datetime_squash(ymd(start), ymd(end))) |>
  # Invert the employment timeline to find gaps
  mutate(unemployed = phint_invert(employed))

employment
#> # A tibble: 2 × 3
#>   name  employed                    unemployed              
#>   <chr> <phint<UTC>>                <phint<UTC>>            
#> 1 Greg  {2018-01-01-[2]-2020-04-01} {2018-06-03--2018-06-10}
#> 2 Shiv  {2017-01-01--2019-04-01}    <hole>

<phinterval> column formatting adapts to the available console width. The "[2]" in Greg’s employment interval "{2018-01-01-[2]-2020-04-01}" indicates that his employment history is made up of two disjoint spans, with the first span beginning on 2018-01-01 and the second ending on 2020-04-01. When more space is available, every span is shown explicitly.

employment |> select(name, employed)
#> # A tibble: 2 × 2
#>   name  employed                                        
#>   <chr> <phint<UTC>>                                    
#> 1 Greg  {2018-01-01--2018-06-03, 2018-06-10--2020-04-01}
#> 2 Shiv  {2017-01-01--2019-04-01}
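The inversion step in the pipeline above can be pictured as collecting the gaps between consecutive spans. The following base R sketch assumes the spans are already sorted and non-overlapping. invert_spans() is a hypothetical helper for illustration, not the package's phint_invert().

```r
# Hypothetical helper (not a phinterval function): return the gaps between
# consecutive spans. Assumes spans are sorted and do not overlap or touch.
invert_spans <- function(start, end) {
  n <- length(start)
  if (n < 2) {
    # A single span has no interior gaps, analogous to a <hole>
    return(data.frame(start = start[0], end = end[0]))
  }
  # Each gap runs from the end of one span to the start of the next
  data.frame(start = end[-n], end = start[-1])
}

greg_gaps <- invert_spans(
  start = as.Date(c("2018-01-01", "2018-06-10")),
  end = as.Date(c("2018-06-03", "2020-04-01"))
)
# One gap, from 2018-06-03 to 2018-06-10
greg_gaps

shiv_gaps <- invert_spans(as.Date("2017-01-01"), as.Date("2019-04-01"))
# No rows: a single continuous span has nothing to invert
nrow(shiv_gaps)
#> [1] 0
```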

Operations on <phinterval> vectors behave like those on standard intervals. Here, we can see that there was a 7-day gap in Greg’s employment history:

employment |>
  mutate(
    days_employed = employed / ddays(1),
    days_unemployed = unemployed / ddays(1)
  ) |>
  select(name, days_employed, days_unemployed)
#> # A tibble: 2 × 3
#>   name  days_employed days_unemployed
#>   <chr>         <dbl>           <dbl>
#> 1 Greg            814               7
#> 2 Shiv            820               0
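The day counts in this table can be reproduced with plain date arithmetic. The base R check below is independent of phinterval and lubridate, and its variable names are illustrative.

```r
# Greg's two jobs as plain dates
greg_start <- as.Date(c("2018-01-01", "2018-06-10"))
greg_end <- as.Date(c("2018-06-03", "2020-04-01"))

# Length of each job in days
greg_days <- as.numeric(greg_end - greg_start)
greg_days
#> [1] 153 661

# Total days employed
sum(greg_days)
#> [1] 814

# Days between the end of the first job and the start of the second
greg_gap <- as.numeric(greg_start[2] - greg_end[1])
greg_gap
#> [1] 7

# Shiv's single job
shiv_days <- as.numeric(as.Date("2019-04-01") - as.Date("2017-01-01"))
shiv_days
#> [1] 820
```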

phinterval <-> lubridate

The <phinterval> class is a generalization of the <Interval> class, meaning any <Interval> can be converted into an equivalent <phinterval> and all phinterval functions accept either <Interval> or <phinterval> inputs. The table below shows the lubridate functions that have drop-in phinterval replacements.

phinterval               lubridate              Returns
-----------------------  ---------------------  ---------------------------
phinterval(start, end)   interval(start, end)   Spans bounded by start/end
phint_intersect(x, y)    intersect(x, y)        Times in x and y
phint_setdiff(x, y)      setdiff(x, y)          Times in x, but not in y
phint_union(x, y)        union(x, y)            Times in x or y
phint_start(x)           int_start(x)           The start time of x
phint_end(x)             int_end(x)             The end time of x
phint_length(x)          int_length(x)          The number of seconds in x
phint_overlaps(x, y)     int_overlaps(x, y)     Whether x and y intersect
phint_within(x, y)       x %within% y           Whether y contains x
x / duration(...)        x / duration(...)      How many durations fit in x

All phinterval set operations work as expected with arbitrary time spans, enabling operations that are not supported by lubridate. For example, the intersection of two non-overlapping intervals is an empty time span, called a <hole>.

lubridate::intersect(jan_1_to_2, jan_4_to_9)
#> [1] NA--NA
phint_intersect(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] <hole>
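For two single spans, the idea behind an intersection that may be empty can be sketched in base R. intersect_span() is a hypothetical helper, not a phinterval function. It returns NULL where the package returns a <hole>, and it treats spans that share only an endpoint as non-intersecting, which is an assumption of this sketch rather than documented package behavior.

```r
# Hypothetical helper (not a phinterval function): intersection of two spans.
# Returns NULL when the spans share no time.
intersect_span <- function(start1, end1, start2, end2) {
  start <- max(start1, start2)  # The later of the two starts
  end <- min(end1, end2)        # The earlier of the two ends
  if (start < end) list(start = start, end = end) else NULL
}

# jan_1_to_2 and jan_4_to_9 do not overlap, so there is nothing to return
empty <- intersect_span(
  as.Date("2000-01-01"), as.Date("2000-01-02"),
  as.Date("2000-01-04"), as.Date("2000-01-09")
)
is.null(empty)
#> [1] TRUE

# jan_3_to_5 and jan_4_to_9 share 2000-01-04 to 2000-01-05
shared <- intersect_span(
  as.Date("2000-01-03"), as.Date("2000-01-05"),
  as.Date("2000-01-04"), as.Date("2000-01-09")
)
```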

The set-difference of a time span and itself is also a <hole>.

lubridate::setdiff(jan_1_to_2, jan_1_to_2)
#> [1] 2000-01-01 UTC--2000-01-02 UTC
phint_setdiff(jan_1_to_2, jan_1_to_2)
#> <phinterval<UTC>[1]>
#> [1] <hole>

Performing a set-difference may “punch a hole” in a time span, creating a discontinuous interval.

try(lubridate::setdiff(jan_1_to_9, jan_3_to_5))
#> Error in setdiff.Interval(jan_1_to_9, jan_3_to_5) : 
#>   Cases 1 result in discontinuous intervals.
phint_setdiff(jan_1_to_9, jan_3_to_5)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-03, 2000-01-05--2000-01-09}
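The reason a set-difference can yield two spans is that the removed span may leave a piece on each side. A minimal base R sketch for single spans follows. setdiff_span() is a hypothetical helper for illustration and is not how phinterval is implemented.

```r
# Hypothetical helper (not a phinterval function): remove the span
# [cut_start, cut_end] from [start, end], keeping up to two pieces.
setdiff_span <- function(start, end, cut_start, cut_end) {
  left_end <- min(end, cut_start)    # Where the piece before the cut stops
  right_start <- max(start, cut_end) # Where the piece after the cut begins
  keep_left <- start < left_end      # TRUE if some time remains before the cut
  keep_right <- right_start < end    # TRUE if some time remains after the cut
  data.frame(
    start = c(start[keep_left], right_start[keep_right]),
    end = c(left_end[keep_left], end[keep_right])
  )
}

# Removing jan_3_to_5 from jan_1_to_9 leaves a piece on each side
punched <- setdiff_span(
  as.Date("2000-01-01"), as.Date("2000-01-09"),
  as.Date("2000-01-03"), as.Date("2000-01-05")
)
# Two spans: 2000-01-01 to 2000-01-03 and 2000-01-05 to 2000-01-09
punched

# Removing a span from itself leaves nothing, analogous to a <hole>
nothing <- setdiff_span(
  as.Date("2000-01-01"), as.Date("2000-01-02"),
  as.Date("2000-01-01"), as.Date("2000-01-02")
)
nrow(nothing)
#> [1] 0
```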

The union of two disjoint intervals is a single <phinterval> containing two spans.

lubridate::union(jan_1_to_2, jan_4_to_9)
#> [1] 2000-01-01 UTC--2000-01-09 UTC
phint_union(jan_1_to_2, jan_4_to_9)
#> <phinterval<UTC>[1]>
#> [1] {2000-01-01--2000-01-02, 2000-01-04--2000-01-09}

As with the lubridate equivalents, all phinterval set operations are vectorized.

phint_intersect(
  c(jan_1_to_2, jan_3_to_5, jan_1_to_2),
  c(jan_1_to_9, jan_4_to_9, jan_4_to_9)
)
#> <phinterval<UTC>[3]>
#> [1] {2000-01-01--2000-01-02} {2000-01-04--2000-01-05} <hole>

Inspiration

This package builds on {lubridate}’s <Interval> class for representing contiguous time spans. The prototype <phinterval> data structure (a list of matrices) and the C++ implementation of phint_squash() were inspired by the {intervals} package by Richard Bourgon and Edzer Pebesma. The figures used in this README were inspired by Davis Vaughan’s {ivs} package documentation.
