A matsindf_apply primer

Matthew Kuperus Heun

2024-01-31

Introduction

matsindf_apply() is a powerful and versatile function that enables analysis with lists and data frames by applying FUN in helpful ways. The function is called matsindf_apply() because it can be used to apply FUN to a matsindf data frame, that is, a data frame that contains matrices as individual entries. (A matsindf data frame can be created by calling collapse_to_matrices(), as demonstrated below.)

But matsindf_apply() can apply FUN across much more: data frames of single numbers, lists of matrices, lists of single numbers, and individual numbers. This vignette demonstrates matsindf_apply(), starting with simple examples and proceeding toward sophisticated analyses.

The basics

The basis of all analyses conducted with matsindf_apply() is a function (FUN) to be applied across data supplied in .dat or .... FUN must return a named list of variables as its result. Here is an example function that both adds and subtracts its arguments, a and b, and returns a list containing its results, c and d.

example_fun <- function(a, b){
  return(list(c = matsbyname::sum_byname(a, b), 
              d = matsbyname::difference_byname(a, b)))
}

Similar to lapply() and its siblings, additional arguments to matsindf_apply() include the data over which FUN is to be applied. These arguments can, in the first instance, be supplied as named arguments to the ... argument of matsindf_apply(). All arguments in ... must be named. The ... arguments to matsindf_apply() are passed to FUN according to their names. In this case, the output of matsindf_apply() is the named list returned by FUN.

matsindf_apply(FUN = example_fun, a = 2, b = 1)
#> $c
#> [1] 3
#> 
#> $d
#> [1] 1

Passing an additional argument (z = 2) causes an unused argument error, because example_fun does not have a z argument.

tryCatch(
  matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2),
  error = function(e){e}
)
#> <simpleError in matsindf_apply_types(.dat, FUN, ...): In matsindf::matsindf_apply(), the following unused arguments appeared in ...: z>

Failing to pass a needed argument (b) causes an error that indicates the missing argument.

tryCatch(
  matsindf_apply(FUN = example_fun, a = 2),
  error = function(e){e}
)
#> Warning in matsindf_apply_types(.dat, FUN, ...): In matsindf::matsindf_apply(),
#> the following named arguments to FUN were not found in any of .dat, ..., or
#> defaults to FUN: b. Set .warn_missing_FUN_args = FALSE to suppress this warning
#> if you know what you are doing.
#> <simpleError in (function (a, b) {    return(list(c = matsbyname::sum_byname(a, b), d = matsbyname::difference_byname(a,         b)))})(a = 2): argument "b" is missing, with no default>

Alternatively, arguments to FUN can be given in a named list to .dat, the first argument of matsindf_apply(). When a value is assigned to .dat, the return value from matsindf_apply() contains all named variables in .dat (in this case both a and b) in addition to the results provided by FUN (in this case both c and d).

matsindf_apply(list(a = 2, b = 1), FUN = example_fun)
#> $a
#> [1] 2
#> 
#> $b
#> [1] 1
#> 
#> $c
#> [1] 3
#> 
#> $d
#> [1] 1

Extra variables are tolerated in .dat, because .dat is considered to be a store of data from which variables can be drawn as needed.

matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)
#> $a
#> [1] 2
#> 
#> $b
#> [1] 1
#> 
#> $c
#> [1] 3
#> 
#> $d
#> [1] 1

In contrast, arguments to ... are named explicitly by the user, so including an extra argument in ... is considered an error, as shown above.

Some details

If a named argument is supplied by both .dat and ..., the argument in ... takes precedence, overriding the argument in .dat.

matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)
#> $a
#> [1] 10
#> 
#> $b
#> [1] 1
#> 
#> $c
#> [1] 11
#> 
#> $d
#> [1] 9

When supplying both .dat and ..., ... can contain named strings of length 1 which are interpreted as mappings from named items in .dat to arguments in the signature of FUN. In the example below, a = "z" indicates that argument a to FUN should be supplied by item z in .dat.

matsindf_apply(list(a = 2, b = 1, z = 42),
               FUN = example_fun, a = "z")
#> $a
#> [1] 2
#> 
#> $b
#> [1] 1
#> 
#> $z
#> [1] 42
#> 
#> $c
#> [1] 43
#> 
#> $d
#> [1] 41

If a named argument appears in both .dat and the output of FUN, a name collision occurs in the output of matsindf_apply(), and a warning is issued.

tryCatch(
  matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun),
  warning = function(w){w}
)
#> <simpleWarning in matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun): Name collision in matsindf::matsindf_apply(). The following arguments appear both in .dat and in the output of `FUN`: c>
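
One way to sidestep such a collision is to have FUN return differently named outputs. The sketch below assumes the matsindf and matsbyname packages are installed. The names example_fun_renamed, sum_ab, and diff_ab are hypothetical, chosen only for illustration.

```r
# Hypothetical variant of example_fun: the same calculation, but the outputs
# are named sum_ab and diff_ab, so they cannot collide with an item named c
# in .dat.
example_fun_renamed <- function(a, b) {
  list(sum_ab = matsbyname::sum_byname(a, b),
       diff_ab = matsbyname::difference_byname(a, b))
}

# .dat still contains an item named c, but FUN no longer returns one,
# so the output names do not clash.
res <- matsindf::matsindf_apply(list(a = 2, b = 1, c = 42),
                                FUN = example_fun_renamed)
res$sum_ab
res$diff_ab
```

Renaming the outputs of FUN is only one option. Renaming the offending item in .dat before the call would avoid the collision equally well.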

FUN can accept more than just numerics. example_fun_with_string() accepts a character string and a numeric. However, because a ... argument that is a character string of length 1 has special meaning (namely, mapping variables in .dat to arguments of FUN), passing a character string of length 1 can cause an error. To get around the problem, wrap the single string in a list, as shown below.

example_fun_with_string <- function(str_a, b) {
  a <- as.numeric(str_a)
  list(added = matsbyname::sum_byname(a, b), subtracted = matsbyname::difference_byname(a, b))
}

# Causes an error
tryCatch(
  matsindf_apply(FUN = example_fun_with_string, str_a = "1", b = 2),
  error = function(e){e}
)
#> Warning in matsindf_apply_types(.dat, FUN, ...): In matsindf::matsindf_apply(),
#> the following named arguments to FUN were not found in any of .dat, ..., or
#> defaults to FUN: str_a. Set .warn_missing_FUN_args = FALSE to suppress this
#> warning if you know what you are doing.
#> <simpleError in (function (str_a, b) {    a <- as.numeric(str_a)    list(added = matsbyname::sum_byname(a, b), subtracted = matsbyname::difference_byname(a,         b))})(b = 2): argument "str_a" is missing, with no default>
# To solve the problem, wrap "1" in list().
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = 2)
#> $added
#> [1] 3
#> 
#> $subtracted
#> [1] -1
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = list(2))
#> $added
#> [1] 3
#> 
#> $subtracted
#> [1] -1
matsindf_apply(FUN = example_fun_with_string, 
               str_a = list("1", "3"), 
               b = list(2, 4))
#> $added
#> $added[[1]]
#> [1] 3
#> 
#> $added[[2]]
#> [1] 7
#> 
#> 
#> $subtracted
#> $subtracted[[1]]
#> [1] -1
#> 
#> $subtracted[[2]]
#> [1] -1
matsindf_apply(.dat = list(str_a = list("1"), b = list(2)), FUN = example_fun_with_string)
#> $str_a
#> [1] "1"
#> 
#> $b
#> [1] 2
#> 
#> $added
#> [1] 3
#> 
#> $subtracted
#> [1] -1
matsindf_apply(.dat = list(m = list("1"), n = list(2)), FUN = example_fun_with_string, 
               str_a = "m", b = "n")
#> $m
#> [1] "1"
#> 
#> $n
#> [1] 2
#> 
#> $added
#> [1] 3
#> 
#> $subtracted
#> [1] -1

matsindf_apply() and data frames

.dat can also be a data frame or a tibble, both of which are fancy lists. When .dat is a data frame or tibble, the output of matsindf_apply() is a tibble, and FUN acts like a specialized dplyr::mutate(), adding new columns at the right of .dat.

matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), 
               FUN = example_fun_with_string)
#> # A tibble: 2 × 4
#>   str_a     b added subtracted
#>   <chr> <dbl> <dbl>      <dbl>
#> 1 1         2     3         -1
#> 2 3         4     7         -1
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), 
               FUN = example_fun_with_string, 
               str_a = "str_a", b = "b")
#> # A tibble: 2 × 4
#>   str_a     b added subtracted
#>   <chr> <dbl> <dbl>      <dbl>
#> 1 1         2     3         -1
#> 2 3         4     7         -1
matsindf_apply(.dat = data.frame(m = c("1", "3"), n = c(2, 4)), 
               FUN = example_fun_with_string, 
               str_a = "m", b = "n")
#> # A tibble: 2 × 4
#>   m         n added subtracted
#>   <chr> <dbl> <dbl>      <dbl>
#> 1 1         2     3         -1
#> 2 3         4     7         -1

Additional niceties are available when .dat is a data frame or a tibble. matsindf_apply() works when the data frame is filled with single numeric values, as is typical.

df <- data.frame(a = 2:4, b = 1:3)
matsindf_apply(df, FUN = example_fun)
#> # A tibble: 3 × 4
#>       a     b     c     d
#>   <int> <int> <int> <int>
#> 1     2     1     3     1
#> 2     3     2     5     1
#> 3     4     3     7     1
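
For comparison, when every cell holds a single number, the same columns can be produced with a plain dplyr::mutate() call. This sketch is only an analogy for the result shown above, not a description of how matsindf_apply() works internally. It assumes dplyr is installed, and the name df_mutated is hypothetical.

```r
# The same single-number data frame as in the example above.
df <- data.frame(a = 2:4, b = 1:3)

# Ordinary vectorised arithmetic gives the same c and d columns
# that example_fun produced through matsindf_apply().
df_mutated <- dplyr::mutate(df, c = a + b, d = a - b)
df_mutated
```

The advantage of matsindf_apply() over this plain mutate() call appears when the cells hold matrices rather than single numbers, or when the same FUN must also work on lists and individual values.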

But matsindf_apply() also works with matsindf data frames, data frames in which each cell of the data frame is filled with a single matrix. To demonstrate use of matsindf_apply() with a matsindf data frame, we’ll construct a simple matsindf data frame (midf) using functions in this package.

# Create a tidy data frame containing data for matrices
tidy <- tibble::tibble(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2),
                       matnames = c(rep("U", 8), rep("V", 8)),
                       matvals = c(1:4, 11:14, 21:24, 31:34),
                       rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2), 
                                    rep(c(rep("i1", 2), rep("i2", 2)), 2)),
                       colnames = c(rep(c("i1", "i2"), 4), 
                                    rep(c("p1", "p2"), 4))) |>
  dplyr::mutate(
    rowtypes = dplyr::case_when(
      matnames == "U" ~ "Product",
      matnames == "V" ~ "Industry", 
      TRUE ~ NA_character_
    ),
    coltypes = dplyr::case_when(
      matnames == "U" ~ "Industry",
      matnames == "V" ~ "Product",
      TRUE ~ NA_character_
    )
  )

tidy
#> # A tibble: 16 × 7
#>     Year matnames matvals rownames colnames rowtypes coltypes
#>    <dbl> <chr>      <int> <chr>    <chr>    <chr>    <chr>   
#>  1  2017 U              1 p1       i1       Product  Industry
#>  2  2017 U              2 p1       i2       Product  Industry
#>  3  2017 U              3 p2       i1       Product  Industry
#>  4  2017 U              4 p2       i2       Product  Industry
#>  5  2018 U             11 p1       i1       Product  Industry
#>  6  2018 U             12 p1       i2       Product  Industry
#>  7  2018 U             13 p2       i1       Product  Industry
#>  8  2018 U             14 p2       i2       Product  Industry
#>  9  2017 V             21 i1       p1       Industry Product 
#> 10  2017 V             22 i1       p2       Industry Product 
#> 11  2017 V             23 i2       p1       Industry Product 
#> 12  2017 V             24 i2       p2       Industry Product 
#> 13  2018 V             31 i1       p1       Industry Product 
#> 14  2018 V             32 i1       p2       Industry Product 
#> 15  2018 V             33 i2       p1       Industry Product 
#> 16  2018 V             34 i2       p2       Industry Product

# Convert to a matsindf data frame
midf <- tidy |>  
  dplyr::group_by(Year, matnames) |> 
  collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") |> 
  tidyr::pivot_wider(names_from = "matnames", values_from = "matvals")

# Take a look at the midf data frame and some of the matrices it contains.
midf
#> # A tibble: 2 × 3
#>    Year U             V            
#>   <dbl> <list>        <list>       
#> 1  2017 <dbl [2 × 2]> <dbl [2 × 2]>
#> 2  2018 <dbl [2 × 2]> <dbl [2 × 2]>
midf$U[[1]]
#>    i1 i2
#> p1  1  2
#> p2  3  4
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"
midf$V[[1]]
#>    p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "Industry"
#> attr(,"coltype")
#> [1] "Product"

With midf in hand, we can demonstrate use of tidyverse-style functional programming to perform matrix algebra within a data frame. The functions of the matsbyname package (such as difference_byname() below) can be used for this purpose.

result <- midf |> 
  dplyr::mutate(
    W = difference_byname(transpose_byname(V), U)
  )
result
#> # A tibble: 2 × 4
#>    Year U             V             W            
#>   <dbl> <list>        <list>        <list>       
#> 1  2017 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>
#> 2  2018 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>
result$W[[1]]
#>    i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"
result$W[[2]]
#>    i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"

This way of performing matrix calculations works equally well within a 2-row matsindf data frame (as shown above) or within a 1000-row matsindf data frame.

Programming with matsindf_apply()

Users can write their own functions using matsindf_apply(). A flexible calc_W() function can be written as follows.

calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W") {
  # The inner function does all the work.
  W_func <- function(U_mat, V_mat){
    # When we get here, U_mat and V_mat will be single matrices or single numbers, 
    # not a column in a data frame or an item in a list.
    if (length(U_mat) == 0 && length(V_mat) == 0) {
      # Tolerate zero-length arguments by returning a zero-length
      # numeric in a list with the correct name.
      return(list(numeric()) |> magrittr::set_names(W))
    }
    # Calculate W_mat from the inputs U_mat and V_mat.
    W_mat <- matsbyname::difference_byname(
      matsbyname::transpose_byname(V_mat), 
      U_mat)
    # Return a named list.
    list(W_mat) |> magrittr::set_names(W)
  }
  # The body of the main function consists of a call to matsindf_apply
  # that specifies the inner function in the FUN argument.
  matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V)
}

This style of writing matsindf_apply() functions is incredibly versatile, leveraging the capabilities of both the matsindf and matsbyname packages. (Indeed, the Recca package uses matsindf_apply() heavily and is built upon the functions in the matsindf and matsbyname packages.)

Functions written like calc_W() can operate in ways similar to matsindf_apply() itself. To demonstrate, we’ll use calc_W() in all the ways that matsindf_apply() can be used, going in the reverse order to our demonstration of the capabilities of matsindf_apply() above.

calc_W() can be used as a specialized mutate function that operates on matsindf data frames.

midf |> calc_W()
#> # A tibble: 2 × 4
#>    Year U             V             W            
#>   <dbl> <list>        <list>        <list>       
#> 1  2017 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>
#> 2  2018 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>

The added column can be given a different name from the default (“W”) using the W argument.

midf |> calc_W(W = "W_prime")
#> # A tibble: 2 × 4
#>    Year U             V             W_prime      
#>   <dbl> <list>        <list>        <list>       
#> 1  2017 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>
#> 2  2018 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>

As with matsindf_apply(), column names in midf can be mapped to the arguments of calc_W() by the arguments to calc_W().

midf |> 
  dplyr::rename(X = U, Y = V) |> 
  calc_W(U = "X", V = "Y")
#> # A tibble: 2 × 4
#>    Year X             Y             W            
#>   <dbl> <list>        <list>        <list>       
#> 1  2017 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>
#> 2  2018 <dbl [2 × 2]> <dbl [2 × 2]> <dbl [2 × 2]>

calc_W() can operate on lists of single matrices, too. This approach works, because the default values for the U and V arguments to calc_W() are “U” and “V”, respectively. The input list members (in this case midf$U[[1]] and midf$V[[1]]) are returned with the output, because list(U = midf$U[[1]], V = midf$V[[1]]) is passed to the .dat argument of matsindf_apply().

calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))
#> $U
#>    i1 i2
#> p1  1  2
#> p2  3  4
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"
#> 
#> $V
#>    p1 p2
#> i1 21 22
#> i2 23 24
#> attr(,"rowtype")
#> [1] "Industry"
#> attr(,"coltype")
#> [1] "Product"
#> 
#> $W
#>    i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"

It may be clearer to name the arguments as required by the calc_W() function without wrapping in a list first, as shown below. But in this approach, the input matrices are not returned with the output, because arguments U and V are passed to the ... argument of matsindf_apply(), not the .dat argument of matsindf_apply().

calc_W(U = midf$U[[1]], V = midf$V[[1]])
#> $W
#>    i1 i2
#> p1 20 21
#> p2 19 20
#> attr(,"rowtype")
#> [1] "Product"
#> attr(,"coltype")
#> [1] "Industry"

calc_W() can operate on data frames containing single numbers.

data.frame(U = c(1, 2), V = c(3, 4)) |> calc_W()
#> # A tibble: 2 × 3
#>       U     V     W
#>   <dbl> <dbl> <dbl>
#> 1     1     3     2
#> 2     2     4     2

Finally, calc_W() can be applied to single numbers, and the result is a single number in a named list.

calc_W(U = 2, V = 3)
#> $W
#> [1] 1

It is good practice to write internal functions that tolerate zero-length inputs, as calc_W() does. Doing so enables results from different calculations to be bound together by rows.

calc_W(U = numeric(), V = numeric())
#> $W
#> numeric(0)
calc_W(list(U = numeric(), V = numeric()))
#> $U
#> numeric(0)
#> 
#> $V
#> numeric(0)
#> 
#> $W
#> numeric(0)

res <- calc_W(list(U = c(2, 3, 4, 5), V = c(3, 4, 5, 6)))
res0 <- calc_W(list(U = numeric(), V = numeric()))
dplyr::bind_rows(res, res0)
#> # A tibble: 4 × 3
#>       U     V     W
#>   <dbl> <dbl> <dbl>
#> 1     2     3     1
#> 2     3     4     1
#> 3     4     5     1
#> 4     5     6     1

Conclusion

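This vignette demonstrated use of the versatile matsindf_apply() function. Inputs to matsindf_apply() can be single numbers, matrices, lists of either, or data frames with or without matrices as entries.

matsindf_apply() can be used for programming, and functions constructed as demonstrated above share characteristics with matsindf_apply(): they can act as specialized mutate functions on data frames, and they can be applied to single numbers, matrices, lists, and data frames with or without matrices as entries.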
