A real world example

This is practically the same code as in this blog post of mine: https://www.brodrigues.co/blog/2018-11-14-luxairport/, with minor updates to reflect the current state of the {tidyverse} packages and with logging added using {chronicler}.

Let’s first load the required packages, and the avia dataset included in the {chronicler} package:

library(chronicler)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> The following object is masked from 'package:chronicler':
#> 
#>     pick
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
#> 
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:testthat':
#> 
#>     matches
library(stringr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# Ensure chronicler version of `pick()` is being used
pick <- chronicler::pick

data("avia")

Now I define the functions needed for the analysis. To improve logging, I pass dim() as the .g argument of each recorded function below. This makes it possible to see how the dimensions of the data change at each step of the pipeline:

# Define required functions 
# You can use `record_many()` to avoid having to write everything

r_select <- record(select, .g = dim)
r_pivot_longer <- record(pivot_longer, .g = dim)
r_filter <- record(filter, .g = dim)
r_mutate <- record(mutate, .g = dim)
r_separate <- record(separate, .g = dim)
r_group_by <- record(group_by, .g = dim)
r_summarise <- record(summarise, .g = dim)

avia_clean <- avia %>%
  r_select(1, contains("20")) %>% # select the first column and every column whose name contains "20"
  bind_record(r_pivot_longer, -starts_with("unit"), names_to = "date", values_to = "passengers") %>%
  bind_record(r_separate,
              col = 1,
              into = c("unit", "tra_meas", "air_pr\\time"),
              sep = ",")

avia_monthly <- avia_clean %>%
  bind_record(r_filter,
              tra_meas == "PAS_BRD_ARR",
              !is.na(passengers),
              str_detect(date, "M")) %>%
  bind_record(r_mutate,
              date = paste0(date, "01"),
              date = ymd(date)) %>%
  bind_record(r_select,
              destination = "air_pr\\time", date, passengers)
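
To see the mechanics in isolation, here is a minimal sketch of the same pattern on a small data frame. The `toy` data and its column names are made up for illustration and are not part of the avia dataset. The first recorded function is called directly on the data, and later steps are chained with bind_record():

```r
library(chronicler)
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  id    = 1:6,
  group = rep(c("a", "b"), 3),
  value = c(10, NA, 30, 40, NA, 60)
)

# Decorate the dplyr verbs; dim() is evaluated on the result of each step
r_filter <- record(filter, .g = dim)
r_mutate <- record(mutate, .g = dim)

toy_clean <- toy %>%
  r_filter(!is.na(value)) %>%               # first step: plain data frame in, chronicle out
  bind_record(r_mutate, value = value * 2)  # later steps: chronicle in, chronicle out

# Use the namespaced call because dplyr also exports a pick() function
chronicler::pick(toy_clean, "value")
read_log(toy_clean)
```

The result is a chronicle object just like avia_monthly, only with two logged steps instead of six.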

avia_monthly is an object of class chronicle; in essence, it is just a list with its own print method:

avia_monthly
#> OK! Value computed successfully:
#> ---------------
#> Just
#> # A tibble: 7,632 × 3
#>    destination     date       passengers
#>    <chr>           <date>     <chr>     
#>  1 LU_ELLX_AT_LOWW 2018-03-01 3967      
#>  2 LU_ELLX_AT_LOWW 2018-02-01 3232      
#>  3 LU_ELLX_AT_LOWW 2018-01-01 3701      
#>  4 LU_ELLX_AT_LOWW 2017-12-01 4249      
#>  5 LU_ELLX_AT_LOWW 2017-11-01 4311      
#>  6 LU_ELLX_AT_LOWW 2017-10-01 4591      
#>  7 LU_ELLX_AT_LOWW 2017-09-01 4816      
#>  8 LU_ELLX_AT_LOWW 2017-08-01 4399      
#>  9 LU_ELLX_AT_LOWW 2017-07-01 4277      
#> 10 LU_ELLX_AT_LOWW 2017-06-01 4674      
#> # ℹ 7,622 more rows
#> 
#> ---------------
#> This is an object of type `chronicle`.
#> Retrieve the value of this object with pick(.c, "value").
#> To read the log of this object, call read_log(.c).

read_log(avia_monthly)
#> [1] "Complete log:"                                                                                                                   
#> [2] "OK! select(1,contains(\"20\")) ran successfully at 2024-03-20 10:35:20.318707"                                                   
#> [3] "OK! pivot_longer(-starts_with(\"unit\"),date,passengers) ran successfully at 2024-03-20 10:35:20.318605"                         
#> [4] "OK! separate(1,c(\"unit\", \"tra_meas\", \"air_pr\\\\time\"),,) ran successfully at 2024-03-20 10:35:20.318477"                  
#> [5] "OK! filter(tra_meas == \"PAS_BRD_ARR\",!is.na(passengers),str_detect(date, \"M\")) ran successfully at 2024-03-20 10:35:21.95984"
#> [6] "OK! mutate(paste0(date, \"01\"),ymd(date)) ran successfully at 2024-03-20 10:35:21.959718"                                       
#> [7] "OK! select(air_pr\\time,date,passengers) ran successfully at 2024-03-20 10:35:21.959596"                                         
#> [8] "Total running time: 1.80789136886597 secs"

This is especially useful if the object avia_monthly gets saved using saveRDS(). Anyone who later loads this object can read the log to see what happened and reproduce the steps if necessary.
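
A minimal sketch of such a round trip, using a trivial recorded function and a temporary file (both chosen here only for illustration), could look like this:

```r
library(chronicler)

# A trivial recorded function stands in for the full pipeline
r_sqrt <- record(sqrt)
res <- r_sqrt(16)

# Save the chronicle object and load it back, as a colleague might do
path <- tempfile(fileext = ".rds")
saveRDS(res, path)
restored <- readRDS(path)

# The log travels with the object
read_log(restored)

# And so does the value
chronicler::pick(restored, "value")
```

Because the log is stored inside the object itself, no separate log file has to be shipped alongside the .rds file.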

avia_monthly %>%
  pick("value")
#> # A tibble: 7,632 × 3
#>    destination     date       passengers
#>    <chr>           <date>     <chr>     
#>  1 LU_ELLX_AT_LOWW 2018-03-01 3967      
#>  2 LU_ELLX_AT_LOWW 2018-02-01 3232      
#>  3 LU_ELLX_AT_LOWW 2018-01-01 3701      
#>  4 LU_ELLX_AT_LOWW 2017-12-01 4249      
#>  5 LU_ELLX_AT_LOWW 2017-11-01 4311      
#>  6 LU_ELLX_AT_LOWW 2017-10-01 4591      
#>  7 LU_ELLX_AT_LOWW 2017-09-01 4816      
#>  8 LU_ELLX_AT_LOWW 2017-08-01 4399      
#>  9 LU_ELLX_AT_LOWW 2017-07-01 4277      
#> 10 LU_ELLX_AT_LOWW 2017-06-01 4674      
#> # ℹ 7,622 more rows

It is also possible to look at the output of the .g argument (defined at the beginning as the dim() function), which is stored in the underlying .log_df object along with more details:

check_g(avia_monthly)
#>   ops_number     function         g
#> 1          1       select  509, 231
#> 2          2 pivot_longer 117070, 3
#> 3          3     separate 117070, 5
#> 4          4       filter   7632, 5
#> 5          5       mutate   7632, 5
#> 6          6       select   7632, 3

After select(), the data has 509 rows and 231 columns. The call to pivot_longer() reshapes it to 117070 rows and 3 columns, and separate() adds two columns. After filter(), only 7632 rows remain. mutate() does not change the dimensions, and the final select() removes 2 columns.
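
If the dimensions are needed as numbers rather than as printed text, they can be pulled out of the data frame returned by check_g(). The sketch below uses the built-in mtcars data instead of avia, and it assumes that the g column is a list column holding one dim() result per step, as the printed output above suggests:

```r
library(chronicler)
library(dplyr)

r_filter <- record(filter, .g = dim)
r_select <- record(select, .g = dim)

res <- mtcars %>%
  r_filter(cyl == 4) %>%
  bind_record(r_select, mpg, cyl)

steps <- check_g(res)

# Assumption: each element of steps$g is the c(rows, cols) vector returned by dim()
steps$n_rows <- sapply(steps$g, function(d) d[1])
steps$n_cols <- sapply(steps$g, function(d) d[2])

steps[, c("ops_number", "function", "n_rows", "n_cols")]
```

With the row and column counts in separate columns, it becomes straightforward to flag steps that drop an unexpected number of rows.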
