This is practically the same code you can find in this blog post of mine: https://www.brodrigues.co/blog/2018-11-14-luxairport/ but with some minor updates to reflect the current state of the {tidyverse} packages, as well as logging using {chronicler}.
Let’s first load the required packages and the avia dataset included in the {chronicler} package:
library(chronicler)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:testthat':
#>
#> matches
#> The following object is masked from 'package:chronicler':
#>
#> pick
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
#>
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:testthat':
#>
#> matches
library(stringr)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
# Ensure chronicler version of `pick()` is being used
pick <- chronicler::pick
data("avia")
Now I need to define the functions needed for the analysis. To improve logging, I pass the dim() function as the .g argument of each function below. This makes it possible to see how the dimensions of the data change inside the pipeline:
# Define required functions
# You can use `record_many()` to avoid having to write everything
r_select <- record(select, .g = dim)
r_pivot_longer <- record(pivot_longer, .g = dim)
r_filter <- record(filter, .g = dim)
r_mutate <- record(mutate, .g = dim)
r_separate <- record(separate, .g = dim)
r_group_by <- record(group_by, .g = dim)
r_summarise <- record(summarise, .g = dim)
avia_clean <- avia %>%
  r_select(1, contains("20")) %>% # select the first column and every column whose name contains "20"
  bind_record(r_pivot_longer, -starts_with("unit"), names_to = "date", values_to = "passengers") %>%
  bind_record(r_separate,
              col = 1,
              into = c("unit", "tra_meas", "air_pr\\time"),
              sep = ",")
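To see what the .g argument records in isolation, here is a minimal sketch on the built-in mtcars dataset. It is not part of the original analysis: the toy object and the choice of columns are assumptions made for illustration, and only functions already used in this document (record(), bind_record(), check_g(), pick()) are called.

```r
library(chronicler)
library(dplyr)

# Decorate two dplyr verbs; `.g = dim` is evaluated on the output of each step
r_filter <- record(filter, .g = dim)
r_select <- record(select, .g = dim)

toy <- mtcars %>%
  r_filter(cyl == 6) %>%            # the first step receives the raw data frame
  bind_record(r_select, mpg, cyl)   # later steps are chained with bind_record()

# One row per step; the `g` column holds the recorded dimensions
check_g(toy)

# dplyr also exports a `pick()`, so the namespace is spelled out here
chronicler::pick(toy, "value")
```

Swapping dim for another function of the data, such as nrow, would record that value instead at every step.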
Let’s focus on monthly data:
avia_monthly <- avia_clean %>%
  bind_record(r_filter,
              tra_meas == "PAS_BRD_ARR",
              !is.na(passengers),
              str_detect(date, "M")) %>%
  bind_record(r_mutate,
              date = paste0(date, "01"),
              date = ymd(date)) %>%
  bind_record(r_select,
              destination = "air_pr\\time", date, passengers)
avia_monthly is an object of class chronicle, but in essence, it is just a list with its own print method:
avia_monthly
#> OK! Value computed successfully:
#> ---------------
#> Just
#> # A tibble: 7,632 × 3
#> destination date passengers
#> <chr> <date> <chr>
#> 1 LU_ELLX_AT_LOWW 2018-03-01 3967
#> 2 LU_ELLX_AT_LOWW 2018-02-01 3232
#> 3 LU_ELLX_AT_LOWW 2018-01-01 3701
#> 4 LU_ELLX_AT_LOWW 2017-12-01 4249
#> 5 LU_ELLX_AT_LOWW 2017-11-01 4311
#> 6 LU_ELLX_AT_LOWW 2017-10-01 4591
#> 7 LU_ELLX_AT_LOWW 2017-09-01 4816
#> 8 LU_ELLX_AT_LOWW 2017-08-01 4399
#> 9 LU_ELLX_AT_LOWW 2017-07-01 4277
#> 10 LU_ELLX_AT_LOWW 2017-06-01 4674
#> # ℹ 7,622 more rows
#>
#> ---------------
#> This is an object of type `chronicle`.
#> Retrieve the value of this object with pick(.c, "value").
#> To read the log of this object, call read_log(.c).
Now that the data is clean, we can read the log:
read_log(avia_monthly)
#> [1] "Complete log:"
#> [2] "OK! select(1,contains(\"20\")) ran successfully at 2024-03-20 10:35:20.318707"
#> [3] "OK! pivot_longer(-starts_with(\"unit\"),date,passengers) ran successfully at 2024-03-20 10:35:20.318605"
#> [4] "OK! separate(1,c(\"unit\", \"tra_meas\", \"air_pr\\\\time\"),,) ran successfully at 2024-03-20 10:35:20.318477"
#> [5] "OK! filter(tra_meas == \"PAS_BRD_ARR\",!is.na(passengers),str_detect(date, \"M\")) ran successfully at 2024-03-20 10:35:21.95984"
#> [6] "OK! mutate(paste0(date, \"01\"),ymd(date)) ran successfully at 2024-03-20 10:35:21.959718"
#> [7] "OK! select(air_pr\\time,date,passengers) ran successfully at 2024-03-20 10:35:21.959596"
#> [8] "Total running time: 1.80789136886597 secs"
This is especially useful if the object avia_monthly gets saved using saveRDS(). People who later load this object can read the log to find out what happened and reproduce the steps if necessary.
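As a small sketch of that round trip, the snippet below saves a chronicle object to a temporary file and reads it back. The r_sqrt function, the input value, and the temporary path are hypothetical stand-ins for the real pipeline; saveRDS() and readRDS() are base R.

```r
library(chronicler)

# A tiny stand-in for the pipeline above: one recorded step
r_sqrt <- record(sqrt)
result <- r_sqrt(16)

# Save the whole chronicle object, log included
path <- tempfile(fileext = ".rds")
saveRDS(result, path)

# Someone else can later load the object and inspect what happened
restored <- readRDS(path)
read_log(restored)
chronicler::pick(restored, "value")
```

Because the log travels inside the object, nothing besides the .rds file needs to be shared.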
Let’s take a look at the final data set:
avia_monthly %>%
pick("value")
#> # A tibble: 7,632 × 3
#> destination date passengers
#> <chr> <date> <chr>
#> 1 LU_ELLX_AT_LOWW 2018-03-01 3967
#> 2 LU_ELLX_AT_LOWW 2018-02-01 3232
#> 3 LU_ELLX_AT_LOWW 2018-01-01 3701
#> 4 LU_ELLX_AT_LOWW 2017-12-01 4249
#> 5 LU_ELLX_AT_LOWW 2017-11-01 4311
#> 6 LU_ELLX_AT_LOWW 2017-10-01 4591
#> 7 LU_ELLX_AT_LOWW 2017-09-01 4816
#> 8 LU_ELLX_AT_LOWW 2017-08-01 4399
#> 9 LU_ELLX_AT_LOWW 2017-07-01 4277
#> 10 LU_ELLX_AT_LOWW 2017-06-01 4674
#> # ℹ 7,622 more rows
It is also possible to take a look at the underlying .log_df object, which contains more details, and see the output of the .g argument (which was defined at the beginning as the dim() function):
check_g(avia_monthly)
#> ops_number function g
#> 1 1 select 509, 231
#> 2 2 pivot_longer 117070, 3
#> 3 3 separate 117070, 5
#> 4 4 filter 7632, 5
#> 5 5 mutate 7632, 5
#> 6 6 select 7632, 3
After select() the data has 509 rows and 231 columns; after the call to pivot_longer() it has 117070 rows and 3 columns; separate() adds two columns; after filter() only 7632 rows remain (mutate() does not change the dimensions); and finally select() is used to remove 2 columns.
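These numbers can be checked by hand: pivoting 230 of the 231 columns yields 509 × 230 = 117070 rows, and separate() replaces one column with three, for a net gain of two. The sketch below reproduces the same arithmetic on a made-up two-row data frame; the column names and values are invented for illustration and are not taken from the avia data.

```r
library(tidyr)

# Toy wide table: one key column holding three comma-separated fields,
# plus two "year" columns
wide <- data.frame(
  id = c("LU,PAS,A", "LU,PAS,B"),
  `2017` = c(1, 2),
  `2018` = c(3, 4),
  check.names = FALSE
)
dim(wide)       # 2 rows, 3 columns

# Pivoting the 2 year columns gives 2 * 2 = 4 rows and 3 columns
long <- pivot_longer(wide, -id, names_to = "date", values_to = "passengers")
dim(long)       # 4 rows, 3 columns

# Splitting 1 column into 3 adds 2 columns and leaves the rows unchanged
separated <- separate(long, col = 1,
                      into = c("unit", "tra_meas", "airport"),
                      sep = ",")
dim(separated)  # 4 rows, 5 columns
```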