The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Generating the When Dimension based on date and time tables

2024-01-08

Introduction

The When Dimension plays a fundamental role in Multidimensional Systems, it allows us to express when the analysed focus of attention have occurred.

This dimension corresponds to the generically called “Time Dimension”, as named in Kimball (1996) and Adamson (2010). Later it has also been called “Date Dimension” (date-based dimension) and “Time Dimension” (time-of-day-based dimension) to express granularity (Kimball and Ross 2013). We prefer to call it “When Dimension” because granularity is not involved in that term.

Although conceptually the date and time can be represented together in a single dimension, in ROLAP (Relational On-Line Analytical Processing) star database systems, it is common to implement these concepts separately, sometimes in separate tables “to avoid a row count explosion in the date dimension” (Kimball and Ross 2013) by adding the time for each day.

In Kimball and Ross (2013) these dimensions are considered static dimensions: they are built at the beginning of the project for the period under consideration and are referenced when adding the instances to the fact table. As mentioned there: “Typically, these dimensions are built in an afternoon with a spreadsheet.”

The purpose of the when package is to assist in the implementation of the When Dimension in Multidimensional Systems implemented on a ROLAP star database, regardless of its granularity, with the following features:

It relies on the functions offered by the lubridate package to obtain the components from the date and time.

The rest of this document is structured as follows: First, the general process of defining these dimensions is presented. Next, we present dimension-specific aspects that we can define, both for date-based and time-based dimensions. Finally, the document ends with conclusions and bibliography.

Definition process

The definition process consists of the following steps:

  1. Creating an object.
  2. Definition of general aspects.
  3. Definition of instances.
  4. Selection of levels and attributes.
  5. Generation of the result.
  6. Getting the dimension table.

Steps 2, 3 and 4 can be done in any order, they can even be included in step 1.

Below we discuss each of them.

Creating an object

The definition process begins by creating an object of the when class.

library(when)

w_date <- when()

By default, an object is created based on date, but we can also create it based on time by indicating it using the type parameter.

w_time <- when(type = 'time')

Virtually all configuration options can be defined using the object creation function. They can also be defined later using specific functions, as shown in the following section.

Definition of general aspects

For example, using the following combination of functions we get the same result as defining the type when we create the object.

w_time_2 <- when() |>
  define_characteristics(type = 'time')

identical(w_time, w_time_2)
#> [1] TRUE

By default it considers the system language. Since our system is Spanish, for the literals of day and month names to appear in English, we must configure it explicitly.

w_date <- w_date |>
  define_characteristics(locale = Sys.setlocale("LC_TIME", "English"))

In section Other configuration options we present the rest of the configuration options for dimension generation. For the examples that we are going to use, the default options of the package are enough.

Next we have to select and configure the levels and define the instances. These operations, along with the current one, can be performed in any order.

Definition of instances

We can define the instances by giving a range of values or by indicating the specific instances. The values in both cases will depend on the type of dimension (date-based or time-based).

We can indicate these values at the time of creating the class object or through the define_instances() function.

In both cases we can use the string format, the one provided by the lubridate package or, when it is a single number (year or hour), we can use an integer value.

For example, below they are used with the format of lubridate.

w_date <- w_date |>
  define_instances(start = lubridate::today(),
                   end = lubridate::today() + lubridate::years(5))

In the following definitions we configure the dimensions so that they consider only the indicated dates. The second will be appropriate if we want to obtain the dimension at the year level, because we are indicating the specific values to consider (not a period).

w_date_2_1 <-
  when(
    values = c(
      "2023-12-31",
      "2023-01-01",
      "2022-12-31",
      "2022-01-01",
      "2021-12-31",
      "2021-01-01"
    )
  )

w_date_2_2 <- w_date |>
  define_instances(values = 2020:2030)

The following definition will be appropriate at any level of detail because we are giving the range of values to consider (dates between those two years).

w_date_3 <- w_date |>
  define_instances(start = 2020, end = 2030)

When only the year is indicated, it is considered its first day. Therefore the previous definition is equivalent to the following one.

w_date_4 <- w_date |>
  define_instances(start = "2020-01-01", end = "2030-01-01")

identical(w_date_3, w_date_4)
#> [1] TRUE

In the case of time, by default, it is considered at second level and all seconds of the day.

w_time_3 <- w_time |>
  define_instances(start = "00:00:00", end = "23:59:59")

identical(w_time, w_time_3)
#> [1] TRUE

We can also indicate a period or a set of times according to the indicated criteria. In the following example, the two definitions are equivalent.

w_time_4 <- w_time |>
  define_instances(start = 8, end = 17)

w_time_5 <- w_time |>
  define_instances(start = "08:00:00", end = "17:00:00")

identical(w_time_4, w_time_5)
#> [1] TRUE

Selection of levels and attributes

In the definition, by default, the attributes that we have considered to be most frequently used have been included. We can consult them for each level using the get_level_attribute_names() function with the parameter selected = TRUE. If we define the parameter as selected = FALSE (default option), it will return all available attributes. We can indicate the name of the level or, if we do not indicate any, it will obtain the attributes of all the levels. We obtain the level names using the get_level_names() function. Below are examples of using these functions for the date-based dimension.

w_date |>
  get_level_attribute_names(selected = TRUE)
#>  [1] "date"           "month_day"      "week_day"       "day_name"      
#>  [5] "day_num_name"   "year_week"      "week"           "year_month"    
#>  [9] "month"          "month_name"     "month_num_name" "year"

w_date |>
  get_level_names()
#> [1] "day"      "week"     "month"    "quarter"  "semester" "year"

w_date |>
  get_level_attribute_names(name = 'month', selected = TRUE)
#> [1] "year_month"     "month"          "month_name"     "month_num_name"

w_date |>
  get_level_attribute_names(name = 'month')
#> [1] "year_month"     "month"          "month_name"     "month_num_name"
#> [5] "month_abbr"     "month_num_abbr"

Also for the time-based dimension.

w_time |>
  get_level_attribute_names()
#> [1] "time"     "hour"     "minute"   "second"   "day_part"

w_time |>
  get_level_names()
#> [1] "time"

There is a function to configure each level. We can select or deselect each attribute individually using the attribute name as a boolean parameter of the function. We can also select or deselect all attributes defined at the level using the special include_all and exclude_all parameters: Once all the attributes have been updated with these special parameters, in the same function call, we can select or deselect the attributes we want.

For example, if we do not want to include the name of the month, we can configure it using the select_month_level() function or at the time of object creation with a parameter of the same name, as shown below.

w_date_5 <- w_date |>
  select_month_level(month_name = FALSE)

w_date_6 <- when(
  start = lubridate::today(),
  end = lubridate::today() + lubridate::years(5),
  month_name = FALSE
)

identical(w_date_5, w_date_6)
#> [1] TRUE

w_date_5 |>
  get_level_attribute_names(name = 'month', selected = TRUE)
#> [1] "year_month"     "month"          "month_num_name"

If we only want to include the name of the month, we can do it as follows.

w_date_7 <- w_date |>
  select_month_level(exclude_all = TRUE, month_name = TRUE)

w_date_7 |>
  get_level_attribute_names(name = 'month', selected = TRUE)
#> [1] "month_name"

There is available a similar function for each level: time, date, week, month, quarter, semester and year.

In the case of the date-based dimension levels, we can select or deselect them directly using the select_date_levels() function. It allows us to treat the levels globally in the same way that we have been treating the attributes. For example, to exclude the month level we can do the following.

w_date_8 <- w_date |>
  select_date_levels(month_level = FALSE)

w_date_8 |>
  get_level_attribute_names(name = 'month', selected = TRUE)
#> character(0)

We can also perform that same operation at object creation time, as shown below.

w_date_9 <- when(
  start = lubridate::today(),
  end = lubridate::today() + lubridate::years(5),
  month_level = FALSE
)

In the case of a time-based dimension, as we have seen, we only consider one level called “time”, as shown again below.

w_time |>
  get_level_names()
#> [1] "time"

We can configure the attributes it includes using the select_time_level() function. The attribute called “hour” must always be included. If we do not include the “minute” attribute then “second” will not be included either. Below are some examples of definition and the result obtained.

w_time_6 <- w_time |>
  select_time_level(exclude_all = TRUE)

w_time_6 |>
  get_level_attribute_names(selected = TRUE)
#> [1] "hour"

w_time_7 <- w_time |>
  select_time_level(minute = FALSE)

w_time_7 |>
  get_level_attribute_names(selected = TRUE)
#> [1] "time"     "hour"     "day_part"

Generation of the result

Once the characteristics of the dimension are defined, we can generate its table.

Although we have not yet generated the dimension table, the following function returns the names of the attributes it will contain. They can be obtained as a string or as a vector of strings.

w_date |>
  get_table_attribute_names(as_string = FALSE)
#>  [1] "id"             "date"           "month_day"      "week_day"      
#>  [5] "day_name"       "day_num_name"   "year_week"      "week"          
#>  [9] "year_month"     "month"          "month_name"     "month_num_name"
#> [13] "year"

If it is generated in the form of a string (as_string = TRUE, default option), the result can be used to easily change the name of the attributes, when the table is generated, using the set_table_attribute_names() function, if deemed necessary, as shown in section Renaming the attributes.

When we consider that the dimension is already properly defined, we can generate its corresponding table using the generate_table() function. We update the definition of the object. Any errors found will be notified to us.

w_date <- w_date |>
  generate_table()

w_time <- w_time |>
  generate_table()

Once the table is generated, we can export it. If we want to access the table and it is not yet generated, it will send us an error message.

If we make changes to the configuration, it is our responsibility to generate the table again.

Getting the dimension table

We can access the dimension table to work with it in R (for example using the rolap package), or export it in the format that is most convenient for us to work with other tools.

Next we access the generated tables and display them. First of all for the date.

t_date <- w_date |>
  get_table()

The first and last rows of the obtained result are shown below.

pander::pandoc.table(rbind(head(t_date, 5), tail(t_date, 5)),
                     split.table = Inf)
id date month_day week_day day_name day_num_name year_week week year_month month month_name month_num_name year
1 2024-01-08 08 1 lunes 1-lunes 2024-02 02 2024-01 01 enero 01-enero 2024
2 2024-01-09 09 2 martes 2-martes 2024-02 02 2024-01 01 enero 01-enero 2024
3 2024-01-10 10 3 miércoles 3-miércoles 2024-02 02 2024-01 01 enero 01-enero 2024
4 2024-01-11 11 4 jueves 4-jueves 2024-02 02 2024-01 01 enero 01-enero 2024
5 2024-01-12 12 5 viernes 5-viernes 2024-02 02 2024-01 01 enero 01-enero 2024
1824 2029-01-04 04 4 jueves 4-jueves 2029-01 01 2029-01 01 enero 01-enero 2029
1825 2029-01-05 05 5 viernes 5-viernes 2029-01 01 2029-01 01 enero 01-enero 2029
1826 2029-01-06 06 6 sábado 6-sábado 2029-01 01 2029-01 01 enero 01-enero 2029
1827 2029-01-07 07 7 domingo 7-domingo 2029-01 01 2029-01 01 enero 01-enero 2029
1828 2029-01-08 08 1 lunes 1-lunes 2029-02 02 2029-01 01 enero 01-enero 2029

Let’s generate the dimension at the week level including all its attributes (if we change the definition, we have to generate the table again).

t_date <- w_date |>
  select_date_levels(day_level = FALSE) |>
  select_week_level(include_all = TRUE) |>
  generate_table() |>
  get_table()

The first and last rows of the obtained result are shown below.

pander::pandoc.table(rbind(head(t_date, 5), tail(t_date, 5)),
                     split.table = Inf)
id year_week week year_month month month_name month_num_name year
1 2024-02 02 2024-01 01 enero 01-enero 2024
2 2024-03 03 2024-01 01 enero 01-enero 2024
3 2024-04 04 2024-01 01 enero 01-enero 2024
4 2024-05 05 2024-01 01 enero 01-enero 2024
5 2024-05 05 2024-02 02 febrero 02-febrero 2024
310 2028-51 51 2028-12 12 diciembre 12-diciembre 2028
311 2028-52 52 2028-12 12 diciembre 12-diciembre 2028
312 2028-53 53 2028-12 12 diciembre 12-diciembre 2028
313 2029-01 01 2029-01 01 enero 01-enero 2029
314 2029-02 02 2029-01 01 enero 01-enero 2029

Next for the time.

t_time <- w_time |>
  get_table()

The start and end of the result table is shown below.

pander::pandoc.table(rbind(head(t_time, 5), tail(t_time, 5)),
                     split.table = Inf)
id time hour minute second day_part
1 00:00:00 00 00 00 Night
2 00:00:01 00 00 01 Night
3 00:00:02 00 00 02 Night
4 00:00:03 00 00 03 Night
5 00:00:04 00 00 04 Night
86396 23:59:55 23 59 55 Night
86397 23:59:56 23 59 56 Night
86398 23:59:57 23 59 57 Night
86399 23:59:58 23 59 58 Night
86400 23:59:59 23 59 59 Night

day_part is predefined in English. Literals or associated hours can be accessed and changed using the get_day_part() and set_day_part() functions, as shown in section Names of the parts of the day.

Now we are going to generate it at the minute level.

t_time <- w_time |>
  select_time_level(second = FALSE) |>
  generate_table() |>
  get_table()

The start and end from the new result table is shown below.

pander::pandoc.table(rbind(head(t_time, 5), tail(t_time, 5)),
                     split.table = Inf)
id time hour minute day_part
1 00:00:00 00 00 Night
2 00:01:00 00 01 Night
3 00:02:00 00 02 Night
4 00:03:00 00 03 Night
5 00:04:00 00 04 Night
1436 23:55:00 23 55 Night
1437 23:56:00 23 56 Night
1438 23:57:00 23 57 Night
1439 23:58:00 23 58 Night
1440 23:59:00 23 59 Night

We can take the generated tables to any Relational DBMS, as shown below, we just have to pass a connection to the database as a parameter.

my_db <- DBI::dbConnect(RSQLite::SQLite())

w_date |>
  get_table_rdb(my_db)

w_time |>
  get_table_rdb(my_db)

DBI::dbListTables(my_db)
#> [1] "date" "time"

DBI::dbDisconnect(my_db)

In the previous example, since we have not explicitly defined any names for the tables in the configuration, they have been assigned as a name the element on which they are based (date or time). We can define these names, as shown in section Table name.

Functions to get the tables in xlsx and csv format are also available: get_table_xlsx() and get_table_csv().

Other configuration options

In section Definition of general aspects we have presented the necessary configuration options for the examples we have used. In particular, we have only needed to define the type on which the dimension is based (time or date) and the language for naming days and months (because it is different from the language of our operating system). The default options have been adequate for the examples shown.

In this section we are going to show other configuration options that may be necessary to define our dimensions.

Dimension table features

The result of a definition operation performed with the where package is a table of a dimension at a given level of granularity, with the attributes selected for that level or associated coarser levels. So far we have focused on level attributes and table instances. In this section we are going to configure other characteristics of it.

Table name

By default, the table is assigned the name of the base component used to generate the dimension: date or time. We have seen it in the tables that we have stored in the database, in the example of the section Getting the dimension table.

We can define the name of the dimension at the time of object creation or through the define_characteristics() function, as shown below.

wd_1 <- when(name = 'dim_where')

wd_2 <- when() |>
  define_characteristics(name = 'dim_where')

This name is the one that will be assigned to the database table where we store it or to the files we obtain. Below is the case for a relational database.

my_db <- DBI::dbConnect(RSQLite::SQLite())

wd_1 |>
  generate_table() |>
  get_table_rdb(my_db)

DBI::dbListTables(my_db)
#> [1] "dim_where"

DBI::dbDisconnect(my_db)

Also the case of obtaining it in csv format (if we do not indicate a location, it stores it in one by default).

wd_2 |>
  generate_table() |>
  get_table_csv()
#> [1] "C:\\Users\\joses\\AppData\\Local\\Temp\\Rtmp6XNQKs/dim_where.csv"

Surrogate key

By default the dimension table will include a surrogate key. In some situations we may be interested in the table having the date itself as its primary key (for example if we want to partition the fact table by the date).

Through the surrogate_key parameter, included in the define_characteristics() function and also the when() function, we can indicate that a surrogate key is not created, as it’s shown in the following.

when() |>
  get_table_attribute_names(as_string = FALSE)
#>  [1] "id"             "date"           "month_day"      "week_day"      
#>  [5] "day_name"       "day_num_name"   "year_week"      "week"          
#>  [9] "year_month"     "month"          "month_name"     "month_num_name"
#> [13] "year"

when(surrogate_key = FALSE) |>
  get_table_attribute_names(as_string = FALSE)
#>  [1] "date"           "month_day"      "week_day"       "day_name"      
#>  [5] "day_num_name"   "year_week"      "week"           "year_month"    
#>  [9] "month"          "month_name"     "month_num_name" "year"

The name of the surrogate key is id, is the first attribute of the table. If necessary, we can rename the attributes, including the surrogate key as shown in the next section.

Renaming the attributes

If we get a table in tibble format, we can rename its attributes using the functions of this class. But if we want to obtain it in a file or in a database, we may be interested in renaming its attributes before obtaining it.

In order to rename the attributes, we first have to generate the table.

wd_3 <- when() |>
  generate_table()

We can rename them using the set_table_attribute_names() function, passing a name for each of the available attributes. To facilitate the operation, using the get_table_attribute_names() function, we can obtain the attributes in string format. With this we only have to replace the names that we want to change, as shown below.

wd_3 |>
  get_table_attribute_names()
#> [1] "c('id', 'date', 'month_day', 'week_day', 'day_name', 'day_num_name', 'year_week', 'week', 'year_month', 'month', 'month_name', 'month_num_name', 'year')"

wd_3 <- wd_3 |>
  set_table_attribute_names(
    c(
      'id_when',
      'date',
      'month_day',
      'week_day',
      'day_name',
      'day_num_name',
      'year_week',
      'week',
      'year_month',
      'month',
      'month_name',
      'month_num_name',
      'year'
    )
  )

wd_3 |>
  get_table_attribute_names(as_string = FALSE)
#>  [1] "id_when"        "date"           "month_day"      "week_day"      
#>  [5] "day_name"       "day_num_name"   "year_week"      "week"          
#>  [9] "year_month"     "month"          "month_name"     "month_num_name"
#> [13] "year"

Names of the parts of the day

We have associated each hour with the usual part of the day in English. We can consult them and change them if necessary (for example, to define them in another language) using the get_day_part() and set_day_part() functions.

Below is an example where the definition is modified to consider only two parts in the day.

when() |>
  get_day_part()
#>          00          01          02          03          04          05 
#>     "Night"     "Night"     "Night"     "Night"     "Night"   "Morning" 
#>          06          07          08          09          10          11 
#>   "Morning"   "Morning"   "Morning"   "Morning"   "Morning"   "Morning" 
#>          12          13          14          15          16          17 
#> "Afternoon" "Afternoon" "Afternoon" "Afternoon" "Afternoon"   "Evening" 
#>          18          19          20          21          22          23 
#>   "Evening"   "Evening"   "Evening"     "Night"     "Night"     "Night"

when() |>
  set_day_part(hour = c(20:23, 0:5), name = "Night") |>
  set_day_part(hour = c(6:19), name = "Day") |>
  get_day_part()
#>      00      01      02      03      04      05      06      07      08      09 
#> "Night" "Night" "Night" "Night" "Night" "Night"   "Day"   "Day"   "Day"   "Day" 
#>      10      11      12      13      14      15      16      17      18      19 
#>   "Day"   "Day"   "Day"   "Day"   "Day"   "Day"   "Day"   "Day"   "Day"   "Day" 
#>      20      21      22      23 
#> "Night" "Night" "Night" "Night"

Configuration of day and week levels

This section shows the configuration options available for the day and week levels.

The first day of the week

For days, in addition to the language of the name of the day of the week, we can define the start day of the week. There are two possibilities: Monday or Sunday. This determines the numbering of the days.

We can define it using the parameter week_starts_monday (which by default has the value TRUE) in the functions define_characteristics() or when(), as shown below.

wd_1 <- when(week_starts_monday = FALSE)

wd_2 <- when() |>
  define_characteristics(week_starts_monday = FALSE)

Numbering of weeks in the year

Using the value of the type parameter (type = "date" or type = "time") in the define_characteristics() or when() functions we can indicate whether the dimension is based on date or time.

This parameter can additionally take two more values: “iso” and “epi”. Using these values (together with the “date” value) we can indicate the type of numbering of the weeks in the year. That is, the week number associated with each date depends on the type of date dimension selected:

  • Default (type = "date"): It numbers blocks of 7 days beginning on January 1 (regardless of what day of the week it is). The last week of the year will last less than 7 days.

  • ISO 8601 (type = "iso"): It numbers blocks of 7 days from Monday to Sunday. The first and last week of the year can contain days from the previous or next year.

  • Epidemiological (type = "epi"): This week is like ISO 8601 only that it considers that the week begins on Sunday.

For the first and last days of the year, the year associated with the week may be different from the year of the date, depending on the type selected (“iso” or “epi”).

The “year” attribute (alone or in combination with other attributes) displays the year associated with the date, regardless of the type selected. Only the “year_week” attribute displays the year of the week according to the selected type.

The next section shows how to obtain the attribute obtaining function and how to change it if we consider it necessary (for example, to get the different year types together).

Get date range of weeks

If we want to work with the dimension at the week level, we may be interested in having the start or end date for each week. To obtain them and be able to use them to generate the dimension, we have the get_week_date_range() function.

t <- get_week_date_range(start = "2024-01-01", end = "2029-12-31")

A period and the type of week numbering (“date”, “iso” or “epi”) are indicated (by default it is “date”). Returns the result in tibble form.

The start and end from the new result is shown below.

pander::pandoc.table(rbind(head(t, 5), tail(t, 5)),
                     split.table = Inf)
year_week first last
2024-01 2024-01-01 2024-01-07
2024-02 2024-01-08 2024-01-14
2024-03 2024-01-15 2024-01-21
2024-04 2024-01-22 2024-01-28
2024-05 2024-01-29 2024-02-04
2029-49 2029-12-03 2029-12-09
2029-50 2029-12-10 2029-12-16
2029-51 2029-12-17 2029-12-23
2029-52 2029-12-24 2029-12-30
2029-53 2029-12-31 2029-12-31

We can use the column of the dates that interest us to generate the dimension. Additionally, we leave only the date and the levels that interest us (week and year).

tw <- when(values = t[["last"]], month_level = FALSE) |>
  select_day_level(exclude_all = TRUE, date = TRUE) |>
  generate_table() |>
  get_table()

The start and end from the new result table is shown below.

pander::pandoc.table(rbind(head(tw, 5), tail(tw, 5)),
                     split.table = Inf)
id date year_week week year
1 2024-01-07 2024-01 01 2024
2 2024-01-14 2024-02 02 2024
3 2024-01-21 2024-03 03 2024
4 2024-01-28 2024-04 04 2024
5 2024-02-04 2024-05 05 2024
314 2029-12-09 2029-49 49 2029
315 2029-12-16 2029-50 50 2029
316 2029-12-23 2029-51 51 2029
317 2029-12-30 2029-52 52 2029
318 2029-12-31 2029-53 53 2029

Change the definition function of an attribute

Each attribute is defined by a specific function. All definition functions have the same structure: they receive a table, a column with the date or time values, possibly additional configuration parameters, and return the table to which a new column with the attribute name and values has been added, calculated from the values.

We can get the definition function of any attribute using the get_attribute_definition_function() function. Below is the function for the “year” attribute and also the function of the year and week combination, “year_week” attribute (it uses one of the additional parameters available, the dimension type).

wd <- when()

wd |>
  get_attribute_definition_function(name = "year")
#> function (table, values, ...) 
#> {
#>     table[["year"]] <- as.character(lubridate::year(values))
#>     table
#> }
#> <bytecode: 0x000001cf0e5cdc48>
#> <environment: namespace:when>

wd |>
  get_attribute_definition_function(name = "year_week")
#> function (table, values, ...) 
#> {
#>     dots <- list(...)
#>     type <- dots[["type"]]
#>     switch(type, iso = {
#>         year <- lubridate::isoyear(values)
#>         week <- lubridate::isoweek(values)
#>     }, epi = {
#>         year <- lubridate::epiyear(values)
#>         week <- lubridate::epiweek(values)
#>     }, {
#>         year <- lubridate::year(values)
#>         week <- lubridate::week(values)
#>     })
#>     table[["year_week"]] <- paste0(year, "-", sprintf("%02d", 
#>         week))
#>     table
#> }
#> <bytecode: 0x000001cf0e628150>
#> <environment: namespace:when>

In view of the functions, we can define one with the same structure and establish it as a new attribute definition function using the set_attribute_definition_function() function. For example, below shows how to change the year attribute definition function to get an additional attribute if the year associated with the week can be different from the year of the date.

f <- function(table, values, ...) {
  dots <- list(...)
  type <- dots[['type']]
  table[['year']] <- as.character(lubridate::year(values))
  if (type == 'iso') {
    table[['week_year']] <- as.character(lubridate::isoyear(values))
  } else if (type == 'epi') {
    table[['week_year']] <- as.character(lubridate::epiyear(values))
  }
  table
}

wd <- wd |>
  set_attribute_definition_function(name = "year", f)

In this case, what we do is add two columns for the year: one with the year of the date and another with the year of the week if this can be different from the first. We can see the result by changing the type of the dimension and generating the table, as shown below (the two year attributes have been added).

t <- wd |>
  define_characteristics(type = 'iso') |>
  generate_table() |>
  get_table()

names(t)
#>  [1] "id"             "date"           "month_day"      "week_day"      
#>  [5] "day_name"       "day_num_name"   "year_week"      "week"          
#>  [9] "year_month"     "month"          "month_name"     "month_num_name"
#> [13] "year"           "week_year"

Conclusions

The when package offers a set of operations that allow us to generate the When Dimension based on date or time at the level of detail we need.

We create an object of the when class and, from it, we configure the instances and characteristics of the levels and attributes of the dimension. To define the instances we can indicate a range of values or provide the specific values from which to generate the rest of the attributes.

As a result, we can obtain tables for this dimension to work from R, for example with the rolap package, or work in other environments, including Relational DBMSs.

Bibliography

Adamson, Christopher. 2010. Star Schema: The Complete Reference. McGraw Hill Professional.
Kimball, Ralph. 1996. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons.
Kimball, Ralph, and Margy Ross. 2013. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. John Wiley & Sons.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.