The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction to forestry

Jiena Gu McLellan

2023-11-20

Built on the top of data.tree, a Node (tree) is an R6 object that is especially useful when we are facing hierarchical data. The forestry package helps to reshape or create tree objects. This package is a series of utility functions to help with nested data. Since data.tree has the capability to convert a tree to JSON using toJSON() after converting to a list using as.list(), the forestry package is particularly useful when creating a specific JSON object for building htmlwidgets. The forestry package aims to reshape or create tree objects with a specific format.

Create a Node with Assigned Attributes

create_nodes() creates a Node object. tree_name is to assign the name of this Node. add_children_count is to assign the number of children to this Node, it will be listed in numerical order. To assign values to each node, simply put the appropriate variable as a parameter with a vector containing the values. The name of the parameter will be the variable name and the values in the vector will be assigned to each node respectively.

library(data.tree)
library(forestry)
new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         class = c("A", "B", "C") )
print(new_node, "class")
#>   levelName class
#> 1     tree1      
#> 2      ¦--1     A
#> 3      ¦--2     B
#> 4      °--3     C

Fill Missing Values Across a Level

The fill_NA_level() function will fill missing values across the desired level with desired value (default as 0). For example, new_node is a tree with missing value in hc field.

new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         hc = c(1, 2, NA))
print(new_node, "hc" )
#>   levelName hc
#> 1     tree1 NA
#> 2      ¦--1  1
#> 3      ¦--2  2
#> 4      °--3 NA

We apply fill_NA_level() to new_node, simply put new_node as input_node, assign the field_name with hc, and assign by_level = 2, we will fill the NA in hc field with 0 across level 2.

result <- fill_NA_level(input_node = new_node, 
                        field_name = "hc", 
                        by_level = 2, 
                        fill_with = 0)
print(result, "hc")
#>   levelName hc
#> 1     tree1 NA
#> 2      ¦--1  1
#> 3      ¦--2  2
#> 4      °--3  0

Create a Tree From a List

create_tree() creates a new tree from a list. It appends each item of the input list as a numbered child in the new tree. This is useful when we convert a Node to a JSON array.

For instance, let’s use test_node$children (a list) as an example. We can see a list of groupA, groupB and groupC.

data(test_df)
test_node <- data.tree::as.Node(test_df)
print(test_node$children)
#> $groupA
#>    levelName
#> 1 groupA    
#> 2  ¦--Male  
#> 3  °--Female
#> 
#> $groupB
#>    levelName
#> 1 groupB    
#> 2  ¦--Male  
#> 3  °--Female
#> 
#> $groupC
#>    levelName
#> 1 groupC    
#> 2  ¦--Male  
#> 3  °--Female

Now we see that this list is reshaped into a list, new_tree, with each item in test_node$children added as a child. The index of each item in the list is assigned as the name of each child.

library(data.tree)
test_node <- as.Node(test_df)
new_shape <- create_tree(test_node$children,"new_tree")
print(new_shape, "hc")
#>             levelName hc
#> 1  new_tree           NA
#> 2   ¦--1              NA
#> 3   ¦   °--groupA     NA
#> 4   ¦       ¦--Male   80
#> 5   ¦       °--Female 97
#> 6   ¦--2              NA
#> 7   ¦   °--groupB     NA
#> 8   ¦       ¦--Male   44
#> 9   ¦       °--Female 37
#> 10  °--3              NA
#> 11      °--groupC     NA
#> 12          ¦--Male   81
#> 13          °--Female 46

Expand Children Nodes

fix_items() creates a tree with fixed children nodes from another tree. It automatically copies fields to the tree and fills missing values with NA. Similar to left joining to a tree with certian children nodes.

This function is to make sure the tree has the desired children nodes.

See cell_node2, it has only B and C.

cell_node2 <- Node$new("cell2")
cell_node2$AddChild("B")
cell_node2$AddChild("C")
cell_node2$Set(class = c(NA, "B1", "C1"))
print(cell_node2, "class")
#>   levelName class
#> 1     cell2      
#> 2      ¦--B    B1
#> 3      °--C    C1

Now we put fix_vector = c("A", "B", "C", "D") and assign to a new tree, cell_fixed_items. We can see that cell_fixed_items has all of the nodes from fix_vector and still inherits the fields from cell_node2.

cell_fixed_items <- fix_items(fix_vector = c("A", "B", "C", "D"), 
                              input_node = cell_node2)
print(cell_fixed_items, "class")
#>   levelName class
#> 1     cell2      
#> 2      ¦--A      
#> 3      ¦--B    B1
#> 4      ¦--C    C1
#> 5      °--D

Sort Chidren Nodes

children_sort() function sorts the children nodes into a desired order. If there are children nodes not listed in the input_order, we can set the mismatch_last parameter (default is T) to put the mismatched children nodes to the top or bottom.

data(test_df)
test_node <- data.tree::as.Node(test_df)
sorted_node <- children_sort(
  input_node = test_node, 
  input_order = c("groupB", "groupA"),
  mismatch_last = T)
print(sorted_node)
#>         levelName
#> 1  tree1         
#> 2   ¦--groupB    
#> 3   ¦   ¦--Male  
#> 4   ¦   °--Female
#> 5   ¦--groupA    
#> 6   ¦   ¦--Male  
#> 7   ¦   °--Female
#> 8   °--groupC    
#> 9       ¦--Male  
#> 10      °--Female

Cumulative Sum Across a Level

cumsum_across_level() gets the cumulative value across a level, the cumulative value will be added to the cumsum_number field.

In this example, it calculates the cumulative exercise_time field across level 3.

data(exercise_df)
exercise_node <- as.Node(exercise_df)
test <- forestry::cumsum_across_level(input_node = exercise_node, 
                              attri_name = "exercise_time", 
                              level_num = 3)
print(test, "cumsum_number", "exercise_time", "level")
#>      levelName cumsum_number exercise_time level
#> 1  Year                   NA            NA     1
#> 2   ¦--Q1                 NA            NA     2
#> 3   ¦   ¦--Jan          0.83          0.83     3
#> 4   ¦   ¦--Feb          1.14          0.31     3
#> 5   ¦   °--Mar          1.98          0.84     3
#> 6   ¦--Q2                 NA            NA     2
#> 7   ¦   ¦--Apr          2.17          0.19     3
#> 8   ¦   ¦--May          2.18          0.01     3
#> 9   ¦   °--Jun          2.45          0.27     3
#> 10  ¦--Q3                 NA            NA     2
#> 11  ¦   ¦--Jul          2.56          0.11     3
#> 12  ¦   ¦--Aug          3.54          0.98     3
#> 13  ¦   °--Sep          4.30          0.76     3
#> 14  °--Q4                 NA            NA     2
#> 15      ¦--Oct          4.49          0.19     3
#> 16      ¦--Nov          5.25          0.76     3
#> 17      °--Dec          5.54          0.29     3

In addition, level_num = "All" will get the cumulative value across all levels. Please note that there should be no missing values in the appropriate level when we apply cumsum_across_level().

data(exercise_df)
exercise_node <- as.Node(exercise_df)
exercise_node$Do(function(node) node$exercise_time <- Aggregate(node, 
                                                   attribute = "exercise_time", 
                                                   aggFun = sum), 
             traversal = "post-order")
print(exercise_node,  "exercise_time")
#>      levelName exercise_time
#> 1  Year                 5.54
#> 2   ¦--Q1               1.98
#> 3   ¦   ¦--Jan          0.83
#> 4   ¦   ¦--Feb          0.31
#> 5   ¦   °--Mar          0.84
#> 6   ¦--Q2               0.47
#> 7   ¦   ¦--Apr          0.19
#> 8   ¦   ¦--May          0.01
#> 9   ¦   °--Jun          0.27
#> 10  ¦--Q3               1.85
#> 11  ¦   ¦--Jul          0.11
#> 12  ¦   ¦--Aug          0.98
#> 13  ¦   °--Sep          0.76
#> 14  °--Q4               1.24
#> 15      ¦--Oct          0.19
#> 16      ¦--Nov          0.76
#> 17      °--Dec          0.29
exercise_node_test <- cumsum_across_level(input_node = exercise_node, 
                              attri_name = "exercise_time", 
                              level_num = "All")
print(exercise_node_test,"exercise_time", "cumsum_number", "level")
#>      levelName exercise_time cumsum_number level
#> 1  Year                 5.54            NA     1
#> 2   ¦--Q1               1.98          1.98     2
#> 3   ¦   ¦--Jan          0.83          0.83     3
#> 4   ¦   ¦--Feb          0.31          1.14     3
#> 5   ¦   °--Mar          0.84          1.98     3
#> 6   ¦--Q2               0.47          2.45     2
#> 7   ¦   ¦--Apr          0.19          2.17     3
#> 8   ¦   ¦--May          0.01          2.18     3
#> 9   ¦   °--Jun          0.27          2.45     3
#> 10  ¦--Q3               1.85          4.30     2
#> 11  ¦   ¦--Jul          0.11          2.56     3
#> 12  ¦   ¦--Aug          0.98          3.54     3
#> 13  ¦   °--Sep          0.76          4.30     3
#> 14  °--Q4               1.24          5.54     2
#> 15      ¦--Oct          0.19          4.49     3
#> 16      ¦--Nov          0.76          5.25     3
#> 17      °--Dec          0.29          5.54     3

Prepare for JSON array

The pre_get_array() function changes the numeric item name in a list into a format that is compatible with the JSON array standard. As mentioned earlier, when converting a tree to JSON, we need to save the tree as a list using as.list() then use htmlwidgets:::toJSON() to convert the list to JSON data.

For example, new_node is a tree with numeric children nodes.

new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         class = c("A", "B", "C"))
print(as.list(new_node) )
#> $name
#> [1] "tree1"
#> 
#> $`1`
#> $`1`$class
#> [1] "A"
#> 
#> 
#> $`2`
#> $`2`$class
#> [1] "B"
#> 
#> 
#> $`3`
#> $`3`$class
#> [1] "C"

We can see the numeric children node names are listed. If we apply pre_get_array() to this list, we can change all numeric names so the nodes can be saved as a JSON array instead of JSON objects after we use htmlwidgets:::toJSON().

new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         class = c("A", "B", "C"))
print(pre_get_array(as.list(new_node) ) )
#> [[1]]
#> [1] "tree1"
#> 
#> [[2]]
#> [[2]]$class
#> [1] "A"
#> 
#> 
#> [[3]]
#> [[3]]$class
#> [1] "B"
#> 
#> 
#> [[4]]
#> [[4]]$class
#> [1] "C"

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.