The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
{recforest}
offers a flexible solution for analyzing
recurrent events in survival data, outperforming traditional methods
like the Cox model, which struggle with repeated events (e.g., hospital
readmissions) and terminal events like death. By leveraging machine
learning (Random Survival Forests), RecForest models both the timing and
frequency of events, even with right-censored data, leading to more
accurate predictions and insights, ultimately aiding in better
decision-making and patient care.
The methodology is fully described in Murris, J., Bouaziz, O., Jakubczak, M., Katsahian, S., & Lavenu, A. (2024).
You can install the development version of {recforest}
like so:
TODO : TO BE COMPLETED ONCE THE PACKAGE IS ON GITHUB / CRAN
# Install from CRAN:
install.packages("recforest")
# Or the development version from GitHub:
# install.packages("pak")
::pak("XXXX/recforest") pak
A example dataset is provided with the package. It is a modified
version of the bladder1 dataset, studying bladder cancer recurrences,
from the {survival}
package, adapted to be usable with
{recforest}
. Please use ?survival::bladder
and
?bladder1_recforest
to get more information about the
dataset.
library(recforest)
data("bladder1_recforest")
head(bladder1_recforest)
#> # A tibble: 6 × 8
#> id t.start t.stop treatment number size death event
#> <int> <int> <int> <fct> <int> <int> <dbl> <dbl>
#> 1 1 0 0 placebo 1 1 0 0
#> 2 2 0 1 placebo 1 3 0 0
#> 3 3 0 4 placebo 2 1 0 0
#> 4 4 0 7 placebo 1 1 0 0
#> 5 5 0 10 placebo 5 1 0 0
#> 6 6 0 6 placebo 4 1 0 1
<- train_forest(
trained_forest data = bladder1_recforest,
id_var = "id",
covariates = c("treatment", "number", "size"),
time_vars = c("t.start", "t.stop"),
death_var = "death",
event = "event",
n_trees = 5,
n_bootstrap = 70,
mtry = 2,
minsplit = 3,
nodesize = 15,
method = "NAa",
min_score = 5,
max_nodes = 20,
seed = 111,
parallel = FALSE,
verbose = FALSE
)
A full explanation of the data-related and model-related parameters is provided in the vignette (see Further details).
The implementation of parallel computing in this package is based on
the {future}
and {future.apply}
packages. To
enable parallel processing, the parallel
parameter must be
set to TRUE
. The number of cores to use can be specified by
adjusting the workers
parameter in the
future::plan()
function, which configures the
parallelization strategy. Two commonly used strategies are:
Multicore: Recommended for UNIX systems, this strategy uses multiple distinct processes on the available cores. Example of implementation:
::plan(future::multicore, workers = n_cores - 1) future
Multisession: Suitable for Windows systems or for more isolated executions, it launches multiple R sessions. Example of implementation:
::plan(future::multisession, workers = n_cores - 1) future
In both cases, the number of cores (n_cores
) should be
defined by the user based on the available resources.
The following can be run before training the model:
# Define the strategy and number of cores
<- min(future::availableCores(), n_trees)
n_cores ::plan(future::multisession, workers = n_cores - 1) future
If you wish to use parallel computing, please refer to the
{future}
package
documentation for more information.
print(trained_forest)
#>
#> ── Tree 1 ──
#>
#> ℹ Number of nodes : 5
#> ℹ c_index : 0.7490882567469
#> ℹ mse_imse : 316.52338769398
#> ℹ mse_iscore : -23.8459208072833
#>
#> ── Tree 2 ──
#>
#> ℹ Number of nodes : 7
#> ℹ c_index : 0.749320446994866
#> ℹ mse_imse : 317.020277729943
#> ℹ mse_iscore : -24.3379911330769
#>
#> ── Tree 3 ──
#>
#> ℹ Number of nodes : 9
#> ℹ c_index : 0.725611597704621
#> ℹ mse_imse : 552.713717581106
#> ℹ mse_iscore : -260.02620872387
#>
#> ── Tree 4 ──
#>
#> ℹ Number of nodes : 5
#> ℹ c_index : 0.757197981596913
#> ℹ mse_imse : 391.055121586804
#> ℹ mse_iscore : -98.20144371709
#>
#> ── Tree 5 ──
#>
#> ℹ Number of nodes : 7
#> ℹ c_index : 0.75103734439834
#> ℹ mse_imse : 419.451806839106
#> ℹ mse_iscore : -126.772535366445
summary(trained_forest)
#>
#> ── Data summary ────────────────────────────────────────────────────────────────
#> ℹ Number of individuals : 118
#> ℹ Number of predictors : 3
#>
#> ── Model parameters ────────────────────────────────────────────────────────────
#> ℹ mtry : 2
#> ℹ minsplit : 3
#> ℹ nodesize : 15
#> ℹ method : NAa
#> ℹ min_score : 5
#> ℹ max_nodes : 20
#>
#> ── Metrics ─────────────────────────────────────────────────────────────────────
#> ℹ c_index : 0.746451125488328
#> ℹ mse_imse : 399.352862286188
#> ℹ mse_iscore : -106.636819949553
#> ℹ computation time (seconds) : 4.4
The model can be used to predict an expected mean cumulative number of recurrent events per individual at the end of follow-up.
<- predict(
predictions
trained_forest,newdata = bladder1_recforest,
id_var = "id",
covariates = c("treatment", "number", "size"),
time_vars = c("t.start", "t.stop"),
death_var = "death"
)
A deeper explanation of the methodology and the features of the package can be found in the Vignettes.
The Vignettes are structured as follows:
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.