The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This Vignette provides a small collection of functions that have been
deprecated in scoringutils
. These functions are no longer,
but may still prove useful or illustrative.
merge_pred_and_obs()
scoringutils
requires that both forecasts and
observations are provided in a single data frame. If you have forecasts
and observations in two different data frames,
merge_pred_and_obs()
may help you to merge the two. The
function is mostly a wrapper around merge()
, but does some
additional work to deal with duplicated column names.
#' @title Merge forecast data and observations
#'
#' @description
#'
#' The function more or less provides a wrapper around `merge` that
#' aims to handle the merging well if additional columns are present
#' in one or both data sets. If in doubt, you should probably merge the
#' data sets manually.
#'
#' @param forecasts A data.frame with the forecast data (as can be passed to
#' [score()]).
#' @param observations A data.frame with the observations.
#' @param join Character, one of `c("left", "full", "right")`. Determines the
#' type of the join. Usually, a left join is appropriate, but sometimes you
#' may want to do a full join to keep dates for which there is a forecast, but
#' no ground truth data.
#' @param by Character vector that denotes the columns by which to merge. Any
#' value that is not a column in observations will be removed.
#' @returns a data.table with forecasts and observations
#' @importFrom checkmate assert_subset
#' @importFrom data.table as.data.table
#' @keywords data-handling
#' @export
merge_pred_and_obs <- function(forecasts, observations,
join = c("left", "full", "right"),
by = NULL) {
forecasts <- as.data.table(forecasts)
observations <- as.data.table(observations)
join <- match.arg(join)
assert_subset(by, intersect(names(forecasts), names(observations)))
if (is.null(by)) {
protected_columns <- c(
"predicted", "observed", "sample_id", "quantile_level",
"interval_range", "boundary"
)
by <- setdiff(colnames(forecasts), protected_columns)
}
obs_cols <- colnames(observations)
by <- intersect(by, obs_cols)
join <- match.arg(join)
if (join == "left") {
# do a left_join, where all data in the observations are kept.
combined <- merge(observations, forecasts, by = by, all.x = TRUE)
} else if (join == "full") {
# do a full, where all data is kept.
combined <- merge(observations, forecasts, by = by, all = TRUE)
} else {
combined <- merge(observations, forecasts, by = by, all.y = TRUE)
}
# get colnames that are the same for x and y
colnames <- colnames(combined)
colnames_x <- colnames[endsWith(colnames, ".x")]
colnames_y <- colnames[endsWith(colnames, ".y")]
# extract basenames
basenames_x <- sub(".x$", "", colnames_x)
basenames_y <- sub(".y$", "", colnames_y)
# see whether the column name as well as the content is the same
content_x <- as.list(combined[, ..colnames_x])
content_y <- as.list(combined[, ..colnames_y])
overlapping <- (content_x %in% content_y) & (basenames_x == basenames_y)
overlap_names <- colnames_x[overlapping]
basenames_overlap <- sub(".x$", "", overlap_names)
# delete overlapping columns
if (length(basenames_overlap) > 0) {
combined[, paste0(basenames_overlap, ".x") := NULL]
combined[, paste0(basenames_overlap, ".y") := NULL]
}
return(combined[])
}
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.