The EFSATools package brings together all the functions developed for EFSA’s ad hoc data collections, providing tools for dataset operations as well as utilities designed to preserve data history.
The package is intended for researchers, analysts, and practitioners who require convenient programmatic access to data collection utilities.
During installation, the following packages developed by EFSA are also installed:
- eppoFindeR (Website | CRAN)
- distilleR (Website | CRAN)
These packages are not required to use EFSATools, but are included for convenience and can be used directly in the code if needed, for example:
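As a minimal sketch of how the companion packages can be reached from your own code, the snippet below checks which of them are installed and lists the functions each one exports. It relies only on base R; no specific eppoFindeR or distilleR function is assumed here, so consult each package's documentation for the actual calls.

```r
# Companion packages named in this README.
companions <- c("eppoFindeR", "distilleR")

# TRUE for each package that can be loaded on this machine.
installed <- vapply(companions, requireNamespace, logical(1), quietly = TRUE)

# List the exported functions of every installed companion package.
for (pkg in companions[installed]) {
  cat(pkg, "exports:", paste(getNamespaceExports(pkg), collapse = ", "), "\n")
}
```

Any exported function can then be called with the usual `package::function()` prefix, without attaching the package.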
The main purpose of EFSATools is to provide tools for managing datasets and tracking data history within the context of data collections.
Below are examples demonstrating how to use the functions in this package. First, load the EFSATools package:
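A minimal sketch of the loading step follows. The `requireNamespace()` guard is an addition for illustration, so that the snippet does not fail on a machine where the package has not been installed yet.

```r
# Attach EFSATools if it is installed; otherwise point to the install step.
if (requireNamespace("EFSATools", quietly = TRUE)) {
  library(EFSATools)
} else {
  message('EFSATools is not installed; try install.packages("EFSATools")')
}
```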
To explore the arguments and usage of a specific function, you can run:
This will show the full documentation for the function, including its arguments, return values, and usage examples.
For example, if you are working with the SCD2()
function, you can check its documentation with:
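The documentation lookup can be sketched with base R's `help()`; in an interactive session, typing `?SCD2` after loading the package is equivalent. The guard is again only there so the snippet runs even when EFSATools is not installed.

```r
# Open the help page for SCD2() from the EFSATools package.
if (requireNamespace("EFSATools", quietly = TRUE)) {
  print(help("SCD2", package = "EFSATools"))
}
```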
If a data frame contains empty rows or columns, you can remove them
using the dropEmpty() function, as follows:
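The exact signature of dropEmpty() is not shown here, so the following is only a base R sketch of the idea. The helper `drop_empty_sketch()` is hypothetical, and it assumes that a cell is "empty" when it is NA or an empty string; the package's own definition may differ.

```r
# Hypothetical helper, NOT the EFSATools implementation.
drop_empty_sketch <- function(df) {
  # Assumption: a cell is empty when it is NA or the empty string.
  is_empty <- is.na(df) | df == ""
  keep_rows <- rowSums(!is_empty) > 0  # rows with at least one value
  keep_cols <- colSums(!is_empty) > 0  # columns with at least one value
  df[keep_rows, keep_cols, drop = FALSE]
}

raw <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),   # entirely empty column
  c = c("x", NA, ""),  # second row is empty in every column
  stringsAsFactors = FALSE
)

cleaned <- drop_empty_sketch(raw)
print(cleaned)
```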
The enrich() function augments a data frame with information
stored in an EFSA catalogue. It requires specifying the column used to
join the two datasets, as well as the name of the column that will
contain the enriched information (namely, the ‘NAME’ field of EFSA’s
catalogues).
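The join described above can be sketched in base R as a lookup. Everything in this snippet is illustrative: `enrich_sketch()` is a hypothetical helper, the toy catalogue and its `CODE` column are assumptions, and only the `NAME` field comes from the description above. The real enrich() retrieves the catalogue itself and may take different arguments.

```r
# Hypothetical helper, NOT the EFSATools implementation.
enrich_sketch <- function(df, catalogue, by, new_col) {
  # Find, for every row, the position of its code in the catalogue.
  idx <- match(df[[by]], catalogue$CODE)
  # Copy the catalogue NAME into the new column (NA when no match).
  df[[new_col]] <- catalogue$NAME[idx]
  df
}

# Toy catalogue; the CODE column name is an assumption.
catalogue <- data.frame(
  CODE = c("A01", "A02"),
  NAME = c("Apples", "Pears"),
  stringsAsFactors = FALSE
)

samples <- data.frame(
  sample_id = 1:3,
  product = c("A02", "A01", "Z99"),
  stringsAsFactors = FALSE
)

enriched <- enrich_sketch(samples, catalogue,
                          by = "product", new_col = "product_name")
print(enriched)
```

Using `match()` rather than `merge()` keeps the original row order and leaves unmatched codes as NA.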
The removeReplicatedColumns() function merges all the
replicated columns in a data frame into a single column whose name
includes the “_deduplicated” suffix. After the merge, the original
replicated columns are removed from the data frame.
In the following example, we present a data frame containing the
columns region_1, region_2, …, region_n with
n > 100. Using the removeReplicatedColumns()
function, these columns can be efficiently consolidated into a single
region_deduplicated column, assuming that for each row only one
of the n columns contains a meaningful (non-NA) value.
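The consolidation can be sketched in base R as follows, using three columns instead of more than 100 to keep the example short. This is not the package's implementation: it assumes the replicated columns are identified by the `region_<number>` naming pattern and, as stated above, that each row holds at most one non-NA value.

```r
wide <- data.frame(
  id = 1:3,
  region_1 = c("North", NA, NA),
  region_2 = c(NA, "South", NA),
  region_3 = c(NA, NA, "East"),
  stringsAsFactors = FALSE
)

# Assumption: replicated columns follow the region_<number> pattern.
replicated <- grep("^region_[0-9]+$", names(wide), value = TRUE)

# Walk through the columns, keeping the first non-NA value in each row.
coalesced <- Reduce(function(a, b) ifelse(is.na(a), b, a), wide[replicated])

# Add the consolidated column, then drop the original replicated columns.
wide$region_deduplicated <- coalesced
wide[replicated] <- NULL
print(wide)
```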
The SSCD2() function makes it possible to preserve data
history when new data becomes available by implementing a simplified
version of Slowly Changing Dimension Type 2. It marks all records in the
current data frame as inactive and appends the new data, flagging each
newly added record as active.
Unlike the SCD2() function, SSCD2() does not check which records
have actually changed: existing records are set to inactive even if
they are still included in the updated dataset, and every incoming
record is treated as new.
An example of how to use the function is provided below:
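Since the arguments of SSCD2() are not shown here, the following base R sketch only illustrates the behaviour described above. The helper `sscd2_sketch()` and the `active` status column are hypothetical names; the package may record the status differently (for example with dates or other flag columns).

```r
# Hypothetical helper, NOT the EFSATools implementation.
sscd2_sketch <- function(current, new_data) {
  current$active <- FALSE   # every existing record becomes inactive
  new_data$active <- TRUE   # every incoming record is flagged as active
  rbind(current, new_data)  # history is kept by appending, not overwriting
}

current <- data.frame(id = c(1, 2), value = c("a", "b"),
                      active = c(TRUE, TRUE), stringsAsFactors = FALSE)
incoming <- data.frame(id = c(2, 3), value = c("b", "c"),
                       stringsAsFactors = FALSE)

history <- sscd2_sketch(current, incoming)
print(history)
```

Note that the record with `id` 2 appears twice in the result, once inactive and once active, even though its content did not change.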
The SCD2() function makes it possible to preserve data
history when new data becomes available by implementing a Slowly
Changing Dimension Type 2. It compares the current records with the new
ones, marking as inactive any existing records that no longer appear in
the updated dataset. Then, it flags as active any new records that are
not present among the currently active data.
Unlike the SSCD2() function, SCD2() checks which records have
actually changed, so active records that still appear in the updated
dataset keep their active status and are not duplicated.
An example of how to use the function is provided below:
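As with SSCD2(), the arguments of SCD2() are not shown here, so this is a base R sketch of the comparison logic rather than the package's implementation. The helper `scd2_sketch()`, the `active` column, and the `key_cols` argument (the columns used to decide whether two records are the same) are all hypothetical.

```r
# Hypothetical helper, NOT the EFSATools implementation.
scd2_sketch <- function(current, new_data, key_cols) {
  # Build one comparison key per row from the chosen columns.
  make_key <- function(d) {
    do.call(paste, c(unname(as.list(d[key_cols])), sep = "\r"))
  }
  cur_key <- make_key(current)
  new_key <- make_key(new_data)

  # Deactivate active records that no longer appear in the new data.
  gone <- current$active & !(cur_key %in% new_key)
  current$active[gone] <- FALSE

  # Append incoming records not present among the still-active ones.
  still_active <- cur_key[current$active]
  added <- new_data[!(new_key %in% still_active), , drop = FALSE]
  added$active <- rep(TRUE, nrow(added))

  out <- rbind(current, added)
  rownames(out) <- NULL
  out
}

current <- data.frame(id = c(1, 2), value = c("a", "b"),
                      active = c(TRUE, TRUE), stringsAsFactors = FALSE)
incoming <- data.frame(id = c(2, 3), value = c("b", "c"),
                       stringsAsFactors = FALSE)

history <- scd2_sketch(current, incoming, key_cols = c("id", "value"))
print(history)
```

Here the record with `id` 1 is deactivated, the unchanged record with `id` 2 stays active without being duplicated, and the record with `id` 3 is appended as active.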