Out of the box, deident features a set of transformations to aid in the de-identification of data sets. Each transformation is implemented via R6Class and extends BaseDeident. User-defined transformations can be implemented in a similar manner.
To demonstrate the different transformations we supply a toy data set, df, comprising 26 observations of three variables, one of which takes the value X if B <= 13 and Y if B > 13.
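A data set matching this description can be built as follows. This is a minimal sketch rather than the package's own definition: the column names A, B and C come from the printed output further down, while the contents of A (lowercase letters), the contents of B (the integers 1 to 26) and the choice of C as the column carrying the X/Y rule are assumptions.

```r
# Toy data set: 26 observations of three variables.
# Assumption: A holds the lowercase letters and B the integers 1 to 26;
# only the X/Y rule on B is stated in the text.
df <- data.frame(
  A = letters,
  B = 1:26,
  stringsAsFactors = FALSE
)

# C is "X" for the first half of B and "Y" for the second half.
df$C <- ifelse(df$B <= 13, "X", "Y")

str(df)
```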
Apply a cached random replacement cipher. Every re-occurrence of the same key receives the same replacement value.
Implemented deident options:
deident(df, "psudonymize", A)
deident(df, "Pseudonymizer", A)
deident(df, Pseudonymizer, A)
deident(df, Pseudonymizer$new(), A)
psu <- Pseudonymizer$new()
deident(df, psu, A)
By default Pseudonymizer replaces values in variables with a random alpha-numeric string of 5 characters. This can be replaced by calling set_method on an instantiated Pseudonymizer with the desired function:
psu <- Pseudonymizer$new()
new_method <- function(key, ...){
  paste(sample(letters, 12, replace = TRUE), collapse = "")
}
psu$set_method(new_method)
deident(df, psu, A)
#> DeidentList
#> 1 step(s) implemented
#> Step 1 : 'Pseudonymizer' on variable(s) A
#> For data:
#> columns: A, B, C
The first argument to the method receives the key to be transformed.
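The caching behaviour described above can be illustrated in base R. This is a sketch of the general idea only, not the internals of Pseudonymizer: `make_pseudonymizer` and `random_string` are hypothetical names, and the cache is a plain environment keyed by the original value.

```r
# Build a pseudonymiser that remembers every replacement it hands out.
# Assumption: keys are non-missing, non-empty strings (or coercible to them).
make_pseudonymizer <- function(method) {
  cache <- new.env(parent = emptyenv())
  function(keys) {
    vapply(as.character(keys), function(key) {
      # Generate a replacement only the first time a key is seen.
      if (!exists(key, envir = cache, inherits = FALSE)) {
        assign(key, method(key), envir = cache)
      }
      get(key, envir = cache, inherits = FALSE)
    }, character(1), USE.NAMES = FALSE)
  }
}

# Random alpha-numeric string of 5 characters; the key is received as the
# first argument but is not used to derive the replacement.
random_string <- function(key, ...) {
  paste(sample(c(letters, LETTERS, 0:9), 5, replace = TRUE), collapse = "")
}

pseudonymize <- make_pseudonymizer(random_string)
out <- pseudonymize(c("a", "b", "a"))
out[1] == out[3]  # the repeated key "a" maps to the same replacement
```

Because replacements are drawn at random, two different keys could in principle receive the same string; a production implementation would need to guard against that.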
Implemented deident options:
Apply cryptographic hashing to a variable.
Implemented deident options:
deident(df, "encrypt", A)
deident(df, "Encrypter", A)
deident(df, Encrypter, A)
deident(df, Encrypter$new(), A)
encrypt <- Encrypter$new()
deident(df, encrypt, A)
At initialization, Encrypter can be given hash_key and seed values to control the hashing. Users are advised to set these values and keep them secret.
Apply Gaussian white noise to a numeric variable.
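As an illustration of the idea, Gaussian white noise can be added to a numeric vector in base R as sketched below. The function `perturb` and its `sd` parameter are hypothetical and are not taken from the package.

```r
# Add zero-mean Gaussian noise to a numeric vector.
# `sd` is a hypothetical parameter controlling the noise scale.
perturb <- function(x, sd = 0.1) {
  stopifnot(is.numeric(x))
  x + rnorm(length(x), mean = 0, sd = sd)
}

set.seed(42)  # only to make the illustration reproducible
B <- 1:26
noisy <- perturb(B, sd = 0.5)

# Inspect the noise that was added to the first few values.
head(round(noisy - B, 2))
```

A larger `sd` hides individual values more effectively but distorts summary statistics more, so the scale has to be chosen relative to the spread of the variable.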
Implemented deident options:
Aggregate categorical values according to a user-supplied list. The list must be supplied to Blur at initialization.
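The kind of aggregation involved can be sketched in base R. The shape of the list is an assumption here (a named list of the form original value = group), and `blur_categories` is a hypothetical helper rather than part of the package.

```r
# Map each category to a coarser group using a user-supplied lookup.
# Assumption: `lookup` is a named list of the form original = group.
blur_categories <- function(x, lookup) {
  map <- vapply(lookup, as.character, character(1))  # named character vector
  x <- as.character(x)
  known <- x %in% names(map)
  out <- x                       # values without an entry are left unchanged
  out[known] <- unname(map[x[known]])
  out
}

lookup <- list(a = "vowel", e = "vowel", b = "consonant")
blurred <- blur_categories(c("a", "b", "e", "z"), lookup)
blurred
```

Leaving unmatched values unchanged is a choice made for this sketch; replacing them with NA would be the more conservative option for de-identification.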
Implemented deident options:
Aggregate numeric values according to a user-supplied vector of breaks/cuts. If no vector is supplied, NumericBlurer defaults to a binary classification about 0.
Implemented deident options:
deident(df, "numeric_blur", B)
deident(df, "NumericBlurer", B)
deident(df, NumericBlurer, B)
deident(df, NumericBlurer$new(), B)
numeric_blur <- NumericBlurer$new()
deident(df, numeric_blur, B)
At initialization NumericBlurer takes an argument cuts to define the limits of each interval.
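The effect of a vector of cuts can be approximated with base R's cut(). This is a sketch under assumptions, not the package's implementation: `blur_numeric` is a hypothetical helper, and placing a value that sits exactly on a cut into the lower interval is a choice of this sketch.

```r
# Replace each number with the interval it falls into.
# The default, cuts = 0, gives a binary split about 0.
blur_numeric <- function(x, cuts = 0) {
  breaks <- c(-Inf, sort(unique(cuts)), Inf)
  # cut() uses right-closed intervals by default, so a value equal to a
  # cut point is assigned to the interval below it.
  as.character(cut(x, breaks = breaks))
}

binary <- blur_numeric(c(-2, 0, 5))
binary

banded <- blur_numeric(1:26, cuts = c(5, 10, 15, 20))
table(banded)
```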
Apply Shuffler to a data set having first grouped the data on column(s). The grouping needs to be defined at initialization.
Implemented deident options:
At initialization GroupedShuffler takes an argument limit such that if any aggregated subgroup has fewer than limit observations, all of its values are dropped.
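A base R sketch of shuffling within groups with a minimum group size is given below. It rests on assumptions: `grouped_shuffle` is a hypothetical helper, the example data (including the small group Z) is purely illustrative, and "dropped" is interpreted as setting the values to NA, which may differ from what the package does.

```r
# Shuffle `column` within each level of `group`; groups with fewer than
# `limit` observations have their values set to NA instead.
grouped_shuffle <- function(data, column, group, limit = 0) {
  out <- data
  for (idx in split(seq_len(nrow(data)), data[[group]])) {
    if (length(idx) < limit) {
      out[[column]][idx] <- NA
    } else {
      # Permute positions with sample.int(); sample(idx) would misbehave
      # for a group containing a single row.
      out[[column]][idx] <- data[[column]][idx[sample.int(length(idx))]]
    }
  }
  out
}

# Illustrative data: two large groups and one group of only two rows.
example <- data.frame(
  B = 1:26,
  C = c(rep("X", 13), rep("Y", 11), rep("Z", 2))
)

shuffled <- grouped_shuffle(example, "B", "C", limit = 3)
shuffled[shuffled$C == "Z", ]
```

Values never cross group boundaries, so per-group summaries of the shuffled column are unchanged, while the link between a value and its original row is broken.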