DataFakeR

Overview

DataFakeR is an R package designed to help you generate a sample of fake data that preserves specified assumptions about the original data.

Installation

Learning DataFakeR

The package documentation includes a list of useful articles that will guide you through the package functionality.

Usage

Configure schema YAML structure
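As a rough sketch of what such a schema might contain, the snippet below writes a small YAML fragment to a temporary file. The key layout (`public`, `tables`, `columns`, `type`, `spec`, `spec_params`) is an assumption made for illustration and should be checked against the package documentation. The file name `schema_books_sketch.yml` is hypothetical, and the fragment describes only the `title` column, which would use the custom `book` method defined in the next step.

```r
# Assumed schema layout (hypothetical fragment; check the package
# documentation for the exact keys supported by your version).
schema_yaml <- c(
  "public:",
  "  tables:",
  "    books:",
  "      nrows: 10",
  "      columns:",
  "        title:",
  "          type: varchar",
  "          spec: book",
  "          spec_params:",
  "            add_second: true"
)

# Write the fragment to a temporary file so it can be inspected or edited.
schema_path <- file.path(tempdir(), "schema_books_sketch.yml")
writeLines(schema_yaml, schema_path)
```

The `spec_params` entry is meant to mirror the `add_second` argument of the `books()` function, so the value set in the schema would be forwarded to the simulation method.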

Define custom simulation methods if needed

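# Generate `n` random book titles from fixed word lists.
# When `add_second` is TRUE, a preposition is inserted after the first word.
books <- function(n, add_second = FALSE) {
  first <- c("Learning", "Amusing", "Hiding", "Symbols", "Hunting", "Smile")
  second <- c("Of", "On", "With", "From", "In", "Before")
  third <- c("My", "Your", "The", "Common", "Mysterious", "A")
  fourth <- c("Future", "South", "Technology", "Forest", "Storm", "Dreams")
  second_res <- NULL
  if (add_second) {
    second_res <- sample(second, n, replace = TRUE)
  }
  parts <- list(
    sample(first, n, replace = TRUE), second_res,
    sample(third, n, replace = TRUE), sample(fourth, n, replace = TRUE)
  )
  # Drop the NULL placeholder; paste() would otherwise treat it as ""
  # and leave a double space in the title.
  do.call(paste, Filter(Negate(is.null), parts))
}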

simul_spec_character_book <- function(n, unique, spec_params, ...) {
  spec_params$n <- n
  
  DataFakeR::unique_sample(
    do.call(books, spec_params), 
    spec_params = spec_params, unique = unique
  )
}

library(DataFakeR)

# Register the custom method under the `book` spec name
set_faker_opts(
  opt_simul_spec_character = opt_simul_spec_character(book = simul_spec_character_book)
)
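The `unique` argument forwarded to `DataFakeR::unique_sample()` above presumably signals that the simulated column must not contain duplicates. To illustrate the general idea only, the base-R sketch below redraws duplicated values until the sample is unique. `resample_until_unique()` and `draw_code()` are hypothetical names introduced here, and the retry strategy is an assumption, not the package's implementation.

```r
# Hypothetical helper, not part of DataFakeR: redraw duplicated values
# until the sample is unique or the retry budget is exhausted.
resample_until_unique <- function(draw, n, max_iter = 10) {
  values <- draw(n)
  for (i in seq_len(max_iter)) {
    dup <- duplicated(values)
    if (!any(dup)) {
      return(values)
    }
    # Replace only the duplicated positions with fresh draws.
    values[dup] <- draw(sum(dup))
  }
  if (anyDuplicated(values) > 0) {
    stop("could not generate a unique sample in ", max_iter, " iterations")
  }
  values
}

set.seed(123)

# Toy generator with 26 * 26 = 676 possible two-letter codes.
draw_code <- function(n) {
  paste0(sample(LETTERS, n, replace = TRUE), sample(LETTERS, n, replace = TRUE))
}

codes <- resample_until_unique(draw_code, n = 20, max_iter = 50)
```

A generator with few possible values, such as a title built from short word lists, may not be able to produce a large unique sample, which is why the sketch stops with an error once the retry budget runs out.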

Source schema (and check table and column dependencies)

options("dfkr_verbose" = TRUE) # set `dfkr_verbose` option to see the workflow progress
sch <- schema_source("schema_books.yml")

schema_plot_deps(sch)

schema_plot_deps(sch, "books")
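Column dependencies matter because a column derived from others can only be simulated after them. The base-R sketch below shows one way to derive a valid order from a dependency list. The dependencies are inferred from the `books` table printed further down (for example, `book_id` appears to combine `author`, `title` and `bought`), so they are an assumption, and `resolve_order()` is a hypothetical helper rather than the package's own algorithm.

```r
# Column dependencies inferred from the printed `books` table (an assumption,
# not read from the schema): each name maps to the columns it depends on.
deps <- list(
  book_id     = c("author", "title", "bought"),
  author      = character(0),
  title       = character(0),
  genre       = character(0),
  bought      = character(0),
  amount      = character(0),
  purchase_id = "bought"
)

# Hypothetical helper: repeatedly pick every column whose dependencies
# are already resolved (a Kahn-style topological ordering).
resolve_order <- function(deps) {
  resolved <- character(0)
  pending <- names(deps)
  while (length(pending) > 0) {
    ready <- pending[vapply(
      pending,
      function(col) all(deps[[col]] %in% resolved),
      logical(1)
    )]
    if (length(ready) == 0) {
      stop("circular dependency among: ", paste(pending, collapse = ", "))
    }
    resolved <- c(resolved, ready)
    pending <- setdiff(pending, ready)
  }
  resolved
}

column_order <- resolve_order(deps)
```

For this input the helper returns `author`, `title`, `genre`, `bought`, `amount`, `book_id`, `purchase_id`, which happens to match the order of the columns in the simulation log of the next step.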

Run data simulation

sch <- schema_simulate(sch)
#> =====> Simulating table 'books' started..
#>   ===> Simulating column 'author' started..
#>   ===> Simulating column 'title' started..
#>   ===> Simulating column 'genre' started..
#>   ===> Simulating column 'bought' started..
#>   ===> Simulating column 'amount' started..
#>   ===> Simulating column 'book_id' started..
#>   ===> Simulating column 'purchase_id' started..
#> =====> Simulating table 'borrowed' started..
#>   ===> Simulating column 'book_id' started..
#>   ===> Simulating column 'user_id' started..

Check the results

schema_get_table(sch, "books")
#> # A tibble: 10 × 7
#>    book_id      author                   title                           
#>    <chr>        <chr>                    <chr>                           
#>  1 DormAmus2021 Dorman Abshire           Amusing In Common Forest        
#>  2 Dr. Symb2020 Dr. Montie Kihn          Symbols In My Future            
#>  3 SharAmus2021 Sharde Howell MD         Amusing With Your Forest        
#>  4 Dr. Lear2020 Dr. Maggie Lind          Learning From A Storm           
#>  5 NathSmil2020 Nathanael Upton-Prosacco Smile Of Common Future          
#>  6 AnasSmil2021 Anastacia Dickens        Smile In Common Forest          
#>  7 RyleSymb2020 Ryleigh Brekke           Symbols From Mysterious Storm   
#>  8 HortAmus2020 Hortense Rosenbaum       Amusing Before Common Technology
#>  9 MariHidi2021 Mariana Auer-Sauer       Hiding On The Forest            
#> 10 TrisSmil2021 Tristen Larkin           Smile With The South            
#>    genre     bought     amount purchase_id        
#>    <chr>     <date>      <int> <chr>              
#>  1 Adventure 2021-04-13     17 purchase_2021-04-13
#>  2 Horror    2020-03-16     81 purchase_2020-03-16
#>  3 Adventure 2021-01-06     55 purchase_2021-01-06
#>  4 Adventure 2020-02-02     NA purchase_2020-02-02
#>  5 Adventure 2020-04-13     93 purchase_2020-04-13
#>  6 Romance   2021-03-02      2 purchase_2021-03-02
#>  7 Horror    2020-08-09     42 purchase_2020-08-09
#>  8 Adventure 2020-10-12     NA purchase_2020-10-12
#>  9 Horror    2021-05-27     47 purchase_2021-05-27
#> 10 Horror    2021-05-30     72 purchase_2021-05-30

schema_get_table(sch, "borrowed")
#> # A tibble: 30 × 2
#>    book_id      user_id   
#>    <chr>        <chr>     
#>  1 DormAmus2021 PKPFJGYlKQ
#>  2 SharAmus2021 YiitBNRqgN
#>  3 RyleSymb2020 ZmFaiKZrsn
#>  4 RyleSymb2020 hKKanzSLlW
#>  5 AnasSmil2021 vvTGnzCNAP
#>  6 DormAmus2021 BZcsAzAjzm
#>  7 RyleSymb2020 gEfcYAuUVw
#>  8 SharAmus2021 oVcYOaJXBc
#>  9 HortAmus2020 YDCQQTGlce
#> 10 AnasSmil2021 uLrpKuAFVd
#> # … with 20 more rows
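The printed tables suggest a few relationships worth verifying after a simulation run. The base-R sketch below checks them on a handful of rows copied by hand from the output above. The rules themselves (how `book_id` and `purchase_id` are built, and that `borrowed` references `books`) are inferred from the printed values and are assumptions, not taken from the schema file. `books_tbl` and `borrowed_tbl` are hypothetical names.

```r
# First three rows of the printed `books` table, copied by hand.
books_tbl <- data.frame(
  book_id = c("DormAmus2021", "Dr. Symb2020", "SharAmus2021"),
  author = c("Dorman Abshire", "Dr. Montie Kihn", "Sharde Howell MD"),
  title = c(
    "Amusing In Common Forest", "Symbols In My Future",
    "Amusing With Your Forest"
  ),
  bought = as.Date(c("2021-04-13", "2020-03-16", "2021-01-06")),
  purchase_id = c(
    "purchase_2021-04-13", "purchase_2020-03-16", "purchase_2021-01-06"
  ),
  stringsAsFactors = FALSE
)

# Two rows of the printed `borrowed` table that reference those books.
borrowed_tbl <- data.frame(
  book_id = c("DormAmus2021", "SharAmus2021"),
  user_id = c("PKPFJGYlKQ", "YiitBNRqgN"),
  stringsAsFactors = FALSE
)

# book_id looks like: 4 chars of author + 4 chars of title + purchase year.
expected_id <- with(
  books_tbl,
  paste0(substr(author, 1, 4), substr(title, 1, 4), format(bought, "%Y"))
)
id_ok <- all(books_tbl$book_id == expected_id)

# purchase_id looks like "purchase_" followed by the purchase date.
purchase_ok <- all(
  books_tbl$purchase_id == paste0("purchase_", books_tbl$bought)
)

# Every borrowed book should exist in `books`.
fk_ok <- all(borrowed_tbl$book_id %in% books_tbl$book_id)
```

The same checks could be applied to the full tables returned by `schema_get_table()` instead of the hand-copied rows.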

Acknowledgment

The package was created thanks to support from Roche and contributions from the RWD Insights Engineering Team.

Lifecycle

Getting help
