The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

messy

When teaching examples using R, instructors often using nice datasets - but these aren’t very realistic, and aren’t what students will later encounter in the real world. Real datasets have typos, missing values encoded in strange ways, and weird spaces. The {messy} R package takes a clean dataset, and randomly adds these things in - giving students the opportunity to practice their data cleaning and wrangling skills without having to change all of your examples.

Installation

Install from CRAN using:

install.packages("messy")

Install development version from GitHub using:

remotes::install_github("nrennie/messy")

Usage

For more in-depth usage instructions, see the package documentation at nrennie.rbind.io/messy which has examples of each function.

The simplest way to use the {messy} package is applying the messy() function:

set.seed(1234)
messy(ToothGrowth[1:10,])

    len supp dose
1   4.2   VC  0.5
2  11.5 <NA> <NA>
3  7.3    VC  0.5
4   5.8  (VC  0.5
5   6.4   VC <NA>
6    10   VC  0.5
7  11.2 <NA>  0.5
8  11.2   VC  0.5
9  5.2    VC  0.5
10    7   VC 0.5

You can vary the amount of messiness for each function, and chain together different functions to create customised messy data:

set.seed(1234)
ToothGrowth[1:10,] |> 
  make_missing(cols = "supp", missing = " ") |> 
  make_missing(cols = c("len", "dose"), missing = c(NA, 999)) |> 
  add_whitespace(cols = "supp", messiness = 0.5) |> 
  add_special_chars(cols = "supp")

    len supp dose
1   4.2   VC  0.5
2  11.5  VC    NA
3   7.3   VC  0.5
4   5.8 *VC   0.5
5   6.4  VC   0.5
6  10.0   VC  0.5
7  11.2       0.5
8  11.2  V#C   NA
9   5.2  !VC  0.5
10  7.0 VC*   0.5

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.