Getting Started with basepenguins

Introduction

The basepenguins package provides tools to convert R scripts and R Markdown/Quarto documents (or other specified file types) that use the palmerpenguins package to use the versions of penguins and penguins_raw from datasets (R ≥ 4.5.0).

With R ≥ 4.5.0, the popular Palmer Penguins datasets are now directly available without loading the palmerpenguins package. This makes them more accessible, especially for new R users and for teaching purposes. However, there are some differences between the variable names in the palmerpenguins package and those in R’s datasets package:

palmerpenguins       datasets
bill_length_mm       bill_len
bill_depth_mm        bill_dep
flipper_length_mm    flipper_len
body_mass_g          body_mass

The shorter variable names in the base R version were chosen for more compact code and data display. As a result, switching to R’s version of penguins is not simply a matter of removing the call to library(palmerpenguins), or of replacing palmerpenguins with datasets in data("penguins", package = "palmerpenguins"): a script that refers to any of the renamed variables will no longer run.
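To make the mismatch concrete, here is a minimal base R sketch that renames palmerpenguins-style columns to the datasets names from the table above. The name_map vector and the one-row data frame old_style are illustrative assumptions. This is not how basepenguins works internally, since the package converts script files rather than data frames.

```r
# Mapping from palmerpenguins names to datasets (R >= 4.5.0) names
name_map <- c(
  bill_length_mm    = "bill_len",
  bill_depth_mm     = "bill_dep",
  flipper_length_mm = "flipper_len",
  body_mass_g       = "body_mass"
)

# A toy one-row data frame that uses the palmerpenguins column names
old_style <- data.frame(
  species = "Adelie",
  bill_length_mm = 39.1,
  bill_depth_mm = 18.7,
  flipper_length_mm = 181L,
  body_mass_g = 3750L
)

# Rename only the columns that appear in the mapping; others are left alone
hits <- names(old_style) %in% names(name_map)
names(old_style)[hits] <- unname(name_map[names(old_style)[hits]])

names(old_style)
#> [1] "species"     "bill_len"    "bill_dep"    "flipper_len" "body_mass"
```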

The basepenguins package takes care of converting files by removing the call to palmerpenguins and making the necessary conversions to variable names, ensuring that the resulting scripts still run using the datasets (R ≥ 4.5.0) versions of penguins and penguins_raw.

library(basepenguins)

Package features

The basepenguins package provides four functions to convert files: convert_files(), convert_files_inplace(), convert_dir() and convert_dir_inplace().

If using convert_files_inplace() or convert_dir_inplace(), we recommend doing so in conjunction with a version-control system such as git, so that any changes can be easily checked.

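Additionally, there are helper functions: example_files() and example_dir() give access to the example files, files_to_convert() lists candidate files in a directory, and output_paths() generates output paths from input paths.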

What changes when converting files?

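When a file is ‘convertible’, i.e. contains a call to library(palmerpenguins) or data("penguins", package = "palmerpenguins") and has one of the specified extensions (by default "R", "r", "qmd", "rmd", "Rmd"), the changes made by the conversion include removing the call to library(palmerpenguins), replacing the four palmerpenguins variable names in the table above with their datasets equivalents, and replacing ends_with("_mm") with starts_with("flipper_"), starts_with("bill_").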

Example directory and files

The package includes an example directory with four example files to demonstrate how the conversion works. These are accessible through example_files() and example_dir().

# List all example files
example_files()
#> [1] "nested/not_a_script.md" "nested/penguins.qmd"    "no_penguins.Rmd"       
#> [4] "penguins.R"

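These example files include an R script that loads palmerpenguins (penguins.R), a Quarto document that uses palmerpenguins (nested/penguins.qmd), an R Markdown document that does not load palmerpenguins (no_penguins.Rmd), and a Markdown file whose extension is not one of the default convertible extensions (nested/not_a_script.md).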

You can examine the content of any of these files, e.g.:

penguins_script <- example_files("penguins.R")
cat(readLines(penguins_script), sep = "\n")
#> library(palmerpenguins)
#> library(ggplot2)
#> library(dplyr)
#> 
#> # exploring scatterplots
#> penguins |>
#>   select(body_mass_g, ends_with("_mm")) |>
#>   ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
#>   geom_point(aes(color = species, shape = species), size = 2) +
#>   scale_color_manual(values = c("darkorange", "darkorchid", "cyan4"))

The example_dir() function returns the path to the directory containing all example files. It also has a copy.dir argument that allows you to copy all the example files to a new directory. This is especially useful for testing the conversion functions that modify files in-place without affecting the original example files distributed with the package:

# Copy all example files to a new subdirectory of the working directory
example_dir("examples")

# List the files in the copied directory
list.files("examples", recursive = TRUE)
#> [1] "nested/not_a_script.md" "nested/penguins.qmd"    "no_penguins.Rmd"       
#> [4] "penguins.R"

Note that for the purposes of this vignette (and to adhere to CRAN policies), the working directory has been set to a tempdir and all new directories and files are written there, using relative paths.

Converting files

The package offers two main approaches to converting files: creating new converted versions with convert_files() or modifying files in place with convert_files_inplace().

Let’s start by converting a single file to see how it works:

# Convert a single file to a new output file
convert_files(penguins_script, "converted_penguins.R")
#> - ends_with("_mm") replaced on line 7 in converted_penguins.R
#> - Please check the changed output files.

# Look at the converted file
cat(readLines("converted_penguins.R"), sep = "\n")
#> 
#> library(ggplot2)
#> library(dplyr)
#> 
#> # exploring scatterplots
#> penguins |>
#>   select(body_mass, starts_with("flipper_"), starts_with("bill_")) |>
#>   ggplot(aes(x = flipper_len, y = body_mass)) +
#>   geom_point(aes(color = species, shape = species), size = 2) +
#>   scale_color_manual(values = c("darkorange", "darkorchid", "cyan4"))

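Notice how the function has removed the call to library(palmerpenguins), leaving an empty first line; replaced body_mass_g with body_mass and flipper_length_mm with flipper_len; and replaced ends_with("_mm") with starts_with("flipper_"), starts_with("bill_"), reporting the line on which that substitution was made.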

Both the input and output parameters of convert_files() take a vector of file paths, allowing you to convert multiple files at once.
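For example, a call with two inputs and two outputs might look like the sketch below. It assumes that basepenguins is installed (the conversion is skipped otherwise), that the example files sit inside the directory returned by example_dir() at the relative paths reported by example_files(), and that inputs and outputs are paired by position. The output file names are arbitrary.

```r
# Output locations: arbitrary file names in a temporary directory
outputs <- file.path(tempdir(), c("penguins_base.R", "penguins_base.qmd"))

if (requireNamespace("basepenguins", quietly = TRUE)) {
  library(basepenguins)

  # Paths to two of the example files shipped with the package
  inputs <- file.path(example_dir(), c("penguins.R", "nested/penguins.qmd"))

  # One output path per input path (assumed to be matched in order)
  result <- convert_files(inputs, outputs)

  # Paths of the files that were changed during conversion
  print(result$changed)
}
```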

If you want to overwrite the original files rather than creating new ones, you can use convert_files_inplace(). It works in the same way as convert_files(), except that it takes no output argument; it is simply a convenience wrapper around convert_files(input, input, extensions).

Return values and messages

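All the convert_*() functions invisibly return a list with two components: changed, a character vector of paths to the files that were changed, and not_changed, a character vector of paths to the files that were not.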

If the output paths are different than the input paths, the values in the changed and not_changed vectors will be subsets of output, and they will be named with the corresponding input paths. If files are overwritten, then the values in changed and not_changed will be subsets of input and the vectors will not be named.
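Because the names carry the input paths, the return value doubles as an input-to-output lookup. The sketch below hard-codes a list with the shape described above, using the paths from the convert_dir() output shown later in this vignette, so that it runs without converting anything.

```r
# A stand-in for the value returned by a convert_*() function
result <- list(
  changed = c(
    "examples/nested/penguins.qmd" = "converted_examples/nested/penguins.qmd",
    "examples/penguins.R"          = "converted_examples/penguins.R"
  ),
  not_changed = c(
    "examples/no_penguins.Rmd"        = "converted_examples/no_penguins.Rmd",
    "examples/nested/not_a_script.md" = "converted_examples/nested/not_a_script.md"
  )
)

# Input files that were modified during conversion
names(result$changed)

# Output path for one particular input
result$changed[["examples/penguins.R"]]

# A two-column overview of every changed file
data.frame(
  input = names(result$changed),
  output = unname(result$changed)
)
```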

This list is returned invisibly for two reasons:

  1. If many files are converted, and/or absolute file paths are used, this list can occupy a lot of console space
  2. With the list occupying a lot of console space, messages generated by the functions might be missed
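Invisible return is a general R mechanism rather than anything specific to basepenguins. The sketch below uses a made-up function, convert_sketch(), that returns a list of the same shape. It shows that the value is still available when assigned, and that wrapping the call in parentheses prints it.

```r
# Hypothetical stand-in that returns its value invisibly
convert_sketch <- function() {
  invisible(list(changed = "penguins.R", not_changed = character(0)))
}

# Called on its own, nothing is printed to the console
convert_sketch()

# Assigning the result keeps it available for later inspection
res <- convert_sketch()
res$changed

# Wrapping the call in parentheses forces the value to print
(convert_sketch())

# withVisible() reports whether a value would auto-print
withVisible(convert_sketch())$visible
```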

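The convert_*() functions generate messages in the following circumstances: when ends_with("_mm") has been replaced, reporting the line and file so that the substitution can be reviewed; when any files have changed, as a reminder to check the changed output files; and when any R Markdown or Quarto documents have changed, as a reminder to re-knit or re-render them. No messages are generated if none of the input files changed.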

Converting a directory

To convert all convertible files in a directory (and its subdirectories), use convert_dir(). We’ll use the "examples" directory that we created above with the call to example_dir("examples").

result <- convert_dir("examples", "converted_examples")
#> - ends_with("_mm") replaced on line 7 in converted_examples/penguins.R
#> - Please check the changed output files.
#> - Remember to re-knit or re-render and changed Rmarkdown or Quarto documents.
result
#> $changed
#>             examples/nested/penguins.qmd 
#> "converted_examples/nested/penguins.qmd" 
#>                      examples/penguins.R 
#>          "converted_examples/penguins.R" 
#> 
#> $not_changed
#>                    examples/no_penguins.Rmd 
#>        "converted_examples/no_penguins.Rmd" 
#>             examples/nested/not_a_script.md 
#> "converted_examples/nested/not_a_script.md"

To convert all files in a directory in place, use convert_dir_inplace(). A useful call is convert_dir_inplace(".") to overwrite all convertible files in the working directory, though we don’t run that here, demonstrating on a fresh copy of the example directory instead.

example_dir("in_place_dir")

result <- convert_dir_inplace("in_place_dir")
#> - ends_with("_mm") replaced on line 7 in in_place_dir/penguins.R
#> - Please check the changed output files.
#> - Remember to re-knit or re-render and changed Rmarkdown or Quarto documents.
result
#> $changed
#> [1] "in_place_dir/nested/penguins.qmd" "in_place_dir/penguins.R"         
#> 
#> $not_changed
#> [1] "in_place_dir/no_penguins.Rmd"        "in_place_dir/nested/not_a_script.md"

Helper functions

Finding files with specific extensions

When working with large directories, the files_to_convert() function helps you find files with specific extensions that might be candidates for conversion:

# List all files with convertible extensions in a directory
potential_files <- files_to_convert("examples")
potential_files
#> [1] "nested/penguins.qmd" "no_penguins.Rmd"     "penguins.R"

It’s important to note that files_to_convert() only filters files by their extensions and does not look for palmerpenguins in their content.

By default, this function looks for files with extensions "R", "r", "qmd", "rmd", or "Rmd". You can specify different extensions if needed, or return absolute file paths. See the help page for files_to_convert() for further details:

# Only look for R scripts
files_to_convert("examples", extensions = "R")
#> [1] "penguins.R"

# All extensions
files_to_convert("examples", extensions = NULL)
#> [1] "nested/not_a_script.md" "nested/penguins.qmd"    "no_penguins.Rmd"       
#> [4] "penguins.R"
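For comparison, an extension filter of this kind can be approximated in base R with list.files() and a regular expression. The helper name find_by_extension() and the throwaway directory below are assumptions made for illustration; the actual implementation of files_to_convert() may differ.

```r
# Hypothetical helper: list files under `dir` whose extension is in `extensions`
find_by_extension <- function(dir, extensions = c("R", "r", "qmd", "rmd", "Rmd")) {
  if (is.null(extensions)) {
    return(list.files(dir, recursive = TRUE))
  }
  # Build a pattern such as "\\.(R|r|qmd|rmd|Rmd)$"
  pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  list.files(dir, pattern = pattern, recursive = TRUE)
}

# Build a throwaway directory of empty files that mimics the example layout
demo_dir <- tempfile("extension_demo_")
dir.create(file.path(demo_dir, "nested"), recursive = TRUE)
demo_files <- c(
  "nested/not_a_script.md", "nested/penguins.qmd",
  "no_penguins.Rmd", "penguins.R"
)
invisible(file.create(file.path(demo_dir, demo_files)))

find_by_extension(demo_dir)                     # the .qmd, .Rmd and .R files
find_by_extension(demo_dir, extensions = "R")   # only penguins.R
find_by_extension(demo_dir, extensions = NULL)  # every file, whatever its extension
```

Like files_to_convert(), this only inspects file names, so a match says nothing about whether the file actually uses palmerpenguins.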

Generating output paths

When converting files to new locations, the output_paths() function helps generate appropriate output paths, based on the input paths (which are preserved as names). These can then be passed to the output argument in convert_files(). By default, output_paths() adds a "_new" suffix to the file name, but other suffixes, or prefixes, can be specified. Other output directories can also be given:

input_files <- files_to_convert("examples")

# Default
output_paths(input_files)
#>       nested/penguins.qmd           no_penguins.Rmd                penguins.R 
#> "nested/penguins_new.qmd"     "no_penguins_new.Rmd"          "penguins_new.R"

# Generate output paths with prefix instead, in new directory
output_paths(input_files, prefix = "base_", suffix = "", dir = "~/output")
#>                 nested/penguins.qmd                     no_penguins.Rmd 
#> "~/output/nested/base_penguins.qmd"     "~/output/base_no_penguins.Rmd" 
#>                          penguins.R 
#>          "~/output/base_penguins.R"
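The default "_new" naming scheme can be imitated in base R, which may help when reasoning about what output_paths() returns. In this sketch, add_suffix_sketch() is a hypothetical helper, not part of basepenguins, and it handles only the suffix case.

```r
# Hypothetical helper: insert a suffix between the file stem and its extension
add_suffix_sketch <- function(paths, suffix = "_new") {
  stem <- tools::file_path_sans_ext(paths)  # e.g. "nested/penguins"
  ext <- tools::file_ext(paths)             # e.g. "qmd"
  out <- paste0(stem, suffix, ".", ext)
  names(out) <- paths                       # keep the input paths as names
  out
}

inputs <- c("nested/penguins.qmd", "penguins.R")
new_paths <- add_suffix_sketch(inputs)
new_paths
```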

Considerations regarding the ends_with("_mm") substitution

The palmerpenguins Get started vignette has examples of using ends_with("_mm") within calls to dplyr::select(), as a convenient way to select the flipper_length_mm, bill_length_mm and bill_depth_mm columns.

This pattern presents a design challenge for basepenguins. We need a way to select the flipper_len, bill_len and bill_dep columns.

The most obvious substitution for ends_with("_mm") is therefore flipper_len, starts_with("bill_"), which preserves the use of a tidyselect function. However, suppose we have a previous call to dplyr::select(), and have converted the file with the above. Then the following code will generate an error, because flipper_len is no longer available to be selected:

penguins |>
  select(bill_len, bill_dep) |>
  select(flipper_len, starts_with("bill_"))

Although the above example is contrived, we don’t want to break anyone’s code, so instead we replace ends_with("_mm") with:

starts_with("flipper_"), starts_with("bill_")

This won’t error, even if there are no column names starting with "flipper_" or "bill_". However, we shouldn’t ever really need starts_with("flipper_"), as there is only one column in penguins that meets that criterion. We therefore suggest manually checking this substitution and either replacing starts_with("flipper_") with flipper_len, if flipper_len is still a column in the data frame, or removing starts_with("flipper_") entirely if not.
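The difference between the two candidate substitutions can be checked on a small stand-in data frame. The sketch assumes that dplyr is installed (the comparison is skipped otherwise) and uses a hand-made data frame with the datasets column names, so it does not depend on the R version.

```r
# Stand-in data frame with the datasets-style column names (toy values)
df <- data.frame(
  bill_len = c(39.1, 39.5),
  bill_dep = c(18.7, 17.4),
  flipper_len = c(181L, 186L)
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  # An earlier select() has already dropped flipper_len
  bills <- df |> select(bill_len, bill_dep)

  # Naming the missing column directly fails, so we catch the error
  strict <- tryCatch(
    bills |> select(flipper_len, starts_with("bill_")),
    error = function(e) "error: flipper_len is not available"
  )
  print(strict)

  # starts_with() simply matches zero columns when none are present
  lenient <- bills |> select(starts_with("flipper_"), starts_with("bill_"))
  print(names(lenient))
}
```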

To facilitate this, the convert_*() functions all print a message indicating where these substitutions were made, to help you manually review and potentially refine these changes if desired.

The use of the ends_with("_mm") pattern with the penguins dataset is also the reason why we only convert files if library(palmerpenguins) or data("penguins", package = "palmerpenguins") is found in the file. It is possible to imagine different data frames for which this selector could be used, and we don’t want to inadvertently alter those. We provide an example file to demonstrate this:

no_penguins_file <- "examples/no_penguins.Rmd"
cat(readLines(no_penguins_file), sep = "\n")
#> ---
#> title: No penguins
#> ---
#> 
#> A file to make sure we're not changing `ends_with("_mm")` 
#> if the script doesn't load the palmerpenguins package.
#> 
#> ```{r}
#> dat <- data.frame(length_mm = 1:3, depth_mm = 4:6)
#> 
#> dat |>
#>   dplyr::select(ends_with("_mm"))
#> ```

# Pass it to a convert function
convert_files(no_penguins_file, "no_penguins_converted.Rmd")

# The content doesn't change
cat(readLines("no_penguins_converted.Rmd"), sep = "\n")
#> ---
#> title: No penguins
#> ---
#> 
#> A file to make sure we're not changing `ends_with("_mm")` 
#> if the script doesn't load the palmerpenguins package.
#> 
#> ```{r}
#> dat <- data.frame(length_mm = 1:3, depth_mm = 4:6)
#> 
#> dat |>
#>   dplyr::select(ends_with("_mm"))
#> ```

Even though this file contains ends_with("_mm"), and is an R Markdown file, it doesn’t use the palmerpenguins package, so no substitutions are made. Notice also that there were no messages generated when convert_files() was called, indicating that none of the input files changed.
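The rule at work here (an accepted extension plus a palmerpenguins call somewhere in the file) can be approximated in a few lines of base R. This is only a sketch: is_convertible_sketch() is a hypothetical name, it matches the two calls literally as they are written in this vignette, and the check that basepenguins actually performs may be more flexible.

```r
# Hypothetical check: accepted extension AND a palmerpenguins call in the file
is_convertible_sketch <- function(path,
                                  extensions = c("R", "r", "qmd", "rmd", "Rmd")) {
  if (!tools::file_ext(path) %in% extensions) {
    return(FALSE)
  }
  lines <- readLines(path, warn = FALSE)
  calls <- c(
    "library(palmerpenguins)",
    "data(\"penguins\", package = \"palmerpenguins\")"
  )
  # TRUE if any line contains either call, matched as a fixed string
  any(vapply(calls, function(p) any(grepl(p, lines, fixed = TRUE)), logical(1)))
}

# Three small test files written to a temporary directory
tmp <- tempfile("convertible_demo_")
dir.create(tmp)
with_pkg <- file.path(tmp, "uses_penguins.R")
without_pkg <- file.path(tmp, "no_penguins.Rmd")
wrong_ext <- file.path(tmp, "notes.md")
writeLines(c("library(palmerpenguins)", "head(penguins)"), with_pkg)
writeLines("dplyr::select(dat, ends_with(\"_mm\"))", without_pkg)
writeLines("library(palmerpenguins)", wrong_ext)

is_convertible_sketch(with_pkg)     # TRUE: right extension and a library() call
is_convertible_sketch(without_pkg)  # FALSE: no palmerpenguins call
is_convertible_sketch(wrong_ext)    # FALSE: "md" is not an accepted extension
```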

Final considerations

Class

The versions of penguins and penguins_raw in R ≥ 4.5.0’s datasets package will always (just) have class data.frame. In contrast, the palmerpenguins versions will have classes tbl_df, tbl and data.frame if the tibble package is installed on your computer (and just class data.frame if not).
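If downstream code relies on tibble behaviour, such as compact printing, the base data frame can be converted explicitly. A minimal sketch, assuming R ≥ 4.5.0 and, for the second part, that the tibble package is installed; both parts are skipped when those conditions are not met:

```r
if (getRversion() >= "4.5.0") {
  # The datasets version is a plain data frame
  base_penguins <- datasets::penguins
  print(class(base_penguins))

  if (requireNamespace("tibble", quietly = TRUE)) {
    # Explicit conversion gives the tbl_df, tbl and data.frame classes
    penguins_tbl <- tibble::as_tibble(base_penguins)
    print(class(penguins_tbl))
  }
}
```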

penguins_raw

The versions of penguins_raw in palmerpenguins and datasets are identical, except potentially for their class, as described above. No specific changes are made to penguins_raw by the convert_*() functions in basepenguins, but by removing the call to library(palmerpenguins), the datasets version will be used in any scripts, which is always a data.frame (never a tbl_df).

The palmerpenguins package

Note that the palmerpenguins package provides features that are not in R, such as vignettes and articles on the package website. The package also contains the data in two csv files and provides a function to access them. And, of course, Allison Horst’s wonderful penguins artwork! The palmerpenguins package will remain on CRAN and keep its package website.

We are extremely grateful to the authors of palmerpenguins, Allison Horst, Alison Hill and Kristen Gorman, for their support for adding the Palmer Penguins data to datasets, and their enthusiasm about basepenguins.
