An Introduction to xfun

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

1 No more partial matching for lists!
2 Output character vectors for human eyes
3 Print the content of a text file
4 Get the data URI of a file
5 Match strings and do substitutions
6 Search and replace strings in files
7 Manipulate filename extensions
8 Find files (in a project) without the pain of thinking about absolute/relative paths
9 Types of operating systems
10 Loading and attaching packages
11 Read/write files in UTF-8
12 Convert numbers to English words
13 Cache an R expression to disk or in memory
14 Check reverse dependencies of a package
15 Input a character vector into the RStudio source editor
16 Print session information

After writing about 20 R packages, I found I had accumulated several utility functions that I used across different packages, so I decided to extract them into a separate package. Previously I had been using the evil triple-colon ::: to access these internal utility functions. Now with xfun, these functions have been exported, and more importantly, documented. It should be better to use them under the sun instead of in the dark.

This page shows examples of a subset of functions in this package. For a full list of functions, see the help page help(package = 'xfun'). The source package is available on GitHub: https://github.com/yihui/xfun.

1 No more partial matching for lists!

I have been bitten many times by partial matching in lists, e.g., when I want x$a but the element a does not exist in the list x, it returns the value x$abc if abc exists in x. A strict list is a list for which the partial matching of the $ operator is disabled. The functions xfun::strict_list() and xfun::as_strict_list() are the equivalents to base::list() and base::as.list() respectively which always return as strict list, e.g.,

library(xfun)
(z = strict_list(aaa = "I am aaa", b = 1:5))

#> $aaa
#> [1] "I am aaa"
#> 
#> $b
#> [1] 1 2 3 4 5
#>

z$a  # NULL (strict matching)

#> NULL

z$aaa  # I am aaa

#> [1] "I am aaa"

z$b

#> [1] 1 2 3 4 5

z$c = "you can create a new element"

z2 = unclass(z)  # a normal list
z2$a  # partial matching

#> [1] "I am aaa"

z3 = as_strict_list(z2)  # a strict list again
z3$a  # NULL (strict matching) again!

#> NULL

Similarly, the default partial matching in attr() can be annoying, too. The function xfun::attr2() is simply a shorthand of attr(..., exact = TRUE).

I want it, or I do not want. There is no “I probably want”.

2 Output character vectors for human eyes

When R prints a character vector, your eyes may be distracted by the indices like [1], double quotes, and escape sequences. To see a character vector in its “raw” form, you can use cat(..., sep = '\n'). The function raw_string() marks a character vector as “raw”, and the corresponding printing function will call cat(sep = '\n') to print the character vector to the console.

library(xfun)
raw_string(head(LETTERS))

A
B
C
D
E
F

(x = c("a \"b\"", "hello\tworld!"))

[1] "a \"b\""       "hello\tworld!"

raw_string(x)  # this is more likely to be what you want to see

a "b"
hello	world!

3 Print the content of a text file

I have used paste(readLines('foo'), collapse = '\n') many times before I decided to write a simple wrapper function xfun::file_string(). This function also makes use of raw_string(), so you can see the content of a file in the console as a side-effect, e.g.,

f = system.file("LICENSE", package = "xfun")
xfun::file_string(f)

YEAR: 2018-2025
COPYRIGHT HOLDER: Yihui Xie

as.character(xfun::file_string(f))  # essentially a character string

[1] "YEAR: 2018-2025\nCOPYRIGHT HOLDER: Yihui Xie"

4 Get the data URI of a file

Files can be encoded into base64 strings via base64_uri(). This is a common technique to embed arbitrary files in HTML documents (which is what xfun::embed_file() does and it is based on base64_uri()).

f = system.file("LICENSE", package = "xfun")
xfun::base64_uri(f)

#> [1] "data:text/plain;base64,WUVBUjogMjAxOC0yMDI1CkNPUFlSSUdIVCBIT0xERVI6IFlpaHVpIFhpZQo="

5 Match strings and do substitutions

After typing the code x = grep(pattern, x, value = TRUE); gsub(pattern, '\\1', x) many times, I combined them into a single function xfun::grep_sub().

xfun::grep_sub('a([b]+)c', 'a\\U\\1c', c('abc', 'abbbc', 'addc', '123'), perl = TRUE)

#> [1] "aBc"   "aBBBc"

6 Search and replace strings in files

I can never remember how to properly use grep or sed to search and replace strings in multiple files. My favorite IDE, RStudio, has not provided this feature yet (you can only search and replace in the currently opened file). Therefore I did a quick and dirty implementation in R, including functions gsub_files(), gsub_dir(), and gsub_ext(), to search and replace strings in multiple files under a directory. Note that the files are assumed to be encoded in UTF-8. If you do not use UTF-8, we cannot be friends. Seriously.

All functions are based on gsub_file(), which performs searching and replacing in a single file, e.g.,

library(xfun)
f = tempfile()
writeLines(c("hello", "world"), f)
gsub_file(f, "world", "woRld", fixed = TRUE)
file_string(f)

hello
woRld

The function gsub_dir() is very flexible: you can limit the list of files by MIME types, or extensions. For example, if you want to do substitution in text files, you may use gsub_dir(..., mimetype = '^text/').

The function process_file() is a more general way to process files. Basically it reads a file, process the content with a function that you pass to it, and writes back the text, e.g.,

process_file(f, function(x) {
  rep(x, 3)  # repeat the content 3 times
})
file_string(f)

hello
woRld
hello
woRld
hello
woRld

WARNING: Before using these functions, make sure that you have backed up your files, or version control your files. The files will be modified in-place. If you do not back up or use version control, there is no chance to regret.

7 Manipulate filename extensions

Functions file_ext() and sans_ext() are based on functions in tools. The function with_ext() adds or replaces extensions of filenames, and it is vectorized.

library(xfun)
p = c("abc.doc", "def123.tex", "path/to/foo.Rmd")
file_ext(p)

#> [1] "doc" "tex" "Rmd"

sans_ext(p)

#> [1] "abc"         "def123"      "path/to/foo"

with_ext(p, ".txt")

#> [1] "abc.txt"         "def123.txt"      "path/to/foo.txt"

with_ext(p, c(".ppt", ".sty", ".Rnw"))

#> [1] "abc.ppt"         "def123.sty"      "path/to/foo.Rnw"

with_ext(p, "html")

#> [1] "abc.html"         "def123.html"      "path/to/foo.html"

8 Find files (in a project) without the pain of thinking about absolute/relative paths

The function proj_root() was inspired by the rprojroot package, and tries to find the root directory of a project. Currently it only supports R package projects and RStudio projects by default. It is much less sophisticated than rprojroot.

The function from_root() was inspired by here::here(), but returns a relative path (relative to the project’s root directory found by proj_root()) instead of an absolute path. For example, xfun::from_root('data', 'cars.csv') in a code chunk of docs/foo.Rmd will return ../data/cars.csv when docs/ and data/ directories are under the root directory of a project.

root/
  |-- data/
  |   |-- cars.csv
  |
  |-- docs/
      |-- foo.Rmd

If file paths are too much pain for you to think about, you can just pass an incomplete path to the function magic_path(), and it will try to find the actual path recursively under subdirectories of a root directory. For example, you may only provide a base filename, and magic_path() will look for this file under subdirectories and return the actual path if it is found. By default, it returns a relative path, which is relative to the current working directory. With the above example, xfun::magic_path('cars.csv') in a code chunk of docs/foo.Rmd will return ../data/cars.csv, if cars.csv is a unique filename in the project. You can freely move it to any folders of this project, and magic_path() will still find it. If you are not using a project to manage files, magic_path() will look for the file under subdirectories of the current working directory.

9 Types of operating systems

The series of functions is_linux(), is_macos(), is_unix(), and is_windows() test the types of the OS, using the information from .Platform and Sys.info(), e.g.,

xfun::is_macos()

#> [1] TRUE

xfun::is_unix()

#> [1] TRUE

xfun::is_linux()

#> [1] FALSE

xfun::is_windows()

#> [1] FALSE

10 Loading and attaching packages

Oftentimes I see users attach a series of packages in the beginning of their scripts by repeating library() multiple times. This could be easily vectorized, and the function xfun::pkg_attach() does this job. For example,

library(testit)
library(parallel)
library(tinytex)
library(mime)

is equivalent to

xfun::pkg_attach(c('testit', 'parallel', 'tinytex', 'mime'))

I also see scripts that contain code to install a package if it is not available, e.g.,

if (!requireNamespace('tinytex')) install.packages('tinytex')
library(tinytex)

This could be done via

xfun::pkg_attach2('tinytex')

The function pkg_attach2() is a shorthand of pkg_attach(..., install = TRUE), which means if a package is not available, install it. This function can also deal with multiple packages.

The function loadable() tests if a package is loadable.

11 Read/write files in UTF-8

Functions read_utf8() and write_utf8() can be used to read/write files in UTF-8. They are simple wrappers of readLines() and writeLines().

12 Convert numbers to English words

The function numbers_to_words() (or n2w() for short) converts numbers to English words.

n2w(0, cap = TRUE)

#> [1] "Zero"

n2w(seq(0, 121, 11), and = TRUE)

#>  [1] "zero"                       "eleven"                    
#>  [3] "twenty-two"                 "thirty-three"              
#>  [5] "forty-four"                 "fifty-five"                
#>  [7] "sixty-six"                  "seventy-seven"             
#>  [9] "eighty-eight"               "ninety-nine"               
#> [11] "one hundred and ten"        "one hundred and twenty-one"

n2w(1e+06)

#> [1] "one million"

n2w(1e+11 + 12345678)

#> [1] "one hundred billion, twelve million, three hundred forty-five thousand, six hundred seventy-eight"

n2w(-987654321)

#> [1] "minus nine hundred eighty-seven million, six hundred fifty-four thousand, three hundred twenty-one"

n2w(1e+15 - 1)

#> [1] "nine hundred ninety-nine trillion, nine hundred ninety-nine billion, nine hundred ninety-nine million, nine hundred ninety-nine thousand, nine hundred ninety-nine"

13 Cache an R expression to disk or in memory

Since xfun v0.44, you are recommended to use the function cache_exec(), which provides a simple yet flexible caching mechanism. See https://yihui.org/litedown/#sec:option-cache for how it works. Previously, cache_rds() was mentioned here but it is no longer recommended (see #100).

14 Check reverse dependencies of a package

Running R CMD check on the reverse dependencies of knitr and rmarkdown is my least favorite thing in developing R packages, because the numbers of their reverse dependencies are huge. The function rev_check() reflects some of my past experience in this process. I think I have automated it as much as possible, and made it as easy as possible to discover possible new problems introduced by the current version of the package (compared to the CRAN version). Finally I can just sit back and let it run.

15 Input a character vector into the RStudio source editor

The function rstudio_type() inputs characters in the RStudio source editor as if they were typed by a human. I came up with the idea when preparing my talk for rstudio::conf 2018 (see this post for more details).

16 Print session information

Since I have never been fully satisfied by the output of sessionInfo(), I tweaked it to make it more useful in my use cases. For example, it is rarely useful to print out the names of base R packages, or information about the matrix products / BLAS / LAPACK. Oftentimes I want additional information in the session information, such as the Pandoc version when rmarkdown is used. The function session_info() tweaks the output of sessionInfo(), and makes it possible for other packages to append information in the output of session_info().

You can choose to print out the versions of only the packages you specify, e.g.,

xfun::session_info(c('xfun', 'litedown', 'tinytex'), dependencies = FALSE)

R Under development (unstable) (2025-04-02 r88097)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.7.4

Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8

Package version:
  litedown_0.6 tinytex_0.56 xfun_0.52

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.