README


tidypopgen

The goal of tidypopgen is to provide a tidy grammar of population genetics, facilitating the manipulation and analysis of biallelic single nucleotide polymorphisms (SNPs). tidypopgen scales to very large genetic datasets by storing genotypes on disk and operating on them in chunks, so the full dataset is never loaded into memory at once. The full functionality of the package is described in Carter et al. (2025). Please cite this paper if you use tidypopgen in your research.
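To make the idea of chunked, on-disk processing concrete, here is a minimal sketch in base R. It is not tidypopgen's actual storage backend or API: the file layout, the helper `chunked_allele_freq()` and all parameter names are assumptions made for illustration only. It writes simulated genotypes to a binary file and then computes allele frequencies while holding only a block of loci in memory at any time.

```r
# Illustrative sketch only: base R, not tidypopgen's real implementation.
set.seed(1)
n_indiv <- 20L
n_loci <- 1000L

# Simulate biallelic genotypes coded as 0, 1 or 2 copies of the alternate allele.
geno <- matrix(
  sample(0:2, n_indiv * n_loci, replace = TRUE),
  nrow = n_indiv, ncol = n_loci
)

# Store genotypes on disk, one byte each. R matrices are column-major,
# so all genotypes for a given locus are contiguous in the file.
geno_file <- tempfile(fileext = ".bin")
writeBin(as.integer(geno), geno_file, size = 1L)

# Hypothetical helper: alternate allele frequencies, reading `chunk_size`
# loci at a time instead of the whole file.
chunked_allele_freq <- function(path, n_indiv, n_loci, chunk_size = 100L) {
  freq <- numeric(n_loci)
  con <- file(path, open = "rb")
  on.exit(close(con))
  for (start in seq(1L, n_loci, by = chunk_size)) {
    end <- min(start + chunk_size - 1L, n_loci)
    n_in_chunk <- end - start + 1L
    values <- readBin(con, what = "integer",
                      n = n_indiv * n_in_chunk, size = 1L)
    chunk <- matrix(values, nrow = n_indiv, ncol = n_in_chunk)
    # Each diploid individual carries two alleles per locus.
    freq[start:end] <- colSums(chunk) / (2 * n_indiv)
  }
  freq
}

freq_chunked <- chunked_allele_freq(geno_file, n_indiv, n_loci)
unlink(geno_file)
```

The chunk size trades memory for the number of read operations: larger chunks mean fewer reads but a larger block held in memory.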

Installation

You can install the released version from CRAN:

install.packages("tidypopgen")

You can install the latest development version directly from r-universe (recommended):

install.packages('tidypopgen', repos = c('https://evolecolgroup.r-universe.dev',
                 'https://cloud.r-project.org'))

Alternatively, you can install tidypopgen using devtools (but you might need to set up your development environment, which can be a bit more complex):

install.packages("devtools")
devtools::install_github("EvolEcolGroup/tidypopgen")

Examples

There are several vignettes designed to teach you how to use tidypopgen. A short introduction to the package is available in the ‘introduction’ vignette. A more detailed and technical description of the grammar of population genetics, explaining how to manipulate individuals and loci, is available in the ‘grammar’ vignette.

The ‘quality control’ vignette illustrates the tidypopgen functions that help you run a full QC of a dataset before analysis.

We also provide a ‘PLINK cheatsheet’ aimed at translating common tasks performed in PLINK into tidypopgen commands.

There is also an article showing how to manage aDNA samples that have been coded as pseudohaploids, including how to project ancient DNA data onto a PCA fitted to modern data and how to prepare data for admixtools: ‘aDNA pseudohaploids’ article.

Finally, tidypopgen is fast and can handle large datasets easily. See a ‘benchmark’ article using the HGDP, a dataset of over 1000 individuals typed for 650k SNPs. We can load the data, clean it, run imputation, PCA and pairwise Fst among 51 populations in less than 20 seconds on a powerful desktop (and less than a minute on a laptop).
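Once the package is installed, the vignettes can also be listed from within R. The snippet below is a small sketch that uses only the standard `vignette()` and `browseVignettes()` utilities. Which documents appear depends on your installed copy, and some of the articles mentioned above may only be available on the package website.

```r
# Check whether tidypopgen is installed before asking for its vignettes.
has_pkg <- requireNamespace("tidypopgen", quietly = TRUE)

if (has_pkg) {
  # List the vignettes shipped with the installed copy of the package.
  available <- vignette(package = "tidypopgen")
  print(available$results[, c("Item", "Title"), drop = FALSE])
  # browseVignettes("tidypopgen") opens the same list in a web browser.
} else {
  message("tidypopgen is not installed; see the Installation section.")
}
```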

When something does not work

If something does not work, check the issues on GitHub to see whether the problem has already been reported. If not, feel free to create a new issue. Please make sure you have updated to the latest version of tidypopgen on r-universe/GitHub, as well as updating all other packages on your system, and provide a reproducible example for the developers to investigate the problem. Ideally, try to create a minimal dataset that reproduces the error, as it will be much easier (and thus faster!) for the developers to track down the problem.
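As a rough sketch of what is useful to include in a report, the base R snippet below collects version information and prints a tiny dataset in a copy-pasteable form. The helper `pkg_version_or_na()` and the toy matrix `tiny_genotypes` are hypothetical names introduced here for illustration. How you turn such a matrix into a tidypopgen object depends on your own workflow and is not shown.

```r
# Hypothetical helper: installed version of a package, or NA if absent.
pkg_version_or_na <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Versions worth pasting at the top of an issue.
versions <- c(
  R = as.character(getRversion()),
  tidypopgen = pkg_version_or_na("tidypopgen")
)
print(versions)

# sessionInfo() additionally lists the platform and attached packages.
print(utils::sessionInfo())

# A toy genotype matrix (3 individuals x 2 loci, NA = missing genotype).
tiny_genotypes <- matrix(
  c(0L, 1L, 2L,
    1L, NA, 0L),
  nrow = 3,
  dimnames = list(c("ind1", "ind2", "ind3"), c("snp1", "snp2"))
)

# dput() prints R code that recreates the object exactly, which is
# convenient to paste into an issue.
dput(tiny_genotypes)
```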
