Estimate Genome Size of Polyploid Species Using K-mer Frequencies

library(findGSEP)
#> Loading required package: RColorBrewer
#> Loading required package: ggplot2

Description

findGSEP is a function for estimating the genome size of polyploid species by iteratively fitting k-mer frequencies with a normal distribution model.

To use findGSEP, prepare a histo file containing two tab-separated columns. The first column gives the frequency at which k-mers occur in the reads; the second gives the number of distinct k-mers observed at that frequency. Every estimation requires the k-mer size (sizek) and the corresponding histo file.
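As a minimal sketch of the histo format described above, the following base R snippet writes a toy two-column file and reads it back with a few sanity checks. The file name toy_example.histo and all values in it are made up for illustration; they do not come from a real k-mer counter and are not part of findGSEP.

```r
# Toy k-mer histogram: values are invented purely for illustration.
toy <- data.frame(
  freq  = 1:6,                           # frequency at which a k-mer occurs in the reads
  count = c(900, 120, 40, 75, 130, 60)   # number of distinct k-mers at that frequency
)

# Write it as a headerless, tab-separated file (hypothetical file name).
histo_file <- file.path(tempdir(), "toy_example.histo")
write.table(toy, histo_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back; read.table()'s default separator accepts tabs or spaces.
histo <- read.table(histo_file, header = FALSE,
                    col.names = c("freq", "count"))

# Basic format checks before handing a file to any estimator.
stopifnot(ncol(histo) == 2,
          is.numeric(histo$freq), is.numeric(histo$count),
          all(histo$freq > 0), all(histo$count >= 0))

# Total number of k-mers represented by the histogram.
total_kmers <- sum(histo$freq * histo$count)
```

Checks like these catch a stray header line or a wrong separator early, before the file is passed on for fitting.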

Required R packages include pracma and fGarch; see the DESCRIPTION file for the full list.

Usage

findGSEP(
  path,
  samples,
  sizek,
  exp_hom,
  ploidy,
  range_left,
  range_right,
  xlimit,
  ylimit,
  output_dir
)

Arguments

Example Usage

To run the algorithm, follow these steps:

  1. Prepare a Path: Create a directory where the histo file will be stored. For example, create a directory named test_findGSEP.

  2. Put Histo File in the Path: Place your histo file in the test_findGSEP directory. In this example, the histo file name is ara_simulate_4ploidy_25x_rep4.histo.

  3. Provide Output Directory: Specify the output directory where the results will be saved. In this example, we use tempdir() as the output directory.

  4. Run the Algorithm: Use the following command to run the algorithm with the specified parameters:

    findGSEP(
        path = 'test_findGSEP',
        samples = 'ara_simulate_4ploidy_25x_rep4.histo',
        sizek = 21,
        exp_hom = 35,
        ploidy = 4,
        output_dir = tempdir(),
        range_left = 35 * 0.2, ## exp_hom*0.2
        range_right = 35 * 0.2, ## exp_hom*0.2
        xlimit = -1, ## will calculate automatically
        ylimit = -1 ## will calculate automatically
    )

  5. Output: The output will include:

    • A PDF file named ${samples}._hap_genome_size_est.pdf, which contains the estimated genome size.
    • A CSV file named ${samples}._haploid_size.csv, which contains the predicted genome size.
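After a run, the result files can be located with a few lines of base R. This is only a sketch: list_sample_outputs is a hypothetical helper, not a findGSEP function, and it assumes only that the output file names begin with the sample name, as the patterns above suggest. No assumption is made about the columns of the CSV.

```r
# Hypothetical helper (not part of findGSEP): list files in `output_dir`
# whose names start with the sample name.
list_sample_outputs <- function(output_dir, sample) {
  files <- list.files(output_dir, full.names = TRUE)
  files[startsWith(basename(files), sample)]
}

# Look for results from the example run in tempdir(); the result is an
# empty character vector if findGSEP has not been run in this session.
outputs <- list_sample_outputs(tempdir(), "ara_simulate_4ploidy_25x_rep4.histo")

# Print the contents of any CSV result found; its layout is not assumed here.
csv_files <- outputs[grepl("\\.csv$", outputs)]
for (f in csv_files) print(read.csv(f))
```

Matching on the file name prefix with startsWith() avoids treating the dots in the sample name as regular-expression wildcards.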

References

Laiyi Fu, Yanxin Xie, Shunkang Ling, Hequan Sun, et al. findGSEP: a web application for estimating genome size of polyploid species using k-mer frequencies.

Session Info

R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] findGSEP_1.2.0     dplyr_1.1.4        png_0.1-8          scales_1.3.0       fGarch_4033.92    
[6] pracma_2.4.4       ggplot2_3.5.0      RColorBrewer_1.1-3

loaded via a namespace (and not attached):
 [1] Matrix_1.6-5        gtable_0.3.5        compiler_4.3.3      fBasics_4032.96     gbutils_0.5        
 [6] tidyselect_1.2.1    cvar_0.5            timeSeries_4032.109 yaml_2.3.8          fastmap_1.1.1      
[11] lattice_0.22-5      R6_2.5.1            generics_0.1.3      knitr_1.45          rbibutils_2.2.16   
[16] tibble_3.2.1        spatial_7.3-17      munsell_0.5.1       timeDate_4032.109   pillar_1.9.0       
[21] rlang_1.1.3         utf8_1.2.4          xfun_0.43           pkgload_1.3.4       cli_3.6.2          
[26] withr_3.0.0         magrittr_2.0.3      Rdpack_2.6          digest_0.6.35       grid_4.3.3         
[31] rstudioapi_0.16.0   lifecycle_1.0.4     vctrs_0.6.5         evaluate_0.23       glue_1.7.0         
[36] fansi_1.0.6         colorspace_2.1-0    rmarkdown_2.26      htmltools_0.5.8.1   tools_4.3.3        
[41] pkgconfig_2.0.3
