cubar

Comprehensive Codon Usage Bias Analysis in R

Overview
Features
Why Choose cubar?
Installation
Documentation & Tutorials
- 🎯 Getting Started
- 📚 Advanced Topics
Example Workflow
🆘 Getting Help
Related Packages
License
Acknowledgments

Overview

Codon usage bias refers to the non-uniform usage of synonymous codons (codons that encode the same amino acid), a pattern that varies across organisms, genes, and functional categories. cubar is a comprehensive R package for analyzing codon usage bias in coding sequences. It provides a unified framework for calculating established codon usage metrics, performing sliding-window and differential codon usage analyses, and optimizing sequences for heterologous expression.

Features

🧬 Codon-Level Analysis

RSCU calculation: Relative synonymous codon usage analysis
Amino acid usage: Frequency of each amino acid in sequences
Codon weights: Calculate weights based on gene expression, tRNA availability, and mRNA stability
Optimal codon inference: Machine learning-based identification of optimal codons
Codon-anticodon visualization: Visualization of codon-tRNA pairing relationships
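
To illustrate the RSCU definition (a codon's observed count divided by the mean count of its synonymous family, so 1 means no preference), here is a minimal base-R sketch. It is not cubar's `est_rscu()` implementation: `rscu_sketch()` is a hypothetical helper, and the codon table is a hypothetical three-amino-acid subset of the standard genetic code.

```r
# Hypothetical mini codon table (subset of the standard genetic code)
codon_aa <- c(TTT = "Phe", TTC = "Phe",
              AAA = "Lys", AAG = "Lys",
              GGT = "Gly", GGC = "Gly", GGA = "Gly", GGG = "Gly")

# Hypothetical helper, not part of cubar:
# RSCU = observed count / mean count of the codon's synonymous family
rscu_sketch <- function(counts, codon_aa) {
  aa <- codon_aa[names(counts)]
  # ave() returns, for every codon, the mean count of its amino acid family
  family_mean <- ave(counts, aa, FUN = mean)
  counts / family_mean
}

counts <- c(TTT = 30, TTC = 10, AAA = 20, AAG = 20,
            GGT = 40, GGC = 0,  GGA = 20, GGG = 20)
round(rscu_sketch(counts, codon_aa), 2)
# TTT 1.5, TTC 0.5, AAA 1, AAG 1, GGT 2, GGC 0, GGA 1, GGG 1
```

Within each amino acid the RSCU values sum to the number of synonymous codons (4 for Gly here). A family with a total count of zero would produce NaN, which a real implementation has to handle, for example with pseudocounts.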

📊 Gene-Level Metrics

Codon frequency tabulation: Count codon occurrences across sequences
CAI (Codon Adaptation Index): Measure how closely a gene's codon usage matches that of highly expressed genes
ENC (Effective Number of Codons): Assess codon usage bias strength
Fop (Fraction of Optimal codons): Calculate proportion of optimal codons
tAI (tRNA Adaptation Index): Match codon usage to tRNA availability
CSCg (Codon Stabilization Coefficients): Quantify mRNA stability effects
Dp (Deviation from Proportionality): Analyze virus-host codon usage relationships
GC content metrics: Overall GC, GC3s (3rd codon positions), GC4d (4-fold degenerate sites)
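
As a rough illustration of how CAI is defined: each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over all of its codons. The base-R sketch below assumes a hypothetical two-amino-acid codon table; `relative_adaptiveness()` and `cai_sketch()` are illustrative names, not cubar functions (the package's own entry point is `get_cai()`, as used in the example workflow).

```r
# Hypothetical mini codon table (subset of the standard genetic code)
codon_aa <- c(TTT = "Phe", TTC = "Phe", AAA = "Lys", AAG = "Lys")

# Relative adaptiveness w: reference count of each codon divided by the
# count of the most used synonymous codon in the reference set
relative_adaptiveness <- function(ref_counts, codon_aa) {
  aa <- codon_aa[names(ref_counts)]
  ref_counts / ave(ref_counts, aa, FUN = max)
}

# CAI sketch: geometric mean of w over every codon in the gene,
# computed as exp() of the count-weighted mean of log(w)
cai_sketch <- function(gene_counts, w) {
  w <- w[names(gene_counts)]
  exp(sum(gene_counts * log(w)) / sum(gene_counts))
}

ref  <- c(TTT = 10, TTC = 40, AAA = 30, AAG = 30)  # reference-set counts
w    <- relative_adaptiveness(ref, codon_aa)       # TTT 0.25, TTC 1, AAA 1, AAG 1
gene <- c(TTT = 2, TTC = 2, AAA = 3, AAG = 1)      # codon counts of one gene
cai_sketch(gene, w)                                # 0.25^(2/8), about 0.707
```

A codon that never occurs in the reference set gets w = 0 and log(w) = -Inf, so practical implementations add a pseudocount or drop such codons; single-codon amino acids such as Met and Trp are normally excluded as well.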

🛠️ Utilities & Tools

Sliding window analysis: Positional codon usage patterns within genes
Sequence optimization: Redesign sequences for optimal expression
Differential codon usage: Statistical comparison between sequence sets
Quality control: Comprehensive CDS validation and preprocessing
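
Sliding-window analysis tracks how a codon usage statistic changes along a gene. The base-R sketch below shows the idea with the simplest statistic, GC content at the third codon position (plain GC3, without the synonymous-site restriction of GC3s). `split_codons()` and `slide_gc3()` are hypothetical helpers, window and step are measured in codons by assumption, and the sketch does not reproduce cubar's own sliding-window functions or their parameters.

```r
# Hypothetical helper: split a CDS string into consecutive codons
# (a trailing partial codon, if any, is dropped)
split_codons <- function(cds) {
  starts <- seq(1, nchar(cds) - 2, by = 3)
  substring(cds, starts, starts + 2)
}

# Hypothetical helper: GC3 (share of codons ending in G or C) in windows of
# `window` codons, advancing `step` codons at a time.
# Assumes the sequence contains at least `window` codons.
slide_gc3 <- function(cds, window = 4, step = 2) {
  codons <- split_codons(toupper(cds))
  third_is_gc <- substr(codons, 3, 3) %in% c("G", "C")
  starts <- seq(1, length(codons) - window + 1, by = step)
  gc3 <- vapply(starts,
                function(s) mean(third_is_gc[s:(s + window - 1)]),
                numeric(1))
  data.frame(start_codon = starts, gc3 = gc3)
}

# Toy CDS: ATG GCC GCG GCA AAA AAT AAG TTT TAA (9 codons)
slide_gc3("ATGGCCGCGGCAAAAAATAAGTTTTAA", window = 4, step = 2)
# start_codon 1, 3, 5 with gc3 0.75, 0.25, 0.25
```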

Why Choose cubar?

🚀 High Performance: Process large datasets (>100,000 sequences) efficiently using optimized Biostrings and data.table backends
🧬 Flexible Genetic Codes: Support for all NCBI genetic codes plus custom genetic code tables
🔗 R Ecosystem Integration: Seamlessly integrate with other bioinformatics and data analysis packages
📚 Comprehensive Documentation: Extensive tutorials, examples, and theoretical background
🔬 Research Ready: Implements established metrics with proper citations and validation

Installation

Stable Release (Recommended)

Install the latest stable version from CRAN:

install.packages("cubar")

Development Version

Install the latest development version from GitHub:

# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install cubar from GitHub
devtools::install_github("mt1022/cubar", dependencies = TRUE)

Dependencies

System Requirements:
- R (≥ 4.1.0)

Required Packages:
- Biostrings (≥ 2.60.0): Bioconductor package for sequence manipulation
- IRanges (≥ 2.34.0): Bioconductor infrastructure for range operations
- data.table (≥ 1.14.0): High-performance data manipulation
- ggplot2 (≥ 3.3.5): Data visualization
- rlang (≥ 0.4.11): Language tools

Note: Bioconductor packages will be installed automatically, but you may need to update your R installation if you encounter compatibility issues.

Documentation & Tutorials

📖 Complete documentation is available within R (?function_name) and on our package website.

🎯 Getting Started

Introduction to cubar - Basic usage and core functionality
Non-standard Genetic Codes - Working with alternative genetic codes
Codon Optimization - Sequence optimization strategies

📚 Advanced Topics

Mathematical Foundations - Detailed theory behind the metrics
Function Reference - Complete function documentation

Example Workflow

Here’s a typical analysis workflow demonstrating key functionality:

library(cubar)
library(ggplot2)

# 1. Load and quality-check sequences
data(yeast_cds)
clean_cds <- check_cds(yeast_cds)

# 2. Calculate codon frequencies
codon_freq <- count_codons(clean_cds)

# 3. Calculate multiple metrics
enc <- get_enc(codon_freq)           # Effective number of codons
gc3s <- get_gc3s(codon_freq)         # GC content at 3rd positions

# 4. Estimate RSCU from the 500 most highly expressed genes and use it as the CAI reference
data(yeast_exp)
yeast_exp <- yeast_exp[yeast_exp$gene_id %in% rownames(codon_freq), ]
high_expr <- head(yeast_exp[order(-yeast_exp$fpkm), ], 500)
rscu_high <- est_rscu(codon_freq[high_expr$gene_id, ])
cai <- get_cai(codon_freq, rscu_high)

# 5. Visualize results
df <- data.frame(ENC = enc, CAI = cai, GC3s = gc3s)
ggplot(df, aes(color = GC3s, x = ENC, y = CAI)) + 
  geom_point(alpha = 0.6) + 
  scale_color_viridis_c() +
  labs(title = "Codon Usage Bias Relationships",
       x = "Effective Number of Codons", y = "Codon Adaptation Index")
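
ENC is often read against GC3s. Wright (1990) gave the ENC expected if codon usage were shaped only by GC3s composition, ENC = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3s, and genes lying well below that curve are commonly interpreted as being under additional selection on codon usage. A minimal sketch of the curve follows; `expected_enc()` is a hypothetical helper, not a cubar function.

```r
# Expected ENC under GC3s composition alone (Wright 1990):
# ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s = GC3s in [0, 1]
expected_enc <- function(gc3s) {
  2 + gc3s + 29 / (gc3s^2 + (1 - gc3s)^2)
}

expected_enc(c(0, 0.5, 1))
# 31.0 60.5 32.0
```

With the workflow's `df`, the curve could be overlaid on an ENC-versus-GC3s scatter, for example `ggplot(df, aes(x = GC3s, y = ENC)) + geom_point() + geom_function(fun = expected_enc)`.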

🆘 Getting Help

📋 GitHub Issues: Report bugs, request features, or ask questions
📖 Documentation: Check function help (?function_name) and online docs

Related Packages

For complementary analysis, consider these R packages:

Biostrings - Sequence input/output and manipulation
Peptides - Peptide and protein property calculations

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

GitHub Copilot was used to suggest code snippets during development
GitHub Education for providing free access to development tools
The R and Bioconductor communities for excellent foundational packages
Contributors and users who have provided feedback and improvements
