
ClassifyITS Pipeline Overview

Introduction

This vignette shows how to use ClassifyITS to assign taxonomy to fungal ITS sequences, visualize the results, and review the QC outputs. ClassifyITS returns summary plots and tables in memory. If you provide an output directory, it also writes a CSV file and a multi-page PDF.


Run the Pipeline

Assuming you have downloaded and installed the ClassifyITS package, you can run the taxonomy assignment pipeline using the example BLAST and FASTA files included in the package. Replace these paths with your own files as needed.

For information on generating the required BLAST and FASTA files, see the README.

library(ClassifyITS)

# Paths to example BLAST and FASTA files in the package (replace with your own paths)
blast_path <- system.file("extdata", "example_BLAST.tsv", package = "ClassifyITS")
fasta_path <- system.file("extdata", "example_FASTA.fasta", package = "ClassifyITS")

# Run the assignment pipeline (no files are written by default)
results <- ITS_assignment(
  blast_file = blast_path,
  rep_fasta  = fasta_path
)
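
When you substitute your own files, a mistyped path is an easy error to make. system.file() returns an empty string rather than an error when a file is not found. The sketch below checks a path before the pipeline is called. check_input_path() is a hypothetical helper written for this illustration and is not part of ClassifyITS.

```r
# Hypothetical helper (not part of ClassifyITS): stop early with a clear
# message when an input path is empty or does not point to an existing file.
check_input_path <- function(path, label) {
  if (!is.character(path) || length(path) != 1L || !nzchar(path)) {
    stop(label, ": path is empty (system.file() returns \"\" when nothing is found)")
  }
  if (!file.exists(path)) {
    stop(label, ": file not found: ", path)
  }
  invisible(normalizePath(path))
}

# A throwaway file stands in for a real BLAST table
demo_file <- tempfile(fileext = ".tsv")
writeLines("placeholder", demo_file)
check_input_path(demo_file, "blast_file")

# system.file() yields "" for a missing file; a base package is used here
# so that the example runs without ClassifyITS installed
missing_path <- system.file("extdata", "no_such_file.tsv", package = "stats")
empty_result <- tryCatch(
  check_input_path(missing_path, "blast_file"),
  error = function(e) conditionMessage(e)
)
```

In a real session you would call check_input_path(blast_path, "blast_file") and check_input_path(fasta_path, "rep_fasta") before ITS_assignment().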

Messages and Warnings

By default, the pipeline is quiet. Set verbose = TRUE to have ClassifyITS emit progress messages. If any OTUs fail the quality control steps or do not receive an assignment, a warning like the following may be displayed:

In ITS_assignment(...) :
  Warning: X of your X FASTA sequences failed QC and could not be classified using this pipeline due to missing or poor BLAST results.

In large datasets, a small number of OTUs often fail QC or do not receive a taxonomy assignment at certain levels. The most common reason for an OTU to fail QC is that the BLAST file contains no results for that OTU. The warning is a reminder to review the outputs and to consider manual curation of unassigned OTUs, especially those that are abundant or of particular interest in downstream analyses. OTUs that are rare and likely spurious can instead be discarded. The pipeline provides the information needed to make these decisions for your dataset.
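
If you run the pipeline in a script, you may want to record the warning text instead of letting it scroll past. The sketch below uses base R's withCallingHandlers() for that. run_with_warning() is a stand-in that only mimics a call which warns, and its message text is illustrative. In practice the wrapped expression would be your ITS_assignment(...) call.

```r
# Stand-in for a pipeline call that emits a warning (illustrative message text)
run_with_warning <- function() {
  warning("1 of your 11 FASTA sequences failed QC")
  "pipeline result"
}

captured <- character(0)
result <- withCallingHandlers(
  run_with_warning(),
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))  # keep the message for later review
    invokeRestart("muffleWarning")                 # continue without printing the warning
  }
)

captured
```

The call still returns its normal value, so the result can be used as usual while the warning text is kept in captured.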

Optional: Write CSV/PDF outputs

To write outputs, supply an output directory. In vignettes, we write to a temporary directory:

outdir <- file.path(tempdir(), "ClassifyITS_outputs")
dir.create(outdir, showWarnings = FALSE)

results_written <- ITS_assignment(
  blast_file = blast_path,
  rep_fasta  = fasta_path,
  outdir     = outdir,
  verbose    = TRUE
)

results_written$assignments_file
## [1] "/var/folders/59/jpwzlxb97sj5dz0y29k3n7680000gn/T//RtmpKg32Jm/ClassifyITS_outputs/initial_assignments.csv"
results_written$pdf_file
## [1] "/var/folders/59/jpwzlxb97sj5dz0y29k3n7680000gn/T//RtmpKg32Jm/ClassifyITS_outputs/combined_taxonomy_graphics.pdf"
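
The written CSV can be read back into R with read.csv() for further filtering. The sketch below is self-contained: it writes a small stand-in file first, assuming the CSV uses the same column names as the assignments table shown in this vignette. In a real session you would pass results_written$assignments_file to read.csv() instead of demo_csv.

```r
# Stand-in CSV (assumed layout); replace demo_csv with
# results_written$assignments_file in a real session
demo_csv <- file.path(tempdir(), "demo_initial_assignments.csv")
demo <- data.frame(
  qseqid  = c("otu_a", "otu_b"),
  kingdom = c("Fungi", NA),
  phylum  = c("Ascomycota", NA),
  notes   = c("ITS pipeline completed",
              "Failed QC: no BLAST result passed Quality Control step"),
  stringsAsFactors = FALSE
)
write.csv(demo, demo_csv, row.names = FALSE)

# Read the table back and take a first look at its structure
assignments <- read.csv(demo_csv, stringsAsFactors = FALSE)
str(assignments)

# How many OTUs have no kingdom assignment at all?
sum(is.na(assignments$kingdom))
```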

Visualize Summary Plots

Below are the main summary plots. You can also compile them into a multi-page PDF by passing pdf_file to save_taxonomy_graphics, as shown in the optional section later in this vignette.

Alignment length histogram

This figure shows the distribution of BLAST alignment lengths across the user-supplied BLAST results. One of the essential quality control steps in ClassifyITS is to filter out BLAST hits that are too short, as these may not provide reliable taxonomic information. The histogram includes vertical lines indicating the median BLAST alignment length, the cutoff length applied for filtering, and the mean length of the representative sequences in the FASTA file. The default alignment cutoff is 0.6 times the median BLAST alignment length. It can be adjusted with the cutoff_fraction parameter of the plot_alignment_hist function. See the quick start for details on how to adjust this parameter.
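
The cutoff arithmetic can be illustrated in base R. The alignment lengths below are made up, and the sketch assumes that hits at or above the cutoff are retained. It mirrors the rule described above, not the package's internal code.

```r
# Made-up alignment lengths (bp) for illustration only
aln_len <- c(95, 180, 240, 250, 255, 260, 265, 270, 300)
cutoff_fraction <- 0.6                       # default fraction stated above

median_len <- median(aln_len)                # 255
cutoff_len <- cutoff_fraction * median_len   # 0.6 * 255 = 153
kept <- aln_len[aln_len >= cutoff_len]       # assumed rule: keep hits at or above the cutoff

median_len
cutoff_len
kept
```

With these numbers only the 95 bp hit falls below the cutoff. Raising cutoff_fraction makes the filter stricter, and lowering it keeps shorter alignments.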

Assignment summary bar chart

ClassifyITS is designed for studies targeting fungi. Fungal ITS primers occasionally amplify other organisms (plants, algae, etc.), and these taxa are generally discarded in downstream analyses. The first taxonomic step in ClassifyITS is therefore to apply kingdom-specific cutoffs to the BLAST results. The pipeline then assigns taxonomy to reads in kingdom Fungi at the phylum, class, order, family, genus, and species levels. The assignment bar chart shows the number of fungal OTUs that received an assignment (i.e., not “Unclassified”) at each taxonomic level.
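
The counts behind such a bar chart can be reproduced from an assignments table in a few lines of base R. The sketch below uses four fungal rows copied from the assignments table shown later in this vignette. It illustrates the counting rule only and is not the package's plotting code.

```r
ranks <- c("phylum", "class", "order", "family", "genus", "species")

# Four fungal OTUs taken from the example assignments table
assignments <- data.frame(
  kingdom = c("Fungi", "Fungi", "Fungi", "Fungi"),
  phylum  = c("Mortierellomycota", "Basidiomycota", "Ascomycota", "Ascomycota"),
  class   = c("Mortierellomycetes", "Tremellomycetes", "Leotiomycetes", "Leotiomycetes"),
  order   = c("Mortierellales", "Tremellales", "Unclassified", "Helotiales"),
  family  = c("Mortierellaceae", "Bulleribasidiaceae", "Unclassified", "Unclassified"),
  genus   = c("Mortierella", "Derxomyces", "Unclassified", "Unclassified"),
  species = c("Unclassified", "Derxomyces nakasei", "Unclassified", "Unclassified"),
  stringsAsFactors = FALSE
)

# Keep fungal OTUs, then count entries that are neither NA nor "Unclassified"
fungi <- assignments[!is.na(assignments$kingdom) & assignments$kingdom == "Fungi", ]
assigned <- sapply(fungi[ranks], function(x) sum(!is.na(x) & x != "Unclassified"))
assigned
```

The counts can only stay level or fall from phylum to species, because an OTU left unclassified at one rank is also unclassified at every finer rank.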

Phylum level stacked bar chart

This figure provides a quick summary of the taxonomic composition of the dataset at the phylum level. Importantly, it shows the number of OTUs assigned to each phylum, not the relative abundance of each phylum in the OTU table.
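
The difference between OTU counts and relative abundance can be large. The sketch below uses a hypothetical three-OTU table with made-up read counts. The reads column is an assumption for illustration, since ClassifyITS itself works from BLAST and FASTA files rather than an OTU table.

```r
# Hypothetical data: three OTUs with made-up read counts
otus <- data.frame(
  otu    = c("otu1", "otu2", "otu3"),
  phylum = c("Ascomycota", "Ascomycota", "Basidiomycota"),
  reads  = c(10, 20, 970),
  stringsAsFactors = FALSE
)

# Number of OTUs per phylum (the quantity the stacked bar chart reports)
otu_counts <- table(otus$phylum)

# Share of reads per phylum (relative abundance, which the chart does not show)
read_share <- tapply(otus$reads, otus$phylum, sum) / sum(otus$reads)

otu_counts
read_share
```

Here Ascomycota accounts for two of the three OTUs but only 3% of the reads, so the two summaries answer different questions.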


Review Tabular Outputs

Step Summary Table

This table gives a breakdown of how many OTUs passed each step in the pipeline, including QC failures and taxonomic assignments at each level. Use it to quickly assess the overall success of the taxonomy assignment process and to identify any steps where a large number of OTUs failed or did not receive an assignment.

Step | Count
# of OTUs in Representative Sequence | 11 (100%)
# of OTUs pass quality check | 10 (90.9%)
OTUs assigned to fungal kingdom | 9 (81.8%)
OTUs assigned to fungal phylum | 9 (81.8%)
OTUs assigned to fungal class | 9 (81.8%)
OTUs assigned to fungal order | 8 (72.7%)
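
The percentages in this table are each step's count divided by the total number of OTUs in the representative sequence file. The base R sketch below reproduces them from the counts shown above. It illustrates the arithmetic only and is not the package's own code.

```r
# Counts copied from the step summary table above
step_counts <- c(
  "# of OTUs in Representative Sequence" = 11,
  "# of OTUs pass quality check"         = 10,
  "OTUs assigned to fungal kingdom"      = 9,
  "OTUs assigned to fungal phylum"       = 9,
  "OTUs assigned to fungal class"        = 9,
  "OTUs assigned to fungal order"        = 8
)

total <- step_counts[[1]]                   # all OTUs in the FASTA file
pct <- round(100 * step_counts / total, 1)  # percentage of the total, one decimal

step_table <- data.frame(
  Step  = names(step_counts),
  Count = paste0(step_counts, " (", pct, "%)"),
  row.names = NULL,
  stringsAsFactors = FALSE
)
step_table
```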

Unique Taxa Counts Table

This table shows the number of unique taxa assigned at each taxonomic level, but only for OTUs that were classified as kingdom Fungi. This is a useful summary to understand the diversity of taxa represented in the dataset at each taxonomic level.

Rank | Unique Count
Phylum | 4
Class | 7
Order | 8
Family | 7
Genus | 7
Species | 4
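
A comparable summary can be computed from any assignments table. count_unique_taxa() below is a hypothetical helper written for this illustration. It assumes that “Unclassified” and NA entries are not counted as taxa, which is a guess at the convention rather than a documented rule.

```r
# Hypothetical helper (not part of ClassifyITS): unique assigned taxa per rank,
# restricted to OTUs classified as kingdom Fungi
count_unique_taxa <- function(df, ranks) {
  fungi <- df[!is.na(df$kingdom) & df$kingdom == "Fungi", , drop = FALSE]
  counts <- vapply(ranks, function(r) {
    taxa <- fungi[[r]]
    length(unique(taxa[!is.na(taxa) & taxa != "Unclassified"]))
  }, integer(1))
  data.frame(Rank = ranks, Unique = unname(counts), stringsAsFactors = FALSE)
}

# Small made-up table: three fungal OTUs and one OTU that failed QC
demo <- data.frame(
  kingdom = c("Fungi", "Fungi", "Fungi", NA),
  phylum  = c("Ascomycota", "Ascomycota", "Basidiomycota", NA),
  class   = c("Leotiomycetes", "Sordariomycetes", "Unclassified", NA),
  stringsAsFactors = FALSE
)

unique_counts <- count_unique_taxa(demo, c("phylum", "class"))
unique_counts
```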

(Optional) Save the multi-page PDF

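pdf_file <- file.path(tempdir(), "combined_taxonomy_graphics.pdf")

# hist_plot is assumed to already exist in your session: it is the alignment
# length histogram object (see plot_alignment_hist in the quick start)
graphics_pdf <- save_taxonomy_graphics(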
  all_results = results$all_results,
  hist_plot   = hist_plot,
  pdf_file    = pdf_file,
  verbose     = TRUE
)

graphics_pdf$pdf_file
## [1] "/var/folders/59/jpwzlxb97sj5dz0y29k3n7680000gn/T//RtmpKg32Jm/combined_taxonomy_graphics.pdf"

Display Final Assignments Table

The assignments table returned by the pipeline has the format shown below. If you ran the pipeline with outdir set, the same table is also written to the CSV file path reported in results_written$assignments_file.

The file is named initial_assignments.csv to remind users to think carefully about the research question before treating the assignments as final. For example, if you wish to describe fungal diversity in a coral reef, you may want to manually review the OTUs left unassigned at the phylum and class levels rather than reporting that X% of OTUs could not be classified at the phylum level. This is a common practice in microbial ecology when dealing with novel or poorly characterized taxa.

Assigning taxonomy to fungi is computationally challenging: so little of the fungal tree of life is represented in reference databases that any classification software has to deal with a high number of ambiguous taxonomic matches. When in doubt, ClassifyITS leaves a rank as “Unclassified” and lets users inspect the BLAST results themselves to make a case-by-case, phylum-by-phylum call. The initial assignments file provides the information needed to make informed decisions about how to handle these cases in your dataset.

qseqid | kingdom | phylum | class | order | family | genus | species | notes
32fc01b9c7d792aa5dcd67b7fa12df2c | Fungi | Mortierellomycota | Mortierellomycetes | Mortierellales | Mortierellaceae | Mortierella | Unclassified | ITS pipeline completed
4511c6ae6420214d3233a449573b4eed | Fungi | Chytridiomycota | Rhizophydiomycetes | Rhizophydiales | Alphamycetaceae | Betamyces | Unclassified | ITS pipeline completed
5eb519d468e41551ef420125c81c98ec | Fungi | Basidiomycota | Tremellomycetes | Tremellales | Bulleribasidiaceae | Derxomyces | Derxomyces nakasei | ITS pipeline completed
68393f493dc12adb0fe5b5917548432e | Fungi | Ascomycota | Leotiomycetes | Unclassified | Unclassified | Unclassified | Unclassified | ITS pipeline completed
7526d2c3870858ace1465cc46cbfd3b | NA | NA | NA | NA | NA | NA | NA | Failed QC: no BLAST result passed Quality Control step (no BLAST result, too many N, alignment too short, low quality results)
7526d2c3880858ace1465cc46cbfd3b8 | Fungi | Ascomycota | Sordariomycetes | Amphisphaeriales | Apiosporaceae | Apiospora | Apiospora guiyangensis | ITS pipeline completed
94127162136c50efd9c774a9aed392cc | Fungi | Ascomycota | Leotiomycetes | Helotiales | Unclassified | Unclassified | Unclassified | ITS pipeline completed
b6b03675ed56ad95e223cab49d1c2898 | Fungi | Basidiomycota | Agaricomycetes | Agaricales | Agaricaceae | Lepiota | Lepiota subincarnata | ITS pipeline completed
c9f32c45006a8048908bc9893d90fc25 | Fungi | Ascomycota | Sordariomycetes | Coniochaetales | Coniochaetaceae | Coniochaeta | Coniochaeta sinensis | ITS pipeline completed
cbf21fbcf4a991e34023abe8e85285e3 | Fungi | Ascomycota | Saccharomycetes | Saccharomycetales | Phaffomycetaceae | Barnettozyma | Unclassified | ITS pipeline completed
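
To build a shortlist for manual review, the table can be split into OTUs that failed QC and fungal OTUs left unclassified at a rank of interest. The sketch below uses three rows from the table above, with a reduced set of columns and the QC note abbreviated. It assumes that failed rows can be recognized by a notes entry starting with "Failed QC", as in the example output.

```r
# Three rows from the example table (columns reduced, QC note abbreviated)
assignments <- data.frame(
  qseqid  = c("68393f493dc12adb0fe5b5917548432e",
              "7526d2c3870858ace1465cc46cbfd3b",
              "b6b03675ed56ad95e223cab49d1c2898"),
  kingdom = c("Fungi", NA, "Fungi"),
  phylum  = c("Ascomycota", NA, "Basidiomycota"),
  order   = c("Unclassified", NA, "Agaricales"),
  notes   = c("ITS pipeline completed",
              "Failed QC: no BLAST result passed Quality Control step",
              "ITS pipeline completed"),
  stringsAsFactors = FALSE
)

# OTUs that never reached taxonomy assignment
failed_qc <- assignments[grepl("^Failed QC", assignments$notes), ]

# Fungal OTUs that were assigned a kingdom but left unclassified at order level
needs_review <- assignments[
  !is.na(assignments$order) & assignments$order == "Unclassified", ]

failed_qc$qseqid
needs_review$qseqid
```

Changing the column used in needs_review (for example to phylum) moves the review threshold to a different rank.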

Tip: To browse the full taxonomy assignment interactively, use View(results$all_results) in your own R session.


Conclusion

ClassifyITS produces summary visualizations and tables for every run and can optionally write them to disk. A background rate of failed QC is expected. Additionally, ClassifyITS is designed to be conservative in its taxonomic assignments, so it is normal for a significant number of OTUs not to receive an assignment at certain taxonomic levels, especially at the genus and species levels. The usual cause is that multiple genera or species are equally good matches. At a minimum, it is recommended to manually inspect any fungal OTU that was unassigned at the phylum level. See Inspection for a complete guide to careful examination of taxonomic assignments.

Thank you for doing the hard work to continue exploring fungal diversity and its ecological roles! ClassifyITS is designed to be a tool to help you assign taxonomy to your fungal ITS sequences, but it is not a black box. It is important to review the outputs carefully and consider manual curation for unassigned OTUs, especially if they are abundant or of particular interest in downstream analyses. The summary plots and tables generated by ClassifyITS provide a comprehensive overview of the taxonomy assignment process and can help guide your decisions about how to handle unassigned OTUs in your dataset.

See the README, custom-cutoffs, data-preparation and other tabs for more details.
