Type:

Package

Title:

Native R Implementation of an Efficient BLAST-Like Algorithm

Version:

1.0.7

Date:

2023-08-22

Maintainer:

Manu Tamminen <mavatam@utu.fi>

Description:

Implementation of an efficient BLAST-like sequence comparison algorithm, written in 'C++11' and using native R datatypes. Blaster is based on 'nsearch' - Schmid et al (2018) <doi:10.1101/399782>.

License:

BSD_3_clause + file LICENSE

Imports:

Rcpp (≥ 1.0.5)

LinkingTo:

Rcpp

SystemRequirements:

C++

RoxygenNote:

7.2.3

URL:

https://github.com/tamminenlab/blaster

NeedsCompilation:

yes

Packaged:

2023-08-22 13:12:19 UTC; mavatam

Author:

Manu Tamminen [aut, cre], Timothy Julian [aut], Aditya Jeevennavar [aut], Steven Schmid [aut]

Repository:

CRAN

Date/Publication:

2023-08-22 14:40:09 UTC

blaster: Native R Implementation of an Efficient BLAST-Like Algorithm

Description

Implementation of an efficient BLAST-like sequence comparison algorithm, written in 'C++11' and using native R datatypes. Blaster is based on 'nsearch' - Schmid et al (2018) doi:10.1101/399782.

Author(s)

Maintainer: Manu Tamminen mavatam@utu.fi (ORCID)

Authors:

Timothy Julian tim.julian@eawag.ch (ORCID)
Aditya Jeevennavar aditya.a.jeevannavar@utu.fi (ORCID)
Steven Schmid stevschmid@gmail.com

Runs BLAST sequence comparison algorithm.

Description

Runs BLAST sequence comparison algorithm.

Usage

blast(
  query,
  db,
  maxAccepts = 1,
  maxRejects = 16,
  minIdentity = 0.75,
  alphabet = "nucleotide",
  strand = "both",
  output_to_file = FALSE
)

Arguments

query

A dataframe of the query sequences (containing Id and Seq columns) or a string specifying the FASTA file of the query sequences.

db

A dataframe of the database sequences (containing Id and Seq columns) or a string specifying the FASTA file of the database sequences.

maxAccepts

A number specifying the maximum number of accepted hits. Defaults to 1.

maxRejects

A number specifying the maximum number of rejected hits. Defaults to 16.

minIdentity

A number specifying the minimum accepted sequence identity between the query and hit sequences. Defaults to 0.75.

alphabet

A string specifying the query and database alphabet: 'nucleotide' or 'protein'. Defaults to 'nucleotide'.

strand

A string specifying the strand to search: 'plus', 'minus' or 'both'. Defaults to 'both'. Only affects nucleotide searches.

output_to_file

A boolean specifying the output type. If TRUE, the results are written into a temporary file and a string containing the file name and location is returned. Otherwise a dataframe of the results is returned. Defaults to FALSE.

Value

A dataframe or a string. A dataframe is returned by default, containing the BLAST output in columns QueryId, TargetId, QueryMatchStart, QueryMatchEnd, TargetMatchStart, TargetMatchEnd, QueryMatchSeq, TargetMatchSeq, NumColumns, NumMatches, NumMismatches, NumGaps, Identity and Alignment. A string is returned if 'output_to_file' is set to TRUE. This string points to the file containing the output table.
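
As a minimal sketch of the dataframe input form described above, the following builds query and database dataframes by hand and passes them to blast(). The sequences and Ids are made up for illustration only, and the call is guarded so that the snippet also runs where the package is not installed.

```r
# Hypothetical toy sequences; only the Id and Seq column names come from the
# argument descriptions above.
query_df <- data.frame(
  Id  = c("q1", "q2"),
  Seq = c("ACGTTGCAAGCTTGGATCCGATTACAGGCTAACGTTAGC",
          "TTGACCATGGCATCGATCGGCTAAGCTTACGGATCGTAA"),
  stringsAsFactors = FALSE
)
db_df <- data.frame(
  Id  = c("t1", "t2"),
  Seq = c("ACGTTGCAAGCTTGGATCCGATTACAGGCTAACGTTAGC",
          "GGGCCCAAATTTGGGCCCAAATTTGGGCCCAAATTTGGG"),
  stringsAsFactors = FALSE
)

if (requireNamespace("blaster", quietly = TRUE)) {
  # With output_to_file left at FALSE, a dataframe of hits is returned
  hits <- blaster::blast(query = query_df, db = db_df)
  print(head(hits))
}
```

The same dataframes can be replaced by FASTA file paths without changing the rest of the call.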

Examples


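# Locate the example FASTA files shipped with the package
query <- system.file("extdata", "query.fasta", package = "blaster")
db <- system.file("extdata", "db.fasta", package = "blaster")

# Search using the FASTA file paths directly
blast_table <- blast(query = query, db = db)

# Alternatively, read the FASTA files into dataframes first
query <- read_fasta(filename = query)
db <- read_fasta(filename = db)
blast_table <- blast(query = query, db = db)

# Protein search: compare the protein sequences against themselves
prot <- system.file("extdata", "prot.fasta", package = "blaster")
prot_blast_table <- blast(query = prot, db = prot, alphabet = "protein")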
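
The examples above rely on the default search settings. The following sketch shows how the remaining arguments from the Usage section might be combined. The specific values (5, 32, 0.9, "plus") are arbitrary choices for illustration, not recommendations, and search_args is a hypothetical helper name. The call is guarded so the snippet runs even where the package is not installed.

```r
# Arbitrary, illustrative settings; argument names are taken from the Usage section
search_args <- list(
  maxAccepts  = 5,      # allow more accepted hits than the default of 1
  maxRejects  = 32,     # allow more rejected hits than the default of 16
  minIdentity = 0.9,    # stricter than the default of 0.75
  strand      = "plus"  # search the plus strand only (nucleotide searches)
)

if (requireNamespace("blaster", quietly = TRUE)) {
  query <- system.file("extdata", "query.fasta", package = "blaster")
  db <- system.file("extdata", "db.fasta", package = "blaster")

  # Stricter search returning a dataframe
  strict_hits <- do.call(
    blaster::blast,
    c(list(query = query, db = db), search_args)
  )
  print(head(strict_hits))

  # With output_to_file = TRUE a file path is returned instead of a dataframe
  out_path <- blaster::blast(query = query, db = db, output_to_file = TRUE)
  print(out_path)
}
```

Writing to a file may be preferable when the result table is too large to hold comfortably in memory.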

Reads the contents of a nucleotide or protein FASTA file into a dataframe.

Description

Reads the contents of a nucleotide or protein FASTA file into a dataframe.

Usage

read_fasta(
  filename,
  filter = "",
  non_standard_chars = "error",
  alphabet = "nucleotide"
)

Arguments

filename

A string specifying the name of the FASTA file to be imported.

filter

An optional string specifying a sequence motif for sequence filtering. Only keeps those sequences containing this motif. Also splits the matched sequences and provides the split parts in two additional columns.

non_standard_chars

A string specifying how to handle non-standard nucleotide or amino acid characters: 'remove', 'ignore' or 'error'. Defaults to 'error'.

alphabet

A string specifying the query and database alphabet: 'nucleotide' or 'protein'. Defaults to 'nucleotide'.

Value

A dataframe containing FASTA ids (Id column) and sequences (Seq column). If 'filter' is specified, the split sequences are stored in additional columns Part1 and Part2.
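
To illustrate the 'filter' argument, the following sketch writes a small FASTA file to a temporary location and reads it back with and without a motif. The file contents and the motif "GGATCC" are made up for illustration. According to the description above, the filtered call should keep only sequences containing the motif and add Part1 and Part2 columns, but the exact output is not shown here.

```r
# Write a hypothetical two-record FASTA file to a temporary location
fasta_path <- tempfile(fileext = ".fasta")
writeLines(
  c(">seq1", "ACGTACGTGGATCCACGTACGT",
    ">seq2", "TTTTGGGGCCCCAAAATTTTGG"),
  fasta_path
)

if (requireNamespace("blaster", quietly = TRUE)) {
  # Read every record: expect Id and Seq columns
  all_seqs <- blaster::read_fasta(filename = fasta_path)
  print(all_seqs)

  # Read only records containing the motif; only seq1 contains "GGATCC"
  with_motif <- blaster::read_fasta(filename = fasta_path, filter = "GGATCC")
  print(with_motif)
}
```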

Examples


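# Locate the example query FASTA file shipped with the package
query <- system.file("extdata", "query.fasta", package = "blaster")

# Read it into a dataframe with Id and Seq columns
query <- read_fasta(filename = query)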
