Version: 1.0.13
Date: 2023-9-14
Title: Distance Based Cell Lineage Reconstruction
Author: Il-Youp Kwak [aut, cre], Wuming Gong [aut]
Maintainer: Il-Youp Kwak <ikwak2@cau.ac.kr>
License: GPL-3
VignetteBuilder: knitr
LinkingTo: Rcpp, RcppArmadillo
RoxygenNote: 7.2.3
Depends: R (≥ 4.1.0), tensorflow(≥ 2.2.0)
Encoding: UTF-8
Imports: BiocParallel, dplyr, Matrix, matrixStats, ape, phangorn, Rcpp, igraph, methods, purrr, stringr, tidyr, rBayesianOptimization, rlang, BiocGenerics
Suggests: knitr, rmarkdown, markdown
Description: R codes for distance based cell lineage reconstruction. Our methods won both sub-challenges 2 and 3 of the Allen Institute Cell Lineage Reconstruction DREAM Challenge in 2020. References: Gong et al. (2021) <doi:10.1016/j.cels.2021.05.008>, Gong et al. (2022) <doi:10.1186/s12859-022-04633-x>.
URL: https://github.com/ikwak2/DCLEAR
NeedsCompilation: yes
Packaged: 2023-09-14 07:01:28 UTC; ikwak2
Repository: CRAN
Date/Publication: 2023-09-14 07:32:35 UTC

DCLEAR: Distance based Cell LinEAge Reconstruction

Description

Distance based methods for inferring lineage trees from single cell data


WH

Description

Implementation of the weighted Hamming distance algorithm.

Usage

WH(x, InfoW, dropout = FALSE)

Arguments

x

Sequence object of 'phyDat' type.

InfoW

Weight vector for the calculation of weighted hamming distance

dropout

If dropout = TRUE, a different weighting strategy is used to account for interval dropout. Default is dropout = FALSE.

Value

Calculated distance matrix of input sequences. The result is a 'dist' class object.

Author(s)

Il-Youp Kwak

Examples


set.seed(1)
library(DCLEAR)
library(phangorn)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10  ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
        d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )

## RF score with hamming distance
D_h = dist.hamming(sD$seqs)
tree_h= NJ(D_h)
RF.dist(tree_h, sD$tree, normalize = TRUE)

## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5

D_wh = WH(sD$seqs, InfoW)
tree_wh= NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)

## RF score with weighted hamming, considering dropout situation
InfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3

D_wh2 = WH(sD$seqs, InfoW, dropout=TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)
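
To make the idea behind the weights concrete, here is a minimal base-R sketch of a weighted Hamming distance. It is an illustration under stated assumptions, not the package's implementation: the function name weighted_hamming, the integer state coding, and the rule that a mismatch costs the product of the two state weights are all assumptions made for this example.

```r
# Weighted Hamming distance between the rows of an integer state matrix.
# ASSUMPTION (not confirmed by this manual): a mismatch at a site costs
# w[a] * w[b], where a and b are the two observed states; matches cost 0.
weighted_hamming <- function(states, w) {
  n <- nrow(states)
  D <- matrix(0, n, n, dimnames = list(rownames(states), rownames(states)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      a <- states[i, ]
      b <- states[j, ]
      differ <- a != b
      D[i, j] <- D[j, i] <- sum(w[a[differ]] * w[b[differ]])
    }
  }
  as.dist(D)
}

# Three toy barcodes over states 1..3 (hypothetical data).
states <- rbind(c1 = c(1, 1, 2), c2 = c(1, 3, 2), c3 = c(2, 3, 3))
w <- c(1, 2, 3)
D <- weighted_hamming(states, w)
D
```

With all weights equal to 1 this reduces to the plain count of mismatching sites, which is why giving frequent or uninformative states a small weight changes the resulting tree.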


Train weights for WH

Description

Train weights for WH and output weight vector

Usage

WH_train(X, loc0 = 2, locDropout = 1, locMissing = FALSE)

Arguments

X

A list of k input data sets, X[[1]] ... X[[k]]. The i-th data set holds the sequence information in phyDat format in X[[i]][[1]] and the tree information in phylo format in X[[i]][[2]].

loc0

weight location of initial state

locDropout

weight location of dropout state

locMissing

weight location of missing state, FALSE if there is no missing values

Value

a weight vector

Author(s)

Il-Youp Kwak (ikwak2@cau.ac.kr)
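
The layout expected for X can be checked before training. The sketch below is a hypothetical helper (check_training_list is not part of DCLEAR) that only inspects class attributes. The stand-in objects carry nothing but a class, purely for illustration.

```r
# Hypothetical helper (not part of DCLEAR): verify that X[[i]][[1]] is a
# phyDat object and X[[i]][[2]] is a phylo object for every element of X.
check_training_list <- function(X) {
  ok <- vapply(X, function(item) {
    is.list(item) && length(item) >= 2 &&
      inherits(item[[1]], "phyDat") && inherits(item[[2]], "phylo")
  }, logical(1))
  all(ok)
}

# Stand-in objects that carry only the class attribute.
fake_seqs <- structure(list(), class = "phyDat")
fake_tree <- structure(list(), class = "phylo")
X <- list(list(fake_seqs, fake_tree), list(fake_seqs, fake_tree))
check_training_list(X)
```

With real data, each element could be assembled from a sim_seqdata() result sD as list(sD$seqs, sD$tree), since that function returns a 'seqs' phyDat object and a 'tree' phylo object.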


Train weights for WH, and output distance object

Description

Train weights for WH using the given data, and compute the distance matrix for an input sequence.

Usage

WH_train_fit(x, X)

Arguments

x

input data in phyDat format

X

A list of k input data sets, X[[1]] ... X[[k]]. The i-th data set holds the sequence information in phyDat format in X[[i]][[1]] and the tree information in phylo format in X[[i]][[2]].

Value

a dist object

Author(s)

Il-Youp Kwak (ikwak2@cau.ac.kr)


add_deletion

Description

Add deletion

Usage

add_deletion(x, tree, mutation_site, config)

Arguments

x

a character matrix

tree

a matrix representing the lineage tree

mutation_site

a binary matrix for mutation site

config

a lineage_tree_config object

Value

a character matrix with deletions


add_dropout

Description

Add dropout events

Usage

add_dropout(x, config)

Arguments

x

a character matrix

config

a lineage_tree_config object

Value

a character matrix with dropout events
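
As a rough illustration of what adding dropout events to a character matrix can look like, the sketch below replaces entries independently at random. This is a simplified stand-in: the function name, the probability argument p, and the independence assumption are all hypothetical, and the package may model dropout differently (the WH documentation mentions interval dropout).

```r
# Hypothetical stand-in: each entry of the barcode matrix is independently
# replaced by the dropout character with probability p.
add_dropout_sketch <- function(x, p, dropout_character = "*") {
  hit <- matrix(runif(length(x)) < p, nrow = nrow(x), ncol = ncol(x))
  x[hit] <- dropout_character
  x
}

set.seed(1)
x <- matrix("0", nrow = 4, ncol = 6)   # 4 cells, 6 targets, all unmutated
y <- add_dropout_sketch(x, p = 0.3)
y
```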


Generic function for as_igraph

Description

Generic function for as_igraph

Usage

as_igraph(x, ...)

Arguments

x

a phylo object

...

additional parameters


as_igraph

Description

Convert a phylo object to an igraph object while keeping the edge weights (in contrast to igraph::as.igraph).

Usage

## S4 method for signature 'data.frame'
as_igraph(x, config)

Arguments

x

a phylo object

config

a 'lineage_tree_config' object

Value

an igraph object


as_igraph

Description

Convert a phylo object to an igraph object while keeping the edge weights (in contrast to igraph::as.igraph).

Usage

## S4 method for signature 'phylo'
as_igraph(x)

Arguments

x

a phylo object

Value

an igraph object


Generic function for as_lineage_tree

Description

Generic function for as_lineage_tree

Usage

as_lineage_tree(x, y, config, ...)

Arguments

x

a phyDat object

y

a phylo object

config

a lineage_tree_config object

...

additional parameters


as_lineage_tree

Description

Convert a phylo object and a phyDat object to a lineage_tree object

Usage

## S4 method for signature 'phyDat,phylo,lineage_tree_config'
as_lineage_tree(x, y, config, ...)

Arguments

x

a phyDat object

y

a phylo object

config

a lineage_tree_config object

...

additional parameters

Value

a lineage_tree object


Generic function for as_phylo

Description

Generic function for as_phylo

Usage

as_phylo(x, ...)

Arguments

x

a graph object

...

additional parameters


as_phylo

Description

Convert an igraph object to a phylo object

Usage

## S4 method for signature 'igraph'
as_phylo(x)

Arguments

x

an igraph object

Value

a phylo object or an igraph object


Core function of computing kmer replacement distance

Description

Compute the sequence distance matrix using inferred kmer replacement matrix

Usage

dist_kmer_replacement_inference(x, kmer_summary, k = 2)

Arguments

x

input data in phyDat format

kmer_summary

a kmer_summary object

k

k-mer length (default: 2)

Value

a dist object

Author(s)

Wuming Gong (gongx030@umn.edu)
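
For readers unfamiliar with the term, a k-mer here is a run of k consecutive states in a barcode. The base-R sketch below shows how such runs can be enumerated for a single barcode. The function kmers and the toy barcode are hypothetical and only illustrate the notion, not how the package builds its kmer_summary object.

```r
# Slide a window of length k over one barcode, given as a character vector
# of per-target states, and return the k-mers as strings.
kmers <- function(states, k = 2) {
  n <- length(states)
  if (n < k) return(character(0))
  vapply(seq_len(n - k + 1), function(i) {
    paste(states[i:(i + k - 1)], collapse = "")
  }, character(1))
}

barcode <- c("0", "A", "A", "-", "0")   # hypothetical barcode with 5 targets
kmers(barcode, k = 2)
```

A barcode with n targets yields n - k + 1 overlapping k-mers, so k = 2 on five targets gives four 2-mers.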


Generic function for dist_replacement

Description

Generic function for dist_replacement

Usage

dist_replacement(x, kmer_summary, k, ...)

Arguments

x

a sequence object

kmer_summary

a kmer_summary object

k

k-mer length

...

additional parameters


Compute the kmer replacement distance

Description

Compute the kmer replacement distance between sequences

Usage

## S4 method for signature 'phyDat,kmer_summary,integer'
dist_replacement(x, kmer_summary, k = 2, ...)

Arguments

x

input data in phyDat format

kmer_summary

a kmer_summary object

k

k-mer length

...

other arguments passed to substr_kmer

Value

a dist object

Author(s)

Wuming Gong (gongx030@umn.edu)


Compute the kmer replacement distance

Description

Compute the kmer replacement distance between sequences

Usage

## S4 method for signature 'phyDat,missing,integer'
dist_replacement(x, kmer_summary, k = 2L, ...)

Arguments

x

input data in phyDat format

kmer_summary

a kmer_summary object

k

k-mer length

...

other arguments passed to substr_kmer

Value

a dist object

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for dist_weighted_hamming

Description

Generic function for dist_weighted_hamming

Usage

dist_weighted_hamming(x, wVec, ...)

Arguments

x

a sequence object

wVec

weight vector

...

additional parameters


dist_weighted_hamming

Description

Implementation of the weighted Hamming distance algorithm.

Usage

## S4 method for signature 'phyDat,numeric'
dist_weighted_hamming(x, wVec, dropout = FALSE)

Arguments

x

Sequence object of 'phyDat' type.

wVec

Weight vector for the calculation of weighted hamming distance

dropout

If dropout = TRUE, a different weighting strategy is used to account for interval dropout. Default is dropout = FALSE.

Value

Calculated distance matrix of input sequences. The result is a 'dist' class object.

Author(s)

Il-Youp Kwak

Examples


library(DCLEAR)
library(phangorn)
library(ape)

set.seed(1)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10  ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
      d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )
## RF score with hamming distance
D_hm = dist.hamming(sD$seqs)
tree_hm = NJ(D_hm)
RF.dist(tree_hm, sD$tree, normalize = TRUE)

## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5
D_wh = dist_weighted_hamming(sD$seqs, InfoW, dropout = FALSE)
tree_wh = NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)

## RF score with weighted hamming, considering dropout situation
InfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3
D_wh2 = dist_weighted_hamming(sD$seqs, InfoW, dropout = TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)



Generic function for downsample

Description

Generic function for downsample

Usage

downsample(x, ...)

Arguments

x

a data object

...

additional parameters


downsample

Description

Sample a lineage tree

Usage

## S4 method for signature 'igraph'
downsample(x, n = 10L, ...)

Arguments

x

an igraph object

n

number of leaves (tips) in the down-sampled tree

...

additional parameters

Value

a phylo object


downsample

Description

Sample a lineage tree

Usage

## S4 method for signature 'lineage_tree'
downsample(x, n = 10L, ...)

Arguments

x

a lineage_tree object

n

number of leaves (tips) in the down-sampled tree

...

additional parameters

Value

a lineage_tree object


get_distance_prior

Description

prior distribution of distance

Usage

get_distance_prior(x)

Arguments

x

a kmer_summary object

Value

a probability vector giving the distribution of nodal distances

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for get_leaves

Description

Generic function for get_leaves

Usage

get_leaves(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters


get_leaves

Description

Get the leaf sequences

Usage

## S4 method for signature 'lineage_tree'
get_leaves(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters

Value

a phyDat object


get_node_names

Description

Convenience function for getting node names

Usage

get_node_names(x)

Arguments

x

node id

Value

node names

Author(s)

Wuming Gong (gongx030@umn.edu)


get_replacement_probability

Description

Compute p(A,B|d), the conditional probability of seeing a replacement from kmer A to B or vice versa

Usage

get_replacement_probability(x)

Arguments

x

a kmer_summary object

Value

a 3D probability array (kmers by kmers by distances)

Author(s)

Wuming Gong (gongx030@umn.edu)
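
The sketch below shows, on made-up counts, how a kmers-by-kmers-by-distances array of conditional probabilities and a prior over nodal distances (as described for get_distance_prior) can be combined with Bayes' rule. The counts, the kmer names, and the posterior step are assumptions chosen for illustration. How DCLEAR uses these arrays internally is not stated here.

```r
# Hypothetical replacement counts: kmers x kmers x nodal distances.
kmer_names <- c("00", "0A", "AA")
counts <- array(
  c(8, 1, 1,  1, 4, 1,  1, 1, 2,    # nodal distance 1
    4, 3, 3,  3, 2, 3,  3, 3, 1),   # nodal distance 2
  dim = c(3, 3, 2),
  dimnames = list(kmer_names, kmer_names, c("d1", "d2"))
)

# p(A, B | d): normalize each distance slice so that it sums to 1.
p_ab_given_d <- sweep(counts, 3, apply(counts, 3, sum), "/")

# p(d): share of all observations that fall at each nodal distance.
p_d <- apply(counts, 3, sum) / sum(counts)

# p(d | A, B) by Bayes' rule for one observed replacement "0A" <-> "AA".
posterior <- p_ab_given_d["0A", "AA", ] * p_d
posterior <- posterior / sum(posterior)
posterior
```

In this toy setting the replacement is three times as frequent at distance 2 as at distance 1, so the posterior puts 0.75 on d2.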


get_sequence

Description

Get sequences

Usage

get_sequence(x, tree, outcome, config)

Arguments

x

a character matrix

tree

a matrix representing the lineage tree

outcome

a character matrix

config

a lineage_tree_config object

Value

a character matrix


get_transition_probability

Description

Compute p(A,X|B,Y,d), the conditional probability of seeing a replacement from A to B given the previous replacement B from Y at nodal distance d

Usage

get_transition_probability(x)

Arguments

x

a kmer_summary object

Value

a 3D probability array (kmers by kmers by distances)

Author(s)

Wuming Gong (gongx030@umn.edu)


Lineage data

Description

Lineage data

Usage

data(lineages)

Format

An object of class list of length 100.

Examples

data(lineages)

positional_mutation_prob

Description

Compute the positional mutation probability matrix for a set of sequences

Usage

positional_mutation_prob(x, config)

Arguments

x

a phyDat object

config

a lineage_tree_config object

Value

a positional mutation probability matrix


Generic function for process_sequence

Description

Generic function for process_sequence

Usage

process_sequence(x, ...)

Arguments

x

a sequence object

...

additional parameters


Process sequences

Description

Process sequences

Usage

## S4 method for signature 'phyDat'
process_sequence(
  x,
  division = 16L,
  dropout_character = "*",
  default_character = "0",
  deletion_character = "-"
)

Arguments

x

input data in phyDat format

division

number of cell divisions

dropout_character

Dropout character (default: '*')

default_character

Default character (default: '0')

deletion_character

Deletion character (default: '-')

Value

a 'lineage_tree_config' object

Author(s)

Wuming Gong (gongx030@umn.edu)
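
The three special characters can be summarized directly from a character matrix of barcodes. The helper below is hypothetical (special_character_rates is not a DCLEAR function) and only reports how often each special character occurs. It is one plausible ingredient of a sequence summary, not a description of what process_sequence computes.

```r
# Hypothetical summary: fraction of matrix entries equal to each of the
# special characters (defaults follow the process_sequence signature above).
special_character_rates <- function(x, dropout_character = "*",
                                    default_character = "0",
                                    deletion_character = "-") {
  c(dropout  = mean(x == dropout_character),
    default  = mean(x == default_character),
    deletion = mean(x == deletion_character))
}

# Two toy barcodes with four targets each.
x <- rbind(c("0", "A", "*", "0"),
           c("-", "-", "B", "0"))
special_character_rates(x)
```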


Generic function for prune

Description

Generic function for prune

Usage

prune(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters


prune

Description

Trim a full lineage tree into phylogenetic tree

Usage

## S4 method for signature 'igraph'
prune(x, weighted = TRUE, ...)

Arguments

x

an igraph object

weighted

whether or not to keep the edge weight (default: TRUE)

...

additional parameters

Value

an igraph object


prune

Description

Trim a full lineage tree into phylogenetic tree

Usage

## S4 method for signature 'lineage_tree'
prune(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters passed to as_phylo()

Value

a lineage_tree object


random_tree

Description

Simulate a random lineage tree

Usage

random_tree(n_samples, division = 16L)

Arguments

n_samples

number of samples to simulate

division

number of cell divisions

Value

a data frame

Author(s)

Wuming Gong (gongx030@umn.edu)


rbind

Description

Concatenate multiple phyDat objects

Usage

## S4 method for signature 'phyDat'
rbind(..., deparse.level = 1)

Arguments

...

a list of phyDat objects

deparse.level

see definition in generic rbind

Value

a phyDat object


sample_mutation_outcome

Description

Sample mutation outcome

Usage

sample_mutation_outcome(x, mp = NULL, config)

Arguments

x

an igraph object

mp

a mutation site matrix

config

a lineage_tree_config object

Value

an outcome matrix


sample_mutation_site

Description

Sample mutation site

Usage

sample_mutation_site(tree, config)

Arguments

tree

a data frame

config

a lineage_tree_config object

Value

a mutation site matrix


sample_outcome_prob

Description

Sampling outcome probability based on a gamma distribution

Usage

sample_outcome_prob(config, num_states = 20L, shape = 0.1, scale = 2)

Arguments

config

a lineage_tree_config object

num_states

number of states used in simulation.

shape

shape parameter in gamma distribution

scale

scale parameter in gamma distribution

Value

a probability vector for each alphabet

Author(s)

Wuming Gong (gongx030@umn.edu)
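
A gamma-based probability vector can be sketched in a few lines of base R. The version below assumes the draws are simply normalized by their total. That normalization, and the function name, are assumptions for illustration rather than the package's documented behavior.

```r
# Draw one gamma variate per state and normalize so the values sum to 1.
# ASSUMPTION: normalization by the total; the package may differ.
sample_outcome_prob_sketch <- function(num_states = 20L, shape = 0.1,
                                       scale = 2) {
  g <- rgamma(num_states, shape = shape, scale = scale)
  g / sum(g)
}

set.seed(1)
p <- sample_outcome_prob_sketch()
round(sort(p, decreasing = TRUE), 3)
```

A small shape parameter such as 0.1 produces a highly skewed vector in which a few states carry most of the probability mass.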


score_simulation

Description

Compare two sets of sequences

Usage

score_simulation(x, y, config)

Arguments

x

a character matrix

y

a character matrix

config

a lineage_tree_config object

Value

numeric scores


sim_seqdata

Description

Generate a single cell barcode data set with tree shaped lineage information

Usage

sim_seqdata(
  sim_n = 200,
  m = 200,
  mu_d = 0.03,
  d = 15,
  n_s = 23,
  outcome_prob = NULL,
  p_d = 0.003
)

Arguments

sim_n

Number of cell samples to simulate.

m

Number of targets.

mu_d

Mutation rate. (a scalar or a vector)

d

Number of cell divisions.

n_s

Number of possible outcome states

outcome_prob

Outcome probability vector (default is NULL)

p_d

Dropout probability

Value

A list containing two objects, 'seqs' and 'tree'. 'seqs' is a 'phyDat' object holding the 'sim_n' simulated barcodes, one per cell, and 'tree' is a 'phylo' object holding the ground truth tree structure for the simulated data.

Author(s)

Il-Youp Kwak

Examples


library(DCLEAR)
library(phangorn)
library(ape)

set.seed(1)
mu_d1 = c( 30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 = mu_d1/sum(mu_d1)
simn = 10 # number of cell samples
m = 10  ## number of targets
sD = sim_seqdata(sim_n = simn, m = m, mu_d = 0.03,
        d = 12, n_s = length(mu_d1), outcome_prob = mu_d1, p_d = 0.005 )
## RF score with hamming distance
D_hm = dist.hamming(sD$seqs)
tree_hm = NJ(D_hm)
RF.dist(tree_hm, sD$tree, normalize = TRUE)

## RF score with weighted hamming
InfoW = -log(mu_d1)
InfoW[1:2] = 1
InfoW[3:7] = 4.5
D_wh = dist_weighted_hamming(sD$seqs, InfoW, dropout=FALSE)
tree_wh = NJ(D_wh)
RF.dist(tree_wh, sD$tree, normalize = TRUE)

## RF score with weighted hamming, considering dropout situation
InfoW = -log(mu_d1)
InfoW[1] = 1
InfoW[2] = 12
InfoW[3:7] = 3
D_wh2 = dist_weighted_hamming(sD$seqs, InfoW, dropout = TRUE)
tree_wh2= NJ(D_wh2)
RF.dist(tree_wh2, sD$tree, normalize = TRUE)
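
To show how the parameters m, mu_d, d, outcome_prob and p_d interact, here is a simplified base-R simulation of a single cell's barcode. It is a sketch under stated assumptions, not the algorithm used by sim_seqdata: the function simulate_barcode, the letter coding of outcomes, the "*" dropout marker (borrowed from the process_sequence defaults), and the rule that a mutated target never changes again are all hypothetical.

```r
# One cell's barcode after d divisions (hypothetical, simplified model):
# an unmutated target ("0") mutates with probability mu_d at each division
# and then keeps its outcome; at the end each target drops out with
# probability p_d and is marked "*".
simulate_barcode <- function(m, mu_d, d, outcome_prob, p_d) {
  barcode <- rep("0", m)
  outcomes <- LETTERS[seq_along(outcome_prob)]
  for (division in seq_len(d)) {
    mutate <- barcode == "0" & runif(m) < mu_d
    barcode[mutate] <- sample(outcomes, sum(mutate), replace = TRUE,
                              prob = outcome_prob)
  }
  barcode[runif(m) < p_d] <- "*"
  barcode
}

set.seed(1)
mu_d1 <- c(30, 20, 10, 5, 5, 1, 0.01, 0.001)
mu_d1 <- mu_d1 / sum(mu_d1)
barcode <- simulate_barcode(m = 10, mu_d = 0.03, d = 12,
                            outcome_prob = mu_d1, p_d = 0.005)
barcode
```

Under this model the chance that a target is still unmutated after d divisions is (1 - mu_d)^d, about 0.69 for mu_d = 0.03 and d = 12, so most targets in a short barcode stay at "0".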


Generic function for simulate

Description

Generic function for simulate

Usage

simulate(config, x, ...)

Arguments

config

a lineage_tree_config object

x

a sequence object

...

additional parameters


simulate

Description

Simulate a cell lineage tree. Adapted from https://github.com/elifesciences-publications/CRISPR_recorders_sims/blob/master/MATLAB_sims/GESTALT_30hr_1x_simulation.m

Usage

## S4 method for signature 'lineage_tree_config,missing'
simulate(config, x, n_samples = 200, ...)

Arguments

config

simulation configuration; a lineage_tree_config object

x

missing

n_samples

number of samples to simulate

...

additional parameters

Value

a lineage_tree object

Author(s)

Wuming Gong (gongx030@umn.edu)


simulate

Description

Simulate a cell lineage tree based on a set of sequences

Usage

## S4 method for signature 'lineage_tree_config,phyDat'
simulate(config, x, n_samples = 200L, k = 50, greedy = TRUE, ...)

Arguments

config

simulation configuration; a lineage_tree_config object

x

a sequence object

n_samples

number of samples to simulate

k

Number of trials

greedy

Whether or not to use a greedy search

...

additional parameters

Value

a lineage_tree object

Author(s)

Wuming Gong (gongx030@umn.edu)


simulate_core

Description

Simulate a cell lineage tree. Adapted from https://github.com/elifesciences-publications/CRISPR_recorders_sims/blob/master/MATLAB_sims/GESTALT_30hr_1x_simulation.m

Usage

simulate_core(config, tree, mutation_site, outcome)

Arguments

config

simulation configuration; a lineage_tree_config object

tree

a matrix representing the lineage tree

mutation_site

a binary matrix indicating the mutation sites

outcome

a character matrix

Value

a 'lineage_tree' object


Generic function for substr_kmer

Description

Generic function for substr_kmer

Usage

substr_kmer(x, ...)

Arguments

x

a kmer object

...

additional parameters


Subsetting a kmer_summary object

Description

Summarize the short k-mer summary from the long k-mer summary

Usage

## S4 method for signature 'kmer_summary'
substr_kmer(x, k = 2)

Arguments

x

a kmer_summary object

k

k-mer length (default: 2)

Value

a new kmer_summary object

Author(s)

Wuming Gong (gongx030@umn.edu)


Generic function for subtract

Description

Generic function for subtract

Usage

subtract(x, y, ...)

Arguments

x

a lineage_tree object

y

a lineage_tree object

...

additional parameters


subtract

Description

Subtract a subtree from a large tree

Usage

## S4 method for signature 'lineage_tree,lineage_tree'
subtract(x, y, ...)

Arguments

x

a lineage_tree object

y

a lineage_tree object

...

additional parameters

Value

a lineage_tree object


Generic function for subtree

Description

Generic function for subtree

Usage

subtree(x, ...)

Arguments

x

a lineage_tree object

...

additional parameters


subtree

Description

Extract a subtree with specific leaves

Usage

## S4 method for signature 'lineage_tree'
subtree(x, leaves = NULL, ...)

Arguments

x

a lineage_tree object

leaves

leaves of the extracted tree

...

additional parameters

Value

a lineage_tree object


subtree

Description

Extract a subtree with specific leaves

Usage

## S4 method for signature 'phylo'
subtree(x, leaves = NULL, ...)

Arguments

x

a phylo object

leaves

leaves of the extracted tree

...

additional parameters

Value

a phylo object


Generic function for summarize_kmer

Description

Generic function for summarize_kmer

Usage

summarize_kmer(x, ...)

Arguments

x

a sequence object

...

additional parameters


summarize_kmer

Description

Summarize kmer distributions with input sequences

Usage

## S4 method for signature 'phyDat'
summarize_kmer(
  x,
  division = 16L,
  k = 2,
  reps = 20L,
  n_samples = 200L,
  n_nodes = 100L,
  n_targets
)

Arguments

x

input data as a phyDat object

division

number of cell divisions

k

k-mer length (default: 2)

reps

number of simulated trees

n_samples

number of samples to simulate

n_nodes

number of nodes to sample (including both leaves and internal nodes)

n_targets

sequence length. If this argument is missing, the length of the input sequences will be used.

Value

a kmer_summary object

Author(s)

Wuming Gong (gongx030@umn.edu)


summarize_kmer_core

Description

Summarize kmer distributions (core function)

Usage

summarize_kmer_core(
  k = 2,
  reps = 20L,
  n_samples = 200L,
  n_nodes = 100L,
  config = NULL
)

Arguments

k

k-mer length (default: 2)

reps

number of simulated trees

n_samples

number of samples to simulate

n_nodes

number of nodes to sample (including both leaves and internal nodes)

config

lineage tree configuration (a lineage_tree_config object)

Value

a kmer_summary object

Author(s)

Wuming Gong (gongx030@umn.edu)
