Repository Mirror for your Cloud Server and Webhosting

Title:

Implement the Group Walk Algorithm

Version:

0.1.2

Description:

A procedure that uses target-decoy competition (or knockoffs) to reject multiple hypotheses in the presence of group structure. The procedure controls the false discovery rate (FDR) at a user-specified threshold.

URL:

https://www.biorxiv.org/content/10.1101/2022.01.30.478144v1, https://github.com/freejstone/groupwalk

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.1.2

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

NeedsCompilation:

Packaged:

2022-06-17 23:49:57 UTC; jackfreestone

Author:

Jack Freestone [aut, cre, cph], Uri Keich [aut, cph]

Maintainer:

Jack Freestone <jfre0619@uni.sydney.edu.au>

Repository:

CRAN

Date/Publication:

2022-06-18 06:30:02 UTC

Implements group-walk algorithm

Description

This function returns a list of q-values corresponding to hypotheses that have been partitioned into groups. For FDR control, users should report the target-hypotheses with q-values less than or equal to their choice of threshold, alpha. For further details about how group-walk works, see: https://www.biorxiv.org/content/10.1101/2022.01.30.478144v1

Usage

group_walk(
  winning_scores,
  labels,
  all_group_ids,
  K = 40,
  return_frontier = FALSE,
  correction = 1
)

Arguments

winning_scores

A numerical vector of winning scores generated from the target-decoy competitions for each hypothesis.

labels

A vector of winning labels indicating whether it was a target (= 1) or a decoy (!= 1) for each hypothesis.

all_group_ids

A vector of group IDs associated to each hypothesis (can be recorded as integers, factors, characters).

K

A window size parameter (integer).

return_frontier

A boolean indicating whether the function should return the complete sequence of frontiers.

correction

A correction factor used to in the numerator of the estimated false discovery rate (FDR) (Use 1 for FDR control).

Value

A sequence of q-values for each hypothesis. If return_frontier = T, additionally the sequence of frontiers will be returned.

Examples

create_uncalibrated_hypotheses <- function(m_vec, pi_0_vec, mus, sds) {
  total <- sum(m_vec)
  g_total <- length(m_vec)
  data <- matrix(0, ncol = 4, nrow = total)
  for (g in 1:length(m_vec)){
    m <- m_vec[g]
    pi_0 <- pi_0_vec[g]
    mu <- mus[g]
    sd <- sds[g]
    if (g == 1) {
      start <- 0
    } else {
      start <- sum(m_vec[1:(g - 1)])
    }
    targets_nonnull <- rnorm(floor(m*pi_0), mean = mu, sd = sd)
    targets_null <- rnorm(m - floor(m*pi_0), mean = 0, sd = 1)
    decoys <- rnorm(m, mean = 0, sd = 1)
    targets <- c(targets_nonnull, targets_null)
    W <- pmax(targets, decoys)
    data[(start + 1):(start + m), 1] <- W
    data[(start + 1):(start + m), 2] <- g
    decoy_inds <- which(decoys > targets)
    inc_native_inds <- (which(targets_null > decoys[(floor(m*pi_0) + 1):m])) + floor(m*pi_0)
    X <- rep(0, m)
    X[decoy_inds] <- -1
    X[inc_native_inds] <- 1
    Y <- X
    X[X == 0] <- 1
    data[(start + 1):(start + m), 3] <- Y
    data[(start + 1):(start + m), 4] <- X
  }
  return(data)
}

data <- create_uncalibrated_hypotheses(m_vec = rep(1000, 3),
           pi_0_vec = rep(0.6, 3), mus = c(2.5, 3, 3.5), sds = rep(1, 3))

winning_scores <- data[, 1]
all_group_ids <- data[, 2]
labels <- data[, 4]
q_vals <- group_walk(winning_scores, labels, all_group_ids)