The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Take-all units

When drawing a sample of size n from a population of size N, where each unit is drawn proportional to some measure of size xi, i = 1, …, N, the probability that the ith unit is included in the sample is

$$ \pi_{i} = n x_{i} / \sum_{i=1}^{N} x_{i}. $$

In theory these inclusion probabilities should be less than 1; in practice units with a large xi can have an inclusion probability greater than 1, especially when n is large. The usual procedure to deal with these units is to put them in a special take-all stratum so that they are always included in the sample, essentially fixing their inclusion probabilities at 1, with the remaining units (the take-some units) drawn at random. The usual algorithm used by, say, sampling::inclusionprobabilities() repeatedly moves units into the take-all stratum and recalculates the inclusion probabilities for the remaining units until all inclusion probabilities are less than 1.1

Sequential poisson sampling is a bit more complicated because it can be useful to place units with an inclusion probability greater than 1 − α, for some small α, into the take-all stratum. Unless α = 0, the usual algorithm for finding take-all units can put units into the take-all stratum that have an inclusion probability less than 1 − α. We can see this with the following example (with a larger value for α).

pi <- function(x, n) {
  n * (x / sum(x))
}

# Population with 13 units.
x <- c(1:8, 9.5, 10, 20, 20, 30)

alpha <- 0.15

# Units 11, 12, and 13 have an inclusion probability
# greater than 1 - alpha.
which(pi(x, 8) >= 1 - alpha)
#> [1] 11 12 13

# Now units 9 and 10 have an inclusion probability
# greater than 1 - alpha.
which(pi(x[1:10], 5) >= 1 - alpha)
#> [1]  9 10

# After two rounds of removal all inclusion probabilities
# are less than 1 - alpha.
any(pi(x[1:8], 3) >= 1 - alpha)
#> [1] FALSE

Although all inclusion probabilities are less than 1 − α after two rounds of placing units into a take-all stratum, the inclusion probability for unit 9 would also be less than 1 − α despite it being in the take-all stratum.

pi(x[1:9], 4)[9] >= 1 - alpha
#> [1] FALSE

This means that units have to be considered one at a time, from largest to smallest, to determine if they belong in the take-all stratum. Rather than doing this as a loop and potentially calculating the inclusion probabilities many times for most units, we can calculate a sequence of inclusion probabilities for the n units with the largest sizes

$$ p_{i} = \frac{x_i i}{\bar{x} + \sum_{j=1}^{i} x_j}, $$

where is the total size for all remaining units (those always in the take-some stratum) in the population. If units are sorted in increasing order of size, then the ith element of this sequence gives the largest inclusion probability after putting the n − i largest units in the take-all stratum.

Finding the take-all units is now simply a case of finding the smallest integer i* (if one exists, for otherwise there are no take-all units) such that pi* ≥ 1 − α. Although the sequence given by p1, …, pn is not increasing, it is monotonic enough to get a unique set of take-all units. To see this, write the i + 1th element of this sequence as

$$ \begin{align*} p_{i+1} &= \frac{x_{i+1} (i + 1)}{\bar{x} + \sum_{j=1}^{i+1} x_j} \\ &= \frac{x_{i+1} (i + 1)}{x_{i} i / p_{i} + x_{i+1}}. \end{align*} $$

It is straightforward to verify that

$$ p_{i+1} > p_{i} \Leftrightarrow p_{i} < \frac{x_{i+1} - x_{i}}{x_{i+1}} i + 1 $$

and

pi + 1 ≥ 1 ⇔ pi ≥ xi/xi + 1.

This means that pi + 1 > pi if pi < 1 and pi + 1 ≥ 1 if pi ≥ 1—the sequence is strictly increasing whenever it is less than 1 and does not drop below 1 after reaching this point. Therefore if pi* ≥ 1 − α then all units i > i* have pi ≥ 1 − α.

Returning to the previous example, we can correctly find the four take-all units from the elements of pi that are greater than 1 − α.

p <- function(x, n) {
  ord <- order(x, decreasing = TRUE)
  s <- seq_len(n)
  possible_ta <- rev(ord[s])
  x_ta <- x[possible_ta] # ties are in reverse
  definite_ts <- ord[seq.int(n + 1, length.out = length(x) - n)]

  x_ta * s / (sum(x[definite_ts]) + cumsum(x_ta))
}

plot(
  1:4,
  p(x, 8)[1:4],
  xlab = "",
  ylab = "p",
  xlim = c(1, 8),
  ylim = c(0, 2),
  pch = 20
)
points(5:8, p(x, 8)[5:8], pch = 19)
abline(1 - alpha, 0, lty = 2)
legend(1, 2, c("take-some", "take-all"), pch = c(20, 19))

Diagram showing how to find units that belong in the take-all stratum.

As each element pi in the sequence is a strictly increasing function of i, we can also identify the unique value for n when a unit enters the take-all stratum.

plot(
  2:5,
  p(x, 8)[1:4],
  xlab = "",
  ylab = "p",
  xlim = c(1, 9),
  ylim = c(0, 2.5),
  pch = 20
)
points(6:9, p(x, 8)[5:8], pch = 19)
points(1:3, p(x, 9)[1:3], pch = 20, col = "red")
points(4:9, p(x, 9)[4:9], pch = 19, col = "red")
abline(1 - alpha, 0, lty = 2)
legend(1, 2.5, c("take-some", "take-all"), pch = c(20, 19))
legend(1, 2, c("n = 8", "n = 9"), pch = 20, col = c("black", "red"))

Diagram showing when units first enter the take-all stratum.


  1. To see why this needs to be done repeatedly, consider a population with three units with sizes 1, 2, and 3 and find which units have an inclusion probability greater than 1 when the sample size is 3.↩︎

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.