The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The goal of this vignette is to show how to use custom asymptotic references. As an example, we explore the differences in asymptotic time complexity between different implementations of binary segmentation.
The code below uses the following arguments:
N
is a numeric vector of data sizes,setup
is an R expression to create the data,library(data.table)
seg.result <- atime::atime(
N=2^seq(2, 20),
setup={
max.segs <- as.integer(N/2)
max.changes <- max.segs-1L
set.seed(1)
data.vec <- 1:N
},
"changepoint\n::cpt.mean"={
cpt.fit <- changepoint::cpt.mean(data.vec, method="BinSeg", Q=max.changes)
sort(c(N,cpt.fit@cpts.full[max.changes,]))
},
"binsegRcpp\nmultiset"={
binseg.fit <- binsegRcpp::binseg(
"mean_norm", data.vec, max.segs, container.str="multiset")
sort(binseg.fit$splits$end)
},
"fpop::\nmultiBinSeg"={
mbs.fit <- fpop::multiBinSeg(data.vec, max.changes)
sort(c(mbs.fit$t.est, N))
},
"wbs::sbs"={
wbs.fit <- wbs::sbs(data.vec)
split.dt <- data.table(wbs.fit$res)[order(-min.th, scale)]
sort(split.dt[, c(N, cpt)][1:max.segs])
},
"binsegRcpp\nlist"={
binseg.fit <- binsegRcpp::binseg(
"mean_norm", data.vec, max.segs, container.str="list")
sort(binseg.fit$splits$end)
})
plot(seg.result)
#> Warning in ggplot2::scale_y_log10("median line, min/max band"): log-10 transformation introduced infinite values.
#> log-10 transformation introduced infinite values.
#> log-10 transformation introduced infinite values.
The plot method creates a log-log plot of median time and memory vs
data size, for each of the specified R expressions.
The plot above shows some speed differences between binary
segmentation algorithms, but they could be even easier to see for
larger data sizes (exercise for the reader: try modifying the N
and
seconds.limit
arguments). You can also see that memory usage is much
larger for changepoint than for the other packages.
You can use
references_best
to get a tall/long data table that can be plotted to
show both empirical time and memory complexity:
seg.best <- atime::references_best(seg.result)
plot(seg.best)
#> Warning in ggplot2::scale_y_log10(""): log-10 transformation introduced
#> infinite values.
The figure above shows asymptotic references which are best fit for each expression. We do the best fit by adjusting each reference to the largest N, and then ranking each reference by distance to the measurement of the second to largest N.
(seg.pred <- predict(seg.best))
#> atime_prediction object
#> unit expr.name unit.value N
#> <char> <char> <num> <num>
#> 1: seconds changepoint\n::cpt.mean 0.01 521.5553
#> 2: seconds binsegRcpp\nmultiset 0.01 13118.4553
#> 3: seconds fpop::\nmultiBinSeg 0.01 20115.8351
#> 4: seconds wbs::sbs 0.01 18492.6648
#> 5: seconds binsegRcpp\nlist 0.01 3389.8913
plot(seg.pred)
#> Warning in ggplot2::scale_x_log10("N", breaks = meas[,
#> 10^seq(ceiling(min(log10(N))), : log-10 transformation introduced infinite
#> values.
If you have one or more expected time complexity classes that you want
to compare with your empirical measurements, you can use the
fun.list
argument. Note that each function in that list should take
as input the data size N
and output log base 10 of the reference
function, as below:
my.refs <- list(
"N \\log N"=function(N)log10(N) + log10(log(N)),
"N^2"=function(N)2*log10(N),
"N^3"=function(N)3*log10(N))
my.best <- atime::references_best(seg.result, fun.list=my.refs)
plot(my.best)
#> Warning in ggplot2::scale_y_log10(""): log-10 transformation introduced
#> infinite values.
From the plot above you should be able to see the asymptotic time
complexity class of each algorithm. Keep in mind that the plot method
displays only the next largest and next smallest reference lines, from those specified in fun.list
.
seconds.limit
to see the differences more clearly.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.