Minimal Set of binnmu Packages

Combining the R and Debian package systems

Dirk Eddelbuettel

First version 2017-Jul-16; this version 2017-Aug-05


Step 0: Problem Definition

Upstream Change

R 3.4.0, released in April, included the following paragraph in its NEWS file:

  • Packages which register native routines for .C or .Fortran need to be re-installed for this version (unless installed with R-devel SVN revision r72375 or later).

This transition has no fallback behavior (as is more common with R changes) and requires a rebuild. Packages built under older R versions still load and function partially, but will be unable to access any native (i.e., compiled) routines.

Impact

For the Debian packages, this means that we need to consider the set of packages which

  • match r-cran-*, r-bioc-* and the like
  • contain compiled code (as R-only packages have no native routines)
  • use at least one .C() or .Fortran() (but not .Call()) call
  • use (the hitherto optional) routine registration (so that the change in behaviour is noticeable)
  • have not yet been recompiled with R 3.4.0 or R 3.4.1

This note computes this set and provides the input for a wanna-build request.
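
The five conditions are conjunctive: a package only needs a rebuild if all of them hold. A minimal sketch in base R, using an invented table whose package names and column names are purely illustrative and not taken from the analysis below:

```r
# Hypothetical candidates; every value here is invented for illustration.
cand <- data.frame(
    package   = c("r-cran-aaa", "r-cran-bbb", "r-cran-ccc", "r-other-ddd"),
    is_r_pkg  = c(TRUE,  TRUE,  TRUE,  TRUE),    # matches r-cran-*, r-bioc-* and the like
    compiled  = c(TRUE,  TRUE,  FALSE, TRUE),    # ships native code
    uses_dotC = c(TRUE,  TRUE,  FALSE, FALSE),   # calls .C() or .Fortran()
    registers = c(TRUE,  FALSE, FALSE, TRUE),    # calls R_registerRoutines()
    rebuilt   = c(FALSE, FALSE, FALSE, FALSE),   # already rebuilt under R 3.4.x
    stringsAsFactors = FALSE
)

# All five criteria must hold at once.
needs_rebuild <- with(cand, is_r_pkg & compiled & uses_dotC & registers & !rebuilt)
print(cand$package[needs_rebuild])
```

Each later step of this note can be read as filling in one of these columns from Debian or CRAN data.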

This is an updated version reflecting the fourth point above, which Kurt Hornik pointed out to me after I shared the initial version with him. The point he raised (“does it use R_registerRoutines ?”) is important and further reduces the effective set.

Step 1: Reverse Dependencies of R

Fresh Debian unstable session

For this we drop into a clean Docker container running Debian unstable. Later, we will need the current sources of the RcppAPT package so we start from a local git directory:

Update Debian

Inside the Docker container, we update the package information and install what is needed to build RcppAPT for R. This includes Rcpp and libapt-pkg-dev. We also install the data.table package used for aggregating the (R and Debian) package data computed below.

This step takes a short moment, with the exact time dependent on the network connection and other factors.

Launch R

All Candidates

Inside the same Docker session, we now launch R and run (almost all of) the remainder from R.

We use RcppAPT to compute the reverse depends of the main R package providing the R engine: r-base-core. Among those (currently) 516 packages are other packages from the upstream source (r-base*, r-doc*), which we exclude first, as well as non-R-package dependencies (such as rpy2), which we also exclude.

This leaves 489 candidate packages out of the initial 514. The version field tells which r-base-core version was used to build the package—information we need per the setup described above.
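
The exclusion step can be sketched with two regular-expression filters. The names below are hypothetical stand-ins for the real reverse-dependency list, and the assumption that every R package carries an r- prefix is ours rather than something this note states:

```r
# Hypothetical reverse-dependency names; the real list comes from RcppAPT.
rd <- c("r-base-dev", "r-doc-html", "python-rpy2",
        "r-cran-rcpp", "r-bioc-affy", "r-other-foo")

from_upstream <- grepl("^r-(base|doc)", rd)  # built from the R sources themselves
is_r_package  <- grepl("^r-", rd)            # assumed naming convention for R packages

candidates <- rd[is_r_package & !from_upstream]
print(candidates)
```

Keeping the two conditions as separate logical vectors makes it easy to inspect what each filter removes.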

Next we need to filter out two versions with unsortable (i.e., non-semantic) version numbers, and apply a logical filter depending on whether the package was built with R version 3.3.3 or earlier, indicating a possibly required rebuild.
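
A minimal sketch of such a version filter in base R follows. The version strings are hypothetical examples of what a Debian version field might contain, and the shape test used to detect unsortable entries is an assumption, not necessarily the rule used in this analysis:

```r
# Hypothetical version strings as they might appear in a Depends field.
built <- c("3.3.3-1", "3.4.0-1", "3.2.4-revised-1", "3.4.1-2")

# Treat anything not shaped like "x.y.z-<digits>" as unsortable.
sortable <- grepl("^[0-9]+\\.[0-9]+\\.[0-9]+-[0-9]+$", built)

# Drop the Debian revision, then compare the upstream part against 3.4.0.
upstream <- sub("-[0-9]+$", "", built[sortable])
older <- vapply(upstream,
                function(v) utils::compareVersion(v, "3.4.0") < 0,
                logical(1), USE.NAMES = FALSE)

print(data.frame(built = built[sortable], older = older))
```

Comparing with utils::compareVersion() avoids the pitfalls of comparing version strings lexically.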

To cover some corner cases, we derive a skip field:

Compiled Packages

Next, we find the actual dependencies of each of these packages by constructing a large regular expression which we feed into RcppAPT::getDepends().
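
One way to assemble such an expression from a vector of package names is sketched below. The escaping and anchoring are our own precautions (names such as r-cran-data.table contain a literal dot), and whether the original code applied them is not known:

```r
# Example package names; note the literal dot in "data.table".
pkgs <- c("r-cran-data.table", "r-cran-sp", "r-bioc-affy")

# Escape regex metacharacters that can occur in Debian package names (. and +).
escaped <- gsub("([.+])", "\\\\\\1", pkgs)

# Anchor the alternation so that only whole names match.
pattern <- paste0("^(", paste(escaped, collapse = "|"), ")$")
print(pattern)
```

Without the anchors, a short name such as r-cran-sp would also match longer names like r-cran-spam.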

Next we subset to those that have libc6 as a Depends, meaning they are compiled packages. This excludes all the R packages having only R code.

We are now getting closer. We set keys on the data.table objects, and then do an inner join:
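
The subset and join can be sketched as follows. The analysis itself uses keyed data.table objects; this illustration swaps in base R's merge() to stay self-contained, and all package names, versions and dependency rows are invented:

```r
# Hypothetical tables; names, versions and Depends entries are invented.
cands <- data.frame(package = c("r-cran-aaa", "r-cran-bbb", "r-cran-ccc"),
                    version = c("3.3.3-1", "3.4.0-1", "3.3.2-1"),
                    stringsAsFactors = FALSE)
deps <- data.frame(package = c("r-cran-aaa", "r-cran-aaa", "r-cran-bbb", "r-cran-ccc"),
                   depends = c("r-base-core", "libc6", "libc6", "r-base-core"),
                   stringsAsFactors = FALSE)

# Keep one row per package that depends on libc6, i.e. ships compiled code.
compiled <- unique(deps[deps$depends == "libc6", "package", drop = FALSE])

# merge() performs an inner join on the shared column by default.
joined <- merge(cands, compiled, by = "package")
print(joined)
```

Only packages present in both tables survive, which is what reduces the candidate list to compiled packages.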

We have 242 potential rebuilds, down from 514 reverse depends at the outset.

Version check

Next, we can concentrate on those having been built with the older versions requiring a rebuild:

Now we are down to 167 packages.

Among these are 17 Bioconductor packages. This is a superset, as we do not yet know which of these use only .Call(), in which case no rebuild would be required.

There are also three which are neither BioC nor CRAN.

We have 147 possible NMUs based off CRAN.
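
The split by origin amounts to classifying names by prefix. A small sketch with hypothetical package names (the real input is the set of 167 packages from the previous step):

```r
# Hypothetical package names standing in for the rebuild candidates.
pkgs <- c("r-cran-aaa", "r-cran-bbb", "r-bioc-ccc", "r-other-ddd")

origin <- ifelse(grepl("^r-cran-", pkgs), "cran",
          ifelse(grepl("^r-bioc-", pkgs), "bioc", "other"))
counts <- table(origin)
print(counts)

# The counts reported in the text add up the same way.
stopifnot(17 + 3 + 147 == 167)
```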

Next, we mix this with information from CRAN.

This is our set of 147 candidate packages with their CRAN name, Debian name and upstream version.
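
Matching the two naming schemes requires lowercasing, since Debian binary package names are lowercase while CRAN names are case-sensitive. A minimal sketch using base R's merge(), with the upstream version column left out and a handful of names chosen for illustration:

```r
# CRAN names are case-sensitive; Debian binary package names are lowercase.
cran <- data.frame(cran = c("MCMCpack", "VGAM", "data.table", "Rcpp"),
                   stringsAsFactors = FALSE)
cran$debian <- paste0("r-cran-", tolower(cran$cran))

# Debian-side candidates (names that also appear in the final list of this note).
deb <- data.frame(debian = c("r-cran-mcmcpack", "r-cran-vgam", "r-cran-data.table"),
                  stringsAsFactors = FALSE)

# Inner join: only CRAN packages that are rebuild candidates remain.
mapped <- merge(deb, cran, by = "debian")
print(mapped)
```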

We save this file to be used on another machine.

Step 2: Grep

On another machine with access to all CRAN package sources (which I happen to have access to), we use the list of 147 candidate packages and run a recursive grep for each. We store the output from two egrep runs, called via system(), directly in the same data structure. The first checks for .C() or .Fortran() calls in the R scripts; the second checks for R_registerRoutines() in the compiled C code (with thanks again to Kurt Hornik for the suggestion).
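
The two checks can be illustrated without egrep. The sketch below swaps in base R's list.files(), readLines() and grepl() for the system() calls, and runs against a toy package tree created in a temporary directory; the directory layout, file names and helper name any_match are all hypothetical:

```r
# Build a toy package tree in a temporary directory (contents are invented).
pkgdir <- file.path(tempdir(), "toypkg")
dir.create(file.path(pkgdir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkgdir, "src"), showWarnings = FALSE)
writeLines('f <- function(x) .C("c_fun", as.double(x))',
           file.path(pkgdir, "R", "f.R"))
writeLines('R_registerRoutines(dll, cEntries, NULL, NULL, NULL);',
           file.path(pkgdir, "src", "init.c"))

# TRUE if any line of any matching file below 'dir' matches 'pattern'.
any_match <- function(dir, pattern, file_pattern) {
    files <- list.files(dir, pattern = file_pattern,
                        recursive = TRUE, full.names = TRUE)
    any(vapply(files,
               function(f) any(grepl(pattern, readLines(f, warn = FALSE))),
               logical(1)))
}

# Matches .C( and .Fortran( but not .Call(
call_pattern <- "\\.(C|Fortran)[[:space:]]*\\("

uses_dotC <- any_match(file.path(pkgdir, "R"), call_pattern, "\\.[RrSs]$")
registers <- any_match(file.path(pkgdir, "src"), "R_registerRoutines", "\\.(c|cc|cpp|h)$")
print(c(uses_dotC = uses_dotC, registers = registers))
```

A pattern match is only a heuristic: it would also flag occurrences in comments or strings, so borderline cases may deserve a manual look.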

Step 3: Finalize

We read the data back in and subset on those for which the recursive greps found both actual uses of .C() or .Fortran() and a call to R_registerRoutines(). The list contains 42 packages.

Similarly, the 17 BioC and 3 other packages can be tested via recursive greps (not shown) in a directory filled with apt-get source downloads:

This leads to a further four packages:

These 42, along with the 4 (from the initially 17 BioC and 3 ‘other’) packages, are our target set.

We need to retrieve the version number in Debian unstable of these packages by once again relying on a function from RcppAPT.

With this, we can write out the content of the NMU request:

> 
> for (i in 1:nrow(res))
+     cat("nmu", paste(res[i,], collapse="_"), ". ANY . -m 'Rebuild against R 3.4.*, see #861333'\n")
nmu r-bioc-edger_3.14.0+dfsg-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-coin_1.1-3-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-mnp_2.6-4-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-fields_8.10-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-desolve_1.14-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-deldir_0.1-12-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-rniftilib_0.0-35.r79-2 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-data.table_1.10.0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-qtl_1.40-8-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-bioc-preprocesscore_1.36.0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-contfrac_1.1-10-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-glmnet_2.0-5-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-sp_1:1.2-4-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-brglm_0.5-9-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-bioc-affy_1.52.0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-ncdf4_1.15-1+b2 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-treescape_1.10.18-6 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-mapproj_1.2-4-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-blockmodeling_0.1.8-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-hdf5_1.6.10-4+b1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-ade4_1.7-5-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-vgam_1.0-3-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-mixtools_1.0.4-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-phylobase_0.8.2-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-spam_1.4-0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-medadherence_1.03-2 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-surveillance_1.13.0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-randomfieldsutils_0.3.15-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-rcurl_1.95-4.8-2 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-mcmcpack_1.3-8-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-spatstat_1.48-0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-vegan_2.4-2-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-bayesm_3.0-2-2 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-expm_0.999-0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-phangorn_2.1.1-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-maptools_1:0.8-41+dfsg-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-caret_6.0-73+dfsg1-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-goftest_1.0-3-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-igraph_1.0.1-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-maps_3.1.1-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-eco_3.1-7-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-randomfields_3.1.36-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-bioc-genefilter_1.56.0-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-mcmc_0.9-4-2 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-spdep_0.6-9-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
nmu r-cran-gam_1.14-1 . ANY . -m 'Rebuild against R 3.4.*, see #861333'
>

Summary

The final set of 46 NMUs is the minimal change required, and reasonable relative to the 516 reverse dependencies of R itself. We are able to narrow down the set of packages requiring a rebuild by combining data from the R package system, the Debian package system and (some) package sources we were able to access on a CRAN-related server.

Acknowledgements

Thanks to Kurt Hornik for pointing out the additional check for R_registerRoutines in the C code, leading to a further reduction from 90+ packages to 46.

History

The first published version (July 2017) did not check for R_registerRoutines. The second version (August 2017) does, leading to 46 suggested NMUs.

See Also

The source file is on GitHub as is the revision history. The corresponding Debian bug report is based on this analysis.