In R (and its predecessors S and S+), we have always known and often used round(x, digits) to round a numeric (or complex) vector of numbers x to digits decimals after the (decimal) point. However, not many R users (nor scientists, for that matter) have been aware that such rounding is not trivial, because our computers use binary (base 2) arithmetic whereas we are rounding to decimal digits, aka decimals, i.e., in base 10.
On the topic of floating point computation, we have had the most frequently asked question (FAQ) about R, the infamous R FAQ 7.31, and in 2017, Romain François even created an R package seven31 (not on CRAN) to help useRs explore what we say in the FAQ.
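The core of FAQ 7.31 can be seen with a few lines of base R. This is only a minimal illustration using standard functions; the printed digits of the sprintf() call may vary slightly between platforms.

```r
# Most decimal fractions have no exact binary (double precision) representation
0.1 + 0.2 == 0.3                    # FALSE with IEEE double precision
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal up to a small tolerance
sprintf("%.20f", 0.15)              # shows the digits beyond the usual 15 or so

# Fractions whose denominator is a power of two are exactly representable
0.5 + 0.25 == 0.75                  # TRUE
```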
Recently, there has been an official R bug report (on R’s bugzilla), PR#17668 (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17668), with summary title “Artificial numerical error in round() causing round to even to fail”.
Adam Wheeler started with a shorter version (just using digits = 1,2,..,8) of the following examples and his own remarks about correctness:
## [1] 56
## [1] 55.5
## [1] 55.55
## [1] 55.556
## [1] 55.5555
## [1] 55.55555
## [1] 55.555555
## [1] 55.5555556
## [1] 55.55555555
## [1] 55.555555555
## [1] 55.5555555556
## [1] 55.55555555556
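The calls that produced this output are not reproduced here. Judging from the printed values, they rounded 55.5, 55.55, 55.555, ... to 0, 1, 2, ... digits; the following sketch rests on that assumption and is not the original code. No expected output is given, because the results depend on the version of R.

```r
# Assumed reconstruction: round "55." followed by k fives to k - 1 digits
op <- options(digits = 15)          # print enough digits to see the result
for (k in 1:12) {
  x <- as.numeric(paste0("55.", strrep("5", k)))   # 55.5, 55.55, 55.555, ...
  print(round(x, digits = k - 1))
}
options(op)                         # restore the previous setting
```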
Whereas the exact result of the R code above currently depends on your version of R, our round package’s roundX(x, dig, version = "r1.C") provides these, using the same C source code as R 3.6.2 (note I have adopted the convention to use “r
## Loading required package: round
## [1] 56
## [1] 55.5
## [1] 55.55
## [1] 55.556
## [1] 55.5555
## [1] 55.55555
## [1] 55.555555
## [1] 55.5555556
## [1] 55.55555555
## [1] 55.555555555
## [1] 55.5555555556
## [1] 55.55555555556
He used his own C code to see what happens in R’s C code for round() and proposed to simplify the C code by not doing the offset calculations, which subtract the integer part intx, round, and then re-add intx. I committed a version of his proposal to R-devel (svn r77609, on 2019-12-21) and found that, while the simplification improved the above examples in that it always rounded to even, it clearly broke cases that were working correctly in R 3.6.x. That version is available with our roundX(*, version = "r0.C") (version 0, as it is even simpler than version 1).
One CRAN package had relied on round(x, digits = .Machine$integer.max) to return integers unchanged, but
## [1] -Inf Inf
and less extreme cases of relatively large digits had also stopped working with “r0”. Here are two roundX() versions, shown via the simple wrapper roundAll():
i <- c(-1,1)* 2^(33:16)
stopifnot(i == floor(i)) # are integer valued
roundAll(i, digits = 300, versions = c("r0.C", "r1.C"))
## r0.C r1.C
## [1,] -Inf -8589934592
## [2,] Inf 4294967296
## [3,] -Inf -2147483648
## [4,] Inf 1073741824
## [5,] -Inf -536870912
## [6,] Inf 268435456
## [7,] -134217728 -134217728
## [8,] 67108864 67108864
## [9,] -33554432 -33554432
## [10,] 16777216 16777216
## [11,] -8388608 -8388608
## [12,] 4194304 4194304
## [13,] -2097152 -2097152
## [14,] 1048576 1048576
## [15,] -524288 -524288
## [16,] 262144 262144
## [17,] -131072 -131072
## [18,] 65536 65536
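The pattern in the r0.C column can be mimicked in plain R. The two functions below are hypothetical double precision sketches, not R’s C code: round_simple() scales, rounds and rescales, while round_offset() first splits off the integer part, as described for the R 3.6.x code.

```r
# "r0"-like sketch: scale by 10^d, round to integer, scale back
round_simple <- function(x, d) {
  p10 <- 10^d
  round(x * p10) / p10              # x * p10 may overflow to Inf
}

# "r1"-like sketch: subtract the integer part, round the rest, re-add
round_offset <- function(x, d) {
  sgn  <- sign(x)
  x    <- abs(x)
  p10  <- 10^d
  intx <- floor(x)
  sgn * (intx + round((x - intx) * p10) / p10)
}

i <- c(-1, 1) * 2^(33:26)           # integer valued, as in the example above
cbind(simple = round_simple(i, 300), offset = round_offset(i, 300))
```

With digits = 300, the product x * 10^300 exceeds the largest double as soon as abs(x) is at least 2^28, which is where the Inf entries start; the offset variant only ever scales the fractional part, which is zero for integer valued x.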
Looking at these, I also found that internally, R’s round had done digits <- pmin(308, digits), i.e., truncated digits larger than 308, which is clearly not good enough for very small numbers (in absolute value):
e <- 5.555555555555555555555e-308
d <- 312:305 ; names(d) <- paste0("d=", d)
roundAll(e, d, versions = c("r0.C", "r1.C", "r2.C"))
## r0.C r1.C r2.C
## d=312 6e-308 6e-308 5.5556e-308
## d=311 6e-308 6e-308 5.5560e-308
## d=310 6e-308 6e-308 5.5600e-308
## d=309 6e-308 6e-308 5.6000e-308
## d=308 6e-308 6e-308 6.0000e-308
## d=307 1e-307 1e-307 1.0000e-307
## d=306 0e+00 0e+00 0.0000e+00
## d=305 0e+00 0e+00 0.0000e+00
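One reason for capping digits at 308 is that 10^d itself overflows beyond that point. The sketch below shows this and one possible workaround, scaling in two steps so that neither factor overflows; this is an illustration under that assumption and not necessarily how the "r2.C" code proceeds.

```r
e <- 5.555555555555555555555e-308
d <- 312

is.finite(10^308)     # TRUE
is.infinite(10^309)   # TRUE: a single scale factor 10^d overflows for d > 308

# Hypothetical two-step scaling: 10^d == 10^308 * 10^(d - 308)
p1 <- 10^308
p2 <- 10^(d - 308)
r  <- round(e * p1 * p2) / p2 / p1
r                     # approximately 5.5556e-308, as in the r2.C column
```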
As I was embarrassed to have blundered, I worked on a fix and committed what now corresponds to roundX(*, version = "r2.C") to R-devel (svn r77618, on 2019-12-24, 16:11).
Also, Jeroen Ooms, maintainer of the CRAN package jsonlite, contacted the CRAN team and me about the change in R-devel which broke one regression test of that package. On Dec 27, he noticed that R 3.6.2’s version of round() was compatible with the (C library dependent) versions of sprintf() and also with R’s format(), whereas the R-devel versions were not, for his example:
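The calls behind the following output are not shown. Judging from the printed values, the example used x = 9.18665 and was presumably along these lines; the value of x and the exact calls are inferred, not taken from the original code. The formatting results can depend on the C library, and the result of round() on the version of R.

```r
x <- 9.18665             # assumed value, inferred from the printed output
format(x, digits = 5)    # formatting to 5 significant digits
sprintf("%.4f", x)       # C library formatting with 4 decimals
round(x, 4)              # depends on the version of R
print(x, digits = 17)    # many more digits than the default 7
```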
## [1] "9.1867"
## [1] "9.1867"
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 9.1867 9.1866 9.1867 9.1867 9.1866 9.1866 9.1866 9.1866 9.1866
## [1] 9.1866500000000002
which (typically) shows 9.1866500000000002, i.e., a number closer to 9.1867 than to 9.1866, and so really should be rounded up, not down. However, that is partly wrong: whereas it is true that the printed value is closer to 9.1867 than to 9.1866, one must be aware that neither of these two decimal numbers is exactly representable in binary double precision. A further careful look shows that the double precision versions of these rounded numbers have exactly the same distance to x, so that the main principle, round to nearest, gives a tie here:
## [,1]
## [1,] 9.18660000000000032
## [2,] 9.18665000000000020
## [3,] 9.18670000000000009
options(digits=7) # revert to default
(dx <- c(9.1866, 9.1867) - x) # [1] -4.99999999998835e-05 4.99999999998835e-05
## [1] -5e-05 5e-05
## [1] 0
and because of the tie, the round to even rule must apply, which means rounding down to 9.1866. So both libc’s printf and hence R’s sprintf() are as wrong as R 3.6.x has been, and indeed all our roundX() versions apart from "sprintf" and the "r1*" (previous R) ones do round down:
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 9.1867 9.1866 9.1867 9.1867 9.1866 9.1866 9.1866 9.1866 9.1866
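The tie can also be checked without printing many digits. The sketch below assumes x = 9.18665 (inferred from the output above) and uses the fact that doubles between 8 and 16 are spaced 2^-49 apart, so each of the three numbers is an integer multiple of 2^-49.

```r
x  <- 9.18665            # assumed value of the example
lo <- 9.1866             # candidate when rounding down
hi <- 9.1867             # candidate when rounding up

c(down = x - lo, up = hi - x)   # the two distances
(hi - x) == (x - lo)            # TRUE: an exact tie in double precision

# The same comparison in units of 2^-49, the spacing of doubles in [8, 16)
n <- c(lo, x, hi) * 2^49        # multiplying by a power of two is exact
all(n == floor(n))              # TRUE: all three are whole multiples
diff(n)                         # two equal step counts
```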
Finally, I think we’ve seen the light and recalled what we have known for a long time (but most R users will not be aware of at all):
- Almost all finite decimal fractions are not (exactly) representable as binary double precision numbers,
and consequently,
- round to nearest applies directly much more often than via the tie-breaking rule round to even, even in the case where the decimal fraction ends in a 5.
Hence, a “correct” implementation must really measure, not guess, which of the two possible decimals is closer to x. This led to our R level algorithm round_r3(), which is the workhorse used by roundX(x, d, version = "r3"):
## function (x, d, info = FALSE, check = TRUE)
## {
##     if (check)
##         stopifnot(!anyNA(d), length(d) == 1L)
##     max10e <- 308L
##     if (d > +max10e + 15L)
##         return(x)
##     else if (d < -max10e)
##         return(0 * x)
##     p10 <- 10^d
##     x10 <- as.vector(p10 * x)
##     xd <- (i10 <- floor(x10))/p10
##     xu <- ceiling(x10)/p10
##     D <- (xu - x) - (x - xd)
##     e <- i10%%2
##     r <- x
##     i <- (D < 0) | (e & (D == 0))
##     r[i] <- xu[i]
##     r[!i] <- xd[!i]
##     if (info)
##         list(r = r, D = D, e = e)
##     else r
## }
## <bytecode: 0xa832e58>
## <environment: namespace:round>
and its two C level versions "r3.C" (using long double) and "r3d.C" (“d” for “double”, as it uses double precision only).
My current proposal is to use (the equivalent of) "r3d.C" for R 4.0.0, i.e., from April 2020, as not using long double and being very close to the R level implementation "r3" (i.e., round_r3()) renders it potentially less platform dependent and easier to explain and document.
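The measuring idea behind round_r3() can be tried on the examples discussed here without installing the package. The function below, measure_round(), is a hypothetical condensed version of the core steps only, without the guards for extreme digits and without the info option.

```r
# Condensed sketch: compute both candidates and measure which one is closer
measure_round <- function(x, d) {
  p10 <- 10^d
  x10 <- p10 * x
  xd  <- floor(x10) / p10           # candidate below x
  xu  <- ceiling(x10) / p10         # candidate above x
  D   <- (xu - x) - (x - xd)        # < 0: xu is closer; > 0: xd is closer
  odd_below <- floor(x10) %% 2 == 1 # on a tie, go up only if floor(x10) is odd
  up  <- (D < 0) | (odd_below & D == 0)
  ifelse(up, xu, xd)
}

measure_round(9.18665, 4)   # exact tie, resolved to the even candidate 9.1866
measure_round(55.55, 1)     # 55.5, as the double 55.55 is closer to 55.5
```

Because both candidates are themselves doubles, the comparison happens between numbers the computer actually stores, not between their idealized decimal values.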
Lastly, note that the original set of examples is then treated differently from all previous proposals:
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 56 56 56 56 56 56 56 56 56
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.5 55.6 55.5 55.5 55.6 55.6 55.5 55.5 55.5
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.55 55.56 55.55 55.55 55.56 55.56 55.56 55.56 55.56
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.556 55.556 55.556 55.556 55.556 55.556 55.556 55.556 55.556
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.5555 55.5556 55.5555 55.5555 55.5556 55.5556 55.5555 55.5555 55.5555
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.55555 55.55556 55.55555 55.55555 55.55556 55.55556 55.55556 55.55556 55.55556
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.555555 55.555556 55.555555 55.555555 55.555556 55.555556 55.555555 55.555555 55.555555
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.5555556 55.5555556 55.5555556 55.5555556 55.5555556 55.5555556 55.5555556 55.5555556 55.5555556
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.55555555 55.55555556 55.55555555 55.55555555 55.55555556 55.55555556 55.55555555 55.55555555 55.55555555
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.555555555 55.555555556 55.555555555 55.555555555 55.555555556 55.555555556 55.555555556 55.555555556 55.555555556
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556
## sprintf r0.C r1.C r1a.C r2.C r2a.C r3.C r3d.C r3
## 55.55555555556 55.55555555556 55.55555555556 55.55555555556 55.55555555556 55.55555555556 55.55555555556 55.55555555556 55.55555555556
## R version 3.6.2 Patched (2019-12-14 r77587)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora 30 (Thirty)
##
## Matrix products: default
## BLAS: /u/maechler/R/D/r-patched/F30-64-inst/lib/libRblas.so
## LAPACK: /u/maechler/R/D/r-patched/F30-64-inst/lib/libRlapack.so
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] round_0.12-1
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.2 magrittr_1.5 htmltools_0.4.0 tools_3.6.2
## [5] yaml_2.2.0 Rcpp_1.0.3 stringi_1.4.4 rmarkdown_2.0
## [9] knitr_1.26 stringr_1.4.0 digest_0.6.23 xfun_0.12
## [13] rlang_0.4.2 evaluate_0.14