CRAN Task View: High Performance and Parallel Computing
Maintainer: Dirk Eddelbuettel
Contact: Dirk.Eddelbuettel at R-project.org
Version: 2009-01-20
This CRAN task view contains a list of packages, grouped by topic, that
are useful for high-performance computing (HPC) with R. In this context, we
are defining 'high-performance computing' rather loosely as just about anything
related to pushing R a little further: using compiled code,
parallel computing (in both explicit and implicit modes), working with
large objects as well as profiling.
Unless otherwise mentioned, all packages presented with hyperlinks
are available from CRAN, the
Comprehensive R Archive Network.
Several of the areas discussed in this Task View are undergoing rapid
change. Please send suggestions for additions and extensions for this task
view to the
task view maintainer.
Parallel computing: Explicit parallelism
-
Several packages provide the communications layer required for parallel
computing. The first package in this area was
rpvm
by Li and Rossini which uses the PVM (Parallel
Virtual Machine) standard and libraries.
rpvm
is no
longer actively maintained.
-
In recent years, the
alternative MPI (Message Passing Interface) standard has become the
de facto standard in parallel computing. It is supported in R via
the
Rmpi
package by Yu.
Rmpi
is mature yet actively
maintained and offers access to numerous functions from the MPI
API, as well as a number of R-specific extensions.
Rmpi
can be used with the LAM, MPICH / MPICH2, Open MPI, and Deino MPI
implementations.
-
An alternative is provided by the
nws
(NetWorkSpaces)
package from REvolution Computing. It is the successor to the
earlier LindaSpaces approach to parallel computing, and is
implemented on top of the Twisted networking toolkit for Python.
-
The
snow
(Simple Network of Workstations) package by
Tierney et al. can use PVM, MPI, and NWS as well as direct networking
sockets. It provides an abstraction layer by hiding the
communications details. The
snowFT
package provides
fault-tolerance extensions to
snow, but is no longer
actively maintained.
-
The
snowfall
package by Knaus provides a more recent
alternative to
snow. It is, however, not yet at the same level of
maturity, and supports only the LAM implementation of the MPI
standard.
-
The
papply
package by Currie provided a subset of the
Rmpi
functionality, but is no longer actively maintained either.
-
The
biopara
package by Lazar and Schoenfeld offers socket-based parallel
execution with some support for load-balancing and fault-tolerance.
-
The
taskPR
package by Samatova et al. builds on top of LAM MPI and offers
parallel execution of tasks.
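
The way snow hides the communications layer can be illustrated with a minimal sketch. It assumes that the snow package is installed and uses the socket transport so that neither PVM nor MPI is required; the workload function `square` is hypothetical and stands in for any per-element computation.

```r
library(snow)

# Start two worker processes on the local machine over plain sockets.
cl <- makeCluster(2, type = "SOCK")

# Hypothetical workload: square each input on whichever worker receives it.
square <- function(x) x^2
squares <- parLapply(cl, 1:8, square)

# Release the workers once the computation is done.
stopCluster(cl)

print(unlist(squares))
```

In principle, switching to another transport supported by snow only changes the `type` argument passed to `makeCluster()`; the calling code stays the same.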
Parallel computing: Implicit parallelism
-
The pnmath package by Tierney uses the OpenMP parallel
processing directives of recent compilers (such as gcc 4.2 or later) for implicit
parallelism: it replaces a number of internal R functions with
versions that can make use of multiple cores, without
any explicit requests from the user. The alternate
pnmath0 package offers the same functionality using
Pthreads for environments in which the newer compilers are not
available. Similar functionality is expected to become integrated
into R 'eventually'.
-
The romp package was presented at the useR! 2008 conference and
offers another interface to OpenMP using Fortran; this code is
still pre-alpha.
-
The
fork
package provides R-equivalents to low-level Unix system functions
like fork, signal, wait, kill and exit in order to spawn
sub-processes for parallel execution.
-
The
multicore
package provides a way of running parallel
computations in R on machines with multiple cores or CPUs.
-
The R/parallel package offers a C++-based master-slave dispatch
mechanism for parallel execution.
-
The
RScaLAPACK
package provides an interface to the ScaLAPACK
libraries which can replace the standard BLAS libraries
and offer parallel execution of the same BLAS functions.
Parallel computing: Grid computing
-
The
gridR
package by Wegener et al. can be used in a grid computing
environment via a web service, via ssh or via Condor or Globus.
-
The multiR package was presented at useR! 2008 but has not been
released. It may offer a snow-style framework on a grid computing platform.
-
The Biocep-R project offers a Java-based framework for local, Grid,
or Cloud computing. It is under active development.
Parallel computing: Random numbers
-
Random-number generators for parallel computing are available via
the
sprng
and
lecuyer
packages.
Parallel computing: Resource managers and batch schedulers
-
Job-scheduling toolkits permit management of
parallel computing resources and tasks. The slurm (Simple Linux
Utility for Resource Management) set of programs (written by a
consortium led by Lawrence Livermore Labs) works well with
MPI.
-
The Condor toolkit from the University of Wisconsin-Madison
has been used with R.
-
The
sfCluster
package can be used
with
snowfall.
-
The
Rsge
package offers an interface to the Sun Grid
Engine batch-queuing system.
Large memory and out-of-memory data
-
The
biglm
package uses incremental computations to
offer
lm()
and
glm()
functionality for
data sets stored outside of R's main memory.
-
The
ff
package offers file-based access to data sets
that are too large to be loaded into memory, along with a number of
higher-level functions.
-
The
bigmemory
package permits storing large objects such
as matrices in memory and uses external pointer objects to refer to
them. This permits transparent access from R without bumping
against R's internal memory limits. Several R processes on the
same computer can also share big memory objects.
-
A large number of database packages, and database-alike packages
(such as
sqldf) are also of potential interest but not (yet?)
reviewed here.
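
The incremental approach taken by biglm can be sketched as follows. The example assumes that the biglm package is installed; splitting the built-in `mtcars` data into two halves is a hypothetical stand-in for chunks that would normally be read piecewise from disk or a database.

```r
library(biglm)

# Hypothetical setup: treat two halves of mtcars as chunks that arrive
# one at a time, as they would when read piecewise from a large file.
chunk1 <- mtcars[1:16, ]
chunk2 <- mtcars[17:32, ]

fit <- biglm(mpg ~ wt + hp, data = chunk1)  # fit on the first chunk
fit <- update(fit, chunk2)                  # fold in the next chunk

# Reference fit with all rows held in memory at once.
full <- lm(mpg ~ wt + hp, data = mtcars)
print(cbind(incremental = coef(fit), in_memory = coef(full)))
```

Only one chunk needs to be in memory at a time, and the coefficients should agree with the in-memory fit up to numerical tolerance.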
Easier interfaces for compiled code
-
The
inline
package eases adding code in C, C++ or Fortran to R. It
takes care of the compilation, linking and loading of embedded code
segments that are stored as R strings.
-
The
Rcpp
package offers a number of C++ classes that make
transferring R objects to C++ functions (and back) easier.
-
The
rJava
package provides a low-level interface to Java
similar to the
.Call()
interface for C and C++.
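
As an illustration of embedding compiled code stored in an R string, here is a minimal sketch using `cfunction()` from the inline package. It assumes that inline and a working C compiler toolchain are available; the function name `square_all` and the C snippet are hypothetical.

```r
library(inline)

# C body kept as an R string; with the .C convention every argument
# arrives as a pointer, and x is modified in place.
body <- "
  int i;
  for (i = 0; i < *n; i++)
    x[i] = x[i] * x[i];
"

# Compile, link and load the snippet, and get back an R function.
square_all <- cfunction(signature(n = "integer", x = "numeric"),
                        body, language = "C", convention = ".C")

x <- as.numeric(1:3)
res <- square_all(n = length(x), x = x)
print(res$x)
```

With the `.C` convention the result is a list of the (possibly modified) arguments, so the squared values are read from the `x` element.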
Profiling tools
-
The
profr
package can visualize output from
the
Rprof
interface for profiling.
-
The
proftools
package can be used to analyse profiling output.
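
The profiling output that these packages consume is produced by `Rprof()`. A minimal sketch of collecting and summarising such output with base R alone is shown here; the workload `slow_task` is hypothetical, and `summaryRprof()` from the utils package is used for the summary instead of either package above.

```r
# Hypothetical workload: repeated inversion of random matrices.
slow_task <- function(n = 300) {
  for (i in seq_len(n)) {
    m <- matrix(rnorm(200 * 200), nrow = 200)
    solve(m)
  }
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling the call stack
slow_task()
Rprof(NULL)                        # stop profiling

# Aggregate the samples by function.
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
```

The file written by `Rprof()` is plain text, which is what allows separate tools to read and visualise it.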