bigmemory-package {bigmemory}    R Documentation
bigmemory implements massive matrices in C++
and supports their basic manipulation and exploration.
Access to and manipulation of a big.matrix
object is exposed in R by an S4
class whose interface is similar to that of an R matrix.
Package: bigmemory
Type: Package
Version: 3.12
Date: 2009-10-24
License: LGPL-3
Multi-gigabyte data sets challenge and frustrate R
users even on well-equipped hardware.
Programming in C/C++ or Fortran can help, but it is
cumbersome for interactive data analysis and
lacks the flexibility and power of R's rich statistical
programming environment. The package bigmemory
bridges this gap, implementing massive matrices
and supporting their basic
manipulation and exploration. It is ideal for problems
involving the analysis in R of manageable subsets of the data,
or when an analysis is conducted mostly in C++.
The data structures may be allocated to shared memory with
transparent read and write locking, allowing separate
processes on the same computer to share access to a single copy of the
data set. The data structures may also be file-backed, allowing users
to more easily manage and analyze data sets larger than available RAM.
These features of bigmemory
open the door for powerful and
memory-efficient parallel analyses and data mining of massive data sets.
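The file-backed and shared-access features described above can be sketched as follows. This is a minimal sketch assuming a bigmemory release that provides filebacked.big.matrix(), describe() and attach.big.matrix() with the arguments shown; check the help pages of your installed version, since the upcoming 4.0 release changes some interfaces. The file names example.bin and example.desc are hypothetical. For brevity both handles live in one R session; a separate process on the same machine would instead call attach.big.matrix() on the descriptor file.

```r
library(bigmemory)

# Hypothetical file names, placed in this session's temporary directory.
path <- tempdir()
x <- filebacked.big.matrix(nrow = 5, ncol = 2, type = "double", init = 0,
                           backingfile = "example.bin",
                           descriptorfile = "example.desc",
                           backingpath = path)
x[, 1] <- 1:5

# describe() returns a small descriptor; attach.big.matrix() uses it to map
# the same backing file, so no second copy of the data is made.
desc <- describe(x)
y <- attach.big.matrix(desc, path = path)

# A write through one handle is visible through the other.
y[3, 1] <- 42
x[3, 1]
```

Because the data live in the backing file rather than in R's heap, the same pattern applies when the matrix is larger than available RAM.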
This package is still actively developed, although the 3.X tree has essentially been frozen. The upcoming 4.0 release (Fall 2009) will include some important changes (see below). Please send us an email letting us know you are trying the package, and we'll keep you abreast of updates.
Note that the option bigmemory.typecast.warning
can be set with options() to suppress the annoying warnings
that may occur if, for example, you
assign R objects (typically of type double) to char, short, or integer
big.matrix
objects.
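A minimal sketch of turning the typecast warning off, assuming the big.matrix() constructor arguments used in the example at the end of this page; the matrix z is purely illustrative.

```r
library(bigmemory)

# Assigning R doubles (R's default numeric type) to an integer big.matrix
# would normally produce a typecast warning; this turns it off.
options(bigmemory.typecast.warning = FALSE)

z <- big.matrix(3, 2, type = "integer", init = 0)
z[, 1] <- c(1, 2, 3)   # doubles, stored as integers without a warning
z[, 1]
```

Leave the option at its default while developing, because the warning flags assignments that could silently lose precision.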
Earlier versions of bigmemory included a function for k-means clustering. This has been temporarily removed and will be moved to a new package, biganalytics (or perhaps bigmemoryanalytics), in the Fall of 2009. At the same time, biglm.big.matrix and bigglm.big.matrix will be relocated to the same new package and removed from bigmemory itself.
Versions 3.X and earlier support a limited number of columns (due to mutex limitations), roughly 50,000 on a typical Linux system. This restriction will be removed in version 4.0 and later, when the mutexes will be moved out of bigmemory into a new package, synchronicity.
Because of a bug, versions 3.8 and earlier limited the number of rows to roughly 1 billion; this has been fixed in versions 3.82 and later. We apologize for the inconvenience and appreciate any and all feedback. - Jay and Mike
John W. Emerson and Michael J. Kane
Maintainer: Jay Emerson <john.emerson@yale.edu>
See http://www.stat.yale.edu/~jay/bigmemory.
For example, big.matrix, mwhich, colmean.
# Our examples are all trivial in size, rather than burning huge amounts
# of memory simply to demonstrate the package functionality.
x <- big.matrix(5, 2, type="integer", init=0)
colnames(x) <- c("alpha", "beta")
x              # the big.matrix object itself
x[,]           # its contents, extracted as an ordinary R matrix
x[,1] <- 1:5   # fill the first column
x[,]
mean(x)
colmean(x)
summary(x)