Michael J. Kane
Yale University and Phronesis LLC
A package for creating, storing, accessing, and manipulating dense (and semi-dense) matrices that are larger than available RAM.
It has been around since 2008; I wrote it with Jay Emerson.
Part of a suite of packages for processing matrices out-of-core (biganalytics, bigtabulate, bigalgebra, synchronicity)
Currently being maintained by myself and Pete Haverty
> library(bigmemory)
> x = big.matrix(3, 3, type='integer', init=123,
+                backingfile="example.bin",
+                descriptorfile="example.desc",
+                dimnames=list(c('a','b','c'),
+                              c('d', 'e', 'f')))
> x[,]
    d   e   f
a 123 123 123
b 123 123 123
c 123 123 123
> rm(x)
> y = attach.big.matrix("example.desc")
> y[,]
    d   e   f
a 123 123 123
b 123 123 123
c 123 123 123
All data movement (disk to RAM to cache) is handled transparently by the operating system.
The binary representation of the matrix is stored directly on disk.
The descriptor file holds meta-information (number of rows, number of columns, etc.).
Works with any filesystem supporting mmap (including distributed ones).
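Because the binary representation sits directly on disk and the descriptor only carries metadata, the backing file can be read as plain numbers even without bigmemory. A minimal sketch, assuming the backing file stores the 3 x 3 integer matrix column-major with no header (file names follow the earlier example):

```r
library(bigmemory)

# Recreate the file-backed matrix from the earlier session.
x <- big.matrix(3, 3, type = 'integer', init = 123,
                backingfile = "example.bin",
                descriptorfile = "example.desc")

# The backing file is just the raw column-major data:
# 3 * 3 integers, no header. Base R can read it directly.
vals <- readBin("example.bin", what = "integer", n = 9)
vals
```

Any tool that understands the raw layout (another process, another language) can map or read the same file; the operating system's page cache keeps the two views consistent.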
Reverse depends: bigalgebra, biganalytics, bigpca, bigrf, bigtabulate
Reverse imports: Rdsm
Reverse linking to: bigalgebra, biganalytics, bigrf, bigtabulate
Reverse suggests: bio3d, matpow, mlDNA, nat.nblast, NMF, PopGenome, rsgcc
Reverse enhances: bigmemory.sri
Reverse depends: bigmemoryExtras, ChipXpressData, Biobase and BiocGenerics (through bigmemoryExtras)
Only import data once
It's generally faster than swapping
It's compatible with BLAS and LAPACK libraries
Data structures (the binary representation) could be stored persistently and would not need to be explicitly imported
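The "import once" point can be made concrete with bigmemory's read.big.matrix: the text file is parsed a single time into a binary backing file, and every later session attaches to that file instantly. A sketch (the small CSV stands in for a large raw data file; the file names are illustrative):

```r
library(bigmemory)

# Stand-in for a large raw text file.
write.table(matrix(1:9, 3, 3), "example2.csv",
            sep = ",", row.names = FALSE, col.names = FALSE)

# First session: parse the text once, writing the binary
# representation and its descriptor to disk.
x <- read.big.matrix("example2.csv", sep = ",", type = "integer",
                     backingfile = "example2.bin",
                     descriptorfile = "example2.desc")

# Any later session (or another process) skips the parse entirely:
# the descriptor is read and the binary file is memory-mapped.
y <- attach.big.matrix("example2.desc")
y[2, 2]
```

The attach step costs only an mmap call, regardless of how large the matrix is; no data is copied into RAM until it is touched.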
bigmemory's (and ff's) users show that there is a demand for memory mapped objects
We've also shown that they can be performant
How can they be better integrated?
More importantly, should they be better integrated?