ChunkedArrays.jl

ChunkedArrays.jl is a package for increasing the performance of arrays generated inside of loops. Some basic benchmarks show chunked arrays being almost 50% faster than naive approaches. One use case is using random numbers in a loop. It is well known that, for several reasons (including SIMD), generating 1000 random numbers at once with rand(1000) is faster than generating them with 1000 separate calls to rand(). ChunkedArrays lets you generate your arrays in larger batches while providing a convenient wrapper that hides the details. It can also generate the next buffer in parallel: while you consume the current chunk, a different process generates its replacement, maximizing efficiency.
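To make the batching idea concrete, here is a minimal sketch of a chunked buffer. It is not the package's implementation: ScalarChunk, nextvalue!, sumchunked and the refill counter are hypothetical names chosen for illustration, and the code assumes Julia 1.x rather than the v0.4/v0.5 versions this package targets. A batch generator such as randn is called with a count, and values are handed out one at a time until the batch is used up.

```julia
using Random

# Hypothetical minimal chunked buffer (an illustration, not ChunkedArrays' code):
# values are produced `bufsize` at a time and handed out one per call.
mutable struct ScalarChunk{F}
    gen::F                   # batch generator, called as gen(n), e.g. randn
    buffer::Vector{Float64}  # current batch of values
    idx::Int                 # index of the next value to hand out
    refills::Int             # number of batches generated so far
end

# Generate the first batch eagerly
ScalarChunk(gen, bufsize::Int=1000) = ScalarChunk(gen, gen(bufsize), 1, 1)

# Return the next value, generating a fresh batch once the current one is used up
function nextvalue!(c::ScalarChunk)
    if c.idx > length(c.buffer)
        c.buffer = c.gen(length(c.buffer))   # one call produces a whole batch
        c.idx = 1
        c.refills += 1
    end
    v = c.buffer[c.idx]
    c.idx += 1
    return v
end

# Sum n standard normals drawn through the chunked buffer
function sumchunked(n::Int; bufsize::Int=1000)
    chunk = ScalarChunk(randn, bufsize)
    s = 0.0
    for _ in 1:n
        s += nextvalue!(chunk)
    end
    return s, chunk.refills
end
```

With a buffer of 1000, sumchunked(10_000) calls randn(1000) ten times instead of calling randn() ten thousand times.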

Installation

To install the package, simply use

Pkg.add("ChunkedArrays")
using ChunkedArrays

Note that version v0.0.2 is the last version which targets Julia v0.4. The current master has some changes which only work on v0.5. For an up-to-date version with v0.4 compatibility, check out the v0.4-compat branch.

Using the Package

You can define a ChunkedArray in one of three forms:

ChunkedArray(chunkfunc::Function,bufferSize::Int=BUFFER_SIZE_DEFAULT,T::Type=Float64;parallel=PARALLEL_DEFAULT)
ChunkedArray(chunkfunc::Function,outputSize::NTuple,bufferSize::Int=BUFFER_SIZE_DEFAULT,T::Type=Float64;parallel=PARALLEL_DEFAULT)
ChunkedArray(chunkfunc,randPrototype::AbstractArray,bufferSize=BUFFER_SIZE_DEFAULT;parallel=PARALLEL_DEFAULT)

Then, to get the next value from the array, call next:

next(chunked)
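The dimensions and prototype forms, together with next, can be pictured with the following hypothetical sketch (Julia 1.x; ShapedChunk and nextarray! are illustrative names, not the package's API). One batch of bufsize arrays of shape outsize is generated with a single call such as randn(Float64, 4, 2, 1000) and stored along an extra trailing dimension; the prototype form simply reads size and eltype from the array it is given. That the package's prototype form works exactly this way is an assumption.

```julia
using Random

# Hypothetical sketch (not ChunkedArrays' code): one batch holds `bufsize` arrays
# of shape `outsize`, stacked along an extra trailing dimension of `buffer`.
mutable struct ShapedChunk{T,N,F}
    gen::F                  # called as gen(T, dims...), e.g. randn
    outsize::NTuple{N,Int}  # shape of each returned array
    buffer::Array{T}        # size (outsize..., bufsize)
    idx::Int                # index of the next slice to hand out
end

# Dimensions form: explicit output shape, buffer size and element type
function ShapedChunk(gen, outsize::NTuple{N,Int}, bufsize::Int, ::Type{T}) where {N,T}
    buf = gen(T, outsize..., bufsize)         # e.g. randn(Float64, 4, 2, 1000)
    return ShapedChunk{T,N,typeof(gen)}(gen, outsize, buf, 1)
end
ShapedChunk(gen, outsize::Tuple, bufsize::Int=1000) = ShapedChunk(gen, outsize, bufsize, Float64)

# Prototype form: take shape and element type from an existing array
ShapedChunk(gen, proto::AbstractArray, bufsize::Int=1000) =
    ShapedChunk(gen, size(proto), bufsize, eltype(proto))

# Return a copy of the next slice, regenerating the whole batch when exhausted
function nextarray!(c::ShapedChunk{T,N}) where {T,N}
    nbuf = size(c.buffer, N + 1)
    if c.idx > nbuf
        c.buffer = c.gen(T, c.outsize..., nbuf)
        c.idx = 1
    end
    out = copy(selectdim(c.buffer, N + 1, c.idx))
    c.idx += 1
    return out
end
```

Copying the slice lets the caller keep or mutate the returned array without touching the buffer; returning the view directly would avoid the allocation at the cost of that safety.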

Examples

Say we wish to generate many standard normal random numbers in a loop. The naive way to do this is:

j=0.0
for i = 1:loopSize
  j += randn()
end

To use a ChunkedArray that outputs standard normal random numbers instead, we would write:

chunkRand = ChunkedArray(randn)
j=0.0
for i = 1:loopSize
  j += next(chunkRand)
end

This uses the constructor ChunkedArray(chunkfunc::Function,bufferSize::Int=BUFFER_SIZE_DEFAULT,T::Type=Float64;parallel=PARALLEL_DEFAULT), which by default has a buffer size of 1000 and does not use parallel generation. Note that you do not need to be running multiple processes for parallel generation to work. If we instead wish to generate randn(4,2) on each iteration, we can specify the dimensions:

chunkRand = ChunkedArray(randn,(4,2))
j = zeros(4,2)
for i = 1:loopSize
  j += next(chunkRand)
end

or simply give it a prototype:

j = zeros(4,2)
chunkRand = ChunkedArray(randn,j)
for i = 1:loopSize
  j += next(chunkRand)
end

and each call to next(chunkRand) will return standard normals in an array like similar(j), that is, with the same size and element type as j.
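The parallel option described above amounts to double buffering: the next chunk is produced while the current one is consumed. The sketch below swaps in Julia tasks (Threads.@spawn, Julia 1.3 or later) for the process-based generation this README describes, so it illustrates the idea rather than the package's mechanism; PrefetchChunk and nextvalue! are hypothetical names. Like the note about not needing multiple processes, it also runs with a single thread, although the prefetch then only executes when the main task waits on it, so there is no real overlap.

```julia
using Random

# Hypothetical double-buffered chunk (illustrative; not ChunkedArrays' mechanism,
# which the README describes in terms of processes). While `current` is consumed,
# the next batch is generated by a task started with Threads.@spawn (Julia 1.3+).
mutable struct PrefetchChunk{F}
    gen::F                    # batch generator, called as gen(n)
    current::Vector{Float64}  # batch being consumed
    pending::Task             # task producing the next batch
    idx::Int                  # index of the next value in `current`
end

function PrefetchChunk(gen, bufsize::Int=1000)
    initial = gen(bufsize)
    pending = Threads.@spawn gen(bufsize)     # start the second batch right away
    return PrefetchChunk(gen, initial, pending, 1)
end

function nextvalue!(c::PrefetchChunk)
    if c.idx > length(c.current)
        n = length(c.current)
        c.current = fetch(c.pending)          # waits only if the batch is not ready yet
        c.pending = Threads.@spawn c.gen(n)   # begin generating the batch after it
        c.idx = 1
    end
    v = c.current[c.idx]
    c.idx += 1
    return v
end
```

Start Julia with several threads (for example julia -t 4) for the prefetch to run concurrently with the consuming loop.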

Benchmarks

These benchmarks can be found in the test folder. For small runs (which are required for CI) there is little difference, but as the loop size increases the difference grows.

const loopSize = 1000000
const buffSize = 10000
const numRuns = 400

Test results (average time):

One-by-one:                             0.148530531075
Thousand-by-Thousand:                   0.189417186075
Altogether:                             0.2057703961
Hundred-by-hundred:                     0.191497048
Take at Beginning:                      0.20445405967500002
Pre-made Rands:                         0.16260088565
Chunked Rands Premade:                  0.1032136674
Chunked Rands 10000 buffer:             0.10846818174999999
Chunked Rands Direct:                   0.134752111825
Chunked Rands Max buffer:               0.120411857925
Parallel Chunked Rands 10000 buffer:    0.1276476319
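The benchmark code behind these numbers lives in the package's test folder. For readers who want to try the general comparison without the package, here is a hypothetical harness in the same spirit (function names and defaults are illustrative and it makes no claim about results): it averages @elapsed timings of one randn() call per iteration against drawing randn(buf) batches and consuming them in an inner loop.

```julia
using Random

# One randn() call per loop iteration (the naive approach)
function onebyonesum(n::Int)
    s = 0.0
    for _ in 1:n
        s += randn()
    end
    return s
end

# Draw `buf` values per randn call and consume them in an inner loop
function batchedsum(n::Int, buf::Int)
    s = 0.0
    done = 0
    while done < n
        m = min(buf, n - done)        # last batch may be smaller
        for x in randn(m)
            s += x
        end
        done += m
    end
    return s
end

# Average wall-clock time of each approach over `runs` repetitions
function comparetimings(n::Int=1_000_000, buf::Int=10_000; runs::Int=10)
    onebyonesum(10); batchedsum(10, 10)   # compile both before timing
    t1 = sum(@elapsed(onebyonesum(n)) for _ in 1:runs) / runs
    t2 = sum(@elapsed(batchedsum(n, buf)) for _ in 1:runs) / runs
    return (one_by_one = t1, batched = t2)
end
```

The defaults mirror the loopSize and buffSize constants above; timings will vary with the machine and Julia version.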