ChunkedArrays.jl is a package for increasing the performance of arrays generated
inside of loops. Some basic benchmarks show chunked arrays being almost 50%
faster than naive approaches. One use case for this is using random numbers in
a loop. It is well known that, for many reasons (including SIMD), generating
1000 random numbers at once using rand(1000) is faster than generating 1000
random numbers in separate calls to rand(). ChunkedArrays lets you generate
your arrays in larger chunks while providing a convenient wrapper that hides the
details. Also included is the ability to generate the next buffer in parallel,
which allows you to utilize your array chunk and have it replaced with a new
chunk generated by a different process, maximizing efficiency.
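The batching claim can be checked with a rough timing sketch. The names sum_one_by_one and sum_batched are illustrative and not part of ChunkedArrays, and the code is written for Julia 1.x rather than the v0.4/v0.5 releases the package targets. Timings depend on the Julia version and hardware, so treat the output as indicative only.

```julia
# Sum n uniform random numbers, drawing them one call at a time.
function sum_one_by_one(n)
    s = 0.0
    for _ in 1:n
        s += rand()
    end
    return s
end

# Sum n uniform random numbers drawn in a single batched call.
sum_batched(n) = sum(rand(n))

n = 1_000_000
sum_one_by_one(10); sum_batched(10)   # warm up so compilation is not timed

t_single = @elapsed sum_one_by_one(n)
t_batch  = @elapsed sum_batched(n)
println("one-by-one: ", t_single, "  batched: ", t_batch)
```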
To install the package, simply use
Pkg.add("ChunkedArrays")
using ChunkedArrays
Note that version v0.0.2 is the last version which targets Julia v0.4. The current master has some changes which only work on v0.5. For an up-to-date version with v0.4 compatibility, check out the v0.4-compat branch.
You can define a ChunkedArray in one of three forms:
ChunkedArray(chunkfunc::Function,bufferSize::Int=BUFFER_SIZE_DEFAULT,T::Type=Float64;parallel=PARALLEL_DEFAULT)
ChunkedArray(chunkfunc::Function,outputSize::NTuple,bufferSize::Int=BUFFER_SIZE_DEFAULT,T::Type=Float64;parallel=PARALLEL_DEFAULT)
ChunkedArray(chunkfunc,randPrototype::AbstractArray,bufferSize=BUFFER_SIZE_DEFAULT;parallel=PARALLEL_DEFAULT)
Then, to generate the next value in the array, use the next function:
next(chunked)
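To make the idea behind such a call concrete, here is a minimal sketch of a buffered source, assuming Julia 1.x. It is not the package's implementation: BufferedSource and nextvalue! are hypothetical names chosen for illustration. The source fills a chunk in one call and hands values out one at a time, refilling only when the chunk is used up.

```julia
# Hypothetical sketch of the chunking idea; not the ChunkedArrays implementation.
mutable struct BufferedSource{T,F}
    chunkfunc::F        # called as chunkfunc(n) to produce n values at once
    buffer::Vector{T}   # chunk currently being consumed
    pos::Int            # index of the last value handed out
end

# Fill the first chunk eagerly so the first request already has data.
function BufferedSource(chunkfunc, buffersize::Int=1000)
    buffer = chunkfunc(buffersize)
    return BufferedSource{eltype(buffer),typeof(chunkfunc)}(chunkfunc, buffer, 0)
end

# Return the next value, regenerating the whole chunk once it is exhausted.
function nextvalue!(src::BufferedSource)
    if src.pos == length(src.buffer)
        src.buffer = src.chunkfunc(length(src.buffer))
        src.pos = 0
    end
    src.pos += 1
    return src.buffer[src.pos]
end

src = BufferedSource(randn, 1000)
total = 0.0
for i in 1:2500                 # crosses two chunk boundaries
    global total += nextvalue!(src)
end
```

The loop body stays as simple as the one-call-per-value version, while the generator function is invoked only once per 1000 values.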
Let's say for example we wished to generate a bunch of standard normal random numbers in a loop. The naive way to do this is via
j=0.0
for i = 1:loopSize
    j += randn()
end
To use a ChunkedArray which outputs standard normal random numbers, we would use the following definition:
chunkRand = ChunkedArray(randn)
j=0.0
for i = 1:loopSize
    j += next(chunkRand)
end
This uses the constructor ChunkedArray(chunkfunc::Function,bufferSize::Int=BUFFER_SIZE_DEFAULT,T::Type=Float64;parallel=PARALLEL_DEFAULT)
with its defaults: a buffer size of 1000 and no parallel generation. Note that you
do not need to be running multiple processes for parallel generation to work.
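One way to picture parallel generation is a double-buffered source: one chunk is consumed while the next is produced elsewhere. The sketch below assumes Julia 1.3 or later and uses the Distributed standard library. PrefetchedSource and nextvalue! are hypothetical names, and this is not necessarily how the package does it internally. With a single process, @spawnat :any simply schedules the work on that process.

```julia
using Distributed   # standard library; also works when only one process is running

# Hypothetical double-buffered source: one chunk in use, the next one in flight.
mutable struct PrefetchedSource
    buffer::Vector{Float64}   # chunk currently being consumed
    pending::Future           # handle to the chunk being generated elsewhere
    pos::Int                  # index of the last value handed out
end

function PrefetchedSource(buffersize::Int=1000)
    first_chunk = randn(buffersize)
    pending = @spawnat :any randn(buffersize)   # start generating the second chunk
    return PrefetchedSource(first_chunk, pending, 0)
end

function nextvalue!(src::PrefetchedSource)
    if src.pos == length(src.buffer)
        n = length(src.buffer)
        src.buffer = fetch(src.pending)          # wait for the prefetched chunk
        src.pending = @spawnat :any randn(n)     # immediately request another
        src.pos = 0
    end
    src.pos += 1
    return src.buffer[src.pos]
end

src = PrefetchedSource(100)
vals = [nextvalue!(src) for _ in 1:250]   # consumes two full chunks and half of a third
```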
If we instead wish to generate randn(4,2) each time through the loop, we can
specify the dimensions:
chunkRand = ChunkedArray(randn,(4,2))
j=[0 0
0 0
0 0
0 0]
for i = 1:loopSize
    j += next(chunkRand)
end
or simply give it a prototype:
j=[0 0
0 0
0 0
0 0]
chunkRand = ChunkedArray(randn,j)
for i = 1:loopSize
    j += next(chunkRand)
end
and it will generate standard normals of similar(j).
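As an illustration of how a prototype can determine the shape of each generated value, the following sketch (Julia 1.x) generates a whole chunk of prototype-shaped arrays in one call and slices it along the last dimension. The helper chunk_like is hypothetical and only mirrors the idea; the package's own constructor may work differently.

```julia
# Hypothetical helper: generate `n` arrays shaped like `prototype` in a single call.
function chunk_like(chunkfunc, prototype::AbstractArray, n::Int)
    buf = chunkfunc(size(prototype)..., n)   # e.g. randn(4, 2, n)
    # Each slice along the last dimension is a view with the prototype's shape.
    return [selectdim(buf, ndims(buf), k) for k in 1:n]
end

j = zeros(4, 2)
chunk = chunk_like(randn, j, 1000)
for k in 1:length(chunk)
    global j += chunk[k]
end
```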
The benchmarks below can be found in the test folder. For small runs (which are required for CI) there is little difference, but as the loop size increases the difference grows.
const loopSize = 1000000
const buffSize = 10000
const numRuns = 400
Test Results For Average Time:
One-by-one: 0.148530531075
Thousand-by-Thousand: 0.189417186075
Altogether: 0.2057703961
Hundred-by-hundred: 0.191497048
Take at Beginning: 0.20445405967500002
Pre-made Rands: 0.16260088565
Chunked Rands Premade: 0.1032136674
Chunked Rands 10000 buffer: 0.10846818174999999
Chunked Rands Direct: 0.134752111825
Chunked Rands Max buffer: 0.120411857925
Parallel Chunked Rands 10000 buffer: 0.1276476319
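To relate these averages to the "almost 50% faster" figure, the ratios can be worked out directly from the reported numbers. The helper names speedup and time_saved are illustrative only. Against the one-by-one baseline, the premade chunked run comes out at roughly 1.4 times the throughput.

```julia
# Average times copied from the results listed above.
one_by_one      = 0.148530531075
chunked_premade = 0.1032136674
chunked_10000   = 0.10846818174999999

# Throughput ratio relative to the baseline (values above 1 mean faster).
speedup(baseline, t) = baseline / t
# Percentage of the baseline's run time that is saved.
time_saved(baseline, t) = 100 * (1 - t / baseline)

for (name, t) in [("Chunked Rands Premade", chunked_premade),
                  ("Chunked Rands 10000 buffer", chunked_10000)]
    println(name, ": speedup ", round(speedup(one_by_one, t), digits=2),
            ", time saved ", round(time_saved(one_by_one, t), digits=1), "%")
end
```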