Implementation of flexible (and thread-safe) temporary arrays and array pools for situations where a little bit of semi-manual memory management improves performance. The package is used heavily throughout the ACE codebase and can lead to significant performance gains in some cases. Unfortunately, at this point, those gains are not always as systematic as one would hope.
The following table shows a basic benchmark for evaluating a Chebyshev basis at multiple inputs simultaneously. This is a typical use case for which the package is intended: the cost of the arithmetic is of the same order of magnitude as the cost of allocation.
| nB / nX | 10 / 16 | 10 / 32 | 30 / 16 | 30 / 32 |
|---|---|---|---|---|
| Array | 147 / 259 | 163 / 566 | 377 / 876 | 412 / 1286 |
| pre-allocated | 89 / 97 | 65 / 66 | 253 / 263 | 213 / 214 |
| FlexArray | 95 / 100 | 63 / 63 | 264 / 273 | 207 / 213 |
| ArrayPool(FlexArray) | 91 / 93 | 68 / 70 | 264 / 271 | 216 / 223 |
| FlexArrayCache | 104 / 106 | 88 / 94 | 280 / 287 | 270 / 283 |
| ArrayPool(FlexArrayCache) | 111 / 112 | 93 / 98 | 285 / 292 | 275 / 287 |
| TSafe(FlexArray) | 87 / 89 | 67 / 68 | 262 / 269 | 212 / 219 |
| TSafe(ArrayPool(FlexArray)) | 96 / 97 | 74 / 77 | 262 / 271 | 224 / 232 |
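To make the benchmarked task concrete, here is a minimal sketch of the two baseline rows ("Array" and "pre-allocated"), assuming the Chebyshev polynomials T_0, ..., T_{nB-1} are evaluated at nX inputs with the standard three-term recurrence. The functions `cheb` and `cheb!` are hypothetical and are not the code used to produce the table.

```julia
# Hypothetical sketch of the benchmarked task: evaluate Chebyshev polynomials
# T_0, ..., T_{nB-1} at each input x in X (rows = inputs, columns = basis).

# "Array" variant: allocates a fresh nX × nB output on every call.
function cheb(X::AbstractVector{T}, nB::Int) where {T}
    P = Matrix{T}(undef, length(X), nB)
    return cheb!(P, X)
end

# "pre-allocated" variant: writes into a caller-supplied output matrix.
function cheb!(P::AbstractMatrix, X::AbstractVector)
    nB = size(P, 2)
    for (i, x) in enumerate(X)
        P[i, 1] = one(x)
        nB > 1 && (P[i, 2] = x)
        # three-term recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)
        for k in 3:nB
            P[i, k] = 2x * P[i, k-1] - P[i, k-2]
        end
    end
    return P
end
```

The packages in this README aim to get the convenience of the first variant at roughly the cost of the second.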
ObjectPools.jl exports `FlexArray`, which holds the memory for an array and adapts its element type and size as needed. In particular, the `eltype` and size can change at runtime without loss of performance. A `FlexArray` is constructed as follows:
```julia
tmp = FlexArray()
```
This stores a resizable array that can be obtained via
```julia
A = acquire!(tmp, (N,), Float64)         # N = length of array
A = acquire!(tmp, (10, 10, 10), Bool)
```
The object `tmp` actually stores a `Vector{UInt8}`, which is converted into a `PtrArray` and then reinterpreted and reshaped at essentially zero cost.
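The byte-buffer idea can be illustrated with the standard library alone. The sketch below is hypothetical (`ByteBuffer` and `acquire_view!` are not part of ObjectPools.jl) and uses `unsafe_wrap` to view the raw memory as a plain `Array` rather than a `PtrArray`. It assumes bits-type elements and that the buffer is not grown while an earlier view is still in use.

```julia
# Hypothetical sketch (not the ObjectPools.jl implementation): a byte buffer
# that is grown on demand and viewed as an array of any bits type.
struct ByteBuffer
    bytes::Vector{UInt8}
end
ByteBuffer() = ByteBuffer(UInt8[])

function acquire_view!(buf::ByteBuffer, dims::NTuple{N, Int}, ::Type{T}) where {N, T}
    @assert isbitstype(T) "only bits types can be stored in raw bytes"
    nbytes = prod(dims) * sizeof(T)
    # grow (never shrink) the underlying storage
    if length(buf.bytes) < nbytes
        resize!(buf.bytes, nbytes)
    end
    # wrap the raw memory without copying; `buf` must stay alive (and must
    # not be resized) while the returned array is in use
    return unsafe_wrap(Array, Ptr{T}(pointer(buf.bytes)), dims)
end
```

Changing the element type or shape only changes how the same bytes are interpreted, which is why it costs essentially nothing.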
ObjectPools.jl exports `FlexArrayCache`, which provides stacks of arrays that can be reused without involving the garbage collector. This can be thought of as a very limited, manual re-implementation of garbage collection. It is used as follows:
```julia
cache = FlexArrayCache()
A = acquire!(cache, (N, ), Float64)
# do something with A
release!(A)
```
The `acquire!` function obtains an array of size `(N,)` from the stack (of the current thread). Once the array is no longer needed, it can be returned to the stack via `release!`. It is fine if it is never released: once there are no more references to `A`, it is simply garbage-collected.
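A stack of reusable arrays can be sketched in a few lines. This is a simplified, hypothetical version (`SimpleCache`, `CachedVector`, `acquire_from!` and `release_to!` are illustrative names) that supports only vectors of a single element type; it shows the pop-on-acquire / push-on-release pattern.

```julia
# Hypothetical sketch of a stack-based array cache (not the ObjectPools.jl code).
struct SimpleCache{T}
    stack::Vector{Vector{T}}
end
SimpleCache{T}() where {T} = SimpleCache{T}(Vector{T}[])

# An array that remembers which cache it came from, so it can be released.
struct CachedVector{T} <: AbstractVector{T}
    data::Vector{T}
    cache::SimpleCache{T}
end
Base.size(A::CachedVector) = size(A.data)
Base.getindex(A::CachedVector, i::Int) = A.data[i]
Base.setindex!(A::CachedVector, v, i::Int) = (A.data[i] = v)

# Pop a vector from the stack if one is available, otherwise allocate.
function acquire_from!(c::SimpleCache{T}, n::Int) where {T}
    data = isempty(c.stack) ? Vector{T}(undef, n) : pop!(c.stack)
    resize!(data, n)
    return CachedVector{T}(data, c)
end

# Push the storage back for reuse by the next acquire.
release_to!(A::CachedVector) = (push!(A.cache.stack, A.data); nothing)
```

An array that is never passed to `release_to!` is simply dropped from the cache's point of view and later collected by the GC.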
One can also use the `unwrap` function to obtain the `PtrArray` of a `FlexCachedArray`, or of the adjoint/transpose of a `FlexCachedArray`:
```julia
cache = FlexArrayCache()
A = acquire!(cache, (M, N), Float64)
Aptr = unwrap(A)     # PtrArray of size (M, N)
At = A'              # adjoint of the FlexCachedArray A
Atptr = unwrap(At)   # PtrArray of At, with size (N, M)
```
Warning: using `parent` to obtain the `PtrArray` of a `FlexCachedArray` is deprecated. Always use `unwrap` instead.
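One plausible way for an `unwrap`-style function to handle adjoints is to dispatch on the `Adjoint`/`Transpose` wrappers from `LinearAlgebra` and re-apply them to the unwrapped storage, so the result keeps the transposed shape. The sketch assumes a hypothetical wrapper type `Handle` and function `raw`; it is not the ObjectPools.jl implementation.

```julia
using LinearAlgebra

# Hypothetical wrapper type standing in for a cached array.
struct Handle{T, N} <: AbstractArray{T, N}
    data::Array{T, N}
end
Base.size(H::Handle) = size(H.data)
Base.getindex(H::Handle{T, N}, I::Vararg{Int, N}) where {T, N} = H.data[I...]

# Return the raw storage, looking through adjoint/transpose wrappers so the
# result keeps the transposed shape.
raw(H::Handle) = H.data
raw(A::Adjoint) = adjoint(raw(parent(A)))
raw(A::Transpose) = transpose(raw(parent(A)))
```

Note that `parent(H')` would return `H` itself, with the original `(M, N)` shape, which is one reason a dedicated function is clearer than `parent` here.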
An array pool is a dictionary of temporary arrays or array caches indexed by symbols. It enables the management of many temporary arrays (or caches) within a single field. For example,
```julia
pool = ArrayPool(FlexArray)
A = acquire!(pool, :A, (10, 10), Float64)
B = acquire!(pool, :B, (10, 100), ComplexF64)
```
One can similarly create an `ArrayPool(FlexArrayCache)`.
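A symbol-indexed pool can be sketched as a `Dict` from names to independent byte buffers. `SymbolPool` and `acquire_named!` below are hypothetical, and the same caveats as for raw byte views apply: bits types only, and a view is invalidated if its buffer is later grown.

```julia
# Hypothetical sketch of a symbol-indexed pool of flexible buffers
# (illustrative only; not the ObjectPools.jl `ArrayPool`).
struct SymbolPool
    buffers::Dict{Symbol, Vector{UInt8}}
end
SymbolPool() = SymbolPool(Dict{Symbol, Vector{UInt8}}())

function acquire_named!(pool::SymbolPool, name::Symbol,
                        dims::NTuple{N, Int}, ::Type{T}) where {N, T}
    @assert isbitstype(T) "only bits types can be stored in raw bytes"
    # one independent byte buffer per name, created on first use
    bytes = get!(() -> UInt8[], pool.buffers, name)
    nbytes = prod(dims) * sizeof(T)
    length(bytes) < nbytes && resize!(bytes, nbytes)
    return unsafe_wrap(Array, Ptr{T}(pointer(bytes)), dims)
end
```

Because each name owns its own buffer, `:A` and `:B` can be in use at the same time without overwriting each other.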
In multi-threaded code it is important that each thread uses its own temporary work array. This can be achieved by wrapping a `FlexArray`, a `FlexArrayCache`, or an `ArrayPool` in a `TSafe`, e.g.
```julia
tmp = TSafe(ArrayPool(FlexArrayCache))
```
We can now access this as follows:
```julia
using Base.Threads: @threads
using StaticArrays: SVector

@threads for n = 1:N
    A = acquire!(tmp, :A, (10, 10), SVector{3, Float64})
    # do something with A
    release!(A)
end
```
Here, `tmp` actually stores a separate `ArrayPool` for each thread. Note that with the dynamic scheduler, an array `A` may be acquired on thread `i` and released on thread `j`, in which case it is returned to a different stack.
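The per-thread idea behind `TSafe` can be sketched with the standard library: keep one object per thread and select it with `Threads.threadid()`. `PerThread`, `per_thread`, `local_item` and `static_sum_of_squares` are hypothetical names, and the sketch assumes Julia 1.9 or later (for `Threads.maxthreadid`). Indexing by `threadid()` is only reliable with the `:static` scheduler, which is why the loop uses it.

```julia
using Base.Threads: @threads, threadid

# Hypothetical per-thread wrapper (illustrative; not ObjectPools' TSafe):
# one independent object per thread id.
struct PerThread{T}
    items::Vector{T}
end
per_thread(make) = PerThread([make() for _ in 1:Threads.maxthreadid()])

# the object belonging to the thread we are currently running on
local_item(p::PerThread) = p.items[threadid()]

function static_sum_of_squares(n::Int)
    scratch = per_thread(() -> zeros(8))       # one scratch vector per thread
    partial = zeros(Threads.maxthreadid())     # one accumulator per thread
    @threads :static for i in 1:n
        buf = local_item(scratch)
        buf .= i                               # fill the scratch vector
        partial[threadid()] += sum(abs2, buf) / length(buf)   # adds i^2
    end
    return sum(partial)
end
```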
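Note that, because of the dynamic scheduler, `TSafe(FlexArray)` is NOT entirely thread-safe: with dynamic scheduling several tasks may share a thread (and a task may migrate between threads), so two tasks can end up using the same per-thread `FlexArray`. These arrays are only thread-safe when using the static scheduler, e.g.
```julia
using Base.Threads: @threads

tmp = TSafe(FlexArray())
@threads :static for i = 1:10
    A = acquire!(tmp, (20, 30, 5), ComplexF32)
    # do something with A
end
```
The following example is only indicative and is not intended to run as-is.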
The simplest use case of ObjectPools.jl is to provide flexible temporary variables and output arrays that can be reused. For example, suppose we want to evaluate spherical harmonics. This could be implemented as follows:
```julia
using ObjectPools, StaticArrays

struct Ylms
    L::Int
    tmpP::FlexArray
    outY::FlexArrayCache
end

Ylms(L::Integer) = Ylms(L, FlexArray(), FlexArrayCache())

function (ylms::Ylms)(r::SVector{3, T}) where {T <: Real}
    L = ylms.L
    P = acquire!(ylms.tmpP, (lenP(L),), T)            # lenP: not shown
    eval_alp!(P, r)                                    # not shown
    Y = acquire!(ylms.outY, (lenY(L),), Complex{T})   # lenY: not shown
    eval_ylm!(Y, P, r)                                 # not shown
    return Y
end

ylms = Ylms(L)
for i = 1:niter
    r = @SVector randn(3)   # generate an input somehow
    Y = ylms(r)             # evaluate the Ylms
    # .... do something with Y
    release!(Y)             # return it to the cache
end
```
The first advantage of the above implementation is that the input type parameter `T` need not be known until runtime. For example, we can use ForwardDiff to differentiate the basis, and the `FlexArray`s will simply become arrays of `Dual` numbers.
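The point that `T` only needs to be known at runtime can be illustrated without ObjectPools.jl. The hypothetical `PowerBasis` below caches one output vector per element type in a `Dict` (a simpler, swapped-in technique, not the byte-buffer approach of `FlexArray`), so the same object works for `Float64`, `Float32` or complex inputs. It is not thread-safe.

```julia
# Hypothetical sketch: a callable basis whose output buffer adapts to the
# input's element type at runtime (one cached vector per element type).
struct PowerBasis
    n::Int
    cache::Dict{DataType, Any}
end
PowerBasis(n::Integer) = PowerBasis(n, Dict{DataType, Any}())

function (b::PowerBasis)(x::T) where {T <: Number}
    # look up (or create) the buffer for this element type
    out = get!(() -> Vector{T}(undef, b.n), b.cache, T)::Vector{T}
    p = one(T)
    for k in 1:b.n
        out[k] = p      # out[k] = x^(k-1)
        p *= x
    end
    return out
end
```

As in the README example, the returned vector is reused on the next call with the same element type, so a caller that needs to keep the values must copy them; this is the problem `FlexArrayCache` with `release!` addresses.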
The second advantage is that the output array is released back to the array cache rather than being newly allocated at each step. Of course, one could instead pre-allocate the output and write an in-place version of the evaluation code, but this requires type management outside of the `Ylms` implementation, which can get tedious. The `FlexArrayCache` is a simple mechanism that keeps all type management localized to the actual implementation.
To make the `for i = 1:niter` loop multi-threaded, we could rewrite this code as follows:
```julia
using Base.Threads: @threads

struct Ylms
    L::Int
    tmpP::TSafe{FlexArray}
    outY::TSafe{FlexArrayCache}
end
# constructor and evaluation method analogous to the single-threaded version

ylms = Ylms(L)
@threads :static for i = 1:niter
    r = @SVector randn(3)   # generate an input somehow
    Y = ylms(r)             # evaluate the Ylms
    # .... do something with Y
    release!(Y)             # return it to the cache
end
```
We use the static scheduler because `TSafe{FlexArray}` is not safe to use with the dynamic scheduler.
To use the dynamic scheduler, we need to replace the `TSafe{FlexArray}` with a `TSafe{FlexArrayCache}`:
```julia
using Base.Threads: @threads

struct Ylms
    L::Int
    tmpP::TSafe{FlexArrayCache}
    outY::TSafe{FlexArrayCache}
end
# constructor and evaluation method analogous to the single-threaded version

ylms = Ylms(L)
@threads for i = 1:niter
    r = @SVector randn(3)   # generate an input somehow
    Y = ylms(r)             # evaluate the Ylms
    # .... do something with Y
    release!(Y)             # return it to the cache
end
```
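For comparison, a common task-safe pattern that works with the dynamic scheduler is to keep a fixed set of scratch buffers in a `Channel`, taking one before use and putting it back afterwards. This is a swapped-in technique shown only to illustrate acquire/release under dynamic scheduling; `make_pool` and `dynamic_sum_of_squares` are hypothetical names, not part of ObjectPools.jl.

```julia
using Base.Threads: @threads

# Hypothetical helper: a Channel pre-filled with `nbuf` zeroed scratch vectors.
function make_pool(nbuf::Int, len::Int)
    pool = Channel{Vector{Float64}}(nbuf)
    for _ in 1:nbuf
        put!(pool, zeros(len))
    end
    return pool
end

function dynamic_sum_of_squares(n::Int)
    pool = make_pool(Threads.nthreads(), 8)
    results = zeros(n)
    @threads for i in 1:n           # default scheduler (:dynamic since Julia 1.8)
        buf = take!(pool)           # blocks until a buffer is free
        try
            buf .= i
            results[i] = sum(abs2, buf) / length(buf)   # = i^2
        finally
            put!(pool, buf)         # always hand the buffer back
        end
    end
    return sum(results)
end
```

Because a buffer is owned by exactly one task between `take!` and `put!`, it does not matter which thread that task happens to run on.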