ManagedLoops defines an API to separate the task of writing loops from the task of defining how to
execute the loops (e.g. using SIMD, multiple threads, or on a GPU). In addition it provides convenience macros
@unroll to unroll loops whose length is known at parse time and @vec to mark loops as suitable for SIMD vectorization.
The API is based on the abstract type LoopManager and its descendants, and on the function offload.
Manager types deriving from LoopManager offer different iteration strategies; see the package LoopManagers.
Function offload may be called directly (lower-level API), or may hide behind the @loops macro (higher-level API).
ManagedLoops is very lightweight and depends only on MacroTools. Thus, making it a dependency of your "provider" modules, where number-crunching routines are defined, is very cheap. LoopManagers is a more heavyweight package, but only "consumer" modules, those which create loop managers and pass them to functions, need to depend on it.
Furthermore the high-level API is designed to have a small entry cost (loops must be placed in loop-only functions) and a zero exit cost:
- If you do not want to depend on LoopManagers, pass the value nothing as the first argument of @loops functions and they will work "normally", as if @loops and @vec were not there.
- If you decide to drop ManagedLoops altogether, simply remove @loops and @vec and your code just works.
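To make the separation between writing a loop and choosing how to run it concrete, here is a minimal sketch in plain Julia. It does not use ManagedLoops at all: the names SketchManager, SerialSketch, ThreadedSketch, sketch_offload and scale! are hypothetical stand-ins invented for this illustration, and the real LoopManager and offload may differ in signature and behaviour.

```julia
# Hypothetical stand-ins; these names are NOT the ManagedLoops API.
abstract type SketchManager end
struct SerialSketch <: SketchManager end
struct ThreadedSketch <: SketchManager end

# "Provider" side: a loop-only function that receives its range as first argument.
function scale!(irange, x, y, a)
    for i in irange
        x[i] = a * y[i]
    end
    return nothing
end

# "Consumer" side: the manager decides how the range is traversed.
# With `nothing` (or the serial manager) the loop runs as a plain loop.
sketch_offload(fun, ::Union{Nothing,SerialSketch}, range, args...) = fun(range, args...)

# The threaded manager splits the range into chunks, one task per chunk.
function sketch_offload(fun, ::ThreadedSketch, range, args...)
    len = max(1, cld(length(range), Threads.nthreads()))
    chunks = collect(Iterators.partition(range, len))
    Threads.@threads for chunk in chunks
        fun(chunk, args...)
    end
    return nothing
end

# Entry point: same function name, first argument is a manager or `nothing`.
scale!(mgr::Union{Nothing,SketchManager}, x, y, a) =
    sketch_offload(scale!, mgr, eachindex(x, y), x, y, a)

y = collect(1.0:8.0)
x1, x2, x3 = similar(y), similar(y), similar(y)
scale!(nothing, x1, y, 2.0)           # plain loop
scale!(SerialSketch(), x2, y, 2.0)    # serial manager
scale!(ThreadedSketch(), x3, y, 2.0)  # threaded manager
```

All three calls fill their output with the same values. Only the traversal strategy changes, which is the point of keeping the loop body in a function that knows nothing about the manager.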
using ManagedLoops: @loops, @vec
@loops function loop1!(_, x, y) # do not omit the unused first argument!
    let irange = eachindex(x,y)
        @vec for i in irange # signals that this loop supports vectorization
            # do some work at index i
            # when indexing arrays, `i` should always be innermost (first) due to `@vec`
        end
    end
end
loop!(x,y) = loop1!(nothing, x, y) # plain loop, no manager needed
loop1!(mgr::LoopManager, x, y) # loops are "managed" by `mgr`, provided for instance by `LoopManagers`

@loops function loop2!(_, x, y)
    let (irange, jrange) = axes(x) # these ranges are "managed"
        # we can have an outer loop here, whose range will not be "managed"
        for j in jrange
            # do some computation shared by all indices i
            @vec for i in irange
                # do some work at index i, j
                # we can have a non-managed inner loop here
            end
        end
    end
end
loop2!(nothing, x, y)
loop2!(mgr::LoopManager, x, y)

"Supports vectorization" means no conditional branching / flow control inside the loop. Ternary expressions a ? b : c and if-then-else expressions may be used if prepended with the @vec macro, e.g.:
@vec for i in irange
    x[i] = @vec if y[i]>0 ; log(y[i]) ; else zero(y[i]); end
    x[i] = @vec y[i]>0 ? log(y[i]) : zero(y[i])
end

Type annotations are not supported with @loops. This limitation could be relaxed with some work.
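The package documents that @vec conditionals expand to choose(a, ()->b, ()->c), which evaluates a single branch when the condition is uniformly true or false and otherwise evaluates both and blends them. The following plain-Julia sketch illustrates that semantics under stated assumptions: sketch_choose is a hypothetical stand-in, not the package's implementation, and tuples play the role of SIMD lanes.

```julia
# Hypothetical stand-in for the blending behaviour; not the package's `choose`.
# `mask` and the results of `B()` and `C()` are tuples acting as SIMD lanes.
function sketch_choose(mask::NTuple{N,Bool}, B, C) where N
    all(mask) && return B()   # every lane takes the first branch: skip C
    any(mask) || return C()   # every lane takes the second branch: skip B
    b, c = B(), C()           # mixed mask: evaluate both branches, then blend
    return ntuple(k -> ifelse(mask[k], b[k], c[k]), Val(N))
end

y = (4.0, -1.0, 9.0, -16.0)
mask = map(v -> v > 0, y)
# Both branches must be safe on every lane, hence sqrt(abs(v)) rather than sqrt(v).
x = sketch_choose(mask, () -> map(v -> sqrt(abs(v)), y), () -> map(zero, y))
# x == (2.0, 0.0, 3.0, 0.0)
```

Because both branches may run on all lanes when the mask is mixed, each branch has to be valid for every element, not only for those where its condition holds.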
The @loops macro expands to something similar to the following.
function loop1!(irange, x, y)
    @vec for i in irange
        # do some work at index i
    end
end
loop1!(mgr, x, y) = offload(loop1!, mgr, eachindex(x,y), x, y)

function loop2!((irange, jrange), x, y)
    for j in jrange
        # do some computation shared by all indices i
        @vec for i in irange
            # do some work at index i, j
        end
    end
end
loop2!(mgr, x, y) = offload(loop2!, mgr, axes(x), x, y)

Broadcast expressions such as @. x = sin(y) are semantically equivalent to loops. To manage the implied loop with mgr::LoopManager, use:
@. mgr[x] = sin(y)

or, if x is a function argument, pass mgr[x] instead of x. Since this works by specializing functions in Base Julia,
this can be used with functions defined in modules that are not using LoopManagers at all.
# function definition, possibly in a module that knows nothing about LoopManagers
f!(x,y) = @. x = sin(y)
# elsewhere, call f!(x,y) as usual
f!(x,y)
# we can also let `mgr::LoopManager` control the loops
f!(mgr[x], y)

@unroll (x^2 for x in (1,2,3)) expands to (1^2, 2^2, 3^2).
@unroll for x in 1:3
    myfun(x)
end

expands to:
myfun(1)
myfun(2)
myfun(3)

Only tuples and ranges with elements or bounds known at parse time are supported. To use this macro with type parameters, one may use generated functions.
@generated function fun(x::MyType{N}) where N
    quote
        @unroll for i in 1:$N
            otherfun(i, x)
        end
    end
end

ManagedLoops also defines parallel, barrier, master, share to support
OpenMP-like execution with long-lived threads. This is currently experimental and not more performant than short-lived threads launched at each outer loop.
- @vec for the ternary operator. @vec a ? b : c now expands to choose(a, ()->b, ()->c).
- @vec for if-then-else expressions. @vec if a ; b ; else c ; end now expands to choose(a, ()->b, ()->c). choose(a, B, C) evaluates only B() (resp. C()) when a is all true (resp. all false). Otherwise both are evaluated and blended.
- Support for "managed broadcasting": if the l.h.s. of a broadcast expression @. lhs = rhs is of the form lhs = mgr[array], then the broadcast loop is managed by mgr. Limited to 4D arrays for the moment.
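How a wrapped left-hand side can take over a broadcast loop by specializing a Base function may be sketched as follows. This is an assumption-laden illustration, not the package's mechanism: SketchManaged is a hypothetical wrapper standing in for whatever mgr[x] returns, and the sketch relies only on the fact that Julia lowers w .= f.(args...) to a call to Base.Broadcast.materialize!.

```julia
using Base.Broadcast: Broadcasted, instantiate

# Hypothetical wrapper standing in for what `mgr[x]` might return.
struct SketchManaged{A<:AbstractArray}
    events::Vector{Symbol}   # records that the custom loop ran
    array::A
end

# `w .= f.(args...)` lowers to `materialize!(w, broadcasted(f, args...))`,
# so a method specialized on the wrapper takes over the implied loop.
function Base.Broadcast.materialize!(dest::SketchManaged, bc::Broadcasted)
    lazy = instantiate(bc)   # fill in the axes of the lazy broadcast expression
    axes(lazy) == axes(dest.array) ||
        throw(DimensionMismatch("this sketch requires matching shapes"))
    # A real manager would choose the traversal (SIMD, threads, GPU) here.
    for I in CartesianIndices(dest.array)
        dest.array[I] = lazy[I]
    end
    push!(dest.events, :managed)
    return dest
end

# A function written without any knowledge of the wrapper.
f!(x, y) = @. x = sin(y)

y = [0.0, 0.5, 1.0]
x = zeros(3)
w = SketchManaged(Symbol[], x)
f!(w, y)   # the loop runs through the specialized method and fills `x`
```

The function f! is unchanged, and only the argument passed by the caller selects the managed path. This sketch handles only array-shaped right-hand sides of matching shape, a limitation of the illustration rather than a statement about the package.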