Fast multidimensional algorithms for the Julia language.
This package provides macros that currently appear to be the most performant way to implement numerous multidimensional algorithms in Julia.
If you're using at least a pre-release version of Julia 0.3, I recommend using the version in base, which you can access with using Base.Cartesian. I also recommend the base documentation.
At this point, the best purpose for this package is to provide a base-compatible implementation of Cartesian for Julia 0.2. This was implemented in the 0.2 release of this package. Unfortunately, this changed several features, including the naming convention for variables (from i1 to i_1). If you use these names directly (most likely, you do not), this will break your code. Sorry about that. You can either pin the package at the 0.1.5 release, or update your code.
The following documentation applies only to this package's 0.1 series. Use the Julia documentation if you are using a more recent version of this package.
In practice, Cartesian effectively introduces a separate "dialect" of Julia. There is reason to hope that the main language will eventually support much of this functionality, and if/when that happens some or all of this should become obsolete. In the meantime, this package appears to be the most powerful way to write efficient multidimensional algorithms.
Install with the package manager: Pkg.add("Cartesian").
Most macros in this package work like this:
@nloops 3 i A begin
    s += @nref 3 A i
end
which generates the following code:
for i3 = 1:size(A,3)
    for i2 = 1:size(A,2)
        for i1 = 1:size(A,1)
            s += A[i1,i2,i3]
        end
    end
end
The (basic) syntax of @nloops is as follows:
- The first argument must be an integer (not a variable) specifying the number of loops.
- The second argument is the symbol-prefix used for the iterator variable. Here we used i, and variables i1, i2, i3 were generated.
- The third argument specifies the range for each iterator variable. If you use a variable (symbol) here, it's taken as 1:size(A,dim). More flexibly, you can use the anonymous-function expression syntax described below.
- The last argument is the body of the loop. Here, that's what appears between the begin...end.
There are some additional features described below.
@nref follows a similar pattern, generating A[i1,i2,i3] from @nref 3 A i.
The general practice is to read from left to right, which is why @nloops is @nloops 3 i A expr (as in for i2 = 1:size(A,2), where i2 is to the left and the range is to the right) whereas @nref is @nref 3 A i (as in A[i1,i2,i3], where the array comes first).
If you're developing code with Cartesian, you may find that debugging is made easier when you can see the generated code. This is possible via the (unexported) underscore-function variants:
julia> Cartesian._nref(3, :A, :i)
:($(Expr(:escape, :(A[i1,i2,i3]))))
and similarly for Cartesian._nloops.
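Another option is to expand a quoted expression with Julia's standard macroexpand function; this is a sketch (macroexpand is not specific to Cartesian, and the output is omitted here):

```
julia> macroexpand(:(@nloops 2 i A begin s += @nref 2 A i end))  # displays the expanded nested loops
```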
There are two additional important general points described below.
The first argument to both of these macros is the dimensionality, which must be an integer. When you're writing a function that you intend to work in multiple dimensions, this may not be something you want to hard-code. Fortunately, it's straightforward to use an @eval construct, like this:
for N = 1:4
    @eval begin
        function laplacian{T}(A::Array{T,$N})
            B = similar(A)
            @nloops $N i A begin
                ...
            end
        end
    end
end
This would generate versions of laplacian for dimensions 1 through 4.
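To make the pattern concrete, here is a minimal sketch with the loop body filled in: a sum-of-squares function generated for dimensions 1 through 4 (sumsq is an illustrative name, not part of the package):

```
for N = 1:4
    @eval begin
        function sumsq{T}(A::Array{T,$N})
            s = zero(T)
            @nloops $N i A begin
                v = @nref $N A i
                s += v*v        # accumulate the square of each element
            end
            s
        end
    end
end
```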
While it's somewhat more awkward, you can generate truly arbitrary-dimension functions using a wrapper that keeps track of whether it has already compiled a version of the function for different dimensionalities and data types:
let _mysum_defined = Dict{Any, Bool}()
    global mysum
    function mysum{T,N}(A::StridedArray{T,N})
        def = get(_mysum_defined, typeof(A), false)
        if !def
            ex = quote
                function _mysum{T}(A::StridedArray{T,$N})
                    s = zero(T)
                    @nloops $N i A begin
                        s += @nref $N A i
                    end
                    s
                end
            end
            eval(current_module(), ex)
            _mysum_defined[typeof(A)] = true
        end
        _mysum(A)
    end
end
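For example (illustrative):

```
A = rand(3, 4, 5)
mysum(A)   # first call for Array{Float64,3}: defines, compiles, and runs _mysum
mysum(A)   # later calls skip the eval and just dispatch to the cached _mysum
```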
In addition to being longer than the first version, there's a (small) performance price for checking the dictionary.
Perhaps the single most powerful feature in Cartesian is the ability to supply anonymous-function expressions to many macros. Let's consider implementing the laplacian function mentioned above. The (discrete) laplacian of a two-dimensional array would be calculated as
lap[i,j] = A[i+1,j] + A[i-1,j] + A[i,j+1] + A[i,j-1] - 4A[i,j]
One obvious issue with this formula is how to handle the edges, where A[i-1,j] might not exist. As a first illustration of anonymous-function expressions, for now let's take the easy way out and avoid dealing with them (later you'll see how you can handle them properly). In 2d we might write such code as
for i2 = 2:size(A,2)-1
    for i1 = 2:size(A,1)-1
        lap[i1,i2] = ...
    end
end
where one should note that the range 2:size(A,2)-1 omits the first and last index. In Cartesian this can be written in the following way:
@nloops 2 i d->(2:size(A,d)-1) begin
    (@nref 2 lap i) = ...
end
Note here that the range argument is being supplied as an anonymous function. A key point is that this anonymous function is evaluated when the macro runs. (Whatever symbol appears as the variable of the anonymous function, here d, takes each value from 1 to the dimensionality.) Effectively, this expression gets inlined, and hence generates exactly the code above with no extra runtime overhead.
There is an important bit of extra syntax associated with these expressions: the expression i_d, for d=3, is translated into i3. Suppose we have two sets of variables, a "main" group of indices i1, i2, and i3, and an "offset" group of indices j1, j2, and j3. Then the expression
@nref 3 A d->(i_d+j_d)
gets translated into
A[i1+j1, i2+j2, i3+j3]
The _ notation mimics the subscript notation of LaTeX; also like LaTeX, you can use curly-braces to group sub-expressions. For example, d->p_{d-1}=p_d-1 generates, for d=3, p2 = p3 - 1.
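As a small runnable sketch of the offset notation (shiftedcopy is illustrative, not part of the package), here is a function that copies A shifted by (j1, j2), leaving the uncovered border at zero:

```
function shiftedcopy{T}(A::Array{T,2}, j1::Int, j2::Int)
    B = zeros(T, size(A))
    # The range keeps i_d+j_d inside A; i_d and j_d expand to i1/j1 and i2/j2.
    @nloops 2 i d->((1+max(0,-j_d)):(size(A,d)-max(0,j_d))) begin
        (@nref 2 B i) = @nref 2 A d->(i_d+j_d)
    end
    B
end
```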
With this, we have enough machinery to implement a simple multidimensional function imfilter, which computes the correlation (similar to a convolution) between an array A and a filtering kernel kern. We're going to require that kern has odd-valued sizes along each dimension, say of size 2*w[d]+1, and suppose that there is a function padarray that pads A with width w[d] along each edge in dimension d, using whatever boundary conditions the user desires.
A complete implementation of imfilter is:
for N = 1:5
    @eval begin
        function imfilter{T}(A::Array{T,$N}, kern::Array{T,$N}, boundaryargs...)
            w = [div(size(kern, d), 2) for d = 1:$N]
            for d = 1:$N
                if size(kern, d) != 2*w[d]+1
                    error("kernel must have odd size in each dimension")
                end
            end
            Apad = padarray(A, w, boundaryargs...)
            B = similar(A)
            @nloops $N i A begin
                # Compute the filtered value
                tmp = zero(T)
                @nloops $N j kern begin
                    tmp += (@nref $N Apad d->(i_d+j_d-1))*(@nref $N kern j)
                end
                # Store the result
                (@nref $N B i) = tmp # note the ()
            end
            B
        end
    end
end
The line
tmp += (@nref $N Apad d->(i_d+j_d-1))*(@nref $N kern j)
gets translated into
tmp += Apad[i1+j1-1, i2+j2-1, ...] * kern[j1, j2, ...]
We also note that the assignment to B requires parentheses around the @nref, because otherwise the expression i = tmp would be passed as the final argument of the @nref macro.
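For completeness, here is one minimal sketch of the assumed padarray, using replicating boundary conditions and ignoring any boundary arguments (this is not the padarray from Images):

```
for N = 1:5
    @eval begin
        function padarray{T}(A::Array{T,$N}, w::Vector{Int}, boundaryargs...)
            # Allocate an array enlarged by w[d] on each side of dimension d
            Apad = Array(T, ntuple($N, d->size(A,d)+2*w[d]))
            @nextract $N w w
            @nloops $N i Apad begin
                # Clamp the shifted index to [1, size(A,d)] to replicate edge values
                (@nref $N Apad i) = @nref $N A d->min(max(i_d-w_d,1),size(A,d))
            end
            Apad
        end
    end
end
```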
We could implement the laplacian with imfilter, but that would be quite wasteful: we don't need the "corners" of the 3x3x... grid, just its edges and center. Consequently, we can write a considerably faster algorithm, where the advantage over imfilter would grow rapidly with dimensionality. Implementing this algorithm will further illustrate the flexibility of anonymous-function range expressions as well as another key macro, @nexprs.
In two dimensions, we'll generate the following code, which uses "replicating boundary conditions" to handle the edges gracefully:
function laplacian{T}(A::Array{T,2})
    B = similar(A)
    for i2 = 1:size(A,2), i1 = 1:size(A,1)
        tmp = zero(T)
        tmp += i1 < size(A,1) ? A[i1+1,i2] : A[i1,i2]
        tmp += i2 < size(A,2) ? A[i1,i2+1] : A[i1,i2]
        tmp += i1 > 1 ? A[i1-1,i2] : A[i1,i2]
        tmp += i2 > 1 ? A[i1,i2-1] : A[i1,i2]
        B[i1,i2] = tmp - 4*A[i1,i2]
    end
    B
end
To generate those terms with all but one of the indices unaltered, we'll use an anonymous function like this:
d2->(d2 == d1) ? i_d2+1 : i_d2
which shifts by 1 only when d2 == d1. We'll use the macro @nexprs (documented below) to generate the N expressions we need. Here is the complete implementation for dimensions 1 through 5:
for N = 1:5
    @eval begin
        function laplacian{T}(A::Array{T,$N})
            B = similar(A)
            @nloops $N i A begin
                tmp = zero(T)
                # Do the shift by +1.
                @nexprs $N d1->begin
                    tmp += (i_d1 < size(A,d1)) ? (@nref $N A d2->(d2==d1)?i_d2+1:i_d2) : (@nref $N A i)
                end
                # Do the shift by -1.
                @nexprs $N d1->begin
                    tmp += (i_d1 > 1) ? (@nref $N A d2->(d2==d1)?i_d2-1:i_d2) : (@nref $N A i)
                end
                # Subtract the center and store the result
                (@nref $N B i) = tmp - 2*$N*(@nref $N A i)
            end
            B
        end
    end
end
Cartesian makes it easy to implement reductions and broadcasting, using the "pre" and "post" expression syntax described below. Suppose we want to implement a function that can compute the maximum along user-supplied dimensions of an array:
B = maxoverdims(A, (1,2)) # computes the max of A along dimensions 1&2
but allow the user to choose these dimensions arbitrarily. For two-dimensional arrays, we might hand-write such code in the following way:
function maxoverdims{T}(A::AbstractMatrix{T}, region)
    szout = [size(A,1), size(A,2)]
    szout[[region...]] = 1                      # output has unit-size in chosen dimensions
    B = fill(typemin(T), szout...)::Array{T,2}  # julia can't infer dimensionality here
    szout1 = szout[1]
    szout2 = szout[2]
    for i2 = 1:size(A, 2)
        j2 = szout2 == 1 ? 1 : i2
        for i1 = 1:size(A, 1)
            j1 = szout1 == 1 ? 1 : i1
            @inbounds B[j1,j2] = max(B[j1,j2], A[i1,i2])
        end
    end
    B
end
This code can be generated for arbitrary dimensions in the following way:
for N = 1:4
    @eval begin
        function maxoverdims{T}(A::AbstractArray{T,$N}, region)
            szout = [size(A,d) for d = 1:$N]
            szout[[region...]] = 1
            B = fill(typemin(T), szout...)::Array{T,$N}
            Cartesian.@nextract $N szout szout
            Cartesian.@nloops $N i A d->(j_d = szout_d==1 ? 1 : i_d) begin
                @inbounds (Cartesian.@nref $N B j) = max((Cartesian.@nref $N B j), (Cartesian.@nref $N A i))
            end
            B
        end
    end
end
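For example (illustrative):

```
A = rand(4, 5, 6)
B = maxoverdims(A, (1, 3))   # size(B) == (1, 5, 1)
```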
You might be slightly concerned about the conditional expression inside the inner-most loop, and what influence that might have on performance. Fortunately, in most cases the impact seems to be very modest (in the author's tests, a few percent). The reason is that on any given execution of this function, each one of these branches always resolves the same way. Consequently, the CPU can learn to predict, with 100% accuracy, which branch will be taken. The computation time is therefore dominated by the cache-misses generated by traversing the array.
@nloops N itersym rangeexpr bodyexpr
@nloops N itersym rangeexpr preexpr bodyexpr
@nloops N itersym rangeexpr preexpr postexpr bodyexpr
Generate N nested loops, using itersym as the prefix for the iteration variables. rangeexpr may be an anonymous-function expression, or a simple symbol var in which case the range is 1:size(var,d) for dimension d.
Optionally, you can provide "pre" and "post" expressions. These get executed first and last, respectively, in the body of each loop. For example,
@nloops 2 i A d->(j_d = min(i_d,5)) begin
    s += @nref 2 A j
end
would generate
for i2 = 1:size(A, 2)
    j2 = min(i2, 5)
    for i1 = 1:size(A, 1)
        j1 = min(i1, 5)
        s += A[j1, j2]
    end
end
If you want just a post-expression, supply nothing for the pre-expression. Using parentheses and semicolons, you can supply multi-statement expressions.
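As a sketch of the pre/post feature (colsums is illustrative, not part of the package), here is a per-column sum: the pre-expression resets an accumulator before each inner loop and the post-expression stores it afterwards, while at the other loop level the conditional simply evaluates nothing:

```
function colsums{T}(A::Array{T,2})
    B = Array(T, size(A, 2))
    @nloops 2 i A d->(d==2 ? (s = zero(T)) : nothing) d->(d==2 ? (B[i_d] = s) : nothing) begin
        s += @nref 2 A i
    end
    B
end
```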
@nref N A indexexpr
Generate expressions like `A[i1,i2,...]`. `indexexpr` can either be an iteration-symbol prefix, or an anonymous-function expression.
@nexprs N expr
Generate `N` expressions. `expr` should be an anonymous-function expression. See the `laplacian` example above.
@nextract N esym isym
Given a tuple or vector `I` of length `N`, `@nextract 3 I I` would generate the expression `I1, I2, I3 = I`, thereby extracting each element of `I` into a separate variable.
@nall N expr
`@nall 3 d->(i_d > 1)` would generate the expression `(i1 > 1 && i2 > 1 && i3 > 1)`. This can be convenient for bounds-checking.
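For example, a bounds-checked offset access might look like this sketch (getshifted and the j arguments are illustrative):

```
function getshifted{T}(A::Array{T,3}, i1::Int, i2::Int, i3::Int, j1::Int, j2::Int, j3::Int)
    if @nall 3 d->(1 <= i_d+j_d <= size(A,d))
        return @nref 3 A d->(i_d+j_d)   # safe: every shifted index is in bounds
    end
    zero(T)
end
```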
P, k = @nlinear N A indexexpr
Given an `Array` or `SubArray` `A`, `P, k = @nlinear N A indexexpr` generates an array `P` and a linear index `k` for which `P[k]` is equivalent to `@nref N A indexexpr`. Use this when it would be most convenient to implement an algorithm using linear-indexing.
This is particularly useful when an anonymous-function expression cannot be evaluated at compile-time. For example, using an example from Images, suppose you wanted to iterate over each pixel and perform a calculation based on the color dimension of an array. In particular, we have a function rgb which generates an RGB color from 3 numbers. We can do this for each pixel of the array in the following way:
sz = [size(img,d) for d = 1:ndims(img)]
cd = colordim(img)  # determine which dimension of img represents color
sz[cd] = 1          # we'll "iterate" over color separately
strd = stride(img, cd)
@nextract $N sz sz
A = data(img)
@nloops $N i d->1:sz_d begin
    P, k = @nlinear $N A i
    rgbval = rgb(P[k], P[k+strd], P[k+2strd])
end
It appears to be difficult to generate code like this using just @nref, because the expression d->(d==cd) cannot be evaluated at compile-time.
@ntuple N itersym
@ntuple N expr
Generates an N-tuple from a symbol prefix (e.g., (i1,i2,...)) or an anonymous-function expression.
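For example, @ntuple can collect the current loop indices, e.g. to record the position of the maximum element (findmaxpos is illustrative, not part of the package):

```
function findmaxpos{T}(A::Array{T,3})
    maxval = typemin(T)
    maxpos = (0, 0, 0)
    @nloops 3 i A begin
        if (@nref 3 A i) > maxval
            maxval = @nref 3 A i
            maxpos = @ntuple 3 i   # expands to (i1, i2, i3)
        end
    end
    maxval, maxpos
end
```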
@nrefshift N A i j
Generates A[i1+j1,i2+j2,...]. This is legacy from before @nref accepted anonymous-function expressions.
@nlookup N A I i
Generates A[I1[i1], I2[i2], ...]. This can also be easily achieved with @nref.
@indexedvariable N i
Generates the expression iN, e.g., @indexedvariable 2 i would generate i2.
@forcartesian itersym sz bodyexpr
This is the oldest macro in the collection, and quite an outlier in terms of functionality:
sz = [5,3]
@forcartesian c sz begin
    println(c)
end
This generates the following output:
[1, 1]
[2, 1]
[3, 1]
[4, 1]
[5, 1]
[1, 2]
[2, 2]
[3, 2]
[4, 2]
[5, 2]
[1, 3]
[2, 3]
[3, 3]
[4, 3]
[5, 3]
From the simple example above, @forcartesian generates a block of code like this:
if !(isempty(sz) || prod(sz) == 0)
    N = length(sz)
    c = ones(Int, N)
    sz1 = sz[1]
    isdone = false
    while !isdone
        println(c)           # This is whatever code we put inside the "loop"
        if (c[1]+=1) > sz1
            idim = 1
            while c[idim] > sz[idim] && idim < N
                c[idim] = 1
                idim += 1
                c[idim] += 1
            end
            isdone = c[end] > sz[end]
        end
    end
end
This has more overhead than the direct for-loop approach of @nloops, but for many algorithms this shouldn't matter. Its advantage is that the dimensionality does not need to be known at compile-time.
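For instance, here is a sketch of a sum that works for any dimensionality chosen at runtime (sumelements is illustrative; the splatted indexing A[c...] is simple but not fast):

```
function sumelements(A::AbstractArray)
    s = zero(eltype(A))
    sz = [size(A)...]
    @forcartesian c sz begin
        s += A[c...]   # c is a vector of indices, updated in place
    end
    s
end
```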
Julia currently lacks efficient linear indexing for generic SubArrays. Consequently, cartesian indexing is the only high-performance way to address elements of SubArrays. Many simple algorithms, like sum, can have their performance boosted immensely by implementing them for SubArrays using Cartesian. For example, in 3d it's easy to get a boost on the scale of 100-fold.
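For example, a cartesian-indexing sum for the 3d case might look like this sketch (sum3 is illustrative; StridedArray covers both Arrays and typical SubArrays):

```
function sum3{T}(S::StridedArray{T,3})
    s = zero(T)
    @nloops 3 i S begin
        s += @nref 3 S i   # cartesian indexing avoids the slow linear-indexing path for SubArrays
    end
    s
end
```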