Note: recent changes in the way Julia represents functions and closures require a redesign of this package, which will be published when the 0.5 IR stabilises a bit more.
ISPC.jl uses the Intel ISPC compiler to compile fragments
of ISPC-C or Julia code to vector code (eg. SSE/AVX on Intel CPUs) at runtime.
It is similar to the standard @simd
macro but supports control statements and a
number of math functions.
Speedups of 2x-8x compared to plain Julia code can be achieved (single core).
- high-level interface: translates Julia to ISPC
- low-level interface: compiles and loads ISPC programs
- Julia 0.4 or 0.5-dev up to cf93d6f (2016-01-29 02:19 UTC)
ispc
(must be found in $PATH)libtool
org++
(must be found in $PATH)- (optional)
llvm_dis
for looking at ISPC llvm assembly.
Work in progress! The high-level interface is fairly functional already, but bugs and the odd missing math function should be expected. More documentation to come.
See the example notebook for an overview.
The high-level interface translates Julia code that has been annotated with a set of macros into code compiled by ISPC on-the-fly. Example:
using ISPC
@inline function mandel(c_re, c_im, count)
z_re = c_re
z_im = c_im
i = 0
while i < count
if (z_re * z_re + z_im * z_im > 4.0f0)
break
end
new_re = z_re*z_re - z_im*z_im
new_im = 2.0f0 * z_re * z_im
z_re = c_re + new_re
z_im = c_im + new_im
i += 1
end
return i
end
@ispc function mandelbrot_ispc(x0, y0, x1, y1, output, max_iters)
height, width = size(output)
dx = (x1 - x0) / width
dy = (y1 - y0) / height
@kernel(`--target=avx1-i32x8`) do
for i = 1:width
@foreach(1:height) do j
x = x0 + i * dx
y = y0 + j * dy
output[j,i] = mandel(x, y, max_iters)
end
end
end
output
end;
Supported ISPC constructs:
@foreach
@foreach(:active)
(untested)@unmasked
(untested)@coherent
(untested)
The high-level interface is meant to be used with numerical kernels only. Not all of Julia's syntax and types are supported:
-
Arrays can only be indexed with integers, providing either a single index (linear indexing) or one index per dimension (multi-dimensional arrays). Indexing must yield an array element, not a sub-array.
-
All outer variables ("kernel arguments") must be primitive types or arrays of primitive types.
-
Simple composite types like
UnitRange
are supported inside kernels (eg. infor
loops) but not as kernel arguments at the moment. -
Only functions that have a direct translation to ISPC are supported inside kernels. User-defined functions may be used if they are declared
@inline
. This restriction may be lifted in the future. -
Inner functions, exceptions, task switching, I/O etc. are not supported.
ISPC task-level constructs are not yet supported.
The low-level interface allows you to load and call fragments of ISPC code:
using ISPC
# A basic ISPC kernel:
code = """
export void simple(uniform float vin[], uniform float vout[],
uniform int count) {
foreach (index = 0 ... count) {
float v = vin[index];
if (v < 0.5)
v = v * v;
else
v = sqrt(v);
vout[index] = v;
}
}
"""
# Compile the code and get a function pointer to our kernel:
lib = load_ispc(code, `--target=avx1-i32x8`)
fptr = Libdl.dlsym(lib, "simple")
# Call the kernel:
vin = rand(Float32, 1000);
vout = zeros(Float32, 1000);
ccall(fptr, Void, (Ref{Float32}, Ref{Float32}, UInt64), vin, vout, length(vout))
- The
@kernel
macro creates a lambda function containing the kernel code. code_lowered()
orcode_typed()
is run on the main@ispc
function to get information about closure variables captured by the kernel lambda. These become kernel arguments.- Kernel fragments in the main function are replaced by calls to a
@generated
kernel_call
function. - When
kernel_call
is called for the first time, type inference is run on the kernel fragment. This gives us types for local variables and inlines functions that can be inlined. - The lowered and typed AST is transformed to un-lower
goto
s back intoif
andwhile
statements (ISPC does not support varyinggoto
s) - The transformed AST is translated to ISPC C
- The resulting code is compiled with
ispc
, loaded withLibdl
and called withccall
.