Fast vectorized mathematical functions for SIMD.jl, using SLEEFPirates.jl.
This package is registered. To install it:
] add SIMDMathFunctions
The primary goal of SIMDMathFunctions is to provide efficient methods for mathematical functions with SIMD.Vec arguments. Under the hood, optimized implementations provided by SLEEFPirates.jl are used. This allows explicitly vectorized code using SIMD.jl to benefit from fast vectorized math functions.
using SIMD: VecRange
using SIMDMathFunctions: is_supported, is_fast, fast_functions
using BenchmarkTools
# Scalar reference loop
function exp!(xs::Vector{T}, ys::Vector{T}) where {T}
    @inbounds for i in eachindex(xs,ys)
        xs[i] = @fastmath exp(ys[i])
    end
end
# Explicitly vectorized loop, processing N elements per iteration
function exp!(xs::Vector{T}, ys::Vector{T}, ::Val{N}) where {N, T}
    @assert length(ys) == length(xs)
    @assert length(xs) % N == 0
    @assert is_supported(@fastmath exp)
    @inbounds for istart in 1:N:length(xs)
        i = VecRange{N}(istart)
        xs[i] = @fastmath exp(ys[i])
    end
end
y=randn(Float32, 1024*1024); x=similar(y);
@benchmark exp!($x, $y)
@benchmark exp!($x, $y, Val(8))
@benchmark exp!($x, $y, Val(16))
@benchmark exp!($x, $y, Val(32))
is_fast(exp)
unary_funs = fast_functions(1)
binary_funs = fast_functions(2)
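The Val{N} method of exp! above requires the array length to be a multiple of N. As a minimal sketch of one way to lift that restriction, the variant below handles the leftover elements with a scalar loop. The name exp_tail! is hypothetical and not part of SIMDMathFunctions; the sketch assumes that loading SIMDMathFunctions is enough to make @fastmath exp work on SIMD.Vec arguments, as in the example above.

```julia
using SIMD: VecRange
using SIMDMathFunctions  # assumed to provide exp methods for SIMD.Vec

# Hypothetical variant of exp! that accepts any array length.
function exp_tail!(xs::Vector{T}, ys::Vector{T}, ::Val{N}) where {N, T}
    @assert length(ys) == length(xs)
    n = length(xs)
    nvec = n - n % N   # largest multiple of N not exceeding n
    # Vectorized part: N elements per iteration
    @inbounds for istart in 1:N:nvec
        i = VecRange{N}(istart)
        xs[i] = @fastmath exp(ys[i])
    end
    # Scalar tail: the remaining n % N elements
    @inbounds for i in nvec+1:n
        xs[i] = @fastmath exp(ys[i])
    end
    return xs
end

y = randn(Float32, 1003); x = similar(y);
exp_tail!(x, y, Val(8))
```

The scalar tail costs at most N-1 scalar evaluations, which is negligible for large arrays.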
is_supported(fun) returns true if function fun supports SIMD.Vec arguments. Similarly, is_fast(fun) returns true if fun has an optimized implementation. fast_functions([ninputs]) returns a vector of the functions that benefit from a fast implementation, restricted to those accepting ninputs input arguments if ninputs is provided.
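As a small illustration of how these queries might be combined, the sketch below checks a few functions before applying them to a SIMD.Vec. The choice of exp, sin and abs is arbitrary, and no particular output is assumed: what each query returns depends on the installed version of the package.

```julia
using SIMD: Vec
using SIMDMathFunctions: is_supported, is_fast, fast_functions

x = Vec(0.1f0, 0.2f0, 0.3f0, 0.4f0)

# Arbitrary sample of functions; call each one only if it is reported as supported.
for fun in (exp, sin, abs)
    if is_supported(fun)
        println(fun, ": is_fast = ", is_fast(fun), ", value = ", fun(x))
    else
        println(fun, ": not reported as supported for SIMD.Vec")
    end
end

# Number of one- and two-argument functions reported as fast
println(length(fast_functions(1)), " unary, ", length(fast_functions(2)), " binary")
```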
SIMDMathFunctions also provides a helper function vmap to vectorize not-yet-supported mathematical functions. For example:
using SIMD: Vec
import SIMDMathFunctions: vmap
import SpecialFunctions: erf
erf(x::Vec) = vmap(erf, x)
erf(x::Vec, y::Vec) = vmap(erf, x, y)
erf(x::Vec{N,T}, y::T) where {N,T} = vmap(erf, x, y)
x = Vec(randn(Float32, 16)...)
@benchmark erf($x)
The default vmap method simply calls erf on each element of x. There is no performance benefit, but it allows generic code to use erf. If an implementation optimized for vector inputs is available (erf_SIMD below), you can provide a specialized method for vmap:
using VectorizationBase: verf # vectorized implementation
using SIMDMathFunctions: SIMDVec, VBVec # VectorizationBase <=> SIMD conversion
erf_SIMD(x) = SIMDVec(verf(VBVec(x)))
vmap(::typeof(erf), x) = erf_SIMD(x)
@benchmark erf($x)
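To make the element-wise fallback described above concrete, here is a rough sketch of what applying a scalar function to each lane of a SIMD.Vec can look like. The helper elementwise_map is hypothetical and is not the package's actual vmap implementation; it only illustrates the idea and assumes f returns the same scalar type for every lane.

```julia
using SIMD: Vec

# Hypothetical helper (not SIMDMathFunctions.vmap): apply f lane by lane
# and pack the results into a new Vec. f must return the same scalar
# type for every lane so that the results form a homogeneous tuple.
function elementwise_map(f, x::Vec{N}) where {N}
    return Vec(ntuple(i -> f(x[i]), Val(N)))
end

x = Vec(-1.0f0, 2.0f0, -3.0f0, 4.0f0)
y = elementwise_map(abs, x)
```

Since each lane is processed by a scalar call, such a fallback brings convenience rather than speed, which is why a specialized vmap method is worth providing when a vectorized implementation exists.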