VectorizationBase

NOTE: Looking for new maintainers, otherwise deprecated in Julia 1.11

This is a library providing basic SIMD support in Julia. VectorizationBase exists in large part to serve the needs of LoopVectorization.jl's code gen, prioritizing this over a stable user-facing API. Thus, you may wish to consider SIMD.jl as an alternative when writing explicit SIMD code in Julia. That said, the Vec and VecUnroll types are meant to "just work" as much as possible when passed to user-defined functions, so it should be reasonably stable in practice. Other parts of the code -- e.g, loading and storing vectors as well as the stridedpointer function -- will hopefully converge reasonably soon, and have support for various AbstractArray types propogated through the ecosystem by taking advantage of ArrayInterface.jl, so that VectorizationBase can begin to offer a stable, ergonomic, and well supported API fairly soon.

It additionally provides some information on the host computer it is running on, which can be used to automate target-specific optimizations. Currently, x86_64 support is best on that front, but I'm looking to improve the quality of information provided for other architectures.

Vecs are Numbers and behave as a single objects; they just happen to contain multiple Float64. Therefore, it will behave like a single number rather than a collection with respect to indexing and reductions:

julia> using VectorizationBase

julia> vx = Vec(ntuple(_ -> 10randn(), VectorizationBase.pick_vector_width(Float64))...)
Vec{8,Float64}<14.424983437388981, -7.7378330531368045, -3.499708331670689, -3.358981392002452, 22.519898671389406, -13.08647686033593, 13.96943264299162, -9.518537139443254>

julia> vx[1]
Vec{8,Float64}<14.424983437388981, -7.7378330531368045, -3.499708331670689, -3.358981392002452, 22.519898671389406, -13.08647686033593, 13.96943264299162, -9.518537139443254>

julia> sum(vx)
Vec{8,Float64}<14.424983437388981, -7.7378330531368045, -3.499708331670689, -3.358981392002452, 22.519898671389406, -13.08647686033593, 13.96943264299162, -9.518537139443254>

julia> a = 1.2;

julia> a[1]
1.2

julia> sum(a)
1.2

To extract elements from a Vec, you call it, using parenthesis to index as you would in Fortran or MATLAB:

julia> vx(1), vx(2)
(14.424983437388981, -7.7378330531368045)

julia> ntuple(vx, Val(8))
(14.424983437388981, -7.7378330531368045, -3.499708331670689, -3.358981392002452, 22.519898671389406, -13.08647686033593, 13.96943264299162, -9.518537139443254)

julia> Tuple(vx) # defined for convenience
(14.424983437388981, -7.7378330531368045, -3.499708331670689, -3.358981392002452, 22.519898671389406, -13.08647686033593, 13.96943264299162, -9.518537139443254)

Unfortunately, this means no support for indexing with begin/end.

Reductions are like the ordinary version, but prefixed with v:

julia> using VectorizationBase: vsum, vprod, vmaximum, vminimum

julia> vsum(vx), sum(Tuple(vx))
(13.712777975180877, 13.712777975180877)

julia> vprod(vx), prod(Tuple(vx))
(-5.141765647043406e7, -5.141765647043406e7)

julia> vmaximum(vx), maximum(Tuple(vx))
(22.519898671389406, 22.519898671389406)

julia> vminimum(vx), minimum(Tuple(vx))
(-13.08647686033593, -13.08647686033593)

Here is an example of using vload:

julia> A = rand(8,8);

julia> vload(stridedpointer(A), (MM(W, 1), 1))
Vec{8, Float64}<0.23659378106523243, 0.1572296679962767, 0.4139998988982545, 0.4068544124895789, 0.6365683129363592, 0.10041731176364777, 0.6198701180649783, 0.18351031426464992>

julia> A[1:W,1]'
1×8 adjoint(::Vector{Float64}) with eltype Float64:
 0.236594  0.15723  0.414  0.406854  0.636568  0.100417  0.61987  0.18351

julia> vload(stridedpointer(A), (1, MM(W, 1)))
Vec{8, Float64}<0.23659378106523243, 0.43800087768259754, 0.5833216557209256, 0.8076063696863035, 0.12069215155721758, 0.6015627184700922, 0.1390837892914757, 0.9139206013822945>

julia> A[1,1:W]'
1×8 adjoint(::Vector{Float64}) with eltype Float64:
 0.236594  0.438001  0.583322  0.807606  0.120692  0.601563  0.139084  0.913921

julia> vload(stridedpointer(A), (MM(W,1), MM(W, 1)))
Vec{8, Float64}<0.23659378106523243, 0.7580627352162604, 0.044776171518136954, 0.218587536875811, 0.4596625543892163, 0.2933303822991349, 0.30481677678671315, 0.3595115888246907>

julia> getindex.(Ref(A), 1:W, 1:W)'
1×8 adjoint(::Vector{Float64}) with eltype Float64:
 0.236594  0.758063  0.0447762  0.218588  0.459663  0.29333  0.304817  0.359512

The basic idea is that you have a tuple of indices. The MM type indicates that it is vectorized. In the above example, we vectorize the load along colums, then rows, and then both. This is equivalent to loading the column, row, and diagonal. Note that you can pass a Mask argument to mask off extra loads/stores.