TriangularSolve.jl

Rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)
Author JuliaSIMD
Popularity
12 Stars
Updated Last
1 Year Ago
Started In
July 2021

TriangularSolve

Stable Dev Build Status Coverage

Performs some triangular solves. For example:

julia> using TriangularSolve, LinearAlgebra, MKL;

julia> BLAS.set_num_threads(1)

julia> BLAS.get_config().loaded_libs
1-element Vector{LinearAlgebra.BLAS.LBTLibraryInfo}:
 LBTLibraryInfo(libmkl_rt.so, ilp64)

julia> N = 100;

julia> A = rand(N,N); B = rand(N,N); C = similar(A);

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  15.909 μs …  41.524 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     17.916 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.751 μs ± 697.786 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▁    ▁    ▁     ▄▁    ▇▆    ▆█▃                             ▂
  ██▃▁▁██▁▁▁▁█▆▁▁▃▇██▄▃▁███▆▁▄▄███▄▄▅▅▆▇█▇▄▅▆▇██▇█▇▇▆▄▅▄▁▄▁▄▄▇ █
  15.9 μs       Histogram: log(frequency) by time      19.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  17.578 μs … 75.835 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     19.852 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.827 μs ±  1.342 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▂              ▂    ▆▅   ▁█▇▂   ▅▃     ▂                   ▂
  ██▁▁▃█▇▁▁▁█▇▄▄▁██▇▄▄▄██▆▅▄████▅▄▆██▆▆▆▆▇██▇▇▆▆▇▆▅▆▄▅▅▆▄▅▄▅▅ █
  17.6 μs      Histogram: log(frequency) by time      22.4 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  19.102 μs …  69.966 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     21.561 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.565 μs ± 890.952 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▂                 ▂▃     ▄▄     ▆█▄     ▅▅                  ▂
  ██▃▁▁▁▇█▁▁▁▁▅█▁▁▁▁▁██▅▁▁▁▅██▆▁▁▁▆███▆▅▃▅████▃▄▅██▇▇▅▆▆▇▇█▇▆▆ █
  19.1 μs       Histogram: log(frequency) by time      23.4 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  19.082 μs …  39.078 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     19.694 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.765 μs ± 774.848 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃        ▄█         ▁
  ▂▇██▄▂▁▁▂▂▃███▃▂▁▂▁▂▂▅█▇▃▂▂▂▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂ ▃
  19.1 μs         Histogram: frequency by time         22.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Multithreaded benchmarks:

julia> BLAS.set_num_threads(min(Threads.nthreads(), TriangularSolve.VectorizationBase.num_cores()))

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):  8.309 μs …  24.357 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     8.769 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.812 μs ± 382.702 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▁▃▄▆▆██▇▆▅▃▁
  ▂▁▂▂▂▂▃▃▃▄▅▇██████████████▇▆▅▄▃▃▃▃▃▂▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▄
  8.31 μs         Histogram: frequency by time         9.7 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.996 μs … 151.147 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     14.163 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.281 μs ±   2.372 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▂▄▇███▇▆▅▃▂ ▁   ▂▄▄▅▅▅▆▃▃         ▁
  ▁▁▁▂▂▃▄▇██████████████████████████▇▆▅▄▅▆▇███▆▅▅▃▄▂▂▂▁▁▁▁▁▁▁▁ ▅
  12 μs           Histogram: frequency by time         17.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  7.903 μs …  22.442 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     9.871 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.789 μs ± 864.957 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▃  ▄▃  ▃▅   ▅▃   ▆▂   ▆▄   ▂▇▄   ▃█▅▂▂▁▁▄▆▃▁ ▁             ▂
  ██▅▂██▆▅██▆▆▆██▇▇███▇▇▇████▇█████▆██████████████▇███▇▇▆▇▆▅▆ █
  7.9 μs       Histogram: log(frequency) by time      11.8 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  13.507 μs … 142.574 μs  ┊ GC (min … max): 0.00%0.00%
 Time  (median):     15.258 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.319 μs ±   2.045 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▃   ▁▂   ▁▃▅▁  ▁▄▄▁  ▂▆█▆▃
  ▁▂▅███▆▇███▆▅████▆▅████▆▆█████▆▄▄▆▆▅▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁ ▄
  13.5 μs         Histogram: frequency by time         18.5 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.8.0-DEV.438
Commit 88a6376e99* (2021-08-28 11:03 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
Environment:
  JULIA_NUM_THREADS = 8

For editing convenience (you can copy/paste the above into a REPL and it should automatically strip julia> s and outputs, but the above is less convenient to edit if you want to try changing the benchmarks):

using TriangularSolve, LinearAlgebra, MKL;
BLAS.set_num_threads(Threads.nthreads())
BLAS.get_config().loaded_libs
N = 100;

A = rand(N,N); B = rand(N,N); C = similar(A);

@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))

@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false))
@benchmark ldiv!($C, LowerTriangular($B), $A)

BLAS.set_num_threads(TriangularSolve.VectorizationBase.num_cores())
@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))

@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
@benchmark ldiv!($C, LowerTriangular($B), $A)

versioninfo()

Currently, rdiv! with UpperTriangular and ldiv! with LowerTriangulra matrices are the only supported configurations.