The cost of multiplying a chain of matrices can vary significantly (e.g., by 1-2 orders of magnitude in real problems) depending on the order in which the multiplication steps are applied. The Matrix chain multiplication algorithm applied here (described in Introduction to Algorithms, 3rd Edition by Cormen et al.) finds the optimal multiplication sequence.
With Julia, it is easy to allow for specialization on the matrix types.
For example, the types given in the ArrayFire.jl package or matrices of matrices (block matrices) should work, too. To be clear, the
*(x, y) that applies to a specific types of
y is automatically used with Julia's multiple-dispatch system after the operation order is optimized.
Installing this Package
Since this package is not registered, you must install it by cloning. To add this package, use:
Using this Package
The only function most users should need is
added to the scope by writing:
using MatrixChainMultiply using BenchmarkTools a = rand(1000,1000) b = rand(1000,100) c = rand(100, 500) d = rand(500) @benchmark result1 = matrixchainmultiply(a,b,c,d) @benchmark result2 = *(a,b,c,d)
julia> @benchmark result1 = matrixchainmultiply(a,b,c,d) BenchmarkTools.Trial: memory estimate: 18.45 KiB allocs estimate: 42 -------------- minimum time: 265.715 μs (0.00% GC) median time: 299.071 μs (0.00% GC) mean time: 329.042 μs (0.16% GC) maximum time: 4.162 ms (74.04% GC) -------------- samples: 10000 evals/sample: 1 julia> @benchmark result2 = *(a,b,c,d) BenchmarkTools.Trial: memory estimate: 4.59 MiB allocs estimate: 5 -------------- minimum time: 4.178 ms (0.00% GC) median time: 4.765 ms (0.00% GC) mean time: 5.281 ms (8.05% GC) maximum time: 11.310 ms (18.85% GC) -------------- samples: 944 evals/sample: 1
You can see the optimal ordering and the cost of the operation
Note that the optional keyword argument
printout=true prints out the
optimal multiplication order and the number of FLOPS in case you want
to investigate. It is
false by default.
When to use this
It is certainly possible to see speedups of one or two orders of magnitude through the use of the optimal operation order generated here.
If all matrices are the same size, the overhead of the optimization is not worth the (non-existing) benefit.
- Recognize situations that don't need the analysis to eliminate the
overhead (allowing for a more ubiquitous use of
- For repeated uses, given some matrix size sequence (so at run time), produce the function a single time at run time so that the analysis isn't done unnecessarily.
- Faster or more general algorithms (i.e., those that apply to the general tensor problem) are certainly desirable if you would like to add them.
It has come to my attention that another package implementing this is DynMultiply.jl.