# COSMA.jl: communication-optimal matrix-matrix multiplication for DistributedArrays.jl over MPI
COSMA.jl provides wrappers for [eth-cscs/COSMA](https://github.com/eth-cscs/COSMA) to do communication-optimal matrix-matrix multiplication for `DArray`s of element types `Float32`, `Float64`, `ComplexF32`, and `ComplexF64`.
Install via the package manager:
```julia
using Pkg
Pkg.add("COSMA")
```
A typical prerequisite is to use MPIClusterManagers to set up some MPI ranks and to load the package everywhere:
```julia
using MPIClusterManagers, DistributedArrays, Distributed

manager = MPIManager(np = 6)
addprocs(manager)

@everywhere using COSMA

# Just on the host we have to configure the mapping of Julia's pids to MPI ranks
# (hopefully this can be removed in a later release)
COSMA.use_manager(manager)
```
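If you want to confirm that each Julia worker is indeed backed by an MPI rank, here is a minimal sketch using MPIClusterManagers' `@mpi_do`; this check is only an illustration and is not required by COSMA.jl:

```julia
# Sketch: run code on every MPI rank and print its rank id.
@mpi_do manager begin
    using MPI
    println("Hello from MPI rank ", MPI.Comm_rank(MPI.COMM_WORLD))
end
```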
Next, create some distributed matrices and multiply them:
```julia
using LinearAlgebra

# Float64 matrices, automatically distributed over the MPI ranks
A = drand(100, 100)
B = drand(100, 100)

# Use DistributedArrays to allocate the new matrix C and multiply using COSMA
C = A * B

# Or allocate your own distributed target matrix C:
A_complex = drand(ComplexF32, 100, 100)
B_complex = drand(ComplexF32, 100, 100)
C_complex = dzeros(ComplexF32, 100, 100)
mul!(C_complex, A_complex, B_complex)
```
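As a quick sanity check, you can gather the distributed operands to the host and compare against a local multiplication. This snippet is a hedged illustration using standard DistributedArrays conversions, not part of COSMA.jl's API:

```julia
# Collect the DArrays into ordinary Arrays on the host and verify the products.
Array(C) ≈ Array(A) * Array(B)                          # true
Array(C_complex) ≈ Array(A_complex) * Array(B_complex)  # true
```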
## Using a custom MPI implementation
COSMA.jl depends on MPI.jl, which ships MPICH as its default MPI library. If you need a system-specific MPI implementation, see the instructions in the MPI.jl documentation.
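For recent versions of MPI.jl (0.20 and later), switching to a system MPI goes through the MPIPreferences package. The following is a sketch of that workflow; check the MPI.jl documentation for the version you actually use:

```julia
# Sketch: make MPI.jl use the system MPI library instead of the bundled MPICH.
using Pkg
Pkg.add("MPIPreferences")

using MPIPreferences
MPIPreferences.use_system_binary()  # records the system MPI in LocalPreferences.toml
```

After restarting Julia, MPI.jl (and hence COSMA.jl) will load the system library.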
## Notes about Julia's DArray type
COSMA fully supports Julia's `DArray` matrix distribution and is in fact more general: a `DArray` stores only a single local block per MPI rank, whereas COSMA supports an arbitrary number of blocks per rank.
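To make the one-block-per-rank layout concrete, here is a hedged illustration using plain DistributedArrays; the explicit `[2, 3]` process grid over six workers is an example choice, not a COSMA requirement:

```julia
using Distributed, DistributedArrays
@everywhere using DistributedArrays

# Distribute a 100×100 matrix over a 2×3 process grid: each of the six
# workers owns exactly one contiguous block.
D = dzeros((100, 100), workers()[1:6], [2, 3])

# The first worker's single local block, e.g. 50×34.
fetch(@spawnat workers()[1] size(localpart(D)))
```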