MCAnalyzer.jl provides a interface to LLVM MCA for Julia functions.
MCAnalyzer.jl provides the two functions mark_start and mark_end both will insert some special markers into you code.
llvm-mca will then analyse the generated object file and only analyse the parts in between the two markers.
Currently supported analysis modes:
analyze(function, types)- Corresponds to a basic analysis with no specific analysis flags
timeline(function, types)- Corresponds to a timeline analysis with the
-timelineflag of llvm-mca
- Corresponds to a timeline analysis with the
bottleneck(function, types)- Corresponds to a bottleneck analysis with the
-bottleneck-analysisflag of llvm-mca
- Corresponds to a bottleneck analysis with the
The analysis is printed to stdout.
HSW: HaswellBDW: BroadwellSKL: SkylakeSKX: Skylake-X
By default analyse will use SKL, but you can supply a target architecture through analyze(func, tt, :SKX)
iaca 3.0 currently only supports throughput analysis. This means that currently it is only useful to analyze loops.
mark_start() has to be in the beginning of the loop body and mark_end() has to be after the loop. iaca will then treat the loop as an infite loop.
It is recommended to use @code_llvm/@code_native to inspect the IR/assembly and check that the annotations are
in the expected place.
using MCAnalyzer
function mysum(A)
acc = zero(eltype(A))
for i in eachindex(A)
mark_start()
@inbounds acc += A[i]
end
mark_end()
return acc
end
analyze(mysum, (Vector{Float64},))using MCAnalyzer
function f(y::Float64)
x = 0.0
for i=1:100
mark_start()
x += 2*y*i
end
mark_end()
x
end
analyze(f, (Float64,))using MCAnalyzer
function g(y::Float64)
x1 = x2 = x3 = x4 = x5 = x6 = x7 = 0.0
for i=1:7:100
mark_start()
x1 += 2*y*i
x2 += 2*y*(i+1)
x3 += 2*y*(i+2)
x4 += 2*y*(i+3)
x5 += 2*y*(i+4)
x6 += 2*y*(i+5)
x7 += 2*y*(i+6)
end
mark_end()
x1 + x2 + x3 + x4 + x5 + x6 + x7
end
analyze(g, Tuple{Float64})