Cumulants.jl

Multivariate cumulants of any order
Author iitis
Popularity
13 Stars
Updated Last
10 Months Ago
Started In
January 2017

Cumulants.jl

Coverage Status DOI

Calculates cumulant tensors of any order for multivariate data. Functions return tensor or array of tensors in SymmetricTensors type. Requires SymmetricTensors.jl. To convert to array, run:

julia> Array(data::SymmetricTensors{T, N})

As of 01/01/2017 kdomino is the lead maintainer of this package.

Installation

Within Julia, run

pkg> add Cumulants

to install the files. Julia 1.0 or later is required.

Functions

Moment

julia> moment(data::Matrix{T}, m::Int, b::Int = 2) where T<: AbstractFloat

Returns a SymmetricTensor{T, m} of the moment of order m of multivariate data represented by a t by n matrix, i.e. data with n marginal variables and t realisations. The argument b with defalt value 2, is an optional Int that determines the size of the blocks in SymmetricTensors type.

julia> data = reshape(collect(1.:15.),(5,3))
5×3 Array{Float64,2}:
 1.0   6.0  11.0
 2.0   7.0  12.0
 3.0   8.0  13.0
 4.0   9.0  14.0
 5.0  10.0  15.0
julia> m = moment(data, 3)
SymmetricTensor{Float64,3}(Union{Nothing, Array{Float64,3}}[[45.0 100.0; 100.0 230.0]

[100.0 230.0; 230.0 560.0] nothing; nothing nothing]

Union{Nothing, Array{Float64,3}}[[155.0 360.0; 360.0 890.0] [565.0; 1420.0]; nothing [2275.0]], 2, 2, 3, false)

To convert to array use:

julia> Array(m)
3×3×3 Array{Float64,3}:
[:, :, 1] =
  45.0  100.0  155.0
 100.0  230.0  360.0
 155.0  360.0  565.0

[:, :, 2] =
 100.0  230.0   360.0
 230.0  560.0   890.0
 360.0  890.0  1420.0

[:, :, 3] =
 155.0   360.0   565.0
 360.0   890.0  1420.0
 565.0  1420.0  2275.0

Cumulants

julia> cumulants(data::Matrix{T}, m::Int = 4, b::Int = 2) where T<: AbstractFloat

Returns a vector of SymmetricTensor{T, i} i = 1,2,3,...,m of cumulants of order 1,2,3,...,m. Cumulants are calculated for multivariate data represented by matrix of size t by n, i.e. data with n marginal variables and t realisations.

julia> c = cumulants(data, 3);

julia> c[2]
SymmetricTensor{Float64,2}(Union{Nothing, Array{Float64,2}}[[2.0 2.0; 2.0 2.0] [2.0; 2.0]; nothing [2.0]], 2, 2, 3, false)

julia> c[3]
SymmetricTensor{Float64,3}(Union{Nothing, Array{Float64,3}}[[0.0 0.0; 0.0 0.0]

[0.0 0.0; 0.0 0.0] nothing; nothing nothing]

Union{Nothing, Array{Float64,3}}[[0.0 0.0; 0.0 0.0] [0.0; 0.0]; nothing [0.0]], 2, 2, 3, false)

To convert to array:

julia> Array(c[2])
3×3 Array{Float64,2}:
 2.0  2.0  2.0
 2.0  2.0  2.0
 2.0  2.0  2.0

 julia> Array(c[3])
3×3×3 Array{Float64,3}:
[:, :, 1] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 2] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 3] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

Block size

The argument b with default value 2, is an optional Int that determines a size of blocks in SymmetricTensors type. This block size b is the parameter that affect the algorithm performance, for most cases the performance is optimal for b = 2, 3. The block size must fulfil 0 < b ≦ size(data, 2) otherwise error will be raised. For the performance analysis for various bolck sizes see Section 5.2.1 in Krzysztof Domino, Piotr Gawron, Łukasz Pawela, Efficient Computation of Higher-Order Cumulant Tensors, SIAM J. Sci. Comput. 40, A1590 (2018) DOI, https://arxiv.org/abs/1701.05420. For benchmarking one can also use benchmarks/comptimeblocks.jl

The purpose of this package is to compute moments and cumulants for multivariate data. It works for univariate data X structured in the form of matrix with size(X, 2) = 1 if taking b=1. Such univariate application is not efficient however.

julia> X = [1., 2., 3., 4.];

julia> X = reshape(X, (4,1));

julia> c = cumulants(X,4,1);

julia> map(x -> Array(x)[1], c)
4-element Array{Float64,1}:
  2.5
  1.25
  0.0
 -2.125

We do not suply exact univariate fisher's k-statistics.

Parallel computation

Parallel computation is efficient for large number of data realisations, e.g. t = 1000000. For parallel computation just run

julia> addprocs(n)
julia> @everywhere using Cumulants

Naive algorithms of moment and cumulant tensors calculations are also available.

julia> naivemoment(data::Matrix{T}, m::Int = 4) where T<: AbstractFloat

Returns array{T, m} of the m'th moment of data. calculated using a naive algorithm.

julia> naivemoment(data, 3)
3×3×3 Array{Float64,3}:
[:, :, 1] =
  45.0  100.0  155.0
 100.0  230.0  360.0
 155.0  360.0  565.0

[:, :, 2] =
 100.0  230.0   360.0
 230.0  560.0   890.0
 360.0  890.0  1420.0

[:, :, 3] =
 155.0   360.0   565.0
 360.0   890.0  1420.0
 565.0  1420.0  2275.0
julia> naivecumulant(data::Matrix{T}, m::Int = 4) where T<: AbstractFloat

Returns Array{T, m} of the m'th cumulant of data, calculated using a naive algorithm. Works for 1 <= m < 7, for m >= 7 throws exception.

julia> naivecumulant(data, 2)
3×3 Array{Float64,2}:
 2.0  2.0  2.0
 2.0  2.0  2.0
 2.0  2.0  2.0
julia> naivecumulant(data, 3)
3×3×3 Array{Float64,3}:
[:, :, 1] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 2] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 3] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

Performance analysis, folder benchmarks

To analyse the computational time of cumulants vs naivecumulants and moment vs naivemoment, we supply the executable script comptimes.jl. This script returns to a .jld file computational times, given following parameters:

  • -m (Int): cumulant's order, by default m = 4,
  • -n (vararg Int): numbers of marginal variables, by default m = 20 24 28,
  • -t (vararg Int): number of realisations of random variable, by defalut t = 10000. Be careful while using n>4 and large m, where naive algorithms might need a large computational time and memory usage. Naive algorithms does not use the block structures, hence they computes and stores a whole cumulant tensor regardless its symmetry. All comparisons performed by this script use one core.

To analyse the computational time of cumulants for different block sizes 1 =< b =< Int(sqrt(n)), we supply the executable script comptimeblocks.jl. This script returns to a .jld file computational times, given following parameters:

  • -m (Int): cumulant's order, by default m = 4,
  • -n (Int): numbers of marginal variables, by default m = 48,
  • -t (vararg Int): number of realisations of random variable, by default t = 10000 20000. Computational times and parameters are saved in the .jld file in /res directory. All comparisons performed by this script use one core.

To analyse the computational time of moment on different numbers of processes, we supply the executable script comptimeprocs.jl. This script returns to a .jld file computational times, given following parameters:

  • -m (Int): moment's order, by default m = 4,
  • -n (Int): numbers of marginal variables, by default m = 50,
  • -t (Int): number of realisations of random variable, by default t = 100000,
  • -p (Int): maximal number of processes, by default p = 4,
  • -b (Int): blocks size, by default b = 2.

All result files are saved in /res directory. To plot a graph run /res/plotcomptimes.jl followed by a *.jld file name

For the computational example on data use the following.

The script gandata.jl generates t = 150000000 realisations of n = 4 dimensional data form the t-multivariate distribution with ν = 14 degrees of freedom, and theoretical super-diagonal elements of those cumulants. Results are saved in data/datafortests.jld

The script testondata.jl computes cumulant tensors of order m = 1 - 6 for data/datafortests.jld, results are saved in data/cumulants.jld.

To read cumulants.jld please run

julia> using JLD

julia> using SymmetricTensors

julia> load("cumulants.jld")

To plot super-diagonal elements of those cumulants and their theoretical values from t-student distribution pleas run plotsuperdiag.jl

Citing this work

Krzysztof Domino, Piotr Gawron, Łukasz Pawela, Efficient Computation of Higher-Order Cumulant Tensors, SIAM J. Sci. Comput. 40, A1590 (2018) DOI, https://arxiv.org/abs/1701.05420

This project was partially financed by the National Science Centre, Poland – project number 2014/15/B/ST6/05204.