## Cumulants.jl

Multivariate cumulants of any order
Author iitis
Popularity
10 Stars
Updated Last
1 Year Ago
Started In
January 2017

# Cumulants.jl

Calculates cumulant tensors of any order for multivariate data. Functions return tensor or array of tensors in `SymmetricTensors` type. Requires SymmetricTensors.jl. To convert to array, run:

`julia> Array(data::SymmetricTensors{T, N})`

As of 01/01/2017 kdomino is the lead maintainer of this package.

## Installation

Within Julia, run

`pkg> add Cumulants`

to install the files. Julia 1.0 or later is required.

## Functions

### Moment

`julia> moment(data::Matrix{T}, m::Int, b::Int = 2) where T<: AbstractFloat`

Returns a `SymmetricTensor{T, m}` of the moment of order `m` of multivariate data represented by a `t` by `n` matrix, i.e. data with `n` marginal variables and `t` realisations. The argument `b` with defalt value `2`, is an optional `Int` that determines the size of the blocks in `SymmetricTensors` type.

```julia> data = reshape(collect(1.:15.),(5,3))
5×3 Array{Float64,2}:
1.0   6.0  11.0
2.0   7.0  12.0
3.0   8.0  13.0
4.0   9.0  14.0
5.0  10.0  15.0```
```julia> m = moment(data, 3)
SymmetricTensor{Float64,3}(Union{Nothing, Array{Float64,3}}[[45.0 100.0; 100.0 230.0]

[100.0 230.0; 230.0 560.0] nothing; nothing nothing]

Union{Nothing, Array{Float64,3}}[[155.0 360.0; 360.0 890.0] [565.0; 1420.0]; nothing [2275.0]], 2, 2, 3, false)```

To convert to array use:

```julia> Array(m)
3×3×3 Array{Float64,3}:
[:, :, 1] =
45.0  100.0  155.0
100.0  230.0  360.0
155.0  360.0  565.0

[:, :, 2] =
100.0  230.0   360.0
230.0  560.0   890.0
360.0  890.0  1420.0

[:, :, 3] =
155.0   360.0   565.0
360.0   890.0  1420.0
565.0  1420.0  2275.0```

### Cumulants

`julia> cumulants(data::Matrix{T}, m::Int = 4, b::Int = 2) where T<: AbstractFloat`

Returns a vector of `SymmetricTensor{T, i}` `i = 1,2,3,...,m` of cumulants of order `1,2,3,...,m`. Cumulants are calculated for multivariate data represented by matrix of size `t` by `n`, i.e. data with `n` marginal variables and `t` realisations.

```julia> c = cumulants(data, 3);

julia> c[2]
SymmetricTensor{Float64,2}(Union{Nothing, Array{Float64,2}}[[2.0 2.0; 2.0 2.0] [2.0; 2.0]; nothing [2.0]], 2, 2, 3, false)

julia> c[3]
SymmetricTensor{Float64,3}(Union{Nothing, Array{Float64,3}}[[0.0 0.0; 0.0 0.0]

[0.0 0.0; 0.0 0.0] nothing; nothing nothing]

Union{Nothing, Array{Float64,3}}[[0.0 0.0; 0.0 0.0] [0.0; 0.0]; nothing [0.0]], 2, 2, 3, false)
```

To convert to array:

```julia> Array(c[2])
3×3 Array{Float64,2}:
2.0  2.0  2.0
2.0  2.0  2.0
2.0  2.0  2.0

julia> Array(c[3])
3×3×3 Array{Float64,3}:
[:, :, 1] =
0.0  0.0  0.0
0.0  0.0  0.0
0.0  0.0  0.0

[:, :, 2] =
0.0  0.0  0.0
0.0  0.0  0.0
0.0  0.0  0.0

[:, :, 3] =
0.0  0.0  0.0
0.0  0.0  0.0
0.0  0.0  0.0```

#### Block size

The argument `b` with default value `2`, is an optional `Int` that determines a size of blocks in `SymmetricTensors` type. This block size `b` is the parameter that affect the algorithm performance, for most cases the performance is optimal for `b = 2, 3`. The block size must fulfil `0 < b ≦ size(data, 2)` otherwise error will be raised. For the performance analysis for various bolck sizes see `Section 5.2.1` in Krzysztof Domino, Piotr Gawron, Łukasz Pawela, Efficient Computation of Higher-Order Cumulant Tensors, SIAM J. Sci. Comput. 40, A1590 (2018) , https://arxiv.org/abs/1701.05420. For benchmarking one can also use `benchmarks/comptimeblocks.jl`

The purpose of this package is to compute moments and cumulants for multivariate data. It works for univariate data `X` structured in the form of matrix with `size(X, 2) = 1` if taking `b=1`. Such univariate application is not efficient however.

```julia> X = [1., 2., 3., 4.];

julia> X = reshape(X, (4,1));

julia> c = cumulants(X,4,1);

julia> map(x -> Array(x)[1], c)
4-element Array{Float64,1}:
2.5
1.25
0.0
-2.125```

We do not suply exact univariate fisher's k-statistics.

#### Parallel computation

Parallel computation is efficient for large number of data realisations, e.g. `t = 1000000`. For parallel computation just run

```julia> addprocs(n)
julia> @everywhere using Cumulants```

Naive algorithms of moment and cumulant tensors calculations are also available.

`julia> naivemoment(data::Matrix{T}, m::Int = 4) where T<: AbstractFloat`

Returns array{T, m} of the m'th moment of data. calculated using a naive algorithm.

```julia> naivemoment(data, 3)
3×3×3 Array{Float64,3}:
[:, :, 1] =
45.0  100.0  155.0
100.0  230.0  360.0
155.0  360.0  565.0

[:, :, 2] =
100.0  230.0   360.0
230.0  560.0   890.0
360.0  890.0  1420.0

[:, :, 3] =
155.0   360.0   565.0
360.0   890.0  1420.0
565.0  1420.0  2275.0```
`julia> naivecumulant(data::Matrix{T}, m::Int = 4) where T<: AbstractFloat`

Returns `Array{T, m}` of the `m`'th cumulant of data, calculated using a naive algorithm. Works for `1 <= m < 7`, for `m >= 7` throws exception.

```julia> naivecumulant(data, 2)
3×3 Array{Float64,2}:
2.0  2.0  2.0
2.0  2.0  2.0
2.0  2.0  2.0```
```julia> naivecumulant(data, 3)
3×3×3 Array{Float64,3}:
[:, :, 1] =
0.0  0.0  0.0
0.0  0.0  0.0
0.0  0.0  0.0

[:, :, 2] =
0.0  0.0  0.0
0.0  0.0  0.0
0.0  0.0  0.0

[:, :, 3] =
0.0  0.0  0.0
0.0  0.0  0.0
0.0  0.0  0.0```

# Performance analysis, folder `benchmarks`

To analyse the computational time of cumulants vs naivecumulants and moment vs naivemoment, we supply the executable script `comptimes.jl`. This script returns to a .jld file computational times, given following parameters:

• `-m (Int)`: cumulant's order, by default `m = 4`,
• `-n (vararg Int)`: numbers of marginal variables, by default `m = 20 24 28`,
• `-t (vararg Int)`: number of realisations of random variable, by defalut `t = 10000`. Be careful while using `n`>`4` and large `m`, where naive algorithms might need a large computational time and memory usage. Naive algorithms does not use the block structures, hence they computes and stores a whole cumulant tensor regardless its symmetry. All comparisons performed by this script use one core.

To analyse the computational time of cumulants for different block sizes `1 =< b =< Int(sqrt(n))`, we supply the executable script `comptimeblocks.jl`. This script returns to a .jld file computational times, given following parameters:

• `-m (Int)`: cumulant's order, by default `m = 4`,
• `-n (Int)`: numbers of marginal variables, by default `m = 48`,
• `-t (vararg Int)`: number of realisations of random variable, by default `t = 10000 20000`. Computational times and parameters are saved in the .jld file in /res directory. All comparisons performed by this script use one core.

To analyse the computational time of moment on different numbers of processes, we supply the executable script `comptimeprocs.jl`. This script returns to a .jld file computational times, given following parameters:

• `-m (Int)`: moment's order, by default `m = 4`,
• `-n (Int)`: numbers of marginal variables, by default `m = 50`,
• `-t (Int)`: number of realisations of random variable, by default `t = 100000`,
• `-p (Int)`: maximal number of processes, by default `p = 4`,
• `-b (Int)`: blocks size, by default `b = 2`.

All result files are saved in /res directory. To plot a graph run /res/plotcomptimes.jl followed by a `*.jld` file name

For the computational example on data use the following.

The script `gandata.jl` generates `t = 150000000` realisations of `n = 4` dimensional data form the `t`-multivariate distribution with `ν = 14` degrees of freedom, and theoretical super-diagonal elements of those cumulants. Results are saved in `data/datafortests.jld`

The script `testondata.jl` computes cumulant tensors of order `m = 1 - 6` for `data/datafortests.jld`, results are saved in `data/cumulants.jld`.

To read `cumulants.jld` please run

```julia> using JLD

julia> using SymmetricTensors

To plot super-diagonal elements of those cumulants and their theoretical values from t-student distribution pleas run `plotsuperdiag.jl`