Compositional data analysis in Julia
Started in December 2017

This package is inspired by the R compositions package for compositional data analysis. Currently, only a subset of its features is implemented. Contributions are very welcome.

CoDa.jl defines a Composition{D} type representing a D-part composition as defined by Aitchison (1986). In Aitchison's geometry, the D-simplex together with addition (a.k.a. perturbation) and scalar multiplication (a.k.a. scaling) forms a vector space, and important properties hold:

  • Scaling invariance
  • Perturbation invariance
  • Permutation invariance
  • Subcompositional coherence

In practice, this means that one can operate on compositional data (i.e. vectors whose entries represent parts of a total) without destroying the ratios of the parts.
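To make the vector-space operations concrete, here is a minimal plain-Julia sketch that does not use CoDa.jl at all. The helper names `closure`, `perturb`, and `power` are assumptions chosen for illustration and are not taken from the package; they follow the textbook definitions of the simplex operations on vectors of positive parts.

```julia
using LinearAlgebra  # provides ≈ (isapprox) for arrays

# Hypothetical helpers for illustration only; not CoDa.jl's implementation.

# Closure: rescale positive parts so that they sum to one.
closure(x) = x ./ sum(x)

# Perturbation (the "addition" of the simplex): part-wise product, then closure.
perturb(x, y) = closure(x .* y)

# Scalar multiplication on the simplex: part-wise power, then closure.
power(λ, x) = closure(x .^ λ)

x = [1.0, 0.1, 0.1]
y = [2.0, 0.1, 0.3]

# Scaling invariance: multiplying every part by a constant gives the same composition.
scale_invariant = closure(x) ≈ closure(10 .* x)

# Ratios between parts are unchanged by the closure.
ratio_preserved = (closure(x)[1] / closure(x)[2]) ≈ (x[1] / x[2])

# Perturbation commutes, and the uniform composition is its neutral element.
commutes = perturb(x, y) ≈ perturb(y, x)
neutral  = perturb(x, fill(1/3, 3)) ≈ closure(x)
```

All four checks evaluate to `true` for these inputs, which is what "operating without destroying the ratios" amounts to in practice.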


Get the latest stable release with Julia's package manager:

] add CoDa



Compositions are static vectors with named parts:

julia> using CoDa
julia> cₒ = Composition(CO₂=1.0, CH₄=0.1, N₂O=0.1)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   CH₄ ┤■■■■ 0.1                                  
   N₂O ┤■■■■ 0.1                                  
       └                                        ┘ 
julia> c = Composition(CO₂=2.0, CH₄=0.1, N₂O=0.3)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 2.0   
   CH₄ ┤■■ 0.1                                    
   N₂O ┤■■■■■ 0.3                                 
       └                                        ┘ 

Default names are added otherwise:

julia> c = Composition(1.0, 0.1, 0.1)
                     3-part composition
          ┌                                        ┐ 
   part-1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   part-2 ┤■■■■ 0.1                                  
   part-3 ┤■■■■ 0.1                                  
          └                                        ┘ 

and serve for internal compile-time checks.

Compositions can be added, subtracted, negated, and multiplied by scalars. Other operations are also defined including dot product, induced norm, and distance:

julia> -cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■ 0.047619047619047616                   
   CH₄ ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
   N₂O ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
       └                                        ┘ 
julia> 0.5c
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■ 0.6207690197922022   
   CH₄ ┤■■■■ 0.13880817265812764                  
   N₂O ┤■■■■■■■■ 0.24042280754967013              
       └                                        ┘ 
julia> c - cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■ 0.3333333333333333  
   CH₄ ┤■■■■■■■■■■■■ 0.16666666666666666          
   N₂O ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.5   
       └                                        ┘ 
julia> c ⋅ cₒ
julia> norm(c)
julia> distance(c, cₒ)
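For readers who want to see what the dot product, norm, and distance mean in Aitchison's geometry, the sketch below computes them through the centered log-ratio in plain Julia. The names `aitchison_dot`, `aitchison_norm`, and `aitchison_distance` are hypothetical and only illustrate the standard definitions; the package's own code may be organized differently.

```julia
using LinearAlgebra  # dot, norm

# Centered log-ratio: log of each part minus the mean of the logs.
function clr(x)
    logx = log.(x)
    return logx .- sum(logx) / length(logx)
end

# Hypothetical helper names; the standard Aitchison definitions via clr.
aitchison_dot(x, y)      = dot(clr(x), clr(y))
aitchison_norm(x)        = norm(clr(x))
aitchison_distance(x, y) = norm(clr(x) .- clr(y))

c0 = [1.0, 0.1, 0.1]
c  = [2.0, 0.1, 0.3]

d  = aitchison_dot(c, c0)
n  = aitchison_norm(c)
ds = aitchison_distance(c, c0)

# The distance does not change when one argument is rescaled,
# because clr removes any common factor.
rescaled_same = aitchison_distance(10 .* c, c0) ≈ ds
```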

More complex functions can be defined in terms of these operations. For example, the function below defines the composition line passing through cₒ in the direction of c:

julia> f(λ) = cₒ + λ*c
f (generic function with 1 method)
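As a rough illustration of why this is called a line, the plain-Julia sketch below evaluates the same expression with hypothetical helpers (`closure`, `perturb`, `power`, `clr`, and `line` are illustrative names, not package functions) and checks that the points are collinear in centered log-ratio coordinates.

```julia
using LinearAlgebra  # provides ≈ (isapprox) for arrays

# Hypothetical helpers for illustration only; not CoDa.jl's implementation.
closure(x)    = x ./ sum(x)
perturb(x, y) = closure(x .* y)
power(λ, x)   = closure(x .^ λ)

function clr(x)
    logx = log.(x)
    return logx .- sum(logx) / length(logx)
end

c0 = [1.0, 0.1, 0.1]
c  = [2.0, 0.1, 0.3]

# Same shape as f(λ) = cₒ + λ*c, written with the explicit simplex operations.
line(λ) = perturb(c0, power(λ, c))

# A few points along the line; each one is a closed composition.
points = [line(λ) for λ in -1:0.5:1]

# In clr coordinates the curve is an ordinary straight line.
straight = clr(line(0.7)) ≈ clr(c0) .+ 0.7 .* clr(c)
```

At λ = 0 the line passes through the closure of the first composition, and every other point is obtained by moving along the direction of the second.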

Finally, two compositions are considered equal when their closures are approximately equal:

julia> c == c
julia> c == cₒ
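The idea behind this comparison can be sketched in plain Julia as follows. The function name `same_composition` is an assumption made for this example, and the tolerance is simply the default of `isapprox`; the package may use a different one.

```julia
using LinearAlgebra  # provides ≈ (isapprox) for arrays

# Hypothetical helpers for illustration only.
closure(x) = x ./ sum(x)
same_composition(x, y) = closure(x) ≈ closure(y)

c0 = [1.0, 0.1, 0.1]
c  = [2.0, 0.1, 0.3]

same_composition(c, 10 .* c)  # true: rescaling does not change the composition
same_composition(c, c0)       # false: the relative proportions differ
```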


Currently, the following transformations are implemented:

julia> alr(c)
2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
julia> clr(c)
3-element StaticArrays.SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
julia> ilr(c) # TODO
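For orientation, the textbook definitions of the first two transformations can be written in a few lines of plain Julia. This is a sketch under stated assumptions: the last part is taken as the alr denominator (a common convention), and `clrinv` is a hypothetical name for the inverse transformation. None of this is copied from the package.

```julia
using LinearAlgebra  # provides ≈ (isapprox) for arrays

# Additive log-ratio: logs of the first D-1 parts relative to the last part
# (the choice of the last part as denominator is an assumption).
alr(x) = log.(x[1:end-1] ./ x[end])

# Centered log-ratio: log of each part minus the mean of the logs.
function clr(x)
    logx = log.(x)
    return logx .- sum(logx) / length(logx)
end

# Hypothetical inverse of clr: exponentiate and close to unit sum.
clrinv(z) = exp.(z) ./ sum(exp.(z))

c = [2.0, 0.1, 0.3]

a = alr(c)   # D-1 = 2 coordinates
z = clr(c)   # D = 3 coordinates that sum to zero

# Going to clr coordinates and back recovers the closed composition.
roundtrip = clrinv(z) ≈ c ./ sum(c)
```

The alr output has one coordinate fewer than the composition has parts, while the clr output keeps all D coordinates but constrains them to sum to zero, which matches the 2-element and 3-element results shown above.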


It is often useful to compose D columns of a table into D-part compositions. The package provides some utility functions for loading tabular data and for type conversion.

The function readcoda(args...; codanames=[], kwargs...) accepts the same arguments as the reading function from CSV.jl, plus an additional keyword argument codanames that specifies the columns holding the parts of the composition.

Similarly, the function compose(table, cols) takes an already loaded table and converts the specified columns into a single column with Composition objects.
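The following plain-Julia sketch illustrates the idea of folding several columns into one composition column. It is not the package's `compose`: the function `compose_rows`, the `site` column, and the output column name `coda` are all hypothetical, and the table is just a vector of named tuples so that the example needs no dependencies.

```julia
# Hypothetical helper for illustration only.
closure(x) = x ./ sum(x)

# A tiny row table: one label column plus three part columns.
rows = [
    (site = "A", CO₂ = 1.0, CH₄ = 0.1, N₂O = 0.1),
    (site = "B", CO₂ = 2.0, CH₄ = 0.1, N₂O = 0.3),
]

# Replace the columns listed in `parts` by one column holding the closed parts.
function compose_rows(rows, parts)
    map(rows) do row
        comp   = closure([row[p] for p in parts])
        others = filter(k -> k ∉ parts, keys(row))  # columns to keep as-is
        rest   = NamedTuple{others}(row)
        merge(rest, (coda = NamedTuple{parts}(Tuple(comp)),))
    end
end

table = compose_rows(rows, (:CO₂, :CH₄, :N₂O))
```

Each resulting row keeps its non-compositional columns and gains a single named-tuple entry whose values sum to one.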


The most practical reference by far is the book Analyzing Compositional Data With R by van den Boogaart K. G. et al. 2013. The book contains the examples that I reproduced in this README and is a good start for scientists who are seeing this material for the first time.

A more theoretical exposition can be found in the book Modeling and Analysis of Compositional Data by Pawlowsky-Glahn, V. et al. 2015. It contains detailed explanations of the concepts introduced by Aitchison in the 1980s, and is co-authored by important names in the field.
