EMpht.jl
A Julia port of the EMpht.c program, used for fitting Phase-Type distributions via an EM algorithm.
The original C — which is available on Søren Asmussen's website — is well documented and has a decent performance for phase-type distributions with a small or moderate number of phases. However it is quite slow for when the number of phases is large (>= 20), and the UX is very old-school Unix.
This port is much simpler and faster. See the examples given below and the test cases.
Examples
To fit a phase-type distribution to some data:
using Distributions
using EMpht
data = rand(Exponential(1/10), 1_000) # Generate some data to fit
sample = EMpht.Sample(obs=data) # Create an EMpht Sample object with this data
ph = empht(sample, p=5) # Fit the data using p=5 phases
xGrid = range(0, 8, length=1_00) # Create a grid to evaluate the density function over
fitPDFs = pdf.(ph, xGrid) # The probability density function of the fitted phase-type
The default structure of the phase-type is "Coxian" (see below for details).
For large values of p
the "CanonicalForm1" is recommended.
To impose no structure on the phase-type, use "General", though the results degrade quickly with p > 5
or so.
Another available option is "GeneralisedCoxian".
phGen = empht(sample, p=20, ph_structure="General")
phCox = empht(sample, p=20, ph_structure="Coxian")
phCF1 = empht(sample, p=50, ph_structure="CanonicalForm1")
If the data is not fully observed, i.e. the data is binned (interval-censored), then the Sample object is updated like so:
# The intervals
int = [1.5 2.0; 2.0 2.5; 2.5 3.0; 3.0 3.5; 3.5 4.0; 4.0 4.5; 4.5 5.0; 5.0 5.5;
5.5 6.0; 6.0 6.5; 6.5 7.0; 7.0 7.5]
# The number of observations falling into each interval
intweight = [4.0, 34.0, 107.0, 170.0, 202.0, 222.0, 140.0, 77.0, 24.0, 14.0,
4.0, 2.0]
# Create an EMpht Sample object with this data
sInt = EMpht.Sample(int=int, intweight=intweight)
# Fitting the interval-censored data
phCF1 = empht(sInt, p=100, ph_structure="CanonicalForm1")
xGrid = range(0, 8, length=1_000)
fitPDFs = pdf.(phCF1, xGrid)
To choose the algorithm used to fit the data (see papers below for details):
phunif = empht(sample, p=5, method=:unif) # Fit using the uniformization technique (default)
phode = empht(sample, p=5, method=:ode) # Fit using the more traditional ODE solving technique
EMpht.jl can read all necessary information from a JSON file (the number of phases to fit, the special structure of the phase-type, the sample to fit). For example, if you download the Coxian100.json file inside the test directory, the following will launch a fit based on those parameters:
ph100 = empht("Coxian100.json")
Resources
The relevant papers for the algorithms are:
- S. Asmussen, O. Nerman & M. Olsson, Fitting phase-type distribution via the EM algorithm, Scandinavian Journal of Statistics 23, 419-441 (1996),
- M. Olsson, Estimation of phase-type distributions from censored data, Scandinavian Journal of Statistics 23, 443-460 (1996).
- H. Okamura, T. Dohi, K.S. Trivedi, A refined EM algorithm for PH distributions, Performance Evaluation 68, 938-954 (2011)
- H. Okamura, T. Dohi, K.S. Trivedi, Improvement of expectation-maximization algorithm for phase-type distributions with grouped and truncated data, Appl. Stochastic Models Bus. Ind. 29, 141-156 (2013)
Some case studies using this package are:
- S. Asmussen, P.J. Laub, H. Yang, Phase-type models in life insurance: fitting and valuation of equity-linked benefits, Risks 7(1), 17 pages (2019)
- A. Vuorinen, The blockchain propagation process: a machine learning and matrix analytic approach, University of Melbourne Masters Thesis (2019), see website or thesis.