Survival analysis in Julia
Author piever
5 Stars
Updated Last
1 Year Ago
Started In
January 2017


Build Status

Coverage Status

Important notice

I am porting the functionality of this repository into Survival. This package should still be usable (under the name AcceleratedFailure, to avoid name conflicts): I will officially deprecate it when all the functionality has been ported.

Setting up

To install the package, simply run


in the Julia REPL.

Basic usage:

Load relevant packages:

using AcceleratedFailure
using DataFrames

Load dataset and create event column. Event.time is time, whereas Event.censored is true if the data is censored and false otherwise.

filepath = joinpath(Pkg.dir("AcceleratedFailure", "examples"), "rossi.csv")
rossi = readtable(filepath)
rossi[:event] = Event.(rossi[:week], rossi[:arrest].== 0)

Run Cox regression

outcome = coxph(@formula(event ~ fin+age+race+wexp+mar+paro+prio),rossi)

And you should get this outcome (computed with Efron method for ties):

Model: Cox; Formula: event ~ fin + age + race + wexp + mar + paro + prio

        Estimate Std.Error   z value Pr(>|z|)
fin    -0.378636  0.191352  -1.97874   0.0478
age   -0.0570578 0.0219674  -2.59738   0.0094
race    0.314349  0.308045   1.02046   0.3075
wexp   -0.152399  0.212026 -0.718774   0.4723
mar    -0.431883  0.381726   -1.1314   0.2579
paro  -0.0850866  0.195743 -0.434686   0.6638
prio   0.0907363 0.0286157   3.17086   0.0015

This package also includes Kaplan Meier estimator of the survival function, which takes a vector of events as input:

x,surv = kaplan_meier(rossi[:event])

In the output x will be the array with the times of death, surv the estimated cdf.

To visualize it (using Plots.jl) you can simply type:

using Plots
plot(x,surv, line = :step)

Nelson Aalen estimator for the cumulative hazard:

x, chaz = nelson_aalen(rossi[:event])

To check that everything went well, you may want to verify that cumulative = 1 - exp.(-chaz)

plot(x,surv, line = :step)
plot!(x,exp.(-chaz), line = :step)

You can also get the baseline cumulative hazard from the outcome of a Cox regression:

x, chaz = nelson_aalen(outcome)

In turn, it's always possible to get the cdf from the cumulative hazard:


and the same is true for the hazard:

chaz2haz(x,chaz, bw)

where, in the second case, bw is a parameter used for smoothing.

Accelerated Failure Time models

This package also supports accelerated failure time models.

using Distributions
using DataFrames
using AcceleratedFailure

Let's generate a fake dataset:

N = 50000
x = randn(N)
y = randn(N)
z = randn(N)
t1 = rand.(Gamma.(10,exp.(x-0.3y)))
t2 = rand(Gamma(15,1),N)
W = [(t2[i]>t1[i]) ? Event(t1[i], false) : Event(t2[i], true) for i = 1:N]
df = DataFrame(x = x, y = y, z = z, a = W)

Let's specify the formula and distribution (only PGamma is implemented so far, where PGamma(params) = Gamma(exp(params[1]),exp(-params[1]))):

res = aft(@formula(a ~ 1 + x +y + x*y+ z), df, PGamma(); tol = 1e-3)

The outcome should look something like this:

Model: Accelerated Failure Time, dist = AcceleratedFailure.PGamma(params=[2.30856]);
Formula: a ~ 1 + x + y + x * y + z

                 Estimate  Std.Error   z value Pr(>|z|)
params1           2.30856 0.00754425   306.003   <1e-99
(Intercept)       2.30315 0.00205682   1119.76   <1e-99
x                0.999012 0.00226804   440.474   <1e-99
y               -0.295107 0.00200584  -147.124   <1e-99
z            -0.000542461 0.00168103 -0.322695   0.7469
x & y         -0.00222899 0.00215507   -1.0343   0.3010

The first coefficient is the parameter of the distribution (in this case the parameter of PGamma is the log of the shape paramete of the corresponding Gamma distribution). The remaining coefficients are the log of the contribution to the mean of the various covariates.

Using DataFrames

nelson_aalen, kaplan_meier, Cox regression and accelerated failure time models can be used with Nullable Arrays. The rows missing relevant data will be discarded.