Fast and scalable non-parametric Bayesian inference for Poisson point processes

Poisson point processes are among the basic modelling tools in many areas. Their probabilistic properties are determined by their intensity function, the density *λ*.
This Julia package implements our non-parametric Bayesian approach to estimation of the intensity function *λ* for univariate Poisson point processes. For full details see our preprint

- S. Gugushvili, M. Schauer, F. van der Meulen, and P. Spreij. Fast and scalable non-parametric Bayesian inference for Poisson point processes.
**arXiv:1804.03616 [stat.ME]**, 2018.

Intuitively, a univariate Poisson point processes *X*, also called a non-homogeneous Poisson process, can be thought of as random
scattering of points in the time interval *[0,T]*, where the way the scattering occurs is determined by the intensity function *λ*.
An example is the ordinary Poisson process, for which the intensity *λ* is constant.

We infer the intensity function *λ* in a non-parametric fashion. The function *λ* is a priori modelled as piecewise constant. This is even more natural, if the data have been already binned,
as is often the case in, e.g., astronomy. Thus, fix a positive integer *N* and a grid *b* of points `b[1] == 0`

, `b[N] == T`

on the interval *[0,T]*, for instance a uniform grid.
The intensity *λ* is then modelled as
`λ(x) = ψ[k]`

for `b[k] <= x < b[k+1]`

.

Now we postulate that a priori the coefficients *ψ* form a Gamma Markov chain (GMC). As explained in our preprint, this prior induces smoothing across the coefficients *ψ*, and leads to conjugate posterior computations
via the Gibbs sampler. The data-generating intensity is not assumed to be necessarily piecewise constant. Our methodology provides both a point estimate of the intensity function (posterior mean) and uncertainty quantification via marginal credible bands; see the examples below.

Change julia into the package manager mode by hitting `]`

. Then run the command `add PointProcessInference`

.

```
pkg> add PointProcessInference
```

In the following example we load the UK coal mining disasters data and performs its statistical analysis via the Poisson point process.

```
using PointProcessInference
using Random
Random.seed!(1234) # set RNG
observations, parameters, λinfo = PointProcessInference.loadexample("coal")
res = PointProcessInference.inference(observations; parameters...)
```

The main procedure has signature

` PointProcessInference.inference(observations; title = "Poisson process", T = 1.0, n = 1, ...)`

where `observations`

is the sorted vector of Poisson event times, `T`

is the endpoint of the time interval considered, and if
`observations`

is an aggregate of `n`

different independent observations (say aggregated for `n`

days), this can be indicated by the parameter `n > 1`

. A full list of parameters is as follows:

```
function inference(observations;
title = "Poisson process", # optional caption for mcmc run
summaryfile = nothing, # path to summary file or nothing
T0 = 0.0, # start time
T = maximum(observations), # end time
n = 1, # number of aggregated samples in `observations`
N = min(length(observations)÷4, 50), # number of bins
samples = 1:1:30000, # run for `i in 1:last(samples)` iterations, save coefficients if `i ∈ samples`
α1 = 0.1, β1 = 0.1, # parameters for Gamma Markov chain
Π = Exponential(10), # prior on alpha
τ = 0.7, # Set scale for random walk update on log(α)
αind = 0.1, βind = 0.1, # parameters for the independent Gamma prior
emp_bayes = false, # estimate βind using empirical Bayes
verbose = true
)
```

Iterates of *ψ* are the rows of the matrix

`res.ψ`

For high quality plotting, the package has a script `process-output-simple.jl`

that visualizes
the results with the help of `R`

and `ggplot2`

.
After installing the additional dependencies

```
pkg> add RCall
pkg> add DataFrames
```

include the script (it is located in the `contrib`

folder, the location can be retrieved by calling `PointProcessInference.plotscript()`

)

```
include(PointProcessInference.plotscript())
plotposterior(res)
```

The script starts `ggplot2`

with `RCall`

, and `plotposterior`

expects as its argument the result `res`

returned from `inference`

. For computing the posterior summary measures, the first half of the MCMC iterates is treated as burnin samples.

Here, we generate data from a nonhomogeneous Poissson process as follows:

```
λ0(x) = (20 + 8*cos(x))
λ0max = 28
obs = PointProcessInference.samplepoisson(λ0, λ0max, 0, 10)
```

The nonparametric estimator is obtained by running

`res = PointProcessInference.inference(obs)`

Finally, a default graph is obtained by

```
include(PointProcessInference.plotscript())
plotposterior(res)
```

- Illustration: Intensity estimation for the generated data in example 1. The data are displayed via the rug plot in the upper margin of the plot, the posterior mean is given by a solid black line, while a 95% marginal credible band is shaded in light blue.

A slightly refined plot, where the true intensity is added to the figure can be obtained by passing the data-generating intensity function as an extra argument.

`plotposterior(res;figtitle="Cosine intensity", λ=λ0)`

This results in the plot

Here, we analyse the well-known coal mining disasters data set.

```
observations, parameters, λinfo = PointProcessInference.loadexample("coal")
res = PointProcessInference.inference(observations)
plotposterior(res)
```

- Illustration: Intensity estimation for the UK coal mining disasters data (1851-1962). The data are displayed via the rug plot in the upper margin of the plot, the posterior mean is given by a solid black line, while a 95% marginal credible band is shaded in light blue.

The horizontal tickmarks can be adjusted, as the offset date of the data, which is March 15, 1851 in this case.

```
start = 1851+(28+31+15)/365
plotposterior(res; figtitle="Coal mining disasters", offset = start,hortics=1850:10:1960)
```

If you use the package in your work, we encourage you to cite it using the following BibTeX code:

```
@misc{pppjl,
title = { {PointProcessInference 0.1.0 -- Code and Julia package accompanying the article ``Gugushvili, van der Meulen, Schauer, Spreij (2018): Fast and scalable non-parametric Bayesian inference for Poisson point processes" ({http://arxiv.org/abs/1804.03616})} },
author = {Gugushvili, Shota and van der Meulen, Frank and Schauer, Moritz and Spreij, Peter},
year = {2019},
doi = {10.5281/zenodo.2591395},
url = {https://doi.org/10.5281/zenodo.2591395},
}
```