# DependentBootstrap.jl

## Note: Package tests may fail on Julia v1.6 or later due to an update in random number generation. I need to update the tests in this package so they don't depend on the specifics of the underlying random number generator in Julia, which I will do in due course. For now, don't worry if the tests fail, nothing of note has been changed in this package in years.

A module for the Julia language that implements several varieties of the dependent statistical bootstrap as well as the corresponding block-length selection procedures.

## News

This package is compatible with Julia v1.0+. If you are running v0.6, you will need to use `Pkg.pin("DependentBootstrap", v"0.1.1")` at the REPL, and if you are running v0.5, use `Pkg.pin("DependentBootstrap", v"0.0.1")`. Compatibility with versions before v0.5 is not available.

## Main features

This module allows Julia users to estimate the distribution of a test statistic using any of several varieties of the dependent bootstrap.

The following bootstrap methods are implemented:

- the *iid* bootstrap proposed in Efron (1979) "Bootstrap Methods: Another Look at the Jackknife",
- the stationary bootstrap proposed in Politis, Romano (1994) "The Stationary Bootstrap",
- the moving block bootstrap proposed in Kunsch (1989) "The jackknife and the bootstrap for general stationary observations" and (independently) Liu, Singh (1992) "Moving blocks jackknife and bootstrap capture weak dependence",
- the circular block bootstrap proposed in Politis, Romano (1992) "A circular block resampling procedure for stationary data", and
- the non-overlapping block bootstrap described in Lahiri (1999) *Resampling Methods for Dependent Data* (this method is not usually used and is included mainly as a curiosity).
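To make the difference between two of these schemes concrete, here is a minimal sketch of how resampling indices could be generated for the circular block bootstrap and the stationary bootstrap. It uses only the Julia standard library. The function names `circular_block_inds` and `stationary_inds` are hypothetical and written for illustration; they are not this package's internals, which may differ in detail.

```julia
using Random

# Circular block bootstrap (illustrative sketch): draw block start points
# uniformly from 1:numobs and let each fixed-length block wrap around the
# end of the sample.
function circular_block_inds(rng::AbstractRNG, numobs::Int, blocklength::Int)
    inds = Int[]
    while length(inds) < numobs
        start = rand(rng, 1:numobs)
        for j in 0:(blocklength - 1)
            push!(inds, mod1(start + j, numobs))
        end
    end
    return inds[1:numobs]  # trim the final block to the requested length
end

# Stationary bootstrap (illustrative sketch): at each step, start a new block
# with probability 1 / blocklength, otherwise continue the current block
# (wrapping around). Block lengths are then geometric with mean blocklength.
function stationary_inds(rng::AbstractRNG, numobs::Int, blocklength::Real)
    p = 1 / blocklength
    inds = Vector{Int}(undef, numobs)
    inds[1] = rand(rng, 1:numobs)
    for t in 2:numobs
        if rand(rng) < p
            inds[t] = rand(rng, 1:numobs)          # start a new block
        else
            inds[t] = mod1(inds[t - 1] + 1, numobs) # continue the block
        end
    end
    return inds
end
```

Indexing a dataset `x` with either output, e.g. `x[stationary_inds(MersenneTwister(1), length(x), 5.0)]`, yields one resampled dataset.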

The module also implements the following block length selection procedures:

- the block length selection procedure proposed in Politis, White (2004) "Automatic Block Length Selection For The Dependent Bootstrap", including the correction provided in Patton, Politis, and White (2009)

Bandwidth selection for the block length procedures is implemented using the method proposed in Politis (2003) "Adaptive Bandwidth Choice".

Some work has been done to implement the tapered block bootstrap of Paparoditis, Politis (2002) "The tapered block bootstrap for general statistics from stationary sequences", along with corresponding block-length selection procedures, but it is not yet complete.

The module is implemented entirely in Julia.

## What this package does not include

I have not included any procedures for bootstrapping confidence intervals in a linear regression framework, or other parametric models. This functionality is provided by Bootstrap.jl, and work is currently under way to add bootstrap methods from this package to the `Bootstrap` API.

I also have not included support for the jackknife, wild bootstrap, or subsampling procedures. I would be quite open to pull requests that add these methods to the present package, but have not had time to implement them myself. Work is ongoing to include the tapered block bootstrap, and ideally, the package will also eventually include the extended tapered block bootstrap. If you are interested in working on any of these projects, please feel free to contact me.

## How to use this package

#### Installation

This package should be added using `using Pkg ; Pkg.add("DependentBootstrap")`, and can then be called with `using DependentBootstrap`. The package depends on `StatsBase` and `Distributions` for some functionality, and on `DataFrames` and `TimeSeries` so that `DataFrame` and `TimeArray` datasets can be supported by this package's methods.

#### Terminology

In what follows, I use the terminology from Lahiri (1999) *Resampling Methods for Dependent Data* and refer to the underlying test statistic of interest as a *level 1 statistic*, and the distribution parameter of the test statistic that is of interest as a *level 2 parameter*. For example, the user might have some dataset `x` of type `T_data`, and be interested in the variance of the sample mean of `x`. In this case, the level 1 statistic is the sample mean function `mean`, and the level 2 parameter is the sample variance function `var`.

I use `T_data` to refer to the type of the user's dataset, `T_level1` to refer to the output type obtained by applying the level 1 statistic function to the dataset, and `T_level2` to refer to the output type obtained by applying the level 2 statistic to a `Vector{T_level1}` (i.e. a vector of resampled level 1 statistics).
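The relationship between the three types can be sketched with the standard library alone. The example below uses plain iid resampling purely for brevity (it is not one of the dependent schemes, and it does not call this package); the variable names are illustrative assumptions.

```julia
using Random, Statistics

rng = MersenneTwister(42)
x = randn(rng, 200)        # the dataset, so T_data is Vector{Float64}
numresample = 500

flevel1 = mean             # level 1 statistic: dataset -> T_level1 (Float64 here)
flevel2 = var              # level 2 parameter: Vector{T_level1} -> T_level2

# One level 1 statistic per resampled dataset (iid resampling for brevity).
level1 = [flevel1(x[rand(rng, 1:length(x), length(x))]) for _ in 1:numresample]

# The level 2 parameter summarises the bootstrapped level 1 statistics;
# here it is an estimate of the variance of the sample mean.
level2 = flevel2(level1)
```

Here `level1` is a `Vector{Float64}` of length `numresample`, and `level2` is a single `Float64`.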

#### Exported functions

The package exports the following functions, all of which have docstrings that can be called interactively at the REPL:

- `dbootinds(...)::Vector{Vector{Int}}` -> Returns indices that can be used to index into the underlying data to obtain bootstrapped data. Note, each inner vector of the output corresponds to a single re-sample for the underlying data.
- `dbootdata(...)::Vector{T_data}` -> Returns the bootstrapped data. Each element of the output vector corresponds to one re-sampled dataset, and the output vector will have length equal to `numresample` (a parameter discussed later).
- `dbootlevel1(...)::Vector{T_level1}` -> Returns a vector of bootstrapped level 1 statistics, where the output vector will have length equal to `numresample`.
- `dbootlevel2(...)::T_level2` -> Returns the bootstrapped distribution parameter of the level 1 statistic.
- `dboot(...)::T_level2` -> Identical to `dbootlevel2`. Most users will want to use this function.
- `dbootvar(...)::Float64` -> Identical to `dboot` but automatically sets `flevel2` to `var` (the sample variance function).
- `dbootconf(...)::Vector{Float64}` -> Identical to `dboot` but automatically sets `flevel2` to the anonymous function `x -> quantile(x, [0.025, 0.975])`, so the level 2 distribution parameter is a 95% confidence interval. In addition to the usual keywords, the keyword version of this function also accepts the keyword `alpha::Float64=0.05`, which controls the width of the confidence interval. Note, `0.05` corresponds to a 95% confidence interval, `0.1` to a 90% interval, and `0.01` to a 99% interval (and so on).
- `optblocklength(...)::Float64` -> Returns the optimal block length.
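The mapping from `alpha` to the quantiles used by `dbootconf` can be illustrated as follows. This is a sketch of the relationship described above, not the package's own code; the helper name `conf_flevel2` is hypothetical.

```julia
using Statistics

# Hypothetical helper: build a level 2 function that returns the
# (1 - alpha) confidence interval from a vector of level 1 statistics.
conf_flevel2(alpha::Float64) = x -> quantile(x, [alpha / 2, 1 - alpha / 2])

# Stand-in for a vector of bootstrapped level 1 statistics.
level1 = collect(1.0:101.0)

ci95 = conf_flevel2(0.05)(level1)  # uses the 0.025 and 0.975 quantiles
ci90 = conf_flevel2(0.1)(level1)   # uses the 0.05 and 0.95 quantiles
```

A smaller `alpha` gives a wider interval, so `ci95` contains `ci90`.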

The function `bandwidth_politis_2003{T<:Number}(x::AbstractVector{T})::Tuple{Int, Float64, Vector{Float64}}` is not exported, but the docstrings can be accessed using `?DependentBootstrap.bandwidth_politis_2003` at the REPL. This function implements the bandwidth selection procedure from Politis (2003) discussed above, and may be of independent interest to some users.

All of the above functions exhibit the following two core methods:

- `f(data ; kwargs...)`
- `f(data, bi::BootInput)`

where `data` is the user's underlying dataset, `kwargs` is a collection of keyword arguments, and `bi::BootInput` is a core type exported by the module that will be discussed later (but can be safely ignored by most users). The following types for `data` are currently accepted:

- `Vector{<:Number}`,
- `Matrix{<:Number}` where rows are observations and columns are variables,
- `Vector{Vector{<:Number}}` where each inner vector is a variable,
- `DataFrame`
- `TimeArray`

Of the two core methods, most users will want the `kwargs` method. A list of valid keyword arguments and their default values follows:

- `blocklength` <- Block length for bootstrapping procedure. The default value is `0`. Set to <= 0 to auto-estimate the optimal block length from the dataset. `Float64` inputs are allowed.
- `numresample` <- Number of times to resample the input dataset. The default value is the module constant `NUM_RESAMPLE`, currently set to `1000`.
- `bootmethod` <- Bootstrapping methodology to use. The default value is `:stationary` (for the stationary bootstrap).
- `blocklengthmethod` <- Block length selection procedure to use if user wishes to auto-estimate the block length. Default value is `:ppw2009` (use the method described in Patton, Politis, and White (2009)).
- `flevel1` <- A function that converts the input dataset to the estimator that the user wishes to bootstrap. The default value is `mean`.
- `flevel2` <- A function that converts a vector of estimators constructed by `flevel1` into a distributional parameter. The default value is `var`.
- `numobsperresample` <- Number of observations to be drawn (with replacement) per resample. The default value is the number of observations in the dataset (the vast majority of users will want this default value).
- `fblocklengthcombine` <- A function for converting a `Vector{Float64}` of estimated blocklengths to a single `Float64` blocklength estimate, which is necessary when the input dataset is a multivariate type. The default value is `median`.

A list of acceptable keyword arguments for `bootmethod` and `blocklengthmethod` follows. Note you can use either `String` or `Symbol` when specifying these arguments. For `bootmethod` we have:

- `:iid` or `:efron` <- IID bootstrap
- `:stationary` <- Stationary bootstrap
- `:movingblock` or `:moving` <- Moving block bootstrap
- `:nonoverlappingblock` or `:nooverlap` <- Nonoverlapping block bootstrap
- `:circularblock` or `:circular` <- Circular block bootstrap

For `blocklengthmethod` we have:

- `:ppw2009` <- Block length selection method of Patton, Politis, and White (2009)

Acceptable arguments can also be inspected interactively via the keys of the module dictionaries `BOOT_METHOD_DICT` and `BLOCKLENGTH_METHOD_DICT`.

In practice, the keyword argument method `f(data ; kwargs...)` actually just wraps a call to `f(data, BootInput(kwargs...))` under the hood. However, most users will not need to concern themselves with this level of detail.

For those who wish more fine-grained control, please use `?BootInput` at the REPL to get more information on this core module type.

#### Examples

Let `data::Vector{Float64}`.

The variance of the sample mean of `data` can be bootstrapped using a stationary bootstrap with optimally estimated block length using `dboot(data)` or `dbootvar(data)`.

A 90% confidence interval for the sample median using a circular block bootstrap with block length of 5 can be estimated using `dboot(data, blocklength=5, bootmethod=:circular, flevel1=median, flevel2=(x -> quantile(x, [0.05, 0.95])))` or `dbootconf(data, blocklength=5, bootmethod=:circular, flevel1=median, alpha=0.1)`.

Moving block bootstrap indices for generating bootstrapped data with optimally estimated block length can be obtained using `dbootinds(data, bootmethod=:moving)`, or if the user wants the bootstrapped data not the indices, `dbootdata(data, bootmethod=:moving)`. If the user wants bootstrapped sample medians of `data`, then use `dbootlevel1(data, bootmethod=:moving, flevel1=median)`.

If the user wants the optimal block length using the method proposed in Patton, Politis, and White (2009), use `optblocklength(data, blocklengthmethod=:ppw2009)`.

Now let `data::Matrix{Float64}`.

If the user wants the median optimal block length from each column of `data`, use `optblocklength(data, blocklengthmethod=:ppw2009)`. If the user wants the average optimal block length use `optblocklength(data, blocklengthmethod=:ppw2009, fblocklengthcombine=mean)`.

If the user wants the median of the test statistic that is the maximum of the sample mean of each column, using a stationary bootstrap with optimal block length, then use `dboot(data, flevel1=(x -> maximum(mean(x, dims=1))), flevel2=median)`. If `data::Vector{Vector{Float64}}` instead, and the user wanted the 95% confidence interval, use `dbootconf(data, flevel1=(x -> maximum([ mean(x[k]) for k = 1:length(x) ])))`.
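To see what the two multivariate `flevel1` functions above compute before handing them to the bootstrap, they can be applied directly to a small dataset. The sketch below uses only the `Statistics` standard library and does not call this package; the variable names are illustrative.

```julia
using Statistics

# The two level 1 statistics from the examples above.
flevel1_matrix = x -> maximum(mean(x, dims=1))
flevel1_vecvec = x -> maximum([ mean(x[k]) for k = 1:length(x) ])

# The same two-variable dataset in both accepted layouts.
xmat = [1.0 4.0; 3.0 2.0]         # rows are observations, columns are variables
xvec = [[1.0, 3.0], [4.0, 2.0]]   # each inner vector is a variable

s_matrix = flevel1_matrix(xmat)   # column means are 2.0 and 3.0, so this is 3.0
s_vecvec = flevel1_vecvec(xvec)   # variable means are 2.0 and 3.0, so this is 3.0
```

Both layouts give the same level 1 statistic, so the choice between `Matrix` and `Vector{Vector}` input only changes how `flevel1` has to be written.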