Purpose of StatisticalRethinkingStan.jl
As stated many times by the author in his online lectures, StatisticalRethinking is a hands-on course. This project is intended to assist with the hands-on aspect of learning the key ideas in StatisticalRethinking.
StatisticalRethinkingStan is a Julia project that uses Pluto notebooks for this purpose. Each notebook demonstrates Julia versions of code snippets and mcmc models contained in the R package "rethinking" associated with the book Statistical Rethinking by Richard McElreath.
If you prefer to work with scripts instead of notebooks, a utility in the src directory (generate_scripts.jl) is provided to create scripts from all notebooks and store them in a newly created scripts directory. Note that this is a simple tool and will overwrite all files in the
scripts directory. For exploration purposes I suggest to move some of those scripts to e.g. the
This Julia project uses Stan (the cmdstan executable) as the underlying mcmc implementation. A companion project (StatisticalRethinkingTuring.jl) uses Turing.jl.
To (locally) reproduce and use this project, do the following (just once):
- Download this project from GitHub and move to the downloaded directory, e.g.:

    $ git clone https://github.com/StatisticalRethinkingJulia/StatisticalRethinkingStan.jl
    $ cd StatisticalRethinkingStan.jl
    $ julia

and in the Julia REPL:

    julia> ]                                       # Activate Pkg mode
    (@v1.5) pkg> activate .                        # Activate pkg in .
    (StatisticalRethinkingStan) pkg> instantiate   # Install in pkg environment
    (StatisticalRethinkingStan) pkg> <delete>      # Exit package mode
    julia>

If the above procedure fails, try deleting the Manifest.toml file (if present) and repeating the steps. As mentioned above, these steps are needed only the first time.
If you want to use a specific tagged version, use:

    # cd to the cloned directory
    git checkout v2.0.0

The next step assumes your Julia setup includes
- Start a Pluto notebook server.

    $ julia
    julia> using Pluto
    julia> Pluto.run()

- A Pluto page should open in a browser.
Select a notebook in the "open a file" entry box, e.g. type ./ and step to ./notebooks/00/clip-00-01-03s.jl. All notebooks will activate the project.
A good notebook to initially glance over if
The data directory, accessible in DrWatson through datadir(), can be used for locally generated data, exercises, etc. All "rethinking" data files are stored and maintained in StatisticalRethinking.jl and can be accessed via sr_datadir(...). DrWatson provides several other handy shortcuts, e.g. projectdir().
A typical set of opening lines in each notebook:

    using Pkg, DrWatson

    # Note: Below sequence is important. First activate the project
    # followed by `using` or `import` statements. Pretty much all
    # scripts use StatisticalRethinking. If mcmc sampling is
    # needed, it must be loaded before StatisticalRethinking:

    @quickactivate "StatisticalRethinkingStan"
    using StanSample
    using StanOptimize # If quap() is used.
    using StatisticalRethinking

    # To access e.g. the Howell1.csv data file:
    df = CSV.read(sr_datadir("Howell1.csv"), DataFrame)
    df = df[df.age .>= 18, :]

All R snippets (fragments) have been organized in clips. Each clip is a notebook.
Clip names (e.g. clip-00-01-03s.jl) are built from the following parts:
- cc: Chapter number
- fs: First snippet in clip
- ls: Last snippet in clip
- [s|sl|t|d|m]: Mcmc flavor used (s: Stan, t: Turing)

d is reserved for a combination Soss/DynamicHMC, sl is reserved for Stan models using the logpdf formulation, and m for Mamba.
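
To make the naming scheme concrete, here is a minimal sketch of a helper that splits a clip file name into its parts. The function `parse_clip_name` and the constant `CLIP_NAME` are hypothetical and not part of this project. The exact pattern (a `clip-` prefix, hyphen separators, numeric fields) is an assumption inferred from the example name clip-00-01-03s.jl.

```julia
# Hypothetical helper, not part of StatisticalRethinkingStan.jl.
# Assumed pattern: clip-<cc>-<fs>-<ls><flavor>.jl, inferred from
# the example name "clip-00-01-03s.jl".
const CLIP_NAME = r"^clip-(\d+)-(\d+)-(\d+)(sl|s|t|d|m)\.jl$"

function parse_clip_name(filename::AbstractString)
    # Only the file name matters; drop any leading directories.
    m = match(CLIP_NAME, basename(filename))
    # Return `nothing` for names that do not fit the assumed pattern.
    m === nothing && return nothing
    return (
        chapter = parse(Int, m[1]),        # cc
        first_snippet = parse(Int, m[2]),  # fs
        last_snippet = parse(Int, m[3]),   # ls
        flavor = String(m[4]),             # s, sl, t, d or m
    )
end

clip = parse_clip_name("./notebooks/00/clip-00-01-03s.jl")
println(clip)
```

A helper like this could be used, for instance, to list only the Stan clips of one chapter by filtering on `chapter` and `flavor`.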
The notebooks containing the clips are stored by chapter. In addition to clips, in the early notebook chapters (0-3) it is also shown how to create some of the figures in the book, e.g.
Special introductory notebooks have been included in
intro-R-users/distributions.jl. It is suggested to at least glance over the
Great introductory notebooks showing Julia and statistics (based on the Statistics with Julia book) can be found in StatisticsWithJuliaPlutoNotebooks.
One goal for the changes in StatisticalRethinking v3 was to make it easier to compare and mix and match results from different mcmc implementations. Hence consistent naming of models and results is important. The models and the results of simulations are stored as follows:
- stan5_1 : Stan language program
- m5_1s : The sampled StanSample model
- q5_1s : Stan quap model (NamedTuple similar to Turing)
- chns5_1s : MCMCChains object (4000 samples from 4 chains)
- part5_1s : Stan samples (Particles notation)
- quap5_1s : Quap samples (Particles notation)
- nt5_1s : NamedTuple with samples values (default for
Results as a DataFrame:
- prior5_1s_df : Prior samples (DataFrame)
- post5_1s_df : Posterior samples (DataFrame)
- quap5_1s_df : Quap approximation to posterior samples (DataFrame)
- pred5_1s_df : Posterior predictions (DataFrame)
As before, the s at the end indicates Stan.
read_samples(m5_1s) returns a NamedTuple with the results.
read_samples(m5_1s; output_format=:...) makes it easy to create MCMCChains.jl Chains objects, a DataFrame with draws, or a MonteCarloMeasurements.jl Particles object (part5_1s in the list above).
StatisticalRethinkingStan.jl is compatible with the 2nd edition of the book. Version 1.0.0 covers pretty much the same as StatisticalRethinking.jl v2.2.9+.
StructuralCausalModels.jl is included as an experimental dependency in the StatisticalRethinking.jl v3 package. Definitely WIP!
Two other packages created to
Any feedback is appreciated. Please open an issue.
Of course, without the excellent textbook by Richard McElreath, this package would not have been possible. The author has also been supportive of this work and gave permission to use the datasets.
This repository and its format are derived from previous versions of StatisticalRethinking.jl, work by Karajan, and many other contributors.
StatisticalRethinkingStan v3.4.0 is in sync with StatisticalRethinking v3.4.0.
- Initial version (late Nov 2020).