RedClust
Documentation
Please see the package's detailed documentation.
Introduction
RedClust is a Julia package for Bayesian clustering of high-dimensional Euclidean data that works with pairwise dissimilarities instead of the raw observations. It uses a Markov chain Monte Carlo (MCMC) sampler to draw samples from the posterior distribution over the space of all possible clusterings of the data.
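Sampling is needed because the space of clusterings cannot be enumerated in practice: the number of ways to partition n observations into non-empty clusters is the Bell number B(n), which grows faster than exponentially in n. The sketch below uses only Base Julia to compute B(n) with the Bell triangle. The function name bellnumber is hypothetical and is not part of RedClust.

```julia
# Hypothetical helper (not part of RedClust): the Bell number B(n), i.e. the
# number of partitions of n items, computed row by row with the Bell triangle.
function bellnumber(n::Integer)
    n == 0 && return big(1)      # the empty set has exactly one partition
    row = [big(1)]               # row 1 of the Bell triangle
    for _ in 2:n
        newrow = Vector{BigInt}(undef, length(row) + 1)
        newrow[1] = row[end]     # each row starts with the last entry of the previous row
        for j in 2:length(newrow)
            # each entry is its left neighbour plus the entry above that neighbour
            newrow[j] = newrow[j - 1] + row[j - 1]
        end
        row = newrow
    end
    return row[end]              # the last entry of row n is B(n)
end

for n in (5, 10, 100)
    println("B($n) = ", bellnumber(n))   # B(5) = 52, B(10) = 115975
end
```

Even the 100 observations used in the basic example admit more than 10^100 possible clusterings, so the sampler explores this space stochastically rather than exhaustively.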
Installation
The package can be installed by typing ]add RedClust into the Julia REPL, or by the usual method:
using Pkg
Pkg.add("RedClust")
Basic example
using RedClust
# Generate data
# (100 observations from a mixture of 10 components in 10 dimensions)
points, distM, clusts, probs, oracle_coclustering =
    generatemixture(100, 10; α = 10, σ = 0.25, dim = 10)
# Let RedClust choose the best prior hyperparameters
params = fitprior(points, "k-means", false)
# Set the MCMC options
options = MCMCOptionsList(numiters = 5000)
data = MCMCData(points)
# Run the sampler
result = runsampler(data, options, params)
# Get a point estimate
pointestimate, index = getpointestimate(result)
# Summary of point estimate
summarise(pointestimate, clusts)
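In the example above, generatemixture also returns distM, a matrix of pairwise dissimilarities between the observations. The sketch below shows, using only Base Julia, how a Euclidean distance matrix of this kind could be computed by hand. It assumes the observations are stored as a vector of numeric vectors; the function name euclidean_distances is hypothetical, and this is not RedClust's internal implementation.

```julia
# Hypothetical helper (not part of RedClust): pairwise Euclidean distances
# between observations stored as a vector of equal-length numeric vectors.
function euclidean_distances(points::AbstractVector{<:AbstractVector{<:Real}})
    n = length(points)
    D = zeros(Float64, n, n)          # diagonal stays zero
    for j in 1:n, i in (j + 1):n      # visit each unordered pair once
        d = sqrt(sum(abs2, points[i] .- points[j]))
        D[i, j] = d
        D[j, i] = d                   # fill in the symmetric entry
    end
    return D
end

pts = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = euclidean_distances(pts)
println(D[1, 2], " ", D[1, 3])        # 5.0 10.0
```

Only the dissimilarities enter the clustering model, which is why the same approach applies regardless of the dimension of the raw observations.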
A more elaborate example can be found in the detailed documentation. Examples from the paper and its supplementary material can be found in the 'examples' branch of this repository.
Citing this package
If you use this package in your work, please cite it as:
Natarajan, A., De Iorio, M., Heinecke, A., Mayer, E. and Glenn, S. (2023). ‘Cohesion and Repulsion in Bayesian Distance Clustering’, Journal of the American Statistical Association. DOI: 10.1080/01621459.2023.2191821.
For BibTeX users:
@article{NDI23,
doi = {10.1080/01621459.2023.2191821},
author = {Natarajan, Abhinav and De Iorio, Maria and Heinecke, Andreas and Mayer, Emanuel and Glenn, Simon},
title = {Cohesion and Repulsion in Bayesian Distance Clustering},
journal = {Journal of the American Statistical Association},
year = {2023}
}