Please see the package's detailed documentation for the full API reference.
RedClust is a Julia package for Bayesian clustering of high-dimensional Euclidean data using pairwise dissimilarities instead of the raw observations. It uses an MCMC sampler to generate posterior samples from the space of all possible clustering structures on the data.
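To make the idea of working from pairwise dissimilarities concrete, here is a minimal sketch in plain Julia that turns a handful of raw observations into a Euclidean distance matrix. It uses only the LinearAlgebra standard library; the toy points and the variable name dissimilarity are illustrative assumptions, and this is not a description of how RedClust computes or stores its distances internally.

```julia
using LinearAlgebra  # standard library; provides `norm`

# Hypothetical toy data (not from the package): four points in 3 dimensions.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 2.0, 2.0]]

# Entry (i, j) holds the Euclidean distance between points i and j.
# The two-variable comprehension yields a 4×4 matrix.
dissimilarity = [norm(x - y) for x in points, y in points]
```

The resulting matrix is symmetric with a zero diagonal, so a clustering method that works on it never needs the original coordinates.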
The package can be installed by typing ]add RedClust into the Julia REPL, or through the Pkg API:
using Pkg
Pkg.add("RedClust")
using RedClust
# Generate data
points, distM, clusts, probs, oracle_coclustering =
    generatemixture(100, 10; α = 10, σ = 0.25, dim = 10)
# Let RedClust choose the best prior hyperparameters
params = fitprior(points, "k-means", false)
# Set the MCMC options
options = MCMCOptionsList(numiters = 5000)
data = MCMCData(points)
# Run the sampler
result = runsampler(data, options, params)
# Get a point estimate
pointestimate, index = getpointestimate(result)
# Summary of point estimate
summarise(pointestimate, clusts)
A more elaborate example can be found in the detailed documentation. Examples from the paper and its supplementary material can be found in the 'examples' branch of this repository.
If you want to use this package in your work, please cite it as:
Natarajan, A., De Iorio, M., Heinecke, A., Mayer, E. and Glenn, S. (2023). ‘Cohesion and Repulsion in Bayesian Distance Clustering’, Journal of the American Statistical Association, 119(546), pp. 1374–1384. DOI: 10.1080/01621459.2023.2191821.
For BibTeX users:
@article{NDI23,
doi = {10.1080/01621459.2023.2191821},
author = {Natarajan, Abhinav and De Iorio, Maria and Heinecke, Andreas and Mayer, Emanuel and Glenn, Simon},
title = {Cohesion and Repulsion in Bayesian Distance Clustering},
journal = {Journal of the American Statistical Association},
volume = {119},
issue = {546},
pages = {1374--1384},
year = {2023}
}