RedClust.jl

Julia package to perform Bayesian clustering of high-dimensional Euclidean data using pairwise dissimilarity information.
Author abhinavnatarajan
Started In
October 2022

Documentation

Development version documentation, stable version documentation, arXiv paper link

Please see the detailed documentation above.

Introduction

RedClust is a Julia package for Bayesian clustering of high-dimensional Euclidean data using pairwise dissimilarities instead of the raw observations. It uses an MCMC sampler to generate posterior samples from the space of all possible clustering structures on the data.
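To make "pairwise dissimilarities" concrete, here is a minimal sketch that builds a Euclidean distance matrix from raw observations using only the Julia standard library. The helper name pairwise_distances and the assumption that observations are stored as a vector of coordinate vectors are illustrative only; neither is part of the RedClust API.

```julia
using LinearAlgebra  # standard library; provides norm

# Hypothetical helper (not part of RedClust): build the symmetric matrix of
# pairwise Euclidean distances from a vector of observations.
function pairwise_distances(points::AbstractVector{<:AbstractVector{<:Real}})
    n = length(points)
    D = zeros(Float64, n, n)
    for i in 1:n, j in (i + 1):n
        d = norm(points[i] - points[j])
        D[i, j] = d  # fill both triangles so that D is symmetric
        D[j, i] = d
    end
    return D
end

# Three collinear points in the plane (made-up data)
pts = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = pairwise_distances(pts)
```

The resulting matrix has a zero diagonal and is symmetric, so for n observations only n(n-1)/2 entries carry information.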

Installation

The package can be installed by typing ]add RedClust at the Julia REPL, or programmatically through Pkg:

using Pkg
Pkg.add("RedClust")

Basic example

using RedClust
# Generate data
points, distM, clusts, probs, oracle_coclustering = 
	generatemixture(100, 10; α = 10, σ = 0.25, dim = 10)
# Let RedClust choose the best prior hyperparameters
params = fitprior(points, "k-means", false)
# Set the MCMC options
options = MCMCOptionsList(numiters = 5000)
data = MCMCData(points)
# Run the sampler
result = runsampler(data, options, params)
# Get a point estimate 
pointestimate, index = getpointestimate(result)
# Summary of point estimate
summarise(pointestimate, clusts)
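The example above unpacks a value named oracle_coclustering and later reduces the posterior samples to a point estimate. As a rough illustration of what a co-clustering matrix is, the sketch below computes, from a handful of sampled label vectors, the fraction of samples in which each pair of observations shares a cluster. The helper coclustering_matrix and the toy samples are hypothetical and written in plain Julia; how RedClust itself stores samples or derives its point estimate is not described here, so this should not be read as the package's method.

```julia
# Hypothetical helper (not part of RedClust): estimate co-clustering
# probabilities from a collection of sampled cluster label vectors.
function coclustering_matrix(samples::AbstractVector{<:AbstractVector{<:Integer}})
    n = length(first(samples))
    C = zeros(Float64, n, n)
    for labels in samples
        for i in 1:n, j in 1:n
            # Count the samples in which observations i and j share a label
            C[i, j] += (labels[i] == labels[j])
        end
    end
    return C ./ length(samples)
end

# Four made-up samples of cluster labels for four observations
samples = [[1, 1, 2, 2], [1, 1, 1, 2], [1, 2, 2, 2], [1, 1, 2, 2]]
P = coclustering_matrix(samples)
# P[1, 2] is 0.75: observations 1 and 2 share a cluster in 3 of the 4 samples
```

Because the matrix depends only on whether two labels are equal, it is unaffected by relabelling clusters between samples, which is one reason such summaries are convenient for MCMC output over clusterings.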

A more elaborate example can be found in the detailed documentation. Examples from the paper and its supplementary material can be found in the 'examples' branch of this repository.

Citing this package

If you use this package in your work, please cite it as:

Natarajan, A., De Iorio, M., Heinecke, A., Mayer, E. and Glenn, S. (2023). ‘Cohesion and Repulsion in Bayesian Distance Clustering’, Journal of the American Statistical Association, 119(546), pp. 1374–1384. DOI: 10.1080/01621459.2023.2191821.

For BibTeX users:

@article{NDI23,
  doi = {10.1080/01621459.2023.2191821},
  author = {Natarajan, Abhinav and De Iorio, Maria and Heinecke, Andreas and Mayer, Emanuel and Glenn, Simon},
  title = {Cohesion and Repulsion in Bayesian Distance Clustering},
  journal = {Journal of the American Statistical Association},
  volume = {119},
  issue = {546},
  pages={1374--1384},
  year = {2023}
}