ClusterValidityIndices.jl

A Julia package for Cluster Validity Indices (CVI) algorithms.

Please read the documentation for detailed usage and tutorials.

Overview

Cluster Validity Indices (CVIs) are designed to be metrics of performance for unsupervised clustering algorithms. In the absence of supervisory labels (i.e., ground truth), clustering algorithms, like any truly unsupervised learning algorithms, have no way to definitively know the stability of their learning or the accuracy of their performance. CVIs therefore provide metrics of partitioning stability and validity using only the original data samples and the cluster labels prescribed by the clustering algorithm.
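To make the idea concrete, the following is a minimal sketch of the textbook Davies-Bouldin index computed from nothing but samples and labels. It uses only Julia's standard library. The helper name `davies_bouldin` is hypothetical and is not part of this package, and the package's own DB implementation may differ in its details. The sketch assumes at least two clusters.

```julia
using LinearAlgebra: norm
using Statistics: mean

# Textbook Davies-Bouldin index; columns are samples, rows are features.
# `davies_bouldin` is a hypothetical helper, not the package's implementation.
function davies_bouldin(data::AbstractMatrix{<:Real}, labels::AbstractVector{<:Integer})
    clusters = sort(unique(labels))
    # Centroid of each cluster (mean over that cluster's columns)
    centroids = [vec(mean(data[:, labels .== c], dims=2)) for c in clusters]
    # Scatter: mean distance from each cluster member to its centroid
    scatter = [
        mean(norm(col - centroids[k]) for col in eachcol(data[:, labels .== c]))
        for (k, c) in enumerate(clusters)
    ]
    n = length(clusters)
    # For each cluster, keep the largest similarity ratio to any other cluster
    worst = [
        maximum((scatter[i] + scatter[j]) / norm(centroids[i] - centroids[j])
                for j in 1:n if j != i)
        for i in 1:n
    ]
    return mean(worst)
end

# Two tight, well-separated clusters
data = [0.0 0.1 0.0 5.0 5.1 5.0;
        0.0 0.0 0.1 5.0 5.0 5.1]
labels = [1, 1, 1, 2, 2, 2]
db_value = davies_bouldin(data, labels)
println(db_value)
```

For Davies-Bouldin, lower values indicate more compact and better separated clusters, so the tight clusters above yield a value close to zero. No ground-truth labels are involved at any point.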

This Julia package contains an outline of the conceptual usage of CVIs along with many example scripts in the documentation. This outline contains a Quickstart that provides an overview of how to use this project along with a list of CVIs that are implemented in the latest version of the project.

Installation

This project is distributed as a Julia package and hosted on JuliaHub, Julia's package manager repository. As such, this package's usage follows the usual Julia package installation procedure, interactively:

julia> ]
(@v1.9) pkg> add ClusterValidityIndices

or programmatically:

julia> using Pkg
julia> Pkg.add("ClusterValidityIndices")

You may also add the package directly from a GitHub branch to get the latest changes between releases:

julia> ]
(@v1.9) pkg> add https://github.com/AP6YC/ClusterValidityIndices.jl#develop
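The branch install can also be done programmatically. This sketch uses the `url` and `rev` keyword arguments of `Pkg.add`, with the same repository and branch as the interactive command above:

```julia
using Pkg

# Install directly from the develop branch of the GitHub repository
Pkg.add(url="https://github.com/AP6YC/ClusterValidityIndices.jl", rev="develop")
```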

Quickstart

This section provides a quick overview of how to use the project. For more detailed code usage, please see the Detailed Usage.

First, import the package with:

# Import the package
using ClusterValidityIndices

CVI objects are instantiated with empty constructors:

# Create a Davies-Bouldin (DB) CVI object
my_cvi = DB()

All CVIs are implemented with acronyms of their literature names. A list of all of these is found in the Implemented CVI/ICVIs section.

Next, get data from a clustering process. This is a set of feature samples that have been clustered, together with the cluster labels prescribed to them.

Note

The ClusterValidityIndices.jl package assumes data to be in the form of Float matrices where columns are samples and rows are features. An individual sample is a single vector of features. Labels are vectors of integers where each number corresponds to its own cluster.

# Random data as an example; 10 samples with feature dimension 3
dim = 3
n_samples = 10
data = rand(dim, n_samples)
# One label per sample: the first half in cluster 1, the second half in cluster 2
labels = repeat(1:2, inner=n_samples ÷ 2)
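Data from other tools is often stored with samples as rows. The following sketch shows one way to bring such data into the layout described in the note above, with a check that there is exactly one label per sample. The variable names and values are illustrative assumptions and use only base Julia:

```julia
# Hypothetical input: 4 samples stored as rows, each with 3 features
samples_as_rows = [0.1 0.2 0.3;
                   0.4 0.5 0.6;
                   0.7 0.8 0.9;
                   1.0 1.1 1.2]
raw_labels = [1, 1, 2, 2]

# Transpose and materialize so that columns are samples and rows are features
data = Matrix{Float64}(permutedims(samples_as_rows))
labels = Vector{Int}(raw_labels)

# Sanity checks: features x samples, and one label per sample (column)
@assert size(data) == (3, 4)
@assert length(labels) == size(data, 2)

# A single sample is one column, i.e. a vector of features
sample = data[:, 1]
```

Checking `length(labels) == size(data, 2)` before evaluating a CVI catches mismatched inputs early.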

The outputs of CVIs are called criterion values, and they can be computed both incrementally and in batch with get_cvi!. Compute in batch by providing a matrix of samples and a vector of labels:

criterion_value = get_cvi!(my_cvi, data, labels)

or incrementally with the same function by passing one sample and label at a time:

# Create a fresh CVI object for incremental evaluation
my_icvi = DB()

# Create a container for the values and iterate
criterion_values = zeros(n_samples)
for i = 1:n_samples
    criterion_values[i] = get_cvi!(my_icvi, data[:, i], labels[i])
end

Note

Each module has a batch and incremental implementation, but ClusterValidityIndices.jl does not yet support switching between batch and incremental modes with the same CVI object.

Implemented CVI/ICVIs

This project has implementations of the following CVIs in both batch and incremental variants:

CH: Calinski-Harabasz.
cSIL: Centroid-based Silhouette.
DB: Davies-Bouldin.
GD43: Generalized Dunn's Index 43.
GD53: Generalized Dunn's Index 53.
PS: Partition Separation.
rCIP: (Renyi's) representative Cross Information Potential.
WB: WB-index.
XB: Xie-Beni.

The exported constant CVI_MODULES also contains a list of these CVIs for convenient iteration.
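As a sketch of that kind of iteration, the loop below evaluates every listed CVI in batch mode on the same data. It assumes that each entry of `CVI_MODULES` can be called with no arguments to build a fresh CVI object, consistent with the empty constructors shown in the Quickstart. The data and labels are random and purely illustrative.

```julia
using ClusterValidityIndices

# Illustrative data: 3 features, 10 samples, two clusters of 5 samples each
data = rand(3, 10)
labels = repeat(1:2, inner=5)

# Evaluate each CVI in batch mode with a fresh object per index
# (assumes each entry of CVI_MODULES is a zero-argument constructor)
for cvi_type in CVI_MODULES
    cvi = cvi_type()
    value = get_cvi!(cvi, data, labels)
    println(cvi_type, ": ", value)
end
```

A new object is created for each index so that no internal state is shared between evaluations.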

Examples

A basic example in the documentation illustrates top-down usage of the package.

Furthermore, the Examples section of the documentation covers a variety of use cases of the project. Each of these examples is made using the DemoCards.jl package and can be opened, saved, and run as a Julia notebook.

Contributing

If you have a question or concern, please raise an issue. For more details on how to work with the project, propose changes, or even contribute code, please see the Developer Notes in the project's documentation.

In summary:

Questions and requested changes should all be made on the issues page. Issues are preferred because they are publicly viewable and could assist or educate others with similar issues or questions.
For changes, this project accepts pull requests (PRs) from feature/<my-feature> branches onto the develop branch using the GitFlow methodology. If unit tests pass and the changes are beneficial, these PRs are merged into develop and eventually folded into versioned releases.
The project follows the Semantic Versioning convention of major.minor.patch incremental versioning numbers. Patch versions are for bug fixes, minor versions are for backward-compatible changes, and major versions are for new and incompatible usage changes.

Acknowledgements

Authors

This package is developed and maintained by Sasha Petrenko with sponsorship by the Applied Computational Intelligence Laboratory (ACIL). The users @rMassimiliano and @malmaud have graciously contributed their time with reviews and feedback that have greatly improved the project.

Support

This project is supported by grants from the Night Vision Electronic Sensors Directorate, the DARPA Lifelong Learning Machines (L2M) program, Teledyne Technologies, and the National Science Foundation. The material, findings, and conclusions here do not necessarily reflect the views of these entities.

Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-22-2-0209. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

License

This software is openly maintained by the ACIL of the Missouri University of Science and Technology under the MIT License.

Citation

This project has a citation file that generates citation information for the package and corresponding JOSS paper, which can be accessed at the "Cite this repository" button under the "About" section of the GitHub page.

You may also cite this repository with the following BibTeX entry:

@article{Petrenko2022,
  doi = {10.21105/joss.03527},
  url = {https://doi.org/10.21105/joss.03527},
  year = {2022},
  publisher = {The Open Journal},
  volume = {7},
  number = {79},
  pages = {3527},
  author = {Sasha Petrenko and Donald C. Wunsch},
  title = {ClusterValidityIndices.jl: Batch and Incremental Metrics for Unsupervised Learning},
  journal = {Journal of Open Source Software}
}