Clustering Genetic Algorithm is a method that uses a modified genetic algorithm to estimate the clusters in a dataset. It is particularly useful when parameters such as the number of clusters (k) are not known in advance. The algorithm searches for the clustering that maximizes the mean silhouette.
Being an evolutionary algorithm, it depends on randomly generated populations and can be computationally intensive for large datasets.
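The objective being maximized can be illustrated with a minimal sketch of the mean silhouette computed from a pairwise distance matrix and a vector of cluster labels. The helper name `mean_silhouette` and the example data are assumptions made for illustration; this is not the package's internal implementation.

```julia
# Hypothetical helper (not part of the package API): mean silhouette of a
# clustering, given pairwise distances and labels in 1:k.
function mean_silhouette(distances::AbstractMatrix{<:Real},
                         assignments::AbstractVector{<:Integer})
    n = length(assignments)
    k = maximum(assignments)
    sizes = [count(==(c), assignments) for c in 1:k]
    # The silhouette is undefined with fewer than two non-empty clusters.
    count(>(0), sizes) < 2 && return 0.0
    total = 0.0
    for i in 1:n
        # Sum of distances from object i to the members of each cluster.
        sums = zeros(k)
        for j in 1:n
            sums[assignments[j]] += distances[i, j]
        end
        own = assignments[i]
        # Objects in singleton clusters contribute 0 by convention.
        sizes[own] == 1 && continue
        a = sums[own] / (sizes[own] - 1)   # mean distance within own cluster
        b = minimum(sums[c] / sizes[c] for c in 1:k if c != own && sizes[c] > 0)
        m = max(a, b)
        total += m == 0 ? 0.0 : (b - a) / m
    end
    return total / n
end

# Four points on a line forming two obvious groups.
xs = [0.0, 1.0, 10.0, 11.0]
D = [abs(a - b) for a in xs, b in xs]

good = mean_silhouette(D, [1, 1, 2, 2])   # groups match the data
bad  = mean_silhouette(D, [1, 2, 1, 2])   # groups cut across the data
```

The labeling that matches the visible groups scores higher than the one that cuts across them, which is the signal a silhouette-maximizing search relies on.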
function cga(objects, distances, population_size, generations)
Compute the clusters in the objects using the Clustering Genetic Algorithm.
- objects: the vector of objects to cluster
- distances: the matrix of pairwise distances between the objects
- population_size: population size used by the genetic algorithm (default 20*length(objects))
- generations: number of generations for which the genetic algorithm runs (default 50)
It returns a tuple of CGAData and CGAResult.
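As a rough sketch of how the inputs might be prepared, the snippet below builds a pairwise Euclidean distance matrix for a small set of points and sets the two optional arguments to their documented defaults. The example points and the choice of Euclidean distance are assumptions; the final call is left as a comment because it requires the package to be loaded.

```julia
using LinearAlgebra  # provides norm

# Hypothetical example data: four 2-D points forming two visible groups.
objects = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]

# Pairwise distance matrix; entry (i, j) is the distance between objects i and j.
distances = [norm(a - b) for a in objects, b in objects]

# Defaults quoted in the parameter list above.
population_size = 20 * length(objects)
generations = 50

# With the package loaded, the call would follow the documented signature:
# data, result = cga(objects, distances, population_size, generations)
```

Any symmetric dissimilarity measure could be substituted for the Euclidean distance here, since the function only takes the matrix.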
struct CGAResult <: ClusteringResult
assignments::Vector{Int} # element-to-cluster assignments (n)
counts::Vector{Int} # number of samples assigned to each cluster (k)
found_gen::Int # first generation where the elite was found
total_gen::Int # total generations the GA has been run
end
mutable struct CGAData{S, T<:Real}
# to be used as an opaque object and normally not to be queried for values.
end
Methods such as counts can be used with CGAResult, since it is a subtype of the Clustering.ClusteringResult abstract type.
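The mechanism can be sketched with stand-in types: an accessor written once against an abstract type applies to every concrete subtype. All names prefixed with `Toy` or `toy_` below are hypothetical and exist only for this illustration; the real types are the ones shown above.

```julia
# Stand-in for an abstract result type (illustration only).
abstract type ToyClusteringResult end

# Stand-in with the same fields as the result struct documented above.
struct ToyCGAResult <: ToyClusteringResult
    assignments::Vector{Int}   # element-to-cluster assignments (n)
    counts::Vector{Int}        # number of samples assigned to each cluster (k)
    found_gen::Int
    total_gen::Int
end

# Generic accessors defined once against the abstract type.
toy_counts(r::ToyClusteringResult) = r.counts
toy_nclusters(r::ToyClusteringResult) = length(r.counts)

# Five elements split into clusters of size 2 and 3.
r = ToyCGAResult([1, 1, 2, 2, 2], [2, 3], 7, 50)
sizes = toy_counts(r)
k = toy_nclusters(r)
```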
- Hruschka, Eduardo & Ebecken, Nelson. (2003). A genetic algorithm for cluster analysis. Intelligent Data Analysis, 7, 15-25. doi:10.3233/IDA-2003-7103.