Approximate Nearest Neighbor Searches using the "Hierarchical Navigable Small World" (HNSW) algorithm as described in https://arxiv.org/abs/1603.09320 .
- Written in Julia - no non-julian dependencies
- Supports incremental index creation
- Works with arbitrary distance functions
- Is data-agnostic - can work with data of arbitrary types given a corresponding distance function
Creating an Index
An Index in this library is a struct of type
HierarchicalNSW which can be constructed using:
hnsw = HierarchicalNSW(data; metric, M, efConstruction)
data: This is an
AbstractVectorof the data points to be used.
metric = Euclidean(): The metric to use for distance calculation. Any metric defined in
Distances.jlshould work as well as any type for which
evaluate(::CustomMetric, x,y)is implemented.
M = 10: The maximum number of links per node on a level >1. Note that value highly influences recall depending on data.
M0 = 2M: The maximum number of links on the bottom layer (=1). Defaults to
M0 = 2M.
efConstruction = 100: Maximum length of dynamic link lists during index creation. Low values may reduce recall but large values increase runtime of index creation.
ef = 10: Maximum length of dynamic link lists during search. May be changed afterwards using
m_L = 1/log(M): Prefactor for random level generation.
max_elements = length(data): May be set to a larger value in case one wants to add elements to the structure after initial creation.
HierarchicalNSW struct is initialized the search graph can be built by calling
add_to_graph!(hnsw [, indices])
which iteratively inserts all points from
data into the graph.
Optionally one may provide
indices a subset of all the indices
data to partially to construct the graph.
Given an initialized
HierarchicalNSW one can search for approximate nearest
idxs, dists = knn_search(hnsw, query, k)
query may either be a single point of type
or a vector of such points.
A simple example:
using HNSW dim = 10 num_elements = 10000 data = [rand(dim) for i=1:num_elements] #Intialize HNSW struct hnsw = HierarchicalNSW(data; efConstruction=100, M=16, ef=50) #Add all data points into the graph #Optionally pass a subset of the indices in data to partially construct the graph add_to_graph!(hnsw) # optionally with a progress notification: # step = (num_elements) ÷ 100 # add_to_graph!(hnsw) do i # if iszero(i % step) # @info "Processed: $(i ÷ step)%" # end # end queries = [rand(dim) for i=1:1000] k = 10 # Find k (approximate) nearest neighbors for each of the queries idxs, dists = knn_search(hnsw, queries, k)
A multi-threaded version of this algorithm is available.
To use it, checkout the branch
multi-threaded and start the indexing with:
For multi-threaded searches add
multithreading=true as a keyword argument to