## NearestNeighborDescent.jl

Efficient approximate k-nearest neighbors graph construction and search in Julia
Author dillondaudert
Popularity
26 Stars
Updated Last
2 Years Ago
Started In
September 2018

# NearestNeighborDescent.jl

A Julia implementation of Nearest Neighbor Descent.

Dong, Wei et al. Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures. WWW (2011).

## Overview

Nearest Neighbor Descent (NNDescent) is an approximate K-nearest neighbor graph construction algorithm that has several useful properties:

• general: works with arbitrary dissimilarity functions
• scalable: has an empirical complexity of O(n^1.14) pairwise comparisons for a dataset of size n
• space efficient: the only data structure required is an approximate KNN graph which is operated on in-place and is also the final output
• accurate: converges to above 90% recall while only comparing each data point to a small percentage of the whole dataset on average

NNDescent is based on the heuristic argument that a neighbor of a neighbor is also likely to be a neighbor. That is, given a list of approximate nearest neighbors to a point, we can improve that list by exploring the neighbors of each point in the list. The algorithm is in essence the repeated application of this principle.

## Installation

`]add NearestNeighborDescent`

## Basic Usage

Approximate kNN graph construction on a dataset:

```using NearestNeighborDescent
using Distances
data = [rand(20) for _ in 1:1000]
n_neighbors = 10
metric = Euclidean()
graph = nndescent(data, n_neighbors, metric)```

The approximate KNNs of the original dataset can be retrieved from the resulting graph with

```# return the approximate knns as KxN matrices of indexes and distances, where
# indices[j, i] and distances[j, i] are the index of and distance to node i's jth
# nearest neighbor, respectively.
indices, distances = knn_matrices(graph)```

To find the approximate neighbors for new points with respect to an already constructed graph:

```queries = [rand(20) for _ in 1:20]
n_neighbors = 5
indices, distances = search(graph, queries, n_neighbors)```