Julia port of L.J.P. van der Maaten and G.E. Hintons T-SNE visualisation technique.
140 Stars
Updated Last
11 Months Ago
Started In
November 2013

t-SNE (t-Stochastic Neighbor Embedding)

Build Status Coverage Status

Julia implementation of L.J.P. van der Maaten and G.E. Hintons t-SNE visualisation technique.

The scripts in the examples folder require Plots, MLDatasets and RDatasets Julia packages.


julia> Pkg.add("TSne")

Basic API usage

tsne(X, ndim, reduce_dims, max_iter, perplexit; [keyword arguments])

Apply t-SNE (t-Distributed Stochastic Neighbor Embedding) to X, i.e. embed its points (rows) into ndims dimensions preserving close neighbours. Returns the points×ndims matrix of calculated embedded coordinates.

  • X: AbstractMatrix or AbstractVector. If X is a matrix, then rows are observations and columns are features.
  • ndims: Dimension of the embedded space.
  • reduce_dims the number of the first dimensions of X PCA to use for t-SNE, if 0, all available dimension are used
  • max_iter: Maximum number of iterations for the optimization
  • `perplexity': The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results

Optional Arguments

  • distance if true, specifies that X is a distance matrix, if of type Function or Distances.SemiMetric, specifies the function to use for calculating the distances between the rows (or elements, if X is a vector) of X
  • pca_init whether to use the first ndims of X PCA as the initial t-SNE layout, if false (the default), the method is initialized with the random layout
  • max_iter how many iterations of t-SNE to do
  • perplexity the number of "effective neighbours" of a datapoint, typical values are from 5 to 50, the default is 30
  • verbose output informational and diagnostic messages
  • progress display progress meter during t-SNE optimization
  • min_gain, eta, initial_momentum, final_momentum, momentum_switch_iter, stop_cheat_iter, cheat_scale low-level parameters of t-SNE optimization
  • extended_output if true, returns a tuple of embedded coordinates matrix, point perplexities and final Kullback-Leibler divergence

Example usage

using TSne, Statistics, MLDatasets

rescale(A; dims=1) = (A .- mean(A, dims=dims)) ./ max.(std(A, dims=dims), eps())

alldata, allabels = MNIST.traindata(Float64);
data = reshape(permutedims(alldata[:, :, 1:2500], (3, 1, 2)),
               2500, size(alldata, 1)*size(alldata, 2));
# Normalize the data, this should be done if there are large scale differences in the dataset
X = rescale(data, dims=1);

Y = tsne(X, 2, 50, 1000, 20.0);

using Plots
theplot = scatter(Y[:,1], Y[:,2], marker=(2,2,:auto,stroke(0)), color=Int.(allabels[1:size(Y,1)]))
Plots.pdf(theplot, "myplot.pdf")

Command line usage

julia demo-csv.jl haveheader --labelcol=5 iris-headers.csv

Creates myplot.pdf with t-SNE result visualized using Gadfly.jl.

See also