PartialSvdStoch

This package provides approximate partial SVD using stochastic methods.

It implements two SVD related algorithms, the real purpose of the package being in fact the Shamir algorithm described in the first item.

The algorithms are the following:

The paper by Ohad Shamir: Fast Stochastic Algorithms for SVD and PCA : convergences Properties and Convexity(2015).

The algorithm combines a stochastic gradient approach and iterative techniques.
It requires a correct initialization of the left singular vectors obtained here using the impressive LowRankApprox package if the rank is imposed or the algorithm descrided below in the second item if we search the rank for a target precision. It then provides in a few iterations a further 25%-50% diminution of the relative error in the Froebonius norm. This can correspond to a reduction by up to a factor of 4 of the rank necessary to obtain the same accuracy by any of the 2 initializations possible. (Cf by example the test vrpca_epsil)

The combination of the Shamir algorithm with the initializing algorithms mentionned above provide accurate truncated SVD or range approximation for a good performance compromise and can thus save computation in further processing.

The chapter on fast incremental Monte-Carlo Svd from Spectral Algorithms. S. Vempala and R. Kannan(2009) (chapter 7 P 85 and Th 7.2 7.3).

The algorithm relies on an iterative decomposition of the range of the data matrix. Each iteration does successive sampling of columns data vectors proportionally to their L2-norm, construct a range approximation, and substract the current range approximation before next iteration.
Due to its incremental construction it is also possible to ask for an approximation at a given precision without asking for a given rank and let the computation give the rank as an output.

It is not as fast as the LowRankApprox package but can provide an alternative depending on the data structure. Run times are about 4s for a (10000, 20000) matrix and a rank=100 approximation with 2 iterations on a 2 core laptop and 8 threads (See test ifsvd_lowRankApproxEpsil). On the same matrix, it runs in 14s if we search the rank giving a 0.05 approximation.

Testing

In the directory test are some tests providing examples of the different uses including timing and precision comparisons with matrix sizes up to (10000, 20000).
Tests can be run going in the test directory and just execute include("runtest.jl") in the Julia REPL.
The Html documentation is generated by executing julia make.jl in the doc directory from a shell.

License

Licensed under either of

Apache License, Version 2.0, LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0
MIT license LICENSE-MIT or http://opensource.org/licenses/MIT

at your option.

This software was written on my own while working at CEA, CEA-LIST