This is the code for the paper
Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. LSQ++: Lower running time and higher recall in multi-codebook quantization. In ECCV 2018. [CVF].
Rayuela.jl is a package that implements non-orthogonal multi-codebook quantization methods (MCQ). These methods are useful for fast search of high-dimensional (d=100s-1000s) dense vectors. Rayuela is written in Julia.
This is not a production-ready library -- if you are looking for something
like that, maybe look at faiss.
Do note that the methods implemented by faiss and Rayuela.jl are almost entire orthogonal,
and that the libraries are distributed under different licenses
as of May 27 2019, FAISS is MIT licensed.
Rayuela implements the main contributions that I made to this problem during my PhD at UBC, as well as multiple baselines for MCQ. The package is my attempt to make my research reproducible and accessible, and to make it easier for other people, specially newcomers, to contribute to this field, where lack of reproducibility is a major barrier of entry IMO.
I originally intended to incorporate these contributions on top of faiss (see faiss/#185), but I soon realized that doing so would considerably delay the completion of my PhD. Julia is also more accessible (albeit a bit less performant) to quickly try and test new research ideas. In the future, savvier C++ programmers may port the most useful methods to faiss.
This package is written in Julia 1.0, with some extension in C++ and CUDA. You also need a CUDA-ready GPU. We have tested this code on an Nvidia Titan Xp GPU.
Before all else, make sure that you have the
g++ compiler available from the command line, and the
nvcc compiler availible at path
Then, open julia and type
] to enter Package mode:
julia> (v1.0) pkg>
Now you can clone our repo:
(v1.0) pkg> develop https://github.com/una-dinosauria/Rayuela.jl.git
This should put our code under
Due to an open bug with the package manager, you have to manually pull the latest changes:
cd ~/.julia/dev/Rayuela git pull
Demo and data
You may explore the library with
SIFT1M, a classical retrieval dataset:
mkdir data cd data wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz tar -xvzf sift.tar.gz rm sift.tar.gz cd ..
Also make a directory for the results
mkdir -p results/sift1m
Finally, run the demo:
For query/base/protocol (example by default runs on SIFT1M), or
For query/base protocol (example by default runs on LabelMe22K)
This will showcase PQ, OPQ, RVQ, ERVQ, ChainQ and LSQ/LSQ++ (SR-C and SR-D).
The rest of the datasets used in our ECCV'18 publication can be found on gdrive.
- Product Quantization -- TPAMI'11
- Optimized Product Quantization / Cartesian K-means. CVPR'13, CVPR'13, TPAMI'14
- Tree Quantization1 -- CVPR'15
- Residual Vector Quantization -- Sensors'10
- Stacked Quantizers (aka Enhanced Residual Vector Quantization) -- arxiv/CBMI'14 (paywalled)
- Local Search Quantization -- ECCV'16
- Local Search Quantization++ -- ECCV'18
- Competitive Quantization -- TKDE'16
- Recall evaluation code
Things I'd like to get around implementing / porting / wrapping some day (PRs are welcome!)
- Inverted Index -- TPAMI'11, implemented in faiss
- Inverted Multi-index -- CPVR'12, implemented in faiss
- Locally optimized product quantization CVPR'14, code, project page
- Non-orthogonal multi-index -- CVPR'16, code, project page
- Bolt -- KDD'17, code
- Composite Quantization -- ICML'14, original code (released late 2017, written in c++ with Microsoft's extensions)
TODO (no code, low priority)
I would like to implement these methods. Some of them report really good results but, to the best of my knowledge, the authors have never released code. Also, my time is not infinite so ¯\_(ツ)_/¯
- Sparse Composite Quantization -- CVPR'15
- Tree Quantization with Gurobi optimization -- CVPR'15
- Joint K-means quantization -- ICPR'16 (pay-walled)
- Pyramid encoding quantization -- EUSIPCO'17
- Arborescence coding -- ICCV'17
If you find this code useful, consider citing our work:
Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. "LSQ++: Lower running time and higher recall in multi-codebook quantization", ECCV 2018.
Julieta Martinez. "Algorithms for Large-Scale Multi-Codebook Quantization". PhD thesis, 2018. (Coming soon)
As well as the corresponding paper of the method that you are using (see above).
1: The original implementation of Tree Quantization requires a mixed-integer programming solver such as Gurobi for updating the codebooks. We implement a special version of TQ that always create a chain (not a general tree); thus encoding can be done with the Viterbi algorithm, and codebook update is simpler and faster. This method should have been a baseline in the TQ paper IMO.