CellFishing.jl

Fast and scalable cell search tool
Popularity
32 Stars
Updated Last
5 Months Ago
Started In
June 2018

CellFishing.jl ๐ŸŽฃ

DOI

CellFishing.jl (cell finder via hashing) is a tool to find similar cells of query cells based on their transcriptome expression profiles.

Kenta Sato, Koki Tsuyuzaki, Kentaro Shimizu, and Itoshi Nikaido. "CellFishing.jl: an ultrafast and scalable cell search method for single-cell RNA sequencing." Genome Biology, 2019 20:31. https://doi.org/10.1186/s13059-019-1639-x

# Import packages.
using CellFishing
using TableReader

# Load expression profiles of database cells.
# Note: We highly recommend using the Loom format (http://loompy.org/) to
# load expression data, because loading a large matrix in plain text takes
# extremely long time.
data = readtsv("database.txt")  # use readcsv if your file is comma-separated
cellnames = string.(names(data))
featurenames = string.(data[:,1])
counts = Matrix{Int}(data[:,2:end])

# Select features and create an index (or a database).
features = CellFishing.selectfeatures(counts, featurenames)
database = CellFishing.CellIndex(counts, features, metadata=cellnames)

# Save/load the database to/from a file (optional).
# CellFishing.save("database.cf", database)
# database = CellFishing.load("database.cf")

# Load expression profiles of query cells.
data = readtsv("query.txt")
cellnames = string.(names(data))
featurenames = string.(data[:,1])
counts = Matrix{Int}(data[:,2:end])

# Search the database for similar cells; k cells will be returned per query.
k = 10
neighbors = CellFishing.findneighbors(k, counts, featurenames, database)

# Write the neighboring cells to a file.
open("neighbors.tsv", "w") do file
    println(file, join(["cell"; string.("n", 1:k)], '\t'))
    for j in 1:length(cellnames)
        print(file, cellnames[j])
        for i in 1:k
            print(file, '\t', database.metadata[neighbors.indexes[i,j]])
        end
        println(file)
    end
end

Installation

First of all, you need to install a Julia compiler. A recommended way is to download a pre-built binary of Julia. The pre-built binaries for several major platforms are distributed at https://julialang.org/downloads/. Currently, CellFishing.jl supports Julia 1.0 or later.

Then, install CellFishing.jl with the following command:

$ julia -e 'using Pkg; Pkg.add(PackageSpec(url="git://github.com/bicycle1885/CellFishing.jl.git"))'

Alternatively, you can use the add command in the package management mode of Julia:

(v1.0) pkg> add git@github.com:bicycle1885/CellFishing.jl.git

To check the installation, you can try using CellFishing in your REPL:

$ julia
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.0 (2018-08-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CellFishing  # load the package
[ Info: Precompiling CellFishing [5ab3512e-c64d-48f6-b1c0-509c1121fdda]

julia>

No error messages mean you have successfully installed CellFishing.jl.

To run unit tests, execute the following command:

$ julia -e 'using Pkg; Pkg.test("CellFishing")'

Command-line interface (WIP)

The bin/cellfishing script is a command-line interface to CellFishing.jl.

$ ./bin/cellfishing build Plass2018.dge.loom
Build a search database from Plass2018.dge.loom.
  Loading data โ€•โ€•โ€•โ€•โ€•โ€•โ€•โ€•โ€•โ€•โ€•โ€• 13 seconds, 173 milliseconds
  Selecting features โ€•โ€•โ€•โ€•โ€•โ€• 1 second, 376 milliseconds
  Creating a database โ€•โ€•โ€•โ€•โ€• 16 seconds, 418 milliseconds
  Writing the database โ€•โ€•โ€•โ€• 659 milliseconds
The serialized database is in Plass2018.dge.loom.cf.
$ ./bin/cellfishing search Plass2018.dge.loom.cf Plass2018.dge.loom >neighbors.tsv
Search Plass2018.dge.loom.cf for 10 neighbors.
  Loading the database โ€•โ€•โ€•โ€• 512 milliseconds
  Loading query data โ€•โ€•โ€•โ€•โ€•โ€• 12 seconds, 960 milliseconds
  Searching the database โ€•โ€• 31 seconds, 821 milliseconds
  Writing neighbors โ€•โ€•โ€•โ€•โ€•โ€•โ€• 64 milliseconds
$ head -5 neighbors.tsv | cut -f1-3
plan1_GACTTTCTCTTC      plan1_GACTTTCTCTTC      h2b_TTTTGCTACGGG
plan1_GTAAGGCGACAN      plan1_GTAAGGCGACAN      gfp_ATTCCTAGCGAT
plan1_TGGCCCAGCTGC      plan1_TGGCCCAGCTGC      plan1_GACTTTCTCTTC
plan1_CTCCTGTAATTT      plan1_CTCCTGTAATTT      plan1_ATCCTCCATTAA
plan1_ATGACGCATAAT      plan1_ATGACGCATAAT      plan1_TACTTGACGGTA