## SparseVectors.jl

A Julia package for sparse vectors.
Popularity
10 Stars
Updated Last
10 Months Ago
Started In
May 2015

# SparseVectors

A Julia package to support sparse vectors. ## Overview

Sparse data has become increasingly common in machine learning and related areas. For example, in document analysis, each document is often represented as a sparse vector, which each entry represents the number of occurrences of a certain word. However, the support of sparse vectors remains quite limited in Julia base.

This package provides two types `SparseVector` and `SparseVectorView` and a series of methods to work with sparse vectors. Specifically, this package provides the following functionalities:

• Construction of sparse vectors, either with given non-zero entries or randomly.
• Get a view of a column in a sparse matrix (of CSC format), or a view of a range of columns.
• Specialized arithmetic functions on sparse vectors, e.g. `+`, `-`, `*`, etc.
• Specialized math functions on sparse vectors, e.g. `abs`, `abs2`, `exp`, `sin`, etc.
• Specialized reduction functions on sparse vectors, e.g. `sum`, `vecnorm`, etc.
• Specialized linear algebraic functions, e.g. `axpy!`, `dot`, `A * x`, `At_mul_B`, etc.

Note: Many of the functionalities implemented in this package may be migrated to Julia Base in `v0.5` development cycle.

## Types

This package defines two types.

• `SparseVector`: a sparse vector that owns its memory
• `SparseVectorView`: a view of external data as a sparse vector.

The formal definition of these types are listed below:

```immutable SparseVector{Tv,Ti<:Integer} <: AbstractSparseVector{Tv,Ti}
n::Int              # the number of elements
nzind::Vector{Ti}   # the indices of nonzeros
nzval::Vector{Tv}   # the values of nonzeros
end

typealias CVecView{T} ContiguousView{T,1,Vector{T}}

immutable SparseVectorView{Tv,Ti<:Integer} <: AbstractSparseVector{Tv,Ti}
n::Int                  # the number of elements
nzind::CVecView{Ti}     # the indices of nonzeros
nzval::CVecView{Tv}     # the values of nonzeros
end```

## Constructors

An instance of `SparseVector` can be constructed as follows:

`SparseVector(n, nzind, nzval)  # constructs an instance by providing all fields`

Here, all inputs will be used as fields as they are. The constructor will ensure that `length(nzind) == length(nzval)`. However, it will NOT examine the elements of `nzind` (e.g. the indexes are sorted, without duplication, and within the range `1:n`).

The package also provides a `sparsevector` function to construct a sparse vector in a variety of ways:

#### zero sparse vector

```# construct a zerp sparse vector of length `len` and element type `T`
sparsevector(T, len)  ```

#### with given lists of non-zero entries

The following methods construct a sparse vector of length `len`, with:

• non-zero indices, given by an integer vector `I`
• non-zero values, given by `V` (either a vector or a number)

Note:

• When `len` is omitted, it is determined by `maximum(I)`.
• Multiple values are allowed to be corresponding to the same index. These values are combined with a binary function/functor `combine`. When it is omitted, `AddFun` is used by default, meaning that the values are summed.
```sparsevector(I, V, len, combine)
sparsevector(I, V, combine)
sparsevector(I, V, len)
sparsevector(I, V)```

#### from an associative collection (e.g. Dict)

```sparsevector(a, len)  # a is an instance of `Associative{Ti<:Integer, Tv}`
sparsevector(a)  # length inferred as the maximum index```

#### random sparse vector

```sprand(n, p)       # construct a random sparse vector with length n and density p
# the non-zero values are generated using rand(nnz)

sprand(n, p, T)    # construct a random sparse vector of element type T
# the non-zero values are generated using rand(T, nnz)

sprandn(n, p)      # construct a random sparse vector,
# where values follow standard Normal distribution

sprand(n, p, rfn)  # construct a random sparse vector,
# where the non-zero values are generated using rfn```

## Basic methods

Like other array types, `SparseVector` and `SparseVectorView` support all the basic methods for arrays:

```eltype(x)   # get the element type
ndims(x)    # get the number of dimensions (1)
length(x)   # get the length
size(x)     # get the size, i.e. (length(x),)

x[I]        # getindex with index(es) I,
# where I can be either an integer or an integer array
x[i] = v    # set the i-th element of x to v```

They also provide methods for extracting internal data structures:

```nnz(x)          # the number of stored entries
countnz(x)      # count the actual number of nonzero entries
nonzeroinds(x)  # get the vector of indices of non-zero values
nonzeros(x)     # get the vector of non-zero values```

## Conversion

The package supports conversion between sparse vectors and other types of arrays.

```convert(SparseVector, s)  # convert s to an instance of SparseVector
# s can be an instance of Vector or SparseMatrixCSC

convert(SparseVector{Tv}, s)  # convert the element-type of s to Tv
# where s is an instance of SparseVector

convert(SparseVector{Tv,Ti}, s)  # convert the element-type of s to Tv,
# and the index-type of s to Ti,
# where s is an instance of SparseVector

convert(SparseMatrixCSC, v)  # convert a sparse vector v to a sparse matrix
# with a single column

convert(SparseMatrixCSC{Tv}, v)
convert(SparseMatrixCSC{Tv,Ti}, v)```

## Views

The package provides methods to obtain views of sparse vectors or individual columns of a sparse matrix:

```view(x)   # construct a SparseVectorView instance as a view of x
# where x is an instance of SparseVector

view(A, :, i)   # construct a view of the i-th column of X
# where X is an instance of SparseMatrixCSC
# returns a instance of SparseVectorView

getcol(A, i)   # get a copy of the i-th column of a sparse matrix A
# returns an instance of SparseVector

unsafe_colrange(A, i1:i2)  # construct an unsafe view of a range of columns
# i.e. from the i1-th to i2-th column.
# returns an instance of SparseMatrixCSC```

Note: `unsafe_colrange` uses `pointer_to_array` to obtain the internal vectors, and therefore the returned array should only be used within the local scope.

## Math Functions

The package implement a number of specialized math functions for sparse vectors.

#### Arithmetics

```scale!(x, c)   # x <- x * c, where c is a scalar
scale!(c, x)   # i.e. scale!(x, c)
scale(x, c)    # returns x * c
scale(c, x)    # i.e. scale(x, c)

x * c, x .* c  # multiple x and a scalar c
c * x, c .* x  # i.e. x * c

- x            # negate x
x + y, x .+ y  # add x and y, x and y can be either dense or sparse
x - y, x .- y  # subtract y from x, x and y can be either dense or sparse
x .* y         # multiply x and y (element-wise),
# x and y can be either dense or sparse

axpy!(a, x, y)  # y <- y + a * x
# a: a scalar number
# x: a sparse vector
# y: a dense vector```

#### Element-wise math functions

```# Input: (sparse, sparse) --> Output: sparse
# Input: (sparse, dense)  --> Output: dense
# Input: (dense, sparse)  --> Output: sparse

max(x, y), min(x, y)
complex(x, y)

# zero-preserving functions
#   Input: sparse --> Output: sparse

abs(x), abs2(x)
real(x), imag(x), conj(x)
floor(x), ceil(x), trunc(x), round(x)
log1p(x), expm1(x),
sin(x), tan(x), sinpi(x), sind(x), tand(x),
asin(x), atan(x), asind(x), atand(x),
sinh(x), tanh(x), asinh(x), atanh(x)

# Non-zero-preserving functions
#   Input: sparse --> Output: dense

exp(x), exp2(x), exp10(x), log(x), log2(x), log10(x),
cos(x), csc(x), cot(x), sec(x), cospi(x),
cosd(x), cscd(x), cotd(x), secd(x),
acos(x), acot(x), acosd(x), acotd(x),
cosh(x), csch(x), coth(x), sech(x),
acsch(x), asech(x)```

## Reduction

```sum(x)      # Compute the sum of elements
sumabs(x)   # Compute the sum of absolute values
sumabs2(x)  # Compute the sum of squared absolute values
maximum(x)  # Compute the maximum of elements (including zeros)
minimum(x)  # Compute the minimum of elements (including zeros)
maxabs(x)   # Compute the maximum of absolute values
minabs(x)   # Compute the minimum of absolute values

vecnorm(x, p=2)  # Compute the p-th order vector-norm

dot(x, y)   # Compute the dot product between x and y
# x and y can be either dense or sparse vectors```

## Linear Algebra: Matrix-vector product

```# Note: the product is dense iff A is dense
A * x             # A * x (matrix-vector product)
At_mul_B(A, x)    # A.' * x, without explicitly transposing A
Ac_mul_B(A, x)    # A' * x, without explicitly conjugate-transposing A

# If you want to get a dense result even when both A and x are sparse
# then you can write:
densemv(A, x)               # A * x --> dense vector
densemv(A, x; trans='N')    # A * x, as above
densemv(A, x; trans='T')    # transpose(A) * x -> dense vector
densemv(A, x; trans='C')    # ctranspose(A) * x -> dense vector

# Note: the following functions are only for cases where y is a strided vector
A_mul_B!(y, A, x)         # y <- A * x
A_mul_B!(a, A, x, b, y)   # y <- a * A * x + b * y
At_mul_B(y, A, x)         # y <- A.' * x
At_mul_B!(a, A, x, b, y)  # y <- a * A.' * x + b * y
Ac_mul_B(y, A, x)         # y <- A' * x
Ac_mul_B!(a, A, x, b, y)  # y <- a * A' * x + b * y```