This package provides a function
rand_binomial! to produce
CuArrays with binomially distributed entries, analogous to
CUDA.rand_poisson! for Poisson-distributed ones.
Use the built-in package manager:
import Pkg; Pkg.add("BinomialGPU")
CuArrays with binomial random variates in-place:
using CUDA, BinomialGPU A = CUDA.zeros(Int, 16) rand_binomial!(A, count = 10, prob = 0.5)
The function currently also supports broadcast over arrays of parameters of the same size as the one to be filled:
A = CUDA.zeros(Int, 8) counts = [1,2,4,8,16,32,64,128] probs = CUDA.rand(8) rand_binomial!(A, count = counts, prob = probs)
as well as broadcasts over arrays of parameters whose dimensions are a prefix of the dimensions of A, e.g.
A = CUDA.zeros(Int, (2, 4, 8)) counts = rand(1:128, 2, 4) probs = CUDA.rand(2) rand_binomial!(A, count = counts, prob = probs)
- The speed is slower when using optimal thread allocation than when defaulting to 256 threads. See issue #2
- Are there any other samplers that are comparably fast or faster? I compared the following: sample an array of size
count = 128and
(1024, 1024)with uniformly drawn entries. Timings on an RTX2070 card: BinomialGPU.jl 0.8ms, PyTorch 11ms, CuPy 18ms, tensorflow 400ms. Timings for other samplers are very welcome; please open an issue if you find one.