CuCountMap.jl

Fast `StatsBase.countmap` for small types on the GPU via CUDA.jl
Started in June 2020

CuCountMap

`cucountmap` is a faster equivalent of `StatsBase.countmap` that runs on the GPU via CUDA.jl. It supports `Vector{T}` where `isbitstype(T)` and `sizeof(T) <= 2` (for example `Int8` or `Int16`).
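The size restriction means an element has at most 2^16 = 65536 distinct bit patterns, so every possible value can have its own counter in a small fixed-size table. The following is a minimal CPU-only sketch of that idea, not the package's actual GPU implementation; the function name `bucket_countmap` is hypothetical, and the sketch has only been reasoned through for primitive integer types.

```julia
# Hypothetical CPU sketch: count the values of a small isbits type by using
# each value's bit pattern as an index into a fixed-size table.
function bucket_countmap(v::AbstractVector{T}) where {T}
    isbitstype(T) && 1 <= sizeof(T) <= 2 ||
        throw(ArgumentError("T must be an isbits type of 1 or 2 bytes"))
    U = sizeof(T) == 1 ? UInt8 : UInt16        # unsigned type of the same width
    counts = zeros(Int, 2^(8 * sizeof(T)))     # one bucket per possible bit pattern
    for x in v
        counts[Int(reinterpret(U, x)) + 1] += 1  # +1 because arrays are 1-based
    end
    # Keep only the values that actually occurred.
    result = Dict{T,Int}()
    for (i, c) in enumerate(counts)
        c > 0 && (result[reinterpret(T, U(i - 1))] = c)
    end
    return result
end

counts = bucket_countmap(Int16[1, 1, -2, 7, 1])
println(counts[Int16(1)])   # number of times 1 occurs
```

With a table this small, a GPU version could plausibly let many threads increment the buckets in parallel, which is one reason a bound on `sizeof(T)` is convenient.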

Usage

using CuCountMap

v = rand(Int16, 1_000_000)

cucountmap(v) # converts v to cu(v) and then runs countmap

using CUDA: cu
using StatsBase: countmap

cuv = cu(v)
countmap(cuv) # StatsBase.countmap is overloaded for CuArrays
Dict{Int16,Int64} with 65536 entries:
  -13838 => 17
  22035  => 12
  -15285 => 19
  -13843 => 12
  -18190 => 19
  -20309 => 11
  19698  => 11
  -8633  => 20
  -17455 => 12
  -16936 => 22
  29981  => 14
  -20376 => 15
  7237   => 20
  -27415 => 10
  17959  => 17
  27248  => 17
  -32758 => 17
  -13400 => 17
  5784   => 10
  ⋮      => ⋮

Example & Benchmarks

using CUDA
using CuCountMap
using StatsBase: countmap

v = rand(Int16, 10_000_000);

using BenchmarkTools

cpu_to_gpu_benchmark = @benchmark gpu_countmap = cucountmap($v)
BenchmarkTools.Trial: 
  memory estimate:  4.17 MiB
  allocs estimate:  76
  --------------
  minimum time:     4.751 ms (0.00% GC)
  median time:      4.974 ms (0.00% GC)
  mean time:        5.320 ms (3.50% GC)
  maximum time:     14.950 ms (55.27% GC)
  --------------
  samples:          940
  evals/sample:     1

cpu_to_cpu_benchmark = @benchmark cpu_countmap = countmap($v)
BenchmarkTools.Trial: 
  memory estimate:  4.17 MiB
  allocs estimate:  37
  --------------
  minimum time:     14.915 ms (0.00% GC)
  median time:      15.344 ms (0.00% GC)
  mean time:        15.670 ms (1.06% GC)
  maximum time:     22.093 ms (28.90% GC)
  --------------
  samples:          320
  evals/sample:     1

cuv = CUDA.cu(v)
gpu_to_gpu_benchmark = @benchmark gpu_countmap2 = countmap($cuv)
BenchmarkTools.Trial: 
  memory estimate:  4.17 MiB
  allocs estimate:  64
  --------------
  minimum time:     2.512 ms (0.00% GC)
  median time:      2.692 ms (0.00% GC)
  mean time:        2.984 ms (5.91% GC)
  maximum time:     17.421 ms (73.12% GC)
  --------------
  samples:          1675
  evals/sample:     1
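To make the comparison concrete, the speedups can be computed from the median times reported above. This is a small arithmetic sketch; the variable names are illustrative, and the ratios only describe the machine that produced these numbers.

```julia
# Median times in milliseconds, copied from the benchmark output above.
median_ms = (
    cpu_countmap        = 15.344,  # countmap(v) on the CPU
    cucountmap_from_cpu = 4.974,   # cucountmap(v), input starts on the CPU
    countmap_on_gpu     = 2.692,   # countmap(cuv), input already on the GPU
)

speedup_with_transfer = median_ms.cpu_countmap / median_ms.cucountmap_from_cpu
speedup_on_device     = median_ms.cpu_countmap / median_ms.countmap_on_gpu

println("cucountmap(v) vs countmap(v): ", round(speedup_with_transfer; digits = 1), "x")
println("countmap(cuv) vs countmap(v): ", round(speedup_on_device; digits = 1), "x")
```

On these numbers the GPU path is roughly 3x faster when the input starts on the CPU and a little under 6x faster when the data is already on the device. The gap of about 2.3 ms between the two GPU cases is presumably dominated by copying the vector to the GPU.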

Benchmark Plot

using Plots
using Statistics: mean

# BenchmarkTools records times in nanoseconds; convert the means to milliseconds
cpu_to_gpu = mean(cpu_to_gpu_benchmark.times) / 1e6
gpu_to_gpu = mean(gpu_to_gpu_benchmark.times) / 1e6
cpu_to_cpu = mean(cpu_to_cpu_benchmark.times) / 1e6

# Bar labels name the call that was actually benchmarked in each case
plot(
    ["CPU array on CPU \n countmap(v)",
     "CPU array copied to GPU \n cucountmap(v)",
     "GPU array on GPU \n countmap(cuv)"],
    [cpu_to_cpu, cpu_to_gpu, gpu_to_gpu],
    seriestype = :bar, title = "CuCountMap.cucountmap vs StatsBase.countmap",
    label = "ms", legendtitle = "Mean time",
)
