WGPUCompute

Array interface for WGPU

Work in progress

using BenchmarkTools
using WGPUCompute

# Allocate an uninitialized GPU array and another initialized from random host data.
aArray = WgpuArray{Float32}(undef, (1024, 1024, 100))
bArray = WgpuArray{Float32}(rand(Float32, (1024, 1024, 100)))

# Benchmark a full device-to-device copy.
@benchmark copyto!(aArray, 1, bArray, 1, prod(size(aArray)))
BenchmarkTools.Trial: 403 samples with 1 evaluation.
 Range (min … max):  30.041 μs … 15.397 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.710 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.424 ms ±  4.124 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄                                                   ▇█
  █▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁████▇▅▆ ▆
  30 μs        Histogram: log(frequency) by time      15.3 ms <

 Memory estimate: 3.06 KiB, allocs estimate: 96.
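
The copy can be verified on the host. A minimal sketch, assuming `collect` performs a device-to-host transfer for a WgpuArray (as is conventional for Julia GPU array types):

@assert collect(aArray) == collect(bArray)  # both arrays should now hold identical data
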
julia> using WGPUCompute

julia> y = WgpuArray((rand(4, 4, 1) .- 0.5) .|> Float32)
4×4×1 WgpuArray{Float32, 3}:
[:, :, 1] =
 -0.383893   0.16837     0.0140184   0.199563
 -0.336961  -0.192162   -0.179518   -0.0335313
  0.444875   0.0344275  -0.100446    0.498892
 -0.354451  -0.488507   -0.078437   -0.132585

julia> @kernel function Relu(x::WgpuArray{T, N}) where {T, N}
               gIdx = globalId.x * globalId.y + globalId.z  # temporary index scheme (see Issues below)
               value = x[gIdx]
               out[gIdx] = max(value, 0.0)  # `out` is the implicit output buffer
       end
Relu (generic function with 1 method)

julia> Relu(y)
4×4×1 WgpuArray{Float32, 3}:
[:, :, 1] =
 0.0       0.16837    0.0140184  0.199563
 0.0       0.0        0.0        0.0
 0.444875  0.0344275  0.0        0.498892
 0.0       0.0        0.0        0.0
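
We can check the result against a CPU ReLU. A minimal sketch, again assuming `collect` transfers a WgpuArray back to the host:

julia> @assert collect(Relu(y)) == max.(collect(y), 0f0)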

> Internally, a compute shader like the following is generated for Relu:
 ┌ Info: struct IOArray {
 │     data:array<f32>
 │ };
 │
 │ @group(0) @binding(0) var<storage, read_write> input0:IOArray ;
 │ @group(0) @binding(1) var<storage, read_write> ouput1:IOArray ;
 │ @compute @workgroup_size(8, 8, 4)
 │ fn Relu(@builtin(global_invocation_id) global_id:vec3<u32>) {
 │     let gIdx = global_id.x * global_id.y+global_id.z;
 │     let value = input0.data[gIdx];
 │     ouput1.data[gIdx] = max(value, 0.0);
 │ }
 └

Issues in the example above:

  • globalDim is not supported yet; as a temporary fix, global_id.y is used in the index computation (see the sketch after this list).
  • In the generated shader code, @workgroup_size is hardcoded.
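
For reference, once globalDim (the dispatch extent along each axis) is exposed, a flat index over a 3D dispatch would typically be computed as in the sketch below. This assumes a globalDim builtin that does not exist yet and is not the package's current API:

@kernel function Relu(x::WgpuArray{T, N}) where {T, N}
    # Row-major flattening of the 3D invocation id; `globalDim` is assumed
    # to hold the total number of invocations along each axis.
    gIdx = globalId.x + globalDim.x * (globalId.y + globalDim.y * globalId.z)
    value = x[gIdx]
    out[gIdx] = max(value, 0.0)  # `out` is still the hardcoded output buffer (see Known issues)
end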

Known issues

  • The @kernel macro currently has hardcoded sections and expects :Relu as the main function name; this is being worked on.
  • There is no API yet for passing @workgroup_size and @dispatch_size; the example above uses hardcoded values.
  • Jupyter notebooks are not supported yet because of soft-scope issues.

TODO

  • Fix known issues.
  • Explore serving the generated WGSL code in a web app via JSServe.
  • Complete the SPIR-V version.
  • Eventually explore and adopt binary shader generation.
