OpenCL.jl
OpenCL bindings for Julia
Julia interface for the OpenCL parallel computation API
This package aims to be a complete solution for OpenCL programming in Julia, similar in scope to PyOpenCL for Python. It provides a high level api for OpenCL to make programing GPU's and multicore CPU's much less onerous.
OpenCL.jl provides access to OpenCL API versions 1.0, 1.1, 1.2 and 2.0.
This package is based off the work of others:
- PyOpenCL by Andreas Klockner
- oclpb by Sean Ross
- Boost.Compute by Kyle Lutz
- rust-opencl
OpenCL.jl has had contributions from many developers.
Currently supported Julia versions
- Julia
v"0.4.x"
is supported on therelease-0.4
branch and the OpenCL.jl versionsv"0.4.x"
. Only bug-fixes will be applied. - Julia
v"0.5.x"
is supported on themaster
branch and the OpenCL.jl versionsv"0.5.x"
. - Julia
v"0.6.x"
is experimentally supported on themaster
branch and the OpenCL.jl versionsv"0.5.x"
.
Discontinued support
- Julia
v"0.3.x"
was supported on OpenCL.jl versionsv"0.3.x"
. It should still be installable and work.
Setup
- Install an OpenCL driver. If you use OSX, OpenCL is already available
- Checkout the packages from the Julia repl
Pkg.add("OpenCL")
- OpenCL will be installed in your
.julia
directory cd
into your.julia
directory to run the tests and try out the examples- To update to the latest development version, from the Julia repl:
Pkg.update()
IJulia Notebooks
Quick Example
using LinearAlgebra
using OpenCL
const sum_kernel = "
__kernel void sum(__global const float *a,
__global const float *b,
__global float *c)
{
int gid = get_global_id(0);
c[gid] = a[gid] + b[gid];
}
"
a = rand(Float32, 50_000)
b = rand(Float32, 50_000)
device, ctx, queue = cl.create_compute_context()
a_buff = cl.Buffer(Float32, ctx, (:r, :copy), hostbuf=a)
b_buff = cl.Buffer(Float32, ctx, (:r, :copy), hostbuf=b)
c_buff = cl.Buffer(Float32, ctx, :w, length(a))
p = cl.Program(ctx, source=sum_kernel) |> cl.build!
k = cl.Kernel(p, "sum")
queue(k, size(a), nothing, a_buff, b_buff, c_buff)
r = cl.read(queue, c_buff)
if isapprox(norm(r - (a+b)), zero(Float32))
@info "Success!"
else
@error "Norm should be 0.0f"
end
Translation
Here's a rough translation between the OpenCL API in C to this Julia version. Optional arguments are indicated by [name?]
(see clCreateBuffer
, for example). For a quick reference to the C version, see the Khronos quick reference card.
Platform and Devices
C | Julia | Notes |
---|---|---|
clGetPlatformIDs |
cl.platforms() |
|
clGetPlatformInfo |
cl.info(platform, :symbol) |
Platform info: :profile , :version , :name , :vendor , :extensions |
clGetDeviceIDs |
cl.devices() , cl.devices(platform) , cl.devices(:type) |
Device types: :all , :cpu , :gpu , :accelerator , :custom , :default |
clGetDeviceInfo |
cl.info(device, :symbol) |
Device info: :driver_version , :version , :profile , :extensions , :platform , :name , :device_type , :has_image_support , :queue_properties , :has_queue_out_of_order_exec , :has_queue_profiling , :has_native_kernel , :vendor_id , :max_compute_units , :max_work_item_size , :max_clock_frequency , :address_bits , :max_read_image_args , :max_write_image_args , :global_mem_size , :max_mem_alloc_size , :max_const_buffer_size , :local_mem_size , :has_local_mem , :host_unified_memory , :available , :compiler_available , :max_work_group_size , :max_work_item_dims , :max_parameter_size , :profiling_timer_resolution , :max_image2d_shape , :max_image3d_shape |
clCreateContext |
cl.context(queue) , cl.context(CLMemObject), cl.context(CLArray)` |
|
clReleaeContext |
cl.release! |
Buffers
C | Julia | Notes |
---|---|---|
clCreateBuffer |
cl.Buffer(type, context, [length?]; [hostbuf?]) , cl.Buffer(type, context, flags, [length?]; [hostbuf?]) |
Memory flags: :rw , :r , :w , :use , :alloc , :copy |
clEnqueueCopyBuffer |
cl.copy!(queue, destination, source) |
|
clEnqueueFillBuffer |
cl.enqueue_fill_buffer(queue, buffer, pattern, offset, nbytesm wait_for) |
|
clEnqueueReadBuffer |
cl.enqueue_read_buffer(queue, buffer, hostbuf, dev_offset, wait_for, is_blocking) |
|
clEnqueueWriteBuffer |
cl.enqueue_write_buffer(queue, buffer, hostbuf, byte_count, offset, wait_for, is_blocking) |
Program Objects
C | Julia | Notes |
---|---|---|
clCreateProgramWithSource |
cl.Program(ctx; source) |
|
clCreateProgramWithBinaries |
cl.Program(ctx; binaries) |
|
clReleaseProgram |
cl.release! |
|
clBuildProgram |
cl.build!(progrm, options) |
|
clGetProgramInfo |
cl.info(program, :symbol) |
Program info: :reference_count , :devices , :context , :num_devices , :source , :binaries , :build_log , :build_status |
Kernel and Event Objects
C | Julia | Notes |
---|---|---|
clCreateKernel |
cl.Kernel(program, "kernel_name") |
|
clGetKernelInfo |
cl.info(kernel, :symbol) |
Kernel info: :name , :num_args , :reference_count , :program , :attributes |
clEnqueueNDRangeKernel |
cl.enqueue_kernel(queue, kernel, global_work_size) , cl.enqueue_kernel(queue, kernel, global_work_size, local_work_size; global_work_offset, wait_on) |
|
clSetKernelArg |
cl.set_arg!(kernel, idx, arg) |
idx starts at 1 |
clCreateUserEvent |
cl.UserEvent(ctx; retain) |
|
clGetEventInfo |
cl.info(event, :symbol) |
Event info: :context , :command_queue , :reference_count , :command_type , :status , :profile_start , :profile_end , :profile_queued , :profile_submit , :profile_duration |
clWaitForEvents |
cl.wait(event) , cl.wait(events) |
|
clEnqueueMarkerWithWaitList |
cl.enqueue_marker_with_wait_list(queue, wait_for) |
|
clEnqueueBarrierWithWaitList |
cl.enqueue_barrier_with_wait_list(queue, wait_for) |