From VFXPedia

OpenCL Fuse Reference Manual

OpenCL for Fuses

Write one or more OpenCL "kernels": functions that are called repeatedly, usually once per pixel. Store them in a string constant and compile it with BuildCachedProgram() in Create().
Source Images or buffers can be uploaded to the OpenCL host with CreateImage() or CreateBuffer().
An empty Image or other buffer should be passed to the OpenCL host using CreateImage() or CreateBuffer() with the "write" flag. This is the destination for the processed data.
CreateKernel() is called to identify the kernel to be used.
Call SetArg() for each argument to pass to the kernel, including images/buffers.
You may need to call SetSize() to specify the buffer dimensions (not needed for Images), and SetWorkgroupSize() can be used to tweak the workgroup dimensions.
To start the processing, RunKernel() executes the kernel for each pixel or element in the destination Image or buffer.
Download() retrieves the processed pixels/data.
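
The steps above can be pictured with a plain C model. This is only a sketch of the execution pattern, not Fusion's API and not real OpenCL: float4 is a stand-in struct, and invert_kernel() and run_invert() are hypothetical names chosen for illustration. The source array plays the part of an uploaded buffer, the destination array the part of the "write" buffer, and the loop the part of RunKernel().

```c
#include <assert.h>

/* Stand-in for OpenCL's float4; plain C has no such built-in type. */
typedef struct { float x, y, z, w; } float4;

/* A "kernel": the body that runs once per destination pixel.
   (gx, gy) plays the role of get_global_id(0) and get_global_id(1). */
void invert_kernel(const float4 *src, float4 *dst, int width, int gx, int gy)
{
    float4 p = src[gy * width + gx];
    float4 q = { 1.0f - p.x, 1.0f - p.y, 1.0f - p.z, p.w };  /* alpha kept */
    dst[gy * width + gx] = q;
}

/* Hypothetical model of RunKernel(): call the kernel for every pixel.
   A real OpenCL device runs these calls in parallel, in no fixed order. */
void run_invert(const float4 *src, float4 *dst, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int gy = 0; gy < height; ++gy)
        for (int gx = 0; gx < width; ++gx)
            invert_kernel(src, dst, width, gx, gy);
}
```

Because each call writes only its own destination pixel, the calls are independent of one another, which is what allows a device to run them in parallel.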

A discussion of an example OpenCL fuse can be found here.

API

There are three main classes you’ll need to use when adding OpenCL support to your fuses:

OCLManager
OCLProgram
OCLMemory

OCLManager

Functions

mgr = OCLManager()

Gets the global manager.

OCLProgram = BuildProgram(string sourcecode [, number sourcelen [, string compileoptions]])

Compiles the OpenCL program source, which may take some time (often 10-15 seconds). Compile errors are output to the Console view (error line numbers may be offset). If sourcelen is not specified, it is calculated automatically. Compileoptions are as per OpenCL spec. Note: returns immediately; compiling happens on a background thread.

OCLProgram = BuildCachedProgram(string ID, string sourcefile [, string sourcecode [, number sourcelen [, string compileoptions]]])
OCLProgram = BuildCachedProgram(Registry regnode [, string sourcecode [, number sourcelen [,string compileoptions]]])

Similar to BuildProgram(), but caches the result to Fusion:OCLPrograms, and can be much quicker (though still a few seconds). ID should be a unique string. Sourcefile may be the path to a .cl file (which is loaded and compiled), or alternatively to a file that contains the source code within it, such as the fuse itself (if the latter, the sourcecode arg must be given). The timestamp on sourcefile is compared against the cached file, and the program is recompiled if sourcefile is newer. If a Registry node is passed instead of ID and sourcefile, the sourcefile's path is assumed to be "Shaders:<regnode_ID>.cl", or "Fusion:Shaders\<regnode_ID>.cl". In either case, if no cache or source file can be found, sourcecode is compiled instead.

Variables

SupportsImage: True if the OpenCL image2d_t/image3d_t types are supported

GLSharing: True if sharing of OpenCL images and OpenGL textures is supported

FP16: True if the half (float16) type is supported

FP64: True if the double (float64) type is supported

OCLProgram

Functions

kernel = CreateKernel(string kernelname)

Waits until any background compiling is finished, then returns a handle to your kernel function.

success = RunKernel(kernel [, boolean wait])

Runs the kernel. If wait is true or not given, it blocks until the kernel has finished.

SetSize(number width [, number height])

Sets the dimensions of the next RunKernel() call. This determines how many times (width * height) the kernel will be executed, and thus the range of indices returned by get_global_id(). If height is not specified, the kernel is executed width times.
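
As a rough illustration of the arithmetic, the following plain C sketch computes the invocation count and shows one way a flat invocation number relates to the pair of indices a 2D kernel would read from get_global_id(0) and get_global_id(1). The function names are hypothetical, passing height = 0 to model the one-argument form is a convention of this sketch only, and the row-by-row ordering is just one possible enumeration, since OpenCL does not define the order in which invocations run.

```c
#include <assert.h>

/* Number of kernel invocations for a SetSize(width [, height]) call.
   Pass height = 0 to model the one-argument form (sketch convention). */
long invocation_count(long width, long height)
{
    assert(width > 0 && height >= 0);
    return height > 0 ? width * height : width;
}

/* Map a flat invocation number to the (x, y) ids of a 2D job,
   enumerating row by row. x ranges over 0..width-1. */
void global_id_2d(long n, long width, long *gx, long *gy)
{
    assert(width > 0 && n >= 0);
    *gx = n % width;
    *gy = n / width;
}

/* Flat buffer index a kernel would typically compute from its ids. */
long pixel_index(long gx, long gy, long width)
{
    return gy * width + gx;
}
```

For a 1920 x 1080 job this gives 2,073,600 invocations, with x ids in 0..1919 and y ids in 0..1079.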

SetWorkgroupSize(number x [, number y])

Sets the dimensions of the workgroup used for the next RunKernel() call. Workgroups are groups of host processors that share __local memory, and jobs are divided into groups of x * y elements. Setting x and y too small or too large can cause under-utilisation, reducing performance on some hosts. Values that are an integer fraction of the job size and that match the host's architecture often work best. If not specified, Fusion will attempt to pick reasonable values.
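
One way to find a value that is an integer fraction of the job size is to search downwards from an upper limit for an exact divisor. The sketch below is a minimal plain C illustration with hypothetical function names. The upper limit is an assumption supplied by the caller, since the real limit depends on the host, and this is not a description of how Fusion picks its own defaults.

```c
#include <assert.h>

/* Largest workgroup edge <= max_edge that divides job_edge exactly,
   i.e. an integer fraction of the job size. Falls back to 1. */
int pick_workgroup_edge(int job_edge, int max_edge)
{
    assert(job_edge > 0);
    for (int s = max_edge; s > 1; --s)
        if (job_edge % s == 0)
            return s;
    return 1;
}

/* Number of workgroups needed along one axis, rounded up so that a
   partly filled last group is still counted. */
int workgroup_count(int job_edge, int group_edge)
{
    assert(job_edge > 0 && group_edge > 0);
    return (job_edge + group_edge - 1) / group_edge;
}
```

With a limit of 16, a 1920-wide job divides evenly into groups of 16, while a 1080-high job does not (1080 / 16 = 67.5), so the search settles on 15 for that axis.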

SetArg(kernel, number argindex, value [, value [, value [, value]]])

Passes a value to an argument in your kernel. Argindex is zero-based, so SetArg(kernel, 0, 5.0) passes 5.0 for the first argument. All arguments must be set before running a kernel, and argument types must be matched to the kernel’s args. Up to 4 scalar values can be passed for vector types, e.g. float4. The following Lua/Fusion types are matched to OpenCL types as follows:

Fusion type	OpenCL type
1, 2 or 4 numbers	float, float2, float4
table of 1, 2 or 4 numbers	float, float2, float4
Vector2	float2
Vector4	float4
Matrix4	float16
OCLMemory (from CreateImage())	image2d_t (or FuReadImage_t, FuWriteImage_t)
OCLMemory (from CreateBuffer())	__global pointer to any OpenCL-supported type

SetArgInt(kernel, number argindex, value [, value [, value [, value]]])

Similar to SetArg(), but for setting int/uint arguments, as follows:

Fusion type	OpenCL type
1, 2 or 4 numbers	int, uint, int2, uint2, int4, uint4
table of 1, 2 or 4 numbers	int, uint, int2, uint2, int4, uint4
Vector2	int2, uint2
Vector4	int4, uint4
Matrix4	int16, uint16

SetArgBufferSize(kernel, number argindex, number bufsize)

Used for specifying the size of __local buffers. Bufsize is the number of bytes to allocate.
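
Since bufsize is given in bytes, the element size has to be multiplied in by hand. The following plain C sketch works this out for a hypothetical use case that is not taken from this manual: a __local tile of float4 pixels covering one workgroup plus a border of radius pixels on each side, as a blur-style kernel might use. The float4 struct is a stand-in with the same 16-byte size as OpenCL's float4.

```c
#include <stddef.h>

/* Stand-in for OpenCL's float4: four 32-bit floats, 16 bytes. */
typedef struct { float x, y, z, w; } float4;

/* Bytes needed for a tile covering one workgroup plus a border of
   `radius` pixels on every side (hypothetical layout for this sketch). */
size_t local_tile_bytes(size_t group_w, size_t group_h, size_t radius)
{
    size_t tile_w = group_w + 2 * radius;
    size_t tile_h = group_h + 2 * radius;
    return tile_w * tile_h * sizeof(float4);
}
```

A 16 x 16 workgroup with no border needs 4096 bytes, and with a 2-pixel border the tile grows to 20 x 20 pixels, or 6400 bytes.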

SetArgDataWindow(kernel, number argindex, Image)

Takes the DataWindow from the given Fusion Image, and passes the left/bottom/width/height to a uint4.
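
A kernel can use such a uint4 to skip pixels that lie outside the DataWindow. The plain C sketch below models that test. It assumes the four components arrive in the order listed above (x = left, y = bottom, z = width, w = height), which this manual does not spell out component by component, and uint4, make_data_window() and in_data_window() are stand-in names for illustration.

```c
/* Stand-in for OpenCL's uint4. */
typedef struct { unsigned x, y, z, w; } uint4;

/* Pack a window in the assumed order: left, bottom, width, height. */
uint4 make_data_window(unsigned left, unsigned bottom,
                       unsigned width, unsigned height)
{
    uint4 dw = { left, bottom, width, height };
    return dw;
}

/* Returns 1 if pixel (px, py) lies inside the window, else 0.
   The right and top edges are exclusive: left + width is outside. */
int in_data_window(uint4 dw, unsigned px, unsigned py)
{
    return px >= dw.x && px < dw.x + dw.z &&
           py >= dw.y && py < dw.y + dw.w;
}
```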

OCLMemory = CreateImage(Image [, string flags])

Creates an OpenCL image2d_t object, and uploads the contents of the given Fusion Image to it. Flags may include any combination of “read”, “write” and “gl” (default is “read”). If “gl” is specified and OpenGL interoperability is supported, an OpenGL texture is created and cached with the Image, the Image contents are uploaded (if not cached) to an OpenGL texture, then shared with OpenCL (Note: some restrictions apply, for example GL_FLOAT GL_BGRA textures are not supported, and GL texture outputs are displayed but not always read back). Otherwise, the Image is uploaded directly each time, and no caching is done.

OCLMemory = CreateBuffer(Image [, string flags])
OCLMemory = CreateBuffer(table, string type [, string flags])

Creates an OpenCL __global buffer of the Image’s pixel type (e.g. float images produce a __global float4 buffer). Flags can currently be “read” or “write”. If a table is specified, the table is uploaded to a __global buffer, as specified by the type string (default is bytes):

Fusion type	OpenCL type
“int8”	bytes
“int16”	shorts
“float16”	halfs
“float32”	floats
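
For sizing purposes, the type strings in the table correspond to element sizes of 1, 2, 2 and 4 bytes, which are the sizes OpenCL defines for char, short, half and float. The plain C sketch below turns that into a small lookup. The helper names are hypothetical and are not part of the Fusion API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes per element for each type string; 0 for an unknown string. */
size_t element_bytes(const char *type)
{
    assert(type != NULL);
    if (strcmp(type, "int8") == 0)    return 1;  /* bytes  */
    if (strcmp(type, "int16") == 0)   return 2;  /* shorts */
    if (strcmp(type, "float16") == 0) return 2;  /* halfs  */
    if (strcmp(type, "float32") == 0) return 4;  /* floats */
    return 0;
}

/* Total bytes occupied by `count` elements of the given type. */
size_t buffer_bytes(const char *type, size_t count)
{
    return element_bytes(type) * count;
}
```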

Download(OCLMemory, Image)
Download(OCLMemory, table, number size [, string type])

Copies the contents of an OCLMemory image or buffer to a Fusion Image (size & depth must match), or from an OCLMemory buffer to a table of size bytes, as type of “int8”, “int16”, “float16” or “float32”.

OCLMemory

Functions

Release()

Releases the OpenCL image/buffer. This is called automatically when all references to a local buffer are released, and the variable has gone out of scope. You can also use this function to release images/buffers sooner if desired.

Fusion Image Types

Types

As a portability convenience, Fusion also supplies the FuReadImage_t and FuWriteImage_t types to compiled kernels. These are equivalent to OpenCL's image2d_t type on supporting hosts, and otherwise to float4*. They can be helpful when running the same fuse on a local workstation with a GPU and on GPU-less render farms. See the following table for the precise type mapping:

Fusion type	OpenCL Image type	OpenCL non-Image type
FuReadImage_t	__read_only image2d_t	const __global float4 *
FuWriteImage_t	__write_only image2d_t	__global float4 *

Functions

Additionally, common image read/write functions that use these types are supplied, which compile to appropriate code for each type of host. Also, OCLProgram's CreateImage() will automatically call CreateBuffer() instead if the host does not support OpenCL images.

float4 FuReadImagef(FuReadImage_t img, int2 coords, int2 size)

Reads a pixel from img at integer pixel (0..w/h) coordinates coords, when the image has size dimensions. Returns the pixel as an RGBA float4 with normalised (0..1) colour values, and returns (0,0,0,0) when reading beyond the image bounds.

float4 FuReadImageCf(FuReadImage_t img, int2 coords, int2 size)

Reads a pixel from img at integer pixel (0..w/h) coordinates coords, when the image has size dimensions. Returns the pixel as an RGBA float4 with normalised (0..1) colour values, and clamps out-of-bound reads (i.e. returns the nearest image edge pixel instead).
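
The difference between the two out-of-bounds behaviours can be modelled in plain C on a float4 array, which is the form an image takes on hosts without image support. This is a sketch of the described behaviour, not Fusion's actual implementation: the row-major layout is an assumption, and float4, read_zero_outside() and read_clamped() are stand-in names.

```c
#include <assert.h>

/* Stand-in for OpenCL's float4. */
typedef struct { float x, y, z, w; } float4;

/* Models FuReadImagef(): (0,0,0,0) when reading beyond the bounds. */
float4 read_zero_outside(const float4 *img, int cx, int cy, int w, int h)
{
    float4 zero = { 0.0f, 0.0f, 0.0f, 0.0f };
    if (cx < 0 || cy < 0 || cx >= w || cy >= h)
        return zero;
    return img[cy * w + cx];  /* assumed row-major layout */
}

/* Models FuReadImageCf(): out-of-bounds reads return the nearest
   edge pixel. */
float4 read_clamped(const float4 *img, int cx, int cy, int w, int h)
{
    assert(w > 0 && h > 0);
    if (cx < 0) cx = 0; else if (cx >= w) cx = w - 1;
    if (cy < 0) cy = 0; else if (cy >= h) cy = h - 1;
    return img[cy * w + cx];
}
```

The clamped form is usually the one wanted by neighbourhood filters such as blurs, since returning zero darkens and fades the image edges.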

float4 FuSampleImagef(FuReadImage_t img, float2 coords, int2 size)

Bilinear-samples a pixel from img at normalised image (0..1) coordinates coords, when the image has size dimensions. Returns the subpixel sample as an RGBA float4 with normalised (0..1) colour values, and returns (0,0,0,0) when reading beyond the image bounds.

float4 FuSampleImageCf(FuReadImage_t img, float2 coords, int2 size)

Bilinear-samples a pixel from img at normalised image (0..1) coordinates coords, when the image has size dimensions. Returns the subpixel sample as an RGBA float4 with normalised (0..1) colour values, and clamps out-of-bound reads (i.e. returns the nearest image edge pixel instead).
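
Bilinear sampling with normalised coordinates can be sketched in plain C as follows. The arithmetic follows the linear filter described in the OpenCL specification (scale by the image size, shift by half a pixel, blend the four neighbours). Whether Fusion's fallback code for hosts without images matches it exactly is an assumption. To keep the example short it uses a single-channel, row-major image, and all function names are hypothetical.

```c
#include <assert.h>

/* Fetch one texel from a row-major, single-channel image.
   Out-of-range indices return 0 (border) or the nearest edge texel. */
float texel(const float *img, int i, int j, int w, int h, int clamp_to_edge)
{
    if (clamp_to_edge) {
        if (i < 0) i = 0; else if (i >= w) i = w - 1;
        if (j < 0) j = 0; else if (j >= h) j = h - 1;
    } else if (i < 0 || j < 0 || i >= w || j >= h) {
        return 0.0f;
    }
    return img[j * w + i];
}

/* floor() for floats, written out to avoid depending on <math.h>. */
int floor_to_int(float v)
{
    int i = (int)v;                      /* truncates toward zero */
    return ((float)i > v) ? i - 1 : i;   /* fix up negative fractions */
}

/* Sample at normalised coordinates (s, t), each in 0..1. */
float sample_bilinear(const float *img, float s, float t,
                      int w, int h, int clamp_to_edge)
{
    assert(w > 0 && h > 0);
    float u = s * (float)w - 0.5f;       /* texel centres sit at +0.5 */
    float v = t * (float)h - 0.5f;
    int i0 = floor_to_int(u);
    int j0 = floor_to_int(v);
    float a = u - (float)i0;             /* horizontal blend weight */
    float b = v - (float)j0;             /* vertical blend weight */
    float t00 = texel(img, i0,     j0,     w, h, clamp_to_edge);
    float t10 = texel(img, i0 + 1, j0,     w, h, clamp_to_edge);
    float t01 = texel(img, i0,     j0 + 1, w, h, clamp_to_edge);
    float t11 = texel(img, i0 + 1, j0 + 1, w, h, clamp_to_edge);
    return (1.0f - a) * (1.0f - b) * t00 + a * (1.0f - b) * t10
         + (1.0f - a) * b * t01 + a * b * t11;
}
```

Sampling an all-white image exactly at its left edge (s = 0) shows the difference between the two variants: the border form blends half a pixel of zero into the result, while the clamped form returns full white.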

FuWriteImagef(FuWriteImage_t img, int2 coords, int2 size, float4 pix)

Writes the pixel pix to img at integer pixel (0..w/h) coordinates coords, when the image has size dimensions. Writing beyond the image bounds is an undefined operation, and may cause an error or even crash on some hosts.
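
Because out-of-bounds writes are undefined, kernels commonly test their coordinates before writing, which matters whenever the job size is larger than the image. The plain C sketch below models such a guard on a row-major float4 array. The layout is an assumption, and write_guarded() is a hypothetical helper, not a Fusion function.

```c
/* Stand-in for OpenCL's float4. */
typedef struct { float x, y, z, w; } float4;

/* Write pix only when (cx, cy) is inside the w x h image.
   Returns 1 if the pixel was written, 0 if the write was skipped. */
int write_guarded(float4 *img, int cx, int cy, int w, int h, float4 pix)
{
    if (cx < 0 || cy < 0 || cx >= w || cy >= h)
        return 0;
    img[cy * w + cx] = pix;
    return 1;
}
```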

The Fusion image functions above are equivalent to native OpenCL functions with the following sampler_t flags:

Fusion function	OpenCL function	sampler_t flags
FuReadImagef()	read_imagef()	CLK_NORMALIZED_COORDS_FALSE, CLK_ADDRESS_CLAMP, CLK_FILTER_NEAREST
FuReadImageCf()	read_imagef()	CLK_NORMALIZED_COORDS_FALSE, CLK_ADDRESS_CLAMP_TO_EDGE, CLK_FILTER_NEAREST
FuSampleImagef()	read_imagef()	CLK_NORMALIZED_COORDS_TRUE, CLK_ADDRESS_CLAMP, CLK_FILTER_LINEAR
FuSampleImageCf()	read_imagef()	CLK_NORMALIZED_COORDS_TRUE, CLK_ADDRESS_CLAMP_TO_EDGE, CLK_FILTER_LINEAR
FuWriteImagef()	write_imagef()	n/a

Support

Hosts that support the OpenCL image2d_t type include nVidia CUDA-enabled chips and ATi 5xxx+ chips (with the Stream SDK 2.1 or later). Hosts that typically do not support any image types include CPUs.

Note that hosts can vary greatly in characteristics. For example, some hosts cache global data (particularly images), speeding up repeated reads, while some hosts do not. Since memory bandwidth and access patterns are often the bottleneck when processing pixels in OpenCL, this can make a big difference to performance, though even slower hosts will likely still be faster than plain interpreted fuses.

The contents of this page are copyright by eyeon Software.
