GPGPU

GPGPU stands for General-purpose computing on graphics processing units.

OpenCL

OpenCL (Open Computing Language) is an open, royalty-free parallel programming specification developed by the Khronos Group, a non-profit consortium.

The OpenCL specification describes a programming language, a general environment that is required to be present, and a C API to enable programmers to call into this environment.

Tip: The clinfo utility can be used to list OpenCL platforms, devices present and ICD loader properties.

Runtime

To execute programs that use OpenCL, a compatible hardware runtime needs to be installed.

AMD/ATI

opencl-mesa: free runtime for AMDGPU and Radeon
opencl-amd^AUR, opencl-amd-dev^AUR: ROCr OpenCL and legacy OpenCL (a.k.a. orca) repackaged from AMD's Ubuntu releases (equivalent to specifying opencl=rocr,legacy in Ubuntu's amdgpu-install)
opencl-legacy-amdgpu-pro^AUR: The legacy OpenCL (a.k.a. orca) repackaged from AMD's Ubuntu releases (equivalent to specifying opencl=legacy in Ubuntu's amdgpu-install)
: Part of AMD's ROCm GPU compute stack, officially supporting GFX8 and later cards (Fiji, Polaris, Vega), with unofficial and partial support for Navi10-based cards (this is similar, but not equivalent to specifying in Ubuntu's amdgpu-install, because this package's ROCm version differs from the version used by Ubuntu's installer). To support cards older than Vega you need to set the runtime variable .
: AMD CPU runtime

NVIDIA

opencl-nvidia: official NVIDIA runtime

Intel

: a.k.a. the Neo OpenCL runtime, the open-source implementation for Intel HD Graphics GPU on Gen8 (Broadwell) and beyond.
: the open-source implementation for Intel HD Graphics GPUs on Gen7 (Ivy Bridge) and beyond. Deprecated by Intel in favour of the NEO OpenCL driver, but it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).
: the proprietary implementation for Intel HD Graphics GPUs on Gen7 (Ivy Bridge) and beyond. Deprecated by Intel in favour of the NEO OpenCL driver, but it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).
: the implementation for Intel Core and Xeon processors. It also supports non-Intel CPUs.

Others

: LLVM-based OpenCL implementation (hardware independent)

A compiler and a translation layer exist that enable OpenCL applications to run on top of a Vulkan runtime.

: Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders.
clvk: clvk is a prototype implementation of OpenCL 3.0 on top of Vulkan using clspv as the compiler.
xrt-bin^AUR: Xilinx Run Time for FPGA xrt
fpga-runtime-for-opencl: FPGA runtime

32-bit runtime

To execute 32-bit programs that use OpenCL, a compatible hardware 32-bit runtime needs to be installed.

AMD/ATI

: free runtime for AMDGPU and Radeon (32-bit)
: The legacy OpenCL (a.k.a. orca) repackaged from AMD's ubuntu releases (32-bit)

NVIDIA

lib32-opencl-nvidia: official NVIDIA runtime (32-bit)

ICD loader (libOpenCL.so)

The OpenCL ICD loader is supposed to be a platform-agnostic library that provides the means to load device-specific drivers through the OpenCL API. Most OpenCL vendors provide their own implementation of an OpenCL ICD loader, and these should all work with the other vendors' OpenCL implementations. Unfortunately, most vendors do not provide completely up-to-date ICD loaders, and therefore Arch Linux has decided to provide this library from a separate project () which currently provides a functioning implementation of the current OpenCL API.

The other ICD loader libraries are installed as part of each vendor's SDK. If you want to ensure the ICD loader from the package is used, you can create a file in which adds to the dynamic program loader's search directories:

This is necessary because all the SDKs add their runtime's lib directories to the search path through files.

The available packages containing various OpenCL ICDs are:

: recommended, most up-to-date
by Intel. Provides OpenCL 2.0, deprecated in favour of .

Development

For OpenCL development, the bare minimum of additional packages required is:

: OpenCL ICD loader implementation, up to date with the latest OpenCL specification.
: OpenCL C/C++ API headers.

The vendors' SDKs provide a multitude of tools and support libraries:

intel-opencl-sdk^AUR: Intel OpenCL SDK (old version, new OpenCL SDKs are included in the INDE and Intel Media Server Studio)
: This package is installed as and apart from SDK files it also contains a number of code samples (). It also provides the clinfo utility which lists OpenCL platforms and devices present in the system and displays detailed information about them. As the SDK itself contains a CPU OpenCL driver, no extra driver is needed to execute OpenCL on CPU devices (regardless of its vendor).
: Nvidia's GPU SDK which includes support for OpenCL 1.1.

Implementations

To see which OpenCL implementations are currently active on your system, use the following command:

$ ls /etc/OpenCL/vendors
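The same check can be scripted. The following is a minimal sketch, assuming the usual ICD convention that each *.icd file in that directory is a short text file naming the driver library to load; the function name list_opencl_icds is made up for this example.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the driver library it names."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}  # no ICD registry directory, so no registered runtimes
    icds = {}
    for icd_file in sorted(directory.glob("*.icd")):
        # Assumption: an ICD file holds a single library name or path.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


if __name__ == "__main__":
    for name, library in list_opencl_icds().items():
        print(f"{name}: {library}")
```

An empty result means no runtime has registered itself with the ICD loader, which usually indicates that no runtime package from the sections above is installed.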

To find out all possible (known) properties of the OpenCL platform and devices available on the system, install .

Language bindings

JavaScript/HTML5: WebCL
Python:
D: cl4d or DCompute
Java: Aparapi or JOCL (a part of JogAmp)
Mono/.NET: Open Toolkit
Go: OpenCL bindings for Go
Racket: Racket has a native interface on PLaneT that can be installed via raco.
Rust: ocl
Julia: OpenCL.jl

SYCL

SYCL is another open and royalty-free standard by the Khronos Group that defines a single-source heterogeneous programming model for C++ on top of OpenCL 1.2.

SYCL consists of a runtime part and a C++ device compiler. The device compiler may target any number and kind of accelerators. The runtime is required to fall back to a pure CPU code path in case no OpenCL implementation can be found.

Implementations

Codeplay's proprietary implementation of SYCL 1.2.1. Can target SPIR, SPIR-V and experimentally PTX (NVIDIA) as device targets.
: Open source implementation mainly driven by Xilinx.
and hipsycl-rocm-git^AUR: Free implementation built over AMD's HIP instead of OpenCL. Is able to run on AMD and NVIDIA GPUs.

Checking for SPIR support

Most SYCL implementations are able to compile the accelerator code to SPIR or SPIR-V. Both are intermediate languages designed by Khronos that can be consumed by an OpenCL driver. To check whether SPIR or SPIR-V are supported can be used:

ComputeCpp additionally ships with a tool that summarizes the relevant system information:

Drivers known to at least partially support SPIR or SPIR-V include , , and .
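In practice the check comes down to looking for two extension names in the list a device reports. The following is a minimal sketch, assuming the extension list is available as one space-separated string (for example copied from the device extensions printed by clinfo); the helper name spir_support is made up for this example.

```python
def spir_support(extensions):
    """Report SPIR / SPIR-V support advertised in an extension string.

    `extensions` is the space-separated extension list of one device.
    """
    names = set(extensions.split())
    return {
        # cl_khr_spir marks support for the older, LLVM-based SPIR format.
        "SPIR": "cl_khr_spir" in names,
        # cl_khr_il_program is the extension form of SPIR-V ingestion;
        # OpenCL 2.1+ devices may instead report it via CL_DEVICE_IL_VERSION.
        "SPIR-V": "cl_khr_il_program" in names,
    }


if __name__ == "__main__":
    print(spir_support("cl_khr_fp64 cl_khr_spir cl_khr_il_program"))
```

Matching whole names rather than substrings avoids false positives from extensions whose names merely start with the same prefix.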

Development

SYCL requires a working C++11 environment to be set up. There are a few open source libraries available:

ComputeCpp SDK: Collection of code examples, integration for ComputeCpp
SYCL-DNN: Neural network performance primitives
SYCL-BLAS: Linear algebra performance primitives
VisionCpp: Computer Vision library
SYCL Parallel STL: GPU implementation of the C++17 parallel algorithms

CUDA

CUDA (Compute Unified Device Architecture) is NVIDIA's proprietary, closed-source parallel computing architecture and framework. It requires an NVIDIA GPU, and consists of several components:

Required:
- Proprietary NVIDIA kernel module
- CUDA "driver" and "runtime" libraries
Optional:
- Additional libraries: CUBLAS, CUFFT, CUSPARSE, etc.
- CUDA toolkit, including the compiler
- CUDA SDK, which contains many code samples and examples of CUDA and OpenCL programs

The kernel module and CUDA "driver" library are shipped in and opencl-nvidia. The "runtime" library and the rest of the CUDA toolkit are available in . needs ncurses5-compat-libs^AUR to be installed, see .

Development

The package installs all components in the directory . For compiling CUDA code, add to your include path in the compiler instructions. For example, this can be accomplished by adding to the compiler flags/options. To use , a wrapper provided by NVIDIA, add /opt/cuda/bin to your path.

To find whether the installation was successful and whether CUDA is up and running, you can compile the CUDA samples. One way to check the installation is to run the sample.
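Before compiling anything, it can help to confirm that the compiler is reachable at all. The following is a minimal sketch, assuming the CUDA compiler driver is named nvcc and lives in /opt/cuda/bin as described above; the helper name find_nvcc is made up for this example.

```python
import os
import shutil


def find_nvcc(cuda_bin="/opt/cuda/bin", path=None):
    """Return the full path of nvcc, or None if it cannot be found."""
    search_path = os.environ.get("PATH", "") if path is None else path
    # Look on the regular search path first, then in the CUDA bin directory.
    return (shutil.which("nvcc", path=search_path)
            or shutil.which("nvcc", path=cuda_bin))


if __name__ == "__main__":
    location = find_nvcc()
    if location is None:
        print("nvcc not found; is /opt/cuda/bin on your PATH?")
    else:
        print(f"nvcc found at {location}")
```

If the compiler is only found through the fallback directory, /opt/cuda/bin is not yet on your path and still needs to be added.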

Language bindings

Fortran: PGI CUDA Fortran Compiler
Haskell: The accelerate package lists available CUDA backends
Java: JCuda
Mathematica: CUDAlink
Mono/.NET: CUDAfy.NET, managedCuda
Perl: KappaCUDA, CUDA-Minimal
Python:
Ruby: rbcuda
Rust: cuda-sys (bindings) or RustaCUDA (high-level wrapper)

ROCm

ROCm (Radeon Open Compute) is AMD's open-source parallel computing architecture and framework. Although it requires an AMD GPU, some ROCm tools are hardware agnostic. See the ROCm for Arch Linux repository for more information and installation instructions.

HIP

The Heterogeneous Interface for Portability (HIP) is AMD's dedicated GPU programming environment for designing high performance kernels on GPU hardware. HIP is a C++ runtime API and programming language that allows developers to create portable applications on different platforms.

: The base runtime, packages to run HIP applications on the AMD platform.
: The Heterogeneous Interface for AMDGPUs in ROCm. Supports GPUs from the Polaris architecture (RX 500 series) through AMD's RDNA 2 architecture (RX 6000 series).
hip-runtime-nvidia^AUR: The Heterogeneous Interface for NVIDIA GPUs in ROCm.

OpenMP

The package provides AOMP - an open source Clang/LLVM based compiler with added support for the OpenMP API on AMD GPUs.

OpenCL

The package is the part of the ROCm framework providing an OpenCL runtime.

OpenCL image support

The latest ROCm versions include OpenCL image support, which is used by GPGPU-accelerated software such as Darktable. ROCm together with the open-source AMDGPU graphics driver is all that is required; AMDGPU PRO is not needed.

List of GPGPU accelerated software

Bitcoin
Blender – CUDA support for Nvidia GPUs and HIP support for AMD GPUs. More information here.
BOINC
FFmpeg – more information here.
Folding@home
GIMP – experimental – more information here.
HandBrake
Hashcat
LibreOffice Calc – more information here.
– Find all possible (known) properties of the OpenCL platform and devices available on the system.
– a GPU memtest. Despite its name, it supports both CUDA and OpenCL.
– OpenCL feature requires at least 1 GB RAM on GPU and Image support (check output of clinfo command).
DaVinci Resolve - a non-linear video editor. Can use both OpenCL and CUDA.
lc0^AUR - Used for searching the neural network (supports tensorflow, OpenCL, CUDA, and openblas)
pyrit^AUR
- PyTorch with CUDA backend
- Port of TensorFlow to CUDA
- Port of TensorFlow to SYCL
- High Perf CryptoNote CPU and GPU (OpenCL, CUDA) miner
