Hands On OpenCL by HandsOnOpenCL

Welcome

Hands On OpenCL is a two-day lecture course introducing OpenCL, the API for writing heterogeneous applications. Provided are slides for around twelve lectures, plus some appendices, complete with Examples and Solutions in C, C++ and Python. The lecture series finishes with information on porting CUDA applications to OpenCL.

This set of freely available OpenCL exercises and solutions, together with the slides, was created by Simon McIntosh-Smith and Tom Deakin from the University of Bristol in the UK, with financial support from the Khronos Initiative for Training and Education (KITE) to promote the use of open standards.

Simon McIntosh-Smith is one of the foremost OpenCL trainers in the world, having taught the subject since 2009. He has run many OpenCL training courses at conferences such as SuperComputing and HiPEAC, and has provided OpenCL training for the UK's national supercomputing service and for the Barcelona Supercomputing Center. With OpenCL training experience ranging from half-day on-site introductions within companies to two-day intensive hands-on workshops for undergraduates, Simon can provide customized OpenCL training to meet your needs. Get in touch if you'd like to know more:

For more about the authors, please visit Simon's home page or Tom's home page.

These lectures and their examples are released under the "attribution CC BY" Creative Commons license. In other words, you can use them in any way you see fit, including commercially, but please retain an attribution for the original authors, Simon McIntosh-Smith and Tom Deakin.

Get the slides and code

The slides are available under Releases. The code is available in the Exercises and Solutions repository.

Course Structure

Introduction to Heterogeneous Parallel Computing

Setting up your OpenCL environment (AMD, Intel, NVIDIA)
An overview of OpenCL
Important OpenCL concepts

Platforms, contexts, programs, queues, buffers and kernels

NDRanges, Work-Groups, Work-Items
Overview of OpenCL APIs

C, C++ and Python
Introducing OpenCL kernel programming
Understanding the OpenCL memory hierarchy
Synchronization in OpenCL

Events and barriers
Heterogeneous computing with OpenCL

Using CPUs and GPUs simultaneously, multiple platforms and devices
Enabling portable performance via OpenCL

Autotuning using Flamingo
Optimizing OpenCL performance

Profiling using Extrae and Paraver

Information on NVVP and CodeXL
Debugging OpenCL

Using GDB
Porting CUDA to OpenCL

Examples

Download the examples by checking out the git repository with the command:

git clone https://github.com/HandsOnOpenCL/Exercises-Solutions.git

Platform Information

Run a simple OpenCL program to give you some key facts about the devices available in your system.
VADD - The OpenCL "Hello World"

Start by looking at the C API for this program which introduces the OpenCL computational model.
VADD - Now in C++ and Python
Chaining vector add kernels

Extend VADD to compute C=A+B; D=C+E; F=D+G by running the kernel multiple times.
Extend VADD for D = A + B + C

Extend the VADD kernel to compute a different sum.
Matrix Multiplication

Write your first OpenCL kernel from scratch.
Using private memory

Use private memory to minimize memory costs.
Using local memory

Use local and private memory to minimize memory costs.
The Pi program

Estimate Pi by integration.
Heterogeneous Computing

Run your kernels on many devices.
Optimize matrix multiplication

Look at portable performance (combining 9. and 10.)
Profiling OpenCL programs

Experiment making things run faster.
Porting CUDA to OpenCL

Convert a simple CUDA application to OpenCL (program TBA).
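
When working through the chained vector add exercise (C=A+B; D=C+E; F=D+G), it can help to have a host-side reference to compare kernel output against. The sketch below is plain Python, not OpenCL, and is not taken from the course repository: it imitates a one-dimensional NDRange by calling a kernel-like function once per work-item index. The names `vadd_kernel`, `launch_1d` and `chained_vadd` are hypothetical, and the work-items run sequentially here, whereas a real device may run them in parallel.

```python
def vadd_kernel(gid, a, b, out):
    """Work done by one work-item; gid stands in for get_global_id(0)."""
    out[gid] = a[gid] + b[gid]


def launch_1d(kernel, global_size, *args):
    """Imitate a 1D NDRange launch: run the kernel once per work-item index.

    Assumption: sequential execution, which is only a model of what a
    device does, not how OpenCL schedules work-items.
    """
    for gid in range(global_size):
        kernel(gid, *args)


def chained_vadd(a, b, e, g):
    """Compute C=A+B, D=C+E, F=D+G with three launches of the same kernel."""
    n = len(a)
    if not (len(b) == len(e) == len(g) == n):
        raise ValueError("all input vectors must have the same length")
    c = [0.0] * n
    d = [0.0] * n
    f = [0.0] * n
    launch_1d(vadd_kernel, n, a, b, c)  # C = A + B
    launch_1d(vadd_kernel, n, c, e, d)  # D = C + E
    launch_1d(vadd_kernel, n, d, g, f)  # F = D + G
    return f


if __name__ == "__main__":
    n = 8
    a = [float(i) for i in range(n)]
    b = [1.0] * n
    e = [2.0] * n
    g = [3.0] * n
    print(chained_vadd(a, b, e, g))
```

Each element of the result should equal A[i] + B[i] + E[i] + G[i], which gives a simple check for the output of an OpenCL implementation, up to floating-point tolerance.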

Authors and Contributors

Simon McIntosh-Smith, University of Bristol

Tom Deakin (@tomdeakin)

Support or Contact

Found a bug or wish to suggest an update to the material? Please submit a new Issue in the relevant repository (Exercises or Slides).

Fixed a bug yourself? Please submit a pull request. Thanks.