Hands On OpenCL is a two-day lecture course introducing OpenCL, the API for writing heterogeneous applications. It provides slides for around twelve lectures, plus some appendices, complete with examples and solutions in C, C++ and Python. The lecture series finishes with information on porting CUDA applications to OpenCL.
This set of freely available OpenCL exercises and solutions, together with the slides, has been created by Simon McIntosh-Smith and Tom Deakin from the University of Bristol in the UK, with financial support from the Khronos Initiative for Training and Education (KITE) to promote the use of open standards.
Simon McIntosh-Smith is one of the foremost OpenCL trainers in the world, having taught the subject since 2009. He has run many OpenCL training courses at conferences such as SuperComputing and HiPEAC, and has provided OpenCL training for the UK's national supercomputing service and for the Barcelona Supercomputing Center. With OpenCL training experience ranging from half-day on-site introductions within companies to two-day intensive hands-on workshops for undergraduates, Simon can provide customized OpenCL training to meet your needs. Get in touch if you'd like to know more.
These lectures and their examples are released under the Creative Commons Attribution (CC BY) license. In other words, you can use them in any way you see fit, including commercially, but please retain an attribution for the original authors, Simon McIntosh-Smith and Tom Deakin.
Introduction to Heterogeneous Parallel Computing
Setting up your OpenCL environment (AMD, Intel, NVIDIA)
An overview of OpenCL
Important OpenCL concepts
Platforms, contexts, programs, queues, buffers and kernels
NDRanges, Work-Groups, Work-Items
Overview of OpenCL APIs
C, C++ and Python
Introducing OpenCL kernel programming
Understanding the OpenCL memory hierarchy
Synchronization in OpenCL
Events and barriers
Heterogeneous computing with OpenCL
Using CPUs and GPUs simultaneously, multiple platforms and devices
Enabling portable performance via OpenCL
Autotuning using Flamingo
Optimizing OpenCL performance
Profiling using Extrae and Paraver
Information on NVVP and CodeXL
Porting CUDA to OpenCL
Download the examples by checking out the git repository with the command:
git clone git://github.com/HandsOnOpenCL/Exercises-Solutions.git
Run a simple OpenCL program to give you some key facts about the devices available in your system.
VADD - The OpenCL "Hello World"
Start by looking at the C API for this program which introduces the OpenCL computational model.
VADD - Now in C++ and Python
Chaining vector add kernels
Extend VADD to compute C=A+B; D=C+E; F=D+G by running the kernel multiple times.
Extend VADD for D = A + B + C
Extend the VADD kernel to compute a different sum.
Write your first OpenCL kernel from scratch.
Using private memory
Use private memory to minimize memory costs.
Using local memory
Use local and private memory to minimize memory costs.
The Pi program
Estimate Pi by integration.
Run your kernels on many devices.
Optimize matrix multiplication
Look at portable performance (combining 9. and 10.)
Profiling OpenCL programs
Experiment with making things run faster.
Porting CUDA to OpenCL
Convert a simple CUDA application to OpenCL (program TBA).
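As a rough illustration of the "Chaining vector add kernels" exercise listed above (C=A+B; D=C+E; F=D+G), the following sketch models the idea in plain Python, with no OpenCL runtime required. The kernel string is a typical element-wise vector add written for this illustration; it is not taken from the course repository. The helper names (vadd_work_item, enqueue_vadd, chained_vadd) are hypothetical stand-ins for a work-item body, a kernel launch and the host-side chaining logic, and rounding the global size up to a multiple of 4 is an arbitrary choice made only to show why the bounds check is there.

```python
# Illustrative OpenCL C source for an element-wise vector add, held as a string
# the way a host program would pass it to the OpenCL compiler.
# Assumption: this is a typical vadd kernel, not the course's own file.
VADD_KERNEL_SOURCE = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c,
                   const unsigned int count)
{
    unsigned int i = get_global_id(0);
    if (i < count)
        c[i] = a[i] + b[i];
}
"""


def vadd_work_item(i, a, b, c, count):
    """Hypothetical Python stand-in for one work-item of the kernel above."""
    if i < count:  # same bounds check as in the kernel
        c[i] = a[i] + b[i]


def enqueue_vadd(a, b, c, count, global_size):
    """Model one kernel launch: run the work-item body once per global id."""
    for i in range(global_size):
        vadd_work_item(i, a, b, c, count)


def chained_vadd(a, b, e, g):
    """Compute C = A + B, D = C + E, F = D + G using three launches."""
    count = len(a)
    # Round the global size up to a multiple of 4 (arbitrary), so that some
    # work-items fall past the end of the data and are skipped.
    global_size = ((count + 3) // 4) * 4
    c = [0.0] * count
    d = [0.0] * count
    f = [0.0] * count
    enqueue_vadd(a, b, c, count, global_size)  # C = A + B
    enqueue_vadd(c, e, d, count, global_size)  # D = C + E
    enqueue_vadd(d, g, f, count, global_size)  # F = D + G
    return f


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [10.0, 20.0, 30.0]
    e = [100.0, 200.0, 300.0]
    g = [0.5, 0.5, 0.5]
    print(chained_vadd(a, b, e, g))
```

The point of the model is that each launch reads the output of the previous one, so the same kernel is reused three times with different arguments. In a real OpenCL program the intermediate results C and D would typically stay in device buffers between launches rather than being copied back to the host.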
Simon McIntosh-Smith, University of Bristol
Tom Deakin (@tomdeakin)
Fixed a bug yourself? Please submit a pull request. Thanks.