NVIDIA Fundamentals of Accelerated Computing with CUDA Python

This workshop teaches you the fundamental tools and techniques for running GPU-accelerated Python applications using CUDA® GPUs and the Numba compiler.

Difficulty rating: ★★★★ Advanced

Who is it for?

Research staff and research students
Developers, data scientists, and researchers looking to solve challenging problems with deep learning and accelerated computing

Summary of the topics covered

GPU-accelerate NumPy ufuncs with a few lines of code.
Configure code parallelization using the CUDA thread hierarchy.
Write custom CUDA device kernels for maximum performance and flexibility.
Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.
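
For a flavour of the thread-hierarchy topic, here is a minimal CPU-only sketch of how a one-dimensional CUDA launch maps threads to array elements. It is not workshop material and does not run on a GPU. The names `blocks_needed` and `emulate_kernel_launch` are hypothetical helpers invented for this illustration. In a real Numba kernel the global index is typically obtained with `cuda.grid(1)`, and the launch is written as `kernel[blocks_per_grid, threads_per_block](...)`.

```python
def blocks_needed(n_elements, threads_per_block):
    """Ceiling division: the smallest block count that covers every element."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def emulate_kernel_launch(data, threads_per_block):
    """Sequentially emulate a 1D launch in which each thread doubles one element.

    Hypothetical helper: on a GPU the two loops below would not exist,
    because every (block, thread) pair runs in parallel.
    """
    n = len(data)
    out = [0] * n
    blocks_per_grid = blocks_needed(n, threads_per_block)

    for block_idx in range(blocks_per_grid):          # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):   # plays the role of threadIdx.x
            # Global index of this thread within the whole grid.
            i = block_idx * threads_per_block + thread_idx
            # Bounds guard: the last block may contain threads with no element.
            if i < n:
                out[i] = 2 * data[i]

    return out, blocks_per_grid
```

With 10 elements and 4 threads per block, the grid needs 3 blocks, and the last 2 threads of the final block are idle because of the bounds guard.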

Prerequisites

Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
NumPy competency, including the use of ndarrays and ufuncs
No previous knowledge of CUDA programming is required

Frequency

Three times a year

Duration

8 hours

Next course

11th March 2026, 09:00 to 17:00 (Book here)

Can't attend?

We don't have online materials for this session, but the course will run again, so you'll be very welcome to join next time. You can find more information about the course on the NVIDIA webpages.