NVIDIA Fundamentals of Accelerated Computing with CUDA Python

This workshop teaches you the fundamental tools and techniques for running GPU-accelerated Python applications using CUDA® GPUs and the Numba compiler.

Difficulty rating: ★★★★ Advanced

Who is it for?

  • Both research staff and research students
  • Developers, data scientists, and researchers looking to solve challenging problems with deep learning and accelerated computing

Summary of the topics covered

  • GPU-accelerate NumPy ufuncs with a few lines of code.
  • Configure code parallelization using the CUDA thread hierarchy.
  • Write custom CUDA device kernels for maximum performance and flexibility.
  • Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.
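
To give a flavour of the thread hierarchy topic, here is a minimal sketch in plain Python (no GPU or Numba needed) of how a one-dimensional CUDA launch maps blocks and threads onto array elements. It is an illustration only, not course material: the function names `launch_config` and `simulate_kernel` and the block size of 128 are assumptions made for this example, and the nested loops run sequentially, whereas a real GPU would run the threads in parallel.

```python
import math


def launch_config(n_elements, threads_per_block=128):
    """Return (blocks_per_grid, threads_per_block) that covers n_elements.

    Hypothetical helper: rounds up so that every element gets a thread.
    """
    blocks_per_grid = math.ceil(n_elements / threads_per_block)
    return blocks_per_grid, threads_per_block


def simulate_kernel(data, threads_per_block=128):
    """Sequentially emulate a 1D launch in which each thread doubles one element."""
    n = len(data)
    blocks, threads = launch_config(n, threads_per_block)
    out = [0] * n
    for block_idx in range(blocks):          # plays the role of blockIdx.x
        for thread_idx in range(threads):    # plays the role of threadIdx.x
            # Global index of this thread: blockIdx.x * blockDim.x + threadIdx.x
            i = block_idx * threads + thread_idx
            # Guard: the last block may contain threads past the end of the array
            if i < n:
                out[i] = 2 * data[i]
    return out


result = simulate_kernel(list(range(1000)))
```

The bounds check matters because the number of threads launched is rounded up to a whole number of blocks, so some threads in the final block have no element to process.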

Prerequisites

  • Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
  • NumPy competency, including the use of ndarrays and ufuncs
  • No previous knowledge of CUDA programming is required

Frequency

3 times a year

Duration

8 hours

Next course

4th December 2025 Book here

Can't attend?

We don’t have online materials for this session, but the course will run again, so you’ll be very welcome to join next time. You can find more information about the course on the NVIDIA webpages.