Data Engineering

NVIDIA Deep Learning Institute Workshops

Accelerating Data Engineering Pipelines

NVIDIA DLI offers hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning and accelerated computing.

About This Workshop

Data engineering is the foundation of data science and lays the groundwork for analysis and modelling. For organisations to extract knowledge and insights from structured and unstructured data, fast access to accurate and complete datasets is critical. Working with massive amounts of data from disparate sources requires complex infrastructure and expertise. Minor inefficiencies can result in major costs, in both time and money, when scaled across millions to trillions of data points.

In this workshop, we’ll explore how GPUs can improve data pipelines and how advanced data engineering tools and techniques can significantly accelerate them. Faster pipelines produce fresher dashboards and machine learning (ML) models, so users have the most current information at their fingertips.

Learning Objectives

In this workshop, you will learn:

  • How data moves within a computer, how to strike the right balance between CPU, DRAM, disk storage, and GPUs, and how hardware reads and manipulates different file formats.
  • How to scale an ETL pipeline with multiple GPUs using NVTabular.
  • How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.
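
To give a flavour of the file format objective above, here is a minimal sketch that is not taken from the workshop material. It uses only the Python standard library and contrasts a row-oriented text format (CSV) with a packed, column-oriented binary layout. The sample data, column names, and helper functions are all hypothetical and chosen for illustration only.

```python
import csv
import io
from array import array

# Hypothetical sample data: two numeric columns stored as row-oriented text.
CSV_TEXT = "trip_km,fare\n1.5,7.0\n3.0,12.5\n0.5,4.0\n"


def column_sum_from_csv(text, column):
    """Sum one column of CSV text; every row must be parsed to reach it."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(float(row[column]) for row in reader)


def to_columnar(text):
    """Convert CSV text into one packed array of doubles per column."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: array("d") for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


def column_sum_from_columnar(columns, column):
    """Sum one column by touching only that column's contiguous values."""
    return sum(columns[column])


columns = to_columnar(CSV_TEXT)
csv_total = column_sum_from_csv(CSV_TEXT, "fare")
columnar_total = column_sum_from_columnar(columns, "fare")

# The columnar layout stores each column as contiguous fixed-width values,
# so a single column can be read without parsing the others.
fare_bytes = columns["fare"].tobytes()
print(csv_total, columnar_total, len(fare_bytes))
```

Both approaches give the same total. The difference is in how much data has to be read and parsed to get there, which is the kind of trade-off the first objective refers to.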

Prerequisites:

  • Intermediate knowledge of Python (list comprehension, objects)
  • Familiarity with pandas a plus
  • Introductory statistics (mean, median, mode)

Workshop Setup Instructions:

Setup instructions will be provided to attendees by email.

Upcoming Workshops

Details of upcoming workshops are listed on the Training page.