AMD GPU Programming Training Course

ROCm is an open-source platform designed for GPU programming that supports AMD GPUs and also ensures compatibility with CUDA and OpenCL. ROCm provides programmers with direct access to hardware details, giving them full control over the parallelization process. However, this level of control requires a solid understanding of device architecture, memory models, execution models, and optimization techniques.

HIP is a C++ runtime API and kernel language that enables developers to write portable code capable of running on both AMD and NVIDIA GPUs. HIP offers a thin abstraction layer over native GPU APIs like ROCm and CUDA, allowing users to leverage existing GPU libraries and tools effectively.

This instructor-led, live training (available online or onsite) is targeted at beginner to intermediate-level developers who are interested in using ROCm and HIP for programming AMD GPUs and harnessing their parallel capabilities.

By the end of this training, participants will be able to:

Set up a development environment that includes the ROCm Platform, an AMD GPU, and Visual Studio Code.
Create a basic ROCm program that performs vector addition on the GPU and retrieves results from GPU memory.
Utilize the ROCm API to query device information, manage device memory allocation and deallocation, transfer data between host and device, launch kernels, and synchronize threads.
Write HIP language kernels that execute on the GPU and manipulate data.
Leverage HIP built-in functions, variables, and libraries for common tasks and operations.
Optimize data transfers and memory accesses using ROCm and HIP memory spaces such as global, shared, constant, and local.
Control threads, blocks, and grids to define parallelism using ROCm and HIP execution models.
Debug and test ROCm and HIP programs using tools like the ROCm Debugger and ROCm Profiler.
Optimize ROCm and HIP programs using techniques such as coalescing, caching, prefetching, and profiling.

Format of the Course

Interactive lectures and discussions.
Extensive exercises and practice sessions.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

This course is available as onsite live training in Uzbekistan or online live training.

Thank you for sending your enquiry! One of our team members will contact you shortly.

Thank you for sending your booking! One of our team members will contact you shortly.

Course Outline

Introduction

What is ROCm?
What is HIP?
ROCm vs CUDA vs OpenCL
Overview of ROCm and HIP features and architecture
Setting up the Development Environment

Getting Started

Creating a new ROCm project using Visual Studio Code
Exploring the project structure and files
Compiling and running the program
Displaying the output using printf and fprintf

ROCm API

Understanding the role of ROCm API in the host program
Using ROCm API to query device information and capabilities
Using ROCm API to allocate and deallocate device memory
Using ROCm API to copy data between host and device
Using ROCm API to launch kernels and synchronize threads
Using ROCm API to handle errors and exceptions

HIP Language

Understanding the role of HIP language in the device program
Using HIP language to write kernels that execute on the GPU and manipulate data
Using HIP data types, qualifiers, operators, and expressions
Using HIP built-in functions, variables, and libraries to perform common tasks and operations

ROCm and HIP Memory Model

Understanding the difference between host and device memory models
Using ROCm and HIP memory spaces, such as global, shared, constant, and local
Using ROCm and HIP memory objects, such as pointers, arrays, textures, and surfaces
Using ROCm and HIP memory access modes, such as read-only, write-only, read-write, etc.
Using ROCm and HIP memory consistency model and synchronization mechanisms

ROCm and HIP Execution Model

Understanding the difference between host and device execution models
Using ROCm and HIP threads, blocks, and grids to define the parallelism
Using ROCm and HIP thread functions, such as hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x, etc.
Using ROCm and HIP block functions, such as __syncthreads, __threadfence_block, etc.
Using ROCm and HIP grid functions, such as hipGridDim_x, hipGridSync, cooperative groups, etc.

Debugging

Understanding the common errors and bugs in ROCm and HIP programs
Using Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
Using ROCm Debugger to debug ROCm and HIP programs on AMD devices
Using ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

Understanding the factors that affect the performance of ROCm and HIP programs
Using ROCm and HIP coalescing techniques to improve memory throughput
Using ROCm and HIP caching and prefetching techniques to reduce memory latency
Using ROCm and HIP shared memory and local memory techniques to optimize memory accesses and bandwidth
Using ROCm and HIP profiling and profiling tools to measure and improve the execution time and resource utilization

Summary and Next Steps

Requirements

An understanding of C/C++ language and parallel programming concepts
Basic knowledge of computer architecture and memory hierarchy
Experience with command-line tools and code editors

Audience

Developers who wish to learn how to use ROCm and HIP to program AMD GPUs and exploit their parallelism
Developers who wish to write high-performance and scalable code that can run on different AMD devices
Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance

28 Hours

Need help picking the right course?

AMD GPU Programming Training Course

Course Outline

Requirements

Related Categories

This site in other countries/regions

Europe

Asia Pacific

North America

South America

Africa / Middle East

Other sites

AMD GPU Programming Training Course

Course Outline

Requirements

Related Courses

Developing AI Applications with Huawei Ascend and CANN

Deploying AI Models with CANN and Ascend AI Processors

GPU Programming on Biren AI Accelerators

Cambricon MLU Development with BANGPy and Neuware

Introduction to CANN for AI Framework Developers

CANN for Edge AI Deployment

Understanding Huawei’s AI Compute Stack: From CANN to MindSpore

Optimizing Neural Network Performance with CANN SDK

CANN SDK for Computer Vision and NLP Pipelines

Building Custom AI Operators with CANN TIK and TVM

Migrating CUDA Applications to Chinese GPU Architectures

Performance Optimization on Ascend, Biren, and Cambricon

Related Categories

GPU

This site in other countries/regions

Europe

Asia Pacific

North America

South America

Africa / Middle East

Other sites