ROCm for Windows Training Course

ROCm is an open-source software platform for programming AMD GPUs. It includes HIP, a C++ runtime API and kernel language modeled closely on CUDA, and it also supports OpenCL. ROCm gives programmers direct access to hardware details and full control over how work is parallelized. In return, it requires a thorough understanding of device architecture, memory models, execution models, and optimization techniques.

ROCm for Windows is a recent development that lets users install and use ROCm on the Windows operating system, which is common in both personal and professional settings. With ROCm for Windows, users can harness AMD GPUs for applications such as artificial intelligence, gaming, graphics, and scientific computing.

This instructor-led, live training (available online or onsite) is targeted at developers with beginner to intermediate levels of experience who wish to install and use ROCm on Windows to program AMD GPUs and leverage their parallel capabilities.

By the end of this training, participants will be able to:

Set up a development environment that includes the ROCm Platform, an AMD GPU, and Visual Studio Code on Windows.
Create a basic ROCm program that performs vector addition on the GPU and retrieves the results from GPU memory.
Utilize the ROCm API to query device information, allocate and deallocate device memory, copy data between the host and device, launch kernels, and synchronize threads.
Write kernels using the HIP language that execute on the GPU and manipulate data.
Leverage HIP built-in functions, variables, and libraries to perform common tasks and operations.
Optimize data transfers and memory accesses using ROCm and HIP memory spaces, such as global, shared, constant, and local.
Control threads, blocks, and grids that define parallelism using ROCm and HIP execution models.
Debug and test ROCm and HIP programs using tools like the ROCm Debugger and ROCm Profiler.
Optimize ROCm and HIP programs using techniques such as coalescing, caching, prefetching, and profiling.
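
As an illustration of the vector addition exercise named in the objectives above, here is a minimal sketch rather than the course's own code. The names vector_add, vector_add_kernel, and check are hypothetical. The GPU path uses the standard HIP runtime calls (hipMalloc, hipMemcpy, hipDeviceSynchronize, hipFree) and is compiled only when the file is built with hipcc. The CPU fallback is an assumption added so that the sketch also builds where ROCm is not installed.

```cpp
// Vector addition sketch: GPU path under hipcc, CPU fallback otherwise.
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

#ifdef __HIPCC__
#include <hip/hip_runtime.h>

// Hypothetical helper: turn a failed HIP runtime call into an exception.
static void check(hipError_t status, const char* what) {
    if (status != hipSuccess) {
        throw std::runtime_error(std::string(what) + ": " + hipGetErrorString(status));
    }
}

// Each thread adds one pair of elements.
__global__ void vector_add_kernel(const float* a, const float* b, float* c, std::size_t n) {
    const std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        c[i] = a[i] + b[i];
    }
}
#endif

std::vector<float> vector_add(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("vector_add: size mismatch");
    }
    const std::size_t n = a.size();
    std::vector<float> c(n);
    if (n == 0) {
        return c;
    }
#ifdef __HIPCC__
    // Simplification: device buffers are not released if a call below throws.
    const std::size_t bytes = n * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_c = nullptr;
    check(hipMalloc(reinterpret_cast<void**>(&d_a), bytes), "hipMalloc a");
    check(hipMalloc(reinterpret_cast<void**>(&d_b), bytes), "hipMalloc b");
    check(hipMalloc(reinterpret_cast<void**>(&d_c), bytes), "hipMalloc c");

    // Copy the inputs from host memory to device memory.
    check(hipMemcpy(d_a, a.data(), bytes, hipMemcpyHostToDevice), "copy a");
    check(hipMemcpy(d_b, b.data(), bytes, hipMemcpyHostToDevice), "copy b");

    // Launch enough 256-thread blocks to cover all n elements.
    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    vector_add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
    check(hipGetLastError(), "kernel launch");
    check(hipDeviceSynchronize(), "synchronize");

    // Copy the result back and release device memory.
    check(hipMemcpy(c.data(), d_c, bytes, hipMemcpyDeviceToHost), "copy c");
    check(hipFree(d_a), "hipFree a");
    check(hipFree(d_b), "hipFree b");
    check(hipFree(d_c), "hipFree c");
#else
    // CPU fallback, used only when the file is not compiled with hipcc.
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
#endif
    return c;
}
```

Calling vector_add({1, 2, 3}, {10, 20, 30}) returns a vector holding the element-wise sums. The block size of 256 is an arbitrary but common starting point, not a tuned value.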

Format of the Course

Interactive lecture and discussion.
Plenty of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request customized training for this course, please contact us.

This course is available as onsite live training in Uzbekistan or online live training.

Course Outline

Introduction

What is ROCm?
What is HIP?
ROCm vs CUDA vs OpenCL
Overview of ROCm and HIP features and architecture
ROCm for Windows vs ROCm for Linux

Installation

Installing ROCm on Windows
Verifying the installation and checking device compatibility
Updating or uninstalling ROCm on Windows
Troubleshooting common installation issues

Getting Started

Creating a new ROCm project using Visual Studio Code on Windows
Exploring the project structure and files
Compiling and running the program
Displaying the output using printf and fprintf

ROCm API

Using the ROCm API in the host program
Querying device information and capabilities
Allocating and deallocating device memory
Copying data between host and device
Launching kernels and synchronizing threads
Handling errors and exceptions
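
The device-query and error-handling topics above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the course material: DeviceInfo, list_devices, and describe are hypothetical names. The HIP calls hipGetDeviceCount and hipGetDeviceProperties are compiled only under hipcc. Without HIP, list_devices simply returns an empty list.

```cpp
// Device query sketch: checks each HIP call's status instead of assuming success.
#include <cstddef>
#include <string>
#include <vector>

#ifdef __HIPCC__
#include <hip/hip_runtime.h>
#endif

// Hypothetical record holding a few of the properties HIP reports per device.
struct DeviceInfo {
    std::string name;
    std::size_t global_mem_bytes;
    int compute_units;
};

// Returns one entry per visible device; empty if none are found or HIP is absent.
std::vector<DeviceInfo> list_devices() {
    std::vector<DeviceInfo> devices;
#ifdef __HIPCC__
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess) {
        return devices;  // treat a failed query as "no devices"
    }
    for (int id = 0; id < count; ++id) {
        hipDeviceProp_t prop;
        if (hipGetDeviceProperties(&prop, id) != hipSuccess) {
            continue;  // skip devices whose properties cannot be read
        }
        devices.push_back({prop.name, prop.totalGlobalMem, prop.multiProcessorCount});
    }
#endif
    return devices;
}

// Formats a device record as a single human-readable line.
std::string describe(const DeviceInfo& d) {
    return d.name + ": " + std::to_string(d.compute_units) + " compute units, " +
           std::to_string(d.global_mem_bytes / (1024 * 1024)) + " MiB";
}
```

Returning an empty list on failure keeps the sketch short. A real program would more likely report the message from hipGetErrorString so that installation problems are visible.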

HIP Language

Using the HIP language in the device program
Writing kernels that execute on the GPU and manipulate data
Using data types, qualifiers, operators, and expressions
Using built-in functions, variables, and libraries

ROCm and HIP Memory Model

Using different memory spaces, such as global, shared, constant, and local
Using different memory objects, such as pointers, arrays, textures, and surfaces
Using different memory access modes, such as read-only, write-only, read-write, etc.
Using the memory consistency model and synchronization mechanisms

ROCm and HIP Execution Model

Understanding the execution hierarchy of threads, blocks, and grids
Using built-in index variables, such as hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x, etc.
Using block functions, such as __syncthreads, __threadfence_block, etc.
Using grid functions, such as hipGridDim_x, hipGridSync, cooperative groups, etc.
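
To make the thread, block, and grid hierarchy above concrete, the following CPU-only sketch emulates how a one-dimensional launch maps threads to data elements. It does not call HIP. The function names are hypothetical, and the index formula mirrors the blockIdx.x * blockDim.x + threadIdx.x expression commonly used in 1D kernels.

```cpp
// CPU emulation of 1D grid/block/thread indexing (no GPU required).
#include <cstddef>
#include <vector>

// Smallest number of blocks such that blocks * threads_per_block >= n.
std::size_t blocks_needed(std::size_t n, std::size_t threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Global element index of one thread in a 1D launch.
std::size_t global_index(std::size_t block_idx, std::size_t block_dim, std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Walks every (block, thread) pair and counts how often each element is visited.
std::vector<int> emulate_launch(std::size_t n, std::size_t threads_per_block) {
    std::vector<int> visits(n, 0);
    const std::size_t grid_dim = blocks_needed(n, threads_per_block);
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < threads_per_block; ++t) {
            const std::size_t i = global_index(b, threads_per_block, t);
            if (i < n) {  // same bounds guard a kernel would use
                ++visits[i];
            }
        }
    }
    return visits;
}
```

For 1000 elements and 256 threads per block, blocks_needed gives 4 blocks, or 1024 threads in total. The bounds guard leaves the last 24 threads idle, and every element is visited exactly once.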

Debugging

Debugging ROCm and HIP programs on Windows
Using the Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
Using the ROCm Debugger to debug ROCm and HIP programs on AMD devices
Using the ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

Optimizing ROCm and HIP programs on Windows
Using coalescing techniques to improve memory throughput
Using caching and prefetching techniques to reduce memory latency
Using shared memory and local memory techniques to optimize memory accesses and bandwidth
Using profiling tools to measure and improve execution time and resource utilization
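
The effect of coalescing mentioned above can be approximated with a simple CPU-side model. The sketch below counts how many aligned memory segments a group of consecutive threads touches for a given access stride. The function name, the group size of 64 threads, and the 64-byte segment size used in the usage note are illustrative assumptions. Real values depend on the GPU architecture, and the model assumes an aligned base address and an element size that divides the segment size.

```cpp
// Simplified coalescing model: fewer segments touched means fewer memory transactions.
#include <cstddef>
#include <set>

// Thread t reads the element at index t * stride. Returns the number of
// distinct segment_bytes-sized segments that the whole group touches.
std::size_t segments_touched(std::size_t lanes, std::size_t stride,
                             std::size_t elem_bytes, std::size_t segment_bytes) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < lanes; ++t) {
        const std::size_t byte_offset = t * stride * elem_bytes;
        segments.insert(byte_offset / segment_bytes);
    }
    return segments.size();
}
```

Under these assumptions, 64 threads reading consecutive 4-byte floats (stride 1) touch 4 segments. With a stride of 16 elements, each thread lands in its own segment and 64 segments are touched. Coalesced access patterns are meant to avoid that kind of gap.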

Summary and Next Steps

Requirements

An understanding of the C/C++ language and parallel programming concepts
Basic knowledge of computer architecture and memory hierarchy
Experience with command-line tools and code editors
Familiarity with the Windows operating system and PowerShell

Audience

Developers who wish to learn how to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism
Developers who wish to write high-performance and scalable code that can run on different AMD devices
Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance

21 Hours
