Course Outline

Introduction to Biren GPU Architecture

  • Overview of Biren and its key use cases
  • Hardware layout: cores, memory, and compute clusters
  • Comparative analysis with NVIDIA and AMD GPUs

Setting Up the Biren Programming Environment

  • Installation of the Biren SDK and runtime
  • Understanding the toolchain and compiler model
  • Basic project structure and the build process

GPU Programming with the Biren Stack

  • Thread and block execution models
  • Memory management and data transfer strategies
  • Kernel development and launch patterns

Porting from CUDA to Biren

  • Techniques for translating CUDA code
  • Common API mappings and necessary adaptations
  • Practical labs and exercises for code conversion

Debugging and Profiling

  • Using Biren's debugger and profiler tools
  • Identifying performance bottlenecks
  • Analysing memory access patterns and applying optimisations

Optimisation Techniques

  • Thread scheduling and instruction pipelining
  • Loop unrolling and efficient use of shared memory
  • Advanced kernel tuning for maximum throughput

Case Study and Application Examples

  • Training a model using Biren accelerators
  • Porting and profiling vision or NLP models
  • Comparing performance against CUDA on NVIDIA platforms

Summary and Next Steps

Requirements

  • A solid understanding of GPU architecture and parallel processing concepts
  • Practical experience with CUDA, OpenCL, or comparable GPU programming environments
  • Familiarity with deep learning frameworks such as PyTorch or TensorFlow

Target Audience

  • HPC developers
  • AI infrastructure engineers
  • Performance optimisation specialists

Duration

  • 21 hours
