GPU programming made easy with OpenMP

Tutorial, HiPC 2019 - December 17, 2:00 pm to 6:00 pm

Abstract: In today’s era of heterogeneous computing, where more than one kind of compute unit (CPUs and accelerators) is used, application developers need to exploit all available compute power to get the best performance. Traditionally, CUDA and OpenCL have been used for GPU programming, which requires a sound knowledge of these programming techniques and restructuring of an existing application to run on GPUs.

Since the release of the OpenMP 4.0 standard, OpenMP directives can be used to offload work to GPUs, which can improve application performance, in some cases by 2x to 10x or more. The goal of the OpenMP standard is to make applications portable across different architectures (CPU or GPU) and to minimize vendor-specific statements in the program. The HPC community has accepted OpenMP for CPU shared-memory parallel programming for decades. The extension of this open standard to GPU programming makes it a promising programming model for heterogeneous systems. OpenMP’s ease of programming can motivate application developers to enable CPU-only applications to run on GPUs.

In this hands-on tutorial, participants will learn OpenMP programming techniques for CPUs and GPUs, as well as how to profile and monitor a GPU-offloaded OpenMP application. This tutorial targets attendees of all skill levels, including students, researchers, and professionals from the IT industry. Basic knowledge of C or C++ programming is assumed, while an understanding of parallel programming, OpenMP programming, and GPU architecture would be an added advantage. Tutorial attendees are expected to bring their laptop computers with NVIDIA Visual Profiler (NVVP) installed (required for GPU profile visualization). Download NVVP here: (CUDA 10.1 or above)


The tutorial is split into 2 sections (half-day tutorial):

Section 1:

1. Heterogeneous system overview (CPU & GPU architecture) – 15 min

2. Introduction (basics of parallel programming, OpenMP background/history, OpenMP shared-memory programming model) – 25 min

3. OpenMP programming on CPU and GPU – 30 min

4. Profiling and monitoring an OpenMP GPU-offloaded application – 20 min

5. HPC application performance – 15 min

Break – 10 min

Section 2: (90 minutes duration) – Hands-on session

1. Converting simple matrix multiplication code into CPU multithreaded matrix multiplication using OpenMP

2. Offloading the OpenMP CPU multithreaded matrix multiplication code to a GPU

3. Scaling GPU offloaded code on multiple GPUs

4. Profiling OpenMP GPU offloaded code using nvprof
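
The first two hands-on steps can be pictured with a minimal sketch like the one below. It is not the tutorial's actual exercise code: the function names matmul_cpu and matmul_gpu, the row-major square-matrix layout, and the choice of clauses are assumptions made for illustration. The CPU version shares the rows of the result among host threads, while the GPU version uses OpenMP target directives with map clauses to copy the inputs to the device and the result back.

```c
#include <assert.h>

/* Hypothetical helper names; matrices are n x n, stored row-major. */

/* Step 1 (CPU): iterations of the outer loop are divided among host threads. */
void matmul_cpu(const double *a, const double *b, double *c, int n)
{
    assert(a && b && c && n > 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Step 2 (GPU): the same loop nest, offloaded with a target construct.
 * map(to:) copies a and b to the device, map(from:) copies c back.
 * collapse(2) exposes all n*n (i, j) pairs as parallel work.
 * If no device is available, the region normally falls back to the host. */
void matmul_gpu(const double *a, const double *b, double *c, int n)
{
    assert(a && b && c && n > 0);
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:n*n], b[0:n*n]) map(from: c[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

Both functions compute the same result, and without OpenMP enabled the pragmas are ignored and the code runs serially. With GCC, for example, -fopenmp enables the directives; actual GPU offloading additionally requires a compiler built with support for the target device.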



Pidad D’Souza is a high-performance computing (HPC) system performance architect at IBM India. He leads the mission of GPU-accelerated application performance analysis and optimization in fields such as life sciences, molecular and fluid dynamics, and computational chemistry on the new generation of OpenPOWER systems. He is one of the key influencers in next-generation system design. He also specializes in performance projection of HPC applications and in optimizing workloads to exploit unique IBM POWER9 system features such as GPU coherent memory, ATS, and CPU-GPU NVLink capabilities. His prior responsibilities included the development of HPC application profiler tools, IBM AIX operating system libraries, and IBM JVM development support. He has presented extensively at national and international conferences (GTC, IBM Think, IBM Edge, OpenPOWER Summit), and in workshops and tutorials for customers and universities.

Aditya Nitsure is a senior high-performance computing (HPC) performance analyst at IBM India Pvt. Ltd. He has 15+ years of experience in HPC application development and performance engineering. At IBM, he is involved in HPC application performance analysis on heterogeneous (CPU/GPU) systems, OpenMP GPU offloading exploration, performance projection on future generations of the POWER platform, and HPC application performance collateral publications on IBM Power systems. His prior responsibilities include leading the IBM Workload Estimator & Energy Estimator project and developing HPC application profiler tools, catastrophe modeling software, etc. He has presented sessions at international conferences such as IBM Think and NVIDIA GTC.