Tutorials

The 17th annual IEEE International Conference on High Performance Computing (HiPC 2010) will be held in Goa, India, during December 19-22, 2010. It will serve as a forum for researchers from around the world to present their current work. HiPC 2010 will focus on the design and analysis of high performance computing and networking systems and their scientific, engineering, and commercial applications.

The following tutorials will be held in conjunction with the conference.

Tutorial I: Architecture-Specific Optimizations for Modern CPUs and GPUs

[Dec. 19; 2:00pm - 6:30pm; Sala de Banquet]

Speakers
Dhiraj Kalamkar and Sangeeta Bhattacharya, Intel Labs, India

Background
As multicore architectures overtake single-core architectures in today's and future computer systems, applications must switch to parallel algorithms to achieve higher performance. Years of research have yielded many ways to parallelize applications, such as functional decomposition and data partitioning. However, we have found that exploiting parallelism at the algorithmic level alone is not sufficient to achieve the best performance. To obtain optimal application performance, we must also take into account the characteristics of the underlying platform architecture, such as the core architecture, SIMD width, and memory bandwidth.
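
As a rough illustration of this point (a hypothetical sketch of our own, not material from the tutorial), the CUDA fragment below runs the same trivially parallel computation over two memory layouts. The parallel algorithm is identical, but the struct-of-arrays layout lets adjacent GPU threads read consecutive words (coalesced access), much as it lets a CPU fill full SIMD vectors, while the array-of-structs layout wastes memory bandwidth on both.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Array-of-structs: the four fields of each particle are interleaved in memory.
    struct ParticleAoS { float x, y, z, w; };

    __global__ void scaleAoS(ParticleAoS* p, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i].x *= s;   // adjacent threads touch every 4th float: strided, poorly coalesced
    }

    // Struct-of-arrays: each field lives in its own contiguous array.
    __global__ void scaleSoA(float* x, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;     // adjacent threads touch consecutive floats: coalesced, SIMD-friendly
    }

    int main() {
        const int n = 1 << 20;
        ParticleAoS* aos; float* soa_x;
        cudaMalloc((void**)&aos,   n * sizeof(ParticleAoS));
        cudaMalloc((void**)&soa_x, n * sizeof(float));
        cudaMemset(aos,   0, n * sizeof(ParticleAoS));
        cudaMemset(soa_x, 0, n * sizeof(float));

        int threads = 256, blocks = (n + threads - 1) / threads;
        scaleAoS<<<blocks, threads>>>(aos, 2.0f, n);    // same parallel algorithm ...
        scaleSoA<<<blocks, threads>>>(soa_x, 2.0f, n);  // ... but the layout determines bandwidth use
        cudaThreadSynchronize();                        // wait for both kernels to finish

        cudaFree(aos); cudaFree(soa_x);
        printf("done\n");
        return 0;
    }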

Intel's Throughput Computing Lab has been engaged in platform-specific optimization for many years. We work with many companies to optimize their applications for Intel and non-Intel platforms. We have published many technical papers on this topic at top venues such as IEEE Visualization, SIGMOD, VLDB, IEEE Signal Processing Magazine, ISCA, ICS, SC, and SIGGRAPH.

Tutorial description
The goal of this tutorial is to present to the audience a summary of our optimization work across many application domains on modern CPUs and GPUs. The tutorial will include a discussion of the architectural trends of modern CPUs and GPUs as well as their architectural differences. We will offer optimization guidelines for modern CPUs as well as modern GPUs. Examples of how architecture-specific optimizations have improved real applications will be discussed to reinforce the material. We will conclude with a summary of the techniques discussed.

Target audience
This tutorial targets experienced programmers who are interested in improving their programs' performance using proven techniques.

Breakdown of the tutorial materials
20% Introductory
30% Intermediate
30% Advanced

Website
http://sites.google.com/site/optimizingforcpuandgpu/

Tutorial II: NVIDIA CUDA - What's New in CUDA and the Fermi Architecture

[Dec. 19; 2:00pm - 6:30pm; Harmonia]

Speakers
Manish Bali, NVIDIA Corporation, India
Dibyapran Sanyal, NVIDIA Corporation, India
Swapna Matwankar, NVIDIA Corporation, India

Background
All around the world, high performance computing is witnessing an era of disruptive innovation, with highly parallel applications achieving dramatic speedups at significantly lower power budgets. CUDA is a general purpose architecture for writing such highly parallel applications, with support for several key abstractions for scalable high-performance parallel computing. With a rapid chain of innovations in the programming ecosystem, coupled with newer-generation GPUs such as Fermi, CUDA today supports many languages, developer tools, and libraries on a new breed of supercomputers. Recently, the world's fastest supercomputer, Tianhe-1A, which uses 7,168 Fermi GPUs to deliver 2.507 petaflops of LINPACK performance at a 3X lower power budget, was unveiled at the National Supercomputer Center in Tianjin, China. Scientists and engineers are today using CUDA in both research and production environments to push the envelope of discovery in a wide range of disciplines, from finance to drug discovery to engineering to oil and gas exploration.

Tutorial description
In this tutorial we begin with an overview of the basics of the CUDA programming model and describe some of its key abstractions and newer features in detail. We then look at the capabilities of Fermi and optimization techniques specific to it. We next cover the CUDA programming ecosystem, including the various developer tools and resources, and wrap up with a Q&A session.
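
As a rough illustration of these basics (a minimal hypothetical sketch, not taken from the tutorial material), the vector-addition program below shows the elements the CUDA programming model is built around: a __global__ kernel executed by a grid of thread blocks, a per-thread index computed from blockIdx, blockDim, and threadIdx, and a host-side launch with explicit memory transfers.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Each thread computes one element of c = a + b.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) c[i] = a[i] + b[i];                  // guard the partially filled last block
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // Host allocation and initialization.
        float* ha = (float*)malloc(bytes);
        float* hb = (float*)malloc(bytes);
        float* hc = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        // Device allocation and host-to-device copies.
        float *da, *db, *dc;
        cudaMalloc((void**)&da, bytes);
        cudaMalloc((void**)&db, bytes);
        cudaMalloc((void**)&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // Launch a 1D grid with enough 256-thread blocks to cover all n elements.
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(da, db, dc, n);

        // Copy the result back; this memcpy also synchronizes with the kernel.
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", hc[0]);                   // expect 3.000000

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }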

Target audience
This tutorial targets experienced programmers who are interested in developing high performance computing applications using CUDA. Prior knowledge of CUDA is not mandatory.

Tutorial III: Amazon Web Services

[Dec. 19; 11:00am - 1:00pm; Harmonia]

Speakers
Simone Brunozzi, Amazon

Tutorial description
This tutorial briefly introduces Amazon Web Services to the audience, in particular the new instance types targeted at HPC users.