In this talk we examine how high-performance computing has changed over the last ten years and look at the trends shaping its future. These changes have had, and will continue to have, a major impact on our numerical scientific software. A new generation of software libraries and algorithms is needed for the effective and reliable use of (wide-area) dynamic, distributed, and parallel environments.
Jack Dongarra specializes in numerical algorithms in linear algebra, parallel computing, the use of advanced computer architectures, programming methodology, and tools for parallel computers. He holds appointments at the University of Manchester, Oak Ridge National Laboratory, and the University of Tennessee, where he founded the Innovative Computing Laboratory. In 2019 he received the SIAM/ACM Prize in Computational Science and Engineering. In 2020 he received the IEEE-CS Computer Pioneer Award and, most recently, he received the 2021 ACM A.M. Turing Award for his pioneering contributions to numerical algorithms and software that have driven decades of extraordinary progress in computing performance and applications.
Recent rates of improvement in transistor scaling have been much lower than in previous decades. Hence, hardware customization for more efficient use of the transistors on a VLSI chip is now a primary means of performance improvement. These trends toward increased hardware customization present challenges that compilers should address: i) How can performance-portable, productive application development for diverse hardware platforms be achieved? ii) How can architectural parameters for hardware accelerators be optimized for the execution of key workloads?
The state of the art in optimizing compilers is very advanced with respect to lowering programs from high-level languages to low-level instruction sets so as to minimize the number of executed instructions. However, the fundamental bottleneck today is not the number of executed arithmetic/logic instructions but the cost of data access and movement, in terms of both energy and time. Many program transformations, such as loop tiling/fusion and data layout transformation, have been devised to address this critical bottleneck. While these techniques have been used to create manually optimized libraries, effective automated data-locality optimization by compilers remains a challenge. For some classes of matrix/tensor computations used in high-impact domains such as machine learning, progress has been made in defining and exploring search spaces for automated code optimization targeting multiple hardware platforms. This talk will elaborate on a number of key challenges and opportunities for compilers, including design-space exploration, effective performance modeling, algorithm-architecture co-design, and the derivation of lower bounds on data movement.
Sadayappan is a Professor in the School of Computing at the University of Utah, with a joint appointment at Pacific Northwest National Laboratory. His primary research interests center on compiler/runtime optimization for high-performance computing, with an emphasis on matrix/tensor computations. He collaborates closely with computational scientists and data scientists in developing high-performance domain-specific frameworks and applications. Sadayappan received a B.Tech. from the Indian Institute of Technology, Madras, and an M.Sc. and a Ph.D. from Stony Brook University. He is an IEEE Fellow.
Two trends will have a significant impact on how exponential growth in computational performance can be sustained at reduced power consumption in the future. One is that applications have shifted from being compute-centric to data-centric; the other is that the technology scaling laws (Dennard scaling and Moore's Law) have come, or will soon come, to an end. In this talk, I will elaborate on the implications of these trends for the design of future computer systems and give a few glimpses of ongoing work in my research lab towards data-centric computer architectures.
Per Stenstrom is a professor at Chalmers University of Technology. His research interests are in parallel computer architecture. He has authored or co-authored four textbooks, about 200 publications, and twenty patents in this area. He has been program chairman of several top-tier IEEE and ACM conferences, including the IEEE/ACM International Symposium on Computer Architecture, and serves as Associate Editor of ACM TACO, Topical Editor of IEEE Transactions on Computers, and Associate Editor-in-Chief of JPDC. He is a Fellow of the ACM and the IEEE and a member of Academia Europaea, the Royal Swedish Academy of Engineering Sciences, and the Royal Spanish Academy of Engineering.