The acquisitions of Altera by Intel in 2015 and of Xilinx by AMD in 2022 seem to mark a new era for field-programmable gate arrays, or FPGAs: from reconfigurable devices mostly dedicated to prototyping and moderate-volume embedded applications, they may now have a chance to become ubiquitous high-performance computing devices, alongside CPUs, GPUs, and TPUs. Yet one of the fundamental conditions for the adoption of any new device in a computing environment is the availability of a suitable programming paradigm. Alas, enabling software developers to apply their skills to FPGAs has been a long-standing and, as yet, unmet research objective in reconfigurable computing. In this talk, we will first discuss our experiences with the daunting task of efficiently running kernels of software-oriented imperative code on FPGAs, a seemingly simple goal that has remained elusive for a long time. We will also touch on the necessity of developing software environments to support forms of explicit parallelism profitable to FPGAs. We will argue that without progress on these fronts, reconfigurable computing will remain a missed opportunity.
Paolo Ienne has been a Professor in the School of Computer and Communication Sciences at EPFL since 2000. His research interests include various aspects of computer and processor architecture, FPGAs and reconfigurable computing, electronic design automation, and computer arithmetic. He has published over 200 articles in peer-reviewed journals and international conferences, some of which have received Best Paper Awards (three times at ISFPGA, three at FPL, at CASES, and at DAC). Ienne has served as General or Program Chair of various conferences (including ASAP, ARITH, FPL, and ISFPGA) and is an Associate Editor of ACM Computing Surveys and ACM Transactions on Architecture and Code Optimization. He serves on the steering committees of the ARITH, FPL, and ISFPGA conferences.
Recent rates of improvement in transistor scaling have been much lower than in previous decades. Hence, hardware customization for more efficient use of the transistors on a VLSI chip is now a primary means of performance improvement. These trends towards increased hardware customization present challenges that should be addressed by compilers: i) How can performance-portable and productive application development for diverse hardware platforms be achieved? ii) How can architectural parameters for hardware accelerators be optimized for execution of key workloads?
The state of the art in optimizing compilers is very advanced with respect to lowering programs from high-level languages to low-level instruction sets so as to minimize the number of executed instructions. However, the fundamental bottleneck today is not the number of executed arithmetic/logic instructions but the cost of data access and movement, in terms of both energy and time. Many program transformation techniques, such as loop tiling/fusion and data layout transformation, have been devised to address this critical bottleneck. But while these techniques have been used in creating manually optimized libraries, effective automated data-locality optimization by compilers remains a challenge. For some classes of matrix/tensor computations used in high-impact domains like machine learning, progress has been made in defining and exploring search spaces for automated code optimization for multiple hardware targets. This talk will elaborate on a number of key challenges/opportunities for compilers, including design space exploration, effective performance modeling, algorithm-architecture co-design, and derivation of lower bounds on data movement.
Sadayappan is a Professor in the School of Computing at the University of Utah, with a joint appointment at Pacific Northwest National Laboratory. His primary research interests center around compiler/runtime optimization for high-performance computing, with an emphasis on matrix/tensor computations. He collaborates closely with computational scientists and data scientists in developing high-performance domain-specific frameworks and applications. Sadayappan received a B.Tech from the Indian Institute of Technology, Madras, and an M.Sc. and a Ph.D. from Stony Brook University. Sadayappan is an IEEE Fellow.
In this talk, we examine how high-performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had, and will continue to have, a major impact on our numerical scientific software. A new generation of software libraries and algorithms is needed for the effective and reliable use of (wide area) dynamic, distributed, and parallel environments.
Jack Dongarra specializes in numerical algorithms in linear algebra, parallel computing, the use of advanced computer architectures, programming methodology, and tools for parallel computers. He holds appointments at the University of Manchester, Oak Ridge National Laboratory, and the University of Tennessee, where he founded the Innovative Computing Laboratory. In 2019 he received the ACM/SIAM Computational Science and Engineering Prize. In 2020 he received the IEEE-CS Computer Pioneer Award and, most recently, he received the 2021 ACM A.M. Turing Award for his pioneering contributions to numerical algorithms and software that have driven decades of extraordinary progress in computing performance and applications.
Two trends will have a significant impact on how to sustain exponential growth in computational performance at reduced power consumption in the future. One trend is that applications have shifted from being compute-centric to data-centric; the other is that all technology scaling laws (Dennard scaling and Moore’s Law) have come, or will soon come, to an end. In this talk, I will elaborate on the implications of these trends for the design of future computer systems and give a few glimpses of work underway in my research lab towards data-centric computer architectures.
Per Stenstrom is a professor at Chalmers University of Technology. His research interests are in parallel computer architecture. He has authored or co-authored four textbooks, about 200 publications, and twenty patents in this area. He has been program chairman of several top-tier IEEE and ACM conferences, including the IEEE/ACM Symposium on Computer Architecture, and acts as Associate Editor of ACM TACO, Topical Editor of IEEE Transactions on Computers, and Associate Editor-in-Chief of JPDC. He is a Fellow of the ACM and the IEEE and a member of Academia Europaea, the Royal Swedish Academy of Engineering Sciences, and the Royal Spanish Academy of Engineering Science.