HiPC 2015 Keynotes

Scale-Out Beyond Map-Reduce

Raghu Ramakrishnan

Head, Big Data Engineering & Head, Cloud Information Services Lab
Microsoft

Thursday, Dec. 17, 2015


Abstract

Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting, and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous, and the default increasingly is to capture and store any and all data, in anticipation of potential future strategic value. These differences in data heterogeneity, scale, and usage are leading to a new generation of data management and analytic systems, where the emphasis is on supporting a wide range of very large datasets that are stored uniformly and analyzed seamlessly using whatever techniques are most appropriate, including traditional tools like SQL and BI and newer tools, e.g., for machine learning and stream analytics. These new systems are necessarily based on scale-out architectures for both storage and computation.

Hadoop has become a key building block in this new generation of scale-out systems. On the storage side, HDFS has provided a cost-effective and scalable substrate for storing large heterogeneous datasets. However, as key customer and systems touch points are instrumented to log data, and Internet of Things applications become common, data in the enterprise is growing at a staggering pace, and the need to leverage different storage tiers (ranging from tape to main memory) is posing new challenges, leading to caching technologies such as Spark. On the analytics side, the emergence of resource managers such as YARN has opened the door for analytics tools to bypass the MapReduce layer and directly exploit shared system resources while computing close to data copies. This trend is especially significant for iterative computations such as graph analytics and machine learning, for which MapReduce is widely recognized to be a poor fit.

I will examine these trends, and ground the talk by discussing the Microsoft Big Data stack.

Bio

Raghu Ramakrishnan heads the Cloud and Information Services Lab (CISL) in the Data Platforms Group at Microsoft, and leads development for the Big Data team. From 1987 to 2006, he was a professor at the University of Wisconsin-Madison, where he wrote the widely used text “Database Management Systems” and led a wide range of research projects in database systems (e.g., the CORAL deductive database, the DEVise data visualization tool, SQL extensions to handle sequence data) and data mining (scalable clustering, mining over data streams). In 1999, he founded QUIQ, a company that introduced a cloud-based question-answering service. He joined Yahoo! in 2006 as a Yahoo! Fellow, and over the next six years served as Chief Scientist for the Audience (portal), Cloud, and Search divisions, driving content recommendation algorithms (CORE), cloud data stores (PNUTS), and semantic search (“Web of Things”). Ramakrishnan has received several awards, including the ACM SIGKDD Innovations Award, the SIGMOD 10-year Test-of-Time Award, the IIT Madras Distinguished Alumnus Award, and the Packard Fellowship in Science and Engineering. He is a Fellow of the ACM and IEEE.

Compilers and the Future of High Performance Computing

David Padua

Donald Biggar Willet Professor of Computer Science
University of Illinois at Urbana-Champaign

Saturday, Dec. 19, 2015


Abstract

Compiler technology has enabled the software advances of the last sixty years. It has given us machine-independent programming and improved productivity by automatically handling a number of issues, such as instruction selection and register allocation. However, in the parallel world of high performance computing, the impact of compiler technology has been small. Part of the reason is that the ambitious research projects of the last few decades, such as automatic parallelization and automatic generation of distributed memory programs à la High Performance Fortran, are yet to produce useful results. The absence of effective compiler technology has resulted in a lack of portability and low productivity in the programming of parallel machines. With these problems growing more serious, due to the popularization of parallelism and the increase in complexity expected in future high-end machines, advances in compiler technology are now more important than ever. In this presentation, I will discuss the state of the long-standing problem of automatic parallelization and describe important new lines of research, such as the identification of levels of abstraction that help both productivity and compilation, the development of a solid understanding of the automatic optimization process, the creation of a research methodology to enable the quantification of progress, and the development of an effective methodology for the interaction of programmers with compilers.

Bio

David Padua is the Donald Biggar Willet Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he has been a faculty member since 1985. His areas of interest include compilers, software tools, and parallel computing. He has published more than 170 papers and has supervised the dissertations of 30 PhD students. Padua has served as a program committee member, program chair, or general chair for more than 70 conferences and workshops. He was the Editor-in-Chief of Springer-Verlag’s Encyclopedia of Parallel Computing and is a member of the editorial boards of the IEEE Transactions on Parallel and Distributed Systems, the Journal of Parallel and Distributed Computing, and the International Journal of Parallel Programming. He received the 2015 IEEE Computer Society Harry H. Goode Award and is a Fellow of the ACM and the IEEE.

The Architecture of Smart Phones

Trevor Mudge

Bredt Family Professor of Engineering
University of Michigan

Friday, Dec. 18, 2015


Abstract

The growth of the smart phone market has been phenomenal. I don’t need to quote exact numbers, which are in the hundreds of millions, to illustrate their ubiquity: most of us have a smart phone in our pocket. The design constraints on smart phones are among the most challenging in computing: 1) low power to preserve battery life; 2) baseband processors to support 4G data rates (100 Mb/s, moving to 1 Gb/s for 5G); 3) multicore application processors for ever more sophisticated applications; and 4) time-to-market constraints that often result in solutions that seem ad hoc at best. Smart phones have become by far the most important of today’s computing platforms. Oddly, the computer architecture community has been slow to recognize this; there are only a handful of published studies that attempt to provide an architectural perspective. This talk will review the current state of the architecture of mobile phone platforms, and present some initial studies that the author and his research group have conducted on existing systems. Suggestions for future research and future architectures will be presented.

Bio

Trevor Mudge received the Ph.D. degree in Computer Science from the University of Illinois, Urbana, in 1977. Since then he has been on the faculty of the University of Michigan, Ann Arbor. In 2003 he was named the first Bredt Family Professor of Electrical Engineering and Computer Science. Previously he served a ten-year term as the Director of the Advanced Computer Architecture Laboratory, a group of eight faculty and about 60 graduate students. He is the author of numerous papers on computer architecture, programming languages, VLSI design, and computer vision. He has also chaired 49 theses in these areas. His research interests include computer architecture, computer-aided design, and compilers. In 2014 he received the ACM/IEEE CS Eckert-Mauchly Award “for pioneering contributions to low-power computer architecture and its interaction with technology.” In addition to his position as a faculty member, he runs Idiot Savants, a chip design consultancy. Trevor Mudge is a Life Fellow of the IEEE, a member of the ACM, the IET, and the British Computer Society.