1st Workshop on High Performance Fabrics
31st IEEE International Conference on High Performance Computing, Data, & Analytics
Dec 18, 2024
Bengaluru, India
Workshop on High Performance Fabrics (FABRICS 2024)
In conjunction with the 31st IEEE International Conference on High-Performance Computing, Data, & Analytics (HiPC 2024)
Description
The word “Fabrics” is perhaps the most used, or abused, word in computing today. A fabric is, as its name implies, a web of multiple devices in a cluster connected at the physical level, with transport and protocol layers built on top. Computing in general, and High Performance Computing in particular, has moved over the years from proprietary cluster interconnects to the more popular InfiniBand and Ethernet.
However, the focus in the AI era has recently shifted to the communication fabric closer to the CPUs and accelerators (including GPUs). Compute Express Link (CXL) is a memory-semantic fabric developed to address challenges related to memory capacity and bandwidth, as well as the need to disaggregate memory. For connecting GPUs with each other, NVLink has become the most popular communication fabric. There have also been recent advancements in this space, with the formation of the Ultra Accelerator Link (UALink) consortium to connect accelerators and allow sharing of memory between them and CPUs. Ethernet is also being enhanced by the Ultra Ethernet Consortium (UEC) to build a low-latency, lossless variant of Ethernet.
The Fabrics for HPC/AI workshop is a half-day event that aims to explore novel research ideas in this rich space of fabrics, along with related research in operating systems, virtualization, and manageability.
Keynote 1 : Fabric Challenges in AI Clusters
Speaker : Manoj Wadekar, Meta
Abstract : This talk will focus on the following areas :
- AI clusters are creating new fabrics in Datacenters.
- AI fabrics are pushing network, memory, chip-to-chip, and die-to-die interconnects.
- Exponential growth in AI clusters and fabric limitations are driving unique data center solutions, creating opportunities for innovation.
Bio : Manoj Wadekar (Meta) is a leading figure in Meta’s technology division, focusing on advanced computing solutions and AI integration. His expertise in system architecture and data management has driven significant innovations in Meta’s infrastructure, enhancing the performance and scalability of their computing systems.
Keynote 2 : Harnessing power of advanced next-gen fabrics to break memory wall with TCO optimized solution
Speaker : Vishal Tanna, Micron
Abstract : This talk will cover the following areas :
- Fabrics for Memory and Their Necessity
  - The memory wall problem and the solutions being explored to address it.
  - Evolution of CXL and how it can help mitigate the memory wall issue.
- Workload Analysis
  - Analysis of different workloads that are either memory capacity or bandwidth bound.
- CXL for Pooling and Sharing
  - Investigation into how CXL supports resource pooling and sharing, and the role of FamFS in these features.
- Emerging Fabrics
  - Introduction to UALink and its benefits.
Bio: Vishal is the Director of Engineering at Micron. He currently leads the R&D teams working on various aspects of CXL interface-based product development. Previously, he led the team that delivered Micron’s first-ever PCIe-based NAND flash controller to complex enterprise NVMe SSDs. He is an expert in fabrics like PCIe, CXL and NVMe.
Accepted Papers
Title | Authors and Affiliation |
Study of CXL Memory Sharing with FamFS and its Use cases | Aravind Ramesh (Micron) and John Groves (Micron) |
Porting of OpenSM over Trinetra-A Switchless Torus Network | V Evancer Vino John, Mahesh Chaudhari, Yogeshwar Sonawane and Sanjay Wandhekar (C-DAC) |
Simulation-Driven Design of Large-Scale Systems Architecture Using Slingshot Fabrics | Shridhar Joshi and Sumant Kalra (Hewlett Packard Enterprise) |
Performance optimization on CXL products using in-house modeling and simulation toolchain | Kirthi Ravindra Kulkarni, Anandhavel Nagendrakumar, Rohit Sehgal, Eishan Mirakhur, Nikesh Agarwal, Chandana Manjula Linganna (Micron) and Ranjit Gupte (Microchip) |
Towards Continuous Checkpointing for HPC Systems Using CXL | Ellis Giles (Elex Technologies) and Peter Varman (Rice University) |
Workshop Schedule
Time | Topic | Speaker and Affiliation |
8.30-8.45 a.m | Welcome and Workshop Introduction | Mohan Parthasarathy, Hewlett Packard Enterprise |
8.45-9.40 a.m | Keynote 1 : Fabric Challenges in AI Clusters | Manoj Wadekar, Meta |
9.40-10.05 a.m | Study of CXL Memory Sharing with FamFS and its Use cases | Aravind Ramesh, Micron |
10.05-10.30 a.m | Porting of OpenSM over Trinetra-A Switchless Torus Network | Mahesh Chaudhari, C-DAC |
10.30-11.00 a.m | Break | |
11.00-11.55 a.m | Keynote 2 : Harnessing power of advanced next-gen fabrics to break memory wall with TCO optimized solution | Vishal Tanna, Micron |
11.55-12.20 p.m | Simulation-Driven Design of Large-Scale Systems Architecture Using Slingshot Fabrics | Shridhar Joshi, Hewlett Packard Enterprise |
12.20-12.45 p.m | Performance optimization on CXL products using in-house modeling and simulation toolchain | Rohit Sehgal, Kirthi Ravindra Kulkarni, Micron |
12.45-1.10 p.m | Towards Continuous Checkpointing for HPC Systems Using CXL | Ellis Giles, Elex Technologies |
1.10-1.15 p.m | Vote of Thanks | |
Important Dates
Submission site opens: September 1, 2024
Paper submission due: September 30, 2024
Author notification: October 30, 2024
Camera ready version submission: November 15, 2024
Organizers
Workshop Chairs :
Mohan Parthasarathy, Hewlett Packard Enterprise
Sunita Jain, AMD
Publicity Chair :
Dr Badrinath Ramamurthy, IIITB
Program Committee Chair :
Ajay Joshi, Micron
Technical Program Committee Members :
KV Subramanian, AMD
Sunil VL, Ventana
Ritesh Prasad Raturi, Micron
Sparsh Mittal, IIT Roorkee
Navin Bishnoi, Marvell
Maruf, Hasan, AMD
Hemangee Kalpesh Kapoor, IIT Guwahati
Brian Hirano
Sridhar Muthrasanallur
YR Ananda, IIITB
Questions may be sent to [email protected]
Call for Papers:
Papers are solicited in areas including, but not limited to:
Hardware Architectures
- Memory Expansion and Pooling (via technologies like CXL)
- Ethernet challenges as a High Performance Fabric
- Fabrics for next gen AI/ML workloads
Software Architectures
- CXL emulation
- Operating system support for Tiered Memory
- Virtualization for Fabrics
Applications and Use-Cases
- Benchmarking Applications
- Workload characterization
- Analysis and Profiling of Fabric Performance
Control Plane Software for Management
- Fabric Management, e.g., Redfish (DMTF) / OFA (OpenFabrics Alliance)
- In-band and OOB Management of Fabric devices (CXL/Ethernet/Accelerators)
Paper Submissions: Using the main website
General guidelines: Manuscripts submitted to FABRICS 2024 must represent original, unpublished research and must not have been previously published or be under review for any other workshop, conference or journal. Abstracts should contain no more than 300 words and must be submitted along with the paper by the paper submission deadline. The title and abstract submitted by this deadline should have sufficient detail and not just be a placeholder. Submissions should have the final list of authors, as changes may not be feasible later. Papers not following these guidelines will be rejected without review, and further action may be taken, including (but not limited to) notifications sent to the heads of the institutions of the authors.
Length of submission: Submitted manuscripts can be full papers or short papers.
- Full papers may not exceed six (6) single-spaced double-column pages using 10-point size font on 8.5×11 inch pages (IEEE conference style) plus one extra page for references.
- Short papers may not exceed four (4) pages, plus one extra page for references, in the same format.
Single blind policy: All submitted manuscripts will be reviewed by the Program Committee under a single-blind review process, so the submitted paper should NOT list any authors or their affiliations.
Submission link: Submit your paper at:
HiPC 2024 is the 31st edition of the IEEE International Conference on High Performance Computing, Data, and Analytics. It will be an in-person event in Bengaluru, India, from December 18 to December 21, 2024.
IEEE Conduct and Safety Statement
IEEE believes that science, technology, and engineering are fundamental human activities, for which openness, international collaboration, and the free flow of talent and ideas are essential. Its meetings, conferences, and other events seek to enable engaging, thought provoking conversations that support IEEE’s core mission of advancing technology for humanity. Accordingly, IEEE is committed to providing a safe, productive, and welcoming environment to all participants, including staff and vendors, at IEEE-related events.
IEEE has no tolerance for discrimination, harassment, or bullying in any form at IEEE-related events. All participants have the right to pursue shared interests without harassment or discrimination in an environment that supports diversity and inclusion.
Participants are expected to adhere to these principles and respect the rights of others. IEEE seeks to provide a secure environment at its events. Participants should report any behavior inconsistent with the principles outlined here to on-site staff, security or venue personnel, or to [email protected].
Diversity and Inclusion
HiPC is committed to the promotion of diversity and inclusion in all professional activities. We encourage diversity and welcome everyone regardless of age, gender identity, race, ethnicity, socioeconomic background, country of origin, religion, sexual orientation, physical ability, political views, education, and work experience.