1st Workshop on High Performance Fabrics

31st IEEE International Conference on High Performance Computing, Data, & Analytics

Dec 18, 2024
Bengaluru, India

Workshop on High Performance Fabrics (FABRICS 2024)

In conjunction with the 31st IEEE International Conference on High-Performance Computing, Data, & Analytics (HiPC 2024)

Description

“Fabrics” is perhaps the most used, and abused, word in computing today. A fabric is, as its name implies, a web of multiple devices in a cluster, joined by physical-level connectivity with transport and protocol layers built on top of it. Computing in general, and High Performance Computing in particular, has moved over the years from proprietary cluster interconnects to the more popular InfiniBand and Ethernet.

However, in the AI era the focus has lately shifted to the communication fabric closer to the CPUs and accelerators (including GPUs). Compute Express Link (CXL) is a memory-semantic fabric developed to address challenges in memory capacity and bandwidth, as well as the need to disaggregate memory. For connecting GPUs with each other, NVLink has become the most popular communication fabric. There have also been recent advancements in this space: the Ultra Accelerator Link (UALink) consortium was formed to connect accelerators and allow memory to be shared between them and CPUs. Ethernet, too, is being enhanced by the Ultra Ethernet Consortium (UEC) to build a low-latency, lossless variant of Ethernet.

The Fabrics for HPC/AI workshop will be a half-day workshop that aims to explore novel research ideas in this rich space of fabrics, along with related research in operating systems, virtualization, and manageability.

Keynote 1 : Fabric Challenges in AI Clusters

Speaker : Manoj Wadekar, Meta

Abstract : This talk will focus on the following areas :

AI clusters are creating new fabrics in data centers.
AI fabrics are pushing network, memory, chip-to-chip, and die-to-die interconnects.
Exponential growth in AI clusters and fabric limitations are driving unique data center solutions, creating opportunities for innovation.

Bio : Manoj Wadekar (Meta) is a leading figure in Meta’s technology division, focusing on advanced computing solutions and AI integration. His expertise in system architecture and data management has driven significant innovations in Meta’s infrastructure, enhancing the performance and scalability of their computing systems.

Keynote 2 : Harnessing power of advanced next-gen fabrics to break memory wall with TCO optimized solution

Speaker : Vishal Tanna, Micron

Abstract : This talk will cover the following areas :

Fabrics for Memory and Their Necessity
- The memory wall problem and the solutions being explored to address it.
- Evolution of CXL and how it can help mitigate the memory wall issue.
Workload Analysis
- Analysis of different workloads that are either memory capacity or bandwidth bound.
CXL for Pooling and Sharing
- Investigation into how CXL supports resource pooling and sharing, and the role of FAMFS in these features.
Emerging Fabrics
- Introduction to UALink and its benefits.

Bio: Vishal is the Director of Engineering at Micron. He currently leads the R&D teams working on various aspects of CXL interface-based product development. Previously, he led the team that delivered Micron’s first-ever PCIe-based NAND flash controller to complex enterprise NVMe SSDs. He is an expert in fabrics such as PCIe, CXL, and NVMe.

Accepted Papers

Title	Authors and Affiliation
Study of CXL Memory Sharing with FamFS and its Use cases	Aravind Ramesh (Micron), John Groves (Micron)
Porting of OpenSM over Trinetra-A Switchless Torus Network	V Evancer Vino John, Mahesh Chaudhari, Yogeshwar Sonawane and Sanjay Wandhekar (C-DAC)
Simulation-Driven Design of Large-Scale Systems Architecture Using Slingshot Fabrics	Shridhar Joshi, Sumant Kalra (Hewlett Packard Enterprise)
Performance optimization on CXL products using in-house modeling and simulation toolchain	Kirthi Ravindra Kulkarni, Anandhavel Nagendrakumar, Rohit Sehgal, Eishan Mirakhur, Nikesh Agarwal, Chandana Manjula Linganna (Micron) and Ranjit Gupte (Microchip)
Towards Continuous Checkpointing for HPC Systems Using CXL	Ellis Giles (Elex Technologies) and Peter Varman (Rice University)

Workshop Schedule

Time	Topic	Speaker and Affiliation
8.30-8.45 a.m	Welcome and Workshop Introduction	Mohan Parthasarathy, Hewlett Packard Enterprise
8.45-9.40 a.m	Keynote 1 : Fabric Challenges in AI Clusters	Manoj Wadekar, Meta
9.40-10.05 a.m	Study of CXL Memory Sharing with FamFS and its Use cases	Aravind Ramesh, Micron
10.05-10.30 a.m	Porting of OpenSM over Trinetra-A Switchless Torus Network	Mahesh Chaudhari, C-DAC
10.30-11.00 a.m	Break
11.00-11.55 a.m	Keynote 2 : Harnessing power of advanced next-gen fabrics to break memory wall with TCO optimized solution	Vishal Tanna, Micron
11.55-12.20 p.m	Simulation-Driven Design of Large-Scale Systems Architecture Using Slingshot Fabrics	Shridhar Joshi, Hewlett Packard Enterprise
12.20-12.45 p.m	Performance optimization on CXL products using in-house modeling and simulation toolchain	Rohit Sehgal, Kirthi Ravindra Kulkarni, Micron
12.45-1.10 p.m	Towards Continuous Checkpointing for HPC Systems Using CXL	Ellis Giles, Elex Technologies
1.10-1.15 p.m	Vote of Thanks

Important Dates

Submission site opens: September 1, 2024

Paper submission due: September 30, 2024

Author notification: October 30, 2024

Camera ready version submission: November 15, 2024

Organizers

Workshop Chairs :

Mohan Parthasarathy, Hewlett Packard Enterprise

Sunita Jain, AMD

Publicity Chair :

Dr Badrinath Ramamurthy, IIITB

Program Committee Chair :

Ajay Joshi, Micron

Technical Program Committee Members :

KV Subramanian, AMD

Sunil VL, Ventana

Ritesh Prasad Raturi, Micron

Sparsh Mittal, IIT Roorkee

Navin Bishnoi, Marvell

Maruf, Hasan, AMD

Hemangee Kalpesh Kapoor, IIT Guwahati

Brian Hirano

Sridhar Muthrasanallur

YR Ananda, IIITB

Questions may be sent to [email protected]

Call for Papers:

Papers are solicited in areas including, but not limited to:

Hardware Architectures

Memory Expansion and Pooling (via technologies like CXL)
Ethernet challenges as a High Performance Fabric
Fabrics for next gen AI/ML workloads

Software Architectures

CXL emulation
Operating system support for Tiered Memory
Virtualization for Fabrics

Applications and Use-Cases

Benchmarking Applications
Workload characterization
Analysis and Profiling of Fabric Performance

Control Plane Software for Management

Fabric management, e.g., Redfish (DMTF) / OFA (OpenFabrics Alliance)
In-band and OOB Management of Fabric devices (CXL/Ethernet/Accelerators)

Paper Submissions: Using the main website

General guidelines: Manuscripts submitted to FABRICS 2024 must represent original, unpublished research and must not be under review for any other workshop, conference, or journal. Abstracts should contain no more than 300 words and must be submitted along with the paper by the paper submission deadline. The title and abstract submitted by this deadline should have sufficient detail and not just be a placeholder. Submissions should have the final list of authors, as changes may not be feasible later. Papers not following these guidelines will be rejected without review, and further action may be taken, including (but not limited to) notifications sent to the heads of the authors’ institutions.

Length of submission: Submitted manuscripts can be full papers or short papers.

Full papers may not exceed six (6) single-spaced double-column pages using 10-point size font on 8.5×11 inch pages (IEEE conference style) plus one extra page for references.
Short papers may not exceed four (4) pages, plus one extra page for references, in the same format.

Single blind policy: All submitted manuscripts will be reviewed by the Program Committee under a single-blind review process, so the submitted paper should NOT list any authors or their affiliations.

Submission link: Submit your paper at:

HiPC 2024 (31st IEEE International Conference on High Performance Computing, Data and Analytics – Fabrics Workshop) (easychair.org)

HiPC 2025 is the 32nd edition of the IEEE International Conference on High Performance Computing, Data, and Analytics. It will be an in-person event in Hyderabad, India, from December 17 to December 20, 2025.

IEEE Conduct and Safety Statement

IEEE believes that science, technology, and engineering are fundamental human activities, for which openness, international collaboration, and the free flow of talent and ideas are essential. Its meetings, conferences, and other events seek to enable engaging, thought-provoking conversations that support IEEE’s core mission of advancing technology for humanity. Accordingly, IEEE is committed to providing a safe, productive, and welcoming environment to all participants, including staff and vendors, at IEEE-related events.

IEEE has no tolerance for discrimination, harassment, or bullying in any form at IEEE-related events. All participants have the right to pursue shared interests without harassment or discrimination in an environment that supports diversity and inclusion.

Participants are expected to adhere to these principles and respect the rights of others. IEEE seeks to provide a secure environment at its events. Participants should report any behavior inconsistent with the principles outlined here to on-site staff, security or venue personnel, or to [email protected].

IEEE Computer Society Open Conference Statement

Expanding participation in computing is central to the goals of the IEEE Computer Society and all of its conferences. The IEEE Computer Society is firmly committed to broad participation in all sponsored activities, including but not limited to, technical communities, steering committees, conference organizations, standards committees, and ad hoc committees that welcome the entire global community.

IEEE’s mission to foster technological innovation and excellence to benefit humanity requires the talents and perspectives of people with many disciplinary backgrounds.

All individuals are entitled to participate in any IEEE Computer Society activity free of discrimination and harassment.
