Please note: This seminar will take place in DC 1304.
Mohamed Abdelfattah, Assistant Professor, Cornell Tech
Electrical and Computer Engineering, Cornell University
Large language model (LLM) inference is computationally expensive and increasingly dominated by memory bandwidth. This talk presents a set of hardware, software, and algorithmic techniques that address these bottlenecks through cross-layer co-design, spanning numerical representation, tensor compression, and custom hardware accelerator architectures. First, Palu and xKV introduce cross-layer compression through singular value decomposition to substantially reduce memory footprint. Second, RaZeR introduces a numerical extension to important formats such as NVFP4 to improve the capability of these compact data types; we evaluate the hardware efficiency of this new data type when implemented on current GPUs, new GPU prototypes, and custom hardware accelerators. Third, we describe how we use trained predictors to model and sparsify LLM architectures through ShadowLLM and TokenButler; this work on trained predictors has since expanded significantly to predicting metrics of complex systems through RegressLM. The talk will conclude with a synopsis of the importance of system-level co-design for AI performance, including examples of ongoing and future work.
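As background for the SVD-based compression mentioned above, here is a minimal sketch (not the Palu/xKV implementation; all shapes and the target rank are illustrative assumptions) of how truncated SVD shrinks the storage of a cached tensor by keeping only its top singular components:

```python
import numpy as np

# Hypothetical example: compress a cached key/value matrix with truncated SVD.
# Shapes and rank below are assumptions for illustration only.
rng = np.random.default_rng(0)
kv = rng.standard_normal((512, 128))  # e.g. 512 cached tokens x 128 head dim

rank = 16  # illustrative target rank
U, S, Vt = np.linalg.svd(kv, full_matrices=False)

# Store only the top-`rank` factors: (512 x 16) and (16 x 128)
# instead of the full (512 x 128) matrix.
U_r = U[:, :rank] * S[:rank]   # fold singular values into the left factor
Vt_r = Vt[:rank]

kv_approx = U_r @ Vt_r         # low-rank reconstruction when needed

orig_elems = kv.size
compressed_elems = U_r.size + Vt_r.size
print(f"stored elements: {compressed_elems} vs {orig_elems} "
      f"({compressed_elems / orig_elems:.2%})")
```

The memory saving comes from replacing one large matrix with two thin factors; the rank trades footprint against reconstruction accuracy.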
Bio: Mohamed Abdelfattah is an Assistant Professor at Cornell Tech and in the Electrical and Computer Engineering Department at Cornell University. His research group is designing the next generation of machine-learning-centric computer systems for both datacenters and mobile devices.
He received his BSc from the German University in Cairo, his MSc from the University of Stuttgart, and his PhD from the University of Toronto. After his PhD, Mohamed spent six years at Intel and Samsung Research. Recently, he co-founded a startup, Makora, to automate performance optimizations for AI. He is the recipient of multiple best paper awards, the Vanier Canada Graduate Scholarship, and the NSF CAREER Award.