Surya Subramanian
Meta, Software Engineering Intern. CS @ Georgia Tech.
Surya Subramanian is a computer science student at Georgia Tech and a software engineering intern at Meta. He’s interested in the intersection of machine learning, systems, and performance. At Meta, he works on distributed Triton and PyTorch symmetric memory. At Georgia Tech, he’s currently researching efficient inference for mixture-of-experts models. Previously, Surya was a software engineering intern at Pinterest, where he worked on distributed systems for ads ranking.
Kraken: Hackable Triton Kernels for Computation and Multi-GPU Communication Fusion
Modern GPUs are so fast that reaching peak performance demands kernel fusion, not only across compute operations but also between computation and multi-GPU communication within a single kernel. Achieving this requires efficient in-kernel messaging at the tile/threadblock level and easy integration with existing compute kernels.
We introduce Kraken, a collection of hackable Triton kernels that overlap computation and communication using symmetric memory-style in-kernel communication. Kraken delivers state-of-the-art performance compared to AsyncTP-style fusion ops, while providing full flexibility for both intra-node (NVLink) and inter-node (GPUDirect RDMA) peer-to-peer transfers.
Rather than a rigid framework, Kraken is a hands-on tutorial: developers can embed its techniques into xformers, FlashAttention, TorchInductor-generated kernels, or any custom Triton code. We preserve CUDA graph compatibility and unlock unprecedented prologue/epilogue fusion flexibility. Though Kraken currently targets NVIDIA-specific APIs, it’s designed for future expansion to heterogeneous hardware across the Triton ecosystem.