NVFP4 Precision Training: Achieving Maximum Efficiency with Zero Accuracy Loss

As transformer models continue to grow, developers need practical ways to improve post-training and fine-tuning throughput without sacrificing model quality or adding fragile precision-conversion workflows. In this session, we will show how to fine-tune models with NVIDIA NVFP4, a 4-bit low-precision training format, using the PyTorch-native NVIDIA NeMo Megatron Bridge stack.

We will explain how NVFP4 reduces memory usage and increases throughput compared with BF16 while preserving downstream accuracy when applied with the right mixed-precision recipe. Attendees will learn how Megatron Bridge lets developers switch between BF16, FP8, MXFP8, and NVFP4 through configuration changes rather than by rewriting model or optimizer code, while keeping PyTorch functionality intact. The session will cover practical considerations for stable NVFP4 fine-tuning, including calibration, hierarchical scaling, and selectively retaining BF16 precision in sensitive transformer layers.
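
To make "hierarchical scaling" concrete, here is a minimal pure-Python sketch of two-level scaling for a 4-bit format. It is not the Megatron Bridge or Transformer Engine implementation. The layout it assumes (E2M1 element values, 16-element blocks, one FP8 E4M3 scale per block, one full-precision scale per tensor) follows public descriptions of NVFP4 rather than anything stated in this abstract, and every function name is hypothetical. Rounding is simulated with ordinary Python floats, so ties may be broken differently than on hardware.

```python
import math

# Assumed format constants (from public NVFP4 descriptions, not from this abstract).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # 4-bit magnitudes
E2M1_MAX = 6.0
E4M3_MAX = 448.0
BLOCK_SIZE = 16


def round_to_e4m3(v):
    """Simulate rounding a non-negative float to the FP8 E4M3 grid."""
    if v <= 0.0:
        return 0.0
    _, exp = math.frexp(v)           # v = m * 2**exp with 0.5 <= m < 1
    e = min(max(exp - 1, -6), 8)     # clamp to the E4M3 exponent range
    step = 2.0 ** (e - 3)            # 3 mantissa bits -> 8 steps per binade
    return min(round(v / step) * step, E4M3_MAX)


def round_to_e2m1(v):
    """Round to the nearest 4-bit magnitude, saturating at 6 and keeping the sign."""
    mag = min(abs(v), E2M1_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, v)


def quantize_dequantize(values):
    """Quantize with tensor-level and block-level scales, then reconstruct."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of the block size")
    tensor_amax = max((abs(v) for v in values), default=0.0)
    if tensor_amax == 0.0:
        return [0.0] * len(values)
    # Level 1: one scale per tensor, chosen so every block scale fits in E4M3.
    global_scale = tensor_amax / (E2M1_MAX * E4M3_MAX)
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        block_amax = max(abs(v) for v in block)
        # Level 2: one low-precision scale per block of 16 elements.
        block_scale = round_to_e4m3(block_amax / E2M1_MAX / global_scale)
        scale = block_scale * global_scale
        if scale == 0.0:             # block too small to represent at all
            out.extend([0.0] * BLOCK_SIZE)
            continue
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out
```

Calling `quantize_dequantize` on a list whose length is a multiple of 16 returns the values as they would look after a 4-bit round trip. Comparing input and output block by block shows why per-block scales matter: a block of small values gets its own scale, so the largest value elsewhere in the tensor does not flatten it to zero.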

Mitesh Patel

NVIDIA Corporation, Developer Advocate -- Manager
