David Brewster
Native PyTorch, Pure Speed, Multimodal Flows
David Brewster is a real-time AI software architect. He holds an M.S. in Software Engineering (with Distinction, Gannon University) and an AI certification (MIT). His technical portfolio includes CNN-based computer vision for medical devices, agglomerative clustering and 3D graphics for Mercedes, and augmented reality with computer vision for MQ-9 video feeds. His background spans software for semiconductor physical verification, carrier-grade telecom networks, and control systems for diesel-electric locomotives.
Unifying Modalities: Building Efficient Video Flows with PyTorch and Diffusion Transformers
As video generation shifts from specialized U-Net architectures to Diffusion Transformers (DiT), separating modalities is increasingly unnecessary. This session presents the Single-Stream paradigm, where text and video are embedded into a shared token space and processed as a single sequence by a standard PyTorch nn.TransformerEncoder, enabling joint attention across spatial, temporal, and semantic dimensions without modality-specific components.
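The single-stream idea above can be sketched in a few lines of PyTorch. This is an illustrative toy, not the talk's model: the class name, dimensions, and the choice of learned position plus per-modality embeddings are assumptions. Text and video tokens are projected into one shared space, concatenated, and run through a stock `nn.TransformerEncoder` so every token attends to every other.

```python
import torch
import torch.nn as nn

class SingleStreamDiT(nn.Module):
    """Toy single-stream multimodal encoder (illustrative, not the talk's code)."""

    def __init__(self, text_dim=512, video_dim=1024, d_model=768,
                 n_heads=12, n_layers=4, max_len=4096):
        super().__init__()
        # Project each modality into the shared token space.
        self.text_proj = nn.Linear(text_dim, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)
        # Modality-aware positional encoding: a learned position table
        # plus a learned per-modality embedding (0 = text, 1 = video).
        self.pos = nn.Embedding(max_len, d_model)
        self.modality = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, text_tokens, video_tokens):
        # text_tokens: (B, T_txt, text_dim); video_tokens: (B, T_vid, video_dim)
        t = self.text_proj(text_tokens)
        v = self.video_proj(video_tokens)
        x = torch.cat([t, v], dim=1)  # one joint sequence of tokens
        pos_ids = torch.arange(x.shape[1], device=x.device)
        mod_ids = torch.cat([
            torch.zeros(t.shape[1], dtype=torch.long, device=x.device),
            torch.ones(v.shape[1], dtype=torch.long, device=x.device)])
        x = x + self.pos(pos_ids) + self.modality(mod_ids)
        # Joint attention across spatial, temporal, and semantic tokens.
        return self.encoder(x)

model = SingleStreamDiT()
out = model(torch.randn(2, 16, 512), torch.randn(2, 64, 1024))
print(out.shape)  # torch.Size([2, 80, 768])
```

Because the encoder sees one concatenated sequence, no cross-attention or other modality-specific machinery is needed; the modality embedding is the only place the two streams are distinguished.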
We demonstrate Rectified Flow Matching in native PyTorch, replacing discrete noise schedules with straight-line probability paths parameterized by continuous flow time. The talk shows that multimodal DiT models reduce to conventional nn.TransformerEncoder layers applied to concatenated text and video tokens with modality-aware positional encodings.
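A Rectified Flow Matching training step reduces to a few tensor operations. In the sketch below (a stand-in MLP replaces the DiT; all names are illustrative), a continuous flow time t is sampled uniformly in [0, 1], the straight-line interpolant between noise x0 and data x1 is formed, and the network regresses onto the constant velocity x1 - x0.

```python
import torch

def rectified_flow_loss(model, x1):
    """One Rectified Flow Matching step (illustrative sketch)."""
    x0 = torch.randn_like(x1)                         # noise endpoint
    t = torch.rand(x1.shape[0], 1, device=x1.device)  # continuous flow time in [0, 1]
    xt = (1 - t) * x0 + t * x1                        # straight-line probability path
    v_target = x1 - x0                                # constant velocity along the line
    v_pred = model(xt, t)
    return torch.mean((v_pred - v_target) ** 2)

class VelocityNet(torch.nn.Module):
    """Tiny stand-in velocity network; conditions on t by concatenation."""

    def __init__(self, dim=8):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 64), torch.nn.SiLU(),
            torch.nn.Linear(64, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

loss = rectified_flow_loss(VelocityNet(), torch.randn(4, 8))
```

Note there is no discrete noise schedule anywhere: t is a continuous scalar, and sampling later amounts to integrating the learned velocity field from t = 0 to t = 1.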
Finally, we show how to optimize this architecture using torch.compile and FlashAttention (via torch.nn.functional.scaled_dot_product_attention), producing a simpler, faster, and more maintainable training and inference pipeline. Attendees will leave with a deeper understanding of multimodal video generation built on these recent advances in diffusion modeling and PyTorch.
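The optimization step is largely declarative. In the sketch below (illustrative, not the talk's code), `scaled_dot_product_attention` dispatches to a fused kernel such as FlashAttention when the backend supports it, and `torch.compile` fuses the surrounding operations into optimized kernels.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim).
    # Dispatches to FlashAttention / memory-efficient / math backends
    # depending on device, dtype, and shapes.
    return F.scaled_dot_product_attention(q, k, v)

# torch.compile traces the function and fuses surrounding ops;
# the compiled version is numerically equivalent to the eager one.
compiled_attention = torch.compile(attention)

q = k = v = torch.randn(2, 12, 128, 64)
out = attention(q, k, v)               # eager path
# out = compiled_attention(q, k, v)    # compiled path; same result, fused kernels
print(out.shape)  # torch.Size([2, 12, 128, 64])
```

Because the single-stream model is just standard `nn.TransformerEncoder` layers, both optimizations apply without any model surgery: no custom attention kernels or modality-specific branches to work around.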