Mahak Shah

Splunk, Software Engineer P3

Seattle, Washington, United States

I have nearly 7 years of experience working across various aspects of distributed systems, specializing in scalability, extensibility, and availability. I am proficient in building microservice-based software systems tailored to meet complex customer requirements. Currently, I am part of the Core Search platform team at Splunk, working specifically on Federated Search. My previous experience includes roles at Salesforce and Samsung Research. I hold a Master’s degree in Computer Science from Columbia University.

Area of Expertise

Environment & Cleantech
Information & Communications Technology
Media & Information

Topics

Distributed systems
AI
Machine Learning & AI
Women in Machine Learning and Data Science
Software Design
Software Development
Software Practices
Natural Language Processing
Generative AI

System Recovery in Distributed Architectures

System recovery in distributed systems emphasizes resilience in decentralized and complex environments. This session will cover key approaches such as replication, redundancy, and failover techniques to ensure data integrity and system availability. It will address the trade-offs between synchronous and asynchronous replication, highlight the significance of automated failover, and explore the challenges of maintaining consistency within the framework of the CAP theorem. A practical example will also be included to demonstrate these concepts in action.

System Recovery in Distributed Architectures

System recovery in distributed architectures strengthens resilience against failures in complex, dispersed systems. In this session, we’ll examine core strategies such as replication, redundancy, and failover mechanisms that ensure availability and data integrity. We’ll look at the trade-offs between synchronous and asynchronous replication, discuss the role of automated failover, and address the challenges of achieving consistency while navigating the CAP theorem. A practical example will showcase these strategies in real time.

Building Resilient Distributed Systems

Disaster recovery in distributed systems ensures resilience by protecting against failures in complex, distributed environments. This presentation will explore essential strategies like replication, redundancy, and failover mechanisms that help preserve availability and data integrity. We will examine the trade-offs between synchronous and asynchronous replication, the importance of automatic failover, and the challenges of ensuring consistency in the context of the CAP theorem. A hands-on example will also be provided to illustrate these concepts in practice.

Infrastructure Symphony: Orchestrating Distributed Systems with Terraform

We will dive deep into how to harness Terraform to build and manage robust distributed systems reliably and at scale. This presentation bridges the gap between distributed systems theory and practical implementation, demonstrating how Infrastructure as Code can simplify complex architectures. We'll explore how to create resilient, scalable infrastructure that grows with application traffic. In addition, we'll present practical techniques to automate and manage distributed infrastructure effectively.

Balancing Ethics and Scalability in Distributed Systems

The rise of large-scale AI systems built on distributed architectures presents unique challenges for ensuring responsible AI practices. Traditional distributed systems issues intersect with ethical considerations, complicating the development of transparent, fair, and reliable AI. Drawing on real-world production examples, this presentation highlights these challenges and proposes actionable solutions that prioritize ethical AI deployment without compromising system performance or scalability.

Approaching Distributed Training of ML Models

In today's era of large-scale machine learning models, training on a single machine often becomes impractical due to resource constraints and time limitations. Distributed training provides an efficient solution by leveraging multiple computing resources to accelerate model training and handle larger datasets. This talk explores various approaches to distributed training, including data and model parallelism as well as synchronous and asynchronous strategies, with frameworks like TensorFlow and PyTorch.

Building Robust Distributed Systems: Recovery Techniques

Recovery planning in distributed systems is essential for resilience in the face of failures across complex, dispersed environments. This presentation will explore core strategies like replication, redundancy, and failover mechanisms that support data integrity and system availability. We'll cover the balance between synchronous and asynchronous replication, the importance of automated failover, and the complexities of maintaining consistency as outlined by the CAP theorem. A hands-on example will illustrate these concepts in action.
