
Mahak Shah
Splunk, Software Engineer P3
Seattle, Washington, United States
I have nearly six years of experience across various aspects of distributed systems, specializing in scalability, extensibility, and availability. I am proficient in building microservice-based software systems tailored to complex customer requirements. Currently, I am part of the Core Search platform team at Splunk, working specifically on Federated Search. My previous experience includes roles at Salesforce and Samsung Research. I hold a Master’s degree in Computer Science from Columbia University.
System Recovery in Distributed Architectures
System recovery in distributed systems emphasizes resilience in decentralized and complex environments. This session will cover key approaches such as replication, redundancy, and failover techniques to ensure data integrity and system availability. It will address the trade-offs between synchronous and asynchronous replication, highlight the significance of automated failover, and explore the challenges of maintaining consistency within the framework of the CAP theorem. A practical example will also be included to demonstrate these concepts in action.
System Recovery in Distributed Architectures
System recovery in distributed architectures strengthens resilience against failures in complex, dispersed systems. In this session, we’ll examine core strategies such as replication, redundancy, and failover mechanisms that ensure availability and data integrity. We’ll look at the trade-offs between synchronous and asynchronous replication, discuss the role of automated failover, and address the challenges of achieving consistency while navigating the CAP theorem. A practical example will showcase these strategies in real time.
Building Resilient Distributed Systems
Disaster recovery in distributed systems ensures resilience by protecting against failures in complex, distributed environments. This presentation will explore essential strategies like replication, redundancy, and failover mechanisms that help preserve availability and data integrity. We will examine the trade-offs between synchronous and asynchronous replication, the importance of automatic failover, and the challenges of ensuring consistency in the context of the CAP theorem. A hands-on example will also be provided to illustrate these concepts in practice.
Infrastructure Symphony: Orchestrating Distributed Systems with Terraform
We will dive deep into how to harness Terraform to build and manage robust distributed systems reliably and at scale. This presentation bridges the gap between distributed systems theory and practical implementation, demonstrating how Infrastructure as Code can simplify complex architectures. We'll explore how to create resilient, scalable infrastructure that grows with application traffic. In addition, we'll present practical techniques to automate and manage distributed infrastructure effectively.
Balancing Ethics and Scalability in Distributed Systems
The rise of large-scale AI systems built on distributed architectures presents unique challenges for ensuring responsible AI practices. Traditional distributed systems issues intersect with ethical considerations, complicating the development of transparent, fair, and reliable AI. Drawing on real-world production examples, this presentation highlights these challenges and proposes actionable solutions that prioritize ethical AI deployment without compromising system performance or scalability.
Approaching Distributed Training of ML Models
In today's era of large-scale machine learning models, training on a single machine often becomes impractical due to resource constraints and time limitations. Distributed training provides an efficient solution by leveraging multiple computing resources to accelerate model training and handle larger datasets. This talk explores various approaches to distributed training, including data and model parallelism and synchronous and asynchronous strategies, with frameworks such as TensorFlow and PyTorch.
Building Robust Distributed Systems: Recovery Techniques
Recovery planning in distributed systems is essential for resilience in the face of failures across complex, dispersed environments. This presentation will explore core strategies like replication, redundancy, and failover mechanisms that support data integrity and system availability. We'll cover the balance between synchronous and asynchronous replication, the importance of automated failover, and the complexities of maintaining consistency as outlined by the CAP theorem. A hands-on example will illustrate these concepts in action.