Karthigayan Devan

Google Developer Expert | Cloud Platforms, SRE & AI Innovation

Atlanta, Georgia, United States

Karthigayan Devan is a Google Developer Expert (GDE) and seasoned Platform Engineering leader with over 20 years of experience building resilient systems. Based in Atlanta, Karthigayan sits at the intersection of classical SRE principles and the emerging world of AI-driven operations.

He specializes in "rebooting" engineering cultures by moving teams from manual toil to automated governance. His work includes pioneering fully automated Cloud FinOps cultures in complex multi-cloud environments and leading transformative initiatives that turn cost centers into innovation hubs.

As an active speaker, hackathon judge, and community mentor, Karthigayan focuses on practical, "real-life" implementations rather than theory. Whether he is architecting Kubernetes clusters or exploring GenAI for Observability, his goal is always the same: to help engineers sleep better at night by building systems that heal themselves.

Area of Expertise

Information & Communications Technology

Topics

Cloud & DevOps
Cloud Containers and Infrastructure
Cloud Technology
Cloud Automation
Google Cloud Platform
Cloud Architecture
Cloud Computing
Cloud strategy
DevOps Culture
DevOps & Automation
DevOps Transformation
Azure DevOps
DevOps Agile Methodology & Culture
DevOps
Platform Engineering
Cloud ML Platforms

Transforming Ideas into AI Solutions with Gemini

Generative AI is no longer just a research breakthrough; it is a practical force reshaping how intelligent apps are designed and deployed. The Gemini API brings Google's multimodal capabilities across text, code, and reasoning directly to developers, so they can turn innovative ideas into production-ready AI solutions. This session introduces Gemini's architecture and core differentiators: context handling, grounding, and integration with Google's ecosystem. It then walks through practical examples of building conversational agents, accelerating code development, and creating domain-specific applications, touching on scalability, security, and Responsible AI along the way. Together, these show how Gemini enables developers and organizations to pursue new opportunities and drive AI innovation at scale.

The use of Apache Ignite in a cloud environment

Apache Ignite is a distributed computing platform that is well suited to cloud deployments. It offers a powerful combination of in-memory computing capabilities and traditional database functionality. At its core, it provides distributed caching, SQL querying, and compute grid features, making it particularly effective for cloud-based applications requiring high performance and scalability.
In cloud environments, Apache Ignite implements a memory-centric architecture where data is primarily stored in RAM, with optional disk persistence for durability. This hybrid approach delivers high performance while keeping data safe across restarts. Using a hash-based affinity function (rendezvous hashing by default), the platform automatically partitions and replicates data across multiple cloud nodes, enabling fault tolerance and high availability.
The platform integrates with major cloud providers, including AWS, Azure, and Google Cloud Platform, through native connectors. It supports auto-discovery of nodes in cloud environments using IP-based discovery mechanisms, simplifying cluster management and scaling operations. The system allows for dynamic scaling by adding or removing nodes without service interruption, with automatic data rebalancing across available nodes.
Security is a crucial aspect of cloud deployments, and Apache Ignite addresses this through comprehensive SSL/TLS encryption for data in transit, along with robust authentication and authorization mechanisms. The platform also implements sophisticated network security features essential for cloud operations.
Several performance factors must be considered in cloud deployments, including network latency between nodes, memory allocation based on instance types, and backup strategies that account for cloud storage costs. Best practices involve utilizing cloud-native monitoring tools, implementing proper network security groups, configuring appropriate instance sizes, and enabling automatic backups to cloud storage.
The distributed architecture of Apache Ignite makes it particularly well suited for cloud-native applications that demand high-throughput, low-latency data access. Organizations can build highly available, scalable, and performant distributed systems by leveraging cloud platform features and following recommended deployment practices.
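
The partitioning and replication idea described above can be illustrated with a small sketch of rendezvous (highest-random-weight) hashing. This is not Apache Ignite code and does not use its API; the function names, node names, and the partition count are assumptions chosen for illustration only. Each key maps to a partition, and each partition picks its primary and backup owners by ranking nodes on a per-partition hash score.

```python
import hashlib


def _hash64(text: str) -> int:
    """Stable 64-bit hash of a string (SHA-256 truncated to 8 bytes)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def partition_for(key: str, partitions: int = 1024) -> int:
    """Map a cache key to a partition number (partition count is illustrative)."""
    return _hash64(key) % partitions


def owners(partition: int, nodes: list[str], backups: int = 1) -> list[str]:
    """Rank nodes by their score for this partition.

    The first entry acts as the primary owner; the next `backups`
    entries hold the replicas.
    """
    ranked = sorted(nodes, key=lambda n: _hash64(f"{n}:{partition}"), reverse=True)
    return ranked[: backups + 1]


def assignment(nodes: list[str], partitions: int = 1024, backups: int = 1) -> dict:
    """Full partition -> owner-list map for the given set of nodes."""
    return {p: owners(p, nodes, backups) for p in range(partitions)}


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c"]  # hypothetical node names
    p = partition_for("order:42")
    print(p, owners(p, cluster))
```

A useful property of this scheme is that when a node leaves, only the partitions it owned are reassigned; every other partition keeps the same owners, which keeps rebalancing traffic limited during scale-in.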

Ops Chatbot: Ask Gemini About System Health

Site Reliability Engineers and platform teams are constantly challenged with analyzing incident data, tracking SLA performance, and extracting actionable insights from logs—tasks that are often manual and time-consuming. This session introduces an AI-powered Ops Chatbot built with Google Gemini API, demonstrating how teams can interact with operational data in natural language. Using sample incident logs and uptime metrics, the chatbot can answer questions like, “Why did Service X fail last week?” or “Which incidents most impacted our SLAs?” Attendees will see a live demo of Gemini summarizing complex operational data into concise, actionable insights. This practical example highlights how AI can reduce toil, accelerate decision-making, and enhance visibility for both technical and non-technical stakeholders, offering a new paradigm for reliability management in modern platform engineering.
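
As a rough illustration of the data preparation such a chatbot might rely on, the sketch below aggregates sample incidents into per-service downtime and availability, then renders them as plain-text context alongside the user's question. The call to the Gemini API is deliberately left out. The `Incident` fields, the function names, and the prompt layout are all hypothetical and are not taken from the session itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    service: str
    minutes_down: float
    cause: str


def downtime_by_service(incidents):
    """Sum downtime minutes per service."""
    totals = defaultdict(float)
    for incident in incidents:
        totals[incident.service] += incident.minutes_down
    return dict(totals)


def availability_pct(minutes_down, window_minutes):
    """Availability over the reporting window, as a percentage."""
    return 100.0 * (1.0 - minutes_down / window_minutes)


def build_prompt(question, incidents, window_minutes):
    """Render incidents and per-service availability as plain-text context."""
    totals = downtime_by_service(incidents)
    lines = ["Operational summary (worst service first):"]
    for service, minutes in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        pct = availability_pct(minutes, window_minutes)
        lines.append(f"- {service}: {minutes:.1f} min down, {pct:.3f}% available")
    lines.append("Incidents:")
    for incident in incidents:
        lines.append(
            f"- {incident.service}: {incident.minutes_down:.1f} min, cause: {incident.cause}"
        )
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    week = 7 * 24 * 60  # reporting window in minutes
    sample = [
        Incident("checkout", 60.0, "bad deploy"),
        Incident("search", 10.08, "disk full"),
    ]
    print(build_prompt("Which incidents most impacted our SLAs?", sample, week))
```

Computing the aggregates in code and passing them as context, rather than asking the model to do arithmetic over raw logs, is one way to keep the numbers in the answer checkable.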

No-Code Reliability: Extending SRE Principles with Google AppSheet

Foundational SRE practices include automation, toil reduction, and freeing teams to focus on scaling reliability. In most cases, that means code and complex tools. Google AppSheet opens a no-code pathway for extending SRE principles across the organization. This talk reviews how AppSheet can be used to quickly and securely build applications for incident tracking, on-call handoffs, operational dashboards, and workflow approvals by integrating with Google Workspace, BigQuery, and cloud monitoring data, all without requiring code from engineers or non-technical stakeholders. Attendees will also learn real use cases, governance strategies, and design patterns that help platform engineering teams accelerate automation, reduce manual effort, and build a reliability culture with AppSheet as a no-code extension of SRE.

No-Code Ops Dashboard with AppSheet

SRE and platform engineering teams, along with their management, depend on reliable reporting for visibility into operations. In most cases, building dashboards entails tedious coding and integration work. Google AppSheet offers a no-code alternative in which simple spreadsheets can be turned into interactive, real-time dashboards within minutes. This session walks through creating an operations dashboard that shows uptime metrics, incident trends, and SLA performance from sample data. It shows AppSheet's drag-and-drop interface integrating with Google Sheets or BigQuery, so that both technical and non-technical users gain insight into system health and can make data-driven decisions that reduce operational toil. The live demo also shows how teams can drive reliability, collaboration, and transparency with just a few clicks and without heavy engineering overhead.

Incident Classifier with Teachable Machine

Effective incident management is key to maintaining system reliability, but triaging alerts and categorizing incidents can be time-consuming. Google Teachable Machine enables SRE and platform engineering teams to build simple machine learning models without coding. In this session, we will demonstrate how to train a model to classify different types of incidents—such as network failures, application errors, and database issues—using sample data. Attendees will see a live demo where new incidents are automatically classified, helping teams prioritize and respond faster. This practical example showcases how lightweight AI models can reduce manual toil, enhance operational efficiency, and make reliability practices more accessible to both technical and non-technical stakeholders.

Hot Tech (new and emerging tech)

Cloud/DevOps

Automation in Cloud DevOps pipelines.
