Gokul Prabagaren
Lead Software Engineer at Capital One
McLean, Virginia, United States
Lead Software Engineer at Capital One in the Rewards organization, specializing in distributed computing. Develops distributed, cloud-native applications based on Spark and NoSQL that currently serve millions of customers every day. Previously developed Java applications, from JDK 1.2 onward, on-premises and on VMs. Tech speaker and writer specializing in Big Data and NoSQL.
Speaker at All Things Open 2023, 2024 and Open Source 101 2023
Spark AI Summit 2020
ApacheCon 2021
cdCon 2021, 2022
Open Source Summit 2022, 2024
PythonWebConf 2021, 2022, 2023
IndyCloudConf 2020
Area of Expertise
Tale of Apache Parquet reaching the pinnacle of friendship with Data Engineers
Capital One, a tech company in the banking business, is a 100% cloud-operated company with data in its DNA. Capital One Loyalty is one such cloud-native application, processing billions of credit card transactions yearly and delighting our customers with rewards. This talk offers a look through the lens of our data processing pipeline and shows how Apache Parquet plays a pivotal role in each step of our processing. We have implemented various design patterns using Apache Parquet and Apache Spark; this talk will touch on those as well, and on how our resiliency has increased with the use of Apache Parquet. We run multiple credit card processing streams, and the talk will also cover how Parquet helps in choreographing and replaying them. Apache Parquet is deeply intertwined in our pipeline, and this talk will highlight how it is connected and used throughout. Overall, the audience will take away some interesting real-world uses of Apache Parquet in an Apache Spark data processing pipeline.
Tale of 3 Amigos - Spark, Cassandra & Mongo in Open Source Land
Capital One is the first US bank to exit on-premises data centers and move completely to the cloud. Over the course of modernizing our applications in Capital One Card Rewards, we developed a ground-up, custom transaction processing application on open source technologies such as Apache Spark, MongoDB, and Apache Cassandra. This application currently processes millions of customer transactions daily, awarding millions of miles, cash, and points every day. While building it, we ran into many challenging issues getting a Spark application to process data against MongoDB and Cassandra backends to serve customers. This talk is a tale of how these three open source technologies coexisted and solved problems in open source land, focusing mainly on the problems below (a short sketch of the first point follows the list):
How the Cassandra key sequence matters and how it impacts querying
How Cassandra batching helps and works well with Spark partitions
The importance of Cassandra data modeling and its implications after MVP/deployment
How to manage the MongoDB connection (at the JVM level)
Implications of the MongoSpark connector's partitioner
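To give a flavor of the first point, below is a minimal sketch using the DataStax Java driver and a hypothetical rewards_by_account table (not our production schema). With the partition key first in the primary key, a query that specifies it is a cheap single-partition read, while a query on a later key column alone forces a cluster-wide scan.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;

public class KeySequenceSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Hypothetical table (assumes a 'rewards' keyspace already exists):
            // partition key = account_id, clustering keys = txn_date, txn_id.
            session.execute("CREATE TABLE IF NOT EXISTS rewards.rewards_by_account ("
                + "account_id text, txn_date date, txn_id text, points int, "
                + "PRIMARY KEY ((account_id), txn_date, txn_id))");

            // Efficient: the partition key is fully specified, so this is a
            // single-partition read routed straight to the owning replicas.
            ResultSet rows = session.execute(
                "SELECT * FROM rewards.rewards_by_account WHERE account_id = '42'");
            rows.forEach(row -> System.out.println(row.getString("txn_id")));

            // Inefficient: txn_date alone cannot locate a partition; Cassandra
            // rejects this unless ALLOW FILTERING (a full scan) is appended:
            // SELECT * FROM rewards.rewards_by_account WHERE txn_date = '2024-01-01'
        }
    }
}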
Open Source Alchemy: Unveiling Capital One's Transaction Magic
In this captivating tech talk, we delve into the enchanting world of Capital One's innovative use of open source technologies to process millions of transactions daily within its loyalty systems. Join us as we uncover the lessons learned and the ingenious techniques employed to optimize queries, ensuring lightning-fast responses.
Key Highlights:
Open Source Alchemy: Explore the magic behind Capital One's decision to embrace open source technologies and witness the transformation it brought to their transaction processing landscape.
Millions in Motion: Understand the intricacies of handling a colossal number of transactions daily and the challenges faced in maintaining seamless loyalty systems.
Lessons Learned: Gain insights into the valuable lessons Capital One learned throughout their journey, from implementation hurdles to fine-tuning processes for optimal performance.
Query Optimization Wizardry: Uncover the secret techniques employed by Capital One to optimize queries, ensuring that every transaction receives a swift and responsive treatment.
Join us on this magical journey!
Introduction to AI Gateways
AI took center stage in 2022, with a lot of momentum in both private and public LLMs. As adoption and usage of LLMs accelerate, there is a greater need to manage and orchestrate those LLMs. This talk will introduce the AI Gateway: what it is, what its benefits are, and why it is required.
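As a rough illustration of the concept (a hypothetical sketch, not any specific product's API), an AI gateway sits between applications and multiple LLM providers, offering a single entry point where routing, authentication, rate limiting, and audit logging can be centralized:

import java.util.Map;

// Hypothetical provider abstraction: each LLM backend implements this.
interface LlmProvider {
    String complete(String prompt);
}

// Minimal gateway sketch: one entry point that routes to a named provider
// and is the natural place to add cross-cutting concerns once for all LLMs.
public class AiGatewaySketch {
    private final Map<String, LlmProvider> providers;

    public AiGatewaySketch(Map<String, LlmProvider> providers) {
        this.providers = providers;
    }

    public String complete(String model, String prompt) {
        LlmProvider provider = providers.get(model);
        if (provider == null) {
            throw new IllegalArgumentException("Unknown model: " + model);
        }
        // Rate limiting, PII scrubbing, and token accounting would go here,
        // applied once for every provider behind the gateway.
        return provider.complete(prompt);
    }

    public static void main(String[] args) {
        AiGatewaySketch gateway = new AiGatewaySketch(
            Map.of("echo", prompt -> "echo: " + prompt));
        System.out.println(gateway.complete("echo", "hello"));
    }
}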
Life is better, since we chose the right keys for Cassandra Home
Capital One Loyalty is a cloud-native application processing billions of credit card transactions yearly and delighting our customers with rewards. Our platform uses Apache Cassandra as one of its key datastores. While building highly resilient systems to serve our customers with Apache Cassandra, we made various key design decisions, chief among them choosing the right partition keys. This talk will highlight three such real-world use cases of choosing the right partition key and how it helps us with scale and resiliency (a short sketch of the second and third patterns follows the list). Those use cases are:
1. Adopting the staging table pattern, which helps us with resiliency
2. Adopting the current-and-history pattern, which lets us serve both batch and real-time use cases
3. Adopting idempotency, which helps with all de-dups
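Below is a minimal sketch of the second and third patterns (hypothetical keyspace, tables, and columns, not our production schema): the current table is keyed for point lookups, the history table adds a statement-month dimension for batch reads, and an IF NOT EXISTS lightweight transaction makes replaying the same transaction a no-op.

import com.datastax.oss.driver.api.core.CqlSession;

public class PartitionKeySketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // "Current" table: one row per account, for real-time lookups.
            session.execute("CREATE TABLE IF NOT EXISTS rewards.balance_current ("
                + "account_id text PRIMARY KEY, points bigint)");

            // "History" table: partitioned by account and statement month, so a
            // batch job can read exactly one partition per account per month.
            session.execute("CREATE TABLE IF NOT EXISTS rewards.balance_history ("
                + "account_id text, statement_month text, txn_id text, points bigint, "
                + "PRIMARY KEY ((account_id, statement_month), txn_id))");

            // Idempotent write: IF NOT EXISTS turns a replayed transaction
            // into a no-op instead of a duplicate reward (de-dup for free).
            session.execute("INSERT INTO rewards.balance_history "
                + "(account_id, statement_month, txn_id, points) "
                + "VALUES ('42', '2024-01', 'txn-001', 500) IF NOT EXISTS");
        }
    }
}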
How Apache Parquet plays a pivotal role in processing billions of Capital One transactions
Capital One, a tech company in the banking business, is a 100% cloud-operated company with data in its DNA. All our workloads are cloud native. Capital One Loyalty is one such cloud-native application, processing billions of credit card transactions yearly and delighting our customers with rewards. This talk offers a look through the lens of our data processing pipeline and shows how Apache Parquet plays a pivotal role in each step of our processing. We have implemented various design patterns using Parquet and Spark; this talk will touch on those as well, and on how our resiliency has increased with the use of Apache Parquet. We run multiple credit card processing streams, and the talk will also cover how Parquet helps in choreographing and replaying them. Parquet is deeply intertwined in our pipeline, and this talk will highlight how it is connected and used throughout. Overall, the audience will take away some interesting real-world uses of Parquet in a Spark data processing pipeline. A minimal sketch of the checkpoint-and-replay idea follows.
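Below is a minimal sketch of that checkpoint-and-replay idea (hypothetical paths and columns, using only the generic Spark Parquet API rather than our internal pipeline): persisting each stage's output as Parquet lets a downstream stage be re-run from the last good snapshot instead of reprocessing from the source.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ParquetReplaySketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("parquet-replay-sketch")
            .getOrCreate();

        // Stage 1: read raw transactions and persist the stage output as
        // Parquet; this snapshot is the replay point for everything downstream.
        Dataset<Row> txns = spark.read().parquet("s3://bucket/raw/txns/");
        Dataset<Row> enriched = txns.filter("amount > 0");
        enriched.write().mode(SaveMode.Overwrite)
            .parquet("s3://bucket/stages/enriched/");

        // Stage 2 (possibly a separate job, possibly a re-run): starts from
        // the Parquet snapshot rather than from the original stream.
        Dataset<Row> replay = spark.read().parquet("s3://bucket/stages/enriched/");
        replay.groupBy("account_id").sum("amount")
            .write().mode(SaveMode.Overwrite)
            .parquet("s3://bucket/stages/rewards/");

        spark.stop();
    }
}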
How Abstraction is important in accelerating Continuous Delivery & improving engineers' productivity
Capital One's pioneering of CICD and tech adoption has enabled us to excel at defining many CICD patterns. We shared our CICD journey at cdCon 2021 and the benefits of shifting left at cdCon 2022. This talk continues that series, focusing on how abstracting the underlying technologies helps an organization accelerate its CICD adoption and, in turn, increase its engineers' productivity. Capital One, a tech company doing banking and credit card business, has built foundational capabilities and kept innovating on those frontiers, which has helped us excel and define patterns for industry adoption. Abstraction is one such pattern that we have adopted and scaled across our enterprise. This talk will detail how we reached our pivot point and scaled these patterns across our organization. Attendees will take away a CD design pattern they can apply to their own organization.
Deep dive on the Apache Spark design pattern of filtering vs enriching the data
Apache Spark provides many options for joining datasets. This talk compares enriching the data with filtering the data: how both approaches end up with the same result, and the merits of the enriching approach and how it helped us. We at Capital One are heavy users of Spark. This talk will detail how we evolved from filtering to enriching the data for credit card transactions and highlight the benefits we gained by following the enriching approach. As a financial institution, we are bound by regulation: we need to be able to backtrace every credit card transaction processed through our engine. We will detail how the enriching approach solved this requirement for us, and give context on how financial institutions can use the enriching approach for their Spark workloads to backtrace all the data they process. We ran the filtering approach in production; the issues it had, and why we moved to the enriching approach in production, will also be covered. This use case is running successfully in production, processing billions of transactions yearly.
Enriching the data vs Filtering the data in Apache Spark using Java
Apache Spark provides many options for joining datasets. This talk compares enriching the data (left outer join) with filtering the data (inner join): how both approaches end up with the same result, and the merits of the enriching approach and how it helped us at Capital One. We at Capital One have been heavy users of Spark since its early days. This talk will detail how we evolved from filtering to enriching the data for credit card transactions and highlight the benefits we gained by following the enriching approach. As a financial institution, we are bound by regulation: we need to be able to backtrace every credit card transaction processed through our engine. We will detail how the enriching approach solved this requirement for us, and give context on how financial institutions can use the enriching approach for their Spark workloads to backtrace all the data they process. We ran the filtering approach in production; the issues it had, and why we moved to the enriching approach in production, will also be covered. This real-time use case was implemented in Java and is running successfully in production, processing billions of transactions yearly. Attendees will take away enough detail on the enriching and filtering options to decide for their own use cases. A minimal sketch of the two joins follows.
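Below is a minimal sketch of the two joins (hypothetical columns and a hypothetical eligibility lookup dataset, not our production logic): the inner join silently drops non-matching rows, while the left outer join keeps every transaction and records why a row was excluded, which is what makes backtracing possible.

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EnrichVsFilterSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("enrich-vs-filter-sketch")
            .getOrCreate();

        Dataset<Row> txns = spark.read().parquet("s3://bucket/txns/");
        Dataset<Row> eligible = spark.read().parquet("s3://bucket/eligible/");

        // Filtering: the inner join keeps only eligible transactions; dropped
        // rows leave no trace, so they cannot be backtraced later.
        Dataset<Row> filtered = txns.join(eligible,
            txns.col("txn_id").equalTo(eligible.col("txn_id")), "inner");

        // Enriching: the left outer join keeps every transaction and marks
        // eligibility, so excluded rows stay auditable downstream.
        Dataset<Row> enriched = txns.join(eligible,
            txns.col("txn_id").equalTo(eligible.col("txn_id")), "left_outer")
            .withColumn("is_eligible",
                when(eligible.col("txn_id").isNotNull(), true).otherwise(false));

        // Rewards logic consumes only the eligible rows, while the full
        // annotated dataset is what gets persisted for regulatory backtracing.
        enriched.filter(col("is_eligible")).show();
        spark.stop();
    }
}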
Challenges of Spark Application coexisting with NoSQL databases
Capital One is the first US bank to exit on-premises data centers and move completely to the cloud. Over the course of modernizing our applications in Capital One Card Rewards, we developed a ground-up, custom transaction processing application on open source technologies such as Apache Spark, MongoDB, and Apache Cassandra. This application currently processes millions of customer transactions daily, awarding millions of miles, cash, and points every day. While building it, we ran into many challenging issues getting a Spark application to process data against MongoDB and Cassandra backends to serve customers. This talk will focus on a few of those issues, their impact, and how to mitigate them. Specifically, the talk will cover:
How the Cassandra key sequence matters and how it impacts querying
How Cassandra batching helps and works well with Spark partitions
The importance of Cassandra data modeling and its implications after MVP/deployment
How to manage the MongoDB connection (at the JVM level)
Implications of the MongoSpark connector's partitioner
All the issues highlighted were faced by us in our application. This talk will cover what these issues are in a Spark/Mongo/Cassandra application environment and how to mitigate them.
Unleashing Agility: How Capital One embraced COTS for NextGen Modular Architecture using serverless
In today's fast-paced technology landscape, organizations face increasing pressure to stay competitive, reduce costs, and deliver high-quality products and services quickly. However, traditional monolithic architectures can be a barrier to innovation and agility, making it difficult to keep up with rapidly evolving market demands. That's where federating COTS comes in.
What is COTS? Components Off The Shelf refers to pre-built or pre-existing software components that are readily available for use without needing to be developed from scratch. These components can be used to speed up development and reduce costs.
During our talk, we will share Capital One's journey from monoliths to a modular architecture, and then to transforming those modules into reusable, plug-and-play COTS at Capital One. We will discuss how we leveraged cloud computing, microservices, and automated pipelines to achieve faster time to market, reduced development costs, and enhanced agility and flexibility.
Attendees will gain valuable insights into the benefits and challenges of federating COTS, as well as the practical steps for adopting a modular architecture.
Enriching datasets approach using Apache Spark solves granular tracing issue in regulated finance
Apache Spark provides many options for joining datasets. This talk compares enriching the data (left outer join) with filtering the data (inner join): how both approaches end up with the same result, and the merits of the enriching approach and how it helped us at Capital One. We at Capital One have been heavy users of Spark since its early days. This talk will detail how we evolved from filtering to enriching the data for credit card transactions and highlight the benefits we gained by following the enriching approach. As a financial institution, we are bound by regulation: we need to be able to backtrace every credit card transaction processed through our engine. We will detail how the enriching approach solved this requirement for us, and give context on how financial institutions can use the enriching approach for their Spark workloads to backtrace all the data they process. We ran the filtering approach in production; the issues it had, and why we moved to the enriching approach in production, will also be covered. Attendees will take away enough detail on the enriching and filtering options to decide for their own use cases.
Indy Cloud Conference 2020
Enriching datasets approach using Apache Spark solves granular tracing issue in regulated finance
Apache Spark provides many options for joining datasets. This talk compares enriching the data (left outer join) with filtering the data (inner join): how both approaches end up with the same result, and the merits of the enriching approach and how it helped us at Capital One. We at Capital One have been heavy users of Spark since its early days. This talk will detail how we evolved from filtering to enriching the data for credit card transactions and highlight the benefits we gained by following the enriching approach. As a financial institution, we are bound by regulation: we need to be able to backtrace every credit card transaction processed through our engine. We will detail how the enriching approach solved this requirement for us, and give context on how financial institutions can use the enriching approach for their Spark workloads to backtrace all the data they process. We ran the filtering approach in production; the issues it had, and why we moved to the enriching approach in production, will also be covered. Attendees will take away enough detail on the enriching and filtering options to decide for their own use cases.
Databricks Spark AI Summit 2020
Modeling Financial Data In Cassandra To Serve Real Time And Batch Workloads At Same Time
This talk will explain how we modeled customer rewards data at Capital One using Apache Cassandra to serve real-time microservice-based workloads (customers accessing their rewards online) and batch Apache Spark workloads (customer statements) at the same time.
Capital One, a tech company in the banking business, is a 100% cloud-operated company, and all our workloads are cloud native. This talk covers one such use case, explaining how we modeled customer rewards data at Capital One using Apache Cassandra to serve real-time microservice-based workloads and batch Apache Spark workloads at the same time. Whether a customer accesses their rewards on the web or receives them in a statement, the Cassandra table we modeled plays a central role and services both workloads simultaneously. The talk will cover how the Cassandra data is used by a Spring-based microservice and a Spark-based batch workload. I am part of the team that designed and developed this application from the ground up; it now serves millions of customers. A minimal sketch of the dual-workload reads follows.
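Below is a minimal sketch of the dual-workload idea (hypothetical keyspace and table, using the generic DataStax Java driver and Spark Cassandra connector APIs, not our production code): the real-time path does a single-partition read by account, while the batch path reads the same table as a Spark dataset.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class DualWorkloadSketch {
    // Real-time path (e.g. behind a Spring endpoint): a single-partition
    // read keyed by account, answering "show my rewards" in milliseconds.
    static Row rewardsForAccount(CqlSession session, String accountId) {
        return session.execute(
            "SELECT points FROM rewards.rewards_by_account WHERE account_id = ?",
            accountId).one();
    }

    // Batch path (statements): the same table read as a Spark dataset
    // through the Spark Cassandra connector.
    static Dataset<org.apache.spark.sql.Row> rewardsAsDataset(SparkSession spark) {
        return spark.read()
            .format("org.apache.spark.sql.cassandra")
            .option("keyspace", "rewards")
            .option("table", "rewards_by_account")
            .load();
    }
}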
Python Web Conference 2021
Evolution of CICD in Capital One
Capital One is the first U.S. bank to exit on-prem legacy data centers and go all in on the cloud. We have come very far in our journey to being a 100% cloud-operated company. From the early days of having separate teams to support and deploy development teams' code changes to production, we now deploy the majority of our production changes as CICD-enabled, push-button changes. This talk focuses on our CICD journey, from our humble beginnings to where we are today. It covers the various phases of CICD development and adoption within our organization. From the early days of custom scripting to a mature, inner-sourced CICD framework, we have gone through various issues and challenges and learned a lot of lessons over the course of our journey, which we felt were worth sharing with a broader audience in this forum.
Continuous Delivery Conference 2021
How Shifting Left is Helping CapitalOne in its CICD Process
Capital One is the first U.S. bank to exit legacy on-premises data centers and go all in on the cloud. On this journey we have completely reinvented our software delivery and automated our entire software delivery process. There are various key milestones in our CICD journey. This talk will focus on how adopting shifting left is helping us in our software delivery lifecycle and security.
Continuous Delivery Foundation 2022
Challenges of Spark Application coexisting with NoSQL databases
Capital One is the first US bank to exit on-premises data centers and move completely to the cloud. Over the course of modernizing our applications in Capital One Card Rewards, we developed a ground-up, custom transaction processing application on open source technologies such as Apache Spark, MongoDB, and Apache Cassandra. This application currently processes millions of customer transactions daily, awarding millions of miles, cash, and points every day. While building it, we ran into many challenging issues getting a Spark application to process data against MongoDB and Cassandra backends to serve customers. This talk will focus on a few of those issues, their impact, and how to mitigate them. Specifically, the talk will cover:
How the Cassandra key sequence matters and how it impacts querying
How Cassandra batching helps and works well with Spark partitions
The importance of Cassandra data modeling and its implications after MVP/deployment
How to manage the MongoDB connection (at the JVM level)
Implications of the MongoSpark connector's partitioner
All the issues highlighted were faced by us in our application. This talk will cover what these issues are in a Spark/Mongo/Cassandra application environment and how to mitigate them. Anyone using Spark applications with MongoDB and Cassandra databases as backends can benefit from this talk.
Apache Con aka Community Over Code 2021
Modeling Financial Data In Cassandra To Serve Real Time And Batch Workloads At Same Time
This talk will explain how we modeled customer rewards data at Capital One using Apache Cassandra to serve real-time microservice-based workloads (customers accessing their rewards online) and batch Apache Spark workloads (customer statements) at the same time.
Capital One, a tech company in the banking business, is a 100% cloud-operated company, and all our workloads are cloud native. This talk covers one such use case, explaining how we modeled customer rewards data at Capital One using Apache Cassandra to serve real-time microservice-based workloads and batch Apache Spark workloads at the same time. Whether a customer accesses their rewards on the web or receives them in a statement, the Cassandra table we modeled plays a central role and services both workloads simultaneously. The talk will cover how the Cassandra data is used by a Spring-based microservice and a Spark-based batch workload. I am part of the team that designed and developed this application from the ground up; it now serves millions of customers.
Apache Con aka Community Over Code 2021