Session

10 Steps to Transform RAG from POC to Production: The Hidden Complexity of RAG

In this talk, we will explore the journey of the Cisco DevNet engineering team as we transformed a simple Retrieval-Augmented Generation (RAG) proof of concept (POC) into a robust, production-ready application for developer.cisco.com. While creating a basic RAG demo is straightforward, scaling it to meet production standards involves navigating a myriad of challenges and complexities. This session will provide a step-by-step guide to the key aspects of this transformation, including vector indexing, model selection, prompt engineering, agent frameworks, security, user feedback, and continuous improvement.

Key Takeaways and Outline:

1. Introduction: Brief overview of RAG and its applications, highlighting the simplicity of creating a basic RAG demo.

2. Vector Indexing: Selecting the right vector database (Pinecone, Faiss, Milvus) and chunking strategies based on content type. Reference to VectorView benchmark.

3. Embedding Models: Choosing the right embedding model based on context size and balancing performance and cost.

4. Model Selection: Exploring self-hosting vs. API-based models with references to Artificial Analysis, The Fastest AI, and Hugging Face Open LLM Leaderboard.

5. Prompt Engineering: Crafting effective prompts, query rewriting, and reranking models.

6. Agent Frameworks: Comparing LangChain, LangGraph, and AutoGen, and choosing the right framework for your needs.

7. Security and Guardrails: Adhering to the OWASP Top 10 for LLM Applications, implementing data privacy, access controls, and monitoring. Using red teaming with Promptfoo to simulate attacks and identify vulnerabilities, ensuring the application is robust and secure.

8. User Feedback and Trial Runs: Conducting trial runs, gathering user feedback, and benchmarking user trial queries. Using DeepEval for evaluating model performance, measuring accuracy, precision, recall, and F1 score, and integrating into CI/CD pipelines for continuous monitoring.

9. Continuous Improvement: Setting up a dashboard to track, analyze, and improve patterns, making data-driven decisions to enhance the user experience.

10. Deployment: Automating the deployment process with CI/CD pipelines, real-time monitoring and logging, and rollback strategies for safe deployments.

Conclude with Real-World Application: Case study of developer.cisco.com, discussing challenges faced and solutions implemented.
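
As a rough illustration of the metrics named in step 8, the sketch below computes precision, recall, and F1 for a single retrieval query in plain Python. It is not DeepEval's API and not the DevNet team's implementation; the function name and the document IDs are hypothetical, and it assumes each benchmark query comes with a hand-labeled set of relevant documents.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    retrieved: Iterable[str], relevant: Iterable[str]
) -> Tuple[float, float, float]:
    """Score one query: retrieved document IDs against labeled relevant IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)

    # Guard against empty sets so a query with no results scores 0, not an error.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical IDs: the retriever returned four chunks, two of which
    # appear among the three chunks labeled relevant for this query.
    p, r, f = precision_recall_f1(
        ["doc-1", "doc-2", "doc-3", "doc-4"],
        ["doc-2", "doc-4", "doc-7"],
    )
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Averaging these per-query scores over the user trial queries gives a single number that a CI/CD job could compare against a threshold before each release.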

https://developerweek2025.sched.com/event/1tFJk/open-session-10-steps-to-transform-rag-from-poc-to-production-the-hidden-complexity-of-rag

Neelesh Pateriya

Principal Engineer at Cisco.
