Emmanuel Acheampong
Senior Manager Developer Relations at Crusoe
San Francisco, California, United States
Emmanuel Acheampong is a Developer Advocate at Crusoe AI, where he is a founding member of the Developer Relations team for Managed AI — bridging the gap between managed inference infrastructure and the open-source developer ecosystem. He focuses on enabling engineers to leverage Crusoe's high-performance GPU infrastructure to build and deploy the next generation of AI agents and frontier models.
Before Crusoe, Emmanuel was Co-Founder and Head of AI at yShade.ai (Google for Startups Black Founders Accelerator '24), where he led development of proprietary computer vision algorithms trained on a 12M+ image dataset, shipping production AI systems across e-commerce, beauty, and vision-critical applications.
Emmanuel is passionate about open-source AI, agentic systems, machine learning, and the future of cloud infrastructure for AI workloads.
Inference in Production: Engineering LLM Serving for Latency, Throughput, and Reliability
Inference looks simple from the outside: send a prompt, get a response. In production, it becomes a systems engineering problem.
Latency spikes under burst traffic. Throughput stalls despite adding GPUs. Tail latency explodes from batching and scheduling dynamics. Teams spend months rediscovering the same bottlenecks around KV cache pressure, autoscaling lag, model warmup, and GPU utilization.
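To make "KV cache pressure" concrete, here is a back-of-the-envelope sketch of KV cache size. It assumes a standard decoder-only transformer that caches one key and one value vector per layer, per KV head, per token. The model shape below is an illustrative assumption, not a specific Crusoe deployment. The estimate also ignores model weights, activations, and allocator fragmentation.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size for a decoder-only transformer.

    The factor of 2 covers one key and one value vector stored per layer,
    per KV head, per token, for every sequence in the batch.
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Illustrative shape (assumption): 32 layers, 8 KV heads (grouped-query
# attention), head_dim 128, roughly the shape of an 8B-parameter model.
per_token = kv_cache_bytes(32, 8, 128, seq_len=1, batch_size=1)
print(per_token // 1024, "KiB per token")  # 128 KiB per token

# 32 concurrent requests, each holding 8K tokens of context:
total = kv_cache_bytes(32, 8, 128, seq_len=8192, batch_size=32)
print(total / 2**30, "GiB")  # 32.0 GiB
```

Because the cache grows linearly with both context length and concurrency, it is often GPU memory for the cache, rather than raw compute, that caps how many requests a replica can batch at once.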
This workshop is presented by Crusoe engineers who work directly on the inference systems powering customer workloads on Crusoe Cloud. In 50 minutes, we’ll break down how modern LLM inference actually works, why production serving is far harder than most teams expect, and the infrastructure patterns required to deliver reliable low-latency inference at scale.
Topics include:
1. The mechanics of inference: tokenization, prefill vs. decode, KV cache behavior, and the real drivers of latency and throughput.
2. Why serving LLMs is difficult in practice: batching tradeoffs, memory pressure, head-of-line blocking, autoscaling behavior, and tail-latency management.
3. How Crusoe engineers its inference stack for low time-to-first-token, sustained throughput, and predictable performance under load.
4. Production case studies of leveraging open-source LLM infrastructure.
Attendees will leave with a systems-level mental model of inference, practical evaluation criteria for inference providers, and concrete operational patterns they can apply to their own deployments.
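One practical evaluation criterion is to split each streamed response into time-to-first-token (TTFT, dominated by queueing and prefill) and time per output token (TPOT, the decode rate), then look at tail percentiles rather than averages. The sketch below assumes a streaming client that records the arrival time of every token; `request_metrics` and `percentile` are hypothetical helper names, not part of any provider SDK.

```python
import math


def request_metrics(send_time: float, token_times: list[float]) -> dict:
    """Split one streamed response into prefill and decode latency.

    send_time:   when the request was sent (seconds).
    token_times: arrival time of each output token (seconds), in order.
    """
    ttft = token_times[0] - send_time  # queueing + prefill until the first token
    n = len(token_times)
    # TPOT: average gap between consecutive tokens during decode.
    tpot = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    return {"ttft": ttft, "tpot": tpot, "e2e": token_times[-1] - send_time}


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


m = request_metrics(send_time=0.0, token_times=[0.25, 0.27, 0.29, 0.31])
ttfts = [0.20, 0.21, 0.22, 0.25, 1.90]  # one request stuck behind a long prefill
print(percentile(ttfts, 50), percentile(ttfts, 99))  # 0.22 1.9
```

The median hides the outlier entirely, which is why tail latency needs its own target. With only a handful of samples the p99 is simply the maximum, so real measurements need far more requests than this toy example.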
No Single Model to Rule Them All: Building Resilient AI Agents Across Open & Closed LLMs
The era of betting everything on a single LLM is over. Developers building production AI agents face a reality no model vendor wants to talk about: no one model excels at every task, no single API guarantees 100% uptime, and no proprietary provider offers the cost profile that works for every layer of an agentic pipeline.
The open-source LLM ecosystem has changed the equation. Llama 3.3, DeepSeek-R1, Qwen3, Gemma 3, Kimi-K2 — these models are not fallback options. They are, for many agentic workloads, the better choice on quality, latency, cost, or all three. But the real power is not in picking one winner. It is in architecting agents that route across multiple models, fail over when an endpoint goes down, and match model strengths to task requirements in real time.
Resilient agentic engineering demands a multi-model, multi-provider architecture — and the neocloud is built for exactly this. Crusoe Managed AI provides a single API surface across every major open-source LLM, on infrastructure purpose-built for the throughput and latency demands of agentic workloads.
This session draws from production experience to walk through the architecture decisions, failure modes, and performance tradeoffs of moving from a single-model prototype to a resilient, multi-model agent in production.
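The "failover when an endpoint goes down" idea can be sketched as an ordered fallback chain. This is a minimal illustration, assuming each provider is wrapped as a plain function from prompt to text; the provider names and the `flaky`/`healthy` stand-ins are hypothetical, and a production version would catch only timeouts, rate limits, and server errors rather than every exception.

```python
import time
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(prompt: str,
                           providers: Sequence[tuple[str, Callable[[str], str]]],
                           retries_per_provider: int = 1,
                           backoff_s: float = 0.0) -> tuple[str, str]:
    """Try each (name, call) in order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # narrow this to timeouts / 429 / 5xx in practice
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt < retries_per_provider:
                    time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timeout")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = complete_with_fallback("hi", [("primary", flaky), ("secondary", healthy)])
print(name, text)  # secondary echo: hi
```

Returning the provider name alongside the text lets the agent log which model actually answered, which matters when quality differs across the chain.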
No Single Model to Rule Them All: Building Resilient AI Agents Across Open & Closed LLMs
AI agents are only as reliable as the models behind them. Most teams start by wiring an agent to a single LLM and calling it done. Then reality hits: rate limits, outages, cost spikes, and tasks where one model underperforms another. The teams building resilient agents in production aren't betting on one model. They're building across many.
This talk covers how to architect AI agents that route intelligently across open and closed LLMs. I'll walk through practical patterns for model selection at inference time: when to use a large frontier model versus a fine-tuned open-weight model, how to build fallback chains that maintain agent quality during provider outages, and how to use routing logic to optimize for cost, latency, and task-specific accuracy.
Using PyTorch ecosystem tools like vLLM for self-hosted open models alongside closed API providers, I'll show how teams are deploying agent systems that aren't locked into any single vendor or architecture. We'll look at real tradeoffs between dense and MoE open models for different agent subtasks, and why the most resilient agent architectures treat model selection as a runtime decision, not a design-time one.
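Treating model selection as a runtime decision can be as simple as a routing function over per-model profiles. The sketch below assumes you already have per-task quality scores from your own evals plus cost and latency figures; every model name and number in it is hypothetical and illustrative only. It picks the cheapest model that clears the quality and latency bars, and falls back to the highest-quality model when none does.

```python
from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative
    p50_latency_s: float       # measured or assumed
    task_scores: dict = field(default_factory=dict)  # task -> quality in [0, 1]


def route(task: str, profiles: list[ModelProfile], min_quality: float,
          max_latency_s: float) -> ModelProfile:
    """Pick the cheapest model meeting the quality and latency bars for `task`."""
    eligible = [
        p for p in profiles
        if p.task_scores.get(task, 0.0) >= min_quality and p.p50_latency_s <= max_latency_s
    ]
    if not eligible:
        # Nothing meets both bars: prefer quality over latency for this request.
        return max(profiles, key=lambda p: p.task_scores.get(task, 0.0))
    return min(eligible, key=lambda p: p.cost_per_1k_tokens)


# Hypothetical fleet mixing a closed frontier API with open-weight models.
fleet = [
    ModelProfile("frontier-api", 10.0, 1.5, {"planning": 0.92, "extraction": 0.95}),
    ModelProfile("open-moe", 0.6, 0.8, {"planning": 0.85, "extraction": 0.93}),
    ModelProfile("open-small-ft", 0.1, 0.3, {"planning": 0.60, "extraction": 0.91}),
]
print(route("extraction", fleet, min_quality=0.9, max_latency_s=1.0).name)  # open-small-ft
print(route("planning", fleet, min_quality=0.9, max_latency_s=1.0).name)    # frontier-api
```

Because the profiles are plain data, they can be refreshed from live latency and error metrics, so the same agent shifts traffic as conditions change instead of hard-coding one model at design time.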
Building Computer Vision AI Algorithms for 100 Skin Shades
roboMUA uses AI to build computer vision models for the beauty and fashion industry that work across more than 100 skin shades.
In this session, I’ll discuss how we gathered our data, trained our models, and deployed models designed to be inclusive across 100 skin shades.