Session
Using Amazon Bedrock at scale: How we overcame quota limits without Provisioned Throughput
Amazon Bedrock provides easy access to some of the best foundation models for GenAI projects. However, because of high user demand and limited hardware capacity, AWS unexpectedly reduced quotas in late 2024. Anthropic's models in regions such as Frankfurt were particularly affected: very low limits on requests and tokens per minute caused serious problems for many customer projects.
In this talk, I'll share real-world experiences from several customer projects, highlighting practical solutions we've found for scaling the use of foundation models without relying on expensive Provisioned Throughput. I'll discuss specific techniques we've used to raise seemingly fixed quota limits more than tenfold. I'll also show how the choice of foundation model, AWS region, and architecture (such as multi-account structures and cross-region inference) can significantly improve capacity and reduce costs.
By the end of this session, cloud architects, engineers, and AI developers will have clear insights into how to optimise their Amazon Bedrock architecture. You'll learn straightforward methods for running cost-sensitive yet robust GenAI workloads, even in demanding production environments.