Session
Reducing the Cost of AI Inference: From Hardware to Software
AI models operate in two primary stages. The first is the training stage, during which the model learns underlying patterns and relationships from the training data by optimizing its parameters. The second is the inference stage, where the trained model generalizes this learned knowledge to interpret and make predictions on new, unseen data.
Inference is often associated with high computational cost. Modern AI models, particularly large language models (LLMs), can require millions of dollars to deploy and operate at scale. This high cost arises from several key factors, including the scale of operations, latency and throughput requirements, and model complexity.
In this session, I aim to explain how the cost of AI inference can be reduced at multiple levels of the system stack, namely the hardware, software, and middleware levels.