Why AI energy efficiency efforts must focus on inference

  • Energy, not cost, should be the central parameter for AI efficiency
  • Green AI focuses on reducing the environmental impact of AI systems across their entire lifecycle

As artificial intelligence becomes increasingly embedded in business and society, the conversation around sustainability and efficiency is evolving. Vincent Caldeira, APAC CTO of Red Hat and an influential voice in the Green Software Foundation, emphasises that the true sustainability challenge for AI lies not in model training, but in inference—the phase where models deliver predictions and value at scale. “The biggest problem is on the inference side, because inference consumes all the time. It scales with utilisation. The more people use it, the more the ratio is going to be worse,” he said on the sidelines of the Red Hat Summit in Boston.
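
A back-of-envelope calculation shows why a one-off training run can be dwarfed by always-on inference. The figures in this sketch are purely illustrative assumptions, not numbers from Caldeira or Red Hat:

    # Illustrative sketch: training is a one-off energy cost, while
    # inference energy accumulates with every request, so total inference
    # energy eventually dominates. All figures below are assumptions.

    TRAINING_ENERGY_KWH = 1_000_000   # hypothetical one-off training cost
    ENERGY_PER_REQUEST_KWH = 0.002    # hypothetical cost of one inference
    REQUESTS_PER_DAY = 5_000_000      # hypothetical sustained demand

    daily_inference_kwh = ENERGY_PER_REQUEST_KWH * REQUESTS_PER_DAY
    days_to_exceed_training = TRAINING_ENERGY_KWH / daily_inference_kwh

    print(f"Inference draws {daily_inference_kwh:,.0f} kWh/day")
    print(f"Cumulative inference energy passes training after "
          f"{days_to_exceed_training:.0f} days")

At these assumed rates, inference overtakes the entire training budget in about 100 days, and the gap only widens as utilisation grows.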

Energy Optimisation: The Key Metric for AI

Caldeira is unequivocal: energy, not cost, should be the central parameter for AI efficiency. While traditional IT operations have focused on optimising for cost, this approach is no longer sufficient in the era of large-scale AI deployment.

“Energy has become in my view the biggest parameter of actual efficiency management… energy is the only proxy of efficiency as a whole at the system level. There is no other way to measure it.”

He argues that optimising for cost alone can lead to sub-optimal decisions that may appear efficient financially but are wasteful in terms of resource and energy use. Instead, organisations should measure and optimise for energy consumption across the entire AI lifecycle, from infrastructure choices to operational deployment.
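
What might optimising for energy rather than cost look like in practice? The following is a minimal sketch, assuming energy telemetry is available (for example from rack PDUs or a provider's metering); the class name, fields, and numbers are all hypothetical:

    # Minimal sketch of tracking energy per unit of useful work rather
    # than cost alone. The measurement source (e.g. rack PDUs, RAPL
    # counters, or provider telemetry) is assumed; values are placeholders.

    from dataclasses import dataclass

    @dataclass
    class InferenceWindow:
        energy_joules: float  # energy drawn by the serving node in the window
        tokens_served: int    # useful work completed in the same window
        dollar_cost: float    # what a cost-only view would optimise

        @property
        def joules_per_token(self) -> float:
            """Energy efficiency: lower is better, independent of pricing."""
            return self.energy_joules / self.tokens_served

    # Two hypothetical deployments: the cheaper one is less energy-efficient,
    # exactly the financially-efficient-but-wasteful mismatch described above.
    spot = InferenceWindow(energy_joules=9.0e6, tokens_served=2_000_000, dollar_cost=4.0)
    sized = InferenceWindow(energy_joules=5.4e6, tokens_served=2_000_000, dollar_cost=5.5)

    for name, w in [("cheapest", spot), ("right-sized", sized)]:
        print(f"{name}: {w.joules_per_token:.1f} J/token at ${w.dollar_cost}/window")

A cost-only view would pick the cheaper deployment; a joules-per-token view reveals it does the same work at nearly twice the energy.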

The Challenges of Optimising AI Inference

Optimising inference is a complex, multi-layered problem. Unlike training, which is a one-off event, inference is continuous and highly variable, depending on user demand, workload location, and infrastructure specifics. Caldeira outlines several key challenges:

  • Workload Placement and Infrastructure Diversity: Deciding where to run inference workloads—on-premises, in the cloud, or at the edge—requires careful consideration of data gravity, energy availability, and hardware capabilities.
  • Resource Right-Sizing: Matching the right GPU or accelerator to the workload is non-trivial, especially as hardware and model requirements evolve.
  • Operational Complexity: Efficient inference demands real-time decisions about routing, caching, and memory allocation, all of which impact energy use.
  • Lack of Standardised Metrics: Different cloud providers report efficiency metrics in inconsistent ways, making it hard for enterprises to compare and optimise across environments.

“Inference is a huge, really difficult optimisation problem. You need to decide based on your infrastructure what’s the best place to run the workload… you have a need to do right sizing… to optimise the allocation, then you’ve got to do routing. And this routing ideally should be cognisant of the use case.”
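
Taken together, placement, right-sizing, and routing amount to a scheduling problem. The sketch below illustrates one possible shape of an energy-aware router; the site names, scoring rule, and all figures are invented for illustration, and a production scheduler would also weigh data gravity, latency targets, and caching, as Caldeira notes:

    # Hedged sketch of the placement/routing decision described above.
    # Site names, energy figures, and the scoring rule are invented for
    # illustration only.

    from dataclasses import dataclass

    @dataclass
    class Site:
        name: str
        joules_per_token: float          # measured efficiency of its accelerators
        grid_intensity_gco2_kwh: float   # carbon intensity of the local grid
        has_model_cached: bool           # avoids an energy-costly model load

    def route(sites: list[Site], tokens: int) -> Site:
        """Pick the site with the lowest estimated emissions for this request."""
        def estimated_gco2(s: Site) -> float:
            load_penalty = 1.0 if s.has_model_cached else 1.2  # assumed reload overhead
            kwh = s.joules_per_token * tokens * load_penalty / 3.6e6  # J -> kWh
            return kwh * s.grid_intensity_gco2_kwh

        return min(sites, key=estimated_gco2)

    sites = [
        Site("on-prem-sg", joules_per_token=3.0, grid_intensity_gco2_kwh=480, has_model_cached=True),
        Site("cloud-ap-southeast", joules_per_token=2.2, grid_intensity_gco2_kwh=520, has_model_cached=False),
        Site("edge-kl", joules_per_token=4.5, grid_intensity_gco2_kwh=550, has_model_cached=True),
    ]
    print(route(sites, tokens=1_000).name)

Even this toy version shows the trade-offs: the most efficient hardware may sit on a dirtier grid, and a cache miss can erase a site's efficiency advantage, which is why the routing "should be cognisant of the use case."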

Continue reading the full article at https://oursustainabilitymatters.com/why-ai-energy-efficiency-efforts-must-focus-on-inference/ as DNA transitions its sustainability coverage to a standalone news site.
