We are in the midst of an AI revolution led by the rise of large language models (LLMs), which have enabled unprecedented advances in natural language processing and generation. Models like GPT-4, Claude 3, and Gemini have demonstrated remarkable performance across a wide range of language tasks, from understanding and generation to complex problem-solving. However, as these models grow in size and complexity, their energy consumption has skyrocketed, raising serious concerns about the sustainability and scalability of AI systems.
This is a growing concern among stakeholders across the AI ecosystem. At the World Economic Forum's annual meeting in Davos in January 2024, Sam Altman warned that the energy demands of future generative AI systems will vastly outstrip expectations and strain global energy systems.
“There’s no way to get there without a breakthrough,” he said.
Currently, it's estimated that ChatGPT alone uses as much power as 33,000 homes, and that a search powered by generative AI consumes four to five times more energy than a conventional web search. As AI systems continue to grow in scale and sophistication, their energy needs could rival those of entire countries within years.
It’s not just power: generative AI also requires enormous amounts of fresh water to cool processors and generate electricity. OpenAI's large language model GPT-4 drove a 6% increase in water usage at the data center cluster powering it. A study by Alex de Vries, a researcher at VU Amsterdam, projects that by 2027 AI could consume between 85 and 134 terawatt-hours of electricity annually, comparable to the total energy usage of the Netherlands. This staggering figure is primarily driven by the increasing size and complexity of LLMs.
As de Vries puts it, "A single LLM interaction may consume as much power as leaving a low-brightness LED lightbulb on for one hour" (Wells, 2023).
While Training Is Usually Blamed, Inference Is the Hidden Culprit

The energy consumption of LLMs can be attributed to two primary stages: training and inference. Training an LLM involves learning the model's parameters from vast amounts of data, a process that requires immense computational resources, including powerful GPUs and TPUs. As reported by The Verge, training GPT-3 was estimated to consume nearly 1,300 megawatt-hours of electricity, equivalent to the annual energy consumption of 130 U.S. homes. The models currently being built are an order of magnitude (or two!) larger than GPT-3.
Inference, the actual use of the trained model to generate outputs from user inputs, has often been overlooked in terms of its energy impact. However, recent studies have shown that the cumulative energy consumption during inference can surpass that of training when models are deployed at scale. Researchers at Hugging Face and Carnegie Mellon found that generating a single image with a generative AI model consumes about as much energy as fully charging a smartphone. With millions of users interacting with popular models like ChatGPT daily, inference costs accumulate quickly.
Model Routing: A Promising Solution

To address the growing energy consumption of LLMs, researchers and industry experts are exploring various optimization techniques, ranging from training improvements to hardware innovations. Martian is developing an orthogonal approach that can further address this problem: “model routing”, dynamically selecting the most efficient model for each specific task or inference request.
By matching each request to the model best suited for that specific task, routers can often use smaller, more efficient models while still achieving high-quality outputs. This translates directly to energy savings.
For example, in our research, we find that cascading routers, which sequentially try increasingly large models until a quality threshold is met, can outperform individual models while using less computation overall. By stopping when a "good enough" answer is reached, cascading routers avoid the energy cost of overcomputing with massive models.
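To make the idea concrete, here is a minimal sketch of a cascading router in Python. The `Model` class, the `energy_cost` figures, and the `generate` and `score` callables are hypothetical placeholders for this illustration, not Martian's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    energy_cost: float  # hypothetical relative energy units per request

def cascading_route(
    prompt: str,
    models: list[Model],                    # ordered from smallest to largest
    generate: Callable[[Model, str], str],  # calls the underlying model
    score: Callable[[str, str], float],     # quality estimate in [0, 1]
    threshold: float = 0.8,
) -> tuple[str, float]:
    """Try increasingly large models; stop at the first 'good enough' answer."""
    answer, spent = "", 0.0
    for model in models:
        answer = generate(model, prompt)
        spent += model.energy_cost
        if score(prompt, answer) >= threshold:
            break  # skip the larger models and save their energy cost
    return answer, spent  # falls back to the largest model's answer if needed
```

The key design point is that the router only escalates when cheaper models fail the quality check, so easy requests never pay the big model's energy bill.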
Model routing also empowers users with more control. Every user of an LLM has their own needs and priorities: some favor response quality above all else, while others prefer fast, cheap outputs.
For example, an AI-powered writing assistant could offer a "quality" slider, using cheaper, weaker models for real-time feedback but routing to more powerful models for the final output. This lets users tailor the system to their needs: students may favor cheaper suggestions, while professional writers opt for the highest quality regardless of cost. By offering users transparency and control, model routing enables personalized balancing of cost, energy use, and task performance.
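As an illustration, here is one simple way such a slider could map onto model tiers; the tier names below are invented for this sketch:

```python
# Hypothetical model tiers, ordered from cheapest to most capable.
MODEL_TIERS = ["small-draft-model", "mid-size-model", "flagship-model"]

def route_by_preference(quality: float, tiers: list[str] = MODEL_TIERS) -> str:
    """Map a user-facing 0.0-1.0 'quality' slider to a model tier."""
    quality = min(max(quality, 0.0), 1.0)      # clamp out-of-range input
    index = round(quality * (len(tiers) - 1))  # 0.0 -> cheapest, 1.0 -> strongest
    return tiers[index]

# A writing assistant might use a low setting for real-time suggestions
# and the top setting when producing the final draft.
assert route_by_preference(0.1) == "small-draft-model"
assert route_by_preference(1.0) == "flagship-model"
```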
As detailed in our research, Martian's model router "is capable of out-performing GPT-4 on OpenAI's own evals – and doing so at a lower cost." By intelligently routing queries to specialized models tailored for specific tasks, the Martian router can achieve up to a 97% reduction in cost while maintaining performance on par with GPT-4.
Making Model Training More Efficient

While inference accounts for the bulk of energy use in deployed AI systems, model training is also extremely energy-intensive, with large models like GPT-3 estimated to have a carbon footprint exceeding that of a passenger vehicle over its entire lifetime.
Model routing can help here too by enabling more efficient use of training data and compute resources. Instead of training one massive, monolithic model to handle every possible task, developers can train an ecosystem of smaller, specialized models that a router selects between.
This modular approach avoids redundant training, as the specialized models focus solely on their niche rather than absorbing extraneous information. Further research is needed to develop training pipelines optimized for routing applications. But the potential is clear: a thoughtfully constructed fleet of specialized models can provide excellent task coverage and output quality while being faster and more efficient to train than a brute-force giant model.
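The dispatch logic itself can stay simple. Below is a toy sketch of routing across such a fleet, with invented specialist names and a deliberately naive keyword matcher standing in for the learned classifier a production router would use:

```python
# Invented specialist model names for this illustration.
SPECIALISTS = {
    "code": "code-specialist",
    "summarize": "summarization-specialist",
    "translate": "translation-specialist",
}
GENERALIST = "general-fallback-model"

def route_to_specialist(prompt: str) -> str:
    """Return the specialist whose domain keyword appears in the prompt,
    falling back to a generalist model otherwise."""
    lowered = prompt.lower()
    for keyword, model in SPECIALISTS.items():
        if keyword in lowered:
            return model
    return GENERALIST

print(route_to_specialist("Summarize this meeting transcript"))
# -> summarization-specialist
```

Because each specialist only needs to cover its niche, it can be trained on a focused dataset at a fraction of the compute a monolithic generalist would require.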
Industry Collaborations: Driving Sustainable AI Practices

While technical innovations like model routing are crucial in mitigating the energy footprint of LLMs, industry collaborations and initiatives play an equally important role in driving sustainable AI practices. Enterprises are increasingly recognizing the need to address the environmental impact of their AI systems and are actively participating in collaborative efforts to establish best practices and standards.
The Green Software Foundation (GSF), founded by leading technology companies such as Accenture, Microsoft, GitHub, and ThoughtWorks, aims to support the Information and Communications Technology (ICT) sector in reducing its greenhouse gas emissions by 45% by 2030 (Accenture, 2023). The foundation focuses on various initiatives, including assessing and reporting the carbon footprint of applications, discovering energy-saving techniques for AI, and developing tools and training for green software engineering.
Accenture, a founding member of the GSF, has been at the forefront of promoting sustainable AI practices. In their blog post "Making generative AI green" (Accenture, 2023), Accenture highlights the importance of minimizing the computational cost of generative AI models, adapting pre-trained models for specific tasks, and applying generative AI to accelerate the energy transition. They emphasize the need for clear governance structures, measurement criteria, and experiential learning to help developers understand the energy implications of training and deploying AI models.
The Path Forward: Balancing Performance and Sustainability

As the demand for more powerful and capable LLMs grows, it is imperative to address the environmental implications and develop solutions to mitigate their energy footprint. Model routing techniques and industry collaborations offer promising avenues for optimizing resource utilization and reducing energy consumption during both training and inference stages.
However, tackling the energy challenge in AI requires a concerted effort from the entire community. Enterprises must prioritize sustainability alongside performance and capabilities, embracing innovative approaches like model routing and fostering a culture of collaboration and knowledge sharing. It is time for the AI community to treat sustainability as a first-class concern, ensuring that the benefits of AI are realized while minimizing its carbon footprint.
If you're interested in collaborating with us, we'd love to hear from you. Please contact us directly at contact@withmartian.com.