Review - Environmental impact of delivering AI
“Who controls the past controls the future. Who controls the present controls the past.”
― George Orwell, Nineteen Eighty-Four
Lately I have become interested in tackling the complex efficiency challenges related to AI infrastructure. One aspect of improving efficiency is data centers, as they have a huge impact on the environment. While researching this I came across a paper published by Google https://arxiv.org/abs/2508.15734 [1]. This is a review of the paper in three parts:
- The real environmental impact of the data center race and its water usage.
- The training of models, which consumes a massive amount of compute power; the latter part lists a small subset of use cases where models need to be trained continuously.
- The technical part, which describes the techniques used to improve inference efficiency.
Claims made in the paper
The paper claims that the median Gemini Apps text prompt uses less energy than watching nine seconds of television (0.24 Wh of energy) and consumes the equivalent of five drops of water (0.26 mL).
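As a quick sanity check (my own back-of-envelope arithmetic, not from the paper), the TV comparison implies a plausible television power draw:

```python
# If a median prompt uses 0.24 Wh and that equals 9 seconds of TV viewing,
# the implied TV power draw is:
prompt_energy_wh = 0.24
tv_seconds = 9
implied_tv_watts = prompt_energy_wh * 3600 / tv_seconds  # convert Wh over 9 s to W
print(f"Implied TV power: {implied_tv_watts:.0f} W")  # 96 W, a plausible LED TV
```

The numbers are internally consistent, which is reassuring, though it says nothing about what was left out of the measurement boundary.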
In my opinion this is a nice but incomplete paper. The nice part is that it makes claims based on empirical data collected over a long period (May 2024–2025) instead of theoretical extrapolations or back-of-envelope calculations; however, the paper leaves out some details. It provides no data on training, which (perhaps intentionally) makes the reported power and water consumption look lower than they really are. If training's power consumption and its impact on water and the environment were included, both the power and water figures would definitely rise.
Before going into the technical aspects of the paper, it is important to understand why big companies (Google, Microsoft, Meta, OpenAI, Anthropic, NVIDIA, etc.) are writing these papers. Companies are currently in a data center race (to set up as many data centers as they can) to remain relevant in the AI era and to acquire ever-growing compute power across the globe. The hardware, processors, memory, storage, and energy needed to operate these data centers are collectively known as compute power; McKinsey & Company estimates that by 2030, data centers will require $6.7 trillion of investment worldwide [3] to keep pace with the demand for compute.
Setting up these data centers needs massive investment (billions of dollars) that only a few players in the industry possess. During the development phase, a data center does provide a couple of hundred short-term construction jobs (100–250), but once it is up and running, roughly 90% of those jobs go away. The job benefits of the construction phase are offset by a lasting negative impact on the environment (water and power consumption). To improve their image, companies write papers projecting that their usage is not that impactful on the environment, that they use a negligible amount of water, or that they are investing in technologies to put their data centers in space within another 15 years (a later part of this review covers the challenges of data centers in space, which are cost prohibitive and risk an exponential increase in space debris, also known as Kessler syndrome [6]).
Data Centers and Water Consumption
Data centers, which house servers for cloud computing and AI, require constant cooling to prevent overheating. This cooling often relies on water-intensive systems that consume vast amounts of potable (drinking) water. A single data center can consume up to 5 million gallons of drinking water per day, which is enough to supply thousands of households or farms. The increasing use and training of machine learning models (AI), particularly deep learning, generates significant heat, requiring even more intensive cooling and putting additional pressure on local water resources. The drinking water used is often chemically treated, making it unsuitable for human consumption or agriculture. Water-stressed regions, such as the Southwest United States (e.g., the Phoenix area), are attractive locations for data centers because the naturally dry air reduces the risk of corrosion and electrical issues. The presence of over 58 data centers in the Phoenix area could equate to more than 170 million gallons of drinking water used per day for cooling.
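The Phoenix figure can be cross-checked with simple arithmetic (my own calculation, using the figures quoted above):

```python
# 58 data centers in the Phoenix area and a combined 170 million gallons/day
# implies an average per-facility draw of:
total_gallons_per_day = 170e6
num_data_centers = 58
avg_per_facility = total_gallons_per_day / num_data_centers
print(f"{avg_per_facility / 1e6:.1f} million gallons/day per facility")  # ~2.9
```

That average sits well below the 5-million-gallon worst case cited above, so the two figures are mutually consistent.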
Reference article [4] cites an example where Google was criticized for negotiating a lower water rate ($6.08 per 1,000 gallons) than residents pay ($10.80 per 1,000 gallons) in Mesa, Arizona. The water story is the same for other companies: there are articles about the negative impact of the Meta data center in rural Georgia [7][8], and similar reports about Microsoft and Amazon Web Services data centers in drought-hit rural Mexico [9]. It's not just water consumption that makes an impact on the environment; data centers also consume huge amounts of power, whose generation often depends on water. A study by Goldman Sachs projects that data center power demand will increase by 165% by 2030 [10]. Some companies like Google are investing in nuclear reactors, which will hopefully reduce the dependence on water for electricity.
The next Frontier - Data Centers in Space - Google Moonshot - Project Suncatcher
Big AI companies are trying to reduce their dependence on water by planning to set up data centers in space. This offers long-term advantages like abundant solar power and passive cooling, but faces significant engineering, logistical, and economic drawbacks that currently make it impractical for large-scale production workloads (a realistic estimate is 10–20 years from now). At this time, the disadvantages far outweigh the benefits that space data centers would provide:
- Launching the massive hardware (servers, networking equipment, power systems, and radiators) is currently prohibitively expensive. Even with lower-cost launch vehicles, the cost of mass-to-orbit makes the initial investment significantly higher than terrestrial alternatives.
- Hardware failure happens frequently in data centers. Servicing or repairing components in orbit is complex and requires specialized, expensive robotic spacecraft or human intervention. Replacing obsolete hardware (which happens every 2–3 years) would require a continuous, costly supply chain of rocket launches.
- While space is cold, it is a vacuum, meaning cooling relies solely on thermal radiation. This is the least efficient form of heat transfer, requiring extremely large and complex radiator panels to shed the many megawatts of waste heat generated by modern AI clusters.
- Large-scale AI training and inference demand megawatts (MW) of continuous power. Even the largest orbital structures (like the ISS) operate in the range of hundreds of kilowatts (kW) [12]. Generating MW-scale power requires launching massive, heavy solar arrays and batteries, which is a major engineering challenge and cost hurdle.
- Computer chips are highly vulnerable to cosmic and solar radiation in space, which can cause hardware failures and data corruption. This necessitates either expensive, technologically older radiation-hardened chips or complex shielding.
- Achieving the necessary high-bandwidth, low-latency links between satellites and back to Earth (ground stations) is a significant technical challenge that requires flying satellites in extremely close formation. Large satellites, particularly those in lower orbits, experience atmospheric drag. This requires continuous maneuvers using thrusters needing even more energy.
- End-of-life disposal of these satellites is a big problem, risking Kessler syndrome [6]: a nightmarish scenario of an exponentially increasing amount of space debris over time.
Most of the above problems have solutions, but those solutions are cost prohibitive compared to data centers on Earth. Another paper from Google [2] provides an overview of their system design and envisions compact constellations of solar-powered satellites, carrying Google TPUs [16] and connected by free-space optical links [11]. The proposed system consists of a constellation of networked satellites operating in a sun-synchronous low Earth orbit [17], with Google's Trillium TPUs showing some promise of tolerating low-dose radiation.
At present, this moonshot looks like a long shot: good for research but impractical in practice. The Suncatcher paper makes no mention of Kessler syndrome, a phenomenon in which the amount of junk in orbit around Earth reaches a point where collisions create more and more space debris [13], causing big problems for satellites, astronauts, and mission planners, and it does not address end-of-life disposal of these satellites.
Training Models (Not covered in Paper)
The paper does not measure the amount of water spent training models before they are ready for inference; it states: “This study specifically considers the inference and serving energy consumption of an AI prompt. We leave the measurement of AI model training to future work.”
The difference between AI training and inference can be explained using the analogy of a student's education and subsequent career. The training phase (equivalent to a four-year degree) is the initial, resource-intensive, high-cost phase where the AI model acquires its knowledge. The model is fed massive, diverse datasets, like a student reading dozens of books and solving thousands of practice problems, and adjusts its billions of internal parameters to minimize errors and learn complex patterns. Training is computationally heavy, needing large amounts of compute doing millions of calculations per second over weeks to months. The inference phase is the continuous application phase where the model uses its knowledge to solve real-world problems; each inference is relatively fast and requires comparatively few computations, but applied to millions or billions of user queries per day the power consumption adds up quickly.
One important point is that training is not a one-time event: user behavior, data, and business rules change constantly, causing a model's accuracy to degrade over time. There is data drift, for example when users start submitting queries that were not in the original training data; in that case models either hallucinate or admit that they don't have the data (both ChatGPT and Anthropic models were honest enough to admit this for some of my queries about a recent NVIDIA GPU architecture released after those models were trained). So models need retraining every few weeks or months, depending on the company that owns them.
- There are plenty of use cases where inference needs constant retraining to improve responses. For example, Google's search ranking or social media feed ranking (e.g., Facebook/Instagram) needs continuous training on new queries, using streaming updates in real time.
- Self-driving cars and industrial robotics need training at regular intervals. This is because feedback from millions of driving hours must be continuously collected and used to retrain/adapt the model to new edge cases like unforeseen weather conditions or new unique objects on roads like construction barriers to improve safety and robustness.
- Personalized product recommendations (on platforms like Netflix, Spotify, or Amazon) need constant training to improve user selections in real time.
Gemini AI Efficiency
This is the final part of the paper; it describes the efficiency gains achieved during inference using a full-stack approach: efficient hardware, efficient algorithms, efficient data centers, efficient usage of accelerators (TPUs in Google's case), and Google's infrastructure, which is one of the best I have seen anywhere (infrastructure is taken very seriously by the company, though like any other company there are anomalies, such as a security group that is highly inefficient in compute consumed versus traffic served, but overall other parts of the company are exemplary in this field). The company has also published an abridged version of this paper [14]. Some of the techniques described in the following sections are run-of-the-mill, variations of which are used across most of the industry, whereas some are unique to Google, giving it a slight advantage and moat in certain areas (TPU).
Efficient algorithms & quantization: The models use Accurate Quantized Training (AQT), a technique that improves the energy efficiency of AI models during inference by reducing the numerical precision of the model's weights and activations (e.g., from 32-bit floating point to 8-bit integers) without meaningfully sacrificing accuracy or performance. Reducing precision saves memory (4x less space, 32 bits -> 8 bits), which allows larger models to fit on the same hardware instead of being distributed across machines, thereby saving network calls and reducing latency.
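A minimal sketch of the idea behind int8 quantization (my own illustration, not Google's AQT implementation, which applies quantization during training rather than after it):

```python
import numpy as np

# Symmetric int8 quantization: weights are scaled into the int8 range, stored
# in 8 bits (4x smaller than float32), and rescaled back at compute time.
def quantize_int8(w: np.ndarray):
    scale = np.max(np.abs(w)) / 127.0               # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("max abs error:", np.max(np.abs(w - w_hat)))  # bounded by half the scale
print("memory ratio:", w.nbytes / q.nbytes)         # 4.0
```

Real systems often use finer-grained (per-channel or per-block) scales to keep the rounding error small, but the 4x memory saving is exactly this mechanism.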
Custom-built hardware: The TPU (Tensor Processing Unit) is a real moat and advantage for Google which other companies don't have at present (unless Google starts selling these to others). These chips are designed by Google specifically for AI processing. This is one of the few moonshot projects at Google that was successful and is paying dividends after more than a decade of research. Google claims its latest generation, Ironwood, is 30x more energy-efficient than its first publicly available TPU. Selling or leasing these TPUs would likely lift the stock price but would level the playing field for other companies, offset by the fact that those companies would have to move to GCP (I am not aware of TPUs being available in other clouds like AWS or Azure). Every major AI company (Amazon, Tesla, etc.) is now trying to design its own chips.
Smarter model architectures: Google's models, built on the Transformer architecture, achieved a 10x to 100x efficiency boost over previous language models. These gains were achieved using Mixture-of-Experts (MoE), a technique which drastically reduces computation by only activating a small, necessary subset of the large model's parameters when responding to a specific user prompt.
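A toy sketch of MoE routing (my own illustration, not Gemini's actual architecture): a router scores all experts, but only the top-k are evaluated, so compute scales with k rather than with the total number of experts.

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, k=2):
    logits = x @ router_weights                     # one score per expert
    top = np.argsort(logits)[-k:]                   # indices of top-k experts
    g = np.exp(logits[top] - logits[top].max())     # stable softmax over top-k
    gates = g / g.sum()
    # Only k expert matmuls run; the other experts' parameters stay untouched.
    return sum(w * (x @ expert_weights[i]) for w, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
experts = rng.standard_normal((n_experts, d, d))
router = rng.standard_normal((d, n_experts))
y = moe_forward(x, experts, router, k=2)            # 2 of 16 experts active
print(y.shape)
```

With k=2 of 16 experts, only ~1/8 of the expert parameters are touched per token, which is where the energy savings come from.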
Optimized inference and serving: Gemini uses techniques like speculative decoding, in which a cheap, fast model guesses several tokens ahead and a slow, high-quality model verifies those guesses in a single batched pass, enabling the LLM to generate high-quality text much more quickly (verifying several guesses at once is far cheaper than generating every token with the large model sequentially). Gemini also utilizes distillation, a training technique that transfers the complex, nuanced knowledge learned by a large, high-performing model (the Teacher) into a smaller, more efficient model (the Student). The resulting Student model has fewer parameters and is faster, leading to lower latency and significantly lower energy consumption for inference; the smaller size makes it ideal for deployment on resource-constrained edge devices like mobile phones or for high-volume, low-cost cloud serving.
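The speculative-decoding accept/verify loop can be sketched with toy deterministic "models" (my own illustration; real systems sample probabilistically and verify the draft in one batched forward pass of the target model):

```python
def speculative_step(prefix, draft_model, target_model, lookahead=4):
    # The cheap draft model guesses `lookahead` tokens ahead, sequentially.
    ctx = list(prefix)
    guesses = []
    for _ in range(lookahead):
        t = draft_model(ctx)
        guesses.append(t)
        ctx.append(t)
    # The expensive target model checks the guesses (conceptually in one
    # batched pass) and keeps the longest agreeing prefix, replacing the
    # first mismatch with its own token.
    ctx = list(prefix)
    accepted = []
    for g in guesses:
        t = target_model(ctx)
        if t != g:
            accepted.append(t)
            break
        accepted.append(g)
        ctx.append(g)
    return accepted

text = "hello world"
target_model = lambda ctx: text[len(ctx)]   # "ground truth" next character
draft_model = lambda ctx: text[len(ctx)] if len(ctx) != 5 else "_"  # wrong at pos 5
out = speculative_step("hel", draft_model, target_model)
print("".join(out))  # "lo " -- three tokens advanced in one verification step
```

When the draft agrees, several tokens are accepted per invocation of the expensive model instead of one, which is the source of the speedup.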
Optimized idling: This is another area where Google's infrastructure is among the best in the world. The approach centers on minimizing CPU and GPU idleness and using hardware virtualization to maximize efficiency. Google has decades of experience managing its planet-scale orchestration engine (Borg) and its miniature open-source version, Kubernetes, which lacks some of the bells and whistles (customizations) and architectural improvements that Borg has for infrastructure efficiency.
Ultra-efficient data centers: One of the ways to improve AI efficiency is to manage data centers, and the software and hardware infrastructure they host, efficiently. There is no doubt that infrastructure at Google is good; that is the result of two decades of investment in this area and experience from its search business of managing millions of machines effectively, reducing idle machines, and likely maximizing TPU efficiency by partitioning and scheduling workloads effectively. As mentioned above, in my opinion there are definitely large areas of improvement in many parts of the company, but that is a solved problem the company has effectively worked on in the past (slow and steady progress).
The paper claims that the PUE of Google data centers is one of the lowest in the industry. Power Usage Effectiveness (PUE) is a metric used to assess the energy efficiency of a data center. It quantifies how much of the total energy consumed by the facility is actually delivered to the computing equipment (the servers and storage) versus the amount consumed by supporting infrastructure (cooling, power conversion, lighting, etc.). The lower the value, the better. Meta and Google generally report the lowest public PUE values, often hovering around 1.10 or below, though PUE data is often self-reported and can vary significantly based on location, climate, and the specific time period being measured.
Summary
Impression management [15] is a conscious or subconscious process in which an attempt is made to influence the perceptions of other people about a person, object or event by regulating and controlling information in social interaction. Projecting that inference takes about five drops of water, when inference can't happen without training models that require huge compute power (and hence water), looks like impression management by the company. It is hard to guess what total water consumption would be if both inference and training numbers were presented, but the number of drops of water would definitely increase. I look forward to the “future work” in this area mentioned in the paper, and I hope this paper is an anomaly and future papers will be as informative and accurate as previous papers by the company. My approach to reading any paper is very similar to executing a software project: identify the high-level areas after reading the paper (initiatives), prioritize those initiatives, split them into smaller areas (topics), and research those topics using search engines. Lately I have also started using AI for research (I used a free AI model along with search engines here) and really see the benefit of AI in the future, and I hope companies come out with innovations to reduce environmental impact. A data center in space is a moonshot, cost prohibitive and with the potential to increase space debris, but it is a step in the right direction, and most problems associated with its system design are solvable over the next two decades.
Disclaimer
All views in this review are mine and have no relation to my previous or current employers. I have invested in Google stock, so I may have a slight bias in favor of the company.
References
- Google paper about AI impact on environment https://arxiv.org/abs/2508.15734
- Google moonshot project, Suncatcher paper (pre-release) - https://arxiv.org/abs/2511.19468
- McKinsey & Company projection on data center investment by 2030 https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/the-cost-of-compute-a-7-trillion-dollar-race-to-scale-data-centers
- Data centers and water from University of Tulsa https://utulsa.edu/news/data-centers-draining-resources-in-water-stressed-communities/
- gemini.google.com
- Kessler Syndrome https://en.wikipedia.org/wiki/Kessler_syndrome
- New York Times coverage of a Meta data center https://www.nytimes.com/2025/07/14/technology/meta-data-center-water.html
- Meta data center in rural Georgia https://www.bbc.com/news/articles/cy8gy7lv448o
- Mexico drought and data centers https://www.bbc.com/news/articles/cx2ngz7ep1eo
- Goldman Sachs data center power consumption projection https://www.goldmansachs.com/insights/articles/ai-to-drive-165-increase-in-data-center-power-demand-by-2030
- Space optical communication https://en.wikipedia.org/wiki/Free-space_optical_communication
- International Space Station (ISS) electrical system https://en.wikipedia.org/wiki/Electrical_system_of_the_International_Space_Station#Solar_array_wing
- Space Debris https://www.space.com/16518-space-junk.html
- Google Cloud https://cloud.google.com/blog/products/infrastructure/measuring-the-environmental-impact-of-ai-inference/
- Impression Management https://en.wikipedia.org/wiki/Impression_management
- Google TPU https://cloud.google.com/tpu
- Sun-synchronous low earth orbit https://en.wikipedia.org/wiki/Sun-synchronous_orbit