Two collaborative videos by @ClimateAdam and @AnkurShah on the relationship of Artificial Intelligence and Climate education, adaptation, and action.
Over the last ten years, AI, specifically deep learning, has yielded remarkable results. When Siri understands what you say, when Facebook identifies your cousin, when Google Maps reroutes you, chances are that a deep learning system is involved.
What is less noticed is that these models are churning away at a staggering cost, not just in dollars and cents but also in energy consumed. On its current trajectory, AI will only accelerate the climate crisis. In contrast, our brains are incredibly efficient, consuming roughly 20 watts of power. If we can apply neuroscience-based techniques to AI, there is enormous potential to decrease the amount of energy used for computation and thus cut greenhouse gas emissions. This blog post explains what causes this outsized energy consumption and how brain-based techniques can address AI’s high energy cost.
Why does AI consume so much energy?
First, it is worth understanding in simple terms how a deep learning model works. Deep learning models are not intelligent the way your brain is intelligent. They don’t learn information in a structured way. Unlike you, they don’t understand cause-and-effect, context, or analogies. Deep learning models are “brute force” statistical techniques. For example, if you want to train a deep learning model to identify a photo of a cat, you show it thousands of images of cats that have been labeled by humans. The model does not understand that a cat is more likely than a dog to climb a tree or play with a feather, so unless it is trained on images of cats that include trees and feathers, it is unaware that the presence of these objects would help identify a cat. To make these inferences, it needs to be trained in a brute force way on examples that cover these combinations.
The enormous energy requirement of these brute force statistical models is due to the following attributes:
- Requires millions or billions of training examples. In the cat example, pictures are needed from the front, back, and side. Pictures are needed of different breeds. Pictures are needed with different colors and shadings, and in different poses. There are an infinite number of possible cats. To succeed at identifying a novel cat, the model must be trained on many versions of cats.
- Requires many training cycles. The process of training the model involves learning from errors. If the model has incorrectly labeled a cat as a raccoon, it readjusts its parameters so that the image is classified as a cat, then retrains. It learns slowly from its mistakes, which requires more and more training passes.
- Requires retraining when presented with new information. If the model is now required to identify cartoon cats, which it has never seen before, blue cartoon cats and red cartoon cats must be added to the training set and the model retrained from scratch. The model cannot learn incrementally.
- Requires many weights and lots of multiplication. A typical neural network has many connections, or weights, that are represented by matrices. To compute an output, the network performs matrix multiplications through successive layers until a pattern emerges at the final layer. In fact, it often takes millions of multiply-and-add steps to compute the output of a single layer! A typical network might contain dozens to hundreds of layers, making the computations incredibly energy intensive.
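To get a feel for the arithmetic behind the last bullet, here is a rough back-of-the-envelope sketch in Python. It counts multiply-accumulate operations (one multiplication plus one addition) for a stack of fully connected layers. Everything specific in it is an assumption made for illustration and does not come from any particular model: the layer sizes, the number of training examples, the number of passes, and the rule of thumb that a training step costs about three times a forward pass.

```python
from typing import Sequence


def dense_layer_macs(n_inputs: int, n_outputs: int) -> int:
    """Multiply-accumulates for one fully connected layer on one input.

    Each of the n_outputs units multiplies every one of the n_inputs
    values by a weight and sums the results.
    """
    return n_inputs * n_outputs


def network_macs(layer_sizes: Sequence[int]) -> int:
    """Total multiply-accumulates for one forward pass through the stack."""
    return sum(
        dense_layer_macs(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical network (assumed sizes): a 2,000-value input, 50 hidden
# layers of 2,000 units each, and a 1,000-class output layer.
layer_sizes = [2000] * 51 + [1000]

per_layer = dense_layer_macs(2000, 2000)
per_example = network_macs(layer_sizes)

# Hypothetical training run (assumed numbers): 1 million labeled images,
# 10 passes over the data. The factor of 3 is a rough rule of thumb for
# forward plus backward cost, used here only as an assumption.
n_examples = 1_000_000
n_passes = 10
training_cost_factor = 3
training_macs = per_example * n_examples * n_passes * training_cost_factor

print(f"One hidden layer, one image: {per_layer:,} multiply-accumulates")
print(f"Whole network, one image:    {per_example:,} multiply-accumulates")
print(f"Whole training run:          {training_macs:,} multiply-accumulates")
```

Even in this small, made-up network, a single hidden layer needs millions of multiply-and-add steps per image, and the total grows multiplicatively with the number of layers, training examples, and training passes. Real image models use convolutional or attention layers, so their exact counts differ, but the scaling is similar.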


