I don’t think energy use is a serious problem; it seems to get thrown around mostly because it’s trendy. Does it even matter compared to gaming or crypto? It’s also an easily solved problem: just install more solar. Training the initial model isn’t time-critical or dependent on location, so there is a lot of flexibility here that you wouldn’t have in other applications. Meanwhile, running the already trained model is very cheap; it’s literally the most efficient way to solve the problem. Try to replicate what Stable Diffusion is doing with a 3D renderer and you’d need to burn a heck of a lot more cycles, as well as hire a truckload of artists, all of which would use substantially more energy.
Basically, people are going to use AI when it makes better use of time/money/energy than the competition. Nobody is going to use AI to burn energy just for the fun of it; it has to improve on what we already have.
As for the concentration of power and wealth, that can certainly happen to some degree, but I could also easily see it getting balanced out by the amount of freedom that local models give. Right now I can generate subtitles for video with Whisper, generate voices with tortoise-tts, generate images with Stable Diffusion, and play around with LLMs on my local machine using open-source-ish models. Nobody controls what I do and I am not paying for anything. There are obviously still things those models can’t do. Local LLMs aren’t up to GPT-4, but they are already quite close to ChatGPT for some tasks. Stable Diffusion isn’t quite as good as Midjourney for plain txt2img, but it is state-of-the-art in a lot of other aspects (custom training, ControlNet, LoRA, etc.). For a lot of tasks those models are already “good enough”, and they are constantly getting better. Meanwhile ChatGPT and Bing Chat are so heavily censored that they flat out just don’t work for a lot of tasks, even seemingly simple things like summarizing movies (too much violence). Nobody even talks about DALL-E 2 anymore, since it has been surpassed by everything else out there.
Now, centralization can still happen. Google is sitting on more data than anybody else, and if they make some multi-modal model that is trained on all of it, that could be a very potent offering. But for the time being at least, everything that has been released was outclassed by something else within a few months. Nothing in the AI space so far lasts very long, and the fact that AI models can use other AI models to improve themselves will hopefully keep that going for a while. With the censorship going on, I also have a hard time seeing local models disappear anytime soon, as so far none of the commercial offerings have had the balls to just build a model that knows everything.
World population and our standard of living have improved drastically over those years too, we aren’t burning that additional energy for nothing.