Artificial intelligence (AI) image generators are making big advances, and they’ve learnt a thing or two about South Africa.
AI image generators are the latest development in the rapidly growing field of machine learning.
They are tools that transform a string of words, such as “food magazine photo of Durban curry on a plate on top of a tablecloth”, into images like these:
At a basic level, machine learning uses huge amounts of data to learn some kind of task.
This could be identifying faces for things like Instagram filters, or predicting which words come next in a sentence, the way Google suggests searches after you have typed only a couple of words.
Image generators are no different, and are trained using vast collections of pictures paired with a description of what’s happening in each of them. This allows them to predict what a given sentence may look like as an image.
Here are some of the generator’s predictions for what “a political rally in South Africa” could look like:
Not bad, except for the faces. As you can see, image generators often struggle to produce realistic faces, especially in groups of people. These tools are only good at creating things they have seen many examples of during training, so it is interesting to test how much they “know” about South Africa.
One of the things AI image generators are good at is making pictures in different artistic styles.
More generally, these tools often struggle with the fine details of complicated things like faces and hands. You’ll no doubt run into strange people with extra arms, or objects such as firearms that look fine from a distance but make no sense up close.
Text is another weak spot: as you might guess, the generator usually produces writing that is (mostly) nonsense. Here are its attempts at “a political cartoon about Jacob Zuma’s prison sentence ending”:
You usually get better results by being specific and telling the generator exactly what you want. But it can be very interesting to see what it comes up with when given a very abstract prompt.
Keep in mind that these machine learning models don’t “understand” what you ask them to do, and aren’t doing any kind of “thinking” that would be familiar to a human.
They just associate words and images in ways that they have seen before.
Here are some very different attempts at “Chief Justice Zondo shining a light on corruption”.
The AI model used to create these images is Stable Diffusion, which is open source and freely available.
The examples in this article are far from the best that can be achieved with better hardware, clever use of current tools like upscalers, and many attempts.
But they should give you an idea of what is currently possible, and a taste of what is still to come.
IOL Tech