Pixel Power Unveiled 🌟🖼️

How do AI image generators work?

February 01, 2024

Welcome to Answers on AI: making you smarter about artificial intelligence, one email at a time.

First, a quick favor: this free newsletter is intended to help educate the masses so they can understand what’s happening in AI. It’s only able to grow thanks to readers like you, who share it with others. If you’d be willing to forward it to a friend, we’d greatly appreciate it. And we’re sure they’d appreciate it too!


And now, on to the question…

How do AI image generators work?

Imagine having the power to create stunning visuals at the click of a button, from breathtaking landscapes to portraits of people who don't exist. AI image generators like Midjourney and OpenAI’s DALL-E put this power at your fingertips, transforming text descriptions into vivid images. Behind this magic lies a complex web of algorithms and data. Let's pull back the curtain to see how AI breathes life into pixels.

🤖 AI Boot Camp: AI image generators are a type of generative model, a class of algorithms trained to create new content. They learn by analyzing vast numbers of images along with data about them, such as text captions. By recognizing patterns, styles, and features in this massive dataset, the AI hones its ability to generate new images that can be hard to tell apart from those created by human hands.
🎨 Learning to Paint by Numbers: At the heart of most AI image generators is a neural network, which you can think of as a digital artist's brain. This network undergoes a process called "training," in which it's fed countless example images, often paired with text descriptions. Unlike a human, it never needs rest; it improves by repeatedly adjusting its internal parameters, which can number in the millions or even billions. These parameters define the AI's "artistic style" and determine how it applies what it has learned to new creations.
⚖️ The Balancing Act of Creativity: One influential approach pits two neural networks against each other: a generator and a discriminator. Here's where it gets interesting: the generator creates images, while the discriminator judges them against real ones. This adversarial relationship pushes the generator to produce increasingly convincing results, which is why the approach is called a Generative Adversarial Network (GAN). It's like an endless art competition inside the computer, where the goal is to fool the harshest critic: the discriminator. Many of today's best-known generators, including DALL-E 2 and 3, Google's Imagen, and Stable Diffusion, use a different technique called diffusion, which starts from random noise and removes it step by step until an image emerges.
🧩 A Picture is Worth a Thousand Words: When you give an AI a text prompt, it converts the text into a numerical representation of its meaning and uses that to steer image generation toward the patterns it has learned. If you typed "a peaceful garden at sunset," the AI predicts which elements make up a garden, what colors a sunset has, and how they typically combine into an image that evokes peace.
🚧 Challenge Accepted: Despite their impressive capabilities, AI image generators still face challenges. Vague or complex prompts can trip them up, producing bizarre or nonsensical images. They also raise cultural and ethical concerns: AI sometimes replicates and amplifies biases present in its training data, creating a minefield of ethical (and potentially legal) quandaries.
🕵️ A Glance in the Mirror: The AI’s impressions are only as good as the data it's seen. For an AI to create diverse and unbiased images, it must be trained on a dataset that represents a wide range of subjects fairly. Developers and researchers are constantly working to minimize bias and improve the variety and quality of images that AI models are trained on.
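To make the idea of "training" from the Learning to Paint by Numbers item concrete, here is a minimal Python sketch of gradient descent, the standard way neural networks adjust their parameters. It is a deliberately tiny stand-in: a hypothetical two-parameter line (`w` and `b`) fitted to made-up points, not an image model. Still, the loop of predicting, measuring the error, and nudging each parameter to reduce it is the same idea that real generators apply at vastly larger scale.

```python
# Toy illustration of "training": nudging parameters to reduce error.
# This is a two-parameter linear model, NOT an image generator; real models
# apply the same idea (gradient descent) to millions or billions of parameters.

def train(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0  # the model's "internal parameters", starting from scratch
    n = len(xs)
    for _ in range(steps):
        # Predictions with the current parameters
        preds = [w * x + b for x in xs]
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Move each parameter a small step in the direction that lowers the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training examples generated from the line y = 3x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]
w, b = train(xs, ys)
print(round(w, 2), round(b, 2))
```

Run as written, it should recover values very close to w = 3 and b = 1, the line the example points were generated from.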
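The generator-versus-discriminator contest from The Balancing Act of Creativity can be written down as two opposing scores. The Python sketch below uses the standard binary cross-entropy form of the GAN losses for a single real and a single generated image; `d_real` and `d_fake` are assumed to be the discriminator's estimated probabilities that each image is real. In an actual GAN, both networks would be trained by gradient descent on these losses, each trying to lower its own.

```python
import math

# Toy GAN losses for one real and one generated image (binary cross-entropy form).
# d_real: discriminator's estimated probability that the real image is real.
# d_fake: discriminator's estimated probability that the generated image is real.
# Both are assumed to lie strictly between 0 and 1.

def discriminator_loss(d_real: float, d_fake: float) -> float:
    # The discriminator is rewarded for d_real near 1 and d_fake near 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake: float) -> float:
    # Common "non-saturating" generator loss: low when the discriminator
    # is fooled into rating the generated image as real (d_fake near 1).
    return -math.log(d_fake)

# The critic easily spots the fake: low discriminator loss, high generator loss
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))
# The generator fools the critic: the losses flip
print(discriminator_loss(0.9, 0.8), generator_loss(0.8))
```

The generator loss shown is the widely used non-saturating variant rather than the original minimax form, chosen because it gives the generator a stronger learning signal early in training, when the discriminator wins easily.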

As AI image generators continue to evolve, they will not only influence the realms of art and design but may also reshape our understanding of creativity itself. They’re becoming more adept at mimicking various artistic styles and responding to ever more nuanced prompts. Reflect on this: As these AI artists become more skilled and autonomous, how will the line between human and machine creativity blur, and what new forms of art might emerge from their digital easels?

In the real world…

A federal judge in Washington, D.C. ruled that, under U.S. law, artwork generated solely by artificial intelligence without human involvement cannot be copyrighted.
A report by the New York Times recently revealed some of the ethical and legal risks of image generation models, showing how tools like OpenAI’s DALL-E and Midjourney could be used to quickly generate images of copyrighted material, such as DC’s Joker and Nintendo’s Super Mario.
Google’s latest AI image generator, Imagen 2, supports inpainting and outpainting, which let users edit regions within a generated image or extend it beyond its original borders. OpenAI’s DALL-E 2 offers similar features, but Imagen 2 distinguishes itself with the ability to create images from a reference image.
AI startup Midjourney, which offers a powerful image generation model and product through the chat platform Discord, was recently reported to be generating $200M in annual revenue, all without raising outside capital and with only about 40 employees.
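Inpainting and outpainting, mentioned in the Imagen 2 item, are commonly framed around a mask that marks which pixels the model is allowed to change. The toy Python sketch below shows only that bookkeeping, under stated assumptions: images are tiny 2D lists of grayscale values, and `fake_model` is a hypothetical placeholder for the actual generator, which in a real system would produce content consistent with the surrounding pixels and the prompt.

```python
# Toy sketch of the mask bookkeeping behind inpainting and outpainting.
# Images are small 2D lists of grayscale values; fake_model is a HYPOTHETICAL
# stand-in for a real image generator.

def inpaint(image, mask, fill_fn):
    # Keep original pixels where mask is 0; use generated pixels where mask is 1.
    generated = fill_fn(len(image), len(image[0]))
    return [
        [g if m else p for p, m, g in zip(row, mrow, grow)]
        for row, mrow, grow in zip(image, mask, generated)
    ]

def outpaint_canvas(image, pad):
    # Outpainting: enlarge the canvas by `pad` pixels on every side and mark
    # the new border as the region to generate (mask = 1).
    h, w = len(image), len(image[0])
    canvas = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    mask = [[1] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            canvas[i + pad][j + pad] = image[i][j]  # copy the original inside
            mask[i + pad][j + pad] = 0              # and protect it from changes
    return canvas, mask

def fake_model(h, w):
    # Hypothetical "generator": fills every pixel with the value 9
    return [[9] * w for _ in range(h)]

image = [[1, 2], [3, 4]]
edited = inpaint(image, [[0, 1], [0, 0]], fake_model)  # change only the top-right pixel
canvas, mask = outpaint_canvas(image, pad=1)
expanded = inpaint(canvas, mask, fake_model)           # fill the new border
```

Viewed this way, outpainting is inpainting on an enlarged canvas whose unknown region happens to be the border.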

What do the experts say?

"Regardless of whether it’s AI models like ChatGPT, Stability AI, or Midjourney, the main issue is who owns material generated by AI, such as an image on Midjourney? There is a growing belief that issues regarding ownership and authenticity could be resolved if some of the outputs of the material are tied back to the blockchain or programmed into tokens. By tokenizing AI-generated artwork or text, security and trust can be heightened, improving traceability, authentication, and overall efficiency in various applications. This could theoretically resolve any copyright concerns promptly and easily."

— T_HQ, from How could blockchain solve the AI copyright problem?

"Artists are not compensated or asked for permission when AI companies take their art and use it to train AI models. These models come into direct competition with artists, who often work independently on a commission basis. AI-generated art threatens artists’ livelihood as they try to find work.

These AI models belong to companies. They may be in the research stage for now and some may be free to use for the public, but this will not last. Many programs, like ChatGPT, already have paid versions. The Midjourney art-generating AI model is no longer free for use. When these tech companies profit from their AI models, they are profiting from the artworks made by the artists whose works they used to train their models. And artists have not consented to take part in this process of creation."

— Julian Horsey, from Op-Ed: AI art is art theft and should be a crime by Zachary Olson for The Eagle

"Nobody really knows how things will shake out around US copyright law and artificial intelligence, but the court cases have been piling up. Sarah Silverman and two other authors filed suit against OpenAI and Meta earlier this year over their models’ data scraping practices, for instance, while another lawsuit by programmer and lawyer Matthew Butterick alleges that data scraping by Microsoft, GitHub, and OpenAI amounted to software piracy."

— Wes Davis, from AI-generated art cannot be copyrighted, rules a US federal judge for The Verge

Stay Tuned for More

In each issue, we bring you the most interesting and thought-provoking questions of the day to help make you smarter about AI. Stay tuned for more questions, and more importantly, more answers.
