Sunday, December 3, 2023

How DALL-E could power a creative revolution

Disclaimer: All the images in this story were generated by artificial intelligence.

Every few years, a technology appears that clearly divides the world into before and after. I remember the first time I saw a YouTube video embedded in a webpage; the first time I synced Evernote files between devices; the first time I scanned tweets from people nearby to see what they were saying about a concert I was attending.

I remember the first time I Shazam’d a song, called an Uber, and streamed myself live using Meerkat. What stands out in these moments, I think, is the feeling that some unpredictable set of new possibilities has been unlocked. What would the web be like when you could easily add videos to it? When you could summon any file to your phone from the cloud? When you could broadcast yourself out into the world?

It’s been a few years since I saw the kind of nascent technology that made me call my friends and say, you must see this. But this week I did, and I have a new one to add to the list. It’s an image-generating tool called DALL-E, and although I have very little idea of how it will eventually be used, it’s one of the most compelling new products I’ve seen since I started writing this newsletter.

Technically, the technology in question is DALL-E 2. It was created by OpenAI, a seven-year-old San Francisco company whose mission is to create safe and useful artificial intelligence. OpenAI is already known in its field for creating GPT-3, a powerful tool for generating complex text passages from simple prompts, and Copilot, a tool that helps automate writing code for software engineers.

DALL-E – a portmanteau of the surrealist Salvador Dalí and Pixar’s WALL-E – takes text prompts and generates images from them. In January 2021, the company presented the first version of the tool, which was limited to 256-by-256 pixel squares.

But the second version, which entered a private research beta in April, feels like a radical leap forward. The images are now 1,024 by 1,024 pixels and can incorporate new techniques such as “inpainting” – replacing one or more elements of an image with another. (Imagine taking a picture of an orange in a bowl and replacing it with an apple.) DALL-E has also improved its understanding of the relationships between objects, which helps it render increasingly fantastical scenes – a koala dunking a basketball, an astronaut riding a horse.

For weeks, threads of DALL-E-generated images took over my Twitter timeline. And after I mused about what I could do with the technology – namely, waste countless hours with it – a very nice person at OpenAI took pity on me and invited me to the private research beta. The number of people who have access is now in the low thousands, a spokesman told me today; the company hopes to add 1,000 people a week.

When you create an account, OpenAI makes you agree to DALL-E’s content policy, which is designed to prevent most of the obvious potential abuses of the platform. Hate, harassment, violence, sex and nudity are not allowed, and the company also asks you not to create images related to politics or politicians. (It seems worth noting here that among the co-founders of OpenAI is Elon Musk, who is famously mad at Twitter for a much less restrictive set of policies. He left OpenAI’s board in 2018.)

DALL-E also prevents a lot of possible image creation by adding keywords (“shooting”, for example) to a block list. You are also not allowed to use it to create images intended to deceive – no deepfakes are allowed. And while there’s no ban on trying to make images based on public figures, you can’t upload photos of people without their permission, and the technology seems to slightly blur most faces to make it clear that the images were manipulated.

Once you agree to this, you are presented with the delightfully simple DALL-E interface: a text box inviting you to create whatever you can think of, content policy permitting. Imagine using Google’s search box like Photoshop – that’s DALL-E. Borrowing some inspiration from the search engine, DALL-E includes a “surprise me” button, which pre-fills the text box with a suggested prompt, based on past successes. I’ve often used this to get ideas for trying out art styles that I might never have considered otherwise – “macro 35mm photography”, for example, or pixel art.

For each of my initial prompts, DALL-E would take about 15 seconds to generate 10 images. (Earlier this week, the number of images was reduced to six, to allow more people access.) Almost every time, I would find myself cursing loudly and laughing at how good the results were.

For example, this is the result of a “shiba inu dog dressed as a firefighter.”

And here’s one of “bulldog dressed as a wizard, digital art.”

I love these fake AI dogs so much. I want to adopt them and then write children’s books about them. If the metaverse ever exists, I want them to join me there.

Do you know who else can come? “A frog wearing a hat, digital art.”

Why is he literally perfect?

On our Sidechannel Discord server, I started taking requests. Someone asked to see “the metaverse at night, digital art.” What came back, I thought, was appropriately grand and abstract:

I will not attempt to explain here how DALL-E makes these images, in part because I am still working to understand it myself. (One of the core technologies involved, “diffusion”, is helpfully explained in this blog post from Google AI last year.) But it has struck me many times how creative this image-generating technology can seem.

Take, for example, two results shared in my Discord by another reader with DALL-E access. First, look at the set of results for “A bear economist in front of a stock chart crashing, digital art.”

And second, “A bull economist in front of a chart of a surging stock market, synthwave, digital art.”

The degree to which DALL-E captures emotion here is striking: the bear’s fear and excitement, and the bull’s aggression. It seems wrong to describe any of this as “creative” – what we’re looking at here is nothing more than probabilistic guessing – and yet the images have the same effect on me as looking at something truly creative.

Another compelling aspect of DALL-E is the way it will try to solve a single problem in a variety of ways. For example, when I asked it to show me a “delicious cinnamon bun with googly eyes”, it had to figure out how to depict the eyes.

Sometimes DALL-E added a pair of plastic-looking eyes to a roll, as I would have. Other times it created eyes from negative space in the frosting. And in one case it made the eyes out of miniature cinnamon buns.

That was one of the times I cursed out loud and started laughing.

DALL-E is the most advanced image generation tool I’ve seen so far, but it’s far from the only one. I also experimented lightly with a similar tool called Midjourney, which is also in beta; Google has announced another, called Imagen, but has not yet let outsiders try it. A third tool, DALL-E Mini, has generated a series of viral images in recent days; it has nothing to do with OpenAI or DALL-E, however, and I imagine the developer will be hit with a cease-and-desist letter soon.

OpenAI told me that it has not yet made any decisions on whether or how DALL-E will be made available more generally. The aim of the current research beta is to learn how people use this technology, adapting both the tool and the content policies as necessary.

And yet already, the number of use cases that artists have discovered for DALL-E is surprising. One artist uses DALL-E to create augmented reality filters for social apps. A chef in Miami is using it to get new ideas on how to cook his dishes. Ben Thompson wrote a prescient piece on how DALL-E could be used to create extremely cheap environments and objects in the metaverse.

It is only natural, and appropriate, to worry about what such automation might do to professional illustrators. Many jobs may be lost. And yet I can’t help but think that tools like DALL-E could be useful in their workflows. What if they asked DALL-E to sketch out some concepts for them before they started, for example? The tool allows you to create variations of any image; I used it to suggest alternative Platformer logos:

I will stick with the logo I have. But if I were an illustrator, I might appreciate the alternative suggestions, even if only for inspiration.

It’s also worth considering what creative potential these tools can open up for people who would never think to hire an illustrator, or could never afford one. As a child I wrote my own comics, but my illustration skills never progressed very far. What if I could teach DALL-E to draw all my superheroes instead?

On the one hand, this doesn’t seem like the kind of tool most people would use every day. And yet I imagine that in the coming months and years we will find more and more creative applications of technology like this: in e-commerce, in social apps, at home and at work. For artists, it seems like it could be one of the most powerful tools for remixing culture we’ve ever seen – assuming copyright issues are resolved. (It’s not entirely clear whether using AI to generate images of copyrighted works is considered fair use, I’m told. If you want to see DALL-E’s treatment of “Batman eating a sandwich,” DM me.)

I suspect we will also see some harmful applications of this tool. Although I am confident that OpenAI will enforce strong policies against the misuse of DALL-E, similar tools will certainly emerge and take a more permissive approach to content moderation. People are already creating malicious, often pornographic deepfakes to harass their exes using the crude tools available today; that technology will only get better.

It often happens that when a new technology comes along, we focus on its happier and more whimsical uses, only to overlook how it could be misused in the future. As much as I enjoy using DALL-E, I am also very concerned about what similar tools could do in the hands of less scrupulous companies.

It’s also worth thinking about what even positive uses of this technology could do at scale. When most of the images we come across online are created by AI, what does that do to our sense of reality? How will we know whether anything we see is real?

Today, DALL-E feels like a breakthrough in the history of consumer technology. The question is whether in a few years we will think of it as the beginning of a creative revolution, or of something more worrying. The future is already here, and it adds 1,000 users a week. The time to discuss its implications is now, before the rest of the world gets its hands on it.

