How can artists defend themselves against the whims of tech companies that want to use their work to train AI? A group of researchers has a novel idea: injecting a subtle poison into art itself to kill the AI art generator from within.
Ben Zhao, a professor of computer science at the University of Chicago and an outspoken critic of AI's data-scraping practices, told MIT Technology Review that the new tool from him and his team, called "Nightshade," does what it promises: it poisons any model that trains on the images it touches. Until now, the only ways artists could fight back against AI companies were to sue them or to hope developers would abide by artists' opt-out requests.
The tool manipulates an image at the pixel level, distorting it in ways that are undetectable to the naked eye. Once enough of these distorted images are used to train an AI like Stability AI's Stable Diffusion XL, the entire model begins to break down. After the team fed poisoned data samples to a version of SDXL, the model began interpreting a prompt for "car" as "cow" instead. A dog was rendered as a cat, while a hat was transformed into a cake. Likewise, various styles came out completely wrong: prompts for "cartoon" art produced images reminiscent of the 19th-century Impressionists.
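The core premise is that a perturbation can stay below the threshold of human perception while still changing what a model learns. As a rough illustration only (Nightshade's actual perturbations are optimized against a target model, not random), here is a minimal sketch of bounding a pixel-level change so it remains invisible:

```python
# Illustrative sketch, not Nightshade's algorithm: a real poisoning attack
# optimizes the noise against a model; random noise here only demonstrates
# the "imperceptibly small per-pixel budget" idea.
import numpy as np

def perturb_image(pixels: np.ndarray, epsilon: float = 2.0,
                  seed: int = 0) -> np.ndarray:
    """Add a bounded, visually imperceptible perturbation to an RGB image.

    pixels:  uint8 array of shape (H, W, 3)
    epsilon: maximum per-channel change, out of 255
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-epsilon, epsilon, size=pixels.shape)
    poisoned = np.clip(pixels.astype(np.float64) + noise, 0, 255)
    return poisoned.astype(np.uint8)

image = np.full((64, 64, 3), 128, dtype=np.uint8)
poisoned = perturb_image(image)
# No channel moved by more than epsilon -- far below what the eye notices.
max_diff = np.abs(poisoned.astype(int) - image.astype(int)).max()
print(max_diff)
```

A change of 2 out of 255 per channel is invisible on screen, which is exactly why poisoned images can circulate undetected in scraped training sets.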
Poisoning the style of an individual artist also worked. When asked to create a painting in the style of famous science fiction and fantasy artist Michael Whelan, the poisoned model produced something far less similar to his work.
Depending on the size of the AI model, you would need hundreds or probably even thousands of poisoned images to create these strange hallucinations. Still, it could force anyone developing new AI art generators to think twice before using training data from the Internet.
Gizmodo reached out to Stability AI for comment, but we did not immediately receive a response.
What tools do artists have to combat AI training?
Zhao also led the team behind Glaze, a tool that creates a kind of "style cloak" to mask artists' images. It likewise perturbs an image's pixels, leading astray AI art generators that attempt to mimic an artist and their work. Zhao told MIT Technology Review that Nightshade will be integrated into Glaze as an additional tool, but will also be released as open source so other developers can build similar tools.
Other researchers have found ways to immunize images against direct manipulation by AI, but those techniques did not stop the data scraping used to train the art generators in the first place. Nightshade is one of the few, and perhaps the most combative, attempts to give artists a way to protect their work.
There are also growing efforts to distinguish real images from those created by AI. Google-owned DeepMind claims to have developed a watermark, SynthID, that can detect whether an image was created by AI regardless of how it has since been manipulated. These kinds of watermarks essentially do the same thing as Nightshade: they manipulate pixels in ways that are imperceptible to the naked eye. Some of the largest AI companies have promised to watermark generated content in the future, but current efforts like Adobe's AI metadata labels don't offer a real level of transparency.
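DeepMind has not published its watermark's internals, and its method is learned and far more robust than anything this simple. But the textbook way to hide a detectable mark in pixels without visible change is least-significant-bit embedding, sketched here purely for illustration:

```python
# Toy LSB watermark: each pixel value changes by at most 1 out of 255,
# invisible to the eye but trivially recoverable by anyone who knows the
# scheme. (Production watermarks like DeepMind's are far more robust.)
import numpy as np

def embed_lsb(pixels: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide a bit pattern in the least significant bit of each pixel."""
    flat = pixels.flatten().copy()
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(pixels.shape)

def extract_lsb(pixels: np.ndarray, n: int) -> np.ndarray:
    """Read the first n hidden bits back out of the image."""
    return pixels.flatten()[:n] & 1

image = np.full((8, 8), 200, dtype=np.uint8)
mark = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stamped = embed_lsb(image, mark)
print(extract_lsb(stamped, 8))  # recovers [1 0 1 1 0 0 1 0]
```

The fragility of schemes like this, which break under resizing or re-encoding, is one reason detection watermarks remain an open problem.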
Nightshade is potentially devastating for companies that actively use artists' work to train their AI, like DeviantArt. The DeviantArt community already had a fairly negative reaction to the site's built-in AI art generator, and if enough users poison their images, developers could be forced to manually hunt down every instance of poisoned imagery or retrain the entire model from scratch.
However, the program cannot affect existing models such as SDXL or the recently released DALL-E 3, which have already been trained on artists' previous work. Companies like Stability AI, Midjourney, and DeviantArt have already been sued by artists for using their copyrighted work to train AI, and many other lawsuits target developers such as Google, Meta, and OpenAI for using copyrighted works without permission. Companies and AI advocates have argued that all the books, papers, images, and artwork in the training data fall under fair use because generative AI creates new content based on that data.
OpenAI developers noted in their research paper that their latest art generator can produce far more realistic images because it is trained on detailed captions generated using the company’s proprietary custom tools. The company did not disclose how much data actually went into training its new AI model (most AI companies are now unwilling to comment on their AI training data), but efforts to combat AI could escalate over time. As these AI tools become more sophisticated, they require more and more data to power them, and artists may be willing to take even greater measures to combat them.