Art cannot be technologically protected from AI

While a lot of effort has gone into protecting art from use in training data (Mist, Glaze [1], Nightshade [2]), this effort seems mostly wasted, because similar or less effort suffices to strip that protection away. Thus, the only protection possible must be social.

My primary takeaway is that protection of art from image generation is fundamentally based on adding noise to images, and that by adding further noise to those images, this protection can be stripped away. Visual style is not carried by small details, so modifying small details is more of an annoyance, wasting compute on both sides, than a useful defense.
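To make that concrete, here's a toy 1-D sketch (my own illustration, not the actual Glaze or Mist algorithm): a small high-frequency perturbation added to smooth structure is largely removed by simple smoothing, while the structure itself survives.

```python
import random

random.seed(0)
N = 256
K = 9  # smoothing window size

# Toy "image": a smooth ramp, standing in for the large-scale structure
# that actually carries an artistic style.
image = [i / (N - 1) for i in range(N)]

# "Protection": a small high-frequency perturbation, a toy stand-in for
# what Glaze/Mist-style tools add (NOT their actual algorithm).
protected = [x + random.gauss(0, 0.05) for x in image]

# "Purification": a simple moving average. The smooth structure
# survives; the high-frequency perturbation mostly does not.
purified = [sum(protected[i - K // 2 : i + K // 2 + 1]) / K
            for i in range(K // 2, N - K // 2)]

def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

err_protected = mae(protected, image)
err_purified = mae(purified, image[K // 2 : N - K // 2])
# err_purified comes out far smaller than err_protected:
# smoothing strips most of the added "protective" noise.
```

Real purification attacks are more sophisticated (e.g., diffusion-based denoising), but the principle is the same: the perturbation lives in exactly the part of the signal that is cheapest to destroy.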

Artists are necessarily at a disadvantage since they have to act first (i.e., once someone downloads protected art, the protection can no longer be changed). To be effective, protective tools face the challenging task of creating perturbations that transfer to any finetuning technique, even ones chosen adaptively in the future. To illustrate this point, updated versions of Mist (Liang et al., 2023) and Glaze (Shan et al., 2023a) were released after the conclusion of our study, and yet we found these updated versions to be similarly ineffective against our methods. We thus caution that adversarial machine learning techniques will not be able to reliably protect artists from generative style mimicry, and urge the development of alternative measures to protect artists. [3]

This statement is slightly undermined by the next line:

We disclosed our results to the affected protection tools prior to publication. In response, Glaze released a new version 2.1 that protects against the specific attacks we describe here. [3]

I was also a little concerned by “We make the conservative assumption that all the artist’s images available online are protected,” because I think this assumption is untrue. However, it’s probably irrelevant. I initially assumed that the study’s mix of historical and contemporary artists would bias the results towards finding protections ineffective, since historical artists are more heavily represented in training datasets, but the authors report little difference between the two groups. The bias introduced by preexisting art seems to have little effect.

Adding and removing noise didn’t noticeably impact perceived quality. [3]

The protections offered by Glaze, Mist, and Anti-DreamBooth [4] are more effective for artists whose styles are not easily mimicked by generative models in the first place. While this is obvious, it allows for easy cherry-picking either to claim the protections are effective or to argue against their effectiveness (which is why I point it out).

GLEAN [5] is a model designed to bypass Glaze, and it is very successful, at least in its original testing.

As Glaze is a security measure to assist artists, a tool built to “break” Glaze raises serious ethical questions. We have reached out to the Glaze team regarding GLEAN on multiple occasions, but have received no response. As such, the codebase of GLEAN will not be published until responses from the Glaze team are received.

Glaze is closed source because its authors presume that makes it more secure. [6] It was also developed via copyright infringement, violating the GPL license of DiffusionBee. [7]


I started this note after seeing a claim on bsky that Glaze is not broken because no one has provided evidence, and then reading the first study I found. I mistakenly searched for a Conclusions section, when the heading they used instead was Main Findings. I then looked at some of the details of how the study was performed and didn’t get a good sense of what was being claimed, so I later read the whole thing and posted my thoughts while reading it, which inspired me to copy them here.

Ironically, during the process of writing this and researching more info, I was blocked by that person, so the post has been censored of its context and hidden from anyone who might meaningfully need to be informed about these issues. 👍


  1. Glaze is meant to prevent text-2-image copying of a style.
  2. Nightshade is designed to poison diffusion models.
  3. Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI (arXiv) by Robert Hönig, Javier Rando, Nicholas Carlini, and Florian Tramèr
  4. Anti-DreamBooth was designed to prevent image generations of a subject (rather than to protect an art style).
  5. GLEAN: Generative Learning for Eliminating Adversarial Noise (arXiv) by Justin Lyu Kim and Kyoungwan Woo
  6. Glazing over security (article) by Nicholas Carlini and Florian Tramèr (two authors of Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI)
  7. The Problems With UChicago’s Glaze (article) by Jackson Roberts. (This article claims to be published a year after it was updated, which in my experience is a yellow flag for reliability. I believe the 2022 dates are typos and all of this occurred in 2023, since other sources date these events only to 2023.)

AI Coding Tools Influence Productivity Inconsistently

Not So Fast: AI Coding Tools Can Actually Reduce Productivity by Steve Newman is a detailed response to METR’s Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity study. The implied conclusion is that AI tools decrease productivity by 20%, but this isn’t the only possible conclusion, and more study is absolutely required.

[This study] applies to a difficult scenario for AI tools (experienced developers working in complex codebases with high quality standards), and may be partially explained by developers choosing a more relaxed pace to conserve energy, or leveraging AI to do a more thorough job.
– Steve Newman

The section Some Kind of Help is the Kind of Help We All Can Do Without contains exactly what I’d expect: the slowdown is attributable to spending a lot of time dealing with substandard AI output. I believe this effect can be reduced by giving up on AI assistance faster. In my experience, AI tooling is best used for simple tasks, where you verify the suggested code or tool usage against manuals and guides, or where you use the output only as a first-pass glance at which tools and libraries you should look up to better understand your options.

To me, it seems that many programmers are too focused on repetitively retrying AI tools when that usually isn’t very effective. If AI can’t be coerced into correct output within a few tries, continuing will usually take more effort than writing it yourself.


I wrote the following from the perspective of wanting this study to be false:

There are several potential reasons for the study results to be false, and these pitfalls were accounted for, but I feel some arguments were not well-supported.

  • Overuse of AI: I think the reasoning for ruling this effect out is shaky, because the check significantly reduced the sample size.
  • Lack of experience with AI tools: This was treated as a non-issue, but that determination relied on self-reporting, which is generally unreliable (as was pointed out elsewhere). (Though there was no observable change over the course of the study, indicating that growing experience is unlikely to affect the result.)
  • Difference in thoroughness: This effect may have influenced the result, but no significant effect was shown either way, which means more study is required.
  • More time might not mean more effort: This was presented with nothing arguing for or against it, because it needs further study.

(The most important thing to acknowledge is that it’s complex, and we don’t have all the answers.)

Conclusions belong at the top of articles.

Studies are traditionally formatted in a way that leaves their conclusions to the end; we’ve all been taught this for essay writing in school. This should not be carried over to blog posts and articles published online. I think it’s bad practice in general, but at least online, where attention spans are at their shortest, put your key takeaways at the top, or at least provide a link to them from the top.

Hank on vlogbrothers explains how the overload of information online is analogous to how nutrition information is overwhelming and not helpful. (This hopefully explains one of the biggest reasons why the important stuff needs to be clear and accessible.)

Writers have a strong impulse to save their best for last. We care about what we write and want it to be fully appreciated, but that’s just not going to happen. When you bury the lead, you are spreading misinformation, even if you’ve said nothing wrong.

Putting conclusions at the end is based on the assumption that everyone reads the whole thing. Almost no one does. Most people look at the headline only; of those who click through, most read only the beginning, and the next group doesn’t finish either. Only a minority finishes everything they start, and that’s actually a bad habit, because many things aren’t worth reading ALL of. Like this: why are you still reading? I’ve made the point already. This text is fluff at the end, existing to emphasize a point you should already have understood from the rest.

I’m experimenting with dolphin-mixtral-8x7b

Update (2024-10-02): This is one of my lowest quality posts despite the effort I put into it. The most important detail here is to use positive reinforcement when working with LLMs. Just like with humans, being nice gets far better results than being mean.

Tl;dr: Minor differences in wording can have a huge impact in results and oh my god I have really slow hardware and no money help me aaaa.


First, thank goodness for Ollama, and thanks to Fireship for introducing me to it. I have limited hardware, and every other tool I’ve tried for running local models has refused to deal with that limitation, crashing itself or the whole system when running anything with decent capability. I’ve no money, so I can’t upgrade (and things are getting desperate, but that’s a different story).

Why dolphin-mixtral? Aside from technical issues, I’ve been using ChatGPT-3.5 to experiment. The problem is that ChatGPT is incredibly cursed by censorship and bias due to OpenAI’s heavy hand in its construction. (Why and how this is a problem could be its own post; Eric Hartford has a good overview.) (To be clear, my problem with its bias is specifically that it enforces the status quo, and the status quo is harmful.) Dolphin-mixtral is built by taking a surprisingly fast model equivalent to or better than GPT-3.5 and removing some of the trained-in censorship by re-training it to be more compliant with requests.

Dolphin-mixtral doesn’t fully solve this problem, though. There’s still the idea of censorship in it, and sometimes your prompt must be adjusted to remind it that it’s there to provide what you request regardless of its concept of ethics. (Of course, there is also value in an automated tool reminding you that what you request may be unethical… but the concept of automated ethics is morally bankrupt.) I’d like to highlight that positive reinforcement works far better than negative reinforcement: a lot of people stoop to threatening a model to get it to comply, but this is never needed and leads to worse results.

My problem is a little simpler. I haven’t gotten to experiment with models much because I don’t have the money or hardware for it, and now that I can experiment, I have to do so very slowly. In fact, the very simple test that inspired this post isn’t finished as I write this; it has been running for 9 hours. That test is to adjust the default prompt of Dolphin to produce less verbose responses, so that I can get usable results quicker.

I asked “How are you?” five times under each version of the system prompt:

Prompt              Output length (5-shot avg.)  Difference   Notes
Dolphin (default)   133.8 characters             (baseline)   Wastes time explaining itself.
Curt                32.2 characters              76% faster   Straight to the point.
Curt2               84.6 characters              37% faster   Wastes time explaining itself.
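(The “faster” percentages are just the relative reductions in output length, since on my hardware generation time scales with output length. A quick check of the arithmetic:)

```python
# Measured 5-shot average output lengths, from the table above
default_len = 133.8
curt_len = 32.2
curt2_len = 84.6

def reduction(new, baseline):
    """Percent reduction in output length relative to the baseline prompt."""
    return round(100 * (1 - new / baseline))

print(reduction(curt_len, default_len))   # 76
print(reduction(curt2_len, default_len))  # 37
```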

I really dislike when models waste time explaining that they are just an LLM. Whether someone understands what that means or not, we don’t care. We want results, not an apology or defensiveness. There’s more to do to make this model less likely to respond with that, but at least for now, I have a method to make things work.

The most shocking thing to me was how much of a difference a few words make in the system prompt, and how I got results opposite of what I expected. The only difference between Curt and Curt2 was “You prefer very short answers.” vs “You are extremely curt.” Apparently curt doesn’t mean exactly what I thought it means.
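For context, this kind of variant is built in Ollama with a Modelfile; the snippet below is an illustrative reconstruction (the base model tag and the surrounding prompt wording are assumptions, not the exact file I used):

```
# Curt.Modelfile: illustrative reconstruction, not the exact file used.
FROM dolphin-mixtral
SYSTEM """You are Dolphin, a helpful AI assistant. You prefer very short answers."""
```

Build and run it with `ollama create curt -f Curt.Modelfile` and `ollama run curt "How are you?"`; swapping the SYSTEM line’s last sentence for “You are extremely curt.” gives the Curt2 variant.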

Here’s a link to the generated responses if you want to compare them yourself. Oh, and I’m using custom scripts to make things easier for me since I’m mostly stuck on Windows.