a nitpick of Acerola’s “Generative AI is not what you think it is”

I was asked why I didn’t think this video is very good, and a lot of my thoughts on it are nitpicks, but since I went ahead and made a list, I figured I might as well share it here. This video is better than the majority on this topic, but my standards are even higher.

  • Kind of a nitpick, kind of not? The distinction between theft and piracy is important, because companies have historically tried to conflate the two in order to levy heavier penalties on pirates by making their losses appear greater than they are. I really don’t like that everyone is calling mass piracy mass theft, because piracy shouldn’t be treated as theft.
  • Nitpick: I wish they’d given more attention to the modern slavery behind datasets because the emotional torture of people in third-world countries is the most obviously unethical and directly damaging thing these companies do. It’s a good emotional hook and can get people to pay more attention to the bad being done.
  • Nitpick: Models being confidently incorrect is a problem that is being improved, but it’s portrayed as a fundamental issue.
  • The whole bit about machine learning not being accessible is just wrong. People have been making models on laptops for at least a decade. These are usually more demonstrations of concepts and toys, but can be useful. Just because a person can’t scale to the internet as a dataset doesn’t mean they can’t effectively use machine learning for tasks.
  • They’re completely wrong about models being at a dead-end because all internet data has been scraped. Synthetic data generation doesn’t cause model collapse and does improve models. The example given – piss-tint on images – is also completely misattributed to data incest when it’s actually due to yellowing in old photographs.
  • Electricity and water use are consistently overstated. The projected water use in 2027 for global AI usage is 0.17% of 2014’s global water use. Agriculture’s water waste is a much bigger problem – and I do mean waste, not usage. Agriculture necessarily uses the majority of water, because growing anything takes a lot of water, but the amount it uses is far in excess of what it has to. Electricity usage I’m far less knowledgeable about, but I’m looking into it. I know a big problem with electricity is how costs are distributed, and the way computers draw power causes a unique strain on the grid.
  • Nitpick: I don’t like that they consistently say models aren’t improving, because they are. Benchmarks are kind of bullshit in my opinion, but I’ve seen increases in quality despite a lot of mistakes and steps backwards.
  • Nitpick: I think they’re wrong about how long these companies can burn money. In order to remain competitive with how speculative all of this is, they are spending cash exponentially faster, so the insane cash reserves they’ve built won’t last that long. There’s also the fact that we’re in a recession, and only AI is making the economy appear healthy. I don’t think an economy can actually survive long-term when all of it is failing except one small sector that has massively inflated value.
  • Calling for anyone who worked on machine learning to be tried for crimes against humanity is just fucked up. There’s a massive difference between what the few big companies are doing and all of the research that’s been done. Blaming everyone for the actions of a greedy few is wrong.
  • They didn’t mention a single model trained only on data with explicit permission. Feels misleading to ignore that there are exceptions to the general rule of mass piracy.
  • They didn’t mention anything about how machine learning has been used to discover new drugs or better understand biology. Despite many failures and the extreme infection of greed in the medical industry, that sector is where there is the most benefit from machine learning.

I guess that last point is kind of a nitpick as well, because it’s so far outside of the point of the video being about image and text generation, but when they started by defining how “AI” just means machine learning right now, it feels wrong to exclude the positive aspects entirely. If we demonize ALL machine learning, then we lose the benefits where they do exist.

I should also say that I know far more about failures of AI in medicine than successes, and I want to have more solid examples to point to when saying that AI has been useful in medicine. I recently made a video about that, and posted about it here.

cringe is necessary for growth

Unfortunately, I cannot find the original source for this. I modified this version to add a representation of my own attempts at being cool ending up as cringe.

A common lie about the development of anything valued is that the people making it were cool the whole time, that they knew it would be perfect and loved. This leads to pushing people away from creation because they think they can’t do bad on the road to doing good.

Bad art is essential for good art, and this applies more broadly to just about anything. You don’t make progress on success, you make progress on iterative failure, repeatedly getting just a little closer every time, until you find success.

This is a video I made about a large collaborative piece of bad art that was deleted for profit, rather than serving as a place to continue to grow and learn from. More info.

Online privacy protection only works when we all participate

In a group chat I’m in, the following was said (details modified/removed to protect anonymity):

I haven’t searched for topic in Google. I searched on a privacy-protecting search engine. I talked about it with a coworker on an internal chat tool. And now, on my personal device, YouTube is showing me a video explaining topic.

Android devices in particular do listen to you1 and send data to various companies. While Google claims to only listen when directed and with permission, they are often caught listening without explicit permission. They have a strong incentive to collect as much data as they can, but I don’t think this is the cause.

Shadow profiles are like accounts, but created without permission for tracking purposes. They do not always uniquely identify a person, but they usually do. I am confident YouTube builds these and tracks connections between users (signed in or not), and tests their presumptions about identity by showing videos recently watched by someone related to you. It confirms these relationships by your interactions.2

That may also not be the cause, because ultimately, YouTube’s algorithms are a pattern-matching machine sifting through a hoard of data. Relatedness can be found in unknowable ways. It’s spooky to us because we cannot imagine how these connections are made, but they are nonetheless real – or made real by the machine.

They are definitely doing shady tracking, because the suggestions are too precise to be accounted for solely by spurious connections.

Have you ever looked into browser fingerprinting?

It’s shockingly easy to identify users3 from standard data available to anyone. You as an individual can’t fight it because when you genericize your data using privacy protection features, you are put in a group of similar users so small that the remaining traces (like loading times) become enough to uniquely identify you anyhow.
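To give a rough sense of why fingerprinting works: each attribute only narrows you down a little, but the bits of identifying information add up quickly. This is a minimal sketch of that arithmetic – the attribute cardinalities below are illustrative guesses, not measured data, and assuming evenly distributed values is a simplification:

```python
import math

# Hypothetical counts of distinct values for common browser attributes.
# These numbers are illustrative, not real survey data.
attributes = {
    "user_agent": 5000,
    "screen_resolution": 200,
    "timezone": 40,
    "installed_fonts": 10000,
    "language": 100,
}

def identifying_bits(cardinality: int) -> float:
    """Bits of identifying information an attribute can contribute,
    assuming its values are roughly evenly distributed (a simplification)."""
    return math.log2(cardinality)

total_bits = sum(identifying_bits(c) for c in attributes.values())

# With roughly 5 billion internet users, about 33 bits are enough
# to single out one person.
population_bits = math.log2(5_000_000_000)

print(f"combined fingerprint: {total_bits:.1f} bits")
print(f"needed to identify one user: {population_bits:.1f} bits")
print("enough to be unique?", total_bits > population_bits)
```

Even with only five attributes, the hypothetical total comfortably exceeds what’s needed to single out one person among billions – which is also why genericizing a few attributes doesn’t save you: the remaining bits still add up.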

As an individual4, use privacy protecting features whenever you can, because they only work when we all use them, but know that we must also fight back as a culture. We need systematic change to regain privacy, and that only happens with laws and social movements.

Lobbying is evil, but necessary in the world we live in. There are many organizations that call themselves privacy advocates, but most of these are actually fronts for business interests. Startpage’s Privacy Organizations You Should Follow is a list of organizations actually interested in preserving privacy instead of controlling access to privacy.

you keep saying “privacy protection features” like I know what that is

The easiest first step is to use a browser that protects you by default, like Vivaldi or LibreWolf5. Conversely, Chrome is the worst browser to use – it’s the most popular because of a concerted data collection effort by Google. Brave has a number of issues6, but is likewise strongly marketed as privacy-focused. All ~~warfare~~ marketing is based on deception.

Another easy step is to install a VPN. Use Private Internet Access, as they are the only VPN to consistently be proven by legal actions to not collect user data. (They’re also the cheapest!) Despite popular VPNs claiming to offer full privacy just by being installed, VPNs only hide one small part of how you are tracked online. They are a good tool, but do not offer that much protection, and they do slow your connection somewhat.

If you want to go all-in, I’ve stumbled across A Comprehensive Guide To Protecting Your Digital Privacy by Thessy Emmanuel. Even at a glance, I can tell it’s a pretty good resource, and it even covers things you may not expect, like how cities track you.

footnotes

Speak Up When You Are Suspicious

Updated 2025-07-11: I accidentally referenced 2019 when I meant 2020 when referring to COVID-19, this has been corrected.

Recently I saw a video (When Your Hero Is A Monster) talking about the general response people have any time a celebrity is revealed to have been committing sex crimes. A common response is to claim you always knew something was up, as a way to process your grief at having been misled into believing they were a good person. The video suggests that this impulse is harmful because it signals to others that they aren’t “good enough” because they didn’t see it coming. But this is usually post-hoc rationalization, not a belief that was held before the reveal.

When Your Hero Is A Monster isn’t really about Neil Gaiman, it’s sorta about how we are misled into believing celebrity is good, and have an unhealthy relationship with finding out the truth.

It made me think about how Honey blew up recently (How Honey Scammed Everyone on YouTube). I never installed it because it seemed suspicious1, but I never called it out, so me saying so now is exactly the same knee-jerk response. Whether or not I truly felt something was wrong, it doesn’t actually help, because it’s too late to have warned anyone. It made me realize that I should be more forthright in saying when I think something bad is going on. At the very least, I can point to proof and say “yes, I did actually suspect” and know that I’m not making false memories. But talking about misgivings is also helpful in itself, because that’s how you work out whether or not your concerns are justified, and maybe even help others.

Everyone credits MegaLag for exposing this, and while they definitely made the video that got everyone talking about it, it’s a long video, and Mental Outlaw‘s video not only explains it much more clearly and quickly, but also manages to cover similar suspicions and problems with VPN companies, how Linus Media Group unintentionally helped Honey stay incognito, and even mentions a sort of successor to Honey to be on the lookout for. I think this is the best summary of recent events.

This also made me think about COVID. In March or April 2020, I correctly predicted exactly (within a few months) how long it would take for vaccines to arrive, and how people would pretend it stopped being a problem despite becoming endemic. But I didn’t say anything publicly. I told close friends and family what to do to be safe, and what to expect. I made my dad take precautions and took over riskier interactions to help keep him safe. I should’ve told more people. It’s my only regret from all of 2020. I could’ve helped more people, but I didn’t.

When you are unsure of something, or you feel that something is wrong, talk about it. Markiplier called out Honey’s suspicious activity years ago. Through dialogue, you learn whether or not your fears are misplaced, you help others remember to stay vigilant, or you even help others recognize something is wrong long before it becomes common knowledge. This is a mistake I keep making, but I’m trying to improve. When I see something important to discuss, I should call it out. It’s not about being correct, it’s about communication.

https://www.youtube.com/watch?v=bKbfnFL-s6M
Markiplier Predicts Honey Scam In 2020 (there’s also a response he made being very excited about how right he was, and a very amusing animatic of part of this rant)

Update 2025-07-28: It seems that video was taken down at some point. Thankfully it’s been archived using PreserveTube:


Footnotes

Linus Media Group pulled their Honey sponsorships over suspicions a long time ago, but didn’t talk much about it. One could easily argue they are partially to blame for not speaking up, but it’s also easy to argue that it was a private business decision, and they didn’t know how important it would be to say something. (Hell, they could’ve even been under contract requiring them to keep the secret2. We would never know.) They did post a response to the Honey situation. There’s also a class-action lawsuit underway, spearheaded by LegalEagle.

  1. Ironically, I was suspicious of it primarily because of privacy violations (tracking any shopping you do, but possibly also just everywhere) and because I assumed it worked through backroom deals with sellers to give out discounts in exchange for customer information – allowing a company to keep its image clean because it wasn’t the one who stole your private information, it just bought that information. As we now know, that’s not at all what was happening.
  2. Being under a secretive contract is always bad. You don’t get to know what secrets you’re required to keep without signing the contract. Because of this, it’s hard to blame someone for being required to keep a secret. Obviously, there are many secrets that are highly unethical, but it’s understandable to value your life more than revealing such secrets.

Is it still footnotes if you’re just posting semi-related thoughts?

As always, I endeavor to make sure my blog posts are archived.

Updated 2025-01-16: Removed confusing phrasing in the opener and moved a paragraph into the footnotes because it was out-of-place.