a nitpick of Acerola’s “Generative AI is not what you think it is”

I was asked why I didn’t think this video is very good, and a lot of my thoughts on it are nitpicks, but since I went ahead and made a list, I figured I might as well share it here. This video is better than the majority on this topic, but my standards are even higher.

  • Kind of a nitpick, kind of not? The distinction between theft and piracy is important, because companies have historically tried to conflate the two in order to levy heavier penalties on pirates by making their losses appear greater than they are. I really don’t like that everyone is calling mass piracy mass theft, because piracy really shouldn’t be treated as theft.
  • Nitpick: I wish they’d given more attention to the modern slavery behind datasets because the emotional torture of people in third-world countries is the most obviously unethical and directly damaging thing these companies do. It’s a good emotional hook and can get people to pay more attention to the bad being done.
  • Nitpick: Models being confidently incorrect is a problem that is being improved, but it’s portrayed as a fundamental issue.
  • The whole bit about machine learning not being accessible is just wrong. People have been making models on laptops for at least a decade. These are usually more demonstrations of concepts and toys, but can be useful. Just because a person can’t scale to the internet as a dataset doesn’t mean they can’t effectively use machine learning for tasks.
  • They’re completely wrong about models being at a dead-end because all internet data has been scraped. Synthetic data generation doesn’t cause model collapse and does improve models. The example given – piss-tint on images – is also completely misattributed to data incest when it’s actually due to yellowing in old photographs.
  • Electricity and water use are consistently overstated. The projected water use in 2027 for global AI usage is 0.17% of 2014’s global water use. Agriculture’s water waste is a much bigger problem – and I do mean waste, not usage. Agriculture will always use the majority of water, because growing anything takes a lot of water, but the amount it uses is far in excess of what it has to. Electricity usage I’m far less knowledgeable about, but I’m looking into it. I do know that a big problem with electricity is how its costs are distributed, and that the way computers draw power creates a unique strain on the grid.
  • Nitpick: I don’t like that they consistently say models aren’t improving, because they are. Benchmarks are kind of bullshit in my opinion, but I’ve seen increases in quality despite a lot of mistakes and steps backwards.
  • Nitpick: I think they’re wrong about how long these companies can burn money. In order to remain competitive with how speculative all of this is, they are spending cash exponentially faster, so the insane cash reserves they’ve built won’t last that long. There’s also the fact that we’re in a recession, and only AI is making the economy appear healthy. I don’t think an economy can actually survive long-term when all of it is failing except one small sector that has massively inflated value.
  • Calling for anyone who worked on machine learning to be tried for crimes against humanity is just fucked up. There’s a massive difference between what the few big companies are doing and all of the research that’s been done. Blaming everyone for the actions of a greedy few is wrong.
  • They didn’t mention a single model trained only on data with explicit permission. It feels misleading to ignore that there are exceptions to the general rule of mass piracy.
  • They didn’t mention anything about how machine learning has been used to discover new drugs or better understand biology. Despite many failures and the extreme infection of greed in the medical industry, that sector is where there is the most benefit from machine learning.

I guess that last point is kind of a nitpick as well, because it’s so far outside of the point of the video being about image and text generation, but when they started by defining how “AI” just means machine learning right now, it feels wrong to exclude the positive aspects entirely. If we demonize ALL machine learning, then we lose the benefits where they do exist.

I should also say that I know far more about the failures of AI in medicine than its successes, and I want to have more solid examples to point to when saying that AI has been useful in medicine. I recently made a video about that, and posted about it here.