LuaJIT Strange & Unpredictable Behavior in table.sort

A quick story about discovering where Lua’s table abstraction fails to work.

In Lua, arrays should always be initialized sequentially, and should never have a nil inserted.

(The code behind this adventure starts on line 128 in one commit, where a crazy work-around was used to get what I wanted. This was eventually fixed in commit 8b8f36b (though accidentally sorting the wrong direction at that point). I provide these links to remind myself what happened, and so that if someone more knowledgeable than me wants to figure this out, hopefully they can. I’ve also made sure those links have been archived by the Internet Archive so they remain available even if the project is gone.)

While working on a prototype, I needed to sort a list of integers so that I could count the sum of the top thousand items in an array of ~40 thousand items. This list had empty indexes (or holes) within it, so I wrote a for loop to fill them with zeros and then called table.sort. It errored from trying to compare with a nil. This was confusing because I had a well-defined length that was used to fill holes, and the way Lua works, any hole in the list should be the end of what Lua recognizes as a list. As far as I knew, standard functions don’t allow holes in lists by ending at the first hole encountered.

for i = 0, list_length do
  if not list[i] then
    list[i] = 0
  end
end

I was further confused when I added debug code to check the list for nil values by iterating over it and printing every value in it: No nil values existed. The strangest part was when I passed my own sorting function to table.sort specifically to handle nil values and it created the same error. I worked around this by brute-forcing it with an incredibly inefficient thousand traversals of the entire list, removing items as I found the maximum each time. It’s a prototype, so whatever.

It didn’t sit right with me, especially because I couldn’t figure out how this was occurring. It’s incredibly rare to find an actual bug in highly used production software like programming languages, so I doubted I’d found an actual problem. Later, I realized I had added a zeroth element to this list, and Lua starts counting at 1, unlike a lot of programming languages. Removing this changed nothing, so I double-checked my assumptions and found that now nil values were showing up – and this is after no code changes except adding a for loop to print values.

I wrote a new sorting function that checks more thoroughly for nil values (I think the previous attempts at that failed because I’d made some simple error in how I was thinking about it). It worked. But then I found the loop whose code I’ve shared above, where holes are filled with zeroes. It had been a while, so I was very surprised to find what should have fixed things right there doing nothing.. somehow. I mentioned this in a group chat, and someone suggested replacing list_length with #list (the # operator returns length). The problem is that Lua stops counting at the first nil, so that should do nothing.

There’s also the possibility of an off-by-one error, those are very common, but I had assumed this would happen at some point and made sure all of my functions and loops checked one element earlier than necessary and one element later, so that if I checked too much – just extra zeroes, but I couldn’t check too little.

I wasn’t satisfied by all the nil values being there after a loop that replaces them with zero, so I tried making a loop at the top of the code that prefills a much longer list than I actually needed with zeroes. Suddenly, I didn’t need a custom sorting function. So why does filling the list with zeroes work in one place, but not another?

How Objects Work in Lua

In Lua all objects are called tables, and can contain an array-like section and arbitrary key-value pairs in a hashmap. It doesn’t really distinguish between these except as an optimization, under the hood it tries to guess how you intend to use the object and assigns values internally differently depending on what index you’re using. I thought that in practice, this means integer indexes are handled like arrays, and non-integer indexes are in the hashmap. I thought that maybe because my code was selecting integer indexes in a completely random order, it treated these as a hashmap instead, somehow leading to ignoring attempting to fill the holes – because there weren’t any holes, or the holes persisted because only part of the indexes were treated as an array.

Lua’s tables are an abstraction meant to make writing code easier, and for my years of working with them, they do that quite well, but maybe this is where the abstraction leaks? (All abstractions leak: They have some condition(s) where you end up needing to know how they work underneath, even though the point of abstraction is to not need to know how it works underneath.) Perhaps arrays are only recognized when they’re initialized sequentially, whereas my code initialized them in a random order? Either way, the error I started with manifests as “An array of integers can’t be sorted because part of it is undefined – even though every element is defined when iterated over.” which is not supposed to be possible.

There’s also the detail that in Lua there is not supposed to be difference between something having a nil value and it not existing at all. There is a difference, but it should be impossible to tell the difference or for the difference to matter in your code. It is possible that this alone is why I had this problem. Changing a value to nil inside an array table can cause all values within it to be converted to a hashmap, or some of them, or none of them. Again, this shouldn’t matter (other than a difference in speed which is usually unnoticeable), but it might matter.

I wrote a minimum test case to see if my hunch was correct, but it works when I expect it not to. I’m lost. I have an explanation that informs how I should code so that it works – and it does work, but because that explanation fails when tested, it seems incorrect. I want to know what’s really going on, but it requires a depth of understanding that I lack, and effort that I do not have the time and energy to go into.

I also learned that Lua defines this kind of behavior as “strange and unpredictable behavior” instead of undefined behavior, because undefined behavior means that an entire program can fail to function correctly do to one element with undefined behavior (even when that element is not accessed), whereas in Lua, a table exhibiting strange and unpredictable behavior doesn’t affect the rest of the program. So, if you can work around it, it’s okay.

At this point, I’m happy enough telling myself that arrays should always be initialized sequentially, and shouldn’t have nil elements ever. If I follow that, I probably won’t run into this. But.. if you’re reading this and understand what’s really going on here, please tell me.

YouTube Censorship Made Me Write a Script

YouTube’s been forcing creators to censor their works more and more, and often times after a successful publish of said content. More history and valuable information is being lost every day because a corporation controls the largest source of video content freely available.

At the same time, I’ve been running commands using yt-dlp over and over again for my own purposes, aside from this censorship. The syntax is relatively easy to forget despite being very clearly defined, so I finally made a script to handle it for me.

It’s in Lua because that’s what I prefer to use, and available on GitHub’s gists. Because it is based on yt-dlp, it works for any website supported by yt-dlp. Here’s how to use it:

Usage:
  ./video-dl.lua [action] <url>
[action]: What is desired.
  video (default): Highest quality video (maximum 720p).
  backup, clone, copy: English subtitles (including automatic subtitles), thumbnail, description, highest quality video (maximum 720p).
  music, audio: Highest quality audio only.
  metadata, meta: English subtitles (including automatic subtitles), thumbnail, description.
<url>: Source. YouTube URL expected, but should work with anything yt-dlp works with.

Information wants to be free. Help it.

File Size Statistics Script (Lua)

I used ChatGPT to write a script for generating a list of file statistics based on everything within the directory it is placed in. It uses LuaFilesystem, and generates a final output like the following after it’s done processing through the files:

2359    files found.
Average (mean) file size:       44842.524374735 bytes
Standard deviation:     320478.50592438
Multiple modes:
Mode 1: 126     bytes
Mode 2: 204     bytes
Frequency:      7
[####################] 0.00 - 199271.16: 2245 files
[##########          ] 199271.16 - 398542.33: 59 files
[#######             ] 398542.33 - 597813.49: 16 files
[#######             ] 597813.49 - 797084.65: 14 files
[#####               ] 797084.65 - 996355.82: 6 files
[#####               ] 996355.82 - 1195626.98: 8 files
[##                  ] 1195626.98 - 1394898.14: 2 files
[#                   ] 1394898.14 - 1594169.31: 1 files
[#                   ] 1594169.31 - 1793440.47: 1 files
[                    ] 1793440.47 - 1992711.63: 0 files
[                    ] 1992711.63 - 2191982.80: 0 files
[#                   ] 2191982.80 - 2391253.96: 1 files
[                    ] 2391253.96 - 2590525.12: 0 files
[                    ] 2590525.12 - 2789796.29: 0 files
[                    ] 2789796.29 - 2989067.45: 0 files
[##                  ] 2989067.45 - 3188338.61: 2 files
[                    ] 3188338.61 - 3387609.78: 0 files
[                    ] 3387609.78 - 3586880.94: 0 files
[                    ] 3586880.94 - 3786152.10: 0 files
[                    ] 3786152.10 - 3985423.27: 0 files
[                    ] 3985423.27 - 4184694.43: 0 files
[#                   ] 4184694.43 - 4383965.59: 1 files
[                    ] 4383965.59 - 4583236.76: 0 files
[                    ] 4583236.76 - 4782507.92: 0 files
[                    ] 4782507.92 - 4981779.08: 0 files
[                    ] 4981779.08 - 5181050.24: 0 files
[#                   ] 5181050.24 - 5380321.41: 1 files
[                    ] 5380321.41 - 5579592.57: 0 files
[                    ] 5579592.57 - 5778863.73: 0 files
[                    ] 5778863.73 - 5978134.90: 0 files
[                    ] 5978134.90 - 6177406.06: 0 files
[                    ] 6177406.06 - 6376677.22: 0 files
[#                   ] 6376677.22 - 6575948.39: 1 files
[                    ] 6575948.39 - 6775219.55: 0 files
[                    ] 6775219.55 - 6974490.71: 0 files
[                    ] 6974490.71 - 7173761.88: 0 files
[                    ] 7173761.88 - 7373033.04: 0 files
[                    ] 7373033.04 - 7572304.20: 0 files
[                    ] 7572304.20 - 7771575.37: 0 files
[                    ] 7771575.37 - 7970846.53: 0 files
[                    ] 7970846.53 - 8170117.69: 0 files
[                    ] 8170117.69 - 8369388.86: 0 files
[                    ] 8369388.86 - 8568660.02: 0 files
[                    ] 8568660.02 - 8767931.18: 0 files
[                    ] 8767931.18 - 8967202.35: 0 files
[                    ] 8967202.35 - 9166473.51: 0 files
[                    ] 9166473.51 - 9365744.67: 0 files
[                    ] 9365744.67 - 9565015.84: 0 files
[#                   ] 9565015.84 - 9764287.00: 1 files
0th percentile: 0       bytes
10th percentile:        167     bytes
20th percentile:        317     bytes
30th percentile:        476     bytes
40th percentile:        692     bytes
50th percentile (median):       986     bytes
60th percentile:        1428    bytes
70th percentile:        2101    bytes
80th percentile:        3650    bytes
90th percentile:        38917   bytes
100th percentile:       9764287 bytes

With minimal effort, you could change it quite a bit, because it’s written as pure functions. I wouldn’t have achieved this myself, nor produced it so quickly, if I didn’t have ChatGPT do the easy stuff for me. I found the experience quite helpful. While ChatGPT did once forget that Lua indexes tables starting with 1, and made a few weird decisions and downright inefficient code in some places, it allowed me to focus on making it work exactly how I wanted it to, instead of just mostly correct or “good enough for now”.

(Btw, the example output above is from my Obsidian vault. You can read a bit more about how I use Obsidian to organize my notes here.)

cc-pkg: A ComputerCraft Package Manager

(The latest information and a quick reference is located here.)

ComputerCraft is a Minecraft mod that adds Lua-based computers. Over time, many programs have been created, and several package managers have come and gone. As I write this, all that I have seen are gone – their original authors have moved on, and shut down the servers hosting packages.

Now it’s my turn to sell you a package manager. Unlike the others, I expect this to remain viable – even if I’m gone from the picture. If you want to skip to trying it, here’s how it’s installed (and how to ask for help):

pastebin get 9Li3u4Rc /bin/pkg
/bin/pkg help

(I’d also recommend installing the unix-like package, which adds /bin to your path, along with a few other small tweaks.)

Why is cc-pkg different?

  1. It is built on ComputerCraft’s pastebin integration.
  2. It does not require a maintainer.
  3. It is extremely simple – and flexible.

cc-pkg has the same three sub-commands of the default pastebin program: get, run, and put*. They each do exactly what you’d expect them to do, except they can use names as well as pastebin IDs. Names consist of alphanumeric characters and dashes, and there are two file types cc-pkg can recognize and utilize:

Both are plaintext lists of the form key=value. The first is called a package and has file paths (all starting with a forward slash) as keys, with names or pastebin IDs as values. This is how cc-pkg knows which files to download and where to put them. The second type is called a list, and contains names as keys, with names or pastebin IDs as values. Lists are saved to a local file cc-pkg uses to resolve package names – overwriting any existing entries with the same package name, which is how updating is done.

With just those core features, I think the system is viable.

*The put command is not implemented as of version 1.4.2 1.5.10, the latest at the time of writing finishing updating this introduction.

Command Extensions

On top of this, if a file is saved to /lib/pkg-commands/<name> and a user runs pkg <name>, that file will be run with the other arguments. This allows adding new functions, and overwriting the core functions to add additional features, if desired.

Under the Hood

cc-pkg keeps metadata through the following files:

  • /etc/pkg/names.list: The master names list. (How cc-pkg knows what to download.)
  • /etc/pkg/ids.list: Download history of cc-pkg. (An ordered list of every pastebin ID successfully downloaded.)
  • /etc/pkg/<package-name>: Each package is saved here by name, in addition to the files it specifies.

API Loadable

It uses global functions so that it can be loaded as an API to make a more advanced package management system on top of it – or just to make programs automatically download requirements using cc-pkg.

  • get(name_or_id, path): The core function that installs anything (package/list/file). (path is optional.)
  • down(id): Downloads from a pastebin ID, returning its content or throwing an error.
  • save(path, data): Write to file. (Note: This is not a binary write.)
  • append(path, data): Append to file. (Note: This is not a binary write.)
  • id(name_or_id): Recursively checks the master names list until a pastebin ID is returned. (Note: On failure, returns the last name resolved.)
  • type(data): Recognizes data as a package, list, or unknown type, and returns that type.
  • src(data): Combines this data with existing names (overwriting if duplicates exist). (If get is used, src will be called when needed automatically.)

An example of this is the pkg-search package, which adds a search command to look for specific package names within the master list.

Example Packages & Lists

  • A basic package looks like this:
    /startup=XsYUCf5b
    This is the multi-startup package, which makes the computer run all files in /.startup when started. It is required for several of my packages to function.
  • A package with a dependency looks like this:
    /etc/pkg/multi-startup=multi-startup
    /.startup/unix-like.lua=iG9idFgk

    This is the unix-like package, which requires multi-startup to function, and makes the computer operate more like a UNIX computer. (Due to how cc-pkg works, specifying to download a package anywhere will install that package. By convention and for consistency, I use the location cc-pkg installs packages by default.)
  • A “meta package” (a package that only lists dependencies) looks like this:
    /etc/pkg/ping=ping
    /etc/pkg/touch=touch
    /etc/pkg/which=which

    This is only part of the unix-meta package, which lists many packages that make a computer operate more like a UNIX computer. Packages are very flexible! They don’t even have to install anything themselves.
  • A list looks like this:
    pocket-computer-chat=qtnShai4
    pocketgps=8SXT1hDN

    This is the current list of pocket packages at the time of writing. (These can be added by running pkg get pocket-packages.) Remember, lists can reference other names (e.g. unix-stuff=unix-meta), allowing one to create aliases for existing packages & lists.

Switches Suck

I don’t like switch statements, in any language. They seem unnecessarily verbose and error-prone, in fact I forgot the break statements in my example below on the first draft. Most of the time, you don’t want the fall-through feature, but you have to remember that extra word for each case to prevent that.

switch (n)
{
    case 1:
    // something useful
    break;
    case 2:
    // another useful option
    break;
    default:
    // nothing matched
}

I also really hate the indenting used in most examples (including my own), as it makes it more difficult to visually parse. I prefer to just create an object with keys based on the possible values, and access what you need directly.

-- we're gonna pretend these are useful functions dependent on a star's type..something to do with heat?
local default = {}           -- used for a unique key
local heat = {
  A = function() end,        -- pretend these are full of useful code
  B = function() end,
  G = function() end,        -- and so on
  [default] = function() end -- default which can't be accidentally chosen
}

-- make sure we don't error, and call default if needed
if heat[star.type] then
  heat[star.type]()
else
  heat[default]()
end