File Size Statistics Script (Lua)

I used ChatGPT to write a script that generates a list of file statistics for everything within the directory it is placed in. It uses LuaFileSystem, and once it finishes processing the files, it produces output like the following:

2359    files found.
Average (mean) file size:       44842.524374735 bytes
Standard deviation:     320478.50592438
Multiple modes:
Mode 1: 126     bytes
Mode 2: 204     bytes
Frequency:      7
[####################] 0.00 - 199271.16: 2245 files
[##########          ] 199271.16 - 398542.33: 59 files
[#######             ] 398542.33 - 597813.49: 16 files
[#######             ] 597813.49 - 797084.65: 14 files
[#####               ] 797084.65 - 996355.82: 6 files
[#####               ] 996355.82 - 1195626.98: 8 files
[##                  ] 1195626.98 - 1394898.14: 2 files
[#                   ] 1394898.14 - 1594169.31: 1 files
[#                   ] 1594169.31 - 1793440.47: 1 files
[                    ] 1793440.47 - 1992711.63: 0 files
[                    ] 1992711.63 - 2191982.80: 0 files
[#                   ] 2191982.80 - 2391253.96: 1 files
[                    ] 2391253.96 - 2590525.12: 0 files
[                    ] 2590525.12 - 2789796.29: 0 files
[                    ] 2789796.29 - 2989067.45: 0 files
[##                  ] 2989067.45 - 3188338.61: 2 files
[                    ] 3188338.61 - 3387609.78: 0 files
[                    ] 3387609.78 - 3586880.94: 0 files
[                    ] 3586880.94 - 3786152.10: 0 files
[                    ] 3786152.10 - 3985423.27: 0 files
[                    ] 3985423.27 - 4184694.43: 0 files
[#                   ] 4184694.43 - 4383965.59: 1 files
[                    ] 4383965.59 - 4583236.76: 0 files
[                    ] 4583236.76 - 4782507.92: 0 files
[                    ] 4782507.92 - 4981779.08: 0 files
[                    ] 4981779.08 - 5181050.24: 0 files
[#                   ] 5181050.24 - 5380321.41: 1 files
[                    ] 5380321.41 - 5579592.57: 0 files
[                    ] 5579592.57 - 5778863.73: 0 files
[                    ] 5778863.73 - 5978134.90: 0 files
[                    ] 5978134.90 - 6177406.06: 0 files
[                    ] 6177406.06 - 6376677.22: 0 files
[#                   ] 6376677.22 - 6575948.39: 1 files
[                    ] 6575948.39 - 6775219.55: 0 files
[                    ] 6775219.55 - 6974490.71: 0 files
[                    ] 6974490.71 - 7173761.88: 0 files
[                    ] 7173761.88 - 7373033.04: 0 files
[                    ] 7373033.04 - 7572304.20: 0 files
[                    ] 7572304.20 - 7771575.37: 0 files
[                    ] 7771575.37 - 7970846.53: 0 files
[                    ] 7970846.53 - 8170117.69: 0 files
[                    ] 8170117.69 - 8369388.86: 0 files
[                    ] 8369388.86 - 8568660.02: 0 files
[                    ] 8568660.02 - 8767931.18: 0 files
[                    ] 8767931.18 - 8967202.35: 0 files
[                    ] 8967202.35 - 9166473.51: 0 files
[                    ] 9166473.51 - 9365744.67: 0 files
[                    ] 9365744.67 - 9565015.84: 0 files
[#                   ] 9565015.84 - 9764287.00: 1 files
0th percentile: 0       bytes
10th percentile:        167     bytes
20th percentile:        317     bytes
30th percentile:        476     bytes
40th percentile:        692     bytes
50th percentile (median):       986     bytes
60th percentile:        1428    bytes
70th percentile:        2101    bytes
80th percentile:        3650    bytes
90th percentile:        38917   bytes
100th percentile:       9764287 bytes
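The original script isn't reproduced here, so what follows is only a minimal sketch of how the first few lines of that output (file count, mean, standard deviation, modes) might be computed with LuaFileSystem. It assumes the script walks subdirectories recursively and reports the population standard deviation (dividing by n). Neither is stated above, and all function names are hypothetical. To keep the functions pure and testable, the walker takes the filesystem module as a parameter, so passing `require("lfs")` uses LuaFileSystem's real `dir` and `attributes` functions.

```lua
-- Hypothetical sketch, not the original script. Function names are made up.

-- Recursively collect the size in bytes of every regular file under `root`.
-- `fs` must provide LuaFileSystem's dir(path) and attributes(path, name);
-- pass require("lfs") for the real filesystem. attributes() follows
-- symlinks, so a symlink loop would recurse forever.
local function collect_sizes(fs, root, sizes)
  sizes = sizes or {}
  for name in fs.dir(root) do
    if name ~= "." and name ~= ".." then
      local path = root .. "/" .. name
      local mode = fs.attributes(path, "mode")
      if mode == "directory" then
        collect_sizes(fs, path, sizes)
      elseif mode == "file" then
        sizes[#sizes + 1] = fs.attributes(path, "size")
      end
    end
  end
  return sizes
end

-- Arithmetic mean (assumes at least one file).
local function mean(sizes)
  local total = 0
  for _, s in ipairs(sizes) do total = total + s end
  return total / #sizes
end

-- Population standard deviation: divides by n, not n - 1 (an assumption).
local function std_dev(sizes)
  local m, sq = mean(sizes), 0
  for _, s in ipairs(sizes) do sq = sq + (s - m) ^ 2 end
  return math.sqrt(sq / #sizes)
end

-- Every value tied for the highest count, sorted ascending, plus that count.
-- Ties are why the output above reports two modes with frequency 7.
local function modes(sizes)
  local counts, best = {}, 0
  for _, s in ipairs(sizes) do
    counts[s] = (counts[s] or 0) + 1
    if counts[s] > best then best = counts[s] end
  end
  local result = {}
  for value, c in pairs(counts) do
    if c == best then result[#result + 1] = value end
  end
  table.sort(result)
  return result, best
end
```

Used together, `local sizes = collect_sizes(require("lfs"), ".")` followed by `print(#sizes, mean(sizes), std_dev(sizes))` would report the count, mean, and standard deviation for the current working directory, which is not necessarily the directory the script file lives in.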

Because it’s written as pure functions, it could be changed quite a bit with minimal effort. I wouldn’t have achieved this on my own, nor produced it so quickly, if I hadn’t had ChatGPT do the easy parts for me. I found the experience quite helpful. While ChatGPT did once forget that Lua indexes tables starting at 1, and made a few weird decisions and some downright inefficient code in places, it allowed me to focus on making the script work exactly how I wanted it to, instead of just mostly correct or “good enough for now”.
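Lua's 1-based indexing matters most in percentile and histogram code, where a 0-based calculation has to be shifted by one. The following is a hypothetical sketch, not the script itself. It uses the nearest-rank percentile definition (the original's method isn't stated) and splits the range from 0 to the largest file into equal-width buckets. The sample has 49 buckets, which happens to equal ceil(sqrt(2359)), so the square-root rule is one plausible, assumed way that count was chosen. The sample's bar lengths are consistent with a logarithmic scale, floor(20 * ln(count + 1) / ln(largest + 1)); that formula is inferred from the output rather than taken from the script.

```lua
-- Hypothetical sketch, not the original script.

-- Nearest-rank percentile of an ascending-sorted array, for p in [0, 100].
-- Lua arrays start at 1, so rank 0 (the 0th percentile) is clamped to the
-- first element; sorted[0] would silently be nil. Multiplying before
-- dividing keeps the rank exact for integer inputs, whereas 7 / 100 * 100
-- evaluates to 7.000000000000001 in double precision.
local function percentile(sorted, p)
  local rank = math.ceil(p * #sorted / 100)
  if rank < 1 then rank = 1 end
  return sorted[rank]
end

-- Count values into `bins` equal-width buckets covering [0, max].
local function histogram(sizes, bins)
  local max = 0
  for _, s in ipairs(sizes) do
    if s > max then max = s end
  end
  local width = max / bins
  local counts = {}
  for i = 1, bins do counts[i] = 0 end
  for _, s in ipairs(sizes) do
    -- floor() yields a 0-based bucket; +1 turns it into a Lua index.
    -- The largest value would land in bucket bins + 1, so clamp it.
    local i = width > 0 and math.floor(s / width) + 1 or 1
    if i > bins then i = bins end
    counts[i] = counts[i] + 1
  end
  return counts, width
end

-- Fixed-width text bar on a logarithmic scale, so a bucket holding a single
-- file is still visible next to one holding thousands.
local function bar(count, largest, len)
  local filled = 0
  if largest > 0 then
    filled = math.floor(len * (math.log(count + 1) / math.log(largest + 1)))
  end
  return "[" .. string.rep("#", filled) .. string.rep(" ", len - filled) .. "]"
end
```

Bucket i then covers (i - 1) * width to i * width. As a check against the sample, `bar(59, 2245, 20)` yields 10 `#` characters, matching the second histogram line above.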

(Btw, the example output above is from my Obsidian vault. You can read a bit more about how I use Obsidian to organize my notes here.)

How I Use Obsidian (Notetaking)

This post has had more hours put into it than the majority of my writings here. It still doesn’t feel finished, or correct, because this is a huge topic. I think it’s more useful to publish as-is, and update it with links to more detailed thoughts as I publish those thoughts.

Key Takeaways

  • Organization is something to slowly bring in as needed, not something to focus on from the beginning.
  • File hierarchy is a losing strategy. If you have to use folders, keep them as simple, organized, and flat as possible.
  • Tags should be kept simple. They have most of the flaws of folders.
  • Don’t stress about links. Use them, but it’s okay to forget or to remove extra links later.
  • Journal entries should be simple and unstructured – easily triaged. Likewise, reviews should be kept simple.

How I Got Here

I don’t have a magic answer that will solve your organization needs, but I have 2,300 notes from as far back as 2007[1], I’ve been trying to organize them since 2010, and I think I’ve finally gotten the hang of it as of two weeks ago. When your interests include everything humanity has ever done (and many things beyond), it gets unwieldy to organize. When you have a neurodivergent brain, it gets even harder.

It started with folders – after all, that’s how computer filesystems work – but then you run into at least two distinct problems: “Where did I put that?” and “Where does this go?”. The first can be helped with search, but many search tools are inadequate. The second takes longer to appear, because it only hits you after you’ve decided to put the same kind of file in multiple folders.

Then you move to tags (or categories, which are just stricter tags). Tags allow you to place the same thing in multiple places, solving the most significant problem of folders: hierarchies. Eventually, you run into the same problems in a different form: “What did I tag that with?” and “What do I tag this with?”. You resolve to use tags differently, but then they become too vague to be useful (everything in my journal is tagged #journal) or too specific (I only have a single note about dumbbell fusion reactors; why does that specific concept have its own tag?).

Folders, categories, and tags are all fundamentally the same thing.
They’re like the Dewey Decimal System, trying to fit all of possibility into a logically ordered structure. Reality refuses to fit in this box.
Use them sparingly, or not at all, whenever possible. They limit your ability to organize rather than help it.

It is critical to recognize these similarities and their failings. I used to think tags were a superior organizational method, and focused on their usage for years. It all fell apart. (I now think links between notes are the most important organizational method, but only as long as you let them happen naturally. Don’t stress about what should or shouldn’t be linked.)

In 2019, I finally heard of something beyond default “Notes” apps[5]. I started using Obsidian and trying to follow the Zettelkasten method (every single note must be a single specific idea – an atom). Don’t start like this. It sounds appealing, but it is not for beginners, and it held me back for years. Analysis Paralysis – the inability to make a decision because of the options available – is a trap that I regularly fall into. “Is this note really atomic?” distracts you from the importance of what the note connects to. Similarly, I was distracted by the beauty of the graph, and decided everything must be linked and the graph must be useful. These ideas are misguided.

My past 3 years have been spent wandering between notetaking methodologies without solving any of my core problems. I used the Theme System, got lost in LYT’s MOCs (yes, their website does look like a scam), tried Bullet Journaling, looked at PARA & LATCH, and finally started making real changes after stumbling across the Johnny.Decimal system and watching several videos from Nicole van der Hoeven. I’ve known for a long time that file structure is pointless[2], but a part of my brain obsesses over where files are. The simple hierarchy of the Johnny.Decimal system lets me shut up that part of my brain while mostly avoiding thinking about where files go.

Partial example of a Johnny.Decimal hierarchy.
This is a portion of what I am using now, which doesn’t fully adhere to the “correct” way to use Johnny.Decimal. But it works for me.

The primary reason this works for me is Obsidian’s settings for where attachments and new files are placed. Previously, I followed the doctrine of keeping attachments in a specific, organized folder, and briefly had all new notes created in the root (so they could be properly organized after creation). That just separates files from the notes they relate to, and creates a useless burden of organizing them.

Ironically, the hardest lesson for me has been to focus less on organization. Organization should not be your first goal. Using your notes effectively is far more important than them being organized. If my notes were perfectly organized, I would have very few and nearly useless notes.

Daily Notes & The Importance of Review

Daily notes are a dumping ground for thoughts, and a place to link to significant notes, creating a loose record of each day. They are for whims. A daily note doesn’t need to be aesthetic, organized, or even complete. I recommend embracing the fleeting nature of days: don’t try too hard to complete task lists created in your daily notes, don’t follow a specific format that you’ll end up feeling bogged down in, and don’t feel like a note is required for every day.

Organization first comes in with weekly reviews[3]. Each week, I make a note summarizing the events of the previous week and create a task list for things more important than daily whims. I’m currently leaving these very freeform, just like daily notes. I haven’t done this much yet, but the flexibility of not following a standard seems to be helping so far. Usually, some of these tasks involve managing the rest of my notes. This review process is hierarchical and cyclical: each month gets a review based on its weeks, each quarter a review of its months, and each year a review of its quarters.

Again, I have only just started this process, so I can’t speak to structure in higher levels of review. Maybe it’ll be useful, maybe it won’t be. It’s important to discover what works for you instead of following a prescription.

The Importance of Forgetting

I have an old task list with thousands of items. It’s unapproachable, unusable, downright silly. Every day, I create more incomplete tasks. I used to view this as a flaw to be solved, but now I view it as a feature. If you are completing your todo lists every day, you’re doing something wrong.[4] So many tasks I create are not really needed, and with a finite life, I shouldn’t expect to achieve everything I want to.

Likewise, a lot of strategies for organization and usage of notes focus on making sure everything is findable. This is a noble goal, but one that I think acts against the usefulness of notetaking. The majority of the value in notes is writing them in the first place, so forgetting them shouldn’t be a large concern. (And that’s before even considering psychology and physiology. Forgetting is important for health.)

Footnotes

  1. From text files on Windows XP, to iOS (Apple Notes), RedNotebook Portable, Notepad++, GitHub, Atom, Android/Google Notes (and then Keep), and finally Obsidian.
  2. A well-written pro/con list based on file hierarchies.
  3. I believe I am stealing this idea from a combination of things emphasized by Bullet Journaling, the Theme System, and the Periodic Notes Obsidian plugin.
  4. I first heard this idea on a recent episode of Cortex, where the importance of the order of a task list was discussed.
  5. This does include Google Keep, and a Google “Notes” app that briefly existed and has since been memory-holed.

(Note: All resources are archived using the services linked to on Archives & Sources.)