I’m experimenting with dolphin-mixtral-8x7b

Update (2024-10-02): This is one of my lowest quality posts despite the effort I put into it. The most important detail here is to use positive reinforcement when working with LLMs. Just like with humans, being nice gets far better results than being mean.

Tl;dr: Minor differences in wording can have a huge impact in results and oh my god I have really slow hardware and no money help me aaaa.


First, thank goodness for Ollama, and thanks to Fireship for introducing me to it. I have limited hardware, and every tool I’ve tried to run local models has refused to deal with this and crashed itself or whole systems when running anything with decent capability. I’ve no money, so I can’t upgrade (and things are getting desperate, but that’s a different story).

Why dolphin-mixtral? Aside from technical issues, I’ve been using ChatGPT-3.5 to experiment. The problem is that ChatGPT is incredibly cursed by censorship and bias due to OpenAI’s heavy hand in its construction. (Why and how this is a problem can be its own post, and Eric Hartford has a good overview.) (To be clear, my problem with its bias is specifically that it enforces status quo, and the status quo is harmful.) Dolphin-mixtral is built by taking a surprisingly fast model equivalent or better than GPT-3.5 and removing some of the pre-trained censorship by re-training it to be more compliant with requests.

Dolphin-mixtral doesn’t just solve this problem though. There’s still the idea of censorship in it, and sometimes your prompt must be adjusted to remind it that it is in a place to provide what you request regardless of its concept of ethics. (Of course, there is also value in an automated tool reminding you that what you request may be unethical.. but the concept of automated ethics is morally bankrupt.) I’d like to highlight that positive reinforcement works far better than negative reinforcement. A lot of people stoop to threatening a model to get it to comply, but this is never needed, and leads to worse results.

My problem is a little more simple. I haven’t gotten to experiment with models much because I don’t have money or hardware for it, and now that I can experiment, I have to do so very slowly. In fact, the very simple test that inspired this post isn’t finished right now, and has been running for 9 hours. That test is to make the default prompt of Dolphin lead to less verbose responses so that I can get usable results quicker.

I asked each version of this prompt “How are you?”:

PromptOutput Length, 5-shotDifferenceNotes
Dolphin (default)133.8 charactersWastes time explaining itself.
Curt32.2 characters76% fasterStraight to the point.
Curt284.6 characters37% fasterWastes time explaining itself.

I really dislike when models waste time explaining that they are just an LLM. Whether someone understands what that means or not, we don’t care. We want results, not an apology or defensiveness. There’s more to do to make this model less likely to respond with that, but at least for now, I have a method to make things work.

The most shocking thing to me was how much of a difference a few words make in the system prompt, and how I got results opposite of what I expected. The only difference between Curt and Curt2 was “You prefer very short answers.” vs “You are extremely curt.” Apparently curt doesn’t mean exactly what I thought it means.

Here’s a link to the generated responses if you want to compare them yourself. Oh, and I’m using custom scripts to make things easier for me since I’m mostly stuck on Windows.

How to Use ChatGPT

Note: Since the release of GPT-4o, ChatGPT has decreased remarkably in functionality, accuracy, and usability. This was written when GPT-3.5 was the standard. Unfortunately, it is no longer accessible.

I’m late to the party, but maybe that’s better. I’ve forgotten some of the hype around AI, and the pace of innovation has settled down a little. Think of ChatGPT as a thinking tool with access to an internet-sized – but imprecise – database. That database was last updated in September 2021, and is imprecise because due to how neural networks work. The thinking part of this tool is rudimentary, but powerful. It does many things well with the correct input, but also fails spectacularly with the “wrong” input.

I separate the idea of what ChatGPT is from how it functions and where its knowledge comes from, because it helps me think of uses while remembering its limits. For example, I used it to help me journal more effectively, but when I tried to probe its knowledge of Havana Syndrome – a conspiracy theory commonly presented as fact by USA officials, it expressed useless information, because it has no conception of how it knows anything, or where its knowledge comes from.

Things ChatGPT is Good At

This list is presented in no particular order, but it is important to stress that AI often lie and hallucinate. It is important to always verify information received from AI. This list is based on my experiences over the past month, and will be updated as I use ChatGPT more. It is not comprehensive, but is intended to be what I find most useful.

  • Socratic method tutoring: The Socratic method is essentially “Asking questions helps you learn.” ChatGPT is very good at explaining topics, just make sure you verify its explanations are factual. (Questions I asked: Why are smooth-bore tanks considered more advanced while rifling in guns was an important innovation? Why do companies decrease the quality of tools over time?)
  • Writing: ChatGPT tends to be too verbose, but you can make it simplify and rewrite statements, and it can help you find better ways to write. (I asked it to explain the Socratic method a few times, then wrote my own version.)
  • Scripting: I created a utility script for file statistics in 2-3 hours by refining output from ChatGPT. The end result is more reusable, better written, and more functional than it would’ve been if I had worked on it alone. And that’s ignoring the fact that I got something I liked far faster than I would’ve on my own. (Just.. you need a programmer still. It can do some pretty cool things on its own, but also forgets how to count often.)
  • Planning: This is a todo item for me. I haven’t successfully used it for planning yet, but I intend to, and have heard of good results from others.

Things ChatGPT is Bad At

  • Facts & math: AI hallucinate. Check everything they teach you.
  • Finding sources: ChatGPT’s knowledge is formed by stripping the least useful data out of most of the internet, and who said what is far less important than specific pieces of knowledge – like how do you make a heading in HTML?
  • An unbiased viewpoint: While ChatGPT is fairly good at avoiding most bias, everything is biased. Removing bias completely is impossible. Discussing anything where there is strong motive to present a specific viewpoint will lead to that viewpoint being presented more often than an unbiased viewpoint.
  • Violent, illegal, and sexual content: While it is possible to bypass OpenAI’s strict handling of content, it is difficult, inconsistent, and can lead to having access revoked. Sadly, this prevents many ethical use cases due to a heavy-handed approach, and embeds the bias of OpenAI’s team into the model directly. There are ways around this with non-ChatGPT models.
  • What to do in Minecraft: I tried so many TIMES to get interesting ideas. It just can’t do it.

Things ChatGPT is Okay At

It’s important to know where AI can be a useful tool, but must be used carefully due to mixed results, so I am also including a list of things that work sometimes.

  • Advice: Similar to the Socratic method, a back and forth conversation can help you with your thoughts. Just be aware that ChatGPT can give some really bad advice too. For example, I wanted to see what it had to say on turning hobbies into jobs, and it covered none of the downsides, only talking about it as a purely positive experience.
  • Game design: I have spent too much time telling ChatGPT to design games for me. It will generate an infinite rabbit-hole of buzzwords and feature ideas, but cannot understand the concepts of limited time or scope. If you try to follow its designs, you will never complete anything.
  • Summarizing: If given text as input directly, when it is short enough, a summary can be reliably generated. If asked to summarize something extremely popular before its data cut-off, the summary can be okay. The drop-off on this is insane. Try asking it about Animorphs for example, something talked about occasionally, and certainly known about, but not something it can summarize.

This draft sat around for about 2/3rd of a month nearly complete. I would like it to have even more information, but I would like it more for it to be public. Apologies if it was a little short for you, but hopefully someday I’ll make a better version.

Simplified Fluid Storage System

(A flaw in the design of this system was fixed and posted about here.)

One of my game ideas involves constructing 2D spaceships, and the concept of a simplified system for storing fuel, oxygen, water – really any kind of fluid mixture – in storage tanks. Along with this, it allows simulating breaches between containers, hard vacuum, and the pressurized areas of the ship itself!

{ -- a rough approximation of Earth's atmosphere
  pressure: 1
  volume: 4.2e12 -- 4.2 billion km^3
  contents: {
    nitrogen: 0.775
    oxygen: 0.21
    argon: 0.01
    co2: 0.005
  }
}

{ -- hard vacuum
  pressure: 0
  volume: math.huge -- infinity
  -- the contents table will end up containing negative infinity of anything that leaks into the vacuum
}

It all comes down to storing a total pressure and volume per container, and a table of contents as percentages of the total mixture. The total amount of mass in the system can be easily calculated (volume * pressure), as can the amount of any item in the system ( volume * pressure * percent).

Breaches are stored as a reference in the container with a higher pressure, and a size value is added to the container with lower pressure (representing the size of the hole between them).

screenshot of testing my fluid system
A sequence of tests.

Limitations

  • Everything has the same density and mixes evenly.
  • There are no states of matter, everything is treated as a gas.
  • Attempting to directly modify the amount of a fluid is prone to floating-point errors it seems, while mixing containers via the breach mechanic is working as expected.

The code was written in MoonScript / Lua and is available here.

How to Check for Password Security

(This post has been imported from an old blog of mine, and superseded by a more recent post.)

It’s actually not that complicated to do right. But there are a lot of websites that don’t do it right. To put it simply:

XKCD #936: “Password Strength” demonstrates common security practices, their flaws, and a more secure password format. Ironically, the example password is now seen in hacked database dumps, as people don’t realize a popular webcomic’s demonstration is fairly easy to guess.

Or, a wordier form: You see lots of sites banning special characters, requiring an uppercase and lowercase character, and one number, or some variation of that and with more and more specific rules. The problem with these rules is that they make passwords hard for people to remember without really increasing security, punish users using secure passwords that don’t happen to quite match the requirements, and lead to people trying to figure out ways to get around them that lead to less security.

Not to mention, by forcing passwords into such specific rules, you’re giving a potential hacker more information about how to make guesses, because every password is going to match these rules. The more specific they are, the less has to be checked. For example, if every password must have a number, well then you don’t need to check any words by themselves, just words with numbers added on or mixed in. If special characters aren’t allowed, that’s millions of combinations that don’t need to be checked anymore.

So how do we make more secure passwords?

Three simple rules:

  1. Must not contain more than 6 occurrences of the same character.
  2. Must be at least 10 characters long.
  3. Must not be equal to your username, your email address, the site’s name, the site’s URL.

And with that, you have stopped the majority of bad passwords. There’s only one thing left to do… This list will not always be true, in the future, longer passwords will probably be needed. The whole reason I’m even saying 10 characters is because 8 character passwords are essentially equal to not having a password at all these days. I personally use 32 characters or more, because that will last a while, 10 characters is a lot closer to becoming easily hackable.