AI models are unpredictable digital brains

We do not understand how AI models work, we cannot predict what they will be able to do as they get bigger, and we cannot control their behavior.

Modern AI models are grown, not programmed

Until quite recently, most AI systems were designed by humans writing software: a set of rules and instructions spelled out by programmers.

This changed when machine learning became popular. Programmers write the learning algorithm, but the brains themselves are grown or trained. Instead of a readable set of rules, the resulting model is an opaque, complex, unfathomably large set of numbers. Understanding what is happening inside these models is a major scientific challenge. That field is called interpretability and it’s still in its infancy.
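As a rough illustration of the difference, here is a minimal sketch in plain Python with NumPy (the data and all numbers are made up, and there are two parameters instead of billions): the programmer writes only the learning loop, and the "model" that comes out is nothing but numbers that happen to fit the data.

```python
# A toy sketch of "grown, not programmed". The only thing a human writes is
# the training loop below; the resulting "model" is just two numbers.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + rng.normal(0, 0.05, size=100)   # data the model is grown from

w, b = 0.0, 0.0                                 # the model: opaque numbers
for _ in range(500):                            # the learning algorithm
    pred = w * x + b
    grad_w = np.mean(2 * (pred - y) * x)        # gradient of mean squared error
    grad_b = np.mean(2 * (pred - y))
    w, b = w - 0.1 * grad_w, b - 0.1 * grad_b

print(w, b)  # roughly 2.0 and 1.0 - no one ever wrote the rule "y = 2x + 1"
```

Reading the two final numbers, you can still guess what this toy model does; reading the billions of numbers inside a modern model, nobody can.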

Unpredictable scaling

When these digital brains become larger, or when they’re fed more data, they gain new capabilities. It turns out to be very difficult to predict exactly what those capabilities will be. This is why Google refers to them as emergent capabilities. For most capabilities, this is not a problem. However, there are some dangerous capabilities (like hacking or bioweapon design) that we don’t want AI models to possess. Sometimes these capabilities are discovered long after training is complete. For example, 18 months after GPT-4 finished training, researchers discovered that it can autonomously hack websites.

“Until we go train that model, it’s like a fun guessing game for us”
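A toy calculation (with entirely made-up numbers) hints at why a capability can seem to appear out of nowhere: if a model's skill at each digit of a 5-digit addition improves smoothly with scale, but the task only counts as solved when every digit is right, whole-task accuracy stays near zero for a long time and then jumps.

```python
# Hypothetical illustration: a smoothly improving skill can still show up as a
# sudden "emergent" capability when measured as all-or-nothing task success.
import math

DIGITS = 5  # a 5-digit sum is only "solved" if every output digit is correct

for params in (1e7, 1e8, 1e9, 1e10, 1e11):       # made-up model sizes
    # made-up smooth improvement of per-digit accuracy with scale
    per_digit = 1 / (1 + math.exp(-(math.log10(params) - 9)))
    task = per_digit ** DIGITS                   # all digits must be right
    print(f"{params:.0e} params: per-digit {per_digit:.0%}, full answers {task:.1%}")
```

The numbers are invented; the point is only that extrapolating from smaller models can badly underestimate what a larger one will be able to do.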

Unpredictable behavior

AI companies want their models to behave, and they spend many millions of dollars training them to do so. Their main approach is called RLHF (Reinforcement Learning from Human Feedback), which turns a model that merely predicts text into a more useful (and more ethical) chatbot. Unfortunately, this approach is flawed:

Even OpenAI does not expect this approach to scale up as their digital brains become smarter - it “could scale poorly to superhuman models”.
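To make the shape of that training recipe concrete, here is a minimal toy sketch of RLHF in plain Python with NumPy. Everything in it is hypothetical (three canned replies instead of a language model, a score table instead of a reward network, a plain REINFORCE update instead of PPO), but the three steps mirror the actual recipe: collect human preferences, fit a reward model to them, then fine-tune the policy to score well on that reward model.

```python
# Toy sketch of RLHF, assuming a "chatbot" that merely picks one of three
# canned replies. All names and numbers are hypothetical.
import numpy as np

replies = ["ignore the user", "insult the user", "answer helpfully"]

# 1) Humans compare pairs of replies: (index preferred, index rejected).
preferences = [(2, 0), (2, 1), (0, 1)] * 50

# 2) Fit a reward model from those comparisons (Bradley-Terry style):
#    nudge the preferred reply's score above the rejected one's.
reward = np.zeros(len(replies))
for good, bad in preferences:
    p = 1 / (1 + np.exp(-(reward[good] - reward[bad])))
    reward[good] += 0.1 * (1 - p)
    reward[bad] -= 0.1 * (1 - p)

# 3) Fine-tune the policy (logits over replies) to maximize the learned reward.
logits = np.zeros(len(replies))
rng = np.random.default_rng(0)
for _ in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(len(replies), p=probs)
    advantage = reward[a] - reward @ probs       # reward minus its expectation
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                        # gradient of log pi(a) w.r.t. logits
    logits += 0.05 * advantage * grad_log_pi

final = np.exp(logits) / np.exp(logits).sum()
print(dict(zip(replies, final.round(2))))        # mass shifts toward "answer helpfully"
```

Even in the toy, the weak point is visible: the policy only learns to please the reward model, and the reward model only knows what the human raters happened to compare.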

Uncontrollable

“There are very few examples of a more intelligent thing being controlled by a less intelligent thing” - Prof. Geoffrey Hinton

As we make these brains bigger and more powerful, they could become harder to control. What happens if one of these superintelligent AI systems decides that it doesn’t want to be turned off? This isn’t some fantasy problem - 86% of AI researchers believe that the control problem is real and important. If we cannot control future AI systems, it could be game over for humanity.

Let’s prevent that from happening.