3 November 2022

Four options for avoiding an AI cataclysm

Let’s consider four hard truths, and then four options for a solution.

Hard truth 1: Software has bugs.

Even when clever people write the software, and that software passes numerous verification tests, any complex software system generally still has bugs. If the software encounters a circumstance outside its verification suite, it can go horribly wrong.
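To make that concrete, here is a deliberately tiny sketch in Python. The function name, the readings, and the test cases are all made up for illustration; they are not drawn from any real system. The point is only that a verification suite can pass in full while a perfectly ordinary circumstance, one the suite never exercised, still breaks the software.

```python
def mean_reading(readings):
    """Average a list of sensor readings (hypothetical example function)."""
    return sum(readings) / len(readings)


def run_verification_suite():
    """Hypothetical test suite: every case happens to use a non-empty list."""
    assert mean_reading([2.0, 4.0]) == 3.0
    assert mean_reading([5.0]) == 5.0
    assert mean_reading([-1.0, 1.0]) == 0.0
    return True


# The software passes all of its verification tests.
suite_passed = run_verification_suite()

# A circumstance outside the suite: no readings arrive at all.
try:
    mean_reading([])
    failed_in_the_field = False
except ZeroDivisionError:
    # Dividing by len([]) == 0 raises an error the tests never provoked.
    failed_in_the_field = True
```

Real systems fail in far subtler ways than a division by zero, but the shape of the problem is the same: the tests only vouch for the situations someone thought to include.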

Hard truth 2: Just because software becomes more powerful, that won’t make all the bugs go away.

Newer software may run faster. It may incorporate input from larger sets of training data. It may gain extra features. But none of these developments mean the automatic removal of subtle errors in the logic of the software, or shortcomings in its specification. It might still reach terrible outcomes – just quicker than before!

Hard truth 3: As AI becomes more powerful, there will be more pressure to deploy it in challenging real-world situations.

Consider the real-time management of:

  • Complex arsenals of missiles, anti-missile missiles, and so on
  • Geoengineering interventions, which are intended to bring the planet’s climate back from the brink of a cascade of tipping points
  • Devious countermeasures against the growing weapons systems of a group (or nation) with a dangerously unstable leadership
  • Social network conversations, where changing sentiments can have big implications for electoral dynamics or for the perceived value of commercial brands
  • Ultra-hot plasmas inside whirling magnetic fields in nuclear fusion energy generators
  • Incentives for people to spend more money than is wise, on addictive gambling sites
  • The buying and selling of financial instruments, to take advantage of changing market sentiments.

In each case, powerful AI software could be a very attractive option. A seductive option. Especially if it has been written by clever people, and appears to have a good track record of delivering results.

Until it goes wrong. In which case the result could be cataclysmic. (Accidental nuclear war. The climate walloped past a tipping point in the wrong direction. Malware going existentially wrong. Partisan outrage propelling a psychological loose cannon over the edge. Easy access to weapons of mass destruction. Etc.)

Indeed, the real risk of AI cataclysm – as opposed to the Hollywood version of any such risk – is that an AI system may acquire so much influence over human society and our surrounding environment that a mistake in that system could cataclysmically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.

Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:

  1. Implementation defect
  2. Design defect
  3. Design overridden
  4. Implementation overridden.

Hard truth 4: There are no simple solutions to the risks described above.

What’s more, people who naively assume that a simple solution can easily be put in place (or already exists) are making the overall situation worse. They encourage complacency, whereas greater attention is urgently needed.

But perhaps you disagree?

That’s the context for the conversation in Episode 11 of the London Futurists Podcast, which was published yesterday morning.

In just thirty minutes, that episode dug deep into some of the ideas in my recent book The Singularity Principles. Co-host Calum Chace and I found plenty on which to agree, but had differing opinions on one of the most important questions.

Calum listed three suggestions that people sometimes make for how the dangers of potentially cataclysmic AI might be handled.

In response, I described a different approach – something that Calum said would be a fourth idea for a solution. As you can hear from the recording of the podcast, I evidently left him unconvinced.

Therefore, I’d like to dig even deeper.

Option 1: Humanity gets lucky

It might be the case that AI software that is smart enough will embody an unshakeable commitment toward humanity having the best possible experience.

Such software won’t miscalculate (after all, it is superintelligent). If there are flaws in how it has been specified, it will be smart enough to notice these flaws, rather than stubbornly following through on the letter of its programming. (After all, it is superintelligent.)

Variants of this wishful thinking exist. In some variants, what will guarantee a positive outcome isn’t just a latent tendency of superintelligence toward superbenevolence. It’s the invisible hand of the free market that will guide consumer choices away from software that might harm users, toward software that never, ever, ever goes wrong.

My first response is that software which appears to be bug free can, nevertheless, harbour deep mistakes. It may be superintelligent, but that doesn’t mean it’s omniscient or infallible.

Second, software which is bug free may be monstrously efficient at doing what some of its designers had in mind – manipulating consumers into actions which increase the share price of a given corporation, despite all the externalities arising.

Moreover, it’s too much of a stretch to say that greater intelligence always makes you wiser and kinder. There are plenty of dreadful counterexamples, from humans in the worlds of politics, crime, business, academia, and more. Who is to say that a piece of software with an IQ equivalent to 100,000 will be sure to treat us humans any better than we humans sometimes treat swarms of insects (e.g. ant colonies) that get in our way?

Do you feel lucky? My view is that any such feeling, in these circumstances, is rash in the extreme.

Option 2: Safety engineered in

Might a team of brilliant AI researchers, Mary and Flo (to make up a couple of names), devise a clever method that will ensure their AI (once it is built) never harms humanity?

Perhaps the answer lies in some advanced mathematical wizardry. Or in chiselling a 21st-century version of Asimov’s Laws of Robotics into the chipsets at the heart of computer systems. Or in switching from “correlation logic” to “causation logic”, or some other kind of new paradigm in AI systems engineering.

Of course, I wish Mary and Flo well. But their ongoing research won’t, by itself, prevent lots of other people releasing their own unsafe AI first. Especially when these other engineers are in a hurry to win market share for their companies.

Indeed, the considerable effort being invested by various researchers and organisations in a search for a kind of fix for AI safety is, arguably, a distraction from a sober assessment of the bigger picture. Better technology, better product design, better mathematics, and better hardware can all be part of the full solution. But that full solution also needs, critically, to include aspects of organisational design, economic incentives, legal frameworks, and political oversight. That’s the argument I develop in my book. We ignore these broader forces at our peril.

Option 3: Humans merge with machines

If we can’t beat them, how about joining them?

If human minds are fused into silicon AI systems, won’t the good human sense of these minds counteract any bugs or design flaws in the silicon part of the hybrid formed?

With such a merger in place, human intelligence will automatically be magnified, as AI improves in capability. Therefore, we humans wouldn’t need to worry about being left behind. Right?

I see two big problems with this idea. First, so long as human intelligence is rooted in something like the biology of the brain, the mechanisms for any such merger may only allow relatively modest increases in human intelligence. Our biological brains would be bottlenecks that constrain the speed of progress in this hybrid case. Compared to pure AIs, the human-AI hybrid would, after all, be left behind in this intelligence race. So much for humans staying in control!

An even bigger problem is the realisation that a human with superhuman intelligence is likely to be at least as unpredictable and dangerous as an AI with superhuman intelligence. The magnification of intelligence will allow that superhuman human to do all kinds of things with great vigour – settling grudges, acting out fantasies, demanding attention, pursuing vanity projects, and so on. Recall: power tends to corrupt. Such a person would be able to destroy the earth. Worse, they might want to do so.

Another way to state this point is that, just because AI elements are included inside a person, that won’t magically ensure that these elements become benign, or are subject to the full control of the person’s best intentions. Consider as comparisons what happens when biological viruses enter a person’s body, or when a cancer grows there. In neither case does the intruding element lose its ability to cause damage, just on account of being part of a person who has humanitarian instincts.

This reminds me of the statement that is sometimes heard, in defence of accelerating the capabilities of AI systems: “I am not afraid of artificial intelligence. I am afraid of human stupidity”.

In reality, what we need to fear is the combination of imperfect AI and imperfect humanity.

The conclusion of this line of discussion is that we need to do considerably more than enable greater intelligence. We also need to accelerate greater wisdom – so that any beings with superhuman intelligence will operate truly beneficently.

Option 4: Greater wisdom

The cornerstone insight of ethics is that, just because we can do something, and indeed may even want to do that thing, it doesn’t mean we should do that thing.

Accordingly, human societies since prehistory have placed constraints on how people should behave.

Sometimes, moral sanction is sufficient: people constrain their actions in deference to public opinion. In other cases, restrictions are codified into laws and regulations.

Likewise, just because a corporation could boost its profits by releasing a new version of its AI software, that doesn’t mean it should release that software.

But what is the origin of these “should” imperatives? And how do we resolve conflicts, when two different groups of people champion two different sets of ethical intuitions?

Where can we find a viable foundation for ethical restrictions – something more solid than “we’ve always done things like this” or “this feels right to me” or “we need to submit to the dictates in our favourite holy scripture”?

Welcome to the world of philosophy.

It’s a world that, according to some observers, has made little progress over the centuries. People still argue over fundamentals. Deontologists square off against consequentialists. Virtue ethicists stake out a different position.

It’s a world in which it is easier to poke holes in the views held by others than to defend a consistent view of your own.

But it’s my position that the impending threat of cataclysmic AI impels us to reach a wiser agreement.

It’s like how the devastation of the Covid pandemic impelled society to find significantly quicker ways to manufacture, verify, and deploy vaccines.

It’s like how society can come together, remarkably, in a wartime situation, notwithstanding the divisions that previously existed.

In the face of the threats of technology beyond our control, minds should focus, with unprecedented clarity. We’ll gradually build a wider consensus in favour of various restrictions and, yes, in favour of various incentives.

What’s your reaction? Is option 4 simply naïve?

Practical steps forward

Rather than trying to “boil the ocean” of philosophical disputes over contrasting ethical foundations, we can, and should, proceed in a kaizen manner.

To start with, we can give our attention to specific individual questions:

  • What are the circumstances when we should welcome AI-powered facial recognition software, and when should we resist it?
  • What are the circumstances when we should welcome AI systems that supervise aspects of dangerous weaponry?
  • What are the circumstances that could transform AI-powered monitoring systems from dangerous to helpful?

As we reach some tentative agreements on these individual matters, we can take the time to highlight principles with potential wider applicability.

In parallel, we can revisit some of the agreements (explicit and implicit) for how we measure the health of society and the liberties of individuals:

  • The GDP (Gross Domestic Product) statistics that provide a perspective on economic activities
  • The UDHR (Universal Declaration of Human Rights) statement that was endorsed in the United Nations General Assembly in 1948.

I don’t deny it will be hard to build consensus. It will be even harder to agree how to enforce the guidelines arising – especially in light of the wretched partisan conflicts that are poisoning the political processes in a number of parts of the world.

But we must try. And with some small wins under our belt, we can anticipate momentum building.

These are some of the topics I cover in the closing chapters of The Singularity Principles.

I by no means claim to know all the answers.

But I do believe that these are some of the most important questions to address.

And one thing that could help us make progress is – you guessed it – AI. In the right circumstances, AI can help us think more clearly, and can propose new syntheses of our previous ideas.

Thus today’s AI can provide stepping stones to the design and deployment of better, safer, wiser AI tomorrow. That’s provided we maintain human oversight.


The image above includes a design by Pixabay user Alexander Antropov, used with thanks.

See also this article by Calum in Forbes, Taking Back Control Of The Singularity.
