
23 February 2023

Nuclear-level catastrophe: four responses

36% of respondents agree that it is plausible that AI could produce catastrophic outcomes in this century, on the level of all-out nuclear war.

That’s 36% of a rather special group of people. People who replied to this survey needed to meet the criterion of being a named author on at least two papers published in the last three years in accredited journals in the field of Computational Linguistics (CL) – sometimes also known as NLP (Natural Language Processing).

The survey took place in May and June 2022. 327 complete responses were received from people matching the criteria.

A full report on this survey (31 pages) is available here (PDF).

Here’s a screenshot from page 10 of the report, illustrating the answers to questions about Artificial General Intelligence (AGI):

You can see the responses to question 3-4. 36% of the respondents either “agreed” or “weakly agreed” with the statement that

It is plausible that decisions made by AI or machine learning systems could cause a catastrophe this century that is at least as bad as an all-out nuclear war.

That statistic is a useful backdrop to discussions stirred up in the last few days by a video interview given by polymath autodidact and long-time AGI risk researcher Eliezer Yudkowsky:

The publishers of that video chose the eye-catching title “we’re all gonna die”.

If you don’t want to spend 90 minutes watching that video – or if you are personally alienated by Eliezer’s communication style – here’s a useful twitter thread summary by Liron Shapira:

In contrast to the question posed in the NLP survey I mentioned earlier, Eliezer isn’t thinking about “outcomes of AGI in this century”. His timescales are much shorter. His “ballpark estimate” for the time before AGI arrives is “3-15 years”.

How are people reacting to this sombre prediction?

More generally, what responses are there to the statistic that, as quoted above,

36% of respondents agree that it is plausible that AI could produce catastrophic outcomes in this century, on the level of all-out nuclear war.

I’ve seen a lot of different reactions. They break down into four groups: denial, sabotage, trust, and hustle.

1. Denial

One example of denial is this claim: We’re nowhere near understanding the magic of human minds. Therefore there’s no chance that engineers are going to duplicate that magic in artificial systems.

I have two counters:

  1. The risks of AGI arise not because the AI may somehow become sentient and take on the unpleasant aspects of alpha-male human nature. Rather, the risks arise from systems that operate beyond our understanding and outside our control, and which may end up pursuing objectives different from the ones we thought (or wished) we had programmed into them.
  2. Many systems have been created over the decades without the underlying science being fully understood. Steam engines predated the laws of thermodynamics. More recently, LLMs (Large Language Model AIs) have demonstrated aspects of intelligence that the designers of these systems had not anticipated. In the same way, AIs with some extra features may unexpectedly tip over into greater general intelligence.

Another example of denial: Some very smart people say they don’t believe that AGI poses risks. Therefore we don’t need to pay any more attention to this stupid idea.

My counters:

  1. The mere fact that someone very smart asserts an idea – likely outside of their own field of special expertise – does not confirm the idea is correct
  2. None of these purported objections to the possibility of AGI risk holds water (for a longer discussion, see my book The Singularity Principles).

Digging further into various online discussion threads, I formed the impression that what was motivating some of the denial was often a terrible fear. The people loudly proclaiming their denial were trying to cope with depression. The thought of potential human extinction within just 3-15 years was simply too dreadful for them to contemplate.

It’s similar to how people sometimes cope with the death of someone dear to them. There’s a chance my dear friend has now been reunited in an afterlife with their beloved grandparents, they whisper to themselves. Or, It’s sweet and honourable to die for your country: this death was a glorious sacrifice. And then woe betide any uppity humanist who dares to suggest there is no afterlife, or that patriotism is the last refuge of a scoundrel!

Likewise, woe betide any uppity AI risk researcher who dares to suggest that AGI might not be so benign after all! Deny! Deny!! Deny!!!

(For more on this line of thinking, see my short chapter “The Denial of the Singularity” in The Singularity Principles.)

A different motivation for denial is the belief that any sufficient “cure” for the risk of AGI catastrophe would be worse than the risk it was trying to address. This line of thinking goes as follows:

  • A solution to AGI risk will involve pervasive monitoring and widespread restrictions
  • Such monitoring and restrictions will only be possible if an autocratic world government is put in place
  • Any autocratic world government would be absolutely terrible
  • Therefore, the risk of AGI can’t be that bad after all.

I’ll come back later to the flaws in that particular argument. (In the meantime, see if you can spot what’s wrong.)

2. Sabotage

In the video interview, Eliezer made one suggestion for avoiding AGI catastrophe: Destroy all the GPU server farms.

These vast collections of GPUs (a special kind of computing chip) are what enable the training of many types of AI. If these chips were all put out of action, it would delay the arrival of AGI, giving humanity more time to work out a better way of coexisting with AGI.

Another suggestion Eliezer makes is that the superbright people who are currently working flat out to increase the capabilities of their AI systems should be paid large amounts of money to do nothing. They could lounge about on a beach all day, and still earn more money than they are currently receiving from OpenAI, DeepMind, or whoever is employing them. Once again, that would slow down the emergence of AGI, and buy humanity more time.

I’ve seen other similar suggestions online, which I won’t repeat here, since they come close to acts of terrorism.

What all these suggestions have in common is this: let’s find ways to stop the development of AI in its tracks, all across the world. Companies should be stopped in their tracks. Shadowy military research groups should be stopped in their tracks. Open source hackers should be stopped in their tracks. North Korean ransomware hackers must be stopped in their tracks.

This isn’t just a suggestion that specific AI developments – namely those with an explicit target of creating AGI – should be halted. Instead, it recognises that the creation of AGI might occur via unexpected routes. Improvements to various narrow AI systems – fact-checking, emotion recognition, or online request interchange marketplaces – could each push a collection of AI modules over the critical threshold. Mixing metaphors, AI could go nuclear.

Shutting down all these research activities seems a very tall order. Especially since many of the people who are currently working flat out to increase AI capabilities are motivated, not by money, but by the vision that better AI could do a tremendous amount of good in the world: curing cancer, solving nuclear fusion, improving agriculture by leaps and bounds, and so on. They’re not going to be easy to persuade to change course. For them, there’s a lot more at stake than money.

I have more to say about the question “To AGI or not AGI” in this chapter. In short, I’m deeply sceptical that development can be stopped in its tracks worldwide.

In response, a would-be saboteur may admit that their chances of success are low. But what do you suggest instead, they will ask.

Read on.

3. Trust

Let’s start again from the statistic that 36% of the NLP survey respondents agreed, with varying degrees of confidence, that advanced AI could trigger a catastrophe as bad as an all-out nuclear war some time this century.

It’s a pity that the question wasn’t asked with shorter timescales. Comparing the chances of an AI-induced global catastrophe in the next 15 years with one in the next 85 years:

  • The longer timescale makes it more likely that AGI will be developed
  • The shorter timescale makes it more likely that AGI safety research will still be at a primitive (deeply ineffective) level.

Even in the short time since the survey was conducted – May and June 2022 – many forecasters have shortened their estimates of the likely timeline to the arrival of AGI.

So, for the sake of the argument, let’s suppose that the risk of an AI-induced global catastrophe happening by 2038 (15 years from now) is 1/10.
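
To make the arithmetic behind these timescale comparisons concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes, purely for illustration, a constant and independent annual probability of catastrophe – an assumption of mine, not of the survey or of the argument above – and shows how the same annual risk compounds very differently over a 15-year window and an 85-year window.

```python
# Back-of-the-envelope sketch (illustrative only): how a constant annual
# risk compounds over different horizons. The constant-rate assumption is
# mine, for simplicity; nothing in the survey implies risk is spread evenly.

def cumulative_risk(annual_p: float, years: int) -> float:
    """Probability of at least one catastrophe within `years`,
    given an independent annual probability `annual_p`."""
    return 1 - (1 - annual_p) ** years

def implied_annual_risk(cumulative_p: float, years: int) -> float:
    """Annual probability implied by a cumulative probability over `years`."""
    return 1 - (1 - cumulative_p) ** (1 / years)

# Suppose, as above, a 1/10 chance of catastrophe by 2038 (15 years out).
p_annual = implied_annual_risk(0.10, 15)
print(f"Implied annual risk:           {p_annual:.2%}")                       # ~0.70%
print(f"Cumulative risk over 15 years: {cumulative_risk(p_annual, 15):.0%}")  # 10%
print(f"Cumulative risk over 85 years: {cumulative_risk(p_annual, 85):.0%}")  # ~45%
```

Either way, the headline figure to react to is that 1/10 chance by 2038.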

There are two ways to react to this:

  • 1/10 is fine odds. I feel lucky. What’s more, there are plenty of reasons why we ought to feel lucky
  • 1/10 is terrible odds. That’s far too high a risk to accept. We need to hustle to find ways to change these odds in our favour.

I’ll come to the hustle response in a moment. But let’s first consider the trust response.

A good example is in this comment from SingularityNET founder and CEO Ben Goertzel:

Eliezer is a very serious thinker on these matters and was the core source of most of the ideas in Nick Bostrom’s influential book Superintelligence. But ever since I met him, and first debated these issues with him, back in 2000, I have felt he had a somewhat narrow view of humanity and the universe in general.

There are currents of love and wisdom in our world that he is not considering and seems to be mostly unaware of, and that we can tap into by creating self reflective compassionate AGIs and doing good loving works together with them.

In short, rather than fearing humanity, we should learn to trust humanity. Rather than fearing what AGI will do, we should trust that AGI can do wonderful things.

You can find a much longer version of Ben’s views in the review he wrote back in 2015 of Superintelligence. It’s well worth reading.

What are the grounds for hope? Humanity has come through major challenges in the past. Even though the scale of the challenge is more daunting on this occasion, there are also more people contributing ideas and inspiration than before. AI is more accessible than nuclear weapons, which increases the danger level, but AI could also be deployed as part of the solution, rather than just being a threat.

Another idea is that if an AI looks around for data teaching it which values to respect and uphold, it will find plenty of positive examples in great human literature. OK, that literature also includes lots of treachery, and different moral codes often conflict, but a wise AGI should be able to see through all these complications to discern the importance of defending human flourishing. OK, much of AI training at the moment focuses on deception, manipulation, enticement, and surveillance, but, again, we can hope that a wise AGI will set aside those nastier aspects of human behaviour. Rather than aping trolls or clickbait, we can hope that AGI will echo the better angels of human nature.

It’s also possible that, just as DeepMind’s AlphaZero worked out by itself, without learning from human games, superior strategies at the board games Go and Chess, a future AI might work out, by itself, the principles of universal morality. (That’s assuming such principles exist.)

We would still have to hope, in such a case, that the AI that worked out the principles of universal morality would decide to follow those principles, rather than some alternative (alien) way of thinking.

But surely hope is better than despair?

To quote Ben Goertzel again:

Despondence is unwarranted and unproductive. We need to focus on optimistically maximizing odds of a wildly beneficial Singularity together.   

My view is the same as expressed by Berkeley professor of AI Stuart Russell, in part of a lengthy exchange with Steven Pinker on the subject of AGI risks:

The meta argument is that if we don’t talk about the failure modes, we won’t be able to address them…

Just like in nuclear safety, it’s not against the rules to raise possible failure modes like, what if this molten sodium that you’re proposing should flow around all these pipes? What if it ever came into contact with the water that’s on the turbine side of the system? Wouldn’t you have a massive explosion which could rip off the containment and so on? That’s not exactly what happened in Chernobyl, but not so dissimilar…

The idea that we could solve that problem without even mentioning it, without even talking about it and without even pointing out why it’s difficult and why it’s important, that’s not the culture of safety. That’s sort of more like the culture of the communist party committee in Chernobyl, that simply continued to assert that nothing bad was happening.

(By the way, my sympathies in that long discussion, when it comes to AGI risk, are approximately 100.0% with Russell and approximately 0.0% with Pinker.)

4. Hustle

The story so far:

  • The risks are real (though estimates of their probability vary)
  • Some possible “solutions” to the risks might produce results that are, by some calculations, worse than letting AGI take its own course
  • If we want to improve our odds of survival – and, indeed, for humanity to reach something like a sustainable superabundance with the assistance of advanced AIs – we need to be able to take a clear, candid view of the risks facing us
  • Being naïve about the dangers we face is unlikely to be the best way forward
  • Since time may be short, the time to press for better answers is now
  • We shouldn’t despair. We should hustle.

Some ways in which research could generate useful new insight relatively quickly:

  • When the NLP survey respondents expressed their views, what reasons did they have for disagreeing with the statement? And what reasons did they have for agreeing with it? And how do these reasons stand up, in the cold light of a clear analysis? (In other words, rather than a one-time survey, an iterative Delphi survey should lead to deeper understanding.)
  • Why have the various AI safety initiatives formed in the wake of the Puerto Rico and Asilomar conferences of 2015 and 2017 fallen so far short of expectations?
  • Which descriptions of potential catastrophic AI failure modes are most likely to change the minds of those critics who currently like to shrug off failure scenarios as “unrealistic” or “Hollywood fantasy”?

Constructively, I invite conversation on the strengths and weaknesses of the 21 Singularity Principles that I have suggested as ways to improve the chances of beneficial AGI outcomes.

For example:

  • Can we identify “middle ways” that include important elements of global monitoring and auditing of AI systems, without collapsing into autocratic global government?
  • Can we improve the interpretability and explainability of advanced AI systems (perhaps with the help of trusted narrow AI tools), to diminish the risks of these systems unexpectedly behaving in ways their designers failed to anticipate?
  • Can we deepen our understanding of the ways new capabilities “emerge” in advanced AI systems, with a particular focus on preventing the emergence of alternative goals?

I also believe we should explore more fully the possibility that an AGI will converge on a set of universal values, independent of whatever training we provide it – and, moreover, the possibility that these values will include upholding human flourishing.

And despite me saying just now that these values would be “independent of whatever training we provide”, is there, nevertheless, a way for us to tilt the landscape so that the AGI is more likely to reach and respect these conclusions?

Postscript

To join me in “camp hustle”, visit Future Surge, which is the activist wing of London Futurists.

If you’re interested in the ideas of my book The Singularity Principles, here’s a podcast episode in which Calum Chace and I discuss some of these ideas more fully.

In a subsequent episode of our podcast, Calum and I took another look at the same topics, this time with Millennium Project Executive Director Jerome Glenn: “Governing the transition to AGI”.

3 November 2022

Four options for avoiding an AI cataclysm

Let’s consider four hard truths, and then four options for a solution.

Hard truth 1: Software has bugs.

Even when clever people write the software, and that software passes numerous verification tests, any complex software system generally still has bugs. If the software encounters a circumstance outside its verification suite, it can go horribly wrong.
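
As a toy illustration of this hard truth – my own hypothetical example, not one taken from the text – here is a tiny function that passes every test its authors thought to write, yet fails the moment it meets a circumstance outside that verification suite.

```python
# Toy illustration (hypothetical example of mine): a function that passes
# its entire verification suite, yet fails on an unanticipated input.

def average_speed(distance_km: float, time_hours: float) -> float:
    # Looks sensible, and every "verified" case below passes.
    return distance_km / time_hours

# The verification suite: all the cases the developers thought to check.
assert average_speed(100, 2) == 50
assert average_speed(0, 1) == 0
print("All verification tests pass")

# A circumstance nobody anticipated: a zero-duration log entry.
# This raises ZeroDivisionError - the flaw was present all along,
# just outside the verification suite.
average_speed(100, 0)
```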

Hard truth 2: Just because software becomes more powerful, that won’t make all the bugs go away.

Newer software may run faster. It may incorporate input from larger sets of training data. It may gain extra features. But none of these developments mean the automatic removal of subtle errors in the logic of the software, or shortcomings in its specification. It might still reach terrible outcomes – just quicker than before!

Hard truth 3: As AI becomes more powerful, there will be more pressure to deploy it in challenging real-world situations.

Consider the real-time management of:

  • Complex arsenals of missiles, anti-missile missiles, and so on
  • Geoengineering interventions, which are intended to bring the planet’s climate back from the brink of a cascade of tipping points
  • Devious countermeasures against the growing weapons systems of a group (or nation) with a dangerously unstable leadership
  • Social network conversations, where changing sentiments can have big implications for electoral dynamics or for the perceived value of commercial brands
  • Ultra-hot plasmas inside whirling magnetic fields in nuclear fusion energy generators
  • Incentives for people to spend more money than is wise, on addictive gambling sites
  • The buying and selling of financial instruments, to take advantage of changing market sentiments.

In each case, powerful AI software could be a very attractive option. A seductive option. Especially if it has been written by clever people, and appears to have a good track record of delivering results.

Until it goes wrong. In which case the result could be cataclysmic. (Accidental nuclear war. The climate walloped past a tipping point in the wrong direction. Malware going existentially wrong. Partisan outrage propelling a psychological loose cannon over the edge. Easy access to weapons of mass destruction. Etc.)

Indeed, the real risk of AI cataclysm – as opposed to the Hollywood version of any such risk – is that an AI system may acquire so much influence over human society and our surrounding environment that a mistake in that system could cataclysmically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.

Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:

  1. Implementation defect
  2. Design defect
  3. Design overridden
  4. Implementation overridden.

Hard truth 4: There are no simple solutions to the risks described above.

What’s more, people who naively assume that a simple solution can easily be put in place (or already exists) are making the overall situation worse. They encourage complacency, whereas greater attention is urgently needed.

But perhaps you disagree?

That’s the context for the conversation in Episode 11 of the London Futurists Podcast, which was published yesterday morning.

In just thirty minutes, that episode dug deep into some of the ideas in my recent book The Singularity Principles. Co-host Calum Chace and I found plenty on which to agree, but had differing opinions on one of the most important questions.

Calum listed three suggestions that people sometimes make for how the dangers of potentially cataclysmic AI might be handled.

In response, I described a different approach – something that Calum said would be a fourth idea for a solution. As you can hear from the recording of the podcast, I evidently left him unconvinced.

Therefore, I’d like to dig even deeper.

Option 1: Humanity gets lucky

It might be the case that AI software that is smart enough will embody an unshakeable commitment toward humanity having the best possible experience.

Such software won’t miscalculate (after all, it is superintelligent). If there are flaws in how it has been specified, it will be smart enough to notice these flaws, rather than stubbornly following through on the letter of its programming. (After all, it is superintelligent.)

Variants of this wishful thinking exist. In some variants, what will guarantee a positive outcome isn’t just a latent tendency of superintelligence toward superbenevolence. It’s the invisible hand of the free market that will guide consumer choices away from software that might harm users, toward software that never, ever, ever goes wrong.

My response here is twofold. First, software which appears to be bug free can, nevertheless, harbour deep mistakes. It may be superintelligent, but that doesn’t mean it’s omniscient or infallible.

Second, software which is bug free may be monstrously efficient at doing what some of its designers had in mind – manipulating consumers into actions which increase the share price of a given corporation, regardless of the externalities that arise.

Moreover, it’s too much of a stretch to say that greater intelligence always makes you wiser and kinder. There are plenty of dreadful counterexamples, from humans in the worlds of politics, crime, business, academia, and more. Who is to say that a piece of software with an IQ equivalent to 100,000 will be sure to treat us humans any better than we humans sometimes treat swarms of insects (e.g. ant colonies) that get in our way?

Do you feel lucky? My view is that any such feeling, in these circumstances, is rash in the extreme.

Option 2: Safety engineered in

Might a team of brilliant AI researchers, Mary and Flo (to make up a couple of names), devise a clever method that will ensure their AI (once it is built) never harms humanity?

Perhaps the answer lies in some advanced mathematical wizardry. Or in chiselling a 21st century version of Asimov’s Laws of Robotics into the chipsets at the heart of computer systems. Or in switching from “correlation logic” to “causation logic”, or some other kind of new paradigm in AI systems engineering.

Of course, I wish Mary and Flo well. But their ongoing research won’t, by itself, prevent lots of other people releasing their own unsafe AI first. Especially when these other engineers are in a hurry to win market share for their companies.

Indeed, the considerable effort being invested by various researchers and organisations in the search for some kind of technical fix for AI safety is, arguably, a distraction from a sober assessment of the bigger picture. Better technology, better product design, better mathematics, and better hardware can all be part of the full solution. But that full solution also needs, critically, to include aspects of organisational design, economic incentives, legal frameworks, and political oversight. That’s the argument I develop in my book. We ignore these broader forces at our peril.

Option 3: Humans merge with machines

If we can’t beat them, how about joining them?

If human minds are fused into silicon AI systems, won’t the good human sense of these minds counteract any bugs or design flaws in the silicon part of the resulting hybrid?

With such a merger in place, human intelligence will automatically be magnified, as AI improves in capability. Therefore, we humans wouldn’t need to worry about being left behind. Right?

I see two big problems with this idea. First, so long as human intelligence is rooted in something like the biology of the brain, the mechanisms for any such merger may only allow relatively modest increases in human intelligence. Our biological brains would be bottlenecks that constrain the speed of progress in this hybrid case. Compared to pure AIs, the human-AI hybrid would, after all, be left behind in this intelligence race. So much for humans staying in control!

An even bigger problem is the realisation that a human with superhuman intelligence is likely to be at least as unpredictable and dangerous as an AI with superhuman intelligence. The magnification of intelligence will allow that superhuman human to do all kinds of things with great vigour – settling grudges, acting out fantasies, demanding attention, pursuing vanity projects, and so on. Recall: power tends to corrupt. Such a person would be able to destroy the earth. Worse, they might want to do so.

Another way to state this point is that, just because AI elements are included inside a person, that won’t magically ensure that these elements become benign, or are subject to the full control of the person’s best intentions. Consider as comparisons what happens when biological viruses enter a person’s body, or when a cancer grows there. In neither case does the intruding element lose its ability to cause damage, just on account of being part of a person who has humanitarian instincts.

This reminds me of the statement that is sometimes heard, in defence of accelerating the capabilities of AI systems: “I am not afraid of artificial intelligence. I am afraid of human stupidity”.

In reality, what we need to fear is the combination of imperfect AI and imperfect humanity.

The conclusion of this line of discussion is that we need to do considerably more than enable greater intelligence. We also need to accelerate greater wisdom – so that any beings with superhuman intelligence will operate truly beneficently.

Option 4: Greater wisdom

The cornerstone insight of ethics is that, just because we can do something, and indeed may even want to do that thing, it doesn’t mean we should do that thing.

Accordingly, human societies since prehistory have placed constraints on how people should behave.

Sometimes, moral sanction is sufficient: people constrain their actions in deference to public opinion. In other cases, restrictions are codified into laws and regulations.

Likewise, just because a corporation could boost its profits by releasing a new version of its AI software, that doesn’t mean it should release that software.

But what is the origin of these “should” imperatives? And how do we resolve conflicts, when two different groups of people champion two different sets of ethical intuitions?

Where can we find a viable foundation for ethical restrictions – something more solid than “we’ve always done things like this” or “this feels right to me” or “we need to submit to the dictates in our favourite holy scripture”?

Welcome to the world of philosophy.

It’s a world that, according to some observers, has made little progress over the centuries. People still argue over fundamentals. Deontologists square off against consequentialists. Virtue ethicists stake out a different position.

It’s a world in which it is easier to poke holes in the views held by others than to defend a consistent view of your own.

But it’s my position that the impending threat of cataclysmic AI impels us to reach a wiser agreement.

It’s like how the devastation of the Covid pandemic impelled society to find significantly quicker ways to manufacture, verify, and deploy vaccines.

It’s like how society can come together, remarkably, in a wartime situation, notwithstanding the divisions that previously existed.

In the face of the threats of technology beyond our control, minds should focus, with unprecedented clarity. We’ll gradually build a wider consensus in favour of various restrictions and, yes, in favour of various incentives.

What’s your reaction? Is option 4 simply naïve?

Practical steps forward

Rather than trying to “boil the ocean” of philosophical disputes over contrasting ethical foundations, we can, and should, proceed in a kaizen manner.

To start with, we can give our attention to specific individual questions:

  • What are the circumstances when we should welcome AI-powered facial recognition software, and when should we resist it?
  • What are the circumstances when we should welcome AI systems that supervise aspects of dangerous weaponry?
  • What are the circumstances that could transform AI-powered monitoring systems from dangerous to helpful?

As we reach some tentative agreements on these individual matters, we can take the time to highlight principles with potential wider applicability.

In parallel, we can revisit some of the agreements (explicit and implicit) for how we measure the health of society and the liberties of individuals:

  • The GDP (Gross Domestic Product) statistics that provide a perspective on economic activities
  • The UDHR (Universal Declaration of Human Rights) statement that was endorsed in the United Nations General Assembly in 1948.

I don’t deny it will be hard to build consensus. It will be even harder to agree how to enforce the guidelines arising – especially in light of the wretched partisan conflicts that are poisoning the political processes in a number of parts of the world.

But we must try. And with some small wins under our belt, we can anticipate momentum building.

These are some of the topics I cover in the closing chapters of The Singularity Principles.

I by no means claim to know all the answers.

But I do believe that these are some of the most important questions to address.

And something that could help us make progress is – you guessed it – AI. In the right circumstances, AI can help us think more clearly, and can propose new syntheses of our previous ideas.

Thus today’s AI can provide stepping stones to the design and deployment of better, safer, wiser AI tomorrow. That’s provided we maintain human oversight.

Footnotes

The image above includes a design by Pixabay user Alexander Antropov, used with thanks.

See also this article by Calum in Forbes, Taking Back Control Of The Singularity.
