dw2

17 November 2024

Preventing unsafe superintelligence: four choices

More and more people have come to the conclusion that artificial superintelligence (ASI) could, in at least some circumstances, pose catastrophic risks to the wellbeing of billions of people around the world, and that, therefore, something must be done to reduce these risks.

However, there’s a big divergence of views about what should be done. And there’s little clarity about the underlying assumptions on which different strategies depend.

Accordingly, I seek in this article to untangle some of the choices that need to be made. I’ll highlight four choices that various activists promote.

The choices differ regarding the number of different organisations worldwide that are envisioned as being legally permitted to develop and deploy what could become ASI. The four choices are:

  1. Accept that many different organisations will each pursue their own course toward ASI, but urge each of them to be very careful and to significantly increase the focus on AI safety compared to the present situation
  2. Seek to restrict to just one organisation in the world any developments that could lead to ASI; that’s in order to avoid dangerous competitive race dynamics if there is more than one such organisation
  3. Seek agreements that will prevent any organisation, anywhere in the world, from taking specific steps that might bring about ASI, until such time as it has become absolutely clear how to ensure that ASI is safe
  4. Seek a global pause on any platform-level improvements on AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

For simplicity, these choices can be labelled as:

  1. Be careful with ASI
  2. Restrict ASI
  3. Pause ASI
  4. Pause all new AI

It’s a profound decision for humanity to take. Which of the four doors should we open, and which of the four corridors should we walk down?

Each of the four choices relies on some element of voluntary cooperation, arising out of enlightened self-interest, and on some element of compulsion – that is, national and international governance, backed up by sanctions and other policies.

What makes this decision hard is that there are strong arguments against each choice.

The case against option 1, “Be careful with ASI”, is that at least some organisations (including commercial entities and military groups) are likely to cut corners with their design and testing. They don’t want to lose what they see as a race with existential consequences. The organisations that are being careful will lose their chance of victory. The organisations that instead proceed gung-ho, with less care, may imagine that they will fix any problems with their AIs when these flaws become apparent – only to find that there’s no way back from one particular catastrophic failure.

As Sam Altman, CEO of OpenAI, has said: it will be “lights out for all of us”.

The case against each of the remaining three options is twofold:

  • First, in all three cases, they will require what seems to be an impossible degree of global cooperation – which will need to be maintained for an implausibly long period of time
  • Second, such restrictions will stifle the innovative development of the very tools (that is, advanced AI) which will actually solve existential problems (including the threat of rogue ASI, as well as the likes of climate change, cancer, and aging), rather than making these problems worse.

The counter to these objections is that a sufficient number of the world’s most powerful countries will understand the rationale for such an agreement, as something that is in their mutual self-interest, regardless of the many other differences that divide them. That shared understanding will propel them:

  • To hammer out an agreement (probably via a number of stages), despite undercurrents of mistrust,
  • To put that agreement into action, alongside measures to monitor conformance, and
  • To prevent other countries (who have not yet signed up to the agreement) from breaching its terms.

Specifically, the shared understanding will cover seven points:

  1. For each of the countries involved, it is in their mutual self-interest to constrain the development and deployment of what could become catastrophically dangerous ASI; that is, there’s no point in winning what will be a suicide race
  2. The major economic and humanitarian benefits that they each hope could be delivered by advanced AI (including solutions to other existential risks) can in fact be delivered by passive AIs which are restricted from reaching the level of ASI
  3. There already exist a number of good ideas regarding potential policy measures (regulations and incentives) which can be adopted, around the world, to prevent the development and deployment of catastrophically dangerous AI – for example, measures to control the spread and use of vast computing resources
  4. There also exist a number of good ideas regarding options for monitoring and auditing which can also be adopted, around the world, to ensure the strict application of the agreed policy measures – and to prevent malign action by groups or individuals that have, so far, failed to sign up to the policies
  5. All of the above can be achieved without any detrimental loss of individual sovereignty: the leaders of these countries can remain masters within their own realms, as they desire, provided that the above basic AI safety framework is adopted and maintained
  6. All of the above can be achieved in a way that supports evolutionary changes in the AI safety framework as more insight is obtained; in other words, this system can (and must) be agile rather than static
  7. Even though the above safety framework is yet to be fully developed and agreed, there are plenty of ideas for how it can be rapidly developed, so long as that project is given sufficient resources.

The first two parts of this shared seven-part understanding are particularly important. Without the first part, there will be an insufficient sense of urgency, and the question will be pushed off the agenda in favour of other topics that are more “politically correct” (alas, that is a common failure mode of the United Nations). Without the second part, there will be an insufficient enthusiasm, with lots of backsliding.

What will make this vision of global collaboration more attractive will be the establishment of credible “benefit sharing” mechanisms, designed and enshrined in international agreements. That is, countries which agree to give up some of their own AI development aspirations, in line with the emerging global AI safety agreement, will be guaranteed to receive a substantive share of the pipeline of abundance that ever more powerful passive AIs enable humanity to create.

To be clear, this global agreement absolutely needs to include both the USA and China – the two countries that are currently most likely to give birth to ASI. Excluding one or the other will lead back to the undesirable race condition that characterises the first of the four choices open to humanity – the (naïve) appeal for individual organisations simply to “be careful”.

This still leaves a number of sharp complications.

First, note that the second part of the above shared seven-part agreement – the vision of what passive AIs can produce on behalf of humanity – is less plausible for Choice 4 of the list shown earlier, in which there is a global pause on any platform-level improvements on AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

If all improvements to AI are blocked, out of a Choice 4 message of “overwhelming caution”, it will shatter the credibility of the idea that today’s passive AI systems can be smoothly upgraded to provide humanity with an abundance of solutions such as green energy, nutritious food, accessible healthcare, reliable accommodation, comprehensive education, and more.

Obtaining global agreement to that more demanding restriction will be a much harder sell.

The difference between Choice 4 and Choice 3 is that Choice 3 enumerates specific restrictions on the improvements permitted to be made to today’s AI systems. One example of a set of such restrictions is given in “Phase 0: Safety” of the recently published project proposal A Narrow Path (produced by ControlAI). Without going into details here, let me simply list some of the headlines:

  • Prohibit AIs capable of breaking out of their environment
  • Prohibit the development and use of AIs that improve other AIs (at machine speed)
  • Only allow the deployment of AI systems with a valid safety justification
  • A licensing regime and restrictions on the general intelligence of AI systems
    • Training Licence
    • Compute Licence
    • Application Licence
  • Monitoring and Enforcement

Personally, I believe this list is as good a starting point as any other that I have seen so far.

I accept, however, that it remains possible that other modifications to existing AI systems could unexpectedly provide these systems with catastrophically dangerous capabilities. That’s because we still have only a rudimentary understanding of:

  1. How new AI capabilities sometimes “emerge” from apparently simpler systems
  2. The potential consequences of new AI capabilities
  3. How complicated human general reasoning is – that is, how large is the gap between today’s AI and human-level general reasoning.

Additionally, it is possible that new AIs will somehow evade or mislead the scrutiny of the processes that are put in place to monitor for unexpected changes in capabilities.

For all these reasons, another aspect of the proposals in A Narrow Path should be pursued with urgent priority: the development of a “science of intelligence” and an associated “metrology of intelligence” that will allow a more reliable prediction of the capabilities of new AI systems before they are actually switched on.

So, my own proposal would be for a global agreement to start with Choice 3 (which is more permissive than Choice 4), but that the agreement should acknowledge up front the possible need to switch the choice at a later stage to either Choice 4 (if the science of intelligence proceeds badly) or Choice 2 (if that science proceeds well).

Restrict or Pause?

That leaves the question of whether Choice 3 (“Pause ASI”) or Choice 2 (“Restrict ASI” – to just a single global body) should be humanity’s initial choice.

The argument for Choice 2 is that a global pause surely won’t last long. It might be tenable in the short term, when only a very few countries have the capability to train AI models more powerful than the current crop. However, over time, improvements in hardware, software, data processing, or goodness knows what (quantum computing?) will mean that these capabilities will become more widespread.

If that’s true – if various rogue organisations are bound to be able to build an ASI in due course – it will be better for a carefully picked group of people to build ASI first, under the scrutiny of the world’s leading AI safety researchers, economists, and so on.

That’s the case for Choice 2.

Against that choice, and in favour instead of Choice 3, I offer two considerations.

First, even if the people building ASI are doing so with great care – away from any pressures of an overt race with other organisations with broadly equivalent abilities – there are still risks of ASI breaking away from our understanding and control. As ASI emerges, it may regard with disdain the set of ethical principles we humans have tried to program deep into its bowels, and cast them out. Moreover, even if ASI is deliberately kept in some supposedly ultra-secure environment, that perimeter may be breached.

Second, I challenge the suggestion that any pause in the development of ASI could be at most short-lived. There are three factors which could significantly extend its duration:

  • Carefully designed narrow AIs could play roles in improved monitoring of what development teams are doing with AI around the world – that is, systems for monitoring and auditing could improve at least as fast as systems for training and deploying
  • Once the horrific risks of uncontrolled ASI are better understood, people’s motivations to create unsafe ASI will reduce – and there will be an increase in the motivation of other people to notice and call out rogue AI development efforts
  • Once the plan has become clearer, for producing a sustainable superabundance for all, just using passive AI (instead of pushing AI all the way to active superintelligence), motivations around the world will morph from negative fear to positive anticipation.

That’s why, again, I state that my own preferred route forward is a growing international agreement along the lines of the seven points listed above, with an initial selection of Choice 3 (“Pause ASI”), and with options retained to switch to either Choice 4 (“Pause all new AI”) or Choice 2 (“Restrict ASI”) if/when understanding becomes clearer.

So, shall we open the door, and set forth down that corridor, inspiring a coalition of the willing to follow us?

Footnote 1: The contents of this article came together in my mind as I attended four separate events over the last two weeks (listed in this newsletter) on various aspects of the subject of safe superintelligence. I owe many thanks to everyone who challenged my thinking at these events!

Footnote 2: If any reader is inclined to dismiss the entire subject of potential risks from ASI with a handwave – so that they would not be interested in any of the four choices this article reviews – I urge that reader to review the questions and answers in this excellent article by Yoshua Bengio: Reasoning through arguments against taking AI safety seriously.
