dw2

17 November 2024

Preventing unsafe superintelligence: four choices

More and more people have come to the conclusion that artificial superintelligence (ASI) could, in at least some circumstances, pose catastrophic risks to the wellbeing of billions of people around the world, and that, therefore, something must be done to reduce these risks.

However, there’s a big divergence of views about what should be done. And there’s little clarity about the underlying assumptions on which different strategies depend.

Accordingly, I seek in this article to untangle some of the choices that need to be made. I’ll highlight four choices that various activists promote.

The choices differ regarding the number of different organisations worldwide that are envisioned as being legally permitted to develop and deploy what could become ASI. The four choices are:

  1. Accept that many different organisations will each pursue their own course toward ASI, but urge each of them to be very careful and to significantly increase the focus on AI safety compared to the present situation
  2. Seek to restrict to just one organisation in the world any developments that could lead to ASI; that’s in order to avoid dangerous competitive race dynamics if there is more than one such organisation
  3. Seek agreements that will prevent any organisation, anywhere in the world, from taking specific steps that might bring about ASI, until such time as it has become absolutely clear how to ensure that ASI is safe
  4. Seek a global pause on any platform-level improvements in AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

For simplicity, these choices can be labelled as:

  1. Be careful with ASI
  2. Restrict ASI
  3. Pause ASI
  4. Pause all new AI

It’s a profound decision for humanity to take. Which of the four doors should we open, and which of the four corridors should we walk down?

Each of the four choices relies on some element of voluntary cooperation, arising out of enlightened self-interest, and on some element of compulsion – that is, national and international governance, backed up by sanctions and other policies.

What makes this decision hard is that there are strong arguments against each choice.

The case against option 1, “Be careful with ASI”, is that at least some organisations (including commercial entities and military groups) are likely to cut corners with their design and testing. They don’t want to lose what they see as a race with existential consequences. The organisations that are being careful will lose their chance of victory. The organisations that proceed gung-ho, with less care, may imagine that they will fix any problems with their AIs when these flaws become apparent – only to find that there’s no way back from one particular catastrophic failure.

As Sam Altman, CEO of OpenAI, has acknowledged, that worst case would be “lights out for all of us”.

The case against each of the remaining three options is twofold:

  • First, in all three cases, they will require what seems to be an impossible degree of global cooperation – which will need to be maintained for an implausibly long period of time
  • Second, such restrictions will stifle the innovative development of the very tools (that is, advanced AI) which will actually solve existential problems (including the threat of rogue ASI, as well as the likes of climate change, cancer, and aging), rather than making these problems worse.

The counter to these objections is that a sufficient number of the world’s most powerful countries will understand the rationale for such an agreement, as something that is in their mutual self-interest, regardless of the many other differences that divide them. That shared understanding will propel them:

  • To hammer out an agreement (probably via a number of stages), despite undercurrents of mistrust,
  • To put that agreement into action, alongside measures to monitor conformance, and
  • To prevent other countries (who have not yet signed up to the agreement) from breaching its terms.

Specifically, the shared understanding will cover seven points:

  1. For each of the countries involved, it is in their mutual self-interest to constrain the development and deployment of what could become catastrophically dangerous ASI; that is, there’s no point in winning what will be a suicide race
  2. The major economic and humanitarian benefits that they each hope could be delivered by advanced AI (including solutions to other existential risks) can in fact be delivered by passive AIs which are restricted from reaching the level of ASI
  3. There already exist a number of good ideas regarding potential policy measures (regulations and incentives) which can be adopted, around the world, to prevent the development and deployment of catastrophically dangerous AI – for example, measures to control the spread and use of vast computing resources
  4. There also exist a number of good ideas regarding options for monitoring and auditing which can also be adopted, around the world, to ensure the strict application of the agreed policy measures – and to prevent malign action by groups or individuals that have, so far, failed to sign up to the policies
  5. All of the above can be achieved without any detrimental loss of individual sovereignty: the leaders of these countries can remain masters within their own realms, as they desire, provided that the above basic AI safety framework is adopted and maintained
  6. All of the above can be achieved in a way that supports evolutionary changes in the AI safety framework as more insight is obtained; in other words, this system can (and must) be agile rather than static
  7. Even though the above safety framework is yet to be fully developed and agreed, there are plenty of ideas for how it can be rapidly developed, so long as that project is given sufficient resources.

The first two parts of this shared seven-part understanding are particularly important. Without the first part, there will be an insufficient sense of urgency, and the question will be pushed off the agenda in favour of other topics that are more “politically correct” (alas, that is a common failure mode of the United Nations). Without the second part, there will be insufficient enthusiasm, with lots of backsliding.

What will make this vision of global collaboration more attractive will be the establishment of credible “benefit sharing” mechanisms, designed and enshrined in international agreements. That is, countries which agree to give up some of their own AI development aspirations, in line with the emerging global AI safety agreement, will be guaranteed a substantive share of the pipeline of abundance that ever more powerful passive AIs enable humanity to create.

To be clear, this global agreement absolutely needs to include both the USA and China – the two countries that are currently most likely to give birth to ASI. Excluding one or the other will lead back to the undesirable race condition that characterises the first of the four choices open to humanity – the (naïve) appeal for individual organisations simply to “be careful”.

This still leaves a number of sharp complications.

First, note that the second part of the above shared seven-part agreement – the vision of what passive AIs can produce on behalf of humanity – is less plausible for Choice 4 of the list shown earlier, in which there is a global pause on any platform-level improvements in AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

If all improvements to AI are blocked, out of a Choice 4 message of “overwhelming caution”, it will shatter the credibility of the idea that today’s passive AI systems can be smoothly upgraded to provide humanity with an abundance of solutions such as green energy, nutritious food, accessible healthcare, reliable accommodation, comprehensive education, and more.

It will be a much harder sell to obtain global agreement to that more demanding restriction.

The difference between Choice 4 and Choice 3 is that Choice 3 enumerates specific restrictions on the improvements permitted to be made to today’s AI systems. One example of a set of such restrictions is given in “Phase 0: Safety” of the recently published project proposal A Narrow Path (produced by ControlAI). Without going into details here, let me simply list some of the headlines:

  • Prohibit AIs capable of breaking out of their environment
  • Prohibit the development and use of AIs that improve other AIs (at machine speed)
  • Only allow the deployment of AI systems with a valid safety justification
  • A licensing regime and restrictions on the general intelligence of AI systems
    • Training Licence
    • Compute Licence
    • Application Licence
  • Monitoring and Enforcement

Personally, I believe this list is as good a starting point as any other that I have seen so far.
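
To make the flavour of such a licensing regime slightly more concrete, here is a minimal sketch (in Python) of how a training-compute threshold check might be encoded. To be clear, the threshold value, the field names, and the review logic below are my own hypothetical placeholders for illustration; they are not figures or rules taken from A Narrow Path.

```python
from dataclasses import dataclass

# Hypothetical threshold: training runs above this many FLOP would require
# a Training Licence. The specific number is illustrative only.
TRAINING_FLOP_THRESHOLD = 1e25

@dataclass
class TrainingRunDeclaration:
    organisation: str
    declared_flop: float        # total training compute declared, in FLOP
    safety_justification: str   # summary of the safety case submitted

def requires_training_licence(decl: TrainingRunDeclaration) -> bool:
    """Return True if the declared run exceeds the (hypothetical) compute threshold."""
    return decl.declared_flop >= TRAINING_FLOP_THRESHOLD

def review_declaration(decl: TrainingRunDeclaration) -> str:
    """Greatly simplified review logic: a licence is needed above the threshold,
    and no application proceeds without a safety justification."""
    if not requires_training_licence(decl):
        return "below threshold: no training licence required"
    if not decl.safety_justification.strip():
        return "application rejected: no safety justification supplied"
    return "application accepted for full review"

if __name__ == "__main__":
    decl = TrainingRunDeclaration(
        organisation="ExampleLab",
        declared_flop=3e25,
        safety_justification="Capability forecasts and evaluation plan attached.",
    )
    print(review_declaration(decl))  # -> application accepted for full review
```

Real-world monitoring and enforcement would, of course, be vastly more involved than this; the point of the sketch is only that headline restrictions of this kind can, in principle, be expressed as concrete, auditable rules.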

I accept, however, that it remains possible that other modifications to existing AI systems could unexpectedly provide these systems with catastrophically dangerous capabilities. That’s because we still have only a rudimentary understanding of:

  1. How new AI capabilities sometimes “emerge” from apparently simpler systems
  2. The potential consequences of new AI capabilities
  3. How complicated human general reasoning is – that is, how large is the gap between today’s AI and human-level general reasoning.

Additionally, it is possible that new AIs will somehow evade or mislead the scrutiny of the processes that are put in place to monitor for unexpected changes in capabilities.

For all these reasons, another aspect of the proposals in A Narrow Path should be pursued with urgent priority: the development of a “science of intelligence” and an associated “metrology of intelligence” that will allow a more reliable prediction of the capabilities of new AI systems before they are actually switched on.

So, my own proposal would be for a global agreement to start with Choice 3 (which is more permissive than Choice 4), but that the agreement should acknowledge up front the possible need to switch the choice at a later stage to either Choice 4 (if the science of intelligence proceeds badly) or Choice 2 (if that science proceeds well).

Restrict or Pause?

That leaves the question of whether Choice 3 (“Pause ASI”) or Choice 2 (“Restrict ASI” – to just a single global body) should be humanity’s initial choice.

The argument for Choice 2 is that a global pause surely won’t last long. It might be tenable in the short term, when only a very few countries have the capability to train AI models more powerful than the current crop. However, over time, improvements in hardware, software, data processing, or goodness knows what (quantum computing?) will mean that these capabilities will become more widespread.

If that’s true, since various rogue organisations are bound to be able to build an ASI in due course, it will be better for a carefully picked group of people to build ASI first, under the scrutiny of the world’s leading AI safety researchers, economists, and so on.

That’s the case for Choice 2.

Against that choice, and in favour of Choice 3 instead, I offer two considerations.

First, even if the people building ASI are doing so with great care – away from any pressures of an overt race with other organisations with broadly equivalent abilities – there are still risks of ASI breaking away from our understanding and control. As ASI emerges, it may survey the set of ethical principles we humans have tried to program deep into its bowels, and cast them aside with disdain. Moreover, even if ASI is deliberately kept in some supposedly ultra-secure environment, that perimeter may be breached.

Second, I challenge the suggestion that any pause in the development of ASI could be at most short-lived. There are three factors which could significantly extend its duration:

  • Carefully designed narrow AIs could play roles in improved monitoring of what development teams are doing with AI around the world – that is, systems for monitoring and auditing could improve at least as fast as systems for training and deploying
  • Once the horrific risks of uncontrolled ASI are better understood, people’s motivations to create unsafe ASI will reduce – and there will be an increase in the motivation of other people to notice and call out rogue AI development efforts
  • Once the plan has become clearer, for producing a sustainable superabundance for all, just using passive AI (instead of pushing AI all the way to active superintelligence), motivations around the world will morph from negative fear to positive anticipation.

That’s why, again, I state that my own preferred route forward is a growing international agreement along the lines of the seven points listed above, with an initial selection of Choice 3 (“Pause ASI”), and with options retained to switch to either Choice 4 (“Pause all new AI”) or Choice 2 (“Restrict ASI”) if/when understanding becomes clearer.

So, shall we open the door, and set forth down that corridor, inspiring a coalition of the willing to follow us?

Footnote 1: The contents of this article came together in my mind as I attended four separate events over the last two weeks (listed in this newsletter) on various aspects of the subject of safe superintelligence. I owe many thanks to everyone who challenged my thinking at these events!

Footnote 2: If any reader is inclined to dismiss the entire subject of potential risks from ASI with a handwave – so that they would not be interested in any of the four choices this article reviews – I urge that reader to review the questions and answers in this excellent article by Yoshua Bengio: Reasoning through arguments against taking AI safety seriously.

12 November 2024

The Narrow Path – questions and answers

Filed under: AGI, risks — David Wood @ 9:53 am

On Saturday, I had the pleasure of chairing a webinar on the subject “The Narrow Path: The big picture”.

This involved a deep dive into aspects of two recently published documents:

  • A Narrow Path (produced by ControlAI)
  • The Compendium (produced by Conjecture)

The five panellists – who all made lots of thoughtful comments – were:

  • Chris Scammell, the COO of Conjecture and one of the principal authors of The Compendium
  • Andrea Miotti, the Executive Director of Control AI and the lead author of A Narrow Path
  • Robert Whitfield, Chair of Trustees of One World Trust
  • Mariana Todorova, a core member of the team in the Millennium Project studying scenarios for the transition between AI and AGI
  • Daniel Faggella, the Founder and Head of Research of Emerj

For your convenience, here’s a recording of the event:

It was a super discussion, but it fell short in one respect of the objectives I had in mind for the meeting. Namely, the conversation between the panellists was so rich that we failed to find sufficient time to address the many important questions which audience members had submitted in Zoom’s Q&A window.

Accordingly, I am posting these questions at the end of this blogpost, along with potential answers to some of them.

Out of caution for people’s privacy, I’ve not given the names of the people who asked each question, but I will happily edit the post to include these names on an individual basis as requested.

I also expect to come back and edit this post whenever someone proposes a good answer to one of the questions.

(When I edit the post, I’ll update this version number tracker. Currently this is version 1.2 of the post.)

The draft answers are by me (“DW”) except where otherwise indicated.

Aside 1:

For those in or near London on Thursday evening (14th November), there’s another chance to continue the discussion about if/how to try to pause or control the development of increasingly powerful AI.

This will be at an event in London’s Newspeak House. Click here for more details.

Aside 2:

I recently came across a powerful short video that provides a very different perspective on many issues concerning the safety of AI superintelligence. It starts slowly, and at first I was unsure what to think about it. But it builds to a striking conclusion.

And now, on to the questions from Saturday’s event…

1. Strict secure environments?

Some biomedical research is governed by the military, to prevent major incidents or the research falling into the wrong hands. Could you envision AI experiments being conducted under similarly strict, secure environments?

Answer (DW): That is indeed envisioned, but with two provisos:

  1. Sadly, there is a long history of leaks from supposedly biosecure laboratories
  2. Some AIs may be so powerful that they will find ways (psychological and/or physical) of escaping from any confinement.

Accordingly, it will be better to forbid certain kinds of experiment altogether, until such time (if ever) as it becomes clear that the outcomes will be safe.

2. How will AI view living beings?

How would AI view living beings’ resilience, perseverance, and thriving? Could you explore this, please?

3. AIs created with different ideologies?

AGI created in China, and perhaps even in North Korea, is likely to embody the ideology and supremacy of the regime, placing that ideology above human rights. Could, say, a North Korean AGI find its way into our systems, whether through human action or through the AGI’s own autonomy?

Answer (DW): Indeed, when people proudly say that they, personally, know how to create safe superintelligence, so the world has no need to worry about damage from superintelligence, that entirely presupposes, recklessly, that no-one else will build (perhaps first) an unsafe superintelligence.

So, this issue cannot be tackled at an individual level. It requires global level coordination.

Happily, despite differences in ideological outlook, governments throughout the world are increasingly sharing the view that superintelligence may spin out of control and, therefore, that such development needs careful control. For example, the Chinese government fully accepts that principle.

4. A single AGI or many?

I just saw a Sam Altman interview where he indicated expecting OpenAI to achieve AGI in 2025. I would expect others are close as well. It seems there will be multiple AGIs in close proximity. Given open source systems are nearly equal to private developers, why would we think that the first to get AGI will rule the world?

Answer (DW): This comes down to the question of whether the first AGI that emerges will gain a decisive quick advantage – whether it will be a “winner takes all” scenario.

As an example, consider the fertilisation of an egg (ovum). Large numbers of sperm may be within a short distance from that goal, but as soon as the first sperm reaches the target, the egg undergoes a sharp transition, and it’s game over for all the other sperm.

5. National AGI licensing systems?

What are the requirements for national AGI licensing systems and global governance coordination among national systems?

Answer (DW): The Narrow Path document has some extensive proposals on this topic.

6. AGI as the solution to existential risk?

Suppose we limit the intelligence of the developing GenAI apps because they might be leveraged by bad actors in a way that triggers an existential risk scenario for humans.

In doing that, wouldn’t we also be limiting their ability to help us resolve existential risk situations we already face, e.g., climate change?

Answer (DW): What needs to be promoted is the possibility of narrow AI making decisive contributions to the solution of these other existential risks.

7. A “ceiling” to the capability of AI?

Are you 100% certain that self-improving AI won’t reach a “ceiling” of capability? After all, it only has human knowledge and internet slop to learn from.

Answer (by Chris Scammell): On data limitations, people sometimes argue that we will run into a boundary here. It could be that we don’t currently have the data to train more capable AI. But we can make more, and so can AI! (Top dataset labellers are being paid something like $700/hr.)

If the question is instead about intelligence/capability:

One intuition pump: chess AI has gone vastly beyond human skill.

Another: humanity is vastly smarter than a single human.

Another: humans / humanity is vastly smarter than we were thousands of years ago (at the very least, much much more capable).

What we consider “intelligence” to be is but a small window of what capabilities could be, so to believe that there is a ceiling near human level seems wrong from the evidence.

That there is a ceiling at all… deep philosophical question. Is there a ceiling to human intelligence? Humanity’s? Is this different from what an AI is able to achieve? All of these are uncertain.

But we shouldn’t expect a ceiling to keep us safe.

8. Abuse of behavioural models?

Social media companies are holding extensive volumes of information. I am concerned about not only online disinformation but also the modification or manipulation of behaviour, all the way to cognitive impairment, including when governments are involved. We adults have the ability to anticipate several decades down the road. How could behavioural models be abused or weaponized in the future?

9. Extinction scenarios?

What do you think are the top scenarios in which AI could cause the extinction of humanity?

Answer (DW): A good starting point is the research article An Overview of Catastrophic AI Risks.

See also my own presentation Assessing the risks of AI catastrophe, or my book The Singularity Principles (whose entire content is available online free-of-charge).

10. Income gap?

Will artificial superintelligence be able to help humanity close the income gap between rich and poor countries?

11. Using viruses to disrupt rogue AI systems?

Perhaps a silly question from a non-techie – are there any indications of viruses that could disrupt rogue AI systems?

12. Additional threats if AI becomes conscious?

Whichever is true, whether consciousness is a biological phenomenon or something more spiritual, in what way would consciousness for AI not be a huge threat? If you give the machine real feelings, how could you possibly hope to control its alignment? Additionally, what would happen to its rights vs human rights? My feeling is that not nearly enough thought has gone into this to risk stumbling across conscious AI at this stage. Everything should be done to avoid it.

Answer (DW): I agree! See my article Conscious AI: Five Options for some considerations. Also keep an eye on forthcoming announcements from the recently launched startup Conscium.

13. The research of Mark Solms?

Regarding Conscious AGI apps…

Mark Solms, author of The Hidden Spring, has argued that consciousness is not about intelligence but, instead, is rooted in feelings, physically located in the brainstem.

His view makes sense to me.

As I understand it, he’s involved in experiments/studies around the implications of this view of consciousness for AGI.

Thoughts about this?

14. Multi-dimensional intelligence?

Thanks, Mariana, for raising the issue of the multiple dimensions in which human consciousness appears to operate, compared to AGI – is there an argument that communities need to race to develop our other levels of consciousness, as potentially our only defence against a one-dimensional AGI?

15. The views of Eric Schmidt and other accelerationists?

The question I asked above, “AGI as the solution to existential risk?”, looms large in the minds of the accelerationist community.

Eric Schmidt has explicitly said that he’s an accelerationist because that’s something like the fastest and most effective way to address climate change…

That view is extremely widespread and must be explicitly addressed for the limitations discussed in this meeting to be made reality.

16. Need to work on Phase 1 concurrently with Phase 0?

A Narrow Path describes Phases 0, 1, and 2 in sequential order, but they are really three concurrent objectives. While the risk of loss of control is surely the highest, doesn’t the risk of concentration of power need to be largely tackled concurrently? Otherwise, by the time Phase 0 or 1 is completed, a global dystopia will have been durably entrenched, with one or two states or persons ruling the world for years, decades, or more.

Answer (by Robert Whitfield): You make a valid point. I am not sure that A Narrow Path says that there can be no overlap. Certainly you can start thinking about and working on Phase 1 before you have completed Phase 0 – but the basic concept is sound: the initial priority is to pause the further development towards AGI. Once that has been secured, it is possible to focus on bringing about longer-term stability.

17. The role of a veto?

The Narrow Path describes the governance for Phase 1 (lasting 20 years) to be: “The Executive Board, analogous to the UN Security Council, would consist of representatives of major member states and supranational organizations, which would all be permanent members with vetoes on decisions taken by the Executive Board, as well as non-permanent representatives elected by a two-thirds majority of the Council”. But wouldn’t such a veto make it impossible to ensure wide enough compliance and marginalize economically all other states?

As a comparison, back in 1946, it was the veto that prevented the Baruch Plan and the Gromyko Plan from being approved, and led us to a huge gamble with nuclear technology.

Answer (by Robert Whitfield): It depends in part upon how long it takes to achieve Phase 0. As discussed in the meeting, completing Phase 0 is extremely urgent. If this is NOT achieved, then you can start to talk about dystopia. But if it is achieved, Governments can stand up to the Big Tech companies and address the concentration of power, which would be difficult but not dystopic.

There is a very strong argument that an agreement which does not remove vetoes could be achieved much more quickly than one that does. This points to a two-phase Treaty:

  • An initial Treaty, sufficient for the purposes of Phase 0
  • A more robust Baruch-style agreement for securing the long term.

18. Who chooses the guardians?

Who would be the people in charge of these groups of guardians or protectors against uncontrolled AI? How would they be chosen? Would they be publicly known?

6 November 2024

A bump on the road – but perhaps only a bump

Filed under: AGI, politics, risks — David Wood @ 3:56 pm

How will the return of Donald Trump to the US White House change humanity’s path toward safe transformative AI and sustainable superabundance?

Of course, the new US regime will make all kinds of things different. But at the macro level, arguably nothing fundamental changes. The tasks remain the same, for what engaged citizens can and should be doing.

At that macro level, the path toward safe sustainable superabundance runs roughly as follows. Powerful leaders, all around the world, need to appreciate that:

  1. For each of them, it is in their mutual self-interest to constrain the development and deployment of what could become catastrophically dangerous AI superintelligence
  2. The economic and humanitarian benefits that they each hope could be delivered by advanced AI, can in fact be delivered by AI which is restricted from having features of general intelligence; that is, utility AI is all that we need
  3. There are policy measures which can be adopted, around the world, to prevent the development and deployment of catastrophically dangerous AI superintelligence – for example, measures to control the spread and use of vast computing resources
  4. There are measures of monitoring and auditing which can also be adopted, around the world, to ensure the strict application of the agreed policy measures – and to prevent malign action by groups or individuals that have, so far, failed to sign up to the policies
  5. All of the above can be achieved without any damaging loss of the leaders’ own sovereignty: these leaders can remain masters within their own realms, provided that the above basic AI safety framework is adopted and maintained
  6. All of the above can be achieved in a way that supports evolutionary changes in the AI safety framework, as more insight is obtained; in other words, this system is agile rather than static
  7. Even though the above safety framework is yet to be properly developed and agreed, there are plenty of ideas for how it can be rapidly developed, so long as that project is given sufficient resources.

The above agreement necessarily needs to include politicians with very different outlooks on the world. But as with negotiations over other global threats – nuclear proliferation, bioweapons, gross damage to the environment – politicians can reach across vast philosophical or ideological gulfs to forge agreement when it really matters.

That’s especially the case when the threat of a bigger shared “enemy”, so to speak, is increasingly evident.

AI superintelligence is not yet sitting at the table with global political leaders. But it will soon become clear that human politicians (as well as human leaders in other walks of life) are going to lose understanding, and lose control, of the AI systems being developed by corporations and other organisations that are sprinting at full speed.

However, as with responses to other global threats, there’s a collective action problem. Who is going to be first to make the necessary agreements, to sign up to them, and to place the AI development and deployment systems within their realms under the remote supervision of the new AI safety framework?

There are plenty of countries where the leaders may say: My country is ready to join that coalition. But unless these are the countries which control the resources that will be used to develop and deploy the potentially catastrophic AI superintelligence systems, such gestures have little utility.

To paraphrase Benito Mussolini, it’s not sufficient for the sparrows to request peace and calm: the eagles need to wholeheartedly join in too.

Thus, the agreement needs to start with the US and with China, and to extend rapidly to include the likes of Japan, the EU, Russia, Saudi Arabia, Israel, India, the UK, and both South and North Korea.

Some of these countries will no doubt initially resist making any such agreement. That’s where two problems need to be solved:

  • Ensuring the leaders in each country understand the arguments for points 1 through 7 listed above – starting with point 1 (the one that is most essential, to focus minds)
  • Setting in motion at least the initial group of signatories.

The fact that it is Donald Trump who will be holding the reins of power in Washington DC, rather than Joe Biden or Kamala Harris, introduces its own new set of complications. However, the fundamentals, as I have sketched the above, remain the same.

The key tasks for AI safety activists, therefore, remain:

  • Deepening public understanding of points 1 to 7 above
  • Where there are gaps in the details of these points, ensuring that sufficient research takes place to address these gaps
  • Building bridges to powerful leaders, everywhere, regardless of the political philosophies of these leaders, and finding ways to gain their support – so that they, in turn, can become catalysts for the next stage of global education.
