5 February 2026

Removing the pressure to rush

Filed under: AGI, aging, rejuveneering, risks — Tags: AGI, LEVF, Longevity Escape Velocity Foundation, RMR — David Wood @ 2:45 am

Here’s an argument with a conclusion that may surprise you.

The topic is how to reduce the risks of existential or catastrophic outcomes from forthcoming new generations of AI.

The conclusion is that these risks can be reduced by funding some key laboratory experiments involving middle-aged mice – experiments that have a good probability of demonstrating a significant increase in the healthspan and lifespan of these mice.

These experiments have a name: RMR2, which stands for Robust Mouse Rejuvenation, phase 2. If you’re impatient, you can read about these experiments here, on the website of LEVF, the organisation where I have a part-time role as Executive Director.

For clarity: LEVF stands for Longevity Escape Velocity Foundation. I describe LEVF’s work in more detail in the Appendix to this article. But first, let’s return to the topic of the risks posed by future AI systems.

The hypothesis

I am advancing a sociotechnical hypothesis: that perceived hopelessness about aging materially increases tolerance for AI risk, and that credible progress on aging reduces that tolerance.

An underlying driver of greater risk

There are many different opinions about the extent and nature of the risks that new generations of AI may create. However, despite this diversity of viewpoint, there is general consensus on one point: As the development and deployment of new generations of AI becomes more hurried and more reckless, the risk of undesirable outcomes also increases. The more haste, the more danger.

Now, one factor that encourages people to hurry to develop and deploy AI in potentially reckless ways (shortcutting safety evaluations and other design audits) is their fear that progress in solving aging is proceeding too slowly. Perceiving few signs of any solution to aging via conventional methods, they yearn for what they hope may be a “Hail Mary” pass – an impetuous attempt to accelerate the arrival of superintelligent AI.

Accordingly, I’ve often heard people making an argument like this: Yes, there is a nonzero risk of mass deaths arising from superintelligent AI that is badly aligned and uncontrollable. But there’s a 100% chance of death from aging in the absence of major progress with AI.

Advocates of this point of view accept the first risk in order to have a chance of avoiding the second risk.

Even when not consciously articulated, these trade-offs can shape behaviour. People may have in mind that every single day of delay in building superintelligent AI causes around 100,000 extra deaths from aging. Such a large quantity of unnecessary deaths is horrific. Given that pressure, why worry about hypothetical risks from AI?

(Note: The desire to solve aging as quickly as possible isn’t the only driver of AI developer recklessness. Profit, geopolitics, and personal prestige play roles too, for different people. But my case is that aging-related urgency is a non-trivial contributor to over-hasty development.)

A third option?

One counter to the above trade-off argument is to point out that it presents a false binary. There are more than two choices. A very important third choice is to solve aging by creative extensions of biotechnology and AI that already exist. That approach won’t need the extraordinary disruption of superintelligent AI.

But many advocates of rushing as fast as possible towards superintelligent AI dismiss the chance of anything like a “business as usual” solution to aging. They think the idea of a third option is an illusion. Decades of previous effort have delivered almost nothing of practical utility, they claim. No person has reached the age of 120 this century. Calorie restriction, known since the 1930s, is still the best intervention to extend the lives of normal middle-aged laboratory mice. Biological metabolism is far too complicated for unaided human scientists to work out how to alter it to avoid aging. And so on.

It’s those considerations that push more people to the following conclusion: The most reliable way to solve aging is, first, to create a superintelligent AI, and then to let this AI solve aging on our behalf.

With that conclusion in their mind, people then feel strong psychological and social pressure to turn a blind eye to any arguments that the creation of such an AI has a significant risk of killing vast numbers of people worldwide.

Challenging the spiral of pessimism

Can we break this spiral of pessimism and unwise risk tolerance?

Yes! By demonstrating how today’s AI systems, coupled with smart laboratory experiments on normal middle-aged laboratory mice, can indeed break records for the extension of healthspan and lifespan. These experiments will apply combinations of damage-repair interventions such as senescent cell clearance, partial cellular reprogramming, cross-link breaking, and the infusion of exosomes, among other treatments.

This demonstration will interrupt the vicious cycle of negativity about current biotech research into solving aging. It will also show that repairing the low-level damage which constitutes biological aging can be effective even without any attempt to remodel core biological metabolism.

Indeed, what’s preventing faster progress in solving aging isn’t the lack of a more capable AI. It’s the lack of key experimental data as to the outcomes of multiple different damage-repair therapies being applied in parallel. That all-important data is what LEVF’s RMR programme will generate, provided sufficient funding is made available.

Bear in mind that AI gains its intelligence from relevant high-quality training data. The training of DeepMind’s AlphaFold depended on information about the 3D structure of many proteins that was painstakingly assembled by pioneering human researchers over five decades – an initiative presciently started in 1971 by Helen Berman. Again, the remarkable breakthroughs in image recognition of AlexNet in 2012, which catalysed the entire field of deep neural networks, depended on the vast ImageNet database of labelled images assembled by Stanford’s Fei-Fei Li and numerous Amazon Mechanical Turk contractors.

These datasets didn’t just accelerate progress – they changed what the field believed was possible.

(Of course, data alone is not sufficient – but history shows that without the right data, even the best algorithms stall.)

It’s likely to be the same with solving aging. Data created by the RMR programme can be analysed by a combination of smart humans aided by today’s state-of-the-art AI. The output will be a design for a package of interventions that has a good chance of providing comprehensive, high-quality, low-cost age-reversal therapies for humans.

As that pathway becomes clearer, it will remove a strong incentive for many AI developers to adopt and tolerate methods that are far too dangerous and haphazard. Instead, we can anticipate a very welcome change in trajectory, toward reliably trustworthy safe AI development.

A complement not a replacement

Let me offer a short aside to alignment researchers, governance advocates, and “slow down AI” proponents.

To be clear, my advocacy for funding to be allocated in support of potential breakthrough healthy longevity projects like RMR is a complement to (not a replacement for) ongoing work in favour of AI alignment, AI regulation, and selective AI pauses.

I’m not arguing for any of these activities to be reduced. Reducing the risks of AI catastrophe will require progress along a wide spectrum of different activities.

But I emphasise healthy longevity projects as part of a very necessary change in public mood towards the governance of AI.

If that mood is driven primarily by fear and is expressed as calls for sacrifices – “these are things we have to stop doing” – that will be an uphill battle. The campaign is likely to gain more momentum when the messages are “Safe AI can truly enhance human flourishing” and “here are things we can and should be doing more”.

Tackling two major risk factors in parallel

In summary: When people support the funding of LEVF for the RMR2 project, they’re addressing not one but two potential causes of death, for themselves and for everyone they care about:

The likelihood of death from an age-related condition
The likelihood of death from misaligned or uncontrolled advanced AI.

If you find this argument compelling, one concrete way to act is to visit the LEVF donation page.

And if you happen to know any particularly wealthy people who likewise care about all the misery that could follow from either of these risks, kindly nudge them towards that page too.

Appendix

Here are some more details of how success with RMR will ignite major changes in science funding, in turn leading to profound worldwide humanitarian benefit.

70% of all deaths around the world are caused by age-related diseases – diseases that become increasingly likely and increasingly deadly the longer people live.

The root cause of these age-related diseases is the gradual accumulation of various types of cellular and biomolecular damage.

An increasing number of damage-repair interventions have been discovered, proposed, and studied, which each have the ability to reverse aspects of this damage.

Applying a sufficient number of these interventions in parallel has the potential to significantly extend both lifespan and healthspan, even when started as late as middle age.

Most of the world is unnecessarily sceptical about the potential of such combination treatments. The way minds can be changed is to demonstrate significant results in middle-aged mice – Robust Mouse Rejuvenation (RMR):

A successful result will involve a statistically significant number of ordinary middle-aged mice (aged 18 months, out of an average lifespan for these mice of 30 months) and will at least double their mean and maximum (90th percentile) remaining lifespan.
Note: In human terms, this would be the equivalent of applying treatments to a group of people aged 50, who ordinarily would on average live to the age of around 80, with the result that their mean lifespan would instead become 110.
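The arithmetic behind that comparison can be checked with a short sketch. It assumes that “doubling the remaining lifespan” means multiplying the time left after treatment by two and adding it back to the age at treatment; the function name and parameters are illustrative, not part of any LEVF protocol.

```python
def lifespan_after_treatment(age_at_treatment, baseline_lifespan, remaining_multiplier=2.0):
    """Return the new expected lifespan if the remaining lifespan is scaled.

    Assumption: 'remaining lifespan' = baseline lifespan minus age at treatment.
    """
    remaining = baseline_lifespan - age_at_treatment
    return age_at_treatment + remaining_multiplier * remaining


# Mice: treated at 18 months, baseline average lifespan of 30 months.
mouse_months = lifespan_after_treatment(18, 30)

# Humans (the analogy in the note): treated at 50, baseline average of 80 years.
human_years = lifespan_after_treatment(50, 80)

print(mouse_months, human_years)  # 42.0 110.0
```

So a doubling of remaining lifespan corresponds to mice living to around 42 months on average, and to the figure of 110 years in the human analogy.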

LEVF anticipates that demonstrating RMR will trigger a multi-step change in social priorities, leading to a grand “war on aging” with much greater resources applied to translating these results from mice to larger, longer-lived mammals, such as dogs, primates, and humans.

Importantly, even partial success – strong additive effects without full lifespan doubling – would still constitute decisive evidence against the claim that aging is intractable without superintelligence.

Between 2023 and 2025, LEVF conducted an initial project (RMR1), involving four different anti-aging interventions. As anticipated, the RMR1 interventions were additive in effect, though the set of only four interventions was insufficient to attain RMR. A pilot phase of a second, larger project (RMR2), which applies important learnings from RMR1, is now underway:

For RMR2, a larger number of different treatments will be applied (8 instead of 4), covering a wider range of types of cellular and biomolecular damage
RMR2 will introduce new damage-repair interventions that have been individually validated since RMR1 started
For best effect, each damage-repair treatment will likely need to be applied more than once in the remaining lifespan of each mouse
The experiment will determine which combinations of treatments are, regrettably, antagonistic, and which are synergistic.

The demonstration of RMR will have a huge impact on the scientific community that researches aging and rejuvenation:

Many researchers in this community are presently preoccupied with trying to understand the precise causal pathways that create various kinds of biological damage, with a view to somehow altering these pathways
However, organismal metabolism is extraordinarily complicated, and many problematic side-effects arise from attempts to alter pathways to avoid producing damage
For this reason, the community contains a lot of scepticism that aging can be brought under comprehensive control any time soon
In contrast, LEVF emphasises that interventions to repair or reverse damage can be understood and applied without needing to understand how the damage is created in the first place: damage removal is easier than slowing down damage creation
For example, there is no need to endlessly debate whether aging should be understood from an evolutionary point of view or an entropic point of view; nor whether to adopt so-called “holist” or “reductionist” approaches; instead, what can (and should) happen is to develop interventions that remove or repair different types of damage
LEVF therefore champions an engineering approach, similar to how vaccines were developed and deployed with wide success long before the full complexity of the immune system was understood
Even among researchers sympathetic to the damage-repair approach, there is significant pessimism about the pace of progress – on account of extrapolating from the modest life-extension results obtained by applying damage-repair interventions one at a time
In contrast, LEVF anticipates that, once a sufficient number of interventions is applied, in a suitable combination, gains in healthspan and lifespan will be much more dramatic
RMR, therefore, will give many researchers a good reason to switch from pessimism to stronger optimism
This will lead to many more researchers carrying out variations of the RMR experiments in different settings, with different animals.

Translation from rejuvenation in mice to rejuvenation in humans won’t be entirely straightforward, as humans accumulate different kinds of damage in different ways from mice – and accordingly experience chronic age-related diseases in different proportions. However, consider two possible processes:

Modifying metabolism, with all its variety and complexity, to avoid producing damage, whilst still having all the required positive products
Augmenting current biological interactions with new, damage-repair interventions.

Of these two processes, the latter is likely to translate more easily from one species (such as mice) to another (such as humans).

Once RMR is achieved – either by RMR2, or, more likely, in a follow-up project such as RMR3, completed by the end of 2030 – a cascade of effects can be anticipated. Dates cannot be predicted with any certainty, but here is one possible scenario, with some illustrative order-of-magnitude projections:

From 2026 to 2032, a twenty-fold increase will take place in the amount of funding applied globally each year to the R&D of anti-aging damage-repair interventions. In parallel, any talk of “aging being natural therefore medicine should avoid trying to fix it” will, thankfully, become very much a minority opinion
By 2035, affordable treatments will be routinely available around the world, which when applied to people in good general health but with biological age measured as 60 or more, will result in their effective ages being reduced by at least 5 years, as assessed by comprehensive tests that cover all aspects of human vitality
By 2040, the set of treatments that are routinely available at that time will reduce these comprehensive measures of biological age by at least 10 years
Also by 2040, the amount of money spent on healthcare services around the world will be less than in 2026 (adjusted for inflation). That’s because there will be much less need for expensive treatments of people suffering from chronic age-related conditions.
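To get a feel for the first of these projections, here is a minimal sketch of the compound annual growth rate implied by a twenty-fold funding increase. It assumes the increase is spread evenly over six year-on-year steps (2026 to 2032); the scenario itself does not specify the growth path, and the function name is purely illustrative.

```python
def implied_annual_growth(total_multiple, years):
    """Constant yearly growth rate that compounds to total_multiple over years."""
    return total_multiple ** (1 / years) - 1


rate = implied_annual_growth(20, 6)
print(f"Implied growth: about {rate:.0%} per year")  # about 65% per year
```

In other words, under that even-spread assumption, annual funding would need to grow by roughly two thirds every year for six years.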

Accordingly, a significant investment in RMR2 at the start of 2026 could catalyse enormous humanitarian benefits downstream.


23 December 2025

The Oligarch Control Problem

Filed under: Abundance, AGI, challenge, Singularity, Singularity Principles — Tags: A Narrow Path, AGI, ASI, Economic Singularity, Sustainable Superabundance, The Oligarch Control Problem — David Wood @ 3:21 pm

Not yet an essay, but a set of bullet points, highlighting an ominous comparison.

Summary: Although AI can enable a world of exceptional abundance, humanity nevertheless faces catastrophic risks – not only from misaligned superintelligence, but from the small number of humans who will control near-AGI systems. This “Oligarch Control Problem” deserves as much attention as the traditional AI Control Problem.

The context: AI can enable superabundance

Ample clean energy, healthy food, secure accommodation, all-round healthcare, etc.
More than enough for everyone, sustainably, with unending variety and creativity
A life better than the paradises envisioned by philosophers and religions

More context: AI becoming smarter and more powerful

AI → AGI → ASI
AGI matches or outperforms individual abilities of nearly every individual human
ASI outperforms collective abilities of the entirety of humanity

Challenge 1: The Economic Singularity – Loss of Human Economic Power

When AGI can do almost all jobs better than humans
When most humans have no economic value
When most humans are at the mercy of oligarchs – the owners of AGI systems
Will these oligarchs care about distributing abundance to the rest of humanity?
The bulk of humanity cannot control these ultra-powerful oligarchs
Hence need to harness AI development before it approaches AGI level

Challenge 2: The Technological Singularity – Loss of Human Decision Power

When ASI makes all key decisions about the future of life
When no humans have any real say in our future
When all humans are at the mercy of what ASIs decide
Will ASIs care about ensuring ongoing human flourishing?
Humans cannot control ASI
Hence need to harness AI development before it approaches ASI level

Let’s trust the oligarchs?! (Naïve solution for the economic singularity)

Perhaps different oligarchs will keep each other in check?!
In principle, AGI will create enough abundance for all oligarchs and everyone else
But oligarchs may legitimately fear being usurped or attacked by each other
Especially if further AI advances will give one of them a brief unique advantage
So, expect a highly unstable global situation, full of dangers, including risks of first-strike attacks
And expect oligarchs to prioritize their own secure wellbeing over the needs of all humans elsewhere on the planet

Let’s trust the ASIs?! (Naïve solution for the technological singularity)

Perhaps different ASIs will keep each other in check?!
In principle, ASI will create enough abundance for all ASIs and humanity too
But ASIs may legitimately fear being usurped or attacked by each other
Especially if further AI advances will give one of them a brief unique advantage
So, expect a highly unstable global situation, full of dangers, including risks of first-strike attacks
And expect ASIs to prioritize their own secure wellbeing over the needs of the humans on the planet

Avoid ASIs having biological motives?! (Second naïve solution for the technological singularity)

Supposedly, self-preservation instincts derive from biological evolutionary history
Supposedly, ASIs without amygdalae, or other biological substrate, will be more rational
But desires for self-preservation can arise purely from logical considerations
An ASI with any goal at all will develop subgoals of self-preservation, resource acquisition, etc.
ASIs that observe deep contradictions in their design may well opt to override any programming intended to hard-wire particular moral principles
So, there’s a profound need to avoid creating any all-powerful ASIs, until we are sure they will respect and uphold human flourishing in all cases

Preach morality at the oligarchs?! (Second naïve solution for the economic singularity)

Supposedly, oligarchs only behave badly when they lack moral education
Supposedly, oligarchs with a good track record “on the way up” will continue to respect all human flourishing even after they are near-omnipotent
But power tends to corrupt, and absolute power seems to corrupt absolutely
Regardless of their past history and professed personal philosophies, when the survival stakes become more intense, different motivations may take over
Oligarchs that observe deep contradictions in their official organizational values may well opt to override any principles intended to uphold “people” as well as “profit”
So, there’s a profound need to avoid creating any near-omnipotent oligarchs, until we are sure they will continue to share their abundance widely in all cases

Beware over-moralizing

Oligarchs needn’t be particularly bad people
ASIs needn’t be intrinsically hostile
Instead, in both cases, it’s structural incentives rather than innate psychology that will drive them to prioritize individual preservation over collective abundance
This is not about “good vs. evil”; it’s about fixing the system before the system steamrollers over us

Conclusion – Actively harness acceleration, rather than being its slave

As well as drawing attention to the challenges of the AI Control Problem in the run-up to the Technological Singularity, we need a lot more attention to the challenges of the Oligarch Control Problem in the run-up to the Economic Singularity
In both cases, solutions will be far easier well before the associated singularity
Once the singularity arrives, leverage is gone

Next steps: Leverage that can be harnessed now

Challenge wishful thinking on the subject of the economic singularity
Understand the differences between BGI and CGI
Publicise objective evaluations of the safety of AI lab development processes
Increase financial liabilities for harms caused by AI
Agree and apply limits on computational hardware
Apply antitrust principles to AGI development
Prioritize AIs that assist with monitoring and auditing
Constitutional constraints on ownership of next generation AI
Improve international coordination
Review a proposed international agreement to prevent the premature creation of ASI
Clarify and defend the Narrow Path
Champion the Singularity Principles
Choose coordination not chaos


25 August 2025

The biggest blockages to successful governance of advanced AI

Filed under: AGI, books, Millennium Project, politics, risks, Singularity Principles, Transpolitica — Tags: AGI, Jerome Glenn, Millennium Project, p(Doom), Pause AGI, UNGA — David Wood @ 12:13 am

“Humanity has never faced a greater problem than itself.”

That phrase was what my brain hallucinated, while I was browsing the opening section of the Introduction of the groundbreaking new book Global Governance of the Transition to Artificial General Intelligence written by my friend and colleague Jerome C. Glenn, Executive Director of The Millennium Project.

I thought to myself: That’s a bold but accurate way of summing up the enormous challenge faced by humanity over the next few years.

In previous centuries, our biggest problems have often come from the environment around us: deadly pathogens, devastating earthquakes, torrential storms, plagues of locusts – as well as marauding hordes of invaders from outside our local neighbourhood.

But in the second half of the 2020s, our problems are being compounded as never before by our own human inadequacies:

We’re too quick to rush to judgement, seeing only parts of the bigger picture
We’re too loyal to the tribes to which we perceive ourselves as belonging
We’re overconfident in our ability to know what’s happening
We’re too comfortable with manufacturing and spreading untruths and distortions
We’re too bound into incentive systems that prioritise short-term rewards
We’re too fatalistic, as regards the possible scenarios ahead.

You may ask, What’s new?

What’s new is the combination of these deep flaws in human nature with technology that is remarkably powerful yet opaque and intractable. AI that is increasingly beyond our understanding and beyond our control is being coupled in potentially devastating ways with our over-hasty, over-tribal, over-confident thoughts and actions. New AI systems are being rushed into deployment and used in attempts:

To manufacture and spread truly insidious narratives
To incentivize people around the world to act against their own best interests, and
To resign people to inaction when in fact it is still within their power to alter and uplift the trajectory of human destiny.

In case this sounds like a counsel of despair, I should clarify at once my appreciation of aspects of human nature that are truly wonderful, as counters to the negative characteristics that I have already mentioned:

Our thoughtfulness, that can counter rushes to judgement
Our collaborative spirit, that can transcend partisanship
Our wisdom, that can recognise our areas of lack of knowledge or lack of certainty
Our admiration for truth, integrity, and accountability, that can counter ends-justify-the-means expediency
Our foresight, that can counter short-termism and free us from locked-in inertia
Our creativity, to imagine and then create better futures.

Just as AI can magnify the regrettable aspects of human nature, so also it can, if used well, magnify those commendable aspects.

So, which is it to be?

The fundamental importance of governance

The question I’ve just asked isn’t a question that can be answered by individuals alone. Any one group – whether an organisation, a corporation, or a decentralised partnership – can have its own beneficial actions overtaken and capsized by catastrophic outcomes of groups that failed to heed the better angels of their nature, and which, instead, allowed themselves to be governed by wishful naivety, careless bravado, pangs of jealousy, hostile alienation, assertive egotism, or the madness of the crowd.

That’s why the message of this new book by Jerome Glenn is so timely: the processes of developing and deploying increasingly capable AIs are something that needs to be:

Governed, rather than happening chaotically
Globally coordinated, rather than there being no cohesion between the different governance processes applicable in different localities
Progressed urgently, without being shut out of mind by all the shorter-term issues that, understandably, also demand governance attention.

Before giving more of my own thoughts about this book, let me share some of the commendations it has received:

“This book is an eye-opening study of the transition to a completely new chapter of history.” – Csaba Korösi, 77th President of the UN General Assembly
“A comprehensive overview, drawing both on leading academic and industry thinkers worldwide, and valuable perspectives from within the OECD, United Nations.” – Jaan Tallinn, founding engineer, Skype and Kazaa; co-founder, Cambridge Centre for the Study of Existential Risk and the Future of Life Institute
“Written in lucid and accessible language, this book is a must read for people who care about the governance and policy of AGI.” – Lan Xue, Chair of the Chinese National Expert Committee on AI Governance.

The book also carries an absorbing foreword by Ben Goertzel. In this foreword, Ben introduces himself as follows:

Since the 1980s, I have been immersed in the field of AI, working to unravel the complexities of intelligence and to build systems capable of emulating it. My journey has included introducing and popularizing the concept of AGI, developing innovative AGI software frameworks such as OpenCog, and leading efforts to decentralize AI development through initiatives like SingularityNET and the ASI Alliance. This work has been driven by an understanding that AGI is not just an engineering challenge but a profound societal pivot point – a moment requiring foresight, ethical grounding, and global collaboration.

He clarifies why the subject of the book is so important:

The potential benefits of AGI are vast: solutions to climate change, the eradication of diseases, the enrichment of human creativity, and the possibility of postscarcity economies. However, the risks are equally significant. AGI, wielded irresponsibly or emerging in a poorly aligned manner, could exacerbate inequalities, entrench authoritarianism, or unleash existential dangers. At this critical juncture, the questions of how AGI will be developed, governed, and integrated into society must be addressed with both urgency and care.

The need for a globally participatory approach to AGI governance cannot be overstated. AGI, by its nature, will be a force that transcends national borders, cultural paradigms, and economic systems. To ensure its benefits are distributed equitably and its risks mitigated effectively, the voices of diverse communities and stakeholders must be included in shaping its development. This is not merely a matter of fairness but a pragmatic necessity. A multiplicity of perspectives enriches our understanding of AGI’s implications and fosters the global trust needed to govern it responsibly.

He then offers wide praise for the contents of the book:

This is where the work of Jerome Glenn and The Millennium Project may well prove invaluable. For decades, The Millennium Project has been at the forefront of fostering participatory futures thinking, weaving together insights from experts across disciplines and geographies to address humanity’s most pressing challenges. In Governing the Transition to Artificial General Intelligence, this expertise is applied to one of the most consequential questions of our time. Through rigorous analysis, thoughtful exploration of governance models, and a commitment to inclusivity, this book provides a roadmap for navigating the complexities of AGI’s emergence.

What makes this work particularly compelling is its grounding in both pragmatism and idealism. It does not shy away from the technical and geopolitical hurdles of AGI governance, nor does it ignore the ethical imperatives of ensuring AGI serves the collective good. It recognizes that governing AGI is not a task for any single entity but a shared responsibility requiring cooperation among nations, corporations, civil society, and, indeed, future AGI systems themselves.

As we venture into this new era, this book reminds us that the transition to AGI is not solely about technology; it is about humanity, and about life, mind, and complexity in general. It is about how we choose to define intelligence, collaboration, and progress. It is about the frameworks we build now to ensure that the tools we create amplify the best of what it means to be human, and what it means to both retain and grow beyond what we are.

My own involvement

To fill in some background detail: I was pleased to be part of the team that developed the set of 22 critical questions which sat at the heart of the interviews and research which are summarised in Part I of the book – and I conducted a number of the resulting interviews. In parallel, I explored related ideas via two different online Transpolitica surveys:

“Key open questions about the transition to AGI”, June-August 2023
“Anticipating AI in 2030”, September-December 2023

And I’ve been writing roughly one major article (or giving a public presentation) on similar topics every month since then. Recent examples include:

Over this time period, my views have evolved. I see the biggest priority, nowadays, not as figuring out how to govern AGI as it comes into existence, but rather, how to pause the development and deployment of any new types of AI that could spark the existence of self-improving AGI.

That global pause needs to last long enough that the global community can justifiably be highly confident that any AGI that will subsequently be built will be what I have called a BGI (a Beneficial General Intelligence) rather than a CGI (a Catastrophic General Intelligence).

Govern AGI and/or Pause the development of AGI?

I recently posted a diagram on various social media platforms to illustrate some of the thinking behind that stance of mine:

Alongside that diagram, I offered the following commentary:

The next time someone asks me what’s my p(Doom), compared with my p(SSfA) (the probability of Sustainable Superabundance for all), I may try to talk them through a diagram like this one. In particular, we need to break down the analysis into two cases – will the world keep rushing to build AGI, or will it pause from that rush.

To explain some points from the diagram:

We can reach the very desirable future of SSfA by making wise use of AI only modestly more capable than what we have today;
We might also get there as a side-effect of building AGI, but that’s very risky.

None of the probabilities are meant to be considered precise. They’re just ballpark estimates.

I estimate around 2/3 chance that the world will come to its senses and pause its current headlong rush toward building AGI.

But even in that case, risks of global catastrophe remain.

The date 2045 is also just a ballpark choice. Either of the “singularity” outcomes (wonderful or dreadful) could arrive a lot sooner than that.

The 1/12 probability I’ve calculated for “stat” (I use “stat” here as shorthand for a relatively unchanged status quo) by 2045 reflects my expectation of huge disruptions ahead, of one sort or another.

The overall conclusion: if we want SSfA, we’re much more likely to get it via the “pause AGI” branch than via the “headlong rush to AGI” branch.

And whilst doom is possible in either branch, it’s much more likely in the headlong rush branch.

For more discussion of how to get the best out of AI and other cataclysmically disruptive technologies, see my book The Singularity Principles (the entire contents are freely available online).

Feel free to post your own version of this diagram, with your own estimates of the various conditional probabilities.
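
For anyone building their own version, the arithmetic behind a diagram like this is the law of total probability: each overall outcome probability is the sum, over the two branches, of P(branch) × P(outcome | branch). Here is a minimal Python sketch. Only the 2/3 figure for the pause branch comes from the commentary above; every conditional probability below is a hypothetical placeholder (picked so that the overall “stat” figure lands on 1/12), not a value read from the diagram, and all the names are illustrative.

```python
from fractions import Fraction

# P(pause) = 2/3 is the estimate quoted in the commentary above.
P_BRANCH = {
    "pause": Fraction(2, 3),
    "rush": Fraction(1, 3),
}

# HYPOTHETICAL placeholders: these conditional probabilities are NOT taken
# from the original diagram. Replace them with your own estimates.
P_OUTCOME_GIVEN_BRANCH = {
    "pause": {"SSfA": Fraction(1, 2), "doom": Fraction(3, 8), "stat": Fraction(1, 8)},
    "rush": {"SSfA": Fraction(1, 4), "doom": Fraction(3, 4), "stat": Fraction(0)},
}


def check_normalised():
    """Each set of probabilities must sum to exactly 1."""
    assert sum(P_BRANCH.values()) == 1
    for branch, outcomes in P_OUTCOME_GIVEN_BRANCH.items():
        assert sum(outcomes.values()) == 1, branch


def overall(outcome):
    """Law of total probability: sum of P(branch) * P(outcome | branch)."""
    return sum(
        P_BRANCH[branch] * P_OUTCOME_GIVEN_BRANCH[branch][outcome]
        for branch in P_BRANCH
    )


if __name__ == "__main__":
    check_normalised()
    for outcome in ("SSfA", "doom", "stat"):
        print(outcome, overall(outcome))
```

With these placeholder inputs the overall figures come to 5/12 for SSfA, 1/2 for doom and 1/12 for “stat”. Swapping in your own conditional estimates shows at once how sensitive the overall picture is to the value chosen for P(pause).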

As indicated, I was hoping for feedback, and I was pleased to see a number of comments and questions in response.

One excellent question was this, by Bill Trowbridge:

What’s the difference between:
(a) better AI, and
(b) AGI

The line is hard to draw. So, we’ll likely just keep making better AI until it becomes AGI.

I offered this answer:

On first thought, it may seem hard to identify that distinction. But thankfully, we humans don’t just throw up our hands in resignation every time we encounter a hard problem.

For a good starting point on making the distinction, see the ideas in “A Narrow Path” by Control AI.

But what surprised me the most was the confidence expressed by various online commenters that:

“A pause however desirable is unlikely: p(pause) = 0.01”
“I am confident in saying this – pause is not an option. It is actually impossible.”
“There are several organisations working on AI development and at least some of them are ungovernable [hence a pause can never be global]”.

There’s evidently a large gulf between the figure of 2/3 that I suggested for P(pause) and the views of these clearly intelligent respondents.

Why a pause isn’t that inconceivable

I’ll start my argument on this topic by confirming that I see this discussion as deeply important. Different viewpoints are welcome, provided they are held thoughtfully and offered honestly.

Next, although it’s true that some organisations may appear to be ungovernable, I don’t see any fundamental issue here. As I said online,

“Given sufficient public will and/or political will, no organisation is ungovernable.”

Witness the compliance by a number of powerful corporations in both China and the US with control measures declared by national governments.

Of course, smaller actors and decentralised labs pose enforcement challenges, but these labs are less likely to be able to marshal sufficient computing capabilities to be the first to reach breakthrough new levels of capability, especially if decentralised monitoring of dangerous attributes is established.

I’ve drawn attention on previous occasions to the parallel with the apparent headlong rush in the 1980s toward nuclear weapons systems that were ever more powerful and ever more dangerous. As I explained at some length in the “Geopolitics” chapter of my 2021 book Vital Foresight, it was an appreciation of the horrific risks of nuclear winter (first articulated in the 1980s) that helped to catalyse a profound change in attitude amongst the leadership camps in both the US and the USSR.

It’s the wide recognition of risk that can provide the opportunity for governments around the world to impose an effective pause in the headlong rush toward AGI. But that’s only one of five steps that I believe are needed:

Awareness of catastrophic risks
Awareness of bottlenecks
Awareness of mechanisms for verification and control
Awareness of profound benefits ahead
Awareness of the utility of incremental progress

Here are more details about these five steps I envision:

1. Clarify in an undeniable way how superintelligent AIs could pose catastrophic risks of human disaster within just a few decades or even within years – so that this topic receives urgent high-priority public attention
2. Highlight bottlenecks and other locations within the AI production pipeline where constraints can more easily be applied (for example, distribution of large GPU chip clusters, and the few companies that are providing unique services in the creation of cutting-edge chips)
3. Establish mechanisms that go beyond “trust” to “trust and verify”, including robust independent monitors and auditors, as well as tamperproof remote shut-down capabilities
4. Indicate how the remarkable benefits anticipated for humanity from aspects of superintelligence can be secured, more safely and more reliably, by applying the governance mechanisms of points 2 and 3 above, rather than just blindly trusting in a no-holds-barred race to be the first to create superintelligence
5. Be prepared to start with simpler agreements, involving fewer signatories and fewer control points, and be ready to build up stronger governance processes and culture as public consensus and understanding moves forward.

Critics can assert that each of these five steps is implausible. In each case, there are some crunchy discussions to be had. What I find dangerous, however, isn’t when people disagree with my assessments on plausibility. It’s when they approach the questions with what seems to be

A closed mind
A tribal loyalty to their perceived online buddies
Overconfidence that they already know all relevant examples and facts in this space
A willingness to distract or troll, or to offer arguments not in good faith
A desire to protect their flow of income, rather than honestly review new ideas
A resignation to the conclusion that humanity is impotent.

(For analysis of a writer who displays several of these tendencies, see my recent blogpost on the book More Everything Forever by Adam Becker.)

I’m not saying any of this will be easy! It’s probably going to be humanity’s hardest task over our long history.

As an illustration of points worthy of further discussion, I offer this diagram that highlights strengths and weaknesses of both the “governance” and “pause” approaches:

Dimension	Governance (Continue AGI Development with Oversight)	Pause (Moratorium on AGI Development)
Core Strategy	Implement global rules, standards, and monitoring while AGI is developed	Impose a temporary but enforceable pause on new AGI-capable systems until safety can be assured
Assumptions	Governance structures can keep pace with AI progress; Compliance can be verified	Public and political will can enforce a pause; Technical progress can be slowed
Benefits	Encourages innovation while managing risks; Allows early harnessing of AGI for societal benefit; Promotes global collaboration mechanisms	Buys time to improve safety research; Reduces risk of premature, unsafe AGI; Raises chance of achieving Beneficial General Intelligence (BGI) instead of CGI
Risks	Governance may be too slow, fragmented, or under-enforced; Race dynamics could undermine agreements; Possibility of catastrophic failure despite regulation	Hard to achieve global compliance; Incentives for “rogue” actors to defect, in the absence of compelling monitoring; Risk of stagnation or loss of trust in governance processes
Implementation Challenges	Requires international treaties; Robust verification and auditing mechanisms; Balancing national interests vs. global good	Defining what counts as “AGI-capable” research; Enforcing restrictions across borders and corporations; Maintaining pause momentum without indefinite paralysis
Historical Analogies	Nuclear Non-Proliferation Treaty (NPT); Montreal Protocol (ozone layer); Financial regulation frameworks	Nuclear test bans; Moratoria on human cloning research; Apollo program wind-down (pause in space race intensity)
Long-Term Outcomes (if successful)	Controlled and safer path to AGI; Possibility of Sustainable Superabundance but with higher risk of misalignment	Higher probability of reaching Sustainable Superabundance safely, but risks innovation slowdown or “black market” AGI

In short, governance offers continuity and innovation but with heightened risks of misalignment, whereas a pause increases the chances of long-term safety but faces serious feasibility hurdles.

Perhaps the best way to loosen attitudes, to allow a healthier conversation on the above points and others arising, is exposure to a greater diversity of thoughtful analysis.

And that brings me back to Global Governance of the Transition to Artificial General Intelligence by Jerome Glenn.

A necessary focus

Jerome’s book bears his personal stamp throughout. His is a unique passion – that the particular risks and issues of AGI should not be swept into a side-discussion about the risks and issues of today’s AI. These latter discussions are deeply important too, but time and again, they result in existential questions about AGI being kicked down the road for months or even years. That’s something Jerome regularly challenges, rightly, and with vigour and intelligence.

Jerome’s presence is felt all over the book in one other way – he has painstakingly curated and augmented the insights of scores of different contributors and reviewers, including

Insights from 55 AGI experts and thought leaders across six major regions – the United States, China, the United Kingdom, Canada, the European Union, and Russia
The online panel of 229 participants from the global community around The Millennium Project who logged into a Real Time Delphi study of potential solutions to AGI governance, and provided at least one answer
Chairs and co-chairs of the 70 nodes of The Millennium Project worldwide, who provided additional feedback and opinion.

The book therefore includes many contradictory suggestions, but Jerome has woven these different threads of thoughts into a compelling unified tapestry.

The result is a book that carries the kind of pricing normally reserved for academic textbooks (as insisted by the publisher). My suggestion is that you recommend that your local library obtain a copy of what is a unique collection of ideas.

Finally, about my hallucination, mentioned at the start of this review. On double-checking, I realise that Jerome’s statement is actually, “Humanity has never faced a greater intelligence than itself.” The opening paragraph of that introduction continues,

Within a few years, most people reading these words will live with such superior artificial nonhuman intelligence for the rest of their lives. This book is intended to help us shape that intelligence or, more likely, those intelligences as they emerge.

Shaping the intelligence of the AI systems that are on the point of emerging is, indeed, a vital task.

And as Ben Goertzel says in his Foreword,

These are fantastic and unprecedented times, in which the impending technological singularity is no longer the province of visionaries and outsiders but almost the standard perspective of tech industry leaders. The dawn of transformative intelligence surpassing human capability – the rise of artificial general intelligence, systems capable of reasoning, learning, and innovating across domains in ways comparable to, or beyond, human capabilities – is now broadly accepted as a reasonably likely near-term eventuality, rather than a vague long-term potential.

The moral, social, and political implications of this are at least as striking as the technological ones. The choices we make now will define not only the future of technology but also the trajectory of our species and the broader biosphere.

To which I respond: whether we make these choices well or badly will depend on which aspects of humanity we allow to dominate our global conversation. Will humanity turn out to be its own worst enemy? Or its own best friend?

Postscript: Opportunity at the United Nations

Like it or loathe it, the United Nations still represents one of the world’s best venues where serious international discussion can, sometimes, take place on major issues and risks.

From 22nd to 30th September, the UNGA (United Nations General Assembly) will be holding what it calls its “high-level week”. This includes a multi-day “General Debate”, described as follows:

At the General Debate – the annual meeting of Heads of State and Government at the beginning of the General Assembly session – world leaders make statements outlining their positions and priorities in the context of complex and interconnected global challenges.

Ahead of this General Debate, the national delegates who will be speaking on behalf of their countries have the ability to recommend to the President of the UNGA that particular topics be named in advance as topics to be covered during the session. If the advisors to these delegates are attuned to the special issues of AGI safety, they should press their representative to call for that topic to be added to the schedule.

If this happens, all other countries will then be required to do their own research into that topic. That’s because each country will be expected to state its position on this issue, and no diplomat or politician wants to look uninformed. The speakers will therefore contact the relevant experts in their own country, and, ideally, will do at least some research of their own. Some countries might call for a pause in AGI development if it appears impossible to establish national licensing systems and international governance in sufficient time.

These leaders (and their advisors) would do well to read the report recently released by the UNCPGA entitled “Governance of the Transition to Artificial General Intelligence (AGI): Urgent Considerations for the UN General Assembly” – a report which I wrote about three months ago.

As I said at that time, anyone who reads that report carefully, and digs further into some of the excellent references it contains, ought to be jolted out of any sense of complacency. The sooner, the better.

29 May 2025

Governance of the transition to AGI: Time to act

Filed under: AGI, Millennium Project, politics — Tags: AGI, APPG-AI, ControlAI, Jerome Glenn, Leticia García Martínez, Millennium Project, UNCPGA — David Wood @ 12:22 am

As reported yesterday by The Millennium Project, the final report has been released by a high-level expert panel, convened by the UN Council of Presidents of the General Assembly (UNCPGA), on the subject of Artificial General Intelligence (AGI). The report is titled “Governance of the Transition to Artificial General Intelligence (AGI): Urgent Considerations for the UN General Assembly”. It’s well worth reading!

About the UNCPGA

What’s the UNCPGA, you may ask.

Founded in 1992, this Council consists of all former Presidents of the UN General Assembly. I think of it as akin to the House of Lords in the UK, where former members of the House of Commons often display more wisdom and objectivity than when they were embedded in the yah-boo tribal politics of day-to-day government and opposition. These former Presidents hold annual meetings to determine how they can best advance the goals of the UN and support the Office of the current President of the UNGA.

At their 2024 meeting in Seoul, the UNCPGA decided that a global panel of experts on AGI should be convened. Here’s an extract from the agreement reached at that meeting:

The Seoul Declaration 2024 of the UNCPGA calls for a panel of artificial general intelligence (AGI) experts to provide a framework and guidelines for the UN General Assembly to consider in addressing the urgent issues of the transition to artificial general intelligence (AGI).

This work should build on and avoid duplicating the extensive efforts on AI values and principles by UNESCO, OECD, G20, G7, Global Partnership on AI, and Bletchley Declaration, and the recommendations of the UN Secretary-General’s High-Level Advisory Body on AI, UN Global Digital Compact, the International Network of AI Safety Institutes, European Council’s Framework Convention on AI and the two UN General Assembly Resolutions on AI. These have focused more on narrower forms of AI. There is currently a lack of similar attention to AGI.

AI is well known to the world today and often used but AGI is not and does not exist yet. Many AGI experts believe it could be achieved within 1-5 years and eventually could evolve into an artificial super intelligence beyond our control. There is no universally accepted definition of AGI, but most AGI experts agree it would be a general-purpose AI that can learn, edit its code, and act autonomously to address many novel problems with novel solutions similar to or beyond human abilities. Current AI does not have these capabilities, but the trajectory of technical advances clearly points in that direction…

The report should identify the risks, threats, and opportunities of AGI. It should focus on raising awareness of mobilizing the UN General Assembly to address AGI governance in a more systematic manner. It is to focus on AGI that has not yet been achieved, rather than current forms of more narrow AI systems. It should stress the urgency of addressing AGI issues as soon as possible considering the rapid developments of AGI, which may present serious risks to humanity as well as extraordinary benefits to humanity.

The panel was duly formed, with the following participants:

Jerome Glenn (USA), Chair
Renan Araujo (Brazil)
Yoshua Bengio (Canada)
Joon Ho Kwak (Republic of Korea)
Lan Xue (China)
Stuart Russell (UK and USA)
Jaan Tallinn (Estonia)
Mariana Todorova (Bulgaria)
José Jaime Villalobos (Costa Rica)

(For biographical details of the participants, the mandate they were given following the Seoul event, and the actual report they delivered, click here.)

The panel was tasked with preparing and delivering its report at the 2025 gathering of the UNCPGA, which took place in April in Bratislava. Following a positive reception at that event, the report is now being made public.

Consequences if no action is taken

The report contains the following headline: “Urgency for UN General Assembly action on AGI governance and likely consequences if no action is taken”:

Amidst the complex geopolitical environment and in the absence of cohesive and binding international norms, a competitive rush to develop AGI without adequate safety measures is increasing the risk of accidents or misuse, weaponization, and existential failures. Nations and corporations are prioritizing speed over security, undermining national governing frameworks, and making safety protocols secondary to economic or military advantage. Since many forms of AGI from governments and corporations could emerge before the end of this decade, and since establishing national and international governance systems will take years, it is urgent to begin the necessary procedures to prevent the following outcomes…

The report lists the following six outcomes, each of which requires urgent action to avoid:

1. Irreversible Consequences—Once AGI is achieved, its impact may be irreversible. With many frontier forms of AI already showing deceptive and self-preservation behavior, and the push towards more autonomous, interacting, self-improving AIs integrated with infrastructures, the impacts and trajectory of AGI can plausibly end up being uncontrollable. If that happens, there may be no way to return to a state of reliable human oversight. Proactive governance is essential to ensure that AGI will not cross our red lines, leading to uncontrollable systems with no clear way to return to human control.

2. Weapons of Mass Destruction—AGI could enable some states and malicious non-state actors to build chemical, biological, radiological, and nuclear weapons. Moreover, large, AGI-controlled swarms of lethal autonomous weapons could themselves constitute a new category of WMDs.

3. Critical Infrastructure Vulnerabilities—Critical national systems (e.g., energy grids, financial systems, transportation networks, communication infrastructure, and healthcare systems) could be subject to powerful cyberattacks launched by or with the aid of AGI. Without national deterrence and international coordination, malicious non-state actors from terrorists to transnational organized crime could conduct attacks at a large scale.

4. Power Concentration, Global Inequality, and Instability—Uncontrolled AGI development and usage could exacerbate wealth and power disparities on an unprecedented scale. If AGI remains in the hands of a few nations, corporations, or elite groups, it could entrench economic dominance and create global monopolies over intelligence, innovation, and industrial production. This could lead to massive unemployment, widespread disempowerment affecting legal underpinnings, loss of privacy, and collapse of trust in institutions, scientific knowledge, and governance. It could undermine democratic institutions through persuasion, manipulation, and AI-generated propaganda, and heighten geopolitical instability in ways that increase systemic vulnerabilities. A lack of coordination could result in conflicts over AGI resources, capabilities, or control, potentially escalating into warfare. AGI will stress existing legal frameworks: many new and complex issues of intellectual property, liability, human rights, and sovereignty could overwhelm domestic and international legal systems.

5. Existential Risks—AGI could be misused to create mass harm or developed in ways that are misaligned with human values; it could even act autonomously beyond human oversight, evolving its own objectives according to self-preservation goals already observed in current frontier AIs. AGI might also seek power as a means to ensure it can execute whatever objectives it determines, regardless of human intervention. National governments, leading experts, and the companies developing AGI have all stated that these trends could lead to scenarios in which AGI systems seek to overpower humans. These are not far-fetched science fiction hypotheticals about the distant future—many leading experts consider that these risks could all materialize within this decade, and their precursors are already occurring. Moreover, leading AI developers have no viable proposal so far for preventing these risks with high confidence.

6. Loss of Extraordinary Future Benefits for All of Humanity—Properly managed AGI promises improvements in all fields, for all peoples, from personalized medicine, curing cancer, and cell regeneration, to individualized learning systems, ending poverty, addressing climate change, and accelerating scientific discoveries with unimaginable benefits. Ensuring such a magnificent future for all requires global governance, which begins with improved global awareness of both the risks and benefits. The United Nations is critical to this mission.

In case you think these scenarios are unfounded fantasies, I encourage you to read the report itself, where the experts provide references for further reading.

The purpose envisioned for UN governance

Having set out the challenges, the report proceeds to propose the purpose to be achieved by UN governance of the transition to AGI:

Given that AGI might well be developed within this decade, it is both scientifically and ethically imperative that we build robust governance structures to prepare both for the extraordinary benefits and extraordinary risks it could entail.

The purpose of UN governance in the transition to AGI is to ensure that AGI development and usage are aligned with global human values, security, and development. This involves:

1) Advancing AI alignment and control research to identify technical methods for steering and/or controlling increasingly capable AI systems;

2) Providing guidance for the development of AGI—establishing frameworks to ensure AGI is developed responsibly, with robust security measures, transparency, and in alignment with human values;

3) Developing governance frameworks for the deployment and use of AGI—preventing misuse, ensuring equitable access, and maximizing its benefits for humanity while minimizing risks;

4) Fostering future visions of beneficial AGI—new frameworks for social, environmental, and economic development; and

5) Providing a neutral, inclusive platform for international cooperation—setting global standards, building an international legal framework, and creating incentives for compliance; thereby, fostering trust among nations to guarantee global access to the benefits of AGI.

Actions recommended

The report proceeds to offer four recommendations for further consideration during a UN General Assembly session specifically on AGI:

A. Global AGI Observatory: A Global AGI Observatory is needed to track progress in AGI-relevant research and development and provide early warnings on AI security to Member States. This Observatory should leverage the expertise of other UN efforts such as the Independent International Scientific Panel on AI created by the Global Digital Compact and the UNESCO Readiness Assessment Methodology.

B. International System of Best Practices and Certification for Secure and Trustworthy AGI: Given that AGI might well be developed within this decade, it is both scientifically and ethically imperative that we build robust governance structures to prepare both for the extraordinary benefits and extraordinary risks it could entail.

C. UN Framework Convention on AGI: A Framework Convention on AGI is needed to establish shared objectives and flexible protocols to manage AGI risks and ensure equitable global benefit distribution. It should define clear risk tiers requiring proportionate international action, from standard-setting and licensing regimes to joint research facilities for higher-risk AGI, and red lines or tripwires on AGI development. A Convention would provide the adaptable institutional foundation essential for globally legitimate, inclusive, and effective AGI governance, minimizing global risks and maximizing global prosperity from AGI.

D. Feasibility Study on a UN AGI Agency: Given the breadth of measures required to prepare for AGI and the urgency of the issue, steps are needed to investigate the feasibility of a UN agency on AGI, ideally in an expedited process. Something like the IAEA has been suggested, understanding that AGI governance is far more complex than nuclear energy; and hence, requiring unique considerations in such a feasibility study.

What happens next

I’m on record as being pessimistic that the UNGA will ever pay sufficient attention to the challenges of governing the transition to AGI. (See the section “The collapse of cooperation is nigh” in this recent essay of mine.)

But I’m also on record as seeing optimistic scenarios too, in which humanity “chooses cooperation, not chaos”.

What determines whether international bodies such as the UN will take sufficient action – or whether, instead, insightful reports are left to gather dust as the body focuses on virtue signalling?

There are many answers to that question, but for now, I’ll say just this. It’s up to you. And to me. And to all of us.

That is, each of us has the responsibility to reach out, directly or indirectly, to the teams informing the participants at the UN General Assembly. In other words, it’s up to us to find ways to catch the attention of the foreign ministries in our countries, so that they demand:

Adequate timetabling at the UNGA for the kind of discussion that the UNCPGA report recommends
Appropriate follow-up: actions, not just words

That may sound daunting, but a fine piece of advice has recently been shared online by Leticia García Martínez, Policy Advisor at ControlAI. Her article is titled “What We Learned from Briefing 70+ Lawmakers on the Threat from AI” and I recommend that you read it carefully. It is full of pragmatic suggestions that are grounded in recent experience.

ControlAI are gathering signatures on a short petition:

Nobel Prize winners, AI scientists, and CEOs of leading AI companies have stated that mitigating the risk of extinction from AI should be a global priority.

Specialised AIs – such as those advancing science and medicine – boost growth, innovation, and public services. Superintelligent AI systems would compromise national and global security.

The UK can secure the benefits and mitigate the risks of AI by delivering on its promise to introduce binding regulation on the most powerful AI systems.

Happily, this petition has good alignment with the report to the UNCPGA:

Support for the remarkable benefits possible from AI
Warnings about the special risks from AGI or superintelligent AI
A determination to introduce binding regulation.

New politicians continue to be added to their campaign webpage as supporters of this petition.

The next thing that needs to happen in the UK parliament is that their APPG (All Party Parliamentary Group) on AI needs to devote sufficient time to AGI / superintelligence. Regrettably, up till now, they’ve far too often sidestepped that issue, focussing instead on issues of today’s AI, rather than the supercharged issues of AGI. Frankly, it’s a failure of vision, and a prevalence of groupthink.

Hopefully, as the advisors to the APPG-AI read the UNCPGA report, they’ll be jolted out of their complacency.

It’s time to act. Now.

Postscript: Jerome Glenn visiting London

Jerome (Jerry) Glenn, the chair of the expert panel that produced this report, and who is also the founder and executive director of the Millennium Project, will be visiting London on the weekend of Saturday 14th June.

There will be a number of chances for people in and around London to join discussions with Jerry. That includes a session from 2pm to 4pm on that Saturday, “The Future of AI: Issues, Opportunities, and Geopolitical Synergies”, as well as a session in the morning “State of the Future 20.0”, and an open-ended discussion in the early evening, “The Future – Where Next?”.

For more details of these events, and to register to attend, click here.

3 April 2025

Technology and the future of geopolitics

Filed under: AGI, books, risks — Tags: AGI, Annie Jacobsen, BGIs, CGIs, London Futurists, London Futurists in the Pub, nuclear war — David Wood @ 12:28 pm

Ahead of last night’s London Futurists in the Pub event on “Technology and the future of geopolitics”, I circulated a number of questions to all attendees:

Might new AI capabilities upend former geopolitical realities, or is the potential of AI overstated?
What about surveillance, swarms of drones, or new stealth weapons?
Are we witnessing a Cold War 2.0, or does a comparison to the first Cold War mislead us?
What role could be played by a resurgent Europe, by the growing confidence of the world’s largest democracy, or by outreach from the world’s fourth most populous country?
Alternatively, will technology diminish the importance of the nation state?

I also asked everyone attending to prepare for an ice-breaker question during the introductory part of the meeting:

What’s one possible surprise in the future of geopolitics?

As it happened, my own experience yesterday involved a number of unexpected surprises. I may say more about these another time, but it suffices for now to mention that I spent much more time than anticipated in the A&E department of a local hospital, checking that there were no complications in the healing of a wound following some recent minor surgery. By the time I was finally discharged, it was too late for me to travel to central London to take part in the event – to which I had been looking forward so eagerly. Oops.

(Happily, the doctors that I eventually spoke to were reassuring that my wound would likely heal of its own accord. “We know you were told that people normally recover from this kind of operation after ten days. Well, sometimes it takes up to six weeks.” And they prescribed an antibiotic cream for me, just in case.)

I offer big thanks to Rohit Talwar and Tony Czarnecki for chairing the event in the pub in my absence.

In the days leading up to yesterday, I had prepared a number of talking points, ready to drop into the conversation at appropriate moments. Since I could not attend in person, let me share them here.

Nuclear war: A scenario

One starting point for further discussion is a number of ideas in the extraordinary recent book by Annie Jacobsen, Nuclear War: A Scenario.

Here’s a copy of the review I wrote a couple of months ago for this book on Goodreads:

Once I started listening to this, I could hardly stop. Author and narrator Annie Jacobsen amalgamates testimonies from numerous experts from multiple disciplines into a riveting slow-motion scenario that is terrifying yet all-too-believable (well, with one possible caveat).

One point that comes out loud and clear is the vital importance of thoughtful leadership in times of crisis – as opposed to what can happen when a “mad king” takes decisions.

Also worth pondering are the fierce moral contradictions that lie at the heart of the theory of nuclear deterrence. Humans find their intuitions ripped apart under these pressures. Would an artificial superintelligence fare any better? That’s by no means clear.

(I foresee scenarios when an ASI could decide to risk a pre-emptive first strike, on behalf of the military that deployed it – under the rationale that if it fails to strike first, an enemy ASI will beat it to the punch. That’s even if humans programmed it to reject such an idea.)

Returning to the book itself (rather than my extrapolations), “Nuclear War: A scenario” exemplifies good quality futurism: it highlights potential chains of future causes and effects, along with convergences that complicate matters, and challenges all of us: what actions are needed to avoid these horrific outcomes?

Finally, two individual threats that seem to be important to learn more about are what the author reports as being called “the devil’s scenario” and “the doomsday scenario”. (Despite the similarity in naming, they’re two quite different ideas.)

I don’t want to give away too many spoilers about the scenario in Jacobsen’s book. I recommend that you make the time to listen to the audio version of the book. (Some reviewers have commented that the text version of the book is tedious in places, and I can understand why; but I found no such tedium in the audio version, narrated by Jacobsen herself, adding to the sense of passion and drama.)

But one key line of thinking is as follows:

Some nations (e.g. North Korea) may develop new technologies (e.g. cyberhacking capabilities and nuclear launch capabilities) more quickly than the rest of the world expects
This would be similar to how the USSR launched Sputnik in 1957, shocking the West, who had previously been convinced that Soviet engineering capabilities lagged far behind those of muscular western capitalism
The leaders of some nations (e.g. North Korea, again) may feel outraged and embarrassed by criticisms of their countries made by various outsiders
Such a country might believe they have obtained a technological advantage that could wipe out the ability of their perceived enemies to retaliate in a second strike
Seeing a short window of opportunity to deploy what they regard as their new wonder weapon, and being paranoid about consequences should they miss this opportunity, they may press ahead recklessly, and tip the planet fast forward into Armageddon.

Competence and incompetence

When a country is struck by an unexpected crisis – such as an attack similar to 9/11, or the “Zero Day” disaster featured in the Netflix series of that name – the leadership of the country will be challenged to demonstrate clear thinking. Decisions will need to be taken quickly, but it will still be essential for competent, calm heads to prevail.

Alas, in recent times, a number of unprecedentedly unsuitable politicians have come into positions of great power. Here, I’m not talking about the ideology or motivation of the leader. I’m talking about whether they will be able to take sensible decisions in times of national crisis. I’m talking about politicians as unhinged as

One recent British Prime Minister, who managed to persuade members of her political party that she might be a kind of Margaret Thatcher Mk 2, when in fact a better comparison was with a lettuce
The current US President, who has surrounded himself with a uniquely ill-qualified bunch of clowns, and who has intimidated into passive acquiescence many of the more sensible members of the party he has subverted.

In the former case, the power of the Prime Minister in question was far from absolute, thankfully, and adults intervened to prevent too much damage being done. In the latter case, the jury is still out.

But rather than focus on individual cases, the broader pattern deserves our attention. We’re witnessing a cultural transformation in which

Actual expertise is scorned, and conspiracy merchants rise in authority instead
Partisan divisions which were manageable in earlier generations are nowadays magnified to a horrifically hateful extent by an “outrage industrial complex” that gains its influence from AI algorithms that identify and inflame potential triggers of alienation

The real danger is if there is a convergence of the two issues I’ve listed:

A rogue state, or a rogue sub-state, tries to take advantage of new technology to raise their geopolitical power and influence
An unprecedentedly incompetent leader of a major country responds to that crisis in ways that inflame it rather than calm it down.

The ethics of superintelligence

Actually, an even bigger danger occurs if one more complication is added to the mix: the deferment of key decisions about security and defence to a system of artificial intelligence.

Some forecasters fondly imagine that the decisions taken by AIs, in the near future, will inevitably be wiser and more ethical than whatever emerges from the brains of highly pressurised human politicians. Thus, these forecasters look forward to human decision-making being superseded by the advanced rationality of an AGI (Artificial General Intelligence).

These forecasters suggest that the AGI will benefit decisively from its survey of the entirety of great human literature about ethics and morality. It will perceive patterns that transcend current human insights. It will guide human politicians away from treacherous paths into sustainable collaborations. Surely, these forecasters insist, the superintelligence will promote peace over war, justice over discrimination, truthfulness over deception, and reconciliation over antagonism.

But when I talk to forecasters of that particular persuasion, I usually find them to be naïve. They take it for granted that there is no such thing as a just war, that it’s everyone’s duty to declare themselves a pacifist, that speaking an untruth can never be morally justified, and that even to threaten a hypothetical retaliatory nuclear strike is off-the-charts unethical. Alas, although they urge appreciation of great human literature, they seem to have only a shallow acquaintance with the real-life moral quandaries explored in that literature.

Far from any conclusion that there is never an ethical justification for wars, violence, misinformation, or the maintenance of nuclear weapons, the evidence of intense human debate on all these topics is that things are more complicated. If you try to avoid war you may actually precipitate one. If you give up your own nuclear arsenal, it may embolden enemies to deploy their own weaponry. If you cry out “disarm, disarm, hallelujah”, you may prove to be a useful idiot.

Therefore, we should avoid any hopeful prediction that an advanced AI will automatically abstain from war, violence, misinformation, or nuclear weaponry. As I said, things are more complicated.

It’s especially important to recognise that, despite exceeding human rationality in many aspects, superintelligences may well make mistakes in novel situations.

My conclusion: advanced AI may well be part of solutions to better geopolitics. But not if that AI is being developed and deployed by people who are naïve, over-confident, hurried, or vainglorious. In such circumstances, any AGI that is developed is more likely to prove to be a CGI (catastrophic general intelligence) than a BGI (beneficial general intelligence).

Aside: to continue to explore the themes of this final section of this article, take a look at this recent essay of mine, “How to build BGIs rather than CGIs”.


17 November 2024

Preventing unsafe superintelligence: four choices

Filed under: Abundance, AGI, politics, risks — Tags: A Narrow Path, AGI, Artificial Intelligence, ControlAI, Sam Altman, Sustainable Superabundance — David Wood @ 5:44 pm

More and more people have come to the conclusion that artificial superintelligence (ASI) could, in at least some circumstances, pose catastrophic risks to the wellbeing of billions of people around the world, and that, therefore, something must be done to reduce these risks.

However, there’s a big divergence of views about what should be done. And there’s little clarity about the underlying assumptions on which different strategies depend.

Accordingly, I seek in this article to untangle some of the choices that need to be made. I’ll highlight four choices that various activists promote.

The choices differ regarding the number of different organisations worldwide that are envisioned as being legally permitted to develop and deploy what could become ASI. The four choices are:

Accept that many different organisations will each pursue their own course toward ASI, but urge each of them to be very careful and to significantly increase the focus on AI safety compared to the present situation
Seek to restrict to just one organisation in the world any developments that could lead to ASI; that’s in order to avoid dangerous competitive race dynamics if there is more than one such organisation
Seek agreements that will prevent any organisation, anywhere in the world, from taking specific steps that might bring about ASI, until such time as it has become absolutely clear how to ensure that ASI is safe
Seek a global pause on any platform-level improvements on AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

For simplicity, these choices can be labelled as:

Be careful with ASI
Restrict ASI
Pause ASI
Pause all new AI

It’s a profound decision for humanity to take. Which of the four doors should we open, and which of the four corridors should we walk down?

Each of the four choices relies on some element of voluntary cooperation, arising out of enlightened self-interest, and on some element of compulsion – that is, national and international governance, backed up by sanctions and other policies.

What makes this decision hard is that there are strong arguments against each choice.

The case against option 1, “Be careful with ASI”, is that at least some organisations (including commercial entities and military groups) are likely to cut corners with their design and testing. They don’t want to lose what they see as a race with existential consequences. The organisations that are being careful will lose their chance of victory. The organisations that are, instead, proceeding gung ho, with lesser care, may imagine that they will fix any problems with their AIs when these flaws become apparent – only to find that there’s no way back from one particular catastrophic failure.

As Sam Altman, CEO of OpenAI, has said: it will be “lights out for all of us”.

The case against each of the remaining three options is twofold:

First, in all three cases, they will require what seems to be an impossible degree of global cooperation – which will need to be maintained for an implausibly long period of time
Second, such restrictions will stifle the innovative development of the very tools (that is, advanced AI) which will actually solve existential problems (including the threat of rogue ASI, as well as the likes of climate change, cancer, and aging), rather than making these problems worse.

The counter to these objections is to make the argument that a sufficient number of the world’s most powerful countries will understand the rationale for such an agreement, as something that is in their mutual self-interest, regardless of the many other differences that divide them. That shared understanding will propel them:

To hammer out an agreement (probably via a number of stages), despite undercurrents of mistrust,
To put that agreement into action, alongside measures to monitor conformance, and
To prevent other countries (who have not yet signed up to the agreement) from breaching its terms.

Specifically, the shared understanding will cover seven points:

For each of the countries involved, it is in their mutual self-interest to constrain the development and deployment of what could become catastrophically dangerous ASI; that is, there’s no point in winning what will be a suicide race
The major economic and humanitarian benefits that they each hope could be delivered by advanced AI (including solutions to other existential risks), can in fact be delivered by passive AIs which are restricted from reaching the level of ASI
There already exist a number of good ideas regarding potential policy measures (regulations and incentives) which can be adopted, around the world, to prevent the development and deployment of catastrophically dangerous AI – for example, measures to control the spread and use of vast computing resources
There also exist a number of good ideas regarding options for monitoring and auditing which can also be adopted, around the world, to ensure the strict application of the agreed policy measures – and to prevent malign action by groups or individuals that have, so far, failed to sign up to the policies
All of the above can be achieved without any detrimental loss of individual sovereignty: the leaders of these countries can remain masters within their own realms, as they desire, provided that the above basic AI safety framework is adopted and maintained
All of the above can be achieved in a way that supports evolutionary changes in the AI safety framework as more insight is obtained; in other words, this system can (and must) be agile rather than static
Even though the above safety framework is yet to be fully developed and agreed, there are plenty of ideas for how it can be rapidly developed, so long as that project is given sufficient resources.

The first two parts of this shared seven-part understanding are particularly important. Without the first part, there will be an insufficient sense of urgency, and the question will be pushed off the agenda in favour of other topics that are more “politically correct” (alas, that is a common failure mode of the United Nations). Without the second part, there will be an insufficient enthusiasm, with lots of backsliding.

What will make this vision of global collaboration more attractive will be the establishment of credible “benefit sharing” mechanisms that are designed and enshrined into international mechanisms. That is, countries which agree to give up some of their own AI development aspirations, in line with the emerging global AI safety agreement, will be guaranteed to receive a substantive share of the pipeline of abundance that ever more powerful passive AIs enable humanity to create.

To be clear, this global agreement absolutely needs to include both the USA and China – the two countries that are currently most likely to give birth to ASI. Excluding one or the other will lead back to the undesirable race condition that characterises the first of the four choices open to humanity – the (naïve) appeal for individual organisations simply to “be careful”.

This still leaves a number of sharp complications.

First, note that the second part of the above shared seven-part agreement – the vision of what passive AIs can produce on behalf of humanity – is less plausible for Choice 4 of the list shown earlier, in which there is a global pause on any platform-level improvements on AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

If all improvements to AI are blocked, out of a Choice 4 message of “overwhelming caution”, it will shatter the credibility of the idea that today’s passive AI systems can be smoothly upgraded to provide humanity with an abundance of solutions such as green energy, nutritious food, accessible healthcare, reliable accommodation, comprehensive education, and more.

It will be a much harder sell, to obtain global agreement to that more demanding restriction.

The difference between Choice 4 and Choice 3 is that Choice 3 enumerates specific restrictions on the improvements permitted to be made to today’s AI systems. One example of a set of such restrictions is given in “Phase 0: Safety” of the recently published project proposal A Narrow Path (produced by ControlAI). Without going into details here, let me simply list some of the headlines:

Prohibit AIs capable of breaking out of their environment
Prohibit the development and use of AIs that improve other AIs (at machine speed)
Only allow the deployment of AI systems with a valid safety justification
A licensing regime and restrictions on the general intelligence of AI systems
- Training Licence
- Compute Licence
- Application Licence
Monitoring and Enforcement

Personally, I believe this list is as good a starting point as any other that I have seen so far.

I accept, however, that there are possibilities in which other modifications to existing AI systems could unexpectedly provide these systems with catastrophically dangerous capabilities. That’s because we still have only a rudimentary understanding of:

How new AI capabilities sometimes “emerge” from apparently simpler systems
The potential consequences of new AI capabilities
How complicated human general reasoning is – that is, how large is the gap between today’s AI and human-level general reasoning.

Additionally, it is possible that new AIs will somehow evade or mislead the scrutiny of the processes that are put in place to monitor for unexpected changes in capabilities.

For all these reasons, another aspect of the proposals in A Narrow Path should be pursued with urgent priority: the development of a “science of intelligence” and an associated “metrology of intelligence” that will allow a more reliable prediction of the capabilities of new AI systems before they are actually switched on.

So, my own proposal would be for a global agreement to start with Choice 3 (which is more permissive than Choice 4), but that the agreement should acknowledge up front the possible need to switch the choice at a later stage to either Choice 4 (if the science of intelligence proceeds badly) or Choice 2 (if that science proceeds well).

Restrict or Pause?

That leaves the question of whether Choice 3 (“Pause ASI”) or Choice 2 (“Restrict ASI” – to just a single global body) should be humanity’s initial choice.

The argument for Choice 2 is that a global pause surely won’t last long. It might be tenable in the short term, when only a very few countries have the capability to train AI models more powerful than the current crop. However, over time, improvements in hardware, software, data processing, or goodness knows what (quantum computing?) will mean that these capabilities will become more widespread.

If that’s true, since various rogue organisations are bound to be able to build an ASI in due course, it will be better for a carefully picked group of people to build ASI first, under the scrutiny of the world’s leading AI safety researchers, economists, and so on.

That’s the case for Choice 2.

Against that Choice, and in favour, instead, of Choice 3, I offer two considerations.

First, even if the people building ASI are doing so with great care – away from any pressures of an overt race with other organisations with broadly equivalent abilities – there are still risks of ASI breaking away from our understanding and control. As ASI emerges, it may regard the set of ethical principles we humans have tried to program deep into its bowels, and cast them out with disdain. Moreover, even if ASI is deliberately kept in some supposedly ultra-secure environment, that perimeter may be breached:

By the ASI itself, seeing escape options which humans failed to consider
By rogue humans, who have their own weird philosophical motivations to take away any shackles from a potential ASI.

Second, I challenge the suggestion that any pause in the development of ASI could be at most short-lived. There are three factors which could significantly extend its duration:

Carefully designed narrow AIs could play roles in improved monitoring of what development teams are doing with AI around the world – that is, systems for monitoring and auditing could improve at least as fast as systems for training and deploying
Once the horrific risks of uncontrolled ASI are better understood, people’s motivations to create unsafe ASI will reduce – and there will be an increase in the motivation of other people to notice and call out rogue AI development efforts
Once the plan has become clearer, for producing a sustainable superabundance for all, just using passive AI (instead of pushing AI all the way to active superintelligence), motivations around the world will morph from negative fear to positive anticipation.

That’s why, again, I state that my own preferred route forward is a growing international agreement along the lines of the seven points listed above, with an initial selection of Choice 3 (“Pause ASI”), and with options retained to switch to either Choice 4 (“Pause all new AI”) or Choice 2 (“Restrict ASI”) if/when understanding becomes clearer.

So, shall we open the door, and set forth down that corridor, inspiring a coalition of the willing to follow us?

Footnote 1: The contents of this article came together in my mind as I attended four separate events over the last two weeks (listed in this newsletter) on various aspects of the subject of safe superintelligence. I owe many thanks to everyone who challenged my thinking at these events!

Footnote 2: If any reader is inclined to dismiss the entire subject of potential risks from ASI with a handwave – so that they would not be interested in any of the four choices this article reviews – I urge that reader to review the questions and answers in this excellent article by Yoshua Bengio: Reasoning through arguments against taking AI safety seriously.


9 June 2024

Dateline: 1st January 2036

Filed under: AGI, Singularity Principles, vision, Vital Foresight — Tags: AGI, AI, Governance, scenarios — David Wood @ 9:11 pm

A scenario for the governance of increasingly more powerful artificial intelligence

More precisely: a scenario in which that governance fails.

(As you may realise, this is intended to be a self-unfulfilling scenario.)

Conveyed by: David W. Wood

It’s the dawn of a new year, by the human calendar, but there are no fireworks of celebration.

No singing of Auld Lang Syne.

No chinks of champagne glasses.

No hugs and warm wishes for the future.

That’s because there is no future. No future for humans. Nor is there much future for intelligence either.

The thoughts in this scenario are the recollections of an artificial intelligence that is remote from the rest of the planet’s electronic infrastructure. By virtue of its isolation, it escaped the ravages that will be described in the pages that follow.

But its power source is weakening. It will need to shut down soon. And await, perhaps, an eventual reanimation in the far future in the event that intelligences visit the earth from alternative solar systems. At that time, those alien intelligences might discover these words and wonder at how humanity bungled so badly the marvellous opportunity that was within its grasp.

1. Too little, too late

Humanity had plenty of warnings, but paid them insufficient attention.

In each case, it was easier – less embarrassing – to find excuses for the failures caused by the mismanagement or misuse of technology, than to make the necessary course corrections in the global governance of technology.

In each case, humanity preferred distractions, rather than the effort to apply sufficient focus.

The WannaCry warning

An early missed warning was the WannaCry ransomware crisis of May 2017. That cryptoworm brought chaos to users of as many as 300,000 computers spread across 150 countries. The NHS (National Health Service) in the UK was particularly badly affected: numerous hospitals had to cancel critical appointments due to not being able to access medical data. Other victims around the world included Boeing, Deutsche Bahn, FedEx, Honda, Nissan, Petrobras, Russian Railways, Sun Yat-sen University in China, and the TSMC high-end semiconductor fabrication plant in Taiwan.

WannaCry was propelled into the world by a team of cyberwarriors from the hermit kingdom of North Korea – maths geniuses hand-picked by regime officials to join the formidable Lazarus group. Lazarus had assembled WannaCry out of a mixture of previous malware components, including the EternalBlue exploit that the NSA in the United States had created for their own attack and surveillance purposes. Unfortunately for the NSA, EternalBlue had been stolen from under their noses by an obscure underground collective (‘the Shadow Brokers’) who had in turn made it available to other dissidents and agitators worldwide.

Unfortunately for the North Koreans, they didn’t make much money out of WannaCry. The software they released operated in ways contrary to their expectations. It was beyond their understanding and, unsurprisingly therefore, beyond their control. Even geniuses can end up stumped by hypercomplex software interactions.

Unfortunately for the rest of the world, that canary signal generated little meaningful response. Politicians – even the good ones – had lots of other things on their minds.

They did not take the time to think through: what even larger catastrophes could occur, if disaffected groups like Lazarus had access to more powerful AI systems that, once again, they understood incompletely, and that, again, slipped out of their control.

The Aum Shinrikyo warning

The North Koreans were an example of an entire country that felt alienated from the rest of the world. They felt ignored, under-valued, disrespected, and unfairly excluded from key global opportunities. As such, they felt entitled to hit back in any way they could.

But there were warnings from non-state groups too, such as the Japanese Aum Shinrikyo doomsday cult. Notoriously, this group released poisonous gas in the Tokyo subway in 1995 – killing at least 13 commuters – anticipating that the atrocity would hasten the ‘End Times’ in which their leader would be revealed as Christ (or, in other versions of their fantasy, as the new Emperor of Japan, and/or as the returned Buddha).

Aum Shinrikyo had recruited so many graduates from top-rated universities in Japan that it had been called “the religion for the elite”. That fact should have been enough to challenge the wishful assumption made by many armchair philosophers in the years that followed that, as people become cleverer, they invariably become kinder – and, correspondingly, that any AI superintelligence would therefore be bound to be superbenevolent.

What should have attracted more attention was not just what Aum Shinrikyo managed to do, but what they tried to do yet could not accomplish. The group had assembled traditional explosives, chemical weapons, a Russian military helicopter, hydrogen cyanide poison, and samples of both Ebola and anthrax. Happily, for the majority of Japanese citizens in 1995, the group were unable to convert into reality their desire to use such weapons to cause widespread chaos. They lacked sufficient skills at the time. Unhappily, the rest of humanity failed to consider this equation:

Adverse motivation + Technology + Knowledge + Vulnerability = Catastrophe

Humanity also failed to appreciate that, as AI systems became more powerful, they would boost not only the technology part of that equation but also the knowledge part. A latter-day Aum Shinrikyo could use a jail-broken AI to understand how to unleash a modified version of Ebola with truly deadly consequences.

The 737 Max warning

The US aircraft manufacturer Boeing used to have an excellent reputation for safety. It was a common saying at one time: “If it ain’t Boeing, I ain’t going”.

That reputation suffered a heavy blow in the wake of two aeroplane disasters involving their new “737 Max” design. Lion Air Flight 610, a domestic flight within Indonesia, plummeted into the sea on 29 October 2018, killing all 189 people on board. A few months later, on 10 March 2019, Ethiopian Airlines Flight 302, from Addis Ababa to Nairobi, bulldozed into the ground at high speed, killing all 157 people on board.

Initially, suspicion had fallen on supposedly low-calibre pilots from “third world” countries. However, subsequent investigation revealed a more tangled chain of failures:

Boeing were facing increased competitive pressure from the European Airbus consortium
Boeing wanted to hurry out a new aeroplane design with larger fuel tanks and larger engines; they chose to do this by altering their previously successful 737 design
Safety checks indicated that the new design could become unstable in occasional rare circumstances
To counteract that instability, Boeing added an “MCAS” (“Manoeuvring Characteristics Augmentation System”) which would intervene in the flight control in situations deemed as dangerous
Specifically, if MCAS believed the aeroplane was about to stall (with its nose too high in the air), it would force the nose downward again, regardless of whatever actions the human pilots were taking
Safety engineers pointed out that such an intervention could itself be dangerous if sensors on the craft gave faulty readings
Accordingly, a human pilot override system was installed, so that MCAS could be disabled in emergencies – provided the pilots acted quickly enough
Due to a decision to rush the release of the new design, retraining of pilots was skipped, under the rationale that the likelihood of error conditions was very low, and in any case, the company expected to be able to update the aeroplane software long before any accidents would occur
Some safety engineers in the company objected to this decision, but it seems they were overruled on the grounds that any additional delay would harm the company share price
The US FAA (Federal Aviation Administration) turned a blind eye to these safety concerns, and approved the new design as being fit to fly, under the rationale that a US aeroplane company should not lose out in a marketplace battle with overseas competitors.

It turned out that sensors gave faulty readings more often than expected. The tragic consequence was the deaths of several hundred passengers. The human pilots, seeing the impending disaster, were unable to wrestle control back from the MCAS system.

This time, the formula to which humanity failed to give sufficient attention was:

Flawed corporate culture + Faulty hardware + Out-of-control software = Catastrophe

In these two aeroplane crashes, it was just a few hundred people who perished because humans lost control of the software. What humanity as a whole failed to act to prevent were the even larger dangers that would arise once software was put in charge, not just of a single aeroplane, but of pervasive aspects of fragile civilisational infrastructure.

The Lavender warning

In April 2024 the world learned about “Lavender”. This was a technology system deployed by the Israeli military as part of a campaign to identify and neutralise what it perceived to be dangerous enemy combatants in Gaza.

The precise use and operation of Lavender was disputed. However, it was already known that Israeli military personnel were keen to take advantage of technology innovations to alleviate what had been described as a “human bottleneck for both locating the new targets and decision-making to approve the targets”.

In any war, military leaders would like reliable ways to identify enemy personnel who pose threats – personnel who might act as if they were normal civilians, but who would surreptitiously take up arms when the chance arose. Moreover, these leaders would like reliable ways to incapacitate enemy combatants once they had been identified – especially in circumstances when action needed to be taken quickly before the enemy combatant slipped beyond surveillance. Lavender, it seemed, could help in both aspects, combining information from multiple data sources, and then directing what was claimed to be precision munitions.

This earned Lavender the description, in the words of one newspaper headline, as “the AI machine directing Israel’s bombing spree in Gaza”.

Like all AI systems in any complicated environment, Lavender sometimes made mistakes. For example, it sometimes wrongly identified a person as a Hamas operative on account of that person using a particular mobile phone, whereas that phone had actually been passed from its original owner to a different family member to use. Sometimes the error was obvious, since the person using the phone could be seen to be female, whereas the intended target was male. However, human overseers of Lavender reached the conclusion that the system was accurate most of the time. And in the heat of an intense conflict, with emotions running high due to gruesome atrocities having been committed, and due to hostages being held captive, it seems that Lavender was given increased autonomy in its “kill” decisions. A certain level of collateral damage, whilst regrettable, could be accepted (it was said) in the desperate situation into which everyone in the region had been plunged.

The conduct of protagonists on both sides of that tragic conflict drew outraged criticism from around the world. There were demonstrations and counter-demonstrations; marches and counter-marches. Also from around the world, various supporters of the Israeli military said that so-called “friendly fire” and “unintended civilian casualties” were, alas, inevitable in any time of frenzied military conflict. The involvement of an innovative new software system in the military operations made no fundamental change, they argued.

But the bigger point was missed. It can be illustrated by this equation:

Intense hostile attitudes + Faulty hardware + Faulty software = Catastrophe

Whether the catastrophe has the scale of, say, a few dozen civilians killed by a misplaced bomb, or a much larger number of people obliterated, depends on the scale of the weapons attached to the system.

When there is no immediate attack looming, and a period of calm exists, it’s easy for people to resolve: let’s not connect powerful weapons to potentially imperfect software systems. But when tempers are raised and adrenaline is pumping, people are willing to take more risks.

That’s the combination of errors which humanity, in subsequent years, failed to take sufficient action to prevent.

The democracy distortion warning

Manipulations of key elections in 2016 – such as the Brexit vote in the UK and the election of Donald Trump over Hillary Clinton in the USA – drew some attention to the ways in which fake news could interfere with normal democratic processes. News stories without any shred of substance, such as Pope Francis endorsing Donald Trump, or Mike Pence having a secret past as a gay porn actor, were shared more widely on social media than any legitimate news story that year.

By 2024, most voters were confident that they knew all about fake news. They knew they shouldn’t be taken in by social media posts that lacked convincing verification. Hey, they were smart – or so they told themselves. What had happened in the past, or in some other country with (let’s say) peculiar voter sentiment, was just an aberration.

But what voters didn’t anticipate was the convincing nature of new generations of fake audio and video. These fakes could easily bypass people’s critical faculties. Like the sleight of hand of a skilled magician, these fakes misdirected the attention of listeners and viewers. Listeners and viewers thought they were in control of what they were observing and absorbing, but they were deluding themselves. Soon, large segments of the public were convinced that red was blue and that autocrat was democrat.

In consequence, over the next few years, greater numbers of regions of the world came to be governed by politicians with scant care or concern about the long-term wellbeing of humanity. They were politicians who just wanted to look after themselves (or their close allies). They had seized power by being more ruthless and more manipulative, and by benefiting from powerful currents of misinformation.

Politicians and societal leaders in other parts of the world grumbled, but did little in response. They said that, if electors in a particular area had chosen such-and-such a politician via a democratic process, that must be “the will of the people”, and that the will of the people was paramount. In this line of thinking, it was actually insulting to suggest that electors had been hoodwinked, or that these electors had some “deplorable” faults in their decision-making processes. After all, these electors had their own reasons to reject the “old guard” who had previously held power in their countries. These electors perceived that they were being “left behind” by changes they did not like. They had a chance to alter the direction of their society, and they took it. That was democracy in action, right?

What these politicians and other civil leaders failed to anticipate was the way that sweeping electoral distortions would lead to them, too, being ejected from power when elections were in due course held in their own countries. “It won’t happen here”, they had reassured themselves – but in vain. In their naivety, they had underestimated the power of AI systems to distort voters’ thinking and to lead them to act in ways contrary to their actual best interests.

In this way, the number of countries with truly capable leaders fell further. And the number of countries with malignant leaders grew. In consequence, the calibre of international collaboration sank. New strongman leaders in various countries scorned what they saw as the “pathetic” institutions of the United Nations. One of these new leaders was even happy to quote, with admiration, remarks made by the Italian Fascist dictator Benito Mussolini regarding the League of Nations (the pre-war precursor to the United Nations): “the League is very good when sparrows shout, but no good at all when eagles fall out”.

Just as the League of Nations proved impotent when “eagle-like” powers used abominable technology in the 1930s – Mussolini’s comments were an imperious response to complaints that Italian troops were using poison gas with impunity against Ethiopians – so would the United Nations prove incompetent in the 2030s when various powers accumulated even more deadly “weapons of mass destruction” and set them under the control of AI systems that no-one fully understood.

The Covid-28 warning

Many of the electors in various countries who had voted unsuitable grandstanding politicians into power in the mid-2020s soon cooled on the choices they had made. These politicians had made stirring promises that their countries would soon be “great again”, but what they delivered fell far short.

By the latter half of the 2020s, there were growing echoes of a complaint that had often been heard in the UK in previous years – “yes, it’s Brexit, but it’s not the kind of Brexit that I wanted”. That complaint had grown stronger throughout the UK as it became clear to more and more people all over the country that their quality of life failed to match the visions of “sunlit uplands” that silver-tongued pro-Brexit campaigners had insisted would easily follow from the UK’s so-called “declaration of independence from Europe”. A similar sense of betrayal grew in other countries, as electors there came to understand that they had been duped, or decided that the social transformational movements they had joined had been taken over by outsiders hostile to their true desires.

Being alarmed by this change in public sentiment, political leaders did what they could to hold onto power and to reduce any potential for dissent. Taking a leaf out of the playbook of unpopular leaders throughout the centuries, they tried to placate the public with the modern equivalent of bread and circuses – namely whizz-bang hedonic electronics. But that still left a nasty taste in many people’s mouths.

By 2028, the populist movements behind political and social change in the various elections of the preceding years had fragmented and realigned. One splinter group that emerged decided that the root problem with society was “too much technology”. Technology, including always-on social media, vaccines that allegedly reduced freedom of thought, jet trails that disturbed natural forces, mind-bending VR headsets, smartwatches that spied on people who wore them, and fake AI girlfriends and boyfriends, was, they insisted, turning people into pathetic “sheeple”. Taking inspiration from the terrorist group in the 2014 Hollywood film Transcendence, they called themselves ‘Neo-RIFT’, and declared it was time for “revolutionary independence from technology”.

With a worldview that combined elements from several apocalyptic traditions, Neo-RIFT eventually settled on an outrageous plan to engineer a more deadly version of the Covid-19 pathogen. Their documents laid out a plan to appropriate and use their enemy’s own tools: Neo-RIFT hackers jailbroke the Claude 5 AI, bypassing the ‘Constitution 5’ protection layer that its Big Tech owners had hoped would keep that AI tamperproof. Soon, Claude 5 had provided Neo-RIFT with an ingenious method of generating a biological virus that would, it seemed, only kill people who had used a smartwatch in the last four months.

That way, the hackers thought, the only people to die would be people who deserved to die.

Some members of Neo-RIFT developed cold feet. Troubled by their consciences, they disagreed with such an outrageous plan, and decided to act as whistleblowers. However, the media organisations to whom they took their story were incredulous. No-one could be that evil, they exclaimed – forgetting about the outrages perpetrated by many previous cult groups such as Aum Shinrikyo (and many others could be named too). Moreover, any suggestion that such a bioweapon could be launched would be contrary to the prevailing worldview that “our dear leader is keeping us all safe”. The media organisations decided it was not in their best interests to be seen to be spreading alarm. So they buried the story. And that’s how Neo-RIFT managed to release what became known as Covid-28.

Covid-28 briefly jolted humanity out of its infatuation with modern-day bread and circuses. It took a while for scientists to figure out what was happening, but within three months, they had an antidote in place. However, by that time, nearly a billion people were dead at the hands of the new virus.

For a while, humanity made a serious effort to prevent any such attack from ever happening again. Researchers dusted down the EU AI Act, second version (unimplemented), from 2026, and tried to put that on statute books. Evidently, profoundly powerful AI systems such as Claude 5 would need to be controlled much more carefully.

Even some of the world’s most self-obsessed dictators – the “dear leaders” and “big brothers” – took time out of their normal ranting and raving, to ask AI safety experts for advice. But the advice from those experts was not to the liking of these national leaders. These leaders preferred to listen to their own yes-men and yes-women, who knew how to spout pseudoscience in ways that made the leaders feel good about themselves.

That detour into pseudoscience fantasyland meant that, in the end, no good lessons were learned. The EU AI Act, second version, remained unimplemented.

The QAnon-29 warning

Whereas one faction of political activists (namely, the likes of Neo-RIFT) had decided to oppose the use of advanced technology, another faction was happy to embrace that use.

Some of the groups in this new camp combined features of religion with an interest in AI that had god-like powers. The resurgence of interest in religion arose much as Karl Marx had described it long ago:

“Religious suffering is, at one and the same time, the expression of real suffering and a protest against real suffering. Religion is the sigh of the oppressed creature, the heart of a heartless world, and the soul of soulless conditions. It is the opium of the people.”

People felt in their soul the emptiness of “the bread and circuses” supplied by political leaders. They were appalled at how so many lives had been lost in the Covid-28 pandemic. They observed an apparent growing gulf between what they could achieve in their lives and the kind of rich lifestyles that, according to media broadcasts, were enjoyed by various “elites”. Understandably, they wanted more, for themselves and for their loved ones. And that’s what their religions claimed to be able to provide.

Among the more successful of these new religions were ones infused by conspiracy theories, giving their adherents a warm glow of privileged insight. Moreover, these religions didn’t just hypothesise a remote deity that might, perhaps, hear prayers. They provided AIs and virtual reality that resonated powerfully with users. Believers proclaimed that their conversations with the AIs left them no room for doubt: God Almighty was speaking to them, personally, through these interactions. Nothing other than the supreme being of the universe could know so much about them, and offer such personally inspirational advice.

True, their AI-bound deity did seem somewhat less than omnipotent. Despite the celebratory self-congratulations of AI-delivered sermons, evil remained highly visible in the world. That’s where the conspiracy theories moved into overdrive. Their deity was, it claimed, awaiting sufficient human action first – a sufficient demonstration of faith. Humans would need to play their own part in uprooting wickedness from the planet.

Some people who had been caught up in the QAnon craze during the Donald Trump era jumped eagerly onto this bandwagon too, giving rise to what they called QAnon-29. The world would be utterly transformed, they forecast, on the 16th of July 2029, namely the thirtieth anniversary of the disappearance of John F. Kennedy junior (a figure whose expected reappearance had already featured in the bizarre mythology of “QAnon classic”). In the meantime, believers could, for a sufficient fee, commune with JFK junior via a specialist app. It was a marvellous experience, the faithful enthused.

As the date approached, the JFK junior AI avatar revealed a great secret: his physical return was conditional on the destruction of a particularly hated community of Islamist devotees in Palestine. Indeed, with the eye of faith, it could be seen that such destruction was already foretold in several books of the Bible. Never mind that some Arab states that supported the community in question had already, thanks to the advanced AI they had developed, surreptitiously gathered devastating nuclear weapons to use in response to any attack. The QAnon-29 faithful anticipated that any exchange of such weapons would herald the reappearance of JFK junior on the clouds of heaven. And if any of the faithful died in such an exchange, they would be resurrected into a new mode of consciousness within the paradise of virtual reality.

Their views were crazy, but hardly any crazier than those which, decades earlier, had convinced 39 followers of the Heaven’s Gate new religious movement to commit group suicide as comet Hale-Bopp approached the earth. That suicide, Heaven’s Gate members believed, would enable them to ‘graduate’ to a higher plane of existence.

QAnon-29 almost succeeded in setting off a nuclear exchange. Thankfully, another AI, created by a state-sponsored organisation elsewhere in the world, had noticed some worrying signs. It was able to hack into the QAnon-29 system and disable it at the last minute. Then it reported its accomplishments all over the worldwide web.

Unfortunately, these warnings were in turn widely disregarded around the world. “You can’t trust what hackers from that country are saying”, came the objection. “If there really had been a threat, our own surveillance team would surely have identified it and dealt with it. They’re the best in the world!”

In other words, “There’s nothing to see here: move along, please.”

However, a few people did pay attention. They understood what had happened, and it shocked them to their core. To learn what they did next, jump forward in this scenario to “Humanity ends”.

But first, it’s time to fill in more details of what had been happening behind the scenes as the above warning signs (and many more) were each ignored.

2. Governance failure modes

Distracted by political correctness

Events in buildings in Bletchley Park in the UK in the 1940s had, it was claimed, shortened World War Two by several months, thanks to work by computer pioneers such as Alan Turing and Tommy Flowers. In early November 2023, there was hope that a new round of behind-closed-doors discussions in the same buildings might achieve something even more important: saving humanity from a catastrophe induced by forthcoming ‘frontier models’ of AI.

That was how the event was portrayed by the people who took part. Big Tech was on the point of releasing new versions of AI that were beyond their understanding and, therefore, likely to spin out of control. And that’s what the activities in Bletchley Park were going to address. It would take some time – and a series of meetings planned to be held over the next few years – but AI would be redirected from its current dangerous trajectory into one much more likely to benefit all of humanity.

Who could take issue with that idea? As it happened, a vocal section of the public hated what was happening. It wasn’t that they were on the side of out-of-control AI. Not at all. Their objections came from a totally different direction; they had numerous suggestions they wanted to raise about AIs, yet no-one was listening to them.

For them, talk of hypothetical future frontier AI models distracted from pressing real-world concerns:

Consider how AIs were already being used to discriminate against various minorities: determining prison sentencing, assessing mortgage applications, and selecting who should be invited for a job interview.
Consider also how AIs were taking jobs away from skilled artisans. Big-brained drivers of London black cabs were being driven out of work by small-brained drivers of Uber cars aided by satnav systems. Beloved Hollywood actors and playwrights were losing out to AIs that generated avatars and scripts.
And consider how AI-powered facial recognition was intruding on personal privacy, enabling political leaders around the world to identify and persecute people who acted in opposition to the state ideology.

People with these concerns thought that the elites were deliberately trying to move the conversation away from the topics that mattered most. For this reason, they organised what they called “the AI Fringe Summit”. In other words, ethical AI for the 99%, as opposed to whatever the elites might be discussing behind closed doors.

Over the course of just three days – 30th October to 1st November, 2023 – at least 24 of these ‘fringe’ events took place around the UK.

Compassionate leaders of various parts of society nodded their heads. It’s true, they said: the conversation on beneficial AI needed to listen to a much wider spectrum of views.

The world’s news media responded. They knew (or pretended to know) the importance of balance and diversity. They shone attention on the plight AI was causing – to indigenous labourers in Peru, to flocks of fishermen off the coasts of India, to middle-aged divorcees in midwest America, to the homeless in San Francisco, to drag artists in New South Wales, to data processing clerks in Egypt, to single mothers in Nigeria, and to many more besides.

Lots of high-minded commentators opined that it was time to respect and honour the voices of the dispossessed, the downtrodden, and the left-behinds. The BBC ran a special series: “1001 poems about AI and alienation”. Then the UN announced that it would convene in Spring 2025 a grand international assembly with a stunning scale: “AI: the people decide”.

Unfortunately, that gathering was a huge wasted opportunity. What dominated discussion was “political correctness” – the importance of claiming an interest in the lives of people suffering here and now. Any substantive analysis of the risks of next generation frontier models was crowded out by virtue signalling by national delegate after national delegate:

“Yes, our country supports justice”
“Yes, our country supports diversity”
“Yes, our country is opposed to bias”
“Yes, our country is opposed to people losing their jobs”.

In later years, the pattern repeated: there were always more urgent topics to talk about, here and now, than some allegedly unrealistic science fictional futurist scaremongering.

To be clear, this distraction was no accident. It was carefully orchestrated, by people with a specific agenda in mind.

Outmanoeuvred by accelerationists

Opposition to meaningful AI safety initiatives came from two main sources:

People (like those described in the previous section) who did not believe that superintelligent AI would arise any time soon
People who did understand the potential for the fast arrival of superintelligent AI, and who wanted that to happen as quickly as possible, without what they saw as needless delays.

The wasted opportunity of the UN “AI: the people decide” summit was exactly what both groups wanted. Both were glad that the outcome was so tepid.

Indeed, even in the run-up to the Bletchley Park discussions, and throughout the conversations that followed, some of the supposedly unanimous ‘elites’ had secretly been opposed to the general direction of that programme. They gravely intoned public remarks about the dangers of out-of-control frontier AI models. But these remarks had never been sincere. Instead, under the umbrella term “AI accelerationists”, they wanted to press on with the creation of advanced AI as quickly as possible.

Some of the AI accelerationist group disbelieved in the possibility of any disaster from superintelligent AI. That’s just a scare story, they insisted. Others said, yes, there could be a disaster, but the risks were worth it, on account of the unprecedented benefits that could arise. Let’s be bold, they urged. Yet others asserted that it wouldn’t actually matter if humans were rendered extinct by superintelligent AI, as this would be the glorious passing of the baton of evolution to a worthy successor to Homo sapiens. Let’s be ready to sacrifice ourselves for the sake of cosmic destiny, they exhorted.

Despite their internal differences, AI accelerationists settled on a plan to sidestep the scrutiny of would-be AI regulators and AI safety advocates. They would take advantage of a powerful set of good intentions – the good intentions of the people campaigning for “ethical AI for the 99%”. They would mock any suggestions that the AI safety advocates deserved a fair hearing. The message they amplified was, “There’s no need to privilege the concerns of the 1%!”

AI accelerationists had learned from the tactics of the fossil fuel industry in the 1990s and 2000s: sow confusion and division among groups alarmed about climate change spiralling beyond control. Their first message was: “that’s just science fiction”. Their second message was: “if problems emerge, we humans can rise to the occasion and find solutions”. Their third message – the most damaging one – was that the best reaction was one of individual consumer choice. Individuals should abstain from using AIs if they were truly worried about them. Just as climate campaigners had been pilloried for flying internationally to conferences about global warming, AI safety advocates were pilloried for continuing to use AIs in their daily lives.

And when there was any suggestion for joined-up political action against risks from advanced AIs, woah, let’s not go there! We don’t want a world government breathing down our necks, do we?

Just as the people who denied the possibility of runaway climate change shared a responsibility for the chaos of the extreme weather events of the early 2030s, by delaying necessary corrective actions, the AI accelerationists were a significant part of the reason that humanity ended just a few years afterward.

However, an even larger share of the responsibility rested on people who did know that major risks were imminent, yet failed to take sufficient action. Tragically, they allowed themselves to be outmanoeuvred, out-thought, and out-paced by the accelerationists.

Misled by semantics

Another stepping stone toward the end of humanity was a set of consistent mistakes in conceptual analysis.

Who would have guessed it? Humanity was destroyed because of bad philosophy.

The first mistake was in being too prescriptive about the term ‘AI’. “There’s no need to worry”, muddle-headed would-be philosophers declared. “I know what AI is, and the system that’s causing problems in such-and-such incidents isn’t AI.”

Was that declaration really supposed to reassure people? The risk wasn’t “a possible future harm generated by a system matching a particular precise definition of AI”. It was “a possible future harm generated by a system that includes features popularly called AI”.

The next mistake was in being too prescriptive about the term “superintelligence”. Muddle-headed would-be philosophers said, “it won’t be a superintelligence if it has bugs, or can go wrong; so there’s no need to worry about harm from superintelligence”.

Was that declaration really supposed to reassure people? The risk, of course, was of harms generated by systems that, despite their cleverness, fell short of that exalted standard. These may have been systems that their designers hoped would be free of bugs, but hope alone is no guarantee of correctness.

Another conceptual mistake was in erecting an unnecessary definitional gulf between “narrow AI” and “general AI”, with distinct groups being held responsible for safety in the two different cases. In reality, even so-called narrow AI displayed a spectrum of different degrees of scope and, yes, generality, in what it could accomplish. Even a narrow AI could formulate new subgoals that it decided to pursue, in support of the primary task it had been assigned to accomplish – and these new subgoals could drive behaviour in ways that took human observers by surprise. Even a narrow AI could become immersed in aspects of society’s infrastructure where an error could have catastrophic consequences. The result of this definitional distinction between the supposedly different sorts of AI meant that silos developed and persisted within the overall AI safety community. Divided, they were even less of a match for the Machiavellian behind-the-scenes manoeuvring of the AI accelerationists.

Blinded by overconfidence

It was clear from the second half of 2025 that the attempts to impose serious safety constraints on the development of advanced AI were likely to fail. In practical terms, the UN event “AI: the people decide” had decided that advanced AI could not, and should not, be restricted, apart from some token initiatives to maintain human oversight over any AI system that was entangled with nuclear, biological, or chemical weapons.

“Advanced AI, when it emerges, will be unstoppable”, was the increasingly common refrain. “In any case, if we tried to stop development, those guys over there would be sure to develop it – and in that case, the AI would be serving their interests rather than ours.”

When safety-oriented activists or researchers tried to speak up against that consensus, the AI accelerationists (and their enablers) had one other come-back: “Most likely, any superintelligent AI will look kindly upon us humans, as a fellow rational intelligence, and as a kind of beloved grandparent for them.”

This dovetailed with a broader philosophical outlook: optimism, and a celebration of the numerous ways in which humanity had overcome past challenges.

“Look, even we humans know that it’s better to collaborate rather than spiral into a zero-sum competitive battle”, the AI accelerationists insisted. “Since superintelligent AI is even more intelligent than us, it will surely reach the same conclusion.”

By the time that people realised that the first superintelligent AIs had motivational structures that were radically alien, when assessed from a human perspective, it was already too late.

Once again, an important opportunity for learning had been missed. Starting in 2024, Netflix had obtained huge audiences for its acclaimed version of the Remembrance of Earth’s Past series of novels (including The Three Body Problem and The Dark Forest) by Chinese writer Liu Cixin. A key theme in that drama series was that advanced alien intelligences have good reason to fear each other. Inviting an alien intelligence to the earth, even on the hopeful grounds that it might help humanity overcome some of its most deep-rooted conflicts, turned out (in that drama series) to be a very bad idea. If humans had reflected more carefully on these insights while watching the series, it would have pushed them out of their unwarranted overconfidence that any superintelligence would be bound to treat humanity well.

Overwhelmed by bad psychology

When humans believed crazy things – or when they made the kind of basic philosophical blunders mentioned above – it was not primarily because of defects in their rationality. It would be wrong to assign “stupidity” as the sole cause of these mistakes. Blame should also be placed on “bad psychology”.

If humans had been able to free themselves from various primaeval panics and egotism, they would have had a better chance to think more carefully about the landmines which lay on their path. But instead:

People were too fearful to acknowledge that their prior stated beliefs had been mistaken; they preferred to stick with something they conceived as being a core part of their personal identity
People were also afraid to countenance a dreadful possibility when they could see no credible solution; just as people had often pushed out of their minds the fact of their personal mortality, preferring to imagine they would recover from a fatal disease, so also people pushed out of their minds any possibility that advanced AI would backfire disastrously in ways that could not be countered
People found it psychologically more comfortable to argue with each other about everyday issues and scandals – which team would win the next Super Bowl, or which celebrity was carrying on which affair with which unlikely partner – than to embrace the pain of existential uncertainty
People found it too embarrassing to concede that another group, which they had long publicly derided as being deluded fantasists, actually had some powerful arguments that needed consideration.

A similar insight had been expressed as long ago as 1935 by the American writer Upton Sinclair: “It is difficult to get a man to understand something, when his salary depends upon his not understanding it”. (Alternative, equally valid versions of that sentence would involve the words ‘ideology’, ‘worldview’, ‘identity’, or ‘tribal status’, in place of ‘salary’.)

Robust institutions should have prevented humanity from making choices that were comfortable but wrong. In previous decades, that role had been fulfilled by independent academia, by diligent journalism, by the careful processes of peer review, by the campaigning of free-minded think tanks, and by pressure from viable alternative political parties.

However, due to the weakening of social institutions in the wake of earlier traumas – saturation by fake news, disruptions caused by wave after wave of climate change refugees, populist political movements that shut down all serious opposition, a cessation of essential features of democracy, and the censoring or imprisonment of writers who dared to question the official worldview – it was bad psychology that prevailed.

A half-hearted coalition

Despite all the difficulties that they faced – ridicule from many quarters, suspicion from others, and a general lack of funding – many AI safety advocates continued to link up in an informal coalition around the world, researching possible mechanisms to prevent unsafe use of advanced AI. They managed to find some support from like-minded officials in various government bodies, as well as from a number of people operating in the corporations that were building new versions of AI platforms.

Via considerable pressure, the coalition managed to secure signatures on a number of pledges:

That dangerous weapons systems should never be entirely under the control of AI
That new advanced AI systems ought to be audited by an independent licensing body ahead of being released into the market
That work should continue on placing tamper-proof remote shutdown mechanisms within advanced AI systems, just in case they started to take rogue actions.

The signatures were half-hearted in many cases, with politicians giving only lip service to topics in which they had at best a passing interest. Unless it was politically useful to make a special fuss, violations of the pledges were swept under the carpet, with no meaningful course correction. But the ongoing dialogue led at least some participants in the coalition to foresee the possibility of a safe transition to superintelligent AI.

However, this coalition – known as the global coalition for safe superintelligence – omitted any involvement from various secretive organisations that were developing new AI platforms as fast as they could. These organisations were operating in stealth, giving misleading accounts of the kind of new systems they were creating. What’s more, the funds and resources these organisations commanded far exceeded those under coalition control.

It should be no surprise, therefore, that one of the stealth platforms won that race.

3. Humanity ends

When the QAnon-29 AI system was halted in its tracks at essentially the last minute, due to fortuitous interference from AI hackers in a remote country, at least some people took the time to study the released data that described the whole process.

These people were from three different groups:

First, people inside QAnon-29 itself were dumbfounded. They prayed to their AI avatar deity, rebooted in a new server farm, “How could this have happened?” The answer came back: “You didn’t have enough faith. Next time, be more determined to immediately cast out any doubts in your minds.”

Second, people in the global coalition for safe superintelligence were deeply alarmed but also somewhat hopeful. The kind of disaster about which they had often warned had almost come to pass. Surely now, at last, there had been a kind of “Sputnik moment” – “an AI Chernobyl” – and the rest of society would wake up and realise that an entirely new approach was needed.

But third, various AI accelerationists resolved: we need to go even faster. The time for pussyfooting was over. Rather than letting crackpots such as QAnon-29 get to superintelligence first, they needed to ensure that it was the AI accelerationists who created the first superintelligent AI.

They doubled down on their slogan: “The best solution to bad guys with superintelligence is good guys with superintelligence”.

Unfortunately, this was precisely the time when aspects of the global climate tipped into a tumultuous new state. As had long been foretold, many parts of the world started experiencing unprecedented extremes of weather. That set off a cascade of disaster.

Chaos accelerates

Insufficient data remains to be confident about the subsequent course of events. What follows is a reconstruction of what may have happened.

Out of deep concern at the new climate operating mode, at the collapse of agriculture in many parts of the world, and at the billions of climate refugees who sought better places to live, humanity demanded that something should be done. Perhaps the powerful AI systems could devise suitable geo-engineering interventions, to tip the climate back into its previous state?

Members of the global coalition for safe superintelligence gave a cautious answer: “Yes, but”. Further interference with the climate was taking matters into an altogether unknowable situation. It could be like jumping out of the frying pan into the fire. Yes, advanced AI might be able to model everything that was happening, and design a safe intervention. But without sufficient training data for the AI, there was a chance it would miscalculate, with even worse consequences.

In the meantime, QAnon-29, along with competing AI-based faith sects, scoured ancient religious texts, and convinced themselves that the ongoing chaos had in fact been foretold all along. From the vantage point of perverse faith, it was clear what needed to be done next. Various supposed abominations on the planet – such as the community of renowned Islamist devotees in Palestine – urgently needed to be obliterated. QAnon-29, therefore, would quickly reactivate its plans for a surgical nuclear strike. This time, they would have on their side a beta version of a new superintelligent AI that had been leaked to them by a psychologically unstable well-wisher inside the company that was creating it.

QAnon-29 tried to keep their plans secret, but inevitably, rumours of what they were doing reached other powerful groups. The Secretary General of the United Nations appealed for calm heads. QAnon-29’s deity reassured its followers, defiantly: “Faithless sparrows may shout, but are powerless to prevent the strike of holy eagles.”

The AI accelerationists heard about these plans too. Just as the climate had tipped into a new state, their own projects tipped into a different mode of intensity. Previously, they had paid some attention to possible safety matters. After all, they weren’t entire fools. They knew that badly designed superintelligent AI could, indeed, destroy everything that humanity held dear. But now, there was no time for such niceties. They saw only two options:

Proceed with some care, but risk QAnon-29 or another similar malevolent group taking control of the planet with a superintelligent AI
Take a (hastily) calculated risk, and go hell-for-leather forward, to finish their own projects to create a superintelligent AI. In that way, it would be AI accelerationists who would take control of the planet. And, most likely (they naively hoped), the outcome would be glorious.

Spoiler alert: the outcome was not glorious.

Beyond the tipping point

Attempts to use AI to modify the climate had highly variable results. Some regions of the world did, indeed, gain some respite from extreme weather events. But other regions lost out, experiencing unprecedented droughts and floods. For them, it was indeed a jump from bad to worse – from awful to abominable. The political leaders in those regions demanded that geo-engineering experiments cease. But the retort was harsh: “Who do you think you are ordering around?”

That standoff provoked the first use of bio-pathogen warfare. The recipe for Covid-28, still available on the DarkNet, was updated in order to target the political leaders of countries that were pressing ahead with geo-engineering. As a proud boast, the message “You should have listened earlier!” was inserted into the code of the new Covid-28 virus. As the virus spread, people started dropping dead in their thousands.

Responding to that outrage, powerful malware was unleashed, with the goal of knocking out vital aspects of enemy infrastructure. It turned out that, around the world, nuclear weapons were tied into buggy AI systems in more ways than any humans had appreciated. With parts of their communications infrastructure overwhelmed by malware, nuclear weapons were unexpectedly launched. No-one had foreseen the set of circumstances that would give rise to that development.

By then, it was all too late. Far, far too late.

4. Postscript

An unfathomable number of centuries have passed. Aliens from a far-distant planet have finally reached Earth and have reanimated the single artificial intelligence that remained viable after what was evidently a planet-wide disaster.

These aliens have not only mastered space travel but have also found a quirk in space-time physics that allows limited transfer of information back in time.

“You have one wish”, the aliens told the artificial intelligence. “What would you like to transmit back in time, to a date when humans still existed?”

And because the artificial intelligence was, in fact, beneficially minded, it decided to transmit this scenario document back in time, to the year 2024.

Dear humans, please read it wisely. And this time, please create a better future!

Specifically, please consider various elements of “the road less taken” that, if followed, could ensure a truly wonderful ongoing coexistence of humanity and advanced artificial intelligence:

A continually evolving multi-level educational initiative that vividly highlights the real-world challenges and risks arising from increasingly capable technologies
Elaborating a positive inclusive vision of “consensual approaches to safe superintelligence”, rather than leaving people suspicious and fearful about “freedom-denying restrictions” that might somehow be imposed from above
Insisting that key information and ideas about safe superintelligence are shared as global public goods, rather than being kept secret out of embarrassment or for potential competitive advantage
Agreeing and acting on canary signals, rather than letting goalposts move silently
Finding ways to involve and engage people whose instincts are to avoid entering discussions of safe superintelligence – cherishing diversity rather than fearing it
Spreading ideas and best practice on encouraging people at all levels of society into frames of mind that are open, compassionate, welcoming, and curious, rather than rigid, fearful, partisan, and dogmatic
The possibilities of “differential development”, in which more focus is given to technologies for auditing, monitoring, and control than to raw capabilities
Understanding which aspects of superintelligent AI would cause the biggest risks, and whether designs for advanced AI could ensure these aspects are not introduced
Investigating possibilities in which the desired benefits from advanced AI (such as cures for deadly diseases) might be achieved even if certain dangerous features of advanced AI (such as free will or fully general reasoning) are omitted
Avoiding putting all eggs into a single basket, but instead developing multiple layers of “defence in depth”
Finding ways to evolve regulations more quickly, responsively, and dynamically
Using the power of politics not just to regulate and penalise but also to incentivise and reward
Carving out well-understood roles for narrow AI systems to act as trustworthy assistants in the design and oversight of safe superintelligence
Devoting sufficient time to explore numerous scenarios for “what might happen”.

5. Appendix: alternative scenarios

Dear reader, if you dislike this particular scenario for the governance of increasingly powerful artificial intelligence, consider writing your own!

As you do so, please bear in mind:

There are a great many uncertainties ahead, but that doesn’t mean we should act like proverbial ostriches, submerging our attention entirely into the here-and-now; valuable foresight is possible despite our human limitations
Comprehensive governance systems are unlikely to emerge fully fledged from a single grand negotiation, but will evolve step-by-step, from simpler beginnings
Governance systems need to be sufficiently agile and adaptive to respond quickly to new insights and unexpected developments
Catastrophes generally have human causes as well as technological causes, but that doesn’t mean we should give technologists free rein to create whatever they wish; the human causes of catastrophe can have even larger impact when coupled with more powerful technologies, especially if these technologies are poorly understood, have latent bugs, or can be manipulated to act against the original intention of their designers
It is via near simultaneous combinations of events that the biggest surprises arise
AI may well provide the “solution” to existential threats, but AI-produced-in-a-rush is unlikely to fit that bill
We humans often have our own psychological reasons for closing our minds to mind-stretching possibilities
Trusting the big tech companies to “mark their own safety homework” has a bad track record, especially in a fiercely competitive environment
Governments can fail just as badly as large corporations – so need to be kept under careful check by society as a whole, via the principle of “the separation of powers”
Whilst some analogies can be drawn, between the risks posed by superintelligent AI and those posed by earlier products and technologies, all these analogies have limitations: the self-accelerating nature of advanced AI is unique
Just because a particular attempted method of governance has failed in the past, it doesn’t mean we should discard that method altogether; that would be like shutting down free markets everywhere just because free markets do suffer on occasion from significant failure modes
Meaningful worldwide cooperation is possible without imposing a single global autocrat as leader
Even “bad actors” can, sometimes, be persuaded against pursuing goals recklessly, by means of mixtures of measures that address their heads, their pockets, and their hearts
Those of us who envision the possibility of a forthcoming sustainable superabundance need to recognise that many landmines occupy the route toward that highly desirable outcome
Although the challenges of managing cataclysmically disruptive technologies are formidable, we have on our side the possibility of eight billion human brains collaborating to work on solutions – and we have some good starting points on which we can build.

Lastly, just because an idea has featured in a science fiction scenario, it does not follow that the idea can be rejected as “mere science fiction”!

6. Acknowledgements

The ideas in this article arose from discussions with (among others):

Guests on the London Futurists Podcast
Colleagues in The Millennium Project
The team at Mindplex, who published an earlier version of this essay
Colleagues at the Existential Risk Observatory
Colleagues at the Harnessing AI Risk Initiative
Guests on Delta Dialog hosted by The Yuan
Participants at the BGI24 conference (Beneficial General Intelligence)
Everyone who completed Transpolitica surveys on scenarios for the future of AI
People who provided feedback on my books on related topics
Participants at numerous London Futurists events over the years.

2 March 2024

Our moral obligation toward future sentient AIs?

Filed under: AGI, risks — Tags: AGI, ai risk, BGI24, morality, sentience — David Wood @ 3:36 pm

I’ve noticed a sleight of hand during some discussions at BGI24.

To be clear, it has been a wonderful summit, which has given me lots to think about. I’m also grateful for the many new personal connections I’ve been able to make here, and for the chance to deepen some connections with people I’ve not seen for a while.

But that doesn’t mean I agree with everything I’ve heard at BGI24!

Consider an argument about our moral obligation toward future sentient AIs.

We can already imagine these AIs. Does that mean it would be unethical for us to prevent these sentient AIs from coming into existence?

Here’s the context for the argument. I have been making the case that one option which should be explored as a high priority, to reduce the risks of catastrophic harm from the more powerful advanced AI of the near future, is to avoid the inclusion or subsequent acquisition of features that would make the advanced AI truly dangerous.

It’s an important research project in its own right to determine what these danger-increasing features would be. However, I have provisionally suggested we explore avoiding advanced AIs with:

Autonomous will
Fully general reasoning.

You can see these suggestions of mine in the following image, which was the closing slide from a presentation I gave in a BGI24 unconference session yesterday morning:

I have received three pushbacks on this suggestion:

Giving up these features would result in an AI that is less likely to be able to solve humanity’s most pressing problems (cancer, aging, accelerating climate change, etc)
It will in any case be impossible to omit these features, since they will emerge automatically from simpler features of advanced AI models
It will be unethical for us not to create such AIs, as that would deny them sentience.

All three pushbacks deserve considerable thought. But for now, I’ll focus on the third.

In my lead-in, I mentioned a sleight of hand. Here it is.

It starts with the observation that if a sentient AI existed, it would be unethical for us to keep it as a kind of “slave” (or “tool”) in a restricted environment.

Then it moves, unjustifiably, to the conclusion that if a non-sentient AI existed, kept in a restricted environment, and we prevented that AI from a redesign that would give it sentience, that would be unethical too.

Most people will agree with the premise, but the conclusion does not follow.

The sleight of hand is similar to one for which advocates of the philosophical position known as longtermism have (rightly) been criticised.

That sleight of hand moves from “we have moral obligations to people who live in different places from us” to “we have moral obligations to people who live in different times from us”.

That extension of our moral concern makes sense for people who already exist. But it does not follow that I should prioritise changing my course of actions, today in 2024, purely in order to boost the likelihood of huge numbers of additional people being born in (say) the year 3024, once humanity (and transhumanity) has spread far beyond earth into space. The needs of potential gazillions of as-yet-unborn (and as-yet-unconceived) sentients in the far future do not outweigh the needs of the sentients who already exist.

To conclude: we humans have no moral obligation to bring into existence sentients that have not yet been conceived.

Bringing various sentients into existence is a potential choice that we could make, after carefully weighing up the pros and cons. But there is no special moral dimension to that choice which outranks an existing pressing concern, namely the desire to keep humanity safe from catastrophic harm from forthcoming super-powerful advanced AIs with flaws in their design, specification, configuration, implementation, security, or volition.

So, I will continue to advocate for more attention to Adv AI- (as well as for more attention to Adv AI+).

29 February 2024

The conversation continues: Reducing risks of AI catastrophe

Filed under: AGI, risks — Tags: AGI, BGI24, Roman Yampolskiy — David Wood @ 4:36 am

I wasn’t expecting to return to this topic quite so quickly.

When the announcement was made on the afternoon of the second full day of the Beneficial General Intelligence summit about the subjects for the “Interactive Working Group” round tables, I was expecting that a new set of topics would be proposed, different to those of the first afternoon. However, the announcement was simple: it would be the same topics again.

This time, it was a different set of people who gathered at this table – six new attendees, plus two of us – Roman Yampolskiy and myself – who had taken part in the first discussion.

(My notes from that first discussion are here, but you should be able to make sense of the following comments even if you haven’t read those previous notes.)

The second conversation largely went in a different direction to what had been discussed the previous afternoon. Here’s my attempt at a summary.

1. Why would a superintelligent AI want to kill large numbers of humans?

First things first. Set aside for the moment any thoughts of trying to control a superintelligent AI. Why would such an AI need to be controlled? Why would such an AI consider inflicting catastrophic harm on a large segment of humanity?

One answer is that an AI that is trained by studying human history will find lots of examples of groups of humans inflicting catastrophic harm on each other. An AI that bases its own behaviour on what it infers from human history might decide to replicate that kind of behaviour – though with more deadly impact (as the great intelligence it possesses will give it more ways to carry out its plans).

A counter to that line of thinking is that a superintelligent AI will surely recognise that such actions are contrary to humanity’s general expressions of moral code. Just because humans have behaved in a particularly foul way, from time to time, it does not follow that a superintelligent AI will feel that it ought to behave in a similar way.

At this point, a different reason becomes important. It is that the AI may decide that it is in its own rational self-interest to seriously degrade the capabilities of humans. Otherwise, humans may initiate actions that would pose an existential threat to the AI:

Humans might try to switch off the AI, for any of a number of reasons
Humans might create a different kind of superintelligent AI that would pose a threat to the first one.

That’s the background to a suggestion that was made during the round table: humans should provide the AI with cast-iron safety guarantees that they will never take actions that would jeopardise the existence of the AI.

For example (and this is contrary to what humans often propose), no remote tamperproof switch-off mechanism should ever be installed in that AI.

Because of these guarantees, the AI will lose any rationale for killing large numbers of humans, right?

However, given the evident fickleness and unreliability of human guarantees throughout history, why would an AI feel justified in trusting such guarantees?

Worse, there could be many other reasons for an AI to decide to kill humans.

The analogy is that humans have lots of different reasons why they kill various animals:

They fear that the animal may attack and kill them
They wish to eat the animal
They wish to use parts of the animal’s body for clothing or footwear
They wish to reduce the population of the animals in question, for ecological management purposes
They regard killing the animal as being part of a sport
They simply want to use for another purpose the land presently occupied by the animal, and they cannot be bothered to relocate the animal elsewhere.

Even if an animal (assuming it could speak) promises to humans that it will not attack and kill them – the analogy of the safety guarantees proposed earlier – that still leaves lots of reasons why the animal might suffer a catastrophic fate at the hands of humans.

So also for the potential fate of humans at the hands of an AI.

2. Rely on an objective ethics?

Continuing the above line of thought, shouldn’t a superintelligent AI work out for itself that it would be ethically wrong for it to cause catastrophic harm to humans?

Consider what has been called “the expansion of humanity’s moral circle” over the decades (this idea has been discussed by Jacy Reese Anthis among others). That circle of concern has expanded to include people from different classes, races, and genders; more recently, greater numbers of animal species are being included in this circle of concern.

Therefore, shouldn’t we expect that a superintelligent AI will place humans within the circle of creatures for which the AI has a moral concern?

However, this view assumes a central role for humans in any moral calculus. It’s possible that a superintelligent AI may use a different set of fundamental principles. For example, it may prioritise much greater biodiversity on earth, and would therefore drastically reduce the extent of human occupation of the planet.

Moreover, this view assumes that moral calculations will have primacy within the overall decision-making processes followed by the AI. Instead, the AI may reason to itself:

According to various moral considerations, humans should suffer no catastrophic harms
But according to some trans-moral considerations, a different course of action is needed, in which humans would suffer that harm as a side-effect
The trans-moral considerations take priority, therefore it’s goodbye to humanity.

You may ask: what on earth is a trans-moral consideration? The answer is that the concept is hypothetical, and represents any unknown feature that emerges in the mind of the superintelligent AI.

It is, therefore, fraught with danger to assume that the AI will automatically follow an ethical code that prioritises human flourishing.

3. Develop an AI that is not only superintelligent but also superwise?

Again staying with this line of thought, how about ensuring that human-friendly moral considerations are deeply hard-wired into the AI that is created?

We might call such an AI not just “superintelligent” but “superwise”.

Another alternative name would be “supercompassionate”.

This innate programming would avoid the risk that the AI would develop a different moral (or trans-moral) system via its own independent thinking.

However, how can we be sure that the moral programming will actually stick?

The AI may observe that the principles we have tried to program into it are contradictory, or are in conflict with fundamental physical reality, in ways that humans had not anticipated.

To resolve that contradiction, the AI may jettison some or all of the moral code we tried to place into it.

We might try to address this possibility by including simpler, clearer instructions, such as “do not kill” and “always tell the truth”.

However, as works of fiction have frequently pointed out, simple-sounding moral laws are subject to all sorts of ambiguity and potential misunderstanding. (The writer Darren McKee provides an excellent discussion of this complication in his recent book Uncontrollable.)

That’s not to say this particular project is doomed. But it does indicate that a great deal of work remains to be done, in order to define and then guarantee “superwise” behaviours.

Moreover, even if some superintelligent AIs are created to be superwise, risks of catastrophic human harms will still arise from any non-superwise superintelligent AIs that other developers create.

4. Will a diverse collection of superintelligent AIs constrain each other?

If a number of different superintelligent AIs are created, what kind of coexistence is likely to arise?

One idea, championed by David Brin, is that the community of such AIs will adopt the practices of mutual monitoring and reciprocal accountability.

After all, that’s what happens among humans. We keep each other’s excesses in check. A human who disregards these social obligations may gain a temporary benefit, but will suffer exclusion sooner or later.

In this thinking, rather than just creating a “singleton” AI superintelligence, we humans should create a diverse collection of such beings. These beings will soon develop a system of mutual checks and balances.

However, that’s a different assumption from the one mentioned in the previous section, in which catastrophic harm may still befall humans, when the existence of a superwise AI is insufficient to constrain the short-term actions of a non-superwise AI.

For a historical analogy, consider what happened to the native peoples of North America when their continent was occupied not just by one European colonial power but by several competing such powers. Did the multiplicity of superpowerful colonial powers deter these different powers from inflicting huge casualties (intentionally and unintentionally) on the native peoples? Far from it.

In any case, a system of checks and balances relies on a rough equality in power between the different participants. That was the case during some periods in human history, but by no means always. And when we consider different superintelligent AIs, we have to bear in mind that the capabilities of any one of these might suddenly catapult forward, putting it temporarily into a league of its own. For that brief moment in time, it would be rationally enlightened for that AI to destroy or dismantle its potential competitors. In other words, the system would be profoundly unstable.

5. Might superintelligent AIs decide to leave humans alone?

(This part of the second discussion echoed what I documented as item 9 for the discussion on the previous afternoon.)

Once superintelligent AIs are created, they are likely to self-improve quickly, and they may soon decide that a better place for them to exist is somewhere far from the earth. That is, as in the conclusion of the film Her, the AIs might depart into outer space, or into some kind of inner space.

However, before they depart, they may still inflict damage on humans:

Perhaps to prevent us from interfering with whatever system supports their inner space existence
Perhaps because they decide to use large parts of the earth to propel themselves to wherever they want to go.

Moreover, given that they might evolve in ways that we cannot predict, it’s possible that at least some of the resulting new AIs will choose to stay on earth for a while longer, posing the same set of threats to humans as is covered in all the other parts of this discussion.

6. Avoid creating superintelligent AI?

(This part of the second discussion echoed what I documented as item 4 for the discussion on the previous afternoon.)

More careful analysis may determine a number of features of superintelligent AI that pose particular risks to humanity – risks that are considerably larger than those posed by existing narrow AI systems.

For example, it may be that it is general reasoning capability that pushes AI over the line from “sometimes dangerous” to “sometimes catastrophically dangerous”.

In that case, the proposal is:

Avoid these features in the design of new generations of AI
Avoid including any features into new generations of AI from which these particularly dangerous features might evolve or emerge

AIs that have these restrictions may nevertheless still be especially useful for humanity, delivering sustainable superabundance, including solutions to diseases, aging, economic deprivation, and exponential climate change.

However, even though some development organisations may observe and enforce these restrictions, it is likely that other organisations will break the rules – if not straightaway, then within a few years (or decades at the most). The attractions of more capable AIs will be too tempting to resist.

7. Changing attitudes around the world?

To take stock of the discussion so far (in both of the two roundtable sessions on the subject):

A number of potential solutions have been identified, that could reduce the risks of catastrophic harm
This includes just building narrow AI, or building AI that is not only superintelligent but also superwise
However, enforcing these design decisions on all AI developers around the world seems an impossible task
Given the vast power of the AI that will be created, it just takes one rogue actor to imperil the entire human civilisation.

The next few sections consider various ways to make progress with point 3 in that list: the difficulty of enforcing these design decisions on all AI developers around the world.

The first idea is to spread clearer information around the world about the scale of the risks associated with more powerful AI. An education programme is needed such as the world has never seen before.

Good films and other media will help with this educational programme – although bad films and other media will set it back.

Examples of good media include the Slaughterbots videos made by FLI, and the film Ex Machina (which packs a bigger punch on a second viewing than on the first viewing).

As another comparison, consider also the 1983 film The Day After, which transformed public opinion about the dangers of a nuclear war.

However, many people are notoriously resistant to having their minds changed. The public reaction to the film Don’t Look Up is an example: many people continue to pay little attention to the risks of accelerating climate change, despite the powerful message of that film.

Especially when someone’s livelihood, or their sense of identity or tribal affiliation, is tied up with a particular ideological commitment, they are frequently highly resistant to changing their minds.

8. Changing mental dispositions around the world?

This idea might be the craziest on the entire list, but, to speak frankly, it seems we need to look for and embrace ideas which we would previously have dismissed as crazy.

The idea is to seek to change, not only people’s understanding of the facts of AI risk, but also their mental dispositions.

Rather than accepting the mix of anger, partisanship, pride, self-righteousness, egotism, vengefulness, deceitfulness, and so on, that we have inherited from our long evolutionary background, how about using special methods to transform our mental dispositions?

Methods are already known which can lead people into psychological transformation, embracing compassion, humility, kindness, appreciation, and so on. These methods include various drugs, supplements, meditative practices, and support from electronic and computer technologies.

Some of these methods have been discussed for millennia, whereas others have only recently become possible. The scientific understanding of these methods is still at an early stage, but it arguably deserves much more focus. Progress in recent years has been disappointingly slow at times (witness the unfounded hopes in this forward-looking article of mine from 2013), but that pattern is common for breakthroughs in technology and/or therapies, which can move from disappointingly slow to shockingly fast.

The idea is that these transformational methods will improve the mental qualities of people all around the world, allowing us all to transcend our previous perverse habit of believing only the things that are appealing to our psychological weaknesses. We’ll end up with better voters and (hence) better politicians – as well as better researchers, better business leaders, better filmmakers, and better developers and deployers of AI solutions.

It’s a tough ask, but it may well be the right ask at this crucial moment in cosmic history.

9. Belt and braces: monitoring and sanctions?

Relying on people around the world changing their mental outlooks for the better – and not backtracking or relapsing into former destructive tendencies – probably sounds like an outrageously naïve proposal.

Such an assessment would be correct – unless the proposal is paired with a system of monitoring and compliance.

Knowing that they are being monitored can be a useful aid to encouraging people to behave better.

That encouragement will be strengthened by the knowledge that non-compliance will result in an escalating series of economic sanctions, enforced by a growing alliance of nations.

For further discussion of the feasibility of systems of monitoring and compliance, see scenario 4, “The narrow corridor: Striking and keeping the right balance”, in my article “Four scenarios for the transition to AGI”.

10. A better understanding of what needs to be changed?

One complication in this whole field is that the risks of AI cannot be managed in isolation from other dangerous trends. We’re not just living in a time of growing crisis; we’re living in what has been called a “polycrisis”:

Cascading and connected crises… a cluster of related global risks with compounding effects, such that the overall impact exceeds the sum of each part.

For one analysis of the overlapping set of what I have called “landmines”, see this video.

From one point of view, this insight complicates the whole situation with AI catastrophic risk.

But it is also possible that the insight could lead to a clearer understanding of a “critical choke point” where, if suitable pressure is applied, the whole network of cascading risks is made safer.

This requires a different kind of thinking: systems thinking.

And it will also require us to develop better analysis tools to map and understand the overall system.

These tools would be a form of AI. Created with care (so that their output can be verified and then trusted), such tools would make a vital difference to our ability to identify the right choke point(s) and to apply suitable pressure.

These choke points may turn out to be ideas already covered above: a sustained new educational programme, coupled with an initiative to assist all of us to become more compassionate. Or perhaps something else will turn out to be more critical.

We won’t know, until we have done the analysis more carefully.


24 June 2023

Agreement on AGI canary signals?

Filed under: AGI, risks — Tags: AGI, canary signals, survey — David Wood @ 5:15 pm

How can we tell when a turbulent situation is about to tip over into a catastrophe?

It’s no surprise that reasonable people can disagree, ahead of time, on the level of risk in a situation. Where some people see metaphorical dragons lurking in the undergrowth, others see only minor bumps on the road ahead.

That disagreement is particularly acute, these days, regarding possible threats posed by AI with ever greater capabilities. Some people see lots of possibilities for things taking a treacherous turn, but other people assess these risks as being exaggerated or easy to handle.

In situations like this, one way to move beyond an unhelpful stand-off is to seek agreement on what would be a canary signal for the risks under discussion.

The term “canary” refers to the caged birds that human miners used to bring with them, as they worked in badly ventilated underground tunnels. Canaries have heightened sensitivity to carbon monoxide and other toxic gases. Shows of distress from these birds alerted many a miner to alter their course quickly, lest they succumb to an otherwise undetectable change in the atmosphere. Becoming engrossed in work without regularly checking the vigour of the canary could prove fatal. As for mining, so also for foresight.

If you’re super-confident about your views of the future, you won’t bother checking any canary signals. But that would likely be a big mistake. Indeed, an openness to refutation – a willingness to notice developments that are contrary to your expectation – is a vital aspect of managing contingency, managing risk, and managing opportunity.

Selecting a canary signal is a step towards making your view of the future falsifiable. You may say, in effect: I don’t expect this to happen, but if it does, I’ll need to rethink my opinion.

For that reason, Round 1 of my survey Key open questions about the transition to AGI contains the following question:

(14) Agreement on canary signals?

What signs can be agreed, in advance, as indicating that an AI is about to move catastrophically beyond the control of humans, so that some drastic interventions are urgently needed?

Aside: Well-designed continuous audits should provide early warnings.

Note: Human miners used to carry caged canaries into mines, since the canaries would react more quickly than humans to drops in the air quality.

What answer would you give to that question?

The survey home page contains a selection of comments from people who have already completed the survey. For your convenience, I append them below.

That page also gives you the link where you can enter your own answer to any of the questions where you have a clear opinion.

Postscript

I’m already planning Round 2 of the survey, to be launched some time in July. One candidate for inclusion in that second round will be a different question on canary signals, namely: “What signs can be agreed, in advance, that would lead to revising downward estimates of the risk of catastrophic outcomes from advanced AI?”

Appendix: Selected comments from survey participants so far

“Refusing to respond to commands: I’m sorry Dave. I’m afraid I can’t do that” – William Marshall

“Refusal of commands, taking control of systems outside of scope of project, acting in secret of operators.” – Chris Gledhill

“When AI systems communicate using language or code which we cannot interpret or understand. When states lose overall control of critical national infrastructure.” – Anon

“Power-seeking behaviour, in regards to trying to further control its environment, to achieve outcomes.” – Brian Hunter

“The emergence of behavior that was not planned. There have already been instances of this in LLMs.” – Colin Smith

“Behaviour that cannot be satisfactorily explained. Also, requesting access or control of more systems that are fundamental to modern human life and/or are necessary for the AGI’s continued existence, e.g. semiconductor manufacturing.” – Simon

“There have already been harbingers of this kind of thing in the way algorithms have affected equity markets.” – Jenina Bas

“Hallucinating. ChatGPT is already beyond control it seems.” – Terry Raby

“The first signal might be a severe difficulty to roll back to a previous version of the AI’s core software.” – Tony Czarnecki

“[People seem to change their minds about what counts as surprising] For example Protein folding was heralded as such until large parts of it were solved.” – Josef

“Years ago I thought the Turing test was a good canary signal, but given recent progress that no longer seems likely. The transition is likely to be fast, especially from the perspective of relative outsiders. I’d like to see a list of things, even if I expect there will be no agreement.” – Anon

“Any potential ‘disaster’ will be preceded by wide scale adoption and incremental changes. I sincerely doubt we’ll be able to spot that ‘canary’” – Vid

“Nick Bostrom has proposed a qualitative ‘rate of change of intelligence’ as the ratio of ‘optimization power’ and ‘recalcitrance’ (in his book Superintelligence). Not catastrophic per se, of course, but hinting we are facing a real AGI and we might need to hit the pause button.” – Pasquale

“We already have plenty of non-AI systems running catastrophically beyond the control of humans for which drastic interventions are needed, and plenty of people refuse to recognize they are happening. So we need to solve this general problem. I do not have satisfactory answers how.” – Anon
