The Singularity Principles

6 November 2025

An audacious singularity analogy

Filed under: AGI, risks — Tags: BGI25, Singularity, The Singularity Principles — David Wood @ 10:10 am

Here’s the latest in my thinking about how humanity can most reliably obtain wonderful benefits from advanced AI – a situation I describe as sustainable superabundance for all – rather than the horrific outcomes of a negative technological singularity – a situation I describe as Catastrophic General Intelligence (CGI).

These thoughts have sharpened in my mind following conversations at the recent SingularityNET BGI 2025 summit in Istanbul, Türkiye.

My conclusion is that, in order to increase the likelihood of the profoundly positive fork in the road ahead, it is necessary but not sufficient to highlight the real and credible dangers of the truly awful negative fork in that same road.

Yes, it is essential to highlight how a very plausible extension of our current reckless trajectory, past accelerating tipping points, will plunge humanity into a situation that is wildly unstable, dangerously opaque, and impossible to rein back. Clarifying these seismic risks is necessary, not to induce a state of panic (which would be counterproductive) or doom (which would be psychologically enfeebling), but to cause minds to focus with great seriousness. Without a sufficient sense of urgency, any actions taken will be inadequate: “too little, too late”.

However, unless that climactic warning is accompanied by an uplifting positive message, the result is likely to be misery, avoidance, distraction, self-deception, and disinformation.

If the only message heard is “pause” or “sacrifice”, our brains are likely to rebel.

If people already appreciate that advanced AI has the potential to solve aging, climate change, and more, that’s not an option they will give up easily.

If such people see no credible alternative to the AI systems currently being produced by big tech companies (notwithstanding the opaque and inexplicable nature of these systems), they are likely to object to efforts to alter that trajectory, complaining that “Any attempt to steer the development of advanced AI risks people dying from aging!”

The way out of this impasse is to establish that new forms of advanced AI can be prioritised, which lack dangerous features such as autonomy, volition, and inscrutability – new forms of AI that will still be able to deliver, quickly, the kinds of solution (including all-round rejuvenation) that people wish to obtain from AGI.

Examples of these new forms of advanced AI include “Scientist AI” (to use a term favoured by Yoshua Bengio) and “Tool AI” (the term favoured by Anthony Aguirre). These new forms potentially also include AI delivered on the ASI:Chain being created by F1r3fly and SingularityNET (as featured in talks at BGI 2025), and AI using neural networks trained by predictive coding (as described by Faezeh Habibi at that same summit).

These new forms of AI have architectures designed for transparency, controllability, and epistemic humility, rather than self-optimising autonomy.

It’s when the remarkable potential of these new, safer, forms of AI becomes clearer, that more people can be expected to snap out of their head-in-the-sand opposition to steering and controlling AGI development.

Once I returned home from Istanbul, I wrote up my reflections on what I called “five of the best” talks at BGI 2025. These reflections ended with a rather audacious analogy, which I repeat here:

The challenge facing us regarding runaway development of AI beyond our understanding and beyond our control can be compared to a major controversy within the field of preventing runaway climate change. That argument runs as follows:

Existing patterns of energy use, which rely heavily on fuels that emit greenhouse gases, risk the climate reaching dangerous tipping points and transitioning beyond a “climate singularity” into an utterly unpredictable, chaotic, cataclysmically dangerous situation
However, most consumers of energy prefer dirty sources to clean (“green”) sources, because the former have lower cost and appear to be more reliable (in the short term at least)
Accordingly, without an autocratic world government (“yuk!”), there is almost no possibility of people switching away in sufficient numbers from dirty energy to clean energy
Some observers might therefore be tempted to hope that theories of accelerating climate change are mistaken, and that there is no dangerous “climate singularity” in the near future
In turn, that drives people to look for faults in parts of the climate change argumentation – cherry picking various potential anomalies in order to salve their conscience
BUT this miserable flow of thought can be disrupted once it is seen how clean energy can be lower cost than dirty energy
From this new perspective, there will be no need to plead with energy users to make sacrifices for the larger good; instead, these users will happily transition to abundant cleaner energy sources, for their short-term economic benefit as well as the longer-term environmental benefits.

You can likely see how a similar argument applies for safer development of trustworthy beneficial advanced AI:

Existing AGI development processes, which rely heavily on poorly understood neural networks trained by back propagation, risk AI development reaching dangerous tipping points (when AIs repeatedly self-improve) and transitioning beyond a “technological singularity” into an utterly unpredictable, chaotic, cataclysmically dangerous situation
However, most AI developers prefer opaque AI creation processes to transparent, explainable ones, because the former appear to produce more exciting results (in the short term at least)
Accordingly, without an autocratic world government (“yuk!”), there is almost no possibility of developers switching away from their current reckless “suicide race” to build AGI first
Some observers might therefore be tempted to hope that theories of AGI being “Unexplainable, Unpredictable, Uncontrollable” (as advanced for example by Roman Yampolskiy) are mistaken, and that there is no dangerous “technological singularity” in the future
In turn, that drives people to look for faults in the work of Yampolskiy, Yoshua Bengio, Eliezer Yudkowsky, and others, cherry picking various potential anomalies in order to salve their conscience
BUT this miserable flow of thought can be disrupted once it is seen how alternative forms of advanced AI can deliver the anticipated benefits of AGI without the terrible risks of currently dominant development methods
From this new perspective, there will be no need to plead with AGI developers to pause their research for the greater good; instead, these developers will happily transition to safer forms of AI development.

To be clear, this makes things appear somewhat too simple. In both cases, the complication is that formidable inertial forces will need to be overcome – deeply entrenched power structures that, for various pathological reasons, are hell-bent on preserving the status quo.

For that reason, the battle for truly beneficial advanced AI is going to require great fortitude as well as great skill – skill not only in technological architectures but also in human social and political dynamics.

And also to be clear, it’s a tough challenge to identify and describe the dividing line between safe advanced AI and dangerous advanced AI (AI with its own volition, autonomy, and desire to preserve itself – as well as AI that is inscrutable and unmonitorable). Indeed, transparency and non-autonomy are not silver bullets. But it is vital that we accept that challenge and make progress on it.

Footnote: I offer additional practical advice on anticipating and managing cataclysmically disruptive technologies in my book The Singularity Principles.

23 June 2023

The rise of AI: beware binary thinking

Filed under: AGI, risks — Tags: AGI, Max More, The Singularity Principles — David Wood @ 10:20 am

When Max More writes, it’s always worth paying attention.

His recent article Existential Risk vs. Existential Opportunity: A balanced approach to AI risk is no exception. There’s much in that article that deserves reflection.

Nevertheless, there are three key aspects where I see things differently.

The first is the implication that humanity has just two choices:

1. We are intimidated by the prospect of advanced AI going wrong, so we seek to stop the development and deployment of advanced AI
2. We appreciate the enormous benefits of advanced AI going right, so we hustle to obtain these benefits as quickly as possible.

From what Max writes, he suggests that an important aspect of winning over the doomsters in camp 1 is to emphasise the wonderful upsides of superintelligent AI.

In that viewpoint, instead of being preoccupied by thoughts of existential risk, we need to emphasise existential opportunity. Things could be a lot better than we have previously imagined, provided we’re not hobbled by doomster pessimism.

However, that binary choice omits the pathway that is actually the most likely to reach the hoped-for benefits of advanced AI. That’s the pathway of responsible development. It’s different from either of the options given earlier.

As an analogy, consider this scenario:

In our journey, we see a wonderful existential opportunity ahead – a lush valley, fertile lands, and gleaming mountain peaks soaring upward to a transcendent realm. But in front of that opportunity is a river of uncertainty, bordered by a swamp of uncertainty, perhaps occupied by hungry predators lurking in shadows.

Are there just two options?

We are intimidated by the possible dangers ahead, and decide not to travel any further
We fixate on the gleaming mountain peaks, and rush on regardless, belittling anyone who warns of piranhas, treacherous river currents, alligators, potential mud slides, and so on

Isn’t there a third option? To take the time to gain a better understanding of the lie of the land ahead. Perhaps there’s a spot, to one side, where it will be easier to cross the river. A spot where a stable bridge can be built. Perhaps we could even build a helicopter that can assist us over the strongest currents…

It’s the same with the landscape of our journey towards the sustainable superabundance that could be achieved, with the assistance of advanced AI, provided we act wisely.

That brings me to my second point of divergence with the analysis Max offers. It’s in the assessment of the nature of the risk ahead.

Max lists a number of factors and suggests they must ALL be true in order for advanced AI to pose an existential risk. That justifies him in multiplying together probabilities, eventually arriving at a very small number.

Heck, with such a small number, that river poses no risk worth worrying about!

But on the contrary, it’s not just a single failure scenario that we need to consider. There are multiple ways in which advanced AI can lead to catastrophe – if it is misconfigured, hacked, has design flaws, encounters an environment that its creators didn’t anticipate, interacts in unforeseen ways with other advanced AIs, etc, etc.

Thus it’s not a matter of multiplying probabilities (getting a smaller number each time). It’s a matter of adding probabilities (getting a larger number).
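
To make the arithmetic concrete, here is a minimal sketch in Python. The probabilities are invented purely for illustration (they are not figures from Max, from Rohit Krishnan, or from my own work), and the function names are hypothetical. The sketch assumes, for simplicity, that the factors are independent of each other. Strictly speaking, the probabilities of alternative failure routes add exactly only when the routes are mutually exclusive; for independent routes, the combined probability is one minus the chance that every route is avoided, which is slightly less than the simple sum but still grows with each route added.

```python
from math import prod


def all_must_hold(probabilities):
    """Chance that every condition holds (a conjunction), assuming independence."""
    return prod(probabilities)


def any_may_occur(probabilities):
    """Chance that at least one failure route occurs (a disjunction), assuming independence."""
    return 1 - prod(1 - p for p in probabilities)


# Illustrative numbers only: eight conditions, each judged 50% likely.
conjunction = all_must_hold([0.5] * 8)

# Illustrative numbers only: four separate failure routes, each judged 5% likely.
disjunction = any_may_occur([0.05] * 4)

print(f"All eight conditions hold: {conjunction:.4f}")
print(f"At least one of four routes occurs: {disjunction:.4f}")
```

With these made-up inputs, insisting that eight coin-flip conditions must all hold gives a figure below half a percent, whereas four modest and separate failure routes combine to a figure above eighteen percent. Which calculation applies depends entirely on whether catastrophe requires every factor at once or can arrive by any one of several routes.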

Quoting Rohit Krishnan, Max lists the following criteria, which he says must ALL hold for us to be concerned about AI catastrophe:

Probability the AI has “real intelligence”
Probability the AI is “agentic”
Probability the AI has “ability to act in the world”
Probability the AI is “uncontrollable”
Probability the AI is “unique”
Probability the AI has “alien morality”
Probability the AI is “self-improving”
Probability the AI is “deceptive”

That’s a very limited view of future possibilities.

In contrast, in my own writings and presentations, I have outlined four separate families of failure modes. Here’s the simple form of the slide I often use:

And here’s the fully-built version of that slide:

To be clear, the various factors I list on this slide are additive rather than multiplicative.

Also to be clear, I’m definitely not pointing my finger at “bad AI” and saying that it’s AI, by itself, which could lead to our collective demise. Instead, what would cause that outcome would be a combination of adverse developments in two or more of the factors shown in red on this slide:

If you have questions about these slides, you can hear my narrative for them as part of the following video:

If you prefer to read a more careful analysis, I’ll point you at the book I released last year: The Singularity Principles: Anticipating and Managing Cataclysmically Disruptive Technologies.

To recap: those of us who are concerned about the risks of AI-induced catastrophe are, emphatically, not saying any of the following:

“We should give up on the possibility of existential opportunity”
“We’re all doomed, unless we stop all development of advanced AI”
“There’s nothing we could do, to improve the possibility of a wonderful outcome”.

Instead, Singularity Activism sees the possibility of steering the way AI is developed and deployed. That won’t be easy. But there are definitely important steps we can take.

That brings me to the third point where my emphasis differs from Max’s. Max offers this characterisation of what he calls “precautionary regulation”:

Forbidding trial and error, precautionary regulation reduces learning and reduces the benefits that could have been realized.

Regulations based on the precautionary principle block any innovation until it can be proved safe. Innovations are seen as guilty until proven innocent.

But regulation needn’t be like that. Regulation can, and should, be sensitive to the scale of potential failures. When failures are local – they would just cause “harm” – then there is merit in allowing these errors to occur, and to grow wiser as a result. But when there’s a risk of a global outcome – “ruin” – a different mentality is needed. Namely, the mentality of responsible development and Singularity Activism.

What’s urgently needed, therefore, is:

Deeper, thoughtful, investigation into the multiple scenarios in which failures of AI have ruinous consequences
Analysis of previous instances, in various industries, when regulation has been effective, and where it has gone wrong
A focus on the aspects of the rise of advanced AI for which there are no previous precedents
A clearer understanding, therefore, of how we can significantly raise the probability of finding a safe way across that river of uncertainty to the gleaming peaks of sustainable superabundance.

On that matter: If you have views on the transition from today’s AI to the much more powerful AI of the near future, I encourage you to take part in this open survey. Round 1 of that survey is still open. I’ll be designing Round 2 shortly, based on the responses received in Round 1.


7 March 2023

What are the minimum conditions for software global catastrophe?

Filed under: AGI, risks — Tags: AGI, The Singularity Principles — David Wood @ 11:55 am

Should we be seriously concerned that forthcoming new software systems might cause a global catastrophe?

Or are there, instead, good reasons to dismiss any such concern?

(image by Midjourney)

It’s a vitally important public debate. Alas, this debate is bedevilled by false turnings.

For example, dismissers often make claims with this form:

The argument for being concerned assumes that such-and-such a precondition holds
But that precondition is suspect (or false)
Therefore the concern can be dismissed.

Here’s a simple example – which used to be common, though it appears less often these days:

The argument for being concerned assumes that Moore’s Law will hold for the next three decades
But Moore’s Law is slowing down
Therefore the concern can be dismissed.

Another one:

The argument for being concerned assumes that deep learning systems understand what they’re talking about
But by such-and-such a definition of understanding, these systems lack understanding
(They’re “just stochastic parrots”)
Therefore the concern can be dismissed.

Or a favourite:

You call these systems AI, meaning they’re (supposedly) artificially intelligent
But by such-and-such a definition of intelligence, these systems lack intelligence
Therefore the concern can be dismissed.

Perhaps the silliest example:

Your example of doom involves a software system that is inordinately obsessed with paperclips
But any wise philosopher would design an AI that has no such paperclip obsession
Therefore the concern can be dismissed.

My conclusion: those of us who are seriously concerned about the prospects of a software-induced global catastrophe should clarify the minimum conditions that would give rise to such a catastrophe.

To be clear, these minimum conditions don’t include the inexorability of Moore’s Law. Nor the conformance of software systems to particular academic models of language understanding. Nor that a fast take-off occurs. Nor that the software system becomes sentient.

Here’s my suggestion of these minimum conditions:

A software system that can influence, directly or indirectly (e.g. by psychological pressure) what happens in the real world
That has access, directly or indirectly, to physical mechanisms that can seriously harm humans
That operates in ways which we might fail to understand or anticipate
That can anticipate actions humans might take, and can calculate and execute countermeasures
That can take actions quickly enough (and/or stealthily enough) to avoid being switched off or reconfigured before catastrophic damage is done.

Even more briefly: the software system operates outside our understanding and outside our control, with potential devastating power.

I’ve chosen to use the term “software” rather than “AI” in order to counter a whole posse of dismissers right at the beginning of the discussion. Not even the smuggest of dismissers denies that software exists and can, indeed, cause harm when it contains bugs, is misconfigured, is hacked, or has gaps in its specification.

Critically, note that software systems often do have real-world impact. Consider the Stuxnet computer worm that caused centrifuges to speed up and destroy themselves inside Iran’s nuclear enrichment facilities. Consider the WannaCry malware that disabled critical hospital equipment around the world in 2017.

Present-day chatbots have already influenced millions of people around the world, via the ideas emerging in chat interactions. Just as people can make life-changing decisions after talking with human therapists or counsellors, people are increasingly taking life-changing decisions following their encounters with the likes of ChatGPT.

Software systems are already involved in the design and operation of military weapons. Presently, humans tend to remain “in the loop”, but military leaders are making the case for humans instead being just “on the loop”, in order for their defence systems to be able to move “at the speed of relevance”.

So the possibility of this kind of software shouldn’t be disputed.

It’s not just military weapons where the potential risk exists. Software systems can be involved with biological pathogens, or with the generation of hate-inducing fake news, or with geoengineering. Or with the manipulation of parts of our infrastructure that we currently only understand dimly, but which might turn out to be horribly fragile, when nudged in particular ways.

Someone wanting to dismiss the risk of software-induced global catastrophe therefore needs to make one or more of the following cases:

1. All such software systems will be carefully constrained – perhaps by tamperproof failsafe mechanisms that are utterly reliable
2. All such software systems will remain fully within human understanding, and therefore won’t take any actions that surprise us
3. All such software systems will fail to develop an accurate “theory of mind” and therefore won’t be able to anticipate human countermeasures
4. All such software systems will decide, by themselves, to avoid humans experiencing significant harm, regardless of which other goals are found to be attractive by the alien mental processes of that system.

If you still wish to dismiss the risk of software global catastrophe, which of these four cases do you wish to advance?

Or do you have something different in mind?

And can you also be sure that all such software systems will operate correctly, without bugs, configuration failures, gaps in their specification, or being hacked?

Case 2, by the way, includes the idea that “we humans will merge with software and will therefore remain in control of that software”. But in that case, how confident are you that:

Humans can speed up their understanding as quickly as the improvement rate of software systems that are free from the constraints of the human skull?
Any such “superintelligent” humans will take actions that avoid the same kinds of global catastrophe (after all, some of the world’s most dangerous people have intelligence well above the average)?

Case 4 includes the idea that at least some aspects of morality are absolute, and that a sufficiently intelligent piece of software will discover these principles. But in that case, how confident are you that:

The software will decide to respect these principles of morality, rather than (like many humans) disregarding them in order to pursue some other objectives?
That these fundamental principles of morality will include the preservation and flourishing of eight billion humans (rather than, say, just a small representative subset in a kind of future “zoo”)?

Postscript: My own recommendations for how to address these very serious risks are in The Singularity Principles. Spoiler alert: there are no magic bullets.


26 February 2023

Ostriches and AGI risks: four transformations needed

Filed under: AGI, risks, Singularity, Singularity Principles — Tags: AGI, The Singularity Principles — David Wood @ 12:48 am

I confess to having been pretty despondent at various times over the last few days.

The context: increased discussions on social media triggered by recent claims about AGI risk – such as I covered in my previous blogpost.

The cause of my despondency: I’ve seen far too many examples of people with scant knowledge expressing themselves with unwarranted pride and self-certainty.

I call these people the AGI ostriches.

It’s impossible for AGI to exist, one of these ostriches squealed. The probability that AGI can exist is zero.

Anyone concerned about AGI risks, another opined, fails to understand anything about AI, and has just got their ideas from Hollywood or 1950s science fiction.

Yet another claimed: Anything that AGI does in the world will be the inscrutable cosmic will of the universe, so we humans shouldn’t try to change its direction.

Just keep your hand by the off switch, thundered another. Any misbehaving AGI can easily be shut down. Problem solved! You didn’t think of that, did you?

Don’t give the robots any legs, shrieked yet another. Problem solved! You didn’t think of that, did you? You fool!

It’s not the ignorance that depressed me. It was the lack of interest shown by the AGI ostriches regarding alternative possibilities.

I had tried to engage some of the ostriches in conversation. Try looking at things this way, I asked. Not interested, came the answer. Discussions on social media never change any minds, so I’m not going to reply to you.

Click on this link to read a helpful analysis, I suggested. No need, came the answer. Nothing you have written could possibly be relevant.

And the ostriches rejoiced in their wilful blinkeredness. There’s no need to look in that direction, they said. Keep wearing the blindfolds!

(The following image is by the Midjourney AI.)

But my purpose in writing this blogpost isn’t to complain about individual ostriches.

Nor is my purpose to lament the near-fatal flaws in human nature, including our many cognitive biases, our emotional self-sabotage, and our perverse ideological loyalties.

Instead, my remarks will proceed in a different direction. What most needs to change isn’t the ostriches.

It’s the community of people who want to raise awareness of the catastrophic risks of AGI.

That includes me.

On reflection, we’re doing four things wrong. Four transformations are needed, urgently.

Without these changes taking place, it won’t be surprising if the ostriches continue to behave so perversely.

(1) Stop tolerating the Singularity Shadow

When they briefly take off their blindfolds, and take a quick peek into the discussions about AGI, ostriches often notice claims that are, in fact, unwarranted.

These claims confuse matters. They are overconfident claims about what can be expected about the advent of AGI, also known as the Technological Singularity. These claims form part of what I call the Singularity Shadow.

There are seven components in the Singularity Shadow:

Singularity timescale determinism
Singularity outcome determinism
Singularity hyping
Singularity risk complacency
Singularity term overloading
Singularity anti-regulation fundamentalism
Singularity preoccupation

If you’ve not come across the concept before, here’s a video all about it:

Or you can read this chapter from The Singularity Principles on the concept: “The Singularity Shadow”.

People who (like me) point out the dangers of badly designed AGI often too easily make alliances with people in the Singularity Shadow. After all, both groups of people:

Believe that AGI is possible
Believe that AGI might happen soon
Believe that AGI is likely to cause an unprecedented transformation in the human condition.

But the Singularity Shadow causes far too much trouble. It is time to stop being tolerant of its various confusions, wishful thinking, and distortions.

To be clear, I’m not criticising the concept of the Singularity. Far from it. Indeed, I consider myself a singularitarian, with the meaning I explain here. I look forward to more and more people similarly adopting this same stance.

It’s the distortions of that stance that now need to be countered. We must put our own house in order. Sharply.

Otherwise the ostriches will continue to be confused.

(2) Clarify the credible risk pathways

The AI paperclip maximiser has had its day. It needs to be retired.

Likewise the cancer-solving AI that solves cancer by, perversely, killing everyone on the planet.

Likewise the AI that “rescues” a woman from a burning building by hurling her out of the 20th floor window.

In the past, these thought experiments all helped the discussion about AGI risks, among people who were able to see the connections between these “abstract” examples and more complicated real-world scenarios.

But as more of the general public shows an interest in the possibilities of advanced AI, we urgently need a better set of examples. Explained, not by mathematics, nor by cartoonish simplifications, but in plain everyday language.

I’ve tried to offer some, for instance in the section “Examples of dangers with uncontrollable AI” in the chapter “The AI Control Problem” of my book The Singularity Principles.

But it seems these scenarios still fail to convince. The ostriches find themselves bemused. Oh, that wouldn’t happen, they say.

So this needs more work. As soon as possible.

I anticipate starting from themes about which even the most empty-headed ostrich occasionally worries:

The prospects of an arms race involving lethal autonomous weapons systems
The risks from malware that runs beyond the control of the people who originally released it
The dangers of geoengineering systems that seek to manipulate the global climate
The “gain of function” research which can create ultra-dangerous pathogens
The side-effects of massive corporations which give priority to incentives such as “increase click-through”
The escalation in hatred stirred up by automated trolls with more ingenious “fake social media”

On top of these starting points, the scenarios I envision mix in AI systems with increasing power and increasing autonomy – AI systems which are, however, incompletely understood by the people who deploy them, and which might manifest terrible bugs in unexpected circumstances. (After all, AIs include software, and software generally contains bugs.)

If there’s not already a prize competition to encourage clearer communication of such risk scenarios, in ways that uphold credibility as well as comprehensibility, there should be!

(3) Clarify credible solution pathways

Even more important than clarifying the AGI risk scenarios is to clarify some credible pathways to managing these risks.

Without seeing such solutions, ostriches go into an internal negative feedback loop. They think to themselves as follows:

Any possible solution to AGI risks seems unlikely to be successful
Any possible solution to AGI risks seems likely to have bad consequences in its own right
These thoughts are too horrible to contemplate
Therefore we had better believe the AGI risks aren’t actually real
Therefore anyone who makes AGI risks seem real needs to be silenced, ridiculed, or mocked.

Just as we need better communication of AGI risk scenarios, we need better communication of positive examples that are relevant to potential solutions:

Examples of when society collaborated to overcome huge problems which initially seemed impossible
Successful actions against the tolerance of drunk drivers, against dangerous features in car design, against the industrial pollutants which caused acid rain, and against the chemicals which depleted the ozone layer
Successful actions by governments to limit the powers of corporate monopolies
The de-escalation by Ronald Reagan and Mikhail Gorbachev of the terrifying nuclear arms race between the USA and the USSR.

But we also need to make it clearer how AGI risks can be addressed in practice. This includes a better understanding of:

Options for AIs that are explainable and interpretable – with the aid of trusted tools built from narrow AI
How AI systems can be designed to be free from the unexpected “emergence” of new properties or subgoals
How trusted monitoring can be built into key parts of our infrastructure, to provide early warnings of potential AI-induced catastrophic failures
How powerful simulation environments can be created to explore potential catastrophic AI failure modes (and solutions to these issues) in the safety of a virtual model
How international agreements can be built up, initially from a “coalition of the willing”, to impose powerful penalties in cases when AI is developed or deployed in ways that violate agreed standards
How research into AGI safety can be managed much more effectively, worldwide, than is presently the case.

Again, as needed, significant prizes should be established to accelerate breakthroughs in all these areas.

(4) Divide and conquer

The final transformation needed is to divide up the overall huge problem of AGI safety into more manageable chunks.

What I’ve covered above already suggests a number of vitally important sub-projects.

Specifically, it is surely worth having separate teams tasked with investigating, with the utmost seriousness, a range of potential solutions for the complications that advanced AI brings to each of the following:

The prospects of an arms race involving lethal autonomous weapons systems
The risks from malware that runs beyond the control of the people who originally released it
The dangers of geoengineering systems that seek to manipulate the global climate
The “gain of function” research which can create ultra-dangerous pathogens
The side-effects of massive corporations which give priority to incentives such as “increase click-through”
The escalation in hatred stirred up by automated trolls with more ingenious “fake social media”

(Yes, these are the same six scenarios for catastrophic AI risk that I listed in section (2) earlier.)

Rather than trying to “boil the entire AGI ocean”, we can pursue these projects separately, since each appears to require slightly less boiling.

Once candidate solutions have been developed for one or more of these risk scenarios, the outputs from the different teams can be compared with each other.

What else should be added to the lists above?


23 February 2023

Nuclear-level catastrophe: four responses

Filed under: AGI, risks, Singularity Principles — Tags: AGI, Ben Goertzel, Eliezer Yudkowsky, London Futurists, London Futurists Podcast, The Singularity Principles — David Wood @ 2:11 pm

36% of respondents agree that it is plausible that AI could produce catastrophic outcomes in this century, on the level of all-out nuclear war.

That’s 36% of a rather special group of people. People who replied to this survey needed to meet the criterion of being a named author on at least two papers published in the last three years in accredited journals in the field of Computational Linguistics (CL) – the field sometimes also known as NLP (Natural Language Processing).

The survey took place in May and June 2022. 327 complete responses were received from people matching the criteria.

A full report on this survey (31 pages) is available here (PDF).

Here’s a screenshot from page 10 of the report, illustrating the answers to questions about Artificial General Intelligence (AGI):

You can see the responses to question 3-4. 36% of the respondents either “agreed” or “weakly agreed” with the statement that

It is plausible that decisions made by AI or machine learning systems could cause a catastrophe this century that is at least as bad as an all-out nuclear war.

That statistic is a useful backdrop to discussions stirred up in the last few days by a video interview given by polymath autodidact and long-time AGI risk researcher Eliezer Yudkowsky:

The publishers of that video chose the eye-catching title “we’re all gonna die”.

If you don’t want to spend 90 minutes watching that video – or if you are personally alienated by Eliezer’s communication style – here’s a useful twitter thread summary by Liron Shapira:

Hey what if AI is going to literally slaughter every living creature on this planet in the next 3 years?

Watch @ESYudkowsky’s new interview on @BanklessHQ and see why that's not even a joke 🤯😵https://t.co/Yk8CKHLwVE

🧵 Here are my notes and abridged clips:
— Liron Shapira (@liron) February 21, 2023

In contrast to the question posed in the NLP survey I mentioned earlier, Eliezer isn’t thinking about “outcomes of AGI in this century”. His timescales are much shorter. His “ballpark estimate” for the time before AGI arrives is “3-15 years”.

So, doctor, how long do we have before superintelligent AGI?

Eliezer's ballpark estimate is 3-15 years.

But he points out that even top researchers can't necessarily distinguish whether the timeline of a future technological breakthrough will be a couple years, or many decades. pic.twitter.com/yCy5moqpcG
— Liron Shapira (@liron) February 21, 2023

How are people reacting to this sombre prediction?

More generally, what responses are there to the statistic that, as quoted above,

36% of respondents agree that it is plausible that AI could produce catastrophic outcomes in this century, on the level of all-out nuclear war.

I’ve seen a lot of different reactions. They break down into four groups: denial, sabotage, trust, and hustle.

1. Denial

One example of denial is this claim: We’re nowhere near understanding the magic of human minds. Therefore there’s no chance that engineers are going to duplicate that magic in artificial systems.

I have two counters:

The risks of AGI arise, not because the AI may somehow become sentient, and take on the unpleasant aspects of alpha male human nature. Rather, the risks arise from systems that operate beyond our understanding and outside our control, and which may end up pursuing objectives different from the ones we thought (or wished) we had programmed into them
Many systems have been created over the decades without the underlying science being fully understood. Steam engines predated the laws of thermodynamics. More recently, LLMs (Large Language Model AIs) have demonstrated aspects of intelligence that the designers of these systems had not anticipated. In the same way, AIs with some extra features may unexpectedly tip over into greater general intelligence.

Another example of denial: Some very smart people say they don’t believe that AGI poses risks. Therefore we don’t need to pay any more attention to this stupid idea.

My counters:

The mere fact that someone very smart asserts an idea – likely outside of their own field of special expertise – does not confirm the idea is correct
None of these purported objections to the possibility of AGI risk holds water (for a longer discussion, see my book The Singularity Principles).

Digging further into various online discussion threads, I caught the impression that what was motivating some of the denial was often a terrible fear. The people loudly proclaiming their denial were trying to cope with depression. The thought of potential human extinction within just 3-15 years was simply too dreadful for them to contemplate.

It’s similar to how people sometimes cope with the death of someone dear to them. There’s a chance my dear friend has now been reunited in an afterlife with their beloved grandparents, they whisper to themselves. Or, It’s sweet and honourable to die for your country: this death was a glorious sacrifice. And then woe betide any uppity humanist who dares to suggest there is no afterlife, or that patriotism is the last refuge of a scoundrel!

Likewise, woe betide any uppity AI risk researcher who dares to suggest that AGI might not be so benign after all! Deny! Deny!! Deny!!!

(For more on this line of thinking, see my short chapter “The Denial of the Singularity” in The Singularity Principles.)

A different motivation for denial is the belief that any sufficient “cure” to the risk of AGI catastrophe would be worse than the risk it was trying to address. This line of thinking goes as follows:

A solution to AGI risk will involve pervasive monitoring and widespread restrictions
Such monitoring and restrictions will only be possible if an autocratic world government is put in place
Any autocratic world government would be absolutely terrible
Therefore, the risk of AGI can’t be that bad after all.

I’ll come back later to the flaws in that particular argument. (In the meantime, see if you can spot what’s wrong.)

2. Sabotage

In the video interview, Eliezer made one suggestion for avoiding AGI catastrophe: Destroy all the GPU server farms.

These vast collections of GPUs (a special kind of computing chip) are what enables the training of many types of AI. If these chips were all put out of action, it would delay the arrival of AGI, giving humanity more time to work out a better solution to coexisting with AGI.

Another suggestion Eliezer makes is that the superbright people who are currently working flat out to increase the capabilities of their AI systems should be paid large amounts of money to do nothing. They could lounge about on a beach all day, and still earn more money than they are currently receiving from OpenAI, DeepMind, or whoever is employing them. Once again, that would slow down the emergence of AGI, and buy humanity more time.

I’ve seen other similar suggestions online, which I won’t repeat here, since they come close to acts of terrorism.

All these suggestions have one thing in common: let’s find ways to stop the development of AI in its tracks, all across the world. Companies should be stopped in their tracks. Shadowy military research groups should be stopped in their tracks. Open source hackers should be stopped in their tracks. North Korean ransomware hackers must be stopped in their tracks.

This isn’t just a suggestion that specific AI developments should be halted, namely those with an explicit target of creating AGI. Instead, it recognises that the creation of AGI might occur via unexpected routes. Improving the performance of various narrow AI systems, including fact-checking, or emotion recognition, or online request interchange marketplaces – any of these might push the collection of AI modules over the critical threshold. Mixing metaphors, AI could go nuclear.

Shutting down all these research activities seems a very tall order. Especially since many of the people who are currently working flat out to increase AI capabilities are motivated, not by money, but by the vision that better AI could do a tremendous amount of good in the world: curing cancer, solving nuclear fusion, improving agriculture by leaps and bounds, and so on. They’re not going to be easy to persuade to change course. For them, there’s a lot more at stake than money.

I have more to say about the question “To AGI or not AGI” in this chapter. In short, I’m deeply sceptical.

In response, a would-be saboteur may admit that their chances of success are low. But what do you suggest instead, they will ask.

Read on.

3. Trust

Let’s start again from the statistic that 36% of the NLP survey respondents agreed, with varying degrees of confidence, that advanced AI could trigger a catastrophe as bad as an all-out nuclear war some time this century.

It’s a pity that the question wasn’t asked with shorter timescales. Comparing the chances of an AI-induced global catastrophe in the next 15 years with one in the next 85 years:

The longer timescale makes it more likely that AGI will be developed
The shorter timescale makes it more likely that AGI safety research will still be at a primitive (deeply ineffective) level.
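
One way to make this trade-off concrete is to split the probability of catastrophe by a given date into two factors: the chance that AGI has arrived by then, and the chance of catastrophe given that it has. The sketch below is purely illustrative. It makes the simplifying assumption that the catastrophe can only arise via AGI, and the function name and every number in it are invented for the example; none of them are estimates from the survey or from any forecaster.

```python
def catastrophe_probability(p_agi_by_date: float,
                            p_catastrophe_given_agi: float) -> float:
    """P(catastrophe by date) = P(AGI by date) * P(catastrophe | AGI by date)."""
    for p in (p_agi_by_date, p_catastrophe_given_agi):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return p_agi_by_date * p_catastrophe_given_agi


# Hypothetical inputs, chosen only to show the two opposing effects:
# a short horizon makes AGI less likely, but leaves safety research immature;
# a long horizon makes AGI more likely, but gives safety research time to mature.
short_horizon = catastrophe_probability(p_agi_by_date=0.25,
                                        p_catastrophe_given_agi=0.5)
long_horizon = catastrophe_probability(p_agi_by_date=0.75,
                                       p_catastrophe_given_agi=0.25)

if __name__ == "__main__":
    print("Short horizon:", short_horizon)
    print("Long horizon:", long_horizon)
```

With different made-up inputs the ordering of the two results could easily reverse, which is one reason why asking the survey question at more than one timescale would have been informative.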

Even since the date of the survey – May and June 2022 – many forecasters have shortened their estimates of the likely timeline to the arrival of AGI.

So, for the sake of the argument, let’s suppose that the risk of an AI-induced global catastrophe happening by 2038 (15 years from now) is 1/10.

There are two ways to react to this:

1/10 is fine odds. I feel lucky. What’s more, there are plenty of reasons why we ought to feel lucky
1/10 is terrible odds. That’s far too high a risk to accept. We need to hustle to find ways to change these odds in our favour.

I’ll come to the hustle response in a moment. But let’s first consider the trust response.

A good example is in this comment from SingularityNET founder and CEO Ben Goertzel:

Eliezer is a very serious thinker on these matters and was the core source of most of the ideas in Nick Bostrom’s influential book Superintelligence. But ever since I met him, and first debated these issues with him, back in 2000 I have felt he had a somewhat narrow view of humanity and the universe in general.

There are currents of love and wisdom in our world that he is not considering and seems to be mostly unaware of, and that we can tap into by creating self reflective compassionate AGIs and doing good loving works together with them.

In short, rather than fearing humanity, we should learn to trust humanity. Rather than fearing what AGI will do, we should trust that AGI can do wonderful things.

You can find a much longer version of Ben’s views in the review he wrote back in 2015 of Superintelligence. It’s well worth reading.

What are the grounds for hope? Humanity has come through major challenges in the past. Even though the scale of the challenge is more daunting on this occasion, there are also more people contributing ideas and inspiration than before. AI is more accessible than nuclear weapons, which increases the danger level, but AI could also be deployed as part of the solution, rather than just being a threat.

Another idea is that if an AI looks around for data teaching it which values to respect and uphold, it will find plenty of positive examples in great human literature. OK, that literature also includes lots of treachery, and different moral codes often conflict, but a wise AGI should be able to see through all these conclusions to discern the importance of defending human flourishing. OK, much of AI training at the moment focuses on deception, manipulation, enticement, and surveillance, but, again, we can hope that a wise AGI will set aside those nastier aspects of human behaviour. Rather than aping trolls or clickbait, we can hope that AGI will echo the better angels of human nature.

It’s also possible that, just as DeepMind’s AlphaZero worked out by itself, without any human input beyond the rules of each game, superior strategies at the board games Go and chess, a future AI might work out, by itself, the principles of universal morality. (That’s assuming such principles exist.)

We would still have to hope, in such a case, that the AI that worked out the principles of universal morality would decide to follow these principles, rather than having some alternative (alien) ways of thinking.

But surely hope is better than despair?

To quote Ben Goertzel again:

Despondence is unwarranted and unproductive. We need to focus on optimistically maximizing odds of a wildly beneficial Singularity together.

My view is the same as that expressed by Berkeley professor of AI Stuart Russell, in part of a lengthy exchange with Steven Pinker on the subject of AGI risks:

The meta argument is that if we don’t talk about the failure modes, we won’t be able to address them…

Just like in nuclear safety, it’s not against the rules to raise possible failure modes like, what if this molten sodium that you’re proposing should flow around all these pipes? What if it ever came into contact with the water that’s on the turbine side of the system? Wouldn’t you have a massive explosion which could rip off the containment and so on? That’s not exactly what happened in Chernobyl, but not so dissimilar…

The idea that we could solve that problem without even mentioning it, without even talking about it and without even pointing out why it’s difficult and why it’s important, that’s not the culture of safety. That’s sort of more like the culture of the communist party committee in Chernobyl, that simply continued to assert that nothing bad was happening.

(By the way, my sympathies in that long discussion, when it comes to AGI risk, are approximately 100.0% with Russell and approximately 0.0% with Pinker.)

4. Hustle

The story so far:

The risks are real (though estimates of their probability vary)
Some possible “solutions” to the risks might produce results that are, by some calculations, worse than letting AGI take its own course
If we want to improve our odds of survival – and, indeed, for humanity to reach something like a sustainable superabundance with the assistance of advanced AIs – we need to be able to take a clear, candid view of the risks facing us
Being naïve about the dangers we face is unlikely to be the best way forward
Since time may be short, the time to press for better answers is now
We shouldn’t despair. We should hustle.

Some ways in which research could generate useful new insight relatively quickly:

When the NLP survey respondents expressed their views, what reasons did they have for disagreeing with the statement? And what reasons did they have for agreeing with it? And how do these reasons stand up, in the cold light of a clear analysis? (In other words, rather than a one-time survey, an iterative Delphi survey should lead to deeper understanding.)
Why have the various AI safety initiatives formed in the wake of the Puerto Rico and Asilomar conferences of 2015 and 2017 fallen so far short of expectations?
Which descriptions of potential catastrophic AI failure modes are most likely to change the minds of those critics who currently like to shrug off failure scenarios as “unrealistic” or “Hollywood fantasy”?

Constructively, I invite conversation on the strengths and weaknesses of the 21 Singularity Principles that I have suggested as contributing to improving the chances of beneficial AGI outcomes.

For example:

Can we identify “middle ways” that include important elements of global monitoring and auditing of AI systems, without collapsing into autocratic global government?
Can we improve the interpretability and explainability of advanced AI systems (perhaps with the help of trusted narrow AI tools), to diminish the risks of these systems unexpectedly behaving in ways their designers failed to anticipate?
Can we deepen our understanding of the ways new capabilities “emerge” in advanced AI systems, with a particular focus on preventing the emergence of alternative goals?

I also believe we should explore more fully the possibility that an AGI will converge on a set of universal values, independent of whatever training we provide it – and, moreover, the possibility that these values will include upholding human flourishing.

And despite me saying just now that these values would be “independent of whatever training we provide”, is there, nevertheless, a way for us to tilt the landscape so that the AGI is more likely to reach and respect these conclusions?

Postscript

To join me in “camp hustle”, visit Future Surge, which is the activist wing of London Futurists.

If you’re interested in the ideas of my book The Singularity Principles, here’s a podcast episode in which Calum Chace and I discuss some of these ideas more fully.

In a subsequent episode of our podcast, Calum and I took another look at the same topics, this time with Millennium Project Executive Director Jerome Glenn: “Governing the transition to AGI”.


19 December 2022

Rethinking

Filed under: AGI, politics, Singularity Principles — Tags: AGI, Future Surge, The Singularity Principles — David Wood @ 2:06 am

I’ve been rethinking some aspects of AI control and AI alignment.

In the six months since publishing my book The Singularity Principles: Anticipating and Managing Cataclysmically Disruptive Technologies, I’ve been involved in scores of conversations about the themes it raises. These conversations have often brought my attention to fresh ideas and different perspectives.

These six months have also seen the appearance of numerous new AI models with capabilities that often catch observers by surprise. The general public is showing a new willingness (at least some of the time) to consider the far-reaching implications of these AI models and their more powerful successors.

People from various parts of my past life have been contacting me. The kinds of things they used to hear me forecasting – the kinds of things they thought, at the time, were unlikely to ever happen – are becoming more credible, more exciting, and, yes, more frightening.

They ask me: What is to be done? And, pointedly, Why aren’t you doing more to stop the truly bad outcomes that now seem ominously likely?

The main answer I give is: read my book. Indeed, you can find all the content online, spread out over a family of webpages.

In fact, my request is that people should read my book all the way through. That’s because later chapters of that book anticipate questions that tend to come to readers’ minds during earlier chapters, and try to provide answers.

Six months later, although I would give some different (newer) examples were I to rewrite that book today, I stand by the analysis I offered and the principles I championed.

However, I’m inclined to revise my thinking on a number of points. Please find these updates below.

An option to control superintelligent AI

I remain doubtful about the prospects for humans to retain control of any AGI (Artificial General Intelligence) that we create.

That is, the arguments I gave in my chapter “The AI Control Problem” still look strong to me.

But one line of thinking may have some extra mileage. That’s the idea of keeping AGI entirely as an advisor to humans, rather than giving it any autonomy to act directly in the world.

Such an AI would provide us with many recommendations, but it wouldn’t operate any sort of equipment.

More to the point: such an AI would have no desire to operate any sort of equipment. It would have no desires whatsoever, nor any motivations. It would simply be a tool. Or, to be more precise, it would simply be a remarkable tool.

In The Singularity Principles I gave a number of arguments why that idea is unsustainable:

Some decisions require faster responses than slow-brained humans can provide; that is, AIs with direct access to real-world levers and switches will be more effective than those that are merely advisory
Smart AIs will inevitably develop “subsidiary goals” (intermediate goals) such as having greater computational power, even when there is no explicit programming for such goals
As soon as a smart AI acquires any such subsidiary goal, it will find ways to escape any confinement imposed by human overseers.

But I now think this should be explored more carefully. Might a useful distinction be made between:

AIs that do have direct access to real-world levers and switches – with the programming of such AIs being carefully restricted to narrow lines of thinking
AIs with more powerful (general) capabilities, that operate purely in advisory capacities.

In that case, the damage that could be caused by failures of the first type of AI, whilst significant, would not involve threats to the entirety of human civilisation. And failures of the second type of AI would be restricted by the actions of humans as intermediaries.

This approach would require confidence that:

The capabilities of AIs of the first type will remain narrow, despite competitive pressures to give these systems at least some extra rationality
The design of AIs of the second type will prevent the emergence of any dangerous “subsidiary goals”.

As a special case of the second point, the design of these AIs will need to avoid any risk of the systems developing sentience or intrinsic motivation.

These are tough challenges – especially since we still have only a vague understanding of how desires and/or sentience can emerge as smaller systems combine and evolve into larger ones.

But since we are short of other options, it’s definitely something to be considered more fully.

An option for automatically aligned superintelligence

If controlling an AGI turns out to be impossible – as seems likely – what about the option that an AGI will have goals and principles that are fundamentally aligned with human wellbeing?

In such a case, it will not matter if an AGI is beyond human control. The actions it takes will ensure that humans have a very positive future.

The creation of such an AI – sometimes called a “friendly AI” – remains my best hope for humanity’s future.

However, there are severe difficulties in agreeing and encoding “goals and principles that are fundamentally aligned with human wellbeing”. I reviewed these difficulties in my chapter “The AI Alignment Problem”.

But what if such goals and principles are somehow part of an objective reality, awaiting discovery, rather than needing to be invented? What if something like the theory of “moral realism” is true?

In this idea, a principle like “treat humans well” would follow from some sort of a priori logical analysis, a bit like the laws of mathematics (such as the fact, discovered by one of the followers of Pythagoras, that the square root of two is an irrational number).

Accordingly, a sufficiently smart AGI would, all being well, reach its own conclusion that humans ought to be well treated.

Nevertheless, even in this case, significant risks would remain:

The principle might be true, but an AGI might not be motivated to discover it
The principle might be true, but an AGI, despite its brilliance, may fail to discover it
The principle might be true, and an AGI might recognise it, but it may take its own decision to ignore it – like the way that we humans often act in defiance of what we believe at the time to be overarching moral principles

The design criteria and initial conditions that we humans provide for an AGI may well influence the outcome of these risk factors.

I plan to return to these weighty matters in a future blog post!

Two different sorts of control

I’ve come to realise that there are not one but two questions of control of AI:

Can we humans retain control of an AGI that we create?
Can society as a whole control the actions of companies (or organisations) that may create an AGI?

Whilst both these control problems are profoundly hard, the second is less hard.

Moreover, it’s the second problem which is the truly urgent one.

This second control problem involves preventing teams inside corporations (and other organisations) from rushing ahead without due regard to questions of the potential outcomes of their work.

It’s the second control problem that the 21 principles which I highlight in my book are primarily intended to address.

When people say “it’s impossible to solve the AI control problem”, I think they may be correct regarding the first problem, but I passionately believe they’re wrong concerning the second problem.

The importance of psychology

When I review what people say about the progress and risks of AI, I am frequently struck by the fact that apparently intelligent people are strongly attached to views that are full of holes.

When I try to point out the flaws in their thinking, they hardly seem to pause in their stride. They display a stubborn confidence that they are correct.

What’s at play here is more than logic. It’s surely a manifestation of humanity’s often defective psychology.

My book includes a short chapter, “The Denial of the Singularity”, which touches on various matters of psychology. If I were to rewrite my book today, I believe that chapter would become larger, and that psychological themes would be spread more widely throughout the book.

Of course, noticing psychological defects is only the start of making progress. Circumventing or transcending these defects is an altogether harder question. But it’s one that needs a lot more attention.

The option of merging with AI

How can we have a better, more productive conversation about anticipating and managing AGI?

How can we avoid being derailed by ineffective arguments, hostile rhetoric, stubborn prejudices, hobby-horse obsessions, outdated ideologies, and (see the previous section) flawed psychology?

How might our not-much-better-than-monkey brains cope with the magnitude of these questions?

One possible answer is that technology can help us (so long as we use it wisely).

For example, the chapter “Uplifting politics”, from near the end of my book, listed ten ways for “technology improving politics”.

More broadly, we humans have the option to selectively deploy some aspects of technology to improve our capabilities in handling other aspects of technology.

We must recognise that technology is no panacea. But it can definitely make a big difference.

Especially if we restrict ourselves to putting heavy reliance only on those technologies – narrow technologies – whose mode of operation we fully understand, and where risks of malfunction can be limited.

This forms part of a general idea that “we humans don’t need to worry about being left behind by robots, or about being subjugated by robots, since we will be the robots”.

As I put it in the chapter “No easy solutions” in my book,

If humans merge with AI, humans could remain in control of AIs, even as these AIs rapidly become more powerful. With such a merger in place, human intelligence will automatically be magnified, as AI improves in capability. Therefore, we humans wouldn’t need to worry about being left behind.

Now I’ve often expressed strong criticisms of this notion of merger. I still believe these criticisms are sound.

But what these criticisms show is that any such merger cannot be the entirety of our response to the prospect of the emergence of AGI. It can only be part of the solution. That’s especially true because humans-augmented-by-technology are still very likely to lag behind pure technology systems, until such time as human minds might be removed from biological skulls and placed into new silicon hosts. That’s something that I’m not expecting to happen before the arrival of AGI, so it will be too late to solve (by itself) the problems of AI alignment and control.

(And since you ask, I probably won’t be in any hurry, even after the arrival of AGI, for my mind to be removed from my biological skull. I guess I might rethink that reticence in due course. But that’s rethinking for another day.)

The importance of politics

Any serious discussion about managing cataclysmically disruptive technologies (such as advanced AIs) pretty soon rubs up against the questions of politics.

That’s not just small-p “politics” – questions of how to collaborate with potential partners where there are many points of disagreement and even dislike.

It’s large-P “Politics” – interacting with presidents, prime ministers, cabinets, parliaments, and so on.

Questions of large-P politics occur throughout The Singularity Principles. My view now, six months afterwards, is that even more focus should be placed on the subject of improving politics:

Helping politics to escape the clutches of demagogues and autocrats
Helping politics to avoid stultifying embraces between politicians and their “cronies” in established industries
Ensuring that the best insights and ideas of the whole electorate can rise to wide attention, without being quashed or distorted by powerful incumbents
Bringing everyone involved in politics rapidly up-to-date with the real issues regarding cataclysmically disruptive technologies
Distinguishing effective regulations and incentives from those that are counter-productive.

As 2022 has progressed, I’ve seen plenty of new evidence of deep problems within political systems around the world. These problems were analysed with sharp insight in the book The Revenge of Power by Moisés Naím that I recently identified as “the best book that I read in 2022”.

Happily, as well as evidence of deep problems in our politics worldwide, there are also encouraging signs, as well as sensible plans for improvement. You can find some of these plans inside the book by Naím, and, yes, I offer suggestions in my own book too.

Accelerating improvements in politics was one of the reasons I created Future Surge a few months back. That’s an initiative on which I expect to spend a lot more of my time in 2023.

Note: the image underlying the picture at the top of this article was created by DALL.E 2 from the prompt “A brain with a human face on it rethinks, vivid stormy sky overhead, photorealistic style”.


3 November 2022

Four options for avoiding an AI cataclysm

Filed under: AGI, podcast, Singularity Principles — Tags: dw2blog, London Futurists Podcast, The Singularity Principles — David Wood @ 9:56 pm

Let’s consider four hard truths, and then four options for a solution.

Hard truth 1: Software has bugs.

Even when clever people write the software, and that software passes numerous verification tests, any complex software system generally still has bugs. If the software encounters a circumstance outside its verification suite, it can go horribly wrong.
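
As a minimal illustration of this point (a hypothetical sketch, not taken from any real system), consider a function that passes every check in its verification suite, yet fails on an input the suite never anticipated. The names `average_speed`, `verification_suite` and `fails_outside_suite` are invented for this example.

```python
def average_speed(distance_km: float, hours: float) -> float:
    """Return average speed in km/h (hypothetical example function)."""
    return distance_km / hours


def verification_suite() -> bool:
    """A small test suite: every case listed here passes."""
    cases = [
        ((100.0, 2.0), 50.0),
        ((30.0, 0.5), 60.0),
        ((0.0, 1.0), 0.0),
    ]
    return all(average_speed(*args) == expected for args, expected in cases)


def fails_outside_suite() -> bool:
    """A circumstance the suite never covered: zero elapsed time."""
    try:
        average_speed(10.0, 0.0)
    except ZeroDivisionError:
        return True
    return False


if __name__ == "__main__":
    print("Suite passes:", verification_suite())
    print("Fails outside suite:", fails_outside_suite())
```

The suite reports success, so the function looks verified. The defect only shows up when the software meets a situation that nobody thought to test.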

Hard truth 2: Just because software becomes more powerful, that won’t make all the bugs go away.

Newer software may run faster. It may incorporate input from larger sets of training data. It may gain extra features. But none of these developments mean the automatic removal of subtle errors in the logic of the software, or shortcomings in its specification. It might still reach terrible outcomes – just quicker than before!

Hard truth 3: As AI becomes more powerful, there will be more pressure to deploy it in challenging real-world situations.

Consider the real-time management of:

Complex arsenals of missiles, anti-missile missiles, and so on
Geoengineering interventions, which are intended to bring the planet’s climate back from the brink of a cascade of tipping points
Devious countermeasures against the growing weapons systems of a group (or nation) with a dangerously unstable leadership
Social network conversations, where changing sentiments can have big implications for electoral dynamics or for the perceived value of commercial brands
Ultra-hot plasmas inside whirling magnetic fields in nuclear fusion energy generators
Incentives for people to spend more money than is wise, on addictive gambling sites
The buying and selling of financial instruments, to take advantage of changing market sentiments.

In each case, powerful AI software could be a very attractive option. A seductive option. Especially if it has been written by clever people, and appears to have a good track record of delivering results.

Until it goes wrong. In which case the result could be cataclysmic. (Accidental nuclear war. The climate walloped past a tipping point in the wrong direction. Malware going existentially wrong. Partisan outrage propelling a psychological loose cannon over the edge. Easy access to weapons of mass destruction. Etc.)

Indeed, the real risk of AI cataclysm – as opposed to the Hollywood version of any such risk – is that an AI system may acquire so much influence over human society and our surrounding environment that a mistake in that system could cataclysmically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.

Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:

Implementation defect
Design defect
Design overridden
Implementation overridden.

Hard truth 4: There are no simple solutions to the risks described above.

What’s more, people who naively assume that a simple solution can easily be put in place (or already exists) are making the overall situation worse. They encourage complacency, whereas greater attention is urgently needed.

But perhaps you disagree?

That’s the context for the conversation in Episode 11 of the London Futurists Podcast, which was published yesterday morning.

In just thirty minutes, that episode dug deep into some of the ideas in my recent book The Singularity Principles. Co-host Calum Chace and I found plenty on which to agree, but had differing opinions on one of the most important questions.

Calum listed three suggestions that people sometimes make for how the dangers of potentially cataclysmic AI might be handled.

In response, I described a different approach – something that Calum said would be a fourth idea for a solution. As you can hear from the recording of the podcast, I evidently left him unconvinced.

Therefore, I’d like to dig even deeper.

Option 1: Humanity gets lucky

It might be the case that AI software that is smart enough will embody an unshakeable commitment toward humanity having the best possible experience.

Such software won’t miscalculate (after all, it is superintelligent). If there are flaws in how it has been specified, it will be smart enough to notice these flaws, rather than stubbornly following through on the letter of its programming. (After all, it is superintelligent.)

Variants of this wishful thinking exist. In some variants, what will guarantee a positive outcome isn’t just a latent tendency of superintelligence toward superbenevolence. It’s the invisible hand of the free market that will guide consumer choices away from software that might harm users, toward software that never, ever, ever goes wrong.

My response here is that software which appears to be bug free can, nevertheless, harbour deep mistakes. It may be superintelligent, but that doesn’t mean it’s omniscient or infallible.

Second, software which is bug free may be monstrously efficient at doing what some of its designers had in mind – manipulating consumers into actions which increase the share price of a given corporation, despite all the externalities arising.

Moreover, it’s too much of a stretch to say that greater intelligence always makes you wiser and kinder. There are plenty of dreadful counterexamples, from humans in the worlds of politics, crime, business, academia, and more. Who is to say that a piece of software with an IQ equivalent to 100,000 will be sure to treat us humans any better than we humans sometimes treat swarms of insects (e.g. ant colonies) that get in our way?

Do you feel lucky? My view is that any such feeling, in these circumstances, is rash in the extreme.

Option 2: Safety engineered in

Might a team of brilliant AI researchers, Mary and Flo (to make up a couple of names), devise a clever method that will ensure their AI (once it is built) never harms humanity?

Perhaps the answer lies in some advanced mathematical wizardry. Or in chiselling a 21st century version of Asimov’s Laws of Robotics into the chipsets at the heart of computer systems. Or in switching from “correlation logic” to “causation logic”, or some other kind of new paradigm in AI systems engineering.

Of course, I wish Mary and Flo well. But their ongoing research won’t, by itself, prevent lots of other people releasing their own unsafe AI first. Especially when these other engineers are in a hurry to win market share for their companies.

Indeed, the considerable effort being invested by various researchers and organisations in a search for a kind of fix for AI safety is, arguably, a distraction from a sober assessment of the bigger picture. Better technology, better product design, better mathematics, and better hardware can all be part of the full solution. But that full solution also needs, critically, to include aspects of organisational design, economic incentives, legal frameworks, and political oversight. That’s the argument I develop in my book. We ignore these broader forces at our peril.

Option 3: Humans merge with machines

If we can’t beat them, how about joining them?

If human minds are fused into silicon AI systems, won’t the good human sense of these minds counteract any bugs or design flaws in the silicon part of the hybrid formed?

With such a merger in place, human intelligence will automatically be magnified, as AI improves in capability. Therefore, we humans wouldn’t need to worry about being left behind. Right?

I see two big problems with this idea. First, so long as human intelligence is rooted in something like the biology of the brain, the mechanisms for any such merger may only allow relatively modest increases in human intelligence. Our biological brains would be bottlenecks that constrain the speed of progress in this hybrid case. Compared to pure AIs, the human-AI hybrid would, after all, be left behind in this intelligence race. So much for humans staying in control!

An even bigger problem is the realisation that a human with superhuman intelligence is likely to be at least as unpredictable and dangerous as an AI with superhuman intelligence. The magnification of intelligence will allow that superhuman human to do all kinds of things with great vigour – settling grudges, acting out fantasies, demanding attention, pursuing vanity projects, and so on. Recall: power tends to corrupt. Such a person would be able to destroy the earth. Worse, they might want to do so.

Another way to state this point is that, just because AI elements are included inside a person, that won’t magically ensure that these elements become benign, or are subject to the full control of the person’s best intentions. Consider as comparisons what happens when biological viruses enter a person’s body, or when a cancer grows there. In neither case does the intruding element lose its ability to cause damage, just on account of being part of a person who has humanitarian instincts.

This reminds me of the statement that is sometimes heard, in defence of accelerating the capabilities of AI systems: “I am not afraid of artificial intelligence. I am afraid of human stupidity”.

In reality, what we need to fear is the combination of imperfect AI and imperfect humanity.

The conclusion of this line of discussion is that we need to do considerably more than enable greater intelligence. We also need to accelerate greater wisdom – so that any beings with superhuman intelligence will operate truly beneficently.

Option 4: Greater wisdom

The cornerstone insight of ethics is that, just because we can do something, and indeed may even want to do that thing, it doesn’t mean we should do that thing.

Accordingly, human societies since prehistory have placed constraints on how people should behave.

Sometimes, moral sanction is sufficient: people constrain their actions in deference to public opinion. In other cases, restrictions are codified into laws and regulations.

Likewise, just because a corporation could boost its profits by releasing a new version of its AI software, that doesn’t mean it should release that software.

But what is the origin of these “should” imperatives? And how do we resolve conflicts, when two different groups of people champion two different sets of ethical intuitions?

Where can we find a viable foundation for ethical restrictions – something more solid than “we’ve always done things like this” or “this feels right to me” or “we need to submit to the dictates in our favourite holy scripture”?

Welcome to the world of philosophy.

It’s a world that, according to some observers, has made little progress over the centuries. People still argue over fundamentals. Deontologists square off against consequentialists. Virtue ethicists stake out a different position.

It’s a world in which it is easier to poke holes in the views held by others than to defend a consistent view of your own.

But it’s my position that the impending threat of cataclysmic AI impels us to reach a wiser agreement.

It’s like how the devastation of the Covid pandemic impelled society to find significantly quicker ways to manufacture, verify, and deploy vaccines.

It’s like how society can come together, remarkably, in a wartime situation, notwithstanding the divisions that previously existed.

In the face of the threats of technology beyond our control, minds should focus, with unprecedented clarity. We’ll gradually build a wider consensus in favour of various restrictions and, yes, in favour of various incentives.

What’s your reaction? Is option 4 simply naïve?

Practical steps forward

Rather than trying to “boil the ocean” of philosophical disputes over contrasting ethical foundations, we can, and should, proceed in a kaizen manner.

To start with, we can give our attention to specific individual questions:

What are the circumstances when we should welcome AI-powered facial recognition software, and when should we resist it?
What are the circumstances when we should welcome AI systems that supervise aspects of dangerous weaponry?
What are the circumstances that could transform AI-powered monitoring systems from dangerous to helpful?

As we reach some tentative agreements on these individual matters, we can take the time to highlight principles with potential wider applicability.

In parallel, we can revisit some of the agreements (explicit and implicit) for how we measure the health of society and the liberties of individuals:

The GDP (Gross Domestic Product) statistics that provide a perspective on economic activities
The UDHR (Universal Declaration of Human Rights) statement that was endorsed in the United Nations General Assembly in 1948.

I don’t deny it will be hard to build consensus. It will be even harder to agree how to enforce the guidelines arising – especially in light of the wretched partisan conflicts that are poisoning the political processes in a number of parts of the world.

But we must try. And with some small wins under our belt, we can anticipate momentum building.

These are some of the topics I cover in the closing chapters of The Singularity Principles.

I by no means claim to know all the answers.

But I do believe that these are some of the most important questions to address.

And something that could help us make progress is – you guessed it – AI. In the right circumstances, AI can help us think more clearly, and can propose new syntheses of our previous ideas.

Thus today’s AI can provide stepping stones to the design and deployment of better, safer, wiser AI tomorrow. That’s provided we maintain human oversight.

Footnotes

The image above includes a design by Pixabay user Alexander Antropov, used with thanks.

See also this article by Calum in Forbes, Taking Back Control Of The Singularity.

8 June 2022

Pre-publication review: The Singularity Principles

Filed under: books, Singularity, Singularity Principles — Tags: The Singularity Principles — David Wood @ 9:23 am

I’ve recently been concentrating on finalising the content of my forthcoming new book, The Singularity Principles.

The reasons why I see this book as both timely and necessary are explained in the extract, below, taken from the introduction to the book.

This link provides pointers to the full text of every chapter in the book. (Or use the links in the listing below of the extended table of contents.)

Please get in touch with me if you would prefer to read the pre-publication text in PDF format, rather than on the online HTML pages linked above.

At this stage, I will gratefully receive any feedback:

Aspects of the book that I should consider changing
Aspects of the book that you particularly like.

Feedback on any parts of the book will be welcome. It’s by no means necessary for you to read the entire text. (However, I hope you will find it sufficiently interesting that you will end up reading more than you originally planned…)

By the way, it’s a relatively short book, compared to some others I’ve written. The wordcount is a bit over 50 thousand words. That works out at around 260 pages of fairly large text on 5″x8″ paper.

I will also appreciate any commendations or endorsements, which I can include with the publicity material for the book, to encourage more people to pay attention to it.

The timescale I have in mind: I will release electronic and physical copies of the book some time early next month (July), followed up soon afterward by an audio version.

Therefore, if you’re thinking of dipping into any chapters to provide feedback and/or endorsements, the sooner the better!

Thanks in anticipation!

Preface

This book is dedicated to what may be the most important concept in human history, namely, the Singularity – what it is, what it is not, the steps by which we may reach it, and, crucially, how to make it more likely that we’ll experience a positive singularity rather than a negative singularity.

For now, here’s a simple definition. The Singularity is the emergence of Artificial General Intelligence (AGI), and the associated transformation of the human condition. Spoiler alert: that transformation will be profound. But if we’re not paying attention, it’s likely to be profoundly bad.

Despite the importance of the concept of the Singularity, the subject receives nothing like the attention it deserves. When it is discussed, it often receives scorn or ridicule. Alas, you’ll hear sniggers and see eyes rolling.

That’s because, as I’ll explain, there’s a kind of shadow around the concept – an unhelpful set of distortions that make it harder for people to fully perceive the real opportunities and the real risks that the Singularity brings.

These distortions grow out of a wider confusion – confusion about the complex interplay of forces that are leading society to the adoption of ever-more powerful technologies, including ever-more powerful AI.

It’s my task in this book to dispel the confusion, to untangle the distortions, to highlight practical steps forward, and to attract much more serious attention to the Singularity. The future of humanity is at stake.

Let’s start with the confusion.

Confusion, turbulence, and peril

The 2020s could be called the Decade of Confusion. Never before has so much information washed over everyone, leaving us, all too often, overwhelmed, intimidated, and distracted. Former certainties have dimmed. Long-established alliances have fragmented. Flurries of excitement have pivoted quickly to chaos and disappointment. These are turbulent times.

However, if we could see through the confusion, distraction, and intimidation, what we should notice is that human flourishing is, potentially, poised to soar to unprecedented levels. Fast-changing technologies are on the point of providing a string of remarkable benefits. We are near the threshold of radical improvements to health, nutrition, security, creativity, collaboration, intelligence, awareness, and enlightenment – with these improvements being available to everyone.

Alas, these same fast-changing technologies also threaten multiple sorts of disaster. These technologies are two-edged swords. Unless we wield them with great skill, they are likely to spin out of control. If we remain overwhelmed, intimidated, and distracted, our prospects are poor. Accordingly, these are perilous times.

These dual future possibilities – technology-enabled sustainable superabundance, versus technology-induced catastrophe – have featured in numerous discussions that I have chaired at London Futurists meetups going all the way back to March 2008.

As these discussions have progressed, year by year, I have gradually formulated and refined what I now call the Singularity Principles. These principles are intended:

To steer humanity’s relationships with fast-changing technologies,
To manage multiple risks of disaster,
To enable the attainment of remarkable benefits,
And, thereby, to help humanity approach a profoundly positive singularity.

In short, the Singularity Principles are intended to counter today’s widespread confusion, distraction, and intimidation, by providing clarity, credible grounds for hope, and an urgent call to action.

This time it’s different

I first introduced the Singularity Principles, under that name and with the same general format, in the final chapter, “Singularity”, of my 2021 book Vital Foresight: The Case for Active Transhumanism. That chapter is the culmination of a 642 page book. The preceding sixteen chapters of that book set out at some length the challenges and opportunities that these principles need to address.

Since the publication of Vital Foresight, it has become evident to me that the Singularity Principles require a short, focused book of their own. That’s what you now hold in your hands.

The Singularity Principles is by no means the only new book on the subject of the management of powerful disruptive technologies. The public, thankfully, are waking up to the need to understand these technologies better, and numerous authors are responding to that need. As one example, the phrase “Artificial Intelligence” forms part of the title of scores of new books.

I have personally learned many things from some of these recent books. However, to speak frankly, I find myself dissatisfied by the prescriptions these authors have advanced. These authors generally fail to appreciate the full extent of the threats and opportunities ahead. And even if they do see the true scale of these issues, the recommendations these authors propose strike me as being inadequate.

Therefore, I cannot keep silent.

Accordingly, I present in this new book the content of the Singularity Principles, brought up to date in the light of recent debates and new insights. The book also covers:

Why the Singularity Principles are sorely needed
The source and design of these principles
The significance of the term “Singularity”
Why there is so much unhelpful confusion about “the Singularity”
What’s different about the Singularity Principles, compared to recommendations of other analysts
The kinds of outcomes expected if these principles are followed
The kinds of outcomes expected if these principles are not followed
How you – dear reader – can, and should, become involved, finding your place in a growing coalition
How these principles are likely to evolve further
How these principles can be put into practice, all around the world – with the help of people like you.

The scope of the Principles

To start with, the Singularity Principles can and should be applied to the anticipation and management of the NBIC technologies that are at the heart of the current, fourth industrial revolution. NBIC – nanotech, biotech, infotech, and cognotech – is a quartet of interlinked technological disruptions which are likely to grow significantly stronger as the 2020s unfold. Each of these four technological disruptions has the potential to fundamentally transform large parts of the human experience.

However, the same set of principles can and should also be applied to the anticipation and management of the core technology that will likely give rise to a fifth industrial revolution, namely the technology of AGI (artificial general intelligence), and the rapid additional improvements in artificial superintelligence that will likely follow fast on the heels of AGI.

The emergence of AGI is known as the technological singularity – or, more briefly, as the Singularity.

In other words, the Singularity Principles apply both:

To the longer-term lead-up to the Singularity, from today’s fast-improving NBIC technologies,
And to the shorter-term lead-up to the Singularity, as AI gains more general capabilities.

In both cases, anticipation and management of possible outcomes will be of vital importance.

By the way – in case it’s not already clear – please don’t expect a clever novel piece of technology, or some brilliant technical design, to somehow solve, by itself, the challenges posed by NBIC technologies and AGI. These challenges extend far beyond what could be wrestled into submission by some dazzling mathematical wizardry, by the incorporation of an ingenious new piece of silicon at the heart of every computer, or by any other “quick fix”. Indeed, the considerable effort being invested by some organisations in a search for that kind of fix is, arguably, a distraction from a sober assessment of the bigger picture.

Better technology, better product design, better mathematics, and better hardware can all be part of the full solution. But that full solution also needs, critically, to include aspects of organisational design, economic incentives, legal frameworks, and political oversight. That’s the argument I develop in the chapters ahead.