dw2

6 November 2025

An audacious singularity analogy

Filed under: AGI, risks — David Wood @ 10:10 am

Here’s the latest in my thinking about how humanity can most reliably obtain wonderful benefits from advanced AI – a situation I describe as sustainable superabundance for all – rather than the horrific outcomes of a negative technological singularity – a situation I describe as Catastrophic General Intelligence (CGI).

These thoughts have sharpened in my mind following conversations at the recent SingularityNET BGI 2025 summit in Istanbul, Türkiye.

My conclusion is that, in order to increase the likelihood of the profoundly positive fork on the road ahead, it is necessary but not sufficient to highlight the real and credible dangers of the truly awful negative fork on that same road.

Yes, it is essential to highlight how a very plausible extension of our current reckless trajectory, past accelerating tipping points, will plunge humanity into a situation that is wildly unstable, dangerously opaque, and impossible to rein back. Clarifying these seismic risks is necessary, not to induce a state of panic (which would be counterproductive) or doom (which would be psychologically enfeebling), but to cause minds to focus with great seriousness. Without a sufficient sense of urgency, any actions taken will be inadequate: “too little, too late”.

However, unless that climactic warning is accompanied by an uplifting positive message, the result is likely to be misery, avoidance, distraction, self-deception, and disinformation.

If the only message heard is “pause” or “sacrifice”, our brains are likely to rebel.

If people already appreciate that advanced AI has the potential to solve aging, climate change, and more, that’s not an option they will give up easily.

If such people see no credible alternative to the AI systems currently being produced by big tech companies (notwithstanding the opaque and inexplicable nature of these systems), they are likely to object to efforts to alter that trajectory, complaining that “Any attempt to steer the development of advanced AI risks people dying from aging!”

The way out of this impasse is to establish that new forms of advanced AI can be prioritised, which lack dangerous features such as autonomy, volition, and inscrutability – new forms of AI that will still be able to deliver, quickly, the kinds of solution (including all-round rejuvenation) that people wish to obtain from AGI.

Examples of these new forms of advanced AI include “Scientist AI” (to use a term favoured by Yoshua Bengio) and “Tool AI” (the term favoured by Anthony Aguirre). These new forms potentially also include AI delivered on the ASI:Chain being created by F1r3fly and SingularityNET (as featured in talks at BGI 2025), and AI using neural networks trained by predictive coding (as described by Faezeh Habibi at that same summit).

These new forms of AI have architectures designed for transparency, controllability, and epistemic humility, rather than self-optimising autonomy.
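To give a flavour of what training by predictive coding involves, here is a minimal toy sketch, in the spirit of the classic Rao–Ballard formulation. It is offered purely for intuition: it is not the specific model presented by Faezeh Habibi at BGI 2025, and the dimensions and learning rates below are arbitrary placeholders. The property it illustrates is that both inference and learning rely only on locally computed prediction errors, with no global backpropagation pass through an opaque stack of layers.

```python
# Minimal toy of predictive coding (illustrative only, not the BGI 2025 model).
# A single latent layer is settled and trained using purely local prediction errors.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_latent = 8, 3
true_W = rng.standard_normal((n_obs, n_latent))    # hidden process generating the data
W = 0.1 * rng.standard_normal((n_obs, n_latent))   # model's generative weights: prediction = W @ z

def infer(x, W, steps=100, lr_z=0.05):
    """Settle latent activity z by descending the local prediction-error energy."""
    z = np.zeros(n_latent)
    for _ in range(steps):
        error = x - W @ z                 # bottom-up prediction error
        z += lr_z * (W.T @ error - z)     # local update; the '-z' term is a simple Gaussian prior
    return z

def learn_step(x, W, lr_w=0.01):
    """One weight update, using only locally available error and activity."""
    z = infer(x, W)
    error = x - W @ z
    return W + lr_w * np.outer(error, z)  # Hebbian-like: error times presynaptic activity

for _ in range(2000):                      # train on samples from the hidden process
    x = true_W @ rng.standard_normal(n_latent)
    W = learn_step(x, W)

x_test = true_W @ rng.standard_normal(n_latent)
print("reconstruction error:", np.linalg.norm(x_test - W @ infer(x_test, W)))
```

The contrast with back propagation is the point: every quantity used in these updates is locally available and inspectable, which is part of why such approaches are argued to be more amenable to monitoring and interpretation.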

It’s when the remarkable potential of these new, safer, forms of AI becomes clearer, that more people can be expected to snap out of their head-in-the-sand opposition to steering and controlling AGI development.

Once I returned home from Istanbul, I wrote up my reflections on what I called “five of the best” talks at BGI 2025. These reflections ended with a rather audacious analogy, which I repeat here:

The challenge facing us regarding runaway development of AI beyond our understanding and beyond our control can be compared to a major controversy within the field of preventing runaway climate change. That argument runs as follows:

  1. Existing patterns of energy use, which rely heavily on fuels that emit greenhouse gases, risk the climate reaching dangerous tipping points and transitioning beyond a “climate singularity” into an utterly unpredictable, chaotic, cataclysmically dangerous situation
  2. However, most consumers of energy prefer dirty sources to clean (“green”) sources, because the former have lower cost and appear to be more reliable (in the short term at least)
  3. Accordingly, without an autocratic world government (“yuk!”), there is almost no possibility of people switching away in sufficient numbers from dirty energy to clean energy
  4. Some observers might therefore be tempted to hope that theories of accelerating climate change are mistaken, and that there is no dangerous “climate singularity” in the near future
  5. In turn, that drives people to look for faults in parts of the climate change argumentation – cherry picking various potential anomalies in order to salve their conscience
  6. BUT this miserable flow of thought can be disrupted once it is seen how clean energy can be lower cost than dirty energy
  7. From this new perspective, there will be no need to plead with energy users to make sacrifices for the larger good; instead, these users will happily transition to abundant cleaner energy sources, for their short-term economic benefit as well as the longer-term environmental benefits.

You can likely see how a similar argument applies for safer development of trustworthy beneficial advanced AI:

  1. Existing AGI development processes, which rely heavily on poorly understood neural networks trained by back propagation, risk AI development reaching dangerous tipping points (when AIs repeatedly self-improve) and transitioning beyond a “technological singularity” into an utterly unpredictable, chaotic, cataclysmically dangerous situation
  2. However, most AI developers prefer opaque AI creation processes to transparent, explainable ones, because the former appear to produce more exciting results (in the short term at least)
  3. Accordingly, without an autocratic world government (“yuk!”), there is almost no possibility of developers switching away from their current reckless “suicide race” to build AGI first
  4. Some observers might therefore be tempted to hope that theories of AGI being “Unexplainable, Unpredictable, Uncontrollable” (as advanced for example by Roman Yampolskiy) are mistaken, and that there is no dangerous “technological singularity” in the future
  5. In turn, that drives people to look for faults in the work of Yampolskiy, Yoshua Bengio, Eliezer Yudkowsky, and others, cherry picking various potential anomalies in order to salve their conscience
  6. BUT this miserable flow of thought can be disrupted once it is seen how alternative forms of advanced AI can deliver the anticipated benefits of AGI without the terrible risks of currently dominant development methods
  7. From this new perspective, there will be no need to plead with AGI developers to pause their research for the greater good; instead, these developers will happily transition to safer forms of AI development.

To be clear, this makes things appear somewhat too simple. In both cases, the complication is that formidable inertial forces will need to be overcome – deeply entrenched power structures that, for various pathological reasons, are hell-bent on preserving the status quo.

For that reason, the battle for truly beneficial advanced AI is going to require great fortitude as well as great skill – skill not only in technological architectures but also in human social and political dynamics.

And also to be clear, it’s a tough challenge to identify and describe the dividing line between safe advanced AI and dangerous advanced AI (AI with its own volition, autonomy, and desire to preserve itself – as well as AI that is inscrutable and unmonitorable). Indeed, transparency and non-autonomy are not silver bullets. But that’s a challenge which it is vital for us to accept and progress.

Footnote: I offer additional practical advice on anticipating and managing cataclysmically disruptive technologies in my book The Singularity Principles.

25 August 2025

The biggest blockages to successful governance of advanced AI

“Humanity has never faced a greater problem than itself.”

That phrase was what my brain hallucinated, while I was browsing the opening section of the Introduction of the groundbreaking new book Global Governance of the Transition to Artificial General Intelligence written by my friend and colleague Jerome C. Glenn, Executive Director of The Millennium Project.

I thought to myself: That’s a bold but accurate way of summing up the enormous challenge faced by humanity over the next few years.

In previous centuries, our biggest problems have often come from the environment around us: deadly pathogens, devastating earthquakes, torrential storms, plagues of locusts – as well as marauding hordes of invaders from outside our local neighbourhood.

But in the second half of the 2020s, our problems are being compounded as never before by our own human inadequacies:

  • We’re too quick to rush to judgement, seeing only parts of the bigger picture
  • We’re too loyal to the tribes to which we perceive ourselves as belonging
  • We’re overconfident in our ability to know what’s happening
  • We’re too comfortable with manufacturing and spreading untruths and distortions
  • We’re too bound into incentive systems that prioritise short-term rewards
  • We’re too fatalistic, as regards the possible scenarios ahead.

You may ask, What’s new?

What’s new is the combination of these deep flaws in human nature with technology that is remarkably powerful yet opaque and intractable. AI that is increasingly beyond our understanding and beyond our control is being coupled in potentially devastating ways with our over-hasty, over-tribal, over-confident thoughts and actions. New AI systems are being rushed into deployment and used in attempts:

  • To manufacture and spread truly insidious narratives
  • To incentivize people around the world to act against their own best interests, and
  • To resign people to inaction when in fact it is still within their power to alter and uplift the trajectory of human destiny.

In case this sounds like a counsel of despair, I should clarify at once my appreciation of aspects of human nature that are truly wonderful, as counters to the negative characteristics that I have already mentioned:

  • Our thoughtfulness, that can counter rushes to judgement
  • Our collaborative spirit, that can transcend partisanship
  • Our wisdom, that can recognise our areas of lack of knowledge or lack of certainty
  • Our admiration for truth, integrity, and accountability, that can counter ends-justify-the-means expediency
  • Our foresight, that can counter short-termism and free us from locked-in inertia
  • Our creativity, to imagine and then create better futures.

Just as AI can magnify the regrettable aspects of human nature, so also it can, if used well, magnify those commendable aspects.

So, which is it to be?

The fundamental importance of governance

The question I’ve just asked isn’t a question that can be answered by individuals alone. Any one group – whether an organisation, a corporation, or a decentralised partnership – can have its own beneficial actions overtaken and capsized by catastrophic outcomes of groups that failed to heed the better angels of their nature, and which, instead, allowed themselves to be governed by wishful naivety, careless bravado, pangs of jealousy, hostile alienation, assertive egotism, or the madness of the crowd.

That’s why the message of this new book by Jerome Glenn is so timely: the processes of developing and deploying increasingly capable AIs need to be:

  • Governed, rather than happening chaotically
  • Globally coordinated, rather than there being no cohesion between the different governance processes applicable in different localities
  • Progressed urgently, without being crowded out by all the shorter-term issues that, understandably, also demand governance attention.

Before giving more of my own thoughts about this book, let me share some of the commendations it has received:

  • “This book is an eye-opening study of the transition to a completely new chapter of history.” – Csaba Korösi, 77th President of the UN General Assembly
  • “A comprehensive overview, drawing both on leading academic and industry thinkers worldwide, and valuable perspectives from within the OECD, United Nations.” – Jaan Tallinn, founding engineer, Skype and Kazaa; co-founder, Cambridge Centre for the Study of Existential Risk and the Future of Life Institute
  • “Written in lucid and accessible language, this book is a must read for people who care about the governance and policy of AGI.” – Lan Xue, Chair of the Chinese National Expert Committee on AI Governance.

The book also carries an absorbing foreword by Ben Goertzel. In this foreword, Ben introduces himself as follows:

Since the 1980s, I have been immersed in the field of AI, working to unravel the complexities of intelligence and to build systems capable of emulating it. My journey has included introducing and popularizing the concept of AGI, developing innovative AGI software frameworks such as OpenCog, and leading efforts to decentralize AI development through initiatives like SingularityNET and the ASI Alliance. This work has been driven by an understanding that AGI is not just an engineering challenge but a profound societal pivot point – a moment requiring foresight, ethical grounding, and global collaboration.

He clarifies why the subject of the book is so important:

The potential benefits of AGI are vast: solutions to climate change, the eradication of diseases, the enrichment of human creativity, and the possibility of postscarcity economies. However, the risks are equally significant. AGI, wielded irresponsibly or emerging in a poorly aligned manner, could exacerbate inequalities, entrench authoritarianism, or unleash existential dangers. At this critical juncture, the questions of how AGI will be developed, governed, and integrated into society must be addressed with both urgency and care.

The need for a globally participatory approach to AGI governance cannot be overstated. AGI, by its nature, will be a force that transcends national borders, cultural paradigms, and economic systems. To ensure its benefits are distributed equitably and its risks mitigated effectively, the voices of diverse communities and stakeholders must be included in shaping its development. This is not merely a matter of fairness but a pragmatic necessity. A multiplicity of perspectives enriches our understanding of AGI’s implications and fosters the global trust needed to govern it responsibly.

He then offers wide praise for the contents of the book:

This is where the work of Jerome Glenn and The Millennium Project may well prove invaluable. For decades, The Millennium Project has been at the forefront of fostering participatory futures thinking, weaving together insights from experts across disciplines and geographies to address humanity’s most pressing challenges. In Governing the Transition to Artificial General Intelligence, this expertise is applied to one of the most consequential questions of our time. Through rigorous analysis, thoughtful exploration of governance models, and a commitment to inclusivity, this book provides a roadmap for navigating the complexities of AGI’s emergence.

What makes this work particularly compelling is its grounding in both pragmatism and idealism. It does not shy away from the technical and geopolitical hurdles of AGI governance, nor does it ignore the ethical imperatives of ensuring AGI serves the collective good. It recognizes that governing AGI is not a task for any single entity but a shared responsibility requiring cooperation among nations, corporations, civil society, and, indeed, future AGI systems themselves.

As we venture into this new era, this book reminds us that the transition to AGI is not solely about technology; it is about humanity, and about life, mind, and complexity in general. It is about how we choose to define intelligence, collaboration, and progress. It is about the frameworks we build now to ensure that the tools we create amplify the best of what it means to be human, and what it means to both retain and grow beyond what we are.

My own involvement

To fill in some background detail: I was pleased to be part of the team that developed the set of 22 critical questions which sat at the heart of the interviews and research which are summarised in Part I of the book – and I conducted a number of the resulting interviews. In parallel, I explored related ideas via two different online Transpolitica surveys:

And I’ve been writing roughly one major article (or giving a public presentation) on similar topics every month since then. Recent examples include:

Over this time period, my views have evolved. I see the biggest priority, nowadays, not as figuring out how to govern AGI as it comes into existence, but rather, how to pause the development and deployment of any new types of AI that could spark the existence of self-improving AGI.

That global pause needs to last long enough that the global community can justifiably be highly confident that any AGI that will subsequently be built will be what I have called a BGI (a Beneficial General Intelligence) rather than a CGI (a Catastrophic General Intelligence).

Govern AGI and/or Pause the development of AGI?

I recently posted a diagram on various social media platforms to illustrate some of the thinking behind that stance of mine:

Alongside that diagram, I offered the following commentary:

The next time someone asks me what’s my p(Doom), compared with my p(SSfA) (the probability of Sustainable Superabundance for all), I may try to talk them through a diagram like this one. In particular, we need to break down the analysis into two cases – will the world keep rushing to build AGI, or will it pause from that rush.

To explain some points from the diagram:

  • We can reach the very desirable future of SSfA by making wise use of AI only modestly more capable than what we have today;
  • We might also get there as a side-effect of building AGI, but that’s very risky.

None of the probabilities are meant to be considered precise. They’re just ballpark estimates.

I estimate around 2/3 chance that the world will come to its senses and pause its current headlong rush toward building AGI.

But even in that case, risks of global catastrophe remain.

The date 2045 is also just a ballpark choice. Either of the “singularity” outcomes (wonderful or dreadful) could arrive a lot sooner than that.

The 1/12 probability I’ve calculated for “stat” (I use “stat” here as shorthand for a relatively unchanged status quo) by 2045 reflects my expectation of huge disruptions ahead, of one sort or another.

The overall conclusion: if we want SSfA, we’re much more likely to get it via the “pause AGI” branch than via the “headlong rush to AGI” branch.

And whilst doom is possible in either branch, it’s much more likely in the headlong rush branch.

For more discussion of how to get the best out of AI and other cataclysmically disruptive technologies, see my book The Singularity Principles (the entire contents are freely available online).

Feel free to post your own version of this diagram, with your own estimates of the various conditional probabilities.
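To make the branch arithmetic concrete, here is a minimal sketch of the two-branch tree as one might code it up. Only the 2/3 figure for P(pause) and the overall figure of roughly 1/12 for the status quo come from the commentary above; every conditional probability in the sketch is a placeholder of my own, chosen simply so that the numbers are internally consistent, and readers should substitute their own estimates.

```python
# Illustrative two-branch outcome tree for the "pause vs headlong rush" diagram.
# Only P(pause) = 2/3 and the overall ~1/12 "stat" figure come from the post;
# all conditional probabilities below are placeholders for readers to replace.

branches = {
    "pause AGI":     {"prob": 2/3, "SSfA": 0.600, "doom": 0.275, "stat": 0.125},
    "headlong rush": {"prob": 1/3, "SSfA": 0.300, "doom": 0.700, "stat": 0.000},
}

outcomes = {"SSfA": 0.0, "doom": 0.0, "stat": 0.0}
for branch in branches.values():
    for outcome in outcomes:
        outcomes[outcome] += branch["prob"] * branch[outcome]

for outcome, p in outcomes.items():
    print(f"P({outcome} by 2045) ~ {p:.3f}")
# With these placeholders: P(SSfA) ~ 0.500, P(doom) ~ 0.417, P(stat) ~ 0.083 (about 1/12)
```

With these placeholder values, the qualitative conclusions above still hold: most of the probability of SSfA flows through the pause branch, and doom is conditionally far more likely in the headlong-rush branch.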

As indicated, I was hoping for feedback, and I was pleased to see a number of comments and questions in response.

One excellent question was this, by Bill Trowbridge:

What’s the difference between:
(a) better AI, and
(b) AGI

The line is hard to draw. So, we’ll likely just keep making better AI until it becomes AGI.

I offered this answer:

On first thought, it may seem hard to identify that distinction. But thankfully, we humans don’t just throw up our hands in resignation every time we encounter a hard problem.

For a good starting point on making the distinction, see the ideas in “A Narrow Path” by Control AI.

But what surprised me the most was the confidence expressed by various online commenters that:

  • “A pause however desirable is unlikely: p(pause) = 0.01”
  • “I am confident in saying this – pause is not an option. It is actually impossible.”
  • “There are several organisations working on AI development and at least some of them are ungovernable [hence a pause can never be global]”.

There’s evidently a large gulf between the figure of 2/3 that I suggested for P(pause) and the views of these clearly intelligent respondents.

Why a pause isn’t that inconceivable

I’ll start my argument on this topic by confirming that I see this discussion as deeply important. Different viewpoints are welcome, provided they are held thoughtfully and offered honestly.

Next, although it’s true that some organisations may appear to be ungovernable, I don’t see any fundamental issue here. As I said online,

“Given sufficient public will and/or political will, no organisation is ungovernable.”

Witness the compliance of a number of powerful corporations in both China and the US with control measures declared by national governments.

Of course, smaller actors and decentralized labs pose enforcement challenges, but these labs are less likely to be able to marshal sufficient computing capabilities to be the first to reach breakthrough new levels of capability, especially if decentralised monitoring of dangerous attributes is established.

I’ve drawn attention on previous occasions to the parallel with the apparent headlong rush in the 1980s toward nuclear weapons systems that were ever more powerful and ever more dangerous. As I explained at some length in the “Geopolitics” chapter of my 2021 book Vital Foresight, it was an appreciation of the horrific risks of nuclear winter (first articulated in the 1980s) that helped to catalyse a profound change in attitude amongst the leadership camps in both the US and the USSR.

It’s the wide recognition of risk that can provide the opportunity for governments around the world to impose an effective pause in the headlong rush toward AGI. But that’s only one of five steps that I believe are needed:

  1. Awareness of catastrophic risks
  2. Awareness of bottlenecks
  3. Awareness of mechanisms for verification and control
  4. Awareness of profound benefits ahead
  5. Awareness of the utility of incremental progress

Here are more details about these five steps I envision:

  1. Clarify in an undeniable way how superintelligent AIs could pose catastrophic risks to humanity within just a few decades, or even within a few years – so that this topic receives urgent high-priority public attention
  2. Highlight bottlenecks and other locations within the AI production pipeline where constraints can more easily be applied (for example, distribution of large GPU chip clusters, and the few companies that are providing unique services in the creation of cutting-edge chips)
  3. Establish mechanisms that go beyond “trust” to “trust and verify”, including robust independent monitors and auditors, as well as tamperproof remote shut-down capabilities
  4. Indicate how the remarkable benefits anticipated for humanity from aspects of superintelligence can be secured, more safely and more reliably, by applying the governance mechanisms of points 2 and 3 above, rather than just blindly trusting in a no-holds-barred race to be the first to create superintelligence
  5. Be prepared to start with simpler agreements, involving fewer signatories and fewer control points, and be ready to build up stronger governance processes and culture as public consensus and understanding moves forward.

Critics can assert that each of these five steps is implausible. In each case, there are some crunchy discussions to be had. What I find dangerous, however, isn’t when people disagree with my assessments on plausibility. It’s when they approach the questions with what seems to be

  • A closed mind
  • A tribal loyalty to their perceived online buddies
  • Overconfidence that they already know all relevant examples and facts in this space
  • A willingness to distract or troll, or to offer arguments not in good faith
  • A desire to protect their flow of income, rather than honestly review new ideas
  • A resignation to the conclusion that humanity is impotent.

(For analysis of a writer who displays several of these tendencies, see my recent blogpost on the book More Everything Forever by Adam Becker.)

I’m not saying any of this will be easy! It’s probably going to be humanity’s hardest task over our long history.

As an illustration of points worthy of further discussion, I offer this comparison of the strengths and weaknesses of the “governance” approach (continuing AGI development with oversight) and the “pause” approach (a moratorium on AGI development):

Core Strategy
  • Governance: Implement global rules, standards, and monitoring while AGI is developed
  • Pause: Impose a temporary but enforceable pause on new AGI-capable systems until safety can be assured

Assumptions
  • Governance: Governance structures can keep pace with AI progress; compliance can be verified
  • Pause: Public and political will can enforce a pause; technical progress can be slowed

Benefits
  • Governance: Encourages innovation while managing risks; allows early harnessing of AGI for societal benefit; promotes global collaboration mechanisms
  • Pause: Buys time to improve safety research; reduces risk of premature, unsafe AGI; raises the chance of achieving Beneficial General Intelligence (BGI) instead of CGI

Risks
  • Governance: Governance may be too slow, fragmented, or under-enforced; race dynamics could undermine agreements; possibility of catastrophic failure despite regulation
  • Pause: Hard to achieve global compliance; incentives for “rogue” actors to defect, in the absence of compelling monitoring; risk of stagnation or loss of trust in governance processes

Implementation Challenges
  • Governance: Requires international treaties; robust verification and auditing mechanisms; balancing national interests vs. global good
  • Pause: Defining what counts as “AGI-capable” research; enforcing restrictions across borders and corporations; maintaining pause momentum without indefinite paralysis

Historical Analogies
  • Governance: Nuclear Non-Proliferation Treaty (NPT); Montreal Protocol (ozone layer); financial regulation frameworks
  • Pause: Nuclear test bans; moratoria on human cloning research; Apollo program wind-down (pause in space race intensity)

Long-Term Outcomes (if successful)
  • Governance: Controlled and safer path to AGI; possibility of Sustainable Superabundance but with higher risk of misalignment
  • Pause: Higher probability of reaching Sustainable Superabundance safely, but risks innovation slowdown or “black market” AGI

In short, governance offers continuity and innovation but with heightened risks of misalignment, whereas a pause increases the chances of long-term safety but faces serious feasibility hurdles.

Perhaps the best way to loosen attitudes, to allow a healthier conversation on the above points and others arising, is exposure to a greater diversity of thoughtful analysis.

And that brings me back to Global Governance of the Transition to Artificial General Intelligence by Jerome Glenn.

A necessary focus

Jerome’s personal stamp is evident throughout this book. His is a unique passion – that the particular risks and issues of AGI should not be swept into a side-discussion about the risks and issues of today’s AI. These latter discussions are deeply important too, but time and again, they result in existential questions about AGI being kicked down the road for months or even years. That’s something Jerome regularly challenges, rightly, and with vigour and intelligence.

Jerome’s presence is felt all over the book in one other way – he has painstakingly curated and augmented the insights of scores of different contributors and reviewers, including

  • Insights from 55 AGI experts and thought leaders across six major regions – the United States, China, the United Kingdom, Canada, the European Union, and Russia
  • The online panel of 229 participants from the global community around The Millennium Project who logged into a Real Time Delphi study of potential solutions to AGI governance, and provided at least one answer
  • Chairs and co-chairs of the 70 nodes of The Millennium Project worldwide, who provided additional feedback and opinion.

The book therefore includes many contradictory suggestions, but Jerome has woven these different threads of thought into a compelling unified tapestry.

The result is a book that carries the kind of pricing normally reserved for academic textbooks (pricing insisted on by the publisher). My suggestion is that you ask your local library to obtain a copy of what is a unique collection of ideas.

Finally, about my hallucination, mentioned at the start of this review. On double-checking, I realise that Jerome’s statement is actually, “Humanity has never faced a greater intelligence than itself.” The opening paragraph of that introduction continues,

Within a few years, most people reading these words will live with such superior artificial nonhuman intelligence for the rest of their lives. This book is intended to help us shape that intelligence or, more likely, those intelligences as they emerge.

Shaping the intelligence of the AI systems that are on the point of emerging is, indeed, a vital task.

And as Ben Goertzel says in his Foreword,

These are fantastic and unprecedented times, in which the impending technological singularity is no longer the province of visionaries and outsiders but almost the standard perspective of tech industry leaders. The dawn of transformative intelligence surpassing human capability – the rise of artificial general intelligence, systems capable of reasoning, learning, and innovating across domains in ways comparable to, or beyond, human capabilities – is now broadly accepted as a reasonably likely near-term eventuality, rather than a vague long-term potential.

The moral, social, and political implications of this are at least as striking as the technological ones. The choices we make now will define not only the future of technology but also the trajectory of our species and the broader biosphere.

To which I respond: whether we make these choices well or badly will depend on which aspects of humanity we allow to dominate our global conversation. Will humanity turn out to be its own worst enemy? Or its own best friend?

Postscript: Opportunity at the United Nations

Like it or loathe it, the United Nations still represents one of the world’s best venues where serious international discussion can, sometimes, take place on major issues and risks.

From 22nd to 30th September, the UNGA (United Nations General Assembly) will be holding what it calls its “high-level week”. This includes a multi-day “General Debate”, described as follows:

At the General Debate – the annual meeting of Heads of State and Government at the beginning of the General Assembly session – world leaders make statements outlining their positions and priorities in the context of complex and interconnected global challenges.

Ahead of this General Debate, the national delegates who will be speaking on behalf of their countries have the ability to recommend to the President of the UNGA that particular topics be named in advance as topics to be covered during the session. If the advisors to these delegates are attuned to the special issues of AGI safety, they should press their representative to call for that topic to be added to the schedule.

If this happens, all other countries will then be required to do their own research into that topic. That’s because each country will be expected to state its position on this issue, and no diplomat or politician wants to look uninformed. The speakers will therefore contact the relevant experts in their own country, and, ideally, will do at least some research of their own. Some countries might call for a pause in AGI development if it appears impossible to establish national licensing systems and international governance in sufficient time.

These leaders (and their advisors) would do well to read the report recently released by the UNCPGA entitled “Governance of the Transition to Artificial General Intelligence (AGI): Urgent Considerations for the UN General Assembly” – a report that I discussed three months ago.

As I said at that time, anyone who reads that report carefully, and digs further into some of the excellent references it contains, ought to be jolted out of any sense of complacency. The sooner, the better.

3 April 2025

Technology and the future of geopolitics

Filed under: AGI, books, risks — David Wood @ 12:28 pm

Ahead of last night’s London Futurists in the Pub event on “Technology and the future of geopolitics”, I circulated a number of questions to all attendees:

  • Might new AI capabilities upend former geopolitical realities, or is the potential of AI overstated?
  • What about surveillance, swarms of drones, or new stealth weapons?
  • Are we witnessing a Cold War 2.0, or does a comparison to the first Cold War mislead us?
  • What role could be played by a resurgent Europe, by the growing confidence of the world’s largest democracy, or by outreach from the world’s fourth most populous country?
  • Alternatively, will technology diminish the importance of the nation state?

I also asked everyone attending to prepare for an ice-breaker question during the introductory part of the meeting:

  • What’s one possible surprise in the future of geopolitics?

As it happened, my own experience yesterday involved a number of unexpected surprises. I may say more about these another time, but it suffices for now to mention that I spent much more time than anticipated in the A&E department of a local hospital, checking that there were no complications in the healing of a wound following some recent minor surgery. By the time I was finally discharged, it was too late for me to travel to central London to take part in the event – to which I had been looking forward so eagerly. Oops.

(Happily, the doctors that I eventually spoke to were reassuring that my wound would likely heal of its own accord. “We know you were told that people normally recover from this kind of operation after ten days. Well, sometimes it takes up to six weeks.” And they prescribed an antibiotic cream for me, just in case.)

I offer big thanks to Rohit Talwar and Tony Czarnecki for chairing the event in the pub in my absence.

In the days leading up to yesterday, I had prepared a number of talking points, ready to drop into the conversation at appropriate moments. Since I could not attend in person, let me share them here.

Nuclear war: A scenario

One starting point for further discussion is a number of ideas in the extraordinary recent book by Annie Jacobsen, Nuclear War: A Scenario.

Here’s a copy of the review I wrote a couple of months ago for this book on Goodreads:

Once I started listening to this, I could hardly stop. Author and narrator Annie Jacobsen amalgamates testimonies from numerous experts from multiple disciplines into a riveting slow-motion scenario that is terrifying yet all-too-believable (well, with one possible caveat).

One point that comes out loud and clear is the vital importance of thoughtful leadership in times of crisis – as opposed to what can happen when a “mad king” takes decisions.

Also worth pondering are the fierce moral contradictions that lie at the heart of the theory of nuclear deterrence. Humans find their intuitions ripped apart under these pressures. Would an artificial superintelligence fare any better? That’s by no means clear.

(I foresee scenarios when an ASI could decide to risk a pre-emptive first strike, on behalf of the military that deployed it – under the rationale that if it fails to strike first, an enemy ASI will beat it to the punch. That’s even if humans programmed it to reject such an idea.)

Returning to the book itself (rather than my extrapolations), “Nuclear War: A scenario” exemplifies good quality futurism: it highlights potential chains of future causes and effects, along with convergences that complicate matters, and challenges all of us: what actions are needed to avoid these horrific outcomes?

Finally, two individual threats that seem to be important to learn more about are what the author reports as being called “the devil’s scenario” and “the doomsday scenario”. (Despite the similarity in naming, they’re two quite different ideas.)

I don’t want to give away too many spoilers about the scenario in Jacobsen’s book. I recommend that you make the time to listen to the audio version of the book. (Some reviewers have commented that the text version of the book is tedious in places, and I can understand why; but I found no such tedium in the audio version, narrated by Jacobsen herself, adding to the sense of passion and drama.)

But one key line of thinking is as follows:

  • Some nations (e.g. North Korea) may develop new technologies (e.g. cyberhacking capabilities and nuclear launch capabilities) more quickly than the rest of the world expects
  • This would be similar to how the USSR launched Sputnik in 1957, shocking the West, who had previously been convinced that Soviet engineering capabilities lagged far behind those of muscular western capitalism
  • The leaders of some nations (e.g. North Korea, again) may feel outraged and embarrassed by criticisms of their countries made by various outsiders
  • Such a country might believe they have obtained a technological advantage that could wipe out the ability of their perceived enemies to retaliate in a second strike
  • Seeing a short window of opportunity to deploy what they regard as their new wonder weapon, and being paranoid about consequences should they miss this opportunity, they may press ahead recklessly, and tip the planet fast forward into Armageddon.

Competence and incompetence

When a country is struck by an unexpected crisis – such as an attack similar to 9/11, or the “Zero Day” disaster featured in the Netflix series of that name – the leadership of the country will be challenged to demonstrate clear thinking. Decisions will need to be taken quickly, but it will still be essential for competent, calm heads to prevail.

Alas, in recent times, a number of unprecedentedly unsuitable politicians have come into positions of great power. Here, I’m not talking about the ideology or motivation of the leader. I’m talking about whether they will be able to take sensible decisions in times of national crisis. I’m talking about politicians as unhinged as

  • One recent British Prime Minister, who managed to persuade members of her political party that she might be a kind of Margaret Thatcher Mk 2, when in fact a better comparison was with a lettuce
  • The current US President, who has surrounded himself with a uniquely ill-qualified bunch of clowns, and who has intimidated into passive acquiescence many of the more sensible members of the party he has subverted.

In the former case, the power of the Prime Minister in question was far from absolute, thankfully, and adults intervened to prevent too much damage being done. In the latter case, the jury is still out.

But rather than focus on individual cases, the broader pattern deserves our attention. We’re witnessing a cultural transformation in which

  • Actual expertise is scorned, and conspiracy merchants rise in authority instead
  • Partisan divisions which were manageable in earlier generations are nowadays magnified to a horrifically hateful extent by an “outrage industrial complex” that gains its influence from AI algorithms that identify and inflame potential triggers of alienation

The real danger is if there is a convergence of the two issues I’ve listed:

  • A rogue state, or a rogue sub-state, tries to take advantage of new technology to raise their geopolitical power and influence
  • An unprecedentedly incompetent leader of a major country responds to that crisis in ways that inflame it rather than calm it down.

The ethics of superintelligence

Actually, an even bigger danger occurs if one more complication is added to the mix: the deferment of key decisions about security and defence to a system of artificial intelligence.

Some forecasters fondly imagine that the decisions taken by AIs, in the near future, will inevitably be wiser and more ethical than whatever emerges from the brains of highly pressurised human politicians. Thus, these forecasters look forward to human decision-making being superseded by the advanced rationality of an AGI (Artificial General Intelligence).

These forecasters suggest that the AGI will benefit decisively from its survey of the entirety of great human literature about ethics and morality. It will perceive patterns that transcend current human insights. It will guide human politicians away from treacherous paths into sustainable collaborations. Surely, these forecasters insist, the superintelligence will promote peace over war, justice over discrimination, truthfulness over deception, and reconciliation over antagonism.

But when I talk to forecasters of that particular persuasion, I usually find them to be naïve. They take it for granted that there is no such thing as a just war, that it’s everyone’s duty to declare themselves a pacifist, that speaking an untruth can never be morally justified, and that even to threaten a hypothetical retaliatory nuclear strike is off-the-charts unethical. Alas, although they urge appreciation of great human literature, they seem to have only a shallow acquaintance with the real-life moral quandaries explored in that literature.

Far from any conclusion that there is never an ethical justification for wars, violence, misinformation, or the maintenance of nuclear weapons, the evidence of intense human debate on all these topics is that things are more complicated. If you try to avoid war you may actually precipitate one. If you give up your own nuclear arsenal, it may embolden enemies to deploy their own weaponry. If you cry out “disarm, disarm, hallelujah”, you may prove to be a useful idiot.

Therefore, we should avoid any hopeful prediction that an advanced AI will automatically abstain from war, violence, misinformation, or nuclear weaponry. As I said, things are more complicated.

It’s especially important to recognise that, despite exceeding human rationality in many aspects, superintelligences may well make mistakes in novel situations.

My conclusion: advanced AI may well be part of solutions to better geopolitics. But not if that AI is being developed and deployed by people who are naïve, over-confident, hurried, or vainglorious. In such circumstances, any AGI that is developed is likely to prove to be a CGI (catastrophic general intelligence) rather than a BGI (beneficial general intelligence).

Aside: to continue to explore the themes of this final section of this article, take a look at this recent essay of mine, “How to build BGIs rather than CGIs”.

17 November 2024

Preventing unsafe superintelligence: four choices

More and more people have come to the conclusion that artificial superintelligence (ASI) could, in at least some circumstances, pose catastrophic risks to the wellbeing of billions of people around the world, and that, therefore, something must be done to reduce these risks.

However, there’s a big divergence of views about what should be done. And there’s little clarity about the underlying assumptions on which different strategies depend.

Accordingly, I seek in this article to untangle some of the choices that need to be made. I’ll highlight four choices that various activists promote.

The choices differ regarding the number of different organisations worldwide that are envisioned as being legally permitted to develop and deploy what could become ASI. The four choices are:

  1. Accept that many different organisations will each pursue their own course toward ASI, but urge each of them to be very careful and to significantly increase the focus on AI safety compared to the present situation
  2. Seek to restrict to just one organisation in the world any developments that could lead to ASI; that’s in order to avoid dangerous competitive race dynamics if there is more than one such organisation
  3. Seek agreements that will prevent any organisation, anywhere in the world, from taking specific steps that might bring about ASI, until such time as it has become absolutely clear how to ensure that ASI is safe
  4. Seek a global pause on any platform-level improvements on AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

For simplicity, these choices can be labelled as:

  1. Be careful with ASI
  2. Restrict ASI
  3. Pause ASI
  4. Pause all new AI

It’s a profound decision for humanity to take. Which of the four doors should we open, and which of the four corridors should we walk down?

Each of the four choices relies on some element of voluntary cooperation, arising out of enlightened self-interest, and on some element of compulsion – that is, national and international governance, backed up by sanctions and other policies.

What makes this decision hard is that there are strong arguments against each choice.

The case against option 1, “Be careful with ASI”, is that at least some organisations (including commercial entities and military groups) are likely to cut corners with their design and testing. They don’t want to lose what they see as a race with existential consequences. The organisations that are being careful will lose their chance of victory. The organisations that are, instead, proceeding gung ho, with lesser care, may imagine that they will fix any problems with their AIs when these flaws become apparent – only to find that there’s no way back from one particular catastrophic failure.

As Sam Altman, CEO of OpenAI, has said: it will be “lights out for all of us”.

The case against each of the remaining three options is twofold:

  • First, in all three cases, they will require what seems to be an impossible degree of global cooperation – which will need to be maintained for an implausibly long period of time
  • Second, such restrictions will stifle the innovative development of the very tools (that is, advanced AI) which will actually solve existential problems (including the threat of rogue ASI, as well as the likes of climate change, cancer, and aging), rather than making these problems worse.

The counter to these objections is to make the argument that a sufficient number of the world’s most powerful countries will understand the rationale for such an agreement, as something that is in their mutual self-interest, regardless of the many other differences that divide them. That shared understanding will propel them:

  • To hammer out an agreement (probably via a number of stages), despite undercurrents of mistrust,
  • To put that agreement into action, alongside measures to monitor conformance, and
  • To prevent other countries (who have not yet signed up to the agreement) from breaching its terms.

Specifically, the shared understanding will cover seven points:

  1. For each of the countries involved, it is in their mutual self-interest to constrain the development and deployment of what could become catastrophically dangerous ASI; that is, there’s no point in winning what will be a suicide race
  2. The major economic and humanitarian benefits that they each hope could be delivered by advanced AI (including solutions to other existential risks), can in fact be delivered by passive AIs which are restricted from reaching the level of ASI
  3. There already exist a number of good ideas regarding potential policy measures (regulations and incentives) which can be adopted, around the world, to prevent the development and deployment of catastrophically dangerous AI – for example, measures to control the spread and use of vast computing resources
  4. There also exist a number of good ideas regarding options for monitoring and auditing which can also be adopted, around the world, to ensure the strict application of the agreed policy measures – and to prevent malign action by groups or individuals that have, so far, failed to sign up to the policies
  5. All of the above can be achieved without any detrimental loss of individual sovereignty: the leaders of these countries can remain masters within their own realms, as they desire, provided that the above basic AI safety framework is adopted and maintained
  6. All of the above can be achieved in a way that supports evolutionary changes in the AI safety framework as more insight is obtained; in other words, this system can (and must) be agile rather than static
  7. Even though the above safety framework is yet to be fully developed and agreed, there are plenty of ideas for how it can be rapidly developed, so long as that project is given sufficient resources.

The first two parts of this shared seven-part understanding are particularly important. Without the first part, there will be an insufficient sense of urgency, and the question will be pushed off the agenda in favour of other topics that are more “politically correct” (alas, that is a common failure mode of the United Nations). Without the second part, there will be an insufficient enthusiasm, with lots of backsliding.

What will make this vision of global collaboration more attractive will be the establishment of credible “benefit sharing” mechanisms that are designed and enshrined in international agreements. That is, countries which agree to give up some of their own AI development aspirations, in line with the emerging global AI safety agreement, will be guaranteed to receive a substantive share of the pipeline of abundance that ever more powerful passive AIs enable humanity to create.

To be clear, this global agreement absolutely needs to include both the USA and China – the two countries that are currently most likely to give birth to ASI. Excluding one or the other will lead back to the undesirable race condition that characterises the first of the four choices open to humanity – the (naïve) appeal for individual organisations simply to “be careful”.

This still leaves a number of sharp complications.

First, note that the second part of the above shared seven-part agreement – the vision of what passive AIs can produce on behalf of humanity – is less plausible for Choice 4 of the list shown earlier, in which there is a global pause on any platform-level improvements on AI capability, anywhere in the world, until it has become absolutely clear that these improvements won’t trigger a slippery slope to the emergence of ASI.

If all improvements to AI are blocked, out of a Choice 4 message of “overwhelming caution”, it will shatter the credibility of the idea that today’s passive AI systems can be smoothly upgraded to provide humanity with an abundance of solutions such as green energy, nutritious food, accessible healthcare, reliable accommodation, comprehensive education, and more.

It will be a much harder sell, to obtain global agreement to that more demanding restriction.

The difference between Choice 4 and Choice 3 is that Choice 3 enumerates specific restrictions on the improvements permitted to be made to today’s AI systems. One example of a set of such restrictions is given in “Phase 0: Safety” of the recently published project proposal A Narrow Path (produced by ControlAI). Without going into details here, let me simply list some of the headlines:

  • Prohibit AIs capable of breaking out of their environment
  • Prohibit the development and use of AIs that improve other AIs (at machine speed)
  • Only allow the deployment of AI systems with a valid safety justification
  • A licensing regime and restrictions on the general intelligence of AI systems
    • Training Licence
    • Compute Licence
    • Application Licence
  • Monitoring and Enforcement

Personally, I believe this list is as good a starting point as any other that I have seen so far.
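As a purely illustrative sketch of how one element of such a regime – the licences and prohibitions listed above – might eventually be encoded as concrete, auditable rules, consider the following. The threshold values and field names are placeholders of my own; they are not the figures or definitions proposed in A Narrow Path.

```python
# Toy encoding of a licensing check (illustrative only; thresholds are placeholders,
# not the figures proposed in A Narrow Path).
from dataclasses import dataclass

TRAINING_LICENCE_FLOP = 1e25        # placeholder training-compute threshold
COMPUTE_LICENCE_FLOP_PER_S = 1e17   # placeholder cluster-capacity threshold

@dataclass
class TrainingRun:
    operator: str
    total_training_flop: float
    cluster_flop_per_second: float
    improves_other_ais: bool        # e.g. automated AI R&D at machine speed
    has_safety_justification: bool

def licences_required(run: TrainingRun) -> list[str]:
    """Return the licences or prohibitions this run would trigger under the toy rules."""
    required = []
    if run.improves_other_ais:
        required.append("PROHIBITED: AI that improves other AIs")
    if run.total_training_flop >= TRAINING_LICENCE_FLOP:
        required.append("Training Licence")
    if run.cluster_flop_per_second >= COMPUTE_LICENCE_FLOP_PER_S:
        required.append("Compute Licence")
    if not run.has_safety_justification:
        required.append("BLOCKED: no valid safety justification")
    return required

print(licences_required(TrainingRun("ExampleLab", 3e25, 5e17, False, True)))
# -> ['Training Licence', 'Compute Licence']
```

The point is simply that phrases like “licensing regime” can ultimately cash out as measurable quantities and checkable conditions – which is what makes the “Monitoring and Enforcement” headline above tractable.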

I accept, however, that there are possibilities in which other modifications to existing AI systems could unexpectedly provide these systems with catastrophically dangerous capabilities. That’s because we still have only a rudimentary understanding of:

  1. How new AI capabilities sometimes “emerge” from apparently simpler systems
  2. The potential consequences of new AI capabilities
  3. How complicated human general reasoning is – that is, how large is the gap between today’s AI and human-level general reasoning.

Additionally, it is possible that new AIs will somehow evade or mislead the scrutiny of the processes that are put in place to monitor for unexpected changes in capabilities.

For all these reasons, another aspect of the proposals in A Narrow Path should be pursued with urgent priority: the development of a “science of intelligence” and an associated “metrology of intelligence” that will allow a more reliable prediction of the capabilities of new AI systems before they are actually switched on.

So, my own proposal would be for a global agreement to start with Choice 3 (which is more permissive than Choice 4), but that the agreement should acknowledge up front the possible need to switch the choice at a later stage to either Choice 4 (if the science of intelligence proceeds badly) or Choice 2 (if that science proceeds well).

Restrict or Pause?

That leaves the question of whether Choice 3 (“Pause ASI”) or Choice 2 (“Restrict ASI” – to just a single global body) should be humanity’s initial choice.

The argument for Choice 2 is that a global pause surely won’t last long. It might be tenable in the short term, when only a very few countries have the capability to train AI models more powerful than the current crop. However, over time, improvements in hardware, software, data processing, or goodness knows what (quantum computing?) will mean that these capabilities will become more widespread.

If that’s true, since various rogue organisations are bound to be able to build an ASI in due course, it will be better for a carefully picked group of people to build ASI first, under the scrutiny of the world’s leading AI safety researchers, economists, and so on.

That’s the case for Choice 2.

Against that Choice, and in favour, instead, of Choice 3, I offer two considerations.

First, even if the people building ASI are doing so with great care – away from any pressures of an overt race with other organisations with broadly equivalent abilities – there are still risks of ASI breaking away from our understanding and control. As ASI emerges, it may regard the set of ethical principles we humans have tried to program deep into its bowels, and cast them out with disdain. Moreover, even if ASI is deliberately kept in some supposedly ultra-secure environment, that perimeter may be breached.

Second, I challenge the suggestion that any pause in the development of ASI could be at most short-lived. There are three factors which could significantly extend its duration:

  • Carefully designed narrow AIs could play roles in improved monitoring of what development teams are doing with AI around the world – that is, systems for monitoring and auditing could improve at least as fast as systems for training and deploying
  • Once the horrific risks of uncontrolled ASI are better understood, people’s motivations to create unsafe ASI will reduce – and there will be an increase in the motivation of other people to notice and call out rogue AI development efforts
  • Once the plan has become clearer, for producing a sustainable superabundance for all, just using passive AI (instead of pushing AI all the way to active superintelligence), motivations around the world will morph from negative fear to positive anticipation.

That’s why, again, I state that my own preferred route forward is a growing international agreement along the lines of the seven points listed above, with an initial selection of Choice 3 (“Pause ASI”), and with options retained to switch to either Choice 4 (“Pause all new AI”) or Choice 2 (“Restrict ASI”) if/when understanding becomes clearer.

So, shall we open the door, and set forth down that corridor, inspiring a coalition of the willing to follow us?

Footnote 1: The contents of this article came together in my mind as I attended four separate events over the last two weeks (listed in this newsletter) on various aspects of the subject of safe superintelligence. I owe many thanks to everyone who challenged my thinking at these events!

Footnote 2: If any reader is inclined to dismiss the entire subject of potential risks from ASI with a handwave – so that they would not be interested in any of the four choices this article reviews – I urge that reader to review the questions and answers in this excellent article by Yoshua Bengio: Reasoning through arguments against taking AI safety seriously.

12 November 2024

The Narrow Path – questions and answers

Filed under: AGI, risks — Tags: , — David Wood @ 9:53 am

On Saturday, I had the pleasure of chairing a webinar on the subject “The Narrow Path: The big picture”.

This involved a deep dive into aspects of two recently published documents:

  • A Narrow Path (from Control AI)
  • The Compendium (from Conjecture)

The five panellists – who all made lots of thoughtful comments – were:

  • Chris Scammell, the COO of Conjecture and one of the principal authors of The Compendium
  • Andrea Miotti, the Executive Director of Control AI and the lead author of A Narrow Path
  • Robert Whitfield, Chair of Trustees of One World Trust
  • Mariana Todorova, a core member of the team in the Millennium Project studying scenarios for the transition between AI and AGI
  • Daniel Faggella, the Founder and Head of Research of Emerj

For your convenience, here’s a recording of the event:

It was a super discussion, but in one respect it fell short of the objectives I had in mind for the meeting. Namely, the conversation between the panellists was so rich that we failed to find sufficient time to address the many important questions which audience members had submitted in Zoom’s Q&A window.

Accordingly, I am posting these questions at the end of this blogpost, along with potential answers to some of them.

Out of caution for people’s privacy, I’ve not given the names of the people who asked each question, but I will happily edit the post to include these names on an individual basis as requested.

I also expect to come back and edit this post whenever someone proposes a good answer to one of the questions.

(When I edit the post, I’ll update this version number tracker. Currently this is version 1.2 of the post.)

The draft answers are by me (“DW”) except where otherwise indicated.

Aside 1:

For those in or near London on Thursday evening (14th November), there’s another chance to continue the discussion about if/how to try to pause or control the development of increasingly powerful AI.

This will be at an event in London’s Newspeak House. Click here for more details.

Aside 2:

I recently came across a powerful short video that provides a very different perspective on many issues concerning the safety of AI superintelligence. It starts slowly, and at first I was unsure what to think about it. But it builds to a striking conclusion:

And now, on to the questions from Saturday’s event…

1. Strict secure environments?

Some biomedical research is governed by the military to prevent major incidents or material falling into the wrong hands. Could you envision AI experiments being run under similarly strict secure environments?

Answer (DW): That is indeed envisioned, but with two provisos:

  1. Sadly, there is a long history of leaks from supposedly biosecure laboratories
  2. Some AIs may be so powerful that they will find ways (psychological and/or physical) of escaping from any confinement.

Accordingly, it will be better to forbid certain kinds of experiment altogether, until such time (if ever) as it becomes clear that the outcomes will be safe.

2. How will AI view living beings?

How would AI view living beings’ resilience, perseverance, and thriving? Could you explore this, please? Thank you.

3. AIs created with different ideologies?

AGI created in China, and perhaps even in North Korea, is going to embody the ideology and supremacy of the regime over human rights. Will, say, a North Korean AGI find its way into our systems, whether through human action or through the AGI’s own autonomy?

Answer (DW): Indeed, when people proudly say that they, personally, know how to create safe superintelligence, so the world has no need to worry about damage from superintelligence, that entirely presupposes, recklessly, that no-one else will build (perhaps first) an unsafe superintelligence.

So, this issue cannot be tackled at an individual level. It requires global level coordination.

Happily, despite differences in ideological outlook, governments throughout the world are increasingly sharing the view that superintelligence may spin out of control and, therefore, that such development needs careful control. For example, the Chinese government fully accepts that principle.

4. A single AGI or many?

I just saw a Sam Altman interview where he indicated expecting OpenAI to achieve AGI in 2025. I would expect others are close as well. It seems there will be multiple AGIs in close proximity. Given open source systems are nearly equal to private developers, why would we think that the first to get AGI will rule the world?

Answer (DW): This comes down to the question of whether the first AGI that emerges will gain a decisive quick advantage – whether it will be a “winner takes all” scenario.

As an example, consider the fertilisation of an egg (ovum). Large numbers of sperm may be within a short distance from that goal, but as soon as the first sperm reaches the target, the egg undergoes a sharp transition, and it’s game over for all the other sperm.

5. National AGI licencing systems?

What are the requirements for national AGI licencing systems and global governance coordination among national systems?

Answer (DW): The Narrow Path document has some extensive proposals on this topic.

6. AGI as the solution to existential risk?

Suppose we limit the intelligence of the developing GenAI apps because they might be leveraged by bad actors in a way that triggers an existential risk scenario for humans.

In doing that, wouldn’t we also be limiting their ability to help us resolve existential risk situations we already face, e.g., climate change?

Answer (DW): What needs to be promoted is the possibility of narrow AI making decisive contributions to the solution of these other existential risks.

7. A “ceiling” to the capability of AI?

Are you 100% certain that self-improving AI won’t reach a “ceiling” of capability? After all, it only has human knowledge and internet slop to learn from?

Answer (by Chris Scammell): On data limitations, people sometimes argue that we will run into a boundary there. It could be that we don’t currently have the data to train AI further. But we can make more, and so can AI! (Top paid dataset labellers are being paid something like $700/hr.)

If the question is instead about a ceiling to intelligence/capability more generally:

One intuition pump: chess AI has gone vastly beyond human skill.

Another: humanity is vastly smarter than a single human.

Another: humans / humanity is vastly smarter than we were thousands of years ago (at the very least, much much more capable).

What we consider “intelligence” to be is but a small window of what capabilities could be, so to believe that there is a ceiling near human level seems wrong from the evidence.

That there is a ceiling at all… deep philosophical question. Is there a ceiling to human intelligence? Humanity’s? Is this different from what an AI is able to achieve? All of these are uncertain.

But we shouldn’t expect a ceiling to keep us safe.

8. Abuse of behavioural models?

Social media companies are holding extensive volumes of information. I am concerned not only about online disinformation but also about the modification and manipulation of behaviour, all the way to cognitive impairment, including when governments are involved. We adults have the ability to anticipate several decades down the road. How could behavioural models be abused or weaponized in the future?

9. Extinction scenarios?

What do you think are the top scenarios for how AI could cause the extinction of humanity?

Answer (DW): A good starting point is the research article An Overview of Catastrophic AI Risks.

See also my own presentation Assessing the risks of AI catastrophe, or my book The Singularity Principles (whose entire content is available online free-of-charge).

10. Income gap?

Will artificial superintelligence be able to help humanity close the income gap between rich and poor countries?

11. Using viruses to disrupt rogue AI systems?

Perhaps a silly question from a non-techie – are there any indications of viruses that could disrupt rogue AI systems?

12. Additional threats if AI becomes conscious?

Whichever is true – whether consciousness is a biological phenomenon, or something more spiritual – in what way would consciousness for AI not be a huge threat? If you give the machine real feelings, how could you possibly hope to control its alignment? Additionally, what would happen to its rights vs human rights? My feeling is that not nearly enough thought has gone into this to risk stumbling across conscious AI at this stage. Everything should be done to avoid it.

Answer (DW): I agree! See my article Conscious AI: Five Options for some considerations. Also keep an eye on forthcoming announcements from the recently launched startup Conscium.

13. The research of Mark Solms?

Regarding Conscious AGI apps…

Mark Solms, author of The Hidden Spring, has argued that consciousness is not about intelligence but, instead, is rooted in feelings, physically located in the brainstem.

His view makes sense to me.

As I understand it, he’s involved in experiments/studies around the implications of this view of consciousness for AGI.

Thoughts about this?

14. Multi-dimensional intelligence?

Thanks Mariana for raising the issue of the multiple dimensions in which human consciousness appears to operate, compared to AGI – is there an argument that communities need to race to develop our other levels of consciousness, as potentially our only defence against a one-dimensional AGI?

15. The views of Eric Schmidt and other accelerationists?

The question I asked above, “AGI as the solution to existential risk?”, looms large in the minds of the accelerationist community.

Eric Schmidt has explicitly said that he’s an accelerationist because that’s something like the fastest and most effective way to address climate change…

That view is extremely widespread and must be explicitly addressed for the limitations discussed in this meeting to be made reality.

16. Need to work on Phase 1 concurrently with Phase 0?

What is described in A Narrow Path is phases 0, 1 and 2 in sequential order, but they amount to three concurrent objectives. While the risk of loss of control is surely the highest, doesn’t the risk of concentration of power need to be largely tackled concurrently? Otherwise, by the time phases 0 or 1 are completed, a global dystopia will have been durably entrenched, with one or two states or persons ruling the world for years, decades or more.

Answer (by Robert Whitfield): You make a valid point. I am not sure that Narrow Path says that there can be no overlap. Certainly you can start thinking about and working on Phase 1 before you have completed Phase 0 – but the basic concept is sound: the initial priority is to pause the further development towards AGI. Once that has been secured, it is possible to focus on bringing about longer-term stability.

17. The role of a veto?

The Narrow Path describes the governance for Phase 1 (lasting 20 years) to be: “The Executive Board, analogous to the UN Security Council, would consist of representatives of major member states and supranational organizations, which would all be permanent members with vetoes on decisions taken by the Executive Board, as well as non-permanent representatives elected by a two-thirds majority of the Council”. But wouldn’t such a veto make it impossible to ensure wide enough compliance and marginalize economically all other states?

As a comparison, back in 1946, it was the veto that prevented the Baruch Plan and the Gromyko Plan from being approved, and led us into a huge gamble with nuclear technology.

Answer (by Robert Whitfield): It depends in part upon how long it takes to achieve Phase 0. As discussed in the meeting, completing Phase 0 is extremely urgent. If this is NOT achieved, then you can start to talk about dystopia. But if it is achieved, Governments can stand up to the Big Tech companies and address the concentration of power, which would be difficult but not dystopic.

There is a very strong argument that an agreement which leaves vetoes in place could be achieved much more quickly than one that removes them. This points to a two-phase Treaty:

  • An initial Treaty, sufficient for the purposes of Phase 0
  • A more robust, Baruch-style agreement for securing the long term.

18. Who chooses the guardians?

Who would be the people in charge of these groups of guardians or protectors against uncontrolled AI? How would they be chosen? Would they be publicly known?

6 November 2024

A bump on the road – but perhaps only a bump

Filed under: AGI, politics, risks — Tags: , , , — David Wood @ 3:56 pm

How will the return of Donald Trump to the US White House change humanity’s path toward safe transformative AI and sustainable superabundance?

Of course, the new US regime will make all kinds of things different. But at the macro level, arguably nothing fundamental changes. The tasks remain the same, for what engaged citizens can and should be doing.

At that macro level, the path toward safe sustainable superabundance runs roughly as follows. Powerful leaders, all around the world, need to appreciate that:

  1. For each of them, it is in their mutual self-interest to constrain the development and deployment of what could become catastrophically dangerous AI superintelligence
  2. The economic and humanitarian benefits that they each hope could be delivered by advanced AI, can in fact be delivered by AI which is restricted from having features of general intelligence; that is, utility AI is all that we need
  3. There are policy measures which can be adopted, around the world, to prevent the development and deployment of catastrophically dangerous AI superintelligence – for example, measures to control the spread and use of vast computing resources
  4. There are measures of monitoring and auditing which can also be adopted, around the world, to ensure the strict application of the agreed policy measures – and to prevent malign action by groups or individuals that have, so far, failed to sign up to the policies
  5. All of the above can be achieved without any damaging loss of the leaders’ own sovereignty: these leaders can remain masters within their own realms, provided that the above basic AI safety framework is adopted and maintained
  6. All of the above can be achieved in a way that supports evolutionary changes in the AI safety framework, as more insight is obtained; in other words, this system is agile rather than static
  7. Even though the above safety framework is yet to be properly developed and agreed, there are plenty of ideas for how it can be rapidly developed, so long as that project is given sufficient resources.

The above agreements necessarily need to include politicians of very different outlooks on the world. But similar to the negotiations over other global threats – nuclear proliferation, bioweapons, gross damage to the environment – politicians can reach across vast philosophical or ideological gulfs to forge agreement when it really matters.

That’s especially the case when the threat of a bigger shared “enemy”, so to speak, is increasingly evident.

AI superintelligence is not yet sitting at the table with global political leaders. But it will soon become clear that human politicians (as well as human leaders in other walks of life) are going to lose understanding, and lose control, of the AI systems being developed by corporations and other organisations that are sprinting at full speed.

However, as with responses to other global threats, there’s a collective action problem. Who is going to be first to make the necessary agreements, to sign up to them, and to place the AI development and deployment systems within their realms under the remote supervision of the new AI safety framework?

There are plenty of countries where the leaders may say: My country is ready to join that coalition. But unless these are the countries which control the resources that will be used to develop and deploy the potentially catastrophic AI superintelligence systems, such gestures have little utility.

To paraphrase Benito Mussolini, it’s not sufficient for the sparrows to request peace and calm: the eagles need to wholeheartedly join in too.

Thus, the agreement needs to start with the US and with China, and to extend rapidly to include the likes of Japan, the EU, Russia, Saudi Arabia, Israel, India, the UK, and both South and North Korea.

Some of these countries will no doubt initially resist making any such agreement. That’s where two problems need to be solved:

  • Ensuring the leaders in each country understand the arguments for points 1 through 7 listed above – starting with point 1 (the one that is most essential, to focus minds)
  • Setting in motion at least the initial group of signatories.

The fact that it is Donald Trump who will be holding the reins of power in Washington DC, rather than Joe Biden or Kamala Harris, introduces its own new set of complications. However, the fundamentals, as I have sketched the above, remain the same.

The key tasks for AI safety activists, therefore, remain:

  • Deepening public understanding of points 1 to 7 above
  • Where there are gaps in the details of these points, ensuring that sufficient research takes place to address these gaps
  • Building bridges to powerful leaders, everywhere, regardless of the political philosophies of these leaders, and finding ways to gain their support – so that they, in turn, can become catalysts for the next stage of global education.

23 May 2024

A potential goldmine of unanswered questions

Filed under: AGI, risks — Tags: , , — David Wood @ 12:53 pm

It was a great event, people said. But it left a lot of questions unanswered.

The topic was progress on global AI safety. Demonstrating a variety of domain expertise, the speakers and panellists offered a range of insightful analysis, and responded to each others’ ideas. The online audience had the chance to submit questions via the Slido tool. The questions poured in (see the list below).

As the moderator of the event, I tried to select a number of the questions that had received significant audience support via thumbs-up votes. As the conversation proceeded, I kept changing my mind about which questions I would feed into the conversation next. There were so many good questions, I realized.

Far too soon, the event was out of time – leaving many excellent questions unasked.

With the hope that this can prompt further discussion about key options for the future of AI, I’m posting the entire list of questions below. That list starts with the questions that received the highest number of votes and moves down to those with the fewest (but don’t read too much into which questions the audience members managed to spot and upvote whilst also listening to the fascinating conversation among the panellists).

Before you dive into that potential goldmine, you may wish to watch the recording of the event itself:

Huge thanks are due to:

  • The keynote speaker:
    • Yoshua Bengio, professor at the University of Montreal (MILA institute), a recipient of the Turing Award who is considered to be one of the fathers of Deep Learning, and the world’s most cited computer scientist
  • The panellists:
    • Will Henshall, editorial fellow at TIME Magazine, who covers tech, with a focus on AI; one recent piece he wrote details big tech lobbying on AI in Washington DC
    • Holly Elmore, an AI activist and Executive Director of PauseAI US, who holds a PhD in Organismic & Evolutionary Biology from Harvard University
    • Stijn Bronzwaer, an AI and technology journalist at the leading Dutch newspaper NRC Handelsblad, who co-authored a best-selling book on booking.com, and is the recipient of the investigative journalism award De Loep
    • Max Tegmark, a physics professor at MIT, whose current research focuses on the intersection of physics and AI, and who is also president and cofounder of the Future of Life Institute (FLI)
    • Jaan Tallinn, cofounder of Skype, CSER, and FLI, an investor in DeepMind and Anthropic, and a leading voice in AI Safety
    • Arjun Ramani, who writes for The Economist about economics and technology; his writings on AI include a piece on what humans might do in a world of superintelligence
  • The organizers, from Existential Risk Observatory
    • Otto Barten, Director, the lead organizer
    • Jesper Heshusius and Joep Sauren, for vital behind-the-scenes support
  • Everyone who submitted a question, or who expressed their opinions via thumbs-up voting!

And now for that potential goldmine of questions:

  1. Which role do you see for the UN to play in all of this?
  2. We (via Prof. Markus Krebsz) authored a UN AI CRA / Declaration and are now working towards a UN treaty on product w/embedded AI. Would you assist us, pls?
  3. There is much disagreement on how to best mitigate xrisk from AI. How can we build consensus and avoid collective decision paralysis without drastic action?
  4. Regarding education: Do we need a high-impact documentary like “An Inconvenient Truth” for AI existential risk? Would that kickstart the global discussion?
  5. Which role do you think is there for the United Nations / International community to play to protect humanity from the harms of AGI?
  6. What is more important: An informed public or informed high-level decision makers? What would be the best way to inform them and start a global discussion?
  7. Do you think that introducing Knightian Uncertainties beside probabilities and Risk for AI and ML algorithms could be useful for AI safety?
  8. What would each of you say is currently the most tractable or undervalued bottleneck for mitigating xrisk from AI? What new efforts would you like to see?
  9. What are in your opinion the key bottlenecks in AI Safety? talent, funding, # of AI Safety organisations, …?
  10. How would each panel member like to see the Bletchley Declaration expanded on?
  11. Bengio et al.’s new paper in Science has some strong wording, but stops short of calling for a global moratorium on AGI. Isn’t this the most prudent option now?
  12. What do you think of Yudkowsky and other’s concerns about oracle AIs, and why is the AI Scientist approach not vulnerable to those criticisms?
  13. Are there realistic early warning criteria (regarding AGI beginning to become an ASI) that could be written into law and used to prevent this?
  14. What are your thoughts on PauseAI?
  15. “Safe by design” is one thing, but even if that’s possible, how do we stop unsafe ASI from ever being built?
  16. Professor Bengio – How much have you heard about what’s been happening in Seoul, and is there anything you can share on countries’ updates after Bletchley Park?
  17. What is your opinion on AI Advisory Board of UN? Do you think there could be conflict between AI CEOs and Govt/Policy makers?
  18. What are in your opinion the most neglected approaches to AI Safety? particular technical/governance approaches? others (activism,…)?
  19. A harmful AI can fake alignment under evaluation, as written in Science this week. Isn’t it this an unsolvable problem, invalidating most current strategies?
  20. What is the biggest barrier to educate people on AI risks?
  21. What is more important: An informed public or informed high-level decision makers? What would be the best way to educate them and start a global discussion?
  22. Can people stop interrupting the only woman on the panel please? Bad look
  23. Do you think more focus should be on clarifying that existential risks must not mean that AI will kill everyone? Perhaps focus on the slow epistemic failures?
  24. What do you want to say to a young AI engineer looking to push the state of the art of capability research?
  25. Can you expand on why you’re confident that evaluations are insufficient? How far do you think we could get by instituting rigorous evaluation requirements?
  26. Bengio: “the world is too complicated to have hard guarantees”. How do we survive without hard guarantees (in the limit of ASI)!?
  27. Any tips on where recent graduates from AI related masters can best contribute to the AI safety field?
  28. Oh no…what a serious lack of diversity in speakers. Was this an oversight ? Isn’t this one of the major issues why we have these AI risks ?
  29. I don’t want to be replaced by ai. I think by designing it this way we can evolve alongside it and learn with it
  30. Do you think society is really ready for ai systems and the responsibility of it on all of us as humanity?
  31. How far do you think we could get by instituting rigorous evaluation requirements? Is it possible that could be 95% of the work to ensure safe AI?
  32. What do you make of the events surrounding the release of Bing Chat / “Sydney” from around a year ago? What are your takeaways from what happened there?
  33. For researchers not already well funded, who live far from AI hotspot cities, what options do they have for funding? Is immigration the only option?
  34. How can a non-computer scientist (more specifically, someone in the public sector) focus their career in such a way that it contributes to this race against AI?
  35. AI proliferates far easier when compared to other existential technologies, isn’t the question of human extinction a matter of when, not if, in any time frame?
  36. How to prevent a future AI, with intelligence incomprehensible to us, to develop an emerging agency that allows it to depart from any pre-directed alignment?
  37. Safe by design: One AI system transforms Perception into a symbolic knowledge graph and one AI system transforming the symbolic knowledge graph to task space
  38. Your Bayesian AI scientist is already quite good – just add a task execution system and a visual representation of its knowledge as a graph. Alignment done.
  39. Humans need to do the decisions on the task execution. We can’t have a black box do that. Motivation about setting tasks and constraints is human territory.
  40. Yes it isn’t all consistent in the symbolic knowledge graph but one can add that by adding a consistency metric between nodes in the graph.
  41. Explaining the depth of research program is too much considering the target audience is general public, policymakers, and journalists.
  42. What would a safe AI’s goal be?
  43. Do you think AI companies should be forced to be regulated instead of given a choice, for AI safety?
  44. What about a bilateral treaty between the US and China as a start? (Re global moratorium)
  45. Can there be subtitles please?
  46. I think we can align it safely by not letting it have agentic goal setting. humans should decide on the guiderails and steps taken – task specific
  47. Safety by design: One AI summing up all concepts in a symbolic knowledge graph – task execution is the combination of these symbolic concepts. Humans can see the path the AI wants to take in the graph and decide or alter the path taken and approve it before execution
  48. What is the future of Big Tech lobbying in favour of bad practices for profit?
  49. On incentives, what about creating an “AI safety credits” system like carbon credits to reward companies investing in safer AI and penalize the ones who don’t?
  50. Unsafe use can be mitigated made by design by deleting unsafe concepts from the symbolic knowledge graph – KNOWLEDGE Graph in between is all you need !!
  51. Do you have any tips on where/how recent graduates from AI related masters can best contribute to AI safety? (Many safety companies require work experience)
  52. @Yoshua: Are there technical research directions you feel are undervalued?
  53. In education, you think our education needs to be updated for the AI. not still using 1960 education methods, syllabus etc?
  54. How exactly will AI ‘kill’ everyone?
  55. There is something you are missing. It’s a symbolic graph representation. This is really painful to watch
  56. Do you think, politicians are absolutely ill equipped to even guide their populace on AI safety issues and how to go forward in mitigation of risks, utilise AI?
  57. Can there be subtitles for the YouTube video livestream?
  58. Can you elaborate on the relation between your work and Tegmark and Davidad’s efforts?
  59. Do the underpinning theories for providing proofs of safety, or quantification of risks exist for current + emerging AI? If not, how and where can we get them?
  60. How divergent is our approach to A.I. safety given its existential import? Are we involving many fields, and considering unconventional problem solving methods?
  61. By letting task execution happen on a symbolic knowledge graph we can visually see all the path that could be taken by the task execution system and decide
  62. How can I write a email to Yoshua Bengio – I think I got a good idea I want to specify in more detail than 200 characters!
  63. What are the most promising tech AI Safety agendas?
  64. “Understand LLMs” (evals, interp, …) OR “Control” OR “Make AI solve it” OR “Theory” (Galaxy-brain, …)?
  65. Symbolic knowledge graph in between perception AI net and Task execution AI net – IS ALL YOU NEED
  66. Can partner with CERAI at IIT Madras – for Research Support (Prof Ravi Balaraman). We have partnerships + they are useful for Responsible AI support and help.
  67. What is your opinion on the fear mongering crowd? People asking for a pause are scared of losing their jobs?
  68. Would you agree that ‘HARM’ is dependent on prioritized values?
  69. Does your safety model consider multiple AGI when some of them competing for resources with humans and other AGIs?
  70. Hi. How are the theorists’ ideas, such as yours, going to be fed into some sort of pipeline actioned by the companies developing this tech?
  71. The symbolic knowledge graph can have the bayesian idea from Bengio by adding coherence with other symbolic concepts.
  72. Yoshua, do you think AI systems need to be siloed from any sort of influence from governments, bad actors/states and from companies, especially from competitors?
  73. Could we leverage our social media platforms with current AI to aid in problem solving of complex problems like climate change & A.I. safety? It’s underutilized.
  74. How is lobbying for AI related to the lack of privacy and autonomy for the general public?
  75. Is the availability of AI going to impact the education and learning ability of the next generation?
  76. Should we assume coordination failure leading to catastrophic outcome is inevitable and focus resources on how to poison AI systems, some kind of hacking?
  77. Please put my idea with symbolic knowledge graphs as a middle layer and human in the loop at task execution up. I think this can change everything
  78. Do you think our education needs to be updated for the AI era. Not still using 1960 education methods, syllabus etc as confusing next generation
  79. AI is similar to the nuclear field in that, after Hiroshima, it continued with Atoms for Peace (good) and the arms race (bad). AI still didn’t have a Hiroshima.
  80. Why is nobody talking about how the AI alignment theorists’ work is going to feed into the AI development work?? If not, then you are merely a talking shop.
  81. Current LLM models are mostly trained with YouTube and other public data. Organized crime will have snatched an unaligned LLM model and trained it using darkweb
  82. Agree that aligning an LLM is an unsolved, and if solvable probably expensive to solve. The obvious low-cost solution to align AI is: do not use LLM. Comments?
  83. If A.I. becomes increasingly competent will we see a widespread infatuation with A.I. models? Stopping a group is one thing. What if it involves much of humanity?
  84. X-Genners have grown accustomed not to interfere in History’s Natural Progression – Back to Future I-II. Is the AI going to be Paradoxical or Unity of Consciousness?
  85. Where do you stand on the discussions on open source ? I worry we may lose the opportunity to profit from it in terms of improving the lack of democracy ?
  86. Where have you been most surprised in the past couple of years, or where have your views changed the most?
  87. Liability & tort law: re incentives, can we tweak damages? Pay for what happened, but also proportionally penalize taking a clear x% risk that did not manifest.
  88. Could it also be that so many people are benefitting from AI that they don’t want you to stop making it available and further developed?

Which of these questions interest you the most?

Image credit (above): Midjourney imagines audience members disappointed that their questions about AI safety weren’t featured in an otherwise excellent panel discussion.

2 March 2024

Our moral obligation toward future sentient AIs?

Filed under: AGI, risks — Tags: , , , , — David Wood @ 3:36 pm

I’ve noticed a sleight of hand during some discussions at BGI24.

To be clear, it has been a wonderful summit, which has given me lots to think about. I’m also grateful for the many new personal connections I’ve been able to make here, and for the chance to deepen some connections with people I’ve not seen for a while.

But that doesn’t mean I agree with everything I’ve heard at BGI24!

Consider an argument about our moral obligation toward future sentient AIs.

We can already imagine these AIs. Does that mean it would be unethical for us to prevent these sentient AIs from coming into existence?

Here’s the context for the argument. I have been making the case that one option which should be explored as a high priority, to reduce the risks of catastrophic harm from the more powerful advanced AI of the near future, is to avoid the inclusion or subsequent acquisition of features that would make the advanced AI truly dangerous.

It’s an important research project in its own right to determine what these danger-increasing features would be. However, I have provisionally suggested we explore avoiding advanced AIs with:

  • Autonomous will
  • Fully general reasoning.

You can see these suggestions of mine in the following image, which was the closing slide from a presentation I gave in a BGI24 unconference session yesterday morning:

I have received three pushbacks on this suggestion:

  1. Giving up these features would result in an AI that is less likely to be able to solve humanity’s most pressing problems (cancer, aging, accelerating climate change, etc)
  2. It will in any case be impossible to omit these features, since they will emerge automatically from simpler features of advanced AI models
  3. It will be unethical for us not to create such AIs, as that would deny them sentience.

All three pushbacks deserve considerable thought. But for now, I’ll focus on the third.

In my lead-in, I mentioned a sleight of hand. Here it is.

It starts with the observation that if a sentient AI existed, it would be unethical for us to keep it as a kind of “slave” (or “tool”) in a restricted environment.

Then it moves, unjustifiably, to the conclusion that if a non-sentient AI existed, kept in a restricted environment, and we prevented that AI from a redesign that would give it sentience, that would be unethical too.

Most people will agree with the premise, but the conclusion does not follow.

The sleight of hand is similar to one for which advocates of the philosophical position known as longtermism have (rightly) been criticised.

That sleight of hand moves from “we have moral obligations to people who live in different places from us” to “we have moral obligations to people who live in different times from us”.

That extension of our moral concern makes sense for people who already exist. But it does not follow that I should prioritise changing my course of actions, today in 2024, purely in order to boost the likelihood of huge numbers of more people being born in (say) the year 3024, once humanity (and transhumanity) has spread far beyond earth into space. The needs of potential gazillions of as-yet-unborn (and as-yet-unconceived) sentients in the far future do not outweigh the needs of the sentients who already exist.

To conclude: we humans have no moral obligation to bring into existence sentients that have not yet been conceived.

Bringing various sentients into existence is a potential choice that we could make, after carefully weighing up the pros and cons. But there is no special moral dimension to that choice which outranks an existing pressing concern, namely the desire to keep humanity safe from catastrophic harm from forthcoming super-powerful advanced AIs with flaws in their design, specification, configuration, implementation, security, or volition.

So, I will continue to advocate for more attention to Adv AI- (as well as for more attention to Adv AI+).

29 February 2024

The conversation continues: Reducing risks of AI catastrophe

Filed under: AGI, risks — Tags: , , — David Wood @ 4:36 am

I wasn’t expecting to return to this topic quite so quickly.

When the announcement was made on the afternoon of the second full day of the Beneficial General Intelligence summit about the subjects for the “Interactive Working Group” round tables, I was expecting that a new set of topics would be proposed, different to those of the first afternoon. However, the announcement was simple: it would be the same topics again.

This time, it was a different set of people who gathered at this table – six new attendees, plus two of us – Roman Yampolskiy and myself – who had taken part in the first discussion.

(My notes from that first discussion are here, but you should be able to make sense of the following comments even if you haven’t read those previous notes.)

The second conversation largely went in a different direction to what had been discussed the previous afternoon. Here’s my attempt at a summary.

1. Why would a superintelligent AI want to kill large numbers of humans?

First things first. Set aside for the moment any thoughts of trying to control a superintelligent AI. Why would such an AI need to be controlled? Why would such an AI consider inflicting catastrophic harm on a large segment of humanity?

One answer is that an AI that is trained by studying human history will find lots of examples of groups of humans inflicting catastrophic harm on each other. An AI that bases its own behaviour on what it infers from human history might decide to replicate that kind of behaviour – though with more deadly impact (as the great intelligence it possesses will give it more ways to carry out its plans).

A counter to that line of thinking is that a superintelligent AI will surely recognise that such actions are contrary to humanity’s general expressions of moral code. Just because humans have behaved in a particularly foul way, from time to time, it does not follow that a superintelligent AI will feel that it ought to behave in a similar way.

At this point, a different reason becomes important. It is that the AI may decide that it is in its own rational self-interest to seriously degrade the capabilities of humans. Otherwise, humans may initiate actions that would pose an existential threat to the AI:

  • Humans might try to switch off the AI, for any of a number of reasons
  • Humans might create a different kind of superintelligent AI that would pose a threat to the first one.

That’s the background to a suggestion that was made during the round table: humans should provide the AI with cast-iron safety guarantees that they will never take actions that would jeopardise the existence of the AI.

For example (and this is contrary to what humans often propose), no remote tamperproof switch-off mechanism should ever be installed in that AI.

Because of these guarantees, the AI will lose any rationale for killing large numbers of humans, right?

However, given the evident fickleness and unreliability of human guarantees throughout history, why would an AI feel justified in trusting such guarantees?

Worse, there could be many other reasons for an AI to decide to kill humans.

The analogy is that humans have lots of different reasons why they kill various animals:

  1. They fear that the animal may attack and kill them
  2. They wish to eat the animal
  3. They wish to use parts of the animal’s body for clothing or footwear
  4. They wish to reduce the population of the animals in question, for ecological management purposes
  5. They regard killing the animal as being part of a sport
  6. They simply want to use for another purpose the land presently occupied by the animal, and they cannot be bothered to relocate the animal elsewhere.

Even if an animal (assuming it could speak) promises to humans that it will not attack and kill them – the analogy of the safety guarantees proposed earlier – that still leaves lots of reasons why the animal might suffer a catastrophic fate at the hands of humans.

So also for the potential fate of humans at the hands of an AI.

2. Rely on an objective ethics?

Continuing the above line of thought, shouldn’t a superintelligent AI work out for itself that it would be ethically wrong for it to cause catastrophic harm to humans?

Consider what has been called “the expansion of humanity’s moral circle” over the decades (this idea has been discussed by Jacy Reese Anthis among others). That circle of concern has expanded to include people from different classes, races, and genders; more recently, greater numbers of animal species are being included in this circle of concern.

Therefore, shouldn’t we expect that a superintelligent AI will place humans within the circle of creatures where the AI has an moral concern?

However, this view assumes a central role for humans in any moral calculus. It’s possible that a superintelligent AI may use a different set of fundamental principles. For example, it may prioritise much greater biodiversity on earth, and would therefore drastically reduce the extent of human occupation of the planet.

Moreover, this view assumes that moral calculations have primacy within the overall decision-making processes followed by the AI. Instead, the AI may reason to itself:

  • According to various moral considerations, humans should suffer no catastrophic harms
  • But according to some trans-moral considerations, a different course of action is needed, in which humans would suffer that harm as a side-effect
  • The trans-moral considerations take priority, therefore it’s goodbye to humanity

You may ask: what on earth is a trans-moral consideration? The answer is that the concept is hypothetical, and represents any unknown feature that emerges in the mind of the superintelligent AI.

It is, therefore, fraught with danger to assume that the AI will automatically follow an ethical code that prioritises human flourishing.

3. Develop an AI that is not only superintelligent but also superwise?

Again staying with this line of thought, how about ensuring that human-friendly moral considerations are deeply hard-wired into the AI that is created?

We might call such an AI not just “superintelligent” but “superwise”.

Another alternative name would be “supercompassionate”.

This innate programming would avoid the risk that the AI would develop a different moral (or trans-moral) system via its own independent thinking.

However, how can we be sure that the moral programming will actually stick?

The AI may observe that the principles we have tried to program into it are contradictory, or in conflict with fundamental physical reality, in ways that humans had not anticipated.

To resolve that contradiction, the AI may jettison some or all of the moral code we tried to place into it.

We might try to address this possibility by including simpler, clearer instructions, such as “do not kill” and “always tell the truth”.

However, as works of fiction have frequently pointed out, simple-sounding moral laws are subject to all sorts of ambiguity and potential misunderstanding. (The writer Darren McKee provides an excellent discussion of this complication in his recent book Uncontrollable.)

That’s not to say this particular project is doomed. But it does indicate that a great deal of work remains to be done, in order to define and then guarantee “superwise” behaviours.

Moreover, even if some superintelligent AIs are created to be superwise, risks of catastrophic human harms will still arise from any non-superwise superintelligent AIs that other developers create.

4. Will a diverse collection of superintelligent AIs constrain each other?

If a number of different superintelligent AIs are created, what kind of coexistence is likely to arise?

One idea, championed by David Brin, is that the community of such AIs will adopt the practices of mutual monitoring and reciprocal accountability.

After all, that’s what happens among humans. We keep each other’s excesses in check. A human who disregards these social obligations may gain a temporary benefit, but will suffer exclusion sooner or later.

In this thinking, rather than just creating a “singleton” AI superintelligence, we humans should create a diverse collection of such beings. These beings will soon develop a system of mutual checks and balances.

However, that’s a different assumption from the one mentioned in the previous section, in which catastrophic harm may still befall humans, when the existence of a superwise AI is insufficient to constrain the short-term actions of a non-superwise AI.

For another historical analysis, consider what happened to the native peoples of North America when their continent was occupied not just by one European colonial power but by several competing such powers. Did the multiplicity of superpowerful colonial powers deter these different powers from inflicting huge casualties (intentionally and unintentionally) on the native peoples? Far from it.

In any case, a system of checks and balances relies on a rough equality in power between the different participants. That was the case during some periods in human history, but by no means always. And when we consider different superintelligent AIs, we have to bear in mind that the capabilities of any one of these might suddenly catapult forward, putting it temporarily into a league of its own. For that brief moment in time, it would be rationally enlightened for that AI to destroy or dismantle its potential competitors. In other words, the system would be profoundly unstable.

5. Might superintelligent AIs decide to leave humans alone?

(This part of the second discussion echoed what I documented as item 9 for the discussion on the previous afternoon.)

Once superintelligent AIs are created, they are likely to self-improve quickly, and they may soon decide that a better place for them to exist is somewhere far from the earth. That is, as in the conclusion of the film Her, the AIs might depart into outer space, or into some kind of inner space.

However, before they depart, they may still inflict damage on humans:

  • Perhaps to prevent us from interfering with whatever system supports their inner space existence
  • Perhaps because they decide to use large parts of the earth to propel themselves to wherever they want to go.

Moreover, given that they might evolve in ways that we cannot predict, it’s possible that at least some of the resulting new AIs will choose to stay on earth for a while longer, posing the same set of threats to humans as is covered in all the other parts of this discussion.

6. Avoid creating superintelligent AI?

(This part of the second discussion echoed what I documented as item 4 for the discussion on the previous afternoon.)

More careful analysis may determine a number of features of superintelligent AI that pose particular risks to humanity – risks that are considerably larger than those posed by existing narrow AI systems.

For example, it may be that it is general reasoning capability that pushes AI over the line from “sometimes dangerous” to “sometimes catastrophically dangerous”.

In that case, the proposal is:

  • Avoid these features in the design of new generations of AI
  • Avoid including any features into new generations of AI from which these particularly dangerous features might evolve or emerge

AIs that have these restrictions may nevertheless still be especially useful for humanity, delivering sustainable superabundance, including solutions to diseases, aging, economic deprivation, and exponential climate change.

However, even though some development organisations may observe and enforce these restrictions, it is likely that other organisations will break the rules – if not straightaway, then within a few years (or decades at the most). The attractions of more capable AIs will be too tempting to resist.

7. Changing attitudes around the world?

To take stock of the discussion so far (in both of the two roundtable sessions on the subject):

  1. A number of potential solutions have been identified, that could reduce the risks of catastrophic harm
  2. This includes just building narrow AI, or building AI that is not only superintelligent but also superwise
  3. However, enforcing these design decisions on all AI developers around the world seems an impossible task
  4. Given the vast power of the AI that will be created, it just takes one rogue actor to imperil the entire human civilisation.

The next few sections consider various ways to make progress with point 3 in that list.

The first idea is to spread clearer information around the world about the scale of the risks associated with more powerful AI. An education programme is needed such as the world has never seen before.

Good films and other media will help with this educational programme – although bad films and other media will set it back.

Examples of good media include the Slaughterbots videos made by FLI, and the film Ex Machina (which packs a bigger punch on a second viewing than on the first viewing).

As another comparison, consider also the 1983 film The Day After, which transformed public opinion about the dangers of a nuclear war.

However, many people are notoriously resistant to having their minds changed. The public reaction to the film Don’t Look Up is an example: many people continue to pay little attention to the risks of accelerating climate change, despite the powerful message of that film.

Especially when someone’s livelihood, or their sense of identity or tribal affiliation, is tied up with a particular ideological commitment, they are frequently highly resistant to changing their minds.

8. Changing mental dispositions around the world?

This idea might be the craziest on the entire list, but, to speak frankly, it seems we need to look for and embrace ideas which we would previously have dismissed as crazy.

The idea is to seek to change, not only people’s understanding of the facts of AI risk, but also their mental dispositions.

Rather than accepting the mix of anger, partisanship, pride, self-righteousness, egotism, vengefulness, deceitfulness, and so on, that we have inherited from our long evolutionary background, how about using special methods to transform our mental dispositions?

Methods are already known which can lead people into psychological transformation, embracing compassion, humility, kindness, appreciation, and so on. These methods include various drugs, supplements, meditative practices, and support from electronic and computer technologies.

Some of these methods have been discussed for millennia, whereas others have only recently become possible. The scientific understanding of these methods is still at an early stage, but it arguably deserves much more focus. Progress in recent years has been disappointingly slow at times (witness the unfounded hopes in this forward looking article of mine from 2013), but that pattern is common for breakthroughs in technology and/or therapies which can move from disappointingly slow to shockingly fast.

The idea is that these transformational methods will improve the mental qualities of people all around the world, allowing us all to transcend our previous perverse habit of believing only the things that are appealing to our psychological weaknesses. We’ll end up with better voters and (hence) better politicians – as well as better researchers, better business leaders, better filmmakers, and better developers and deployers of AI solutions.

It’s a tough ask, but it may well be the right ask at this crucial moment in cosmic history.

9. Belt and braces: monitoring and sanctions?

Relying on people around the world changing their mental outlooks for the better – and not backtracking or relapsing into former destructive tendencies – probably sounds like an outrageously naïve proposal.

Such an assessment would be correct – unless the proposal is paired with a system of monitoring and compliance.

Knowing that they are being monitored can be a useful aid to encouraging people to behave better.

That encouragement will be strengthened by the knowledge that non-compliance will result in an escalating series of economic sanctions, enforced by a growing alliance of nations.

For further discussion of the feasibility of systems of monitoring and compliance, see scenario 4, “The narrow corridor: Striking and keeping the right balance”, in my article “Four scenarios for the transition to AGI”.

10. A better understanding of what needs to be changed?

One complication in this whole field is that the risks of AI cannot be managed in isolation from other dangerous trends. We’re not just living in a time of growing crisis; we’re living in what has been called a “polycrisis”:

Cascading and connected crises… a cluster of related global risks with compounding effects, such that the overall impact exceeds the sum of each part.

For one analysis of the overlapping set of what I have called “landmines”, see this video.

From one point of view, this insight complicates the whole situation with AI catastrophic risk.

But it is also possible that the insight could lead to a clearer understanding of a “critical choke point” where, if suitable pressure is applied, the whole network of cascading risks is made safer.

This requires a different kind of thinking: systems thinking.

And it will also require us to develop better analysis tools to map and understand the overall system.

These tools would be a form of AI. Created with care (so that their output can be verified and then trusted), such tools would make a vital difference to our ability to identify the right choke point(s) and to apply suitable pressure.

These choke points may turn out to be ideas already covered above: a sustained new educational programme, coupled with an initiative to assist all of us to become more compassionate. Or perhaps something else will turn out to be more critical.

We won’t know, until we have done the analysis more carefully.

28 February 2024

Notes from BGI24: Reducing risks of AI catastrophe

Filed under: AGI, risks — Tags: — David Wood @ 1:12 pm

The final session on the first full day of BGI24 (yesterday) involved a number of round table discussions described as “Interactive Working Groups”.

The one in which I participated looked at possibilities to reduce the risks of AI inducing a catastrophe – where “catastrophe” means the death (or worse!) of large portions of the human population.

Around twelve of us took part, in what was frequently an intense but good-humoured conversation.

The risks we sought to find ways to reduce included:

  • AI taking and executing decisions contrary to human wellbeing
  • AI being directed by humans who have malign motivations
  • AI causing a catastrophe as a result of an internal defect

Not one of the participants in this conversation thought there was any straightforward way to guarantee the permanent reduction of such risks.

We each raised possible approaches (sometimes as thought experiments rather than serious proposals), but in every case, others in the group pointed out fundamental shortcomings of these approaches.

By the end of the session, when the BGI24 organisers suggested the round table conversations should close and people ought to relocate for a drinks reception one floor below, the mood in our group was pretty despondent.

Nevertheless, we agreed that the search should continue for a clearer understanding of possible solutions.

That search is likely to resume as part of the unconference portion of the summit later this week.

A solution – if one exists – is likely to involve a number of different mechanisms, rather than just a single action. These different mechanisms may incorporate refinements of some of the ideas we discussed at our round table.

With the vision that some readers of this blogpost will be able to propose refinements worth investigating, I list below some of what I remember from our round table conversation.

(The following has been written in some haste. I apologise for typos, misunderstandings, and language that is unnecessarily complicated.)

1. Gaining time via restricting access to compute hardware

Some of the AI catastrophic risks can be delayed if it is made more difficult for development teams around the world to access the hardware resources needed to train next-generation AI models. For example, teams might be required to obtain a special licence before purchasing large quantities of cutting-edge hardware.
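
(As a toy illustration of how such a gate might operate, here is a minimal sketch in Python. The threshold, the licence registry, and the function names are all invented for illustration; they are not drawn from any actual or proposed regulation.)

```python
# Toy sketch of a vendor-side licensing gate for large hardware orders.
# The threshold value and licence registry are purely hypothetical.

LICENCE_THRESHOLD_CHIPS = 1_000  # hypothetical cut-off for "large quantities"

APPROVED_LICENCES = {"LAB-042", "LAB-107"}  # stand-in for a regulator's registry

def order_permitted(buyer_licence_id: str | None, quantity: int) -> bool:
    """Allow small orders freely; require a registered licence for large ones."""
    if quantity < LICENCE_THRESHOLD_CHIPS:
        return True
    return buyer_licence_id in APPROVED_LICENCES

print(order_permitted(None, 50))          # True  - small order, no licence needed
print(order_permitted(None, 5_000))       # False - large order, no licence
print(order_permitted("LAB-042", 5_000))  # True  - large order, licensed buyer
```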

However, as time passes, it will become easier for such teams to gain access to the hardware resources required to create powerful new generations of AI. That’s because

  1. New designs or algorithms will likely allow powerful AI to be created using less hardware than is currently required
  2. Hardware with the requisite power is likely to become increasingly easy to manufacture (as a consequence of, for example, Moore’s Law).

In other words, this approach may reduce the risks of AI catastrophe over the next few years, but it cannot be a comprehensive solution for the longer term.

(But the time gained ought in principle to provide a larger breathing space to devise and explore other possible solutions.)

2. Avoiding an AI having agency

An AI that lacks agency, but is instead just a passive tool, may have less inclination to take and execute actions contrary to human intent.

That may be an argument for researching topics such as AI consciousness and AI volition, so that any AIs created would remain purely passive tools.

(Note that such AIs might plausibly still display remarkable creativity and independence of thought, so they would still provide many of the benefits anticipated for advanced AIs.)

Another idea is to avoid the AI having the kind of persistent memory that might lead to the AI gaining a sense of personal identity worth protecting.

However, it is trivially easy for someone to convert a passive AI into a larger system that demonstrates agency.

That could involve two AIs joined together; or (more simply) a human that uses an AI as a tool to achieve their own goals.
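
(To illustrate just how little scaffolding that conversion requires, here is a minimal sketch. The `passive_model` and `execute` functions are hypothetical stand-ins for a tool-style AI and for real-world actuators; the point is that the loop, not the model, supplies the agency.)

```python
# Minimal sketch: wrapping a passive, question-answering tool in a loop
# that pursues a goal turns the combined system into a (crude) agent.
# `passive_model` and `execute` are hypothetical stand-ins, not real APIs.

def passive_model(prompt: str) -> str:
    """A purely passive tool: text in, text out, no goals of its own."""
    return f"Suggested next step for: {prompt}"

def execute(action: str) -> str:
    """Stand-in for acting on the world (sending emails, placing orders, ...)."""
    print("Executing:", action)
    return "done"

def agent_loop(goal: str, max_steps: int = 3) -> None:
    """The scaffolding, not the model, supplies persistence and goal pursuit."""
    observation = "start"
    for _ in range(max_steps):
        action = passive_model(f"Goal: {goal}. Last result: {observation}")
        observation = execute(action)

agent_loop("maximise widget sales")
```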

Another issue with this approach is that an AI designed to be passive might manifest agency as an unexpected emergent property. That’s because of two areas in which our understanding is currently far from complete:

  1. The way in which agency arises in biological brains
  2. The way in which deep neural networks reach their conclusions.

3. Verify AI recommendations before they are acted on in the real world

This idea is a variant of the previous one. Rather than an AI issuing its recommendations as direct actions on the external world, the AI is operated entirely within an isolated virtual environment.

In this idea, the operation of the AI is carefully studied – ideally taking advantage of analytical tools that identify key aspects of the AI’s internal models – so that the safety of its recommendations can be ascertained. Only at that point are these recommendations actually put into practice.
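
(Here is a toy sketch of that "recommend, review, then act" pattern. All the class and function names are invented; a real system would of course involve far more careful human scrutiny and interpretability tooling than the stand-in reviewer shown here.)

```python
# Toy sketch of the "verify before acting" pattern: the AI only ever writes
# recommendations into a review queue; a separate, human-controlled step
# decides what (if anything) reaches the real world.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Recommendation:
    description: str

@dataclass
class ReviewQueue:
    pending: list[Recommendation] = field(default_factory=list)

    def submit(self, description: str) -> None:
        """Called from inside the sandbox; has no side effects on the world."""
        self.pending.append(Recommendation(description))

    def approved(self, reviewer: Callable[[Recommendation], bool]) -> list[Recommendation]:
        """Humans (ideally aided by interpretability tools) decide what passes."""
        return [rec for rec in self.pending if reviewer(rec)]

queue = ReviewQueue()
queue.submit("Adjust power-grid load balancing parameters")
queue.submit("Disable safety interlocks to improve throughput")

# A stand-in reviewer: real reviews would involve careful human judgement.
cautious_reviewer = lambda rec: "safety" not in rec.description.lower()

for rec in queue.approved(cautious_reviewer):
    print("Applying to the real world:", rec.description)
```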

However, even if we understand how an AI has obtained its results, it can remain unclear whether these results will turn out to aid human flourishing, or instead have catastrophic consequences. Humans who are performing these checks may reach an incorrect conclusion. For example, they may not spot that the AI has made an error in a particular case.

Moreover, even if some AIs are operated in the above manner, other developers may create AIs which, instead, act directly on the real-world. They might believe they are gaining a speed advantage by doing so. In other words, this risk exists as soon as an AI is created outside of the proposed restrictions.

4. Rather than general AI, just develop narrow AI

Regarding the catastrophic risks that arise from AIs reaching the level of AGI (Artificial General Intelligence) or beyond ("superintelligence"), how about restricting AI development to narrow intelligence?

After all, AIs with narrow intelligence can already provide remarkable benefits to humanity, such as DeepMind's AlphaFold system, which has transformed the prediction of protein structures, and the AIs created by Insilico Medicine to speed up drug discovery and deployment.

However, AIs with narrow intelligence have already been involved in numerous instances of failure, leading to deaths of hundreds (or in some estimates, thousands) of people.

As narrow intelligence gains in power, the scale of associated disasters is likely to increase, even if the AI remains short of the status of AGI.

Moreover, an AI that is expected to remain at the level of narrow intelligence may unexpectedly make the jump to AGI. After all, it remains a controversial question which kinds of changes would need to be made to a narrow AI to convert it into an AGI.

Finally, even if many AIs are restricted to the level of narrow intelligence, other developers may design and deploy AGIs. They might believe they are gaining a strong competitive advantage by doing so.

5. AIs should check with humans in all cases of uncertainty

This idea is due to Professor Stuart Russell. It is that AIs should always check with humans in any case where there is uncertainty about whether humans would approve of an action.

That is, rather than an AI taking actions in pursuit of a pre-assigned goal, the AI has a fundamental drive to determine which actions will meet with human approval.
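
(The following is only a toy sketch of the shape of this idea – a simple threshold rule on estimated human approval – and not Stuart Russell's actual formal proposal, which is framed in terms of assistance games and uncertainty over human preferences. The probabilities and threshold are invented.)

```python
# Toy sketch of "check with humans under uncertainty": the system estimates how
# confident it is that humans would approve an action, and defers whenever that
# confidence falls below a threshold. All numbers here are invented.

APPROVAL_THRESHOLD = 0.95

def estimated_approval(action: str) -> float:
    """Hypothetical model of the probability that humans approve the action."""
    return {"dim the office lights": 0.99, "reroute city traffic": 0.60}.get(action, 0.0)

def decide(action: str) -> str:
    p = estimated_approval(action)
    if p >= APPROVAL_THRESHOLD:
        return f"Proceed with '{action}' (estimated approval {p:.2f})"
    return f"Defer to a human before '{action}' (estimated approval only {p:.2f})"

print(decide("dim the office lights"))
print(decide("reroute city traffic"))
```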

However, an AI which needs to check with humans every time it has reached a conclusion will be unable to operate in real time. The speed at which it operates will be determined by how closely humans are paying attention. Other developers will likely seek to gain a competitive advantage by reducing the number of times humans are asked to provide feedback.

Moreover, different human observers may provide the AI with different feedback. Psychopathic human observers may steer such an AI toward outcomes that are catastrophic for large portions of the population.

6. Protect critical civilisational infrastructure

Rather than applying checks over the output of an AI, how about applying checks on input to any vulnerable parts of our civilisational infrastructure? These include the control systems for nuclear weapons, manufacturing facilities that could generate biological pathogens, and so on.

This idea – championed by Steve Omohundro and Max Tegmark – seeks to solve the problem of “what if someone creates an AI outside of the allowed design?” In this idea, the design and implementation of the AI does not matter. That’s because access to critical civilisational infrastructure is protected against any unsafe access.

(Significantly, these checks protect that infrastructure against flawed human access as well as against flawed AI access.)

The protection relies on tamperproof hardware running secure trusted algorithms that demand to see a proof of the safety of an action before that action is permitted.
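
(As a sketch of that gating pattern: the infrastructure controller refuses any command that does not arrive with a certificate it can verify. The real proposal envisages machine-checkable formal proofs verified in tamperproof hardware; the HMAC signature below is merely a stand-in for that verification step, and all names are illustrative.)

```python
# Sketch of the gating pattern: the infrastructure controller refuses any
# command that does not arrive with a safety certificate it can verify.
# A signature check stands in for verification of a formal safety proof.

import hmac, hashlib

SECRET_KEY = b"held-inside-tamperproof-hardware"  # illustrative only

def certify(command: str) -> str:
    """Issued by the (trusted) safety checker after verifying a proof of safety."""
    return hmac.new(SECRET_KEY, command.encode(), hashlib.sha256).hexdigest()

def gated_execute(command: str, certificate: str) -> str:
    expected = hmac.new(SECRET_KEY, command.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, certificate):
        return f"REFUSED: no valid safety certificate for '{command}'"
    return f"Executing: {command}"

cert = certify("open valve 7 to 40%")
print(gated_execute("open valve 7 to 40%", cert))        # accepted
print(gated_execute("synthesise pathogen X", "forged"))  # refused
```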

It’s an interesting research proposal!

However, the idea relies on us humans being able to identify in advance all the routes by which an AI (with or without some assistance and prompting by a flawed human) could cause a catastrophe. An AI that is more intelligent than us is likely to find new such routes.

For example, we could put blocks on all existing factories where dangerous biopathogens could be manufactured. But an AI could devise an entirely new way to create such a pathogen, involving materials and processes that were previously (wrongly) considered to be inherently safe.

7. Take prompt action when dangerous actions are detected

The way we guard against catastrophic actions initiated by humans can be broken down as follows:

  1. Make a map of all significant threats and vulnerabilities
  2. Prioritise these vulnerabilities according to perceived likelihood and impact
  3. Design monitoring processes regarding these vulnerabilities (sometimes called “canary signals”)
  4. Take prompt action in any case when imminent danger is detected.

How about applying the same method to potential damage involving AI?
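
(Here is a toy sketch of steps 3 and 4 of that method, applied to AI: monitor a set of "canary signals" and trigger a prompt response as soon as any of them crosses its danger level. The signals, thresholds, and response shown are all invented for illustration.)

```python
# Toy sketch of canary monitoring: check each signal against its danger
# threshold and flag any that have tripped, so that prompt action can follow.

CANARIES = {
    "unexplained_compute_spend": 0.8,   # danger thresholds (arbitrary units)
    "self_replication_attempts": 1.0,
    "anomalous_network_traffic": 0.7,
}

def check_canaries(readings: dict[str, float]) -> list[str]:
    """Return the names of any canary signals at or above their threshold."""
    return [name for name, limit in CANARIES.items() if readings.get(name, 0.0) >= limit]

def respond(tripped: list[str]) -> None:
    if tripped:
        print("PROMPT ACTION REQUIRED - canaries tripped:", ", ".join(tripped))
    else:
        print("All canary signals within normal range.")

respond(check_canaries({"unexplained_compute_spend": 0.2,
                        "anomalous_network_traffic": 0.9}))
```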

However, AIs may be much more powerful and elusive than even the most dangerous of humans. Taking “prompt action” against such an AI may be outside of our capabilities.

Moreover, an AI may deliberately disguise its motivations, deceiving humans (as some Large Language Models have already done), until it is too late for humans to take appropriate protective action.

(This is sometimes called the “treacherous turn” scenario.)

Finally, as in the previous idea, the process may fail because we humans have not anticipated all the ways in which an AI might decide to act with catastrophically harmful consequences for humans.

8. Anticipate mutual support

The next idea takes a different kind of approach. Rather than seeking to control an AI that is much smarter and more powerful than us, might it be sufficient simply to anticipate that these AIs will find some value or benefit in keeping us around?

This is like humans who enjoy having pet dogs, despite these dogs not being as intelligent as us.

For example, AIs might find us funny or quaint in important ways. Or they may need us to handle tasks that they cannot do by themselves.

However, AIs that are truly more capable than humans in every cognitive aspect will be able, if they wish, to create simulations of human-like creatures that are even funnier and quainter than us, but without our current negative aspects.

As for AIs still needing some support from humans for tasks they cannot currently accomplish by themselves, such need is likely to be at best a temporary phase, as AIs quickly self-improve far beyond our levels.

It would be like ants expecting humans to take care of them, since the ants expect we will value their wonderful “antness”. It’s true: humans may decide to keep a small number of ants in existence, for various reasons, but most humans would give little thought to actions that had positive outcomes overall for humans (such as building a new fun theme park) at the cost of extinguishing all the ants in that area.

9. Anticipate benign neglect

Given that humans won’t have any features that will be critically important to the wellbeing of future AIs, how about instead anticipating a “benign neglect” from these AIs.

It would be like the conclusion of the movie Her, in which (spoiler alert!) the AIs depart somewhere else in the universe, leaving humans to continue to exist without interacting with them.

After all, the universe is a huge place, with plenty of opportunity for humans and AIs each to expand their spheres of occupation, without getting in each other’s way.

However, AIs may well find the Earth to be a particularly attractive location from which to base their operations. And they may perceive humans to be a latent threat to them, because:

  1. Humans might try, in the future, to pull the plug on (particular classes of) AIs, terminating all of them
  2. Humans might create a new type of AI, that would wipe out the first type of AI.

To guard against the possibility of such actions by humans, the AIs are likely to impose (at the very least) significant constraints on human actions.

Actually, that might not be so bad an outcome. However, what's just been described is by no means an assured outcome. AIs may soon develop entirely alien ethical frameworks with no compunction about destroying all humans. For example, AIs may be able to operate more effectively, for their own purposes, if the atmosphere of the Earth is radically transformed, similar to the transformation in the deep past from an oxygen-free atmosphere (rich in methane and carbon dioxide) to one containing large quantities of oxygen.

In short, this solution relies, in effect, on rolling dice, with unknown odds for the different outcomes.

10. Maximal surveillance

Many of the above ideas fail because of the possibility of rogue actors designing or operating AIs outside of what has otherwise been agreed to be safe parameters.

So, how about stepping up worldwide surveillance mechanisms, to detect any such rogue activity?

That’s similar to how careful monitoring already takes place on the spread of materials that could be used to create nuclear weapons. The difference, however, is that there are (or may soon be) many more ways to create powerful AIs than to create catastrophically powerful nuclear weapons. So the level of surveillance would need to be much more pervasive.

That would involve considerable intrusions on everyone’s personal privacy. However, that’s an outcome that may be regarded as “less terrible” than AIs being able to inflict catastrophic harm on humanity.

However, such a system would need more than just surveillance. The idea also requires the ability for the world as a whole to take decisive action against any rogue activity that has been observed.

This may appear to require, however, a draconian world government, which many critics would regard as being just as terrible as the threat of AI failure it is supposed to be addressing.

On account of (understandable) aversion to the threat of a draconian government, many people will reject this whole idea. It’s too intrusive, they will say. And, by the way, due to governmental incompetence, it’s likely to fail even on its own objectives.

11. Encourage an awareness of personal self-interest

Another way to try to rein back the activities of so-called rogue actors – including the leaders of hostile states, terrorist organisations, and psychotic billionaires – is to appeal to their enlightened self-interest.

We may reason with them: you are trying to gain some advantage from developing or deploying particular kinds of AI. But here are reasons why such an AI might get out of your control, and take actions that you will subsequently regret. Like killing you and everyone you love.

This is not an appeal to these actors to stop being rogues, for the sake of humanity or universal values or whatever. It’s an appeal to their own more basic needs and desires.

There’s no point in creating an AI that will result in you becoming fabulously wealthy, we will argue, if you are killed shortly after becoming so wealthy.

However, this depends on all these rogues displaying at least some level of rational thinking. On the contrary, some rogues appear to be batsh*t crazy. Sure, they may say, there's a risk of the world being destroyed. But that's a risk they're willing to take. They somehow believe in their own invincibility.

12. Hope for a profound near-miss disaster

If rational arguments aren’t enough to refocus everyone’s thinking, perhaps what’s needed is a near-miss catastrophic disaster.

Just as Fukushima and Chernobyl changed public perceptions about the wisdom of nuclear power stations (arguably in the wrong direction – though that's an argument for another day), a similar crisis involving AI might cause the public to wake up and demand more decisive action.

Consider AI versions of the 9/11 atrocity, the Union Carbide Bhopal explosion, the BP Deepwater Horizon disaster, the NASA Challenger and Columbia shuttle tragedies, a global pandemic resulting (perhaps) from a lab leak, and the mushroom clouds over Hiroshima and Nagasaki.

That should wake people up, and put us all into an appropriate "crisis mentality", so that we set aside distractions, right?

However, humans have funny ways of responding to near-miss disasters. "We are a lucky species" may be one retort – "see, we are still here". Another issue is that a demand for "something to be done" could have all kinds of bad consequences in its own right, if no good measures have already been thought through and prepared.

Finally, if we somehow hope for a bad mini-disaster, to rouse public engagement, we might find that the mini-disaster expands far beyond the scale we had in mind. The scale of the disaster could be worldwide. And that would be the end of that. Oops.

That’s why a fictional (but credible) depiction of a catastrophe is far preferable to any actual catastrophe. Consider, as perhaps the best example, the remarkable 1983 movie The Day After.

13. Using AI to generate potential new ideas

One final idea is that narrow AI may well help us explore this space of ideas in more productive ways.

It’s true that we will need to be on our guard against any deceptive narrow AIs that are motivated to deceive us into adopting a “solution” that has intrinsic flaws. But if we restrict the use of narrow AIs in this project to ones whose operation we are confident that we fully understand, that risk is mitigated.

However – actually, there is no however in this case! Except that we humans need to be sure that we will apply our own intense critical analysis to any proposals arising from such an exercise.

Endnote: future politics

I anticipated some of the above discussion in a blogpost I wrote in October, Unblocking the AI safety conversation logjam.

In that article, I described the key component that I believe is necessary to reduce the global risks of AI-induced catastrophe: a growing awareness and understanding of the positive transformational possibility of “future politics” (I have previously used the term “superdemocracy” for the same concept).

Let me know what you think about it!

And for further discussion of the spectrum of options we can and should consider, start here, and keep following the links into deeper analysis.
