dw2

3 November 2022

Four options for avoiding an AI cataclysm

Let’s consider four hard truths, and then four options for a solution.

Hard truth 1: Software has bugs.

Even when clever people write the software, and that software passes numerous verification tests, any complex software system generally still has bugs. If the software encounters a circumstance outside its verification suite, it can go horribly wrong.
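To make this concrete, here's a toy sketch in Python (my own illustrative example, not drawn from any real system): a function that passes every test in its verification suite, yet fails the moment it meets an input the suite never anticipated.

```python
def average(readings):
    """Return the mean of a list of sensor readings."""
    return sum(readings) / len(readings)

# The verification suite: every test passes.
assert average([1, 2, 3]) == 2
assert average([10]) == 10
assert average([-5, 5]) == 0

# A circumstance outside the suite - an empty list of readings -
# and the "verified" function crashes with ZeroDivisionError:
# average([])
```

The suite proves nothing about inputs it never exercises – which is the essence of hard truth 1.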

Hard truth 2: Just because software becomes more powerful, that won’t make all the bugs go away.

Newer software may run faster. It may incorporate input from larger sets of training data. It may gain extra features. But none of these developments mean the automatic removal of subtle errors in the logic of the software, or shortcomings in its specification. It might still reach terrible outcomes – just quicker than before!

Hard truth 3: As AI becomes more powerful, there will be more pressure to deploy it in challenging real-world situations.

Consider the real-time management of:

  • Complex arsenals of missiles, anti-missile missiles, and so on
  • Geoengineering interventions, which are intended to bring the planet’s climate back from the brink of a cascade of tipping points
  • Devious countermeasures against the growing weapons systems of a group (or nation) with a dangerously unstable leadership
  • Social network conversations, where changing sentiments can have big implications for electoral dynamics or for the perceived value of commercial brands
  • Ultra-hot plasmas inside whirling magnetic fields in nuclear fusion energy generators
  • Incentives for people to spend more money than is wise, on addictive gambling sites
  • The buying and selling of financial instruments, to take advantage of changing market sentiments.

In each case, powerful AI software could be a very attractive option. A seductive option. Especially if it has been written by clever people, and appears to have a good track record of delivering results.

Until it goes wrong. In which case the result could be cataclysmic. (Accidental nuclear war. The climate walloped past a tipping point in the wrong direction. Malware going existentially wrong. Partisan outrage propelling a psychological loose cannon over the edge. Easy access to weapons of mass destruction. Etc.)

Indeed, the real risk of AI cataclysm – as opposed to the Hollywood version of any such risk – is that an AI system may acquire so much influence over human society and our surrounding environment that a mistake in that system could cataclysmically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.

Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:

  1. Implementation defect
  2. Design defect
  3. Design overridden
  4. Implementation overridden.

Hard truth 4: There are no simple solutions to the risks described above.

What’s more, people who naively assume that a simple solution can easily be put in place (or already exists) are making the overall situation worse. They encourage complacency, whereas greater attention is urgently needed.

But perhaps you disagree?

That’s the context for the conversation in Episode 11 of the London Futurists Podcast, which was published yesterday morning.

In just thirty minutes, that episode dug deep into some of the ideas in my recent book The Singularity Principles. Co-host Calum Chace and I found plenty on which to agree, but had differing opinions on one of the most important questions.

Calum listed three suggestions that people sometimes make for how the dangers of potentially cataclysmic AI might be handled.

In response, I described a different approach – something that Calum said would be a fourth idea for a solution. As you can hear from the recording of the podcast, I evidently left him unconvinced.

Therefore, I’d like to dig even deeper.

Option 1: Humanity gets lucky

It might be the case that AI software that is smart enough will embody an unshakeable commitment toward humanity having the best possible experience.

Such software won’t miscalculate (after all, it is superintelligent). If there are flaws in how it has been specified, it will be smart enough to notice these flaws, rather than stubbornly following through on the letter of its programming. (After all, it is superintelligent.)

Variants of this wishful thinking exist. In some variants, what will guarantee a positive outcome isn’t just a latent tendency of superintelligence toward superbenevolence. It’s the invisible hand of the free market that will guide consumer choices away from software that might harm users, toward software that never, ever, ever goes wrong.

My response here is twofold. First, software which appears to be bug free can, nevertheless, harbour deep mistakes. It may be superintelligent, but that doesn’t mean it’s omniscient or infallible.

Second, software which is bug free may be monstrously efficient at doing what some of its designers had in mind – manipulating consumers into actions which increase the share price of a given corporation, despite all the externalities arising.

Moreover, it’s too much of a stretch to say that greater intelligence always makes you wiser and kinder. There are plenty of dreadful counterexamples, from humans in the worlds of politics, crime, business, academia, and more. Who is to say that a piece of software with an IQ equivalent to 100,000 will be sure to treat us humans any better than we humans sometimes treat swarms of insects (e.g. ant colonies) that get in our way?

Do you feel lucky? My view is that any such feeling, in these circumstances, is rash in the extreme.

Option 2: Safety engineered in

Might a team of brilliant AI researchers, Mary and Flo (to make up a couple of names), devise a clever method that will ensure their AI (once it is built) never harms humanity?

Perhaps the answer lies in some advanced mathematical wizardry. Or in chiselling a 21st century version of Asimov’s Laws of Robotics into the chipsets at the heart of computer systems. Or in switching from “correlation logic” to “causation logic”, or some other kind of new paradigm in AI systems engineering.

Of course, I wish Mary and Flo well. But their ongoing research won’t, by itself, prevent lots of other people releasing their own unsafe AI first. Especially when these other engineers are in a hurry to win market share for their companies.

Indeed, the considerable effort being invested by various researchers and organisations in a search for a kind of fix for AI safety is, arguably, a distraction from a sober assessment of the bigger picture. Better technology, better product design, better mathematics, and better hardware can all be part of the full solution. But that full solution also needs, critically, to include aspects of organisational design, economic incentives, legal frameworks, and political oversight. That’s the argument I develop in my book. We ignore these broader forces at our peril.

Option 3: Humans merge with machines

If we can’t beat them, how about joining them?

If human minds are fused into silicon AI systems, won’t the good human sense of these minds counteract any bugs or design flaws in the silicon part of the resulting hybrid?

With such a merger in place, human intelligence will automatically be magnified, as AI improves in capability. Therefore, we humans wouldn’t need to worry about being left behind. Right?

I see two big problems with this idea. First, so long as human intelligence is rooted in something like the biology of the brain, the mechanisms for any such merger may only allow relatively modest increases in human intelligence. Our biological brains would be bottlenecks that constrain the speed of progress in this hybrid case. Compared to pure AIs, the human-AI hybrid would still be left behind in this intelligence race. So much for humans staying in control!

An even bigger problem is the realisation that a human with superhuman intelligence is likely to be at least as unpredictable and dangerous as an AI with superhuman intelligence. The magnification of intelligence will allow that superhuman human to do all kinds of things with great vigour – settling grudges, acting out fantasies, demanding attention, pursuing vanity projects, and so on. Recall: power tends to corrupt. Such a person would be able to destroy the earth. Worse, they might want to do so.

Another way to state this point is that, just because AI elements are included inside a person, that won’t magically ensure that these elements become benign, or are subject to the full control of the person’s best intentions. Consider as comparisons what happens when biological viruses enter a person’s body, or when a cancer grows there. In neither case does the intruding element lose its ability to cause damage, just on account of being part of a person who has humanitarian instincts.

This reminds me of the statement that is sometimes heard, in defence of accelerating the capabilities of AI systems: “I am not afraid of artificial intelligence. I am afraid of human stupidity”.

In reality, what we need to fear is the combination of imperfect AI and imperfect humanity.

The conclusion of this line of discussion is that we need to do considerably more than enable greater intelligence. We also need to accelerate greater wisdom – so that any beings with superhuman intelligence will operate truly beneficently.

Option 4: Greater wisdom

The cornerstone insight of ethics is that, just because we can do something, and indeed may even want to do that thing, it doesn’t mean we should do that thing.

Accordingly, human societies since prehistory have placed constraints on how people should behave.

Sometimes, moral sanction is sufficient: people constrain their actions in deference to public opinion. In other cases, restrictions are codified into laws and regulations.

Likewise, just because a corporation could boost its profits by releasing a new version of its AI software, that doesn’t mean it should release that software.

But what is the origin of these “should” imperatives? And how do we resolve conflicts, when two different groups of people champion two different sets of ethical intuitions?

Where can we find a viable foundation for ethical restrictions – something more solid than “we’ve always done things like this” or “this feels right to me” or “we need to submit to the dictates in our favourite holy scripture”?

Welcome to the world of philosophy.

It’s a world that, according to some observers, has made little progress over the centuries. People still argue over fundamentals. Deontologists square off against consequentialists. Virtue ethicists stake out a different position.

It’s a world in which it is easier to poke holes in the views held by others, rather than defending a consistent view of your own.

But it’s my position that the impending threat of cataclysmic AI impels us to reach a wiser agreement.

It’s like how the devastation of the Covid pandemic impelled society to find significantly quicker ways to manufacture, verify, and deploy vaccines.

It’s like how society can come together, remarkably, in a wartime situation, notwithstanding the divisions that previously existed.

In the face of the threats of technology beyond our control, minds should focus, with unprecedented clarity. We’ll gradually build a wider consensus in favour of various restrictions and, yes, in favour of various incentives.

What’s your reaction? Is option 4 simply naïve?

Practical steps forward

Rather than trying to “boil the ocean” of philosophical disputes over contrasting ethical foundations, we can, and should, proceed in a kaizen manner.

To start with, we can give our attention to specific individual questions:

  • What are the circumstances when we should welcome AI-powered facial recognition software, and when should we resist it?
  • What are the circumstances when we should welcome AI systems that supervise aspects of dangerous weaponry?
  • What are the circumstances that could transform AI-powered monitoring systems from dangerous to helpful?

As we reach some tentative agreements on these individual matters, we can take the time to highlight principles with potential wider applicability.

In parallel, we can revisit some of the agreements (explicit and implicit) for how we measure the health of society and the liberties of individuals:

  • The GDP (Gross Domestic Product) statistics that provide a perspective on economic activities
  • The UDHR (Universal Declaration of Human Rights) statement that was endorsed in the United Nations General Assembly in 1948.

I don’t deny it will be hard to build consensus. It will be even harder to agree how to enforce the guidelines arising – especially in light of the wretched partisan conflicts that are poisoning the political processes in a number of parts of the world.

But we must try. And with some small wins under our belt, we can anticipate momentum building.

These are some of the topics I cover in the closing chapters of The Singularity Principles.

I by no means claim to know all the answers.

But I do believe that these are some of the most important questions to address.

And something that could help us make progress is – you guessed it – AI. In the right circumstances, AI can help us think more clearly, and can propose new syntheses of our previous ideas.

Thus today’s AI can provide stepping stones to the design and deployment of better, safer, wiser AI tomorrow. That’s provided we maintain human oversight.

Footnotes

The image above includes a design by Pixabay user Alexander Antropov, used with thanks.

See also this article by Calum in Forbes, Taking Back Control Of The Singularity.

8 June 2022

Pre-publication review: The Singularity Principles

Filed under: books, Singularity, Singularity Principles — David Wood @ 9:23 am

I’ve recently been concentrating on finalising the content of my forthcoming new book, The Singularity Principles.

The reasons why I see this book as both timely and necessary are explained in the extract, below, taken from the introduction to the book.

This link provides pointers to the full text of every chapter in the book. (Or use the links in the listing below of the extended table of contents.)

Please get in touch with me if you would prefer to read the pre-publication text in PDF format, rather than on the online HTML pages linked above.

At this stage, I will be grateful for any feedback:

  • Aspects of the book that I should consider changing
  • Aspects of the book that you particularly like.

Feedback on any parts of the book will be welcome. It’s by no means necessary for you to read the entire text. (However, I hope you will find it sufficiently interesting that you will end up reading more than you originally planned…)

By the way, it’s a relatively short book, compared to some others I’ve written. The wordcount is a bit over 50 thousand words. That works out at around 260 pages of fairly large text on 5″x8″ paper.

I will also appreciate any commendations or endorsements, which I can include with the publicity material for the book, to encourage more people to pay attention to it.

The timescale I have in mind: I will release electronic and physical copies of the book some time early next month (July), followed up soon afterward by an audio version.

Therefore, if you’re thinking of dipping into any chapters to provide feedback and/or endorsements, the sooner the better!

Thanks in anticipation!

Preface

This book is dedicated to what may be the most important concept in human history, namely, the Singularity – what it is, what it is not, the steps by which we may reach it, and, crucially, how to make it more likely that we’ll experience a positive singularity rather than a negative singularity.

For now, here’s a simple definition. The Singularity is the emergence of Artificial General Intelligence (AGI), and the associated transformation of the human condition. Spoiler alert: that transformation will be profound. But if we’re not paying attention, it’s likely to be profoundly bad.

Despite the importance of the concept of the Singularity, the subject receives nothing like the attention it deserves. When it is discussed, it often receives scorn or ridicule. Alas, you’ll hear sniggers and see eyes rolling.

That’s because, as I’ll explain, there’s a kind of shadow around the concept – an unhelpful set of distortions that make it harder for people to fully perceive the real opportunities and the real risks that the Singularity brings.

These distortions grow out of a wider confusion – confusion about the complex interplay of forces that are leading society to the adoption of ever-more powerful technologies, including ever-more powerful AI.

It’s my task in this book to dispel the confusion, to untangle the distortions, to highlight practical steps forward, and to attract much more serious attention to the Singularity. The future of humanity is at stake.

Let’s start with the confusion.

Confusion, turbulence, and peril

The 2020s could be called the Decade of Confusion. Never before has so much information washed over everyone, leaving us, all too often, overwhelmed, intimidated, and distracted. Former certainties have dimmed. Long-established alliances have fragmented. Flurries of excitement have pivoted quickly to chaos and disappointment. These are turbulent times.

However, if we could see through the confusion, distraction, and intimidation, what we should notice is that human flourishing is, potentially, poised to soar to unprecedented levels. Fast-changing technologies are on the point of providing a string of remarkable benefits. We are near the threshold of radical improvements to health, nutrition, security, creativity, collaboration, intelligence, awareness, and enlightenment – with these improvements being available to everyone.

Alas, these same fast-changing technologies also threaten multiple sorts of disaster. These technologies are two-edged swords. Unless we wield them with great skill, they are likely to spin out of control. If we remain overwhelmed, intimidated, and distracted, our prospects are poor. Accordingly, these are perilous times.

These dual future possibilities – technology-enabled sustainable superabundance, versus technology-induced catastrophe – have featured in numerous discussions that I have chaired at London Futurists meetups going all the way back to March 2008.

As these discussions have progressed, year by year, I have gradually formulated and refined what I now call the Singularity Principles. These principles are intended:

  • To steer humanity’s relationships with fast-changing technologies,
  • To manage multiple risks of disaster,
  • To enable the attainment of remarkable benefits,
  • And, thereby, to help humanity approach a profoundly positive singularity.

In short, the Singularity Principles are intended to counter today’s widespread confusion, distraction, and intimidation, by providing clarity, credible grounds for hope, and an urgent call to action.

This time it’s different

I first introduced the Singularity Principles, under that name and with the same general format, in the final chapter, “Singularity”, of my 2021 book Vital Foresight: The Case for Active Transhumanism. That chapter is the culmination of a 642-page book. The preceding sixteen chapters of that book set out at some length the challenges and opportunities that these principles need to address.

Since the publication of Vital Foresight, it has become evident to me that the Singularity Principles require a short, focused book of their own. That’s what you now hold in your hands.

The Singularity Principles is by no means the only new book on the subject of the management of powerful disruptive technologies. The public, thankfully, are waking up to the need to understand these technologies better, and numerous authors are responding to that need. As one example, the phrase “Artificial Intelligence” forms part of the title of scores of new books.

I have personally learned many things from some of these recent books. However, to speak frankly, I find myself dissatisfied by the prescriptions these authors have advanced. These authors generally fail to appreciate the full extent of the threats and opportunities ahead. And even if they do see the true scale of these issues, the recommendations these authors propose strike me as being inadequate.

Therefore, I cannot keep silent.

Accordingly, I present in this new book the content of the Singularity Principles, brought up to date in the light of recent debates and new insights. The book also covers:

  • Why the Singularity Principles are sorely needed
  • The source and design of these principles
  • The significance of the term “Singularity”
  • Why there is so much unhelpful confusion about “the Singularity”
  • What’s different about the Singularity Principles, compared to recommendations of other analysts
  • The kinds of outcomes expected if these principles are followed
  • The kinds of outcomes expected if these principles are not followed
  • How you – dear reader – can, and should, become involved, finding your place in a growing coalition
  • How these principles are likely to evolve further
  • How these principles can be put into practice, all around the world – with the help of people like you.

The scope of the Principles

To start with, the Singularity Principles can and should be applied to the anticipation and management of the NBIC technologies that are at the heart of the current, fourth industrial revolution. NBIC – nanotech, biotech, infotech, and cognotech – is a quartet of interlinked technological disruptions which are likely to grow significantly stronger as the 2020s unfold. Each of these four technological disruptions has the potential to fundamentally transform large parts of the human experience.

However, the same set of principles can and should also be applied to the anticipation and management of the core technology that will likely give rise to a fifth industrial revolution, namely the technology of AGI (artificial general intelligence), and the rapid additional improvements in artificial superintelligence that will likely follow hard on the heels of AGI.

The emergence of AGI is known as the technological singularity – or, more briefly, as the Singularity.

In other words, the Singularity Principles apply both:

  • To the longer-term lead-up to the Singularity, from today’s fast-improving NBIC technologies,
  • And to the shorter-term lead-up to the Singularity, as AI gains more general capabilities.

In both cases, anticipation and management of possible outcomes will be of vital importance.

By the way – in case it’s not already clear – please don’t expect a clever novel piece of technology, or some brilliant technical design, to somehow solve, by itself, the challenges posed by NBIC technologies and AGI. These challenges extend far beyond what could be wrestled into submission by some dazzling mathematical wizardry, by the incorporation of an ingenious new piece of silicon at the heart of every computer, or by any other “quick fix”. Indeed, the considerable effort being invested by some organisations in a search for that kind of fix is, arguably, a distraction from a sober assessment of the bigger picture.

Better technology, better product design, better mathematics, and better hardware can all be part of the full solution. But that full solution also needs, critically, to include aspects of organisational design, economic incentives, legal frameworks, and political oversight. That’s the argument I develop in the chapters ahead.

Extended table of contents

For your convenience, here’s a listing of the main section headings for all the chapters in this book.

0. Preface

  • Confusion, turbulence, and peril
  • This time it’s different
  • The scope of the Principles
  • Collective insight
  • The short form of the Principles
  • The four areas covered by the Principles
  • What lies ahead

1. Background: Ten essential observations

  • Tech breakthroughs are unpredictable (both timing and impact)
  • Potential complex interactions make prediction even harder
  • Changes in human attributes complicate tech changes
  • Greater tech power enables more devastating results
  • Different perspectives assess “good” vs. “bad” differently
  • Competition can be hazardous as well as beneficial
  • Some tech failures would be too drastic to allow recovery
  • A history of good results is no guarantee of future success
  • It’s insufficient to rely on good intentions
  • Wishful thinking predisposes blindness to problems

2. Fast-changing technologies: risks and benefits

  • Technology risk factors
  • Prioritising benefits?
  • What about ethics?
  • The transhumanist stance

2.1 Special complications with artificial intelligence

  • Problems with training data
  • The black box nature of AI
  • Interactions between multiple algorithms
  • Self-improving AI
  • Devious AI
  • Four catastrophic error modes
  • The broader perspective

2.2 The AI Control Problem

  • The gorilla problem
  • Examples of dangers with uncontrollable AI
  • Proposed solutions (which don’t work)
  • The impossibility of full verification
  • Emotion misses the point
  • No off switch
  • The ineffectiveness of tripwires
  • Escaping from confinement
  • The ineffectiveness of restrictions
  • No automatic super ethics
  • Issues with hard-wiring ethical principles

2.3 The AI Alignment Problem

  • Asimov’s Three Laws
  • Ethical dilemmas and trade-offs
  • Problems with proxies
  • The gaming of proxies
  • Simple examples of profound problems
  • Humans disagree
  • No automatic super ethics (again)
  • Other options for answers?

2.4 No easy solutions

  • No guarantees from the free market
  • No guarantees from cosmic destiny
  • Planet B?
  • Humans merging with AI?
  • Approaching the Singularity

3. What is the Singularity?

  • Breaking down the definition
  • Four alternative definitions
  • Four possible routes to the Singularity
  • The Singularity and AI self-awareness
  • Singularity timescales
  • Positive and negative singularities
  • Tripwires and canary signals
  • Moving forward

3.1 The Singularitarian Stance

  • AGI is possible
  • AGI could happen within just a few decades
  • Winner takes all
  • The difficulty of controlling AGI
  • Superintelligence and superethics
  • Not the Terminator
  • Opposition to the Singularitarian Stance

3.2 A complication: the Singularity Shadow

  • Singularity timescale determinism
  • Singularity outcome determinism
  • Singularity hyping
  • Singularity risk complacency
  • Singularity term overloading
  • Singularity anti-regulation fundamentalism
  • Singularity preoccupation
  • Looking forward

3.3 Bad reasons to deny the Singularity

  • The denial of death
  • How special is the human mind?
  • A credible positive vision

4. The question of urgency

  • Factors causing AI to improve
  • 15 options on the table
  • The difficulty of measuring progress
  • Learning from Christopher Columbus
  • The possibility of fast take-off

5. The Singularity Principles in depth

5.1 Analysing goals and potential outcomes

  • Question desirability
  • Clarify externalities
  • Require peer reviews
  • Involve multiple perspectives
  • Analyse the whole system
  • Anticipate fat tails

5.2 Desirable characteristics of tech solutions

  • Reject opacity
  • Promote resilience
  • Promote verifiability
  • Promote auditability
  • Clarify risks to users
  • Clarify trade-offs

5.3 Ensuring development takes place responsibly

  • Insist on accountability
  • Penalise disinformation
  • Design for cooperation
  • Analyse via simulations
  • Maintain human oversight

5.4 Evolution and enforcement

  • Build consensus regarding principles
  • Provide incentives to address omissions
  • Halt development if principles are not upheld
  • Consolidate progress via legal frameworks

6. Key success factors

  • Public understanding
  • Persistent urgency
  • Reliable action against noncompliance
  • Public funding
  • International support
  • A sense of inclusion and collaboration

7. Questions arising

7.1 Measuring human flourishing

  • Some example trade-offs
  • Updating the Universal Declaration of Human Rights
  • Constructing an Index of Human and Social Flourishing

7.2 Trustable monitoring

  • Moore’s Law of Mad Scientists
  • Four projects to reduce the dangers of WMDs
  • Detecting mavericks
  • Examples of trustable monitoring
  • Watching the watchers

7.3 Uplifting politics

  • Uplifting regulators
  • The central role of politics
  • Toward superdemocracy
  • Technology improving politics
  • Transcending party politics
  • The prospects for political progress

7.4 Uplifting education

  • Top level areas of the Vital Syllabus
  • Improving the Vital Syllabus

7.5 To AGI or not AGI?

  • Global action against the creation of AGI?
  • Possible alternatives to AGI?
  • A dividing line between AI and AGI?
  • A practical proposal

7.6 Measuring progress toward AGI

  • Aggregating expert opinions
  • Metaculus predictions
  • Alternative canary signals for AGI
  • AI index reports

7.7 Growing a coalition of the willing

  • Risks and actions

Image credit

The draft book cover shown above includes a design by Pixabay member Ebenezer42.
