19 December 2022


Filed under: AGI, politics, Singularity Principles — David Wood @ 2:06 am

I’ve been rethinking some aspects of AI control and AI alignment.

In the six months since publishing my book The Singularity Principles: Anticipating and Managing Cataclysmically Disruptive Technologies, I’ve been involved in scores of conversations about the themes it raises. These conversations have often brought my attention to fresh ideas and different perspectives.

These six months have also seen the appearance of numerous new AI models with capabilities that often catch observers by surprise. The general public is showing a new willingness (at least some of the time) to consider the far-reaching implications of these AI models and their more powerful successors.

People from various parts of my past life have been contacting me. The kinds of things they used to hear me forecasting – the kinds of things they thought, at the time, were unlikely to ever happen – are becoming more credible, more exciting, and, yes, more frightening.

They ask me: What is to be done? And, pointedly, Why aren’t you doing more to stop the truly bad outcomes that now seem ominously likely?

The main answer I give is: read my book. Indeed, you can find all the content online, spread out over a family of webpages.

More specifically, my request is that people read my book all the way through. That’s because later chapters of the book anticipate questions that tend to come to readers’ minds during earlier chapters, and try to provide answers.

Six months later, although I would give some different (newer) examples were I to rewrite that book today, I stand by the analysis I offered and the principles I championed.

However, I’m inclined to revise my thinking on a number of points. Please find these updates below.

An option to control superintelligent AI

I remain doubtful about the prospects for humans to retain control of any AGI (Artificial General Intelligence) that we create.

That is, the arguments I gave in my chapter “The AI Control Problem” still look strong to me.

But one line of thinking may have some extra mileage. That’s the idea of keeping AGI entirely as an advisor to humans, rather than giving it any autonomy to act directly in the world.

Such an AI would provide us with many recommendations, but it wouldn’t operate any sort of equipment.

More to the point: such an AI would have no desire to operate any sort of equipment. It would have no desires whatsoever, nor any motivations. It would simply be a tool. Or, to be more precise, it would simply be a remarkable tool.

In The Singularity Principles I gave a number of arguments why that idea is unsustainable:

  • Some decisions require faster responses than slow-brained humans can provide; that is, AIs with direct access to real-world levers and switches will be more effective than those that are merely advisory
  • Smart AIs will inevitably develop “subsidiary goals” (intermediate goals) such as having greater computational power, even when there is no explicit programming for such goals
  • As soon as a smart AI acquires any such subsidiary goal, it will find ways to escape any confinement imposed by human overseers.

But I now think this should be explored more carefully. Might a useful distinction be made between:

  1. AIs that do have direct access to real-world levers and switches – with the programming of such AIs being carefully restricted to narrow lines of thinking
  2. AIs with more powerful (general) capabilities, that operate purely in advisory capacities.

In that case, the damage that could be caused by failures of the first type of AI, whilst significant, would not involve threats to the entirety of human civilisation. And failures of the second type of AI would be restricted by the actions of humans as intermediaries.

This approach would require confidence that:

  1. The capabilities of AIs of the first type will remain narrow, despite competitive pressures to give these systems at least some extra rationality
  2. The design of AIs of the second type will prevent the emergence of any dangerous “subsidiary goals”.

As a special case of the second point, the design of these AIs will need to avoid any risk of the systems developing sentience or intrinsic motivation.

These are tough challenges – especially since we still have only a vague understanding of how desires and/or sentience can emerge as smaller systems combine and evolve into larger ones.

But since we are short of other options, it’s definitely something to be considered more fully.

An option for automatically aligned superintelligence

If controlling an AGI turns out to be impossible – as seems likely – what about the option that an AGI will have goals and principles that are fundamentally aligned with human wellbeing?

In such a case, it will not matter if an AGI is beyond human control. The actions it takes will ensure that humans have a very positive future.

The creation of such an AI – sometimes called a “friendly AI” – remains my best hope for humanity’s future.

However, there are severe difficulties in agreeing and encoding “goals and principles that are fundamentally aligned with human wellbeing”. I reviewed these difficulties in my chapter “The AI Alignment Problem”.

But what if such goals and principles are somehow part of an objective reality, awaiting discovery, rather than needing to be invented? What if something like the theory of “moral realism” is true?

On this view, a principle like “treat humans well” would follow from some sort of a priori logical analysis, a bit like the laws of mathematics (such as the fact, discovered by one of the followers of Pythagoras, that the square root of two is an irrational number).

Accordingly, a sufficiently smart AGI would, all being well, reach its own conclusion that humans ought to be well treated.

Nevertheless, even in this case, significant risks would remain:

  • The principle might be true, but an AGI might not be motivated to discover it
  • The principle might be true, but an AGI, despite its brilliance, may fail to discover it
  • The principle might be true, and an AGI might recognise it, but it may take its own decision to ignore it – like the way that we humans often act in defiance of what we believe at the time to be overarching moral principles

The design criteria and initial conditions that we humans provide for an AGI may well influence the outcome of these risk factors.

I plan to return to these weighty matters in a future blog post!

Two different sorts of control

I’ve come to realise that there are not one but two questions of control of AI:

  1. Can we humans retain control of an AGI that we create?
  2. Can society as a whole control the actions of companies (or organisations) that may create an AGI?

Whilst both these control problems are profoundly hard, the second is less hard.

Moreover, it’s the second problem which is the truly urgent one.

This second control problem involves preventing teams inside corporations (and other organisations) from rushing ahead without due regard to questions of the potential outcomes of their work.

It’s the second control problem that the 21 principles which I highlight in my book are primarily intended to address.

When people say “it’s impossible to solve the AI control problem”, I think they may be correct regarding the first problem, but I passionately believe they’re wrong concerning the second problem.

The importance of psychology

When I review what people say about the progress and risks of AI, I am frequently struck by the fact that apparently intelligent people are strongly attached to views that are full of holes.

When I try to point out the flaws in their thinking, they hardly seem to pause in their stride. They display a stubborn confidence that they are correct.

What’s at play here is more than logic. It’s surely a manifestation of humanity’s often defective psychology.

My book includes a short chapter “The denial of the Singularity” which touched on various matters of psychology. If I were to rewrite my book today, I believe that chapter would become larger, and that psychological themes would be spread more widely throughout the book.

Of course, noticing psychological defects is only the start of making progress. Circumventing or transcending these defects is an altogether harder question. But it’s one that needs a lot more attention.

The option of merging with AI

How can we have a better, more productive conversation about anticipating and managing AGI?

How can we avoid being derailed by ineffective arguments, hostile rhetoric, stubborn prejudices, hobby-horse obsessions, outdated ideologies, and (see the previous section) flawed psychology?

How might our not-much-better-than-monkey brains cope with the magnitude of these questions?

One possible answer is that technology can help us (so long as we use it wisely).

For example, the chapter “Uplifting politics”, from near the end of my book, listed ten ways for “technology improving politics”.

More broadly, we humans have the option to selectively deploy some aspects of technology to improve our capabilities in handling other aspects of technology.

We must recognise that technology is no panacea. But it can definitely make a big difference.

Especially if we restrict ourselves to putting heavy reliance only on those technologies – narrow technologies – whose mode of operation we fully understand, and where risks of malfunction can be limited.

This forms part of a general idea that “we humans don’t need to worry about being left behind by robots, or about being subjugated by robots, since we will be the robots”.

As I put it in the chapter “No easy solutions” in my book,

If humans merge with AI, humans could remain in control of AIs, even as these AIs rapidly become more powerful. With such a merger in place, human intelligence will automatically be magnified, as AI improves in capability. Therefore, we humans wouldn’t need to worry about being left behind.

Now I’ve often expressed strong criticisms of this notion of merger. I still believe these criticisms are sound.

But what these criticisms show is that any such merger cannot be the entirety of our response to the prospect of the emergence of AGI. It can only be part of the solution. That’s especially true because humans-augmented-by-technology are still very likely to lag behind pure technology systems, until such time as human minds might be removed from biological skulls and placed into new silicon hosts. That’s something that I’m not expecting to happen before the arrival of AGI, so it will be too late to solve (by itself) the problems of AI alignment and control.

(And since you ask, I probably won’t be in any hurry, even after the arrival of AGI, for my mind to be removed from my biological skull. I guess I might rethink that reticence in due course. But that’s rethinking for another day.)

The importance of politics

Any serious discussion about managing cataclysmically disruptive technologies (such as advanced AIs) pretty soon rubs up against the questions of politics.

That’s not just small-p “politics” – questions of how to collaborate with potential partners where there are many points of disagreement and even dislike.

It’s large-P “Politics” – interacting with presidents, prime ministers, cabinets, parliaments, and so on.

Questions of large-P politics occur throughout The Singularity Principles. My view now, six months afterwards, is that even more focus should be placed on the subject of improving politics:

  • Helping politics to escape the clutches of demagogues and autocrats
  • Helping politics to avoid stultifying embraces between politicians and their “cronies” in established industries
  • Ensuring that the best insights and ideas of the whole electorate can rise to wide attention, without being quashed or distorted by powerful incumbents
  • Bringing everyone involved in politics rapidly up-to-date with the real issues regarding cataclysmically disruptive technologies
  • Distinguishing effective regulations and incentives from those that are counter-productive.

As 2022 has progressed, I’ve seen plenty of new evidence of deep problems within political systems around the world. These problems were analysed with sharp insight in the book The Revenge of Power by Moisés Naím, which I recently identified as “the best book that I read in 2022”.

Happily, as well as evidence of deep problems in our politics worldwide, there are also encouraging signs, as well as sensible plans for improvement. You can find some of these plans inside the book by Naím, and, yes, I offer suggestions in my own book too.

Accelerating improvements in politics was one of the reasons I created Future Surge a few months back. That’s an initiative on which I expect to spend a lot more of my time in 2023.

Note: the image underlying the picture at the top of this article was created by DALL·E 2 from the prompt “A brain with a human face on it rethinks, vivid stormy sky overhead, photorealistic style”.

1 Comment »

  1. […] few days after the debate, David Wood published an article titled ‘Rethinking’ as a postscript to the Conference. In there, he raises several very important points regarding […]

    Pingback by Why ChatGPT has accelerated the need for AI control? » Sustensis — 30 December 2022 @ 2:51 pm
