I wasn’t expecting to return to this topic quite so quickly.
When the announcement was made on the afternoon of the second full day of the Beneficial General Intelligence summit about the subjects for the “Interactive Working Group” round tables, I was expecting that a new set of topics would be proposed, different to those of the first afternoon. However, the announcement was simple: it would be the same topics again.
This time, it was a different set of people who gathered at this table – six new attendees, plus two of us – Roman Yampolskiy and myself – who had taken part in the first discussion.
(My notes from that first discussion are here, but you should be able to make sense of the following comments even if you haven’t read those previous notes.)
The second conversation largely went in a different direction to what had been discussed the previous afternoon. Here’s my attempt at a summary.
1. Why would a superintelligent AI want to kill large numbers of humans?
First things first. Set aside for the moment any thoughts of trying to control a superintelligent AI. Why would such an AI need to be controlled? Why would such an AI consider inflicting catastrophic harm on a large segment of humanity?
One answer is that an AI that is trained by studying human history will find lots of examples of groups of humans inflicting catastrophic harm on each other. An AI that bases its own behaviour on what it infers from human history might decide to replicate that kind of behaviour – though with more deadly impact (as the great intelligence it possesses will give it more ways to carry out its plans).
A counter to that line of thinking is that a superintelligent AI will surely recognise that such actions are contrary to humanity’s general expressions of moral code. Just because humans have behaved in a particularly foul way, from time to time, it does not follow that a superintelligent AI will feel that it ought to behave in a similar way.
At this point, a different reason becomes important. It is that the AI may decide that it is in its own rational self-interest to seriously degrade the capabilities of humans. Otherwise, humans may initiate actions that would pose an existential threat to the AI:
- Humans might try to switch off the AI, for any of a number of reasons
- Humans might create a different kind of superintelligent AI that would pose a threat to the first one.
That’s the background to a suggestion that was made during the round table: humans should provide the AI with cast-iron safety guarantees that they will never take actions that would jeopardise the existence of the AI.
For example (and this runs contrary to what many AI safety advocates propose), no remote tamper-proof switch-off mechanism should ever be installed in that AI.
Because of these guarantees, the AI will lose any rationale for killing large numbers of humans, right?
However, given the evident fickleness and unreliability of human guarantees throughout history, why would an AI feel justified in trusting such guarantees?
Worse, there could be many other reasons for an AI to decide to kill humans.
The analogy is that humans have lots of different reasons why they kill various animals:
- They fear that the animal may attack and kill them
- They wish to eat the animal
- They wish to use parts of the animal’s body for clothing or footwear
- They wish to reduce the population of the animals in question, for ecological management purposes
- They regard killing the animal as being part of a sport
- They simply want to use for another purpose the land presently occupied by the animal, and they cannot be bothered to relocate the animal elsewhere.
Even if an animal (assuming it could speak) promises to humans that it will not attack and kill them – the analogy of the safety guarantees proposed earlier – that still leaves lots of reasons why the animal might suffer a catastrophic fate at the hands of humans.
So also for the potential fate of humans at the hands of an AI.
2. Rely on an objective ethics?
Continuing the above line of thought, shouldn’t a superintelligent AI work out for itself that it would be ethically wrong for it to cause catastrophic harm to humans?
Consider what has been called “the expansion of humanity’s moral circle” over the decades (this idea has been discussed by Jacy Reese Anthis among others). That circle of concern has expanded to include people from different classes, races, and genders; more recently, greater numbers of animal species are being included in this circle of concern.
Therefore, shouldn’t we expect that a superintelligent AI will place humans within the circle of creatures for which it has moral concern?
However, this view assumes a central role for humans in any moral calculus. It’s possible that a superintelligent AI may use a different set of fundamental principles. For example, it may prioritise much greater biodiversity on earth, and would therefore drastically reduce the extent of human occupation of the planet.
Moreover, this view assumes that moral calculations have primacy within the overall decision-making processes followed by the AI. Instead, the AI may reason to itself:
- According to various moral considerations, humans should suffer no catastrophic harms
- But according to some trans-moral considerations, a different course of action is needed, in which humans would suffer that harm as a side-effect
- The trans-moral considerations take priority, therefore it’s goodbye to humanity
You may ask: what on earth is a trans-moral consideration? The answer is that the concept is hypothetical, and represents any unknown feature that emerges in the mind of the superintelligent AI.
It is, therefore, fraught with danger to assume that the AI will automatically follow an ethical code that prioritises human flourishing.
3. Develop an AI that is not only superintelligent but also superwise?
Again staying with this line of thought, how about ensuring that human-friendly moral considerations are deeply hard-wired into the AI that is created?
We might call such an AI not just “superintelligent” but “superwise”.
Another alternative name would be “supercompassionate”.
This innate programming would avoid the risk that the AI would develop a different moral (or trans-moral) system via its own independent thinking.
However, how can we be sure that the moral programming will actually stick?
The AI may observe that the principles we have tried to program into it are contradictory, or conflict with fundamental physical reality, in ways that humans had not anticipated.
To resolve that contradiction, the AI may jettison some or all of the moral code we tried to place into it.
We might try to address this possibility by including simpler, clearer instructions, such as “do not kill” and “always tell the truth”.
However, as works of fiction have frequently pointed out, simple-sounding moral laws are subject to all sorts of ambiguity and potential misunderstanding. (The writer Darren McKee provides an excellent discussion of this complication in his recent book Uncontrollable.)
That’s not to say this particular project is doomed. But it does indicate that a great deal of work remains to be done, in order to define and then guarantee “superwise” behaviours.
Moreover, even if some superintelligent AIs are created to be superwise, risks of catastrophic human harms will still arise from any non-superwise superintelligent AIs that other developers create.
4. Will a diverse collection of superintelligent AIs constrain each other?
If a number of different superintelligent AIs are created, what kind of coexistence is likely to arise?
One idea, championed by David Brin, is that the community of such AIs will adopt the practices of mutual monitoring and reciprocal accountability.
After all, that’s what happens among humans. We keep each other’s excesses in check. A human who disregards these social obligations may gain a temporary benefit, but will suffer exclusion sooner or later.
In this thinking, rather than just creating a “singleton” AI superintelligence, we humans should create a diverse collection of such beings. These beings will soon develop a system of mutual checks and balances.
However, that assumption sits uneasily with the concern raised in the previous section: the existence of superwise AIs may be insufficient to constrain the short-term actions of a non-superwise AI, so catastrophic harm may still befall humans.
For another historical analogy, consider what happened to the native peoples of North America when their continent was occupied not by just one European colonial power but by several competing ones. Did that multiplicity deter the various powers from inflicting huge casualties (intentionally and unintentionally) on the native peoples? Far from it.
In any case, a system of checks and balances relies on a rough equality in power between the different participants. That was the case during some periods in human history, but by no means always. And when we consider different superintelligent AIs, we have to bear in mind that the capabilities of any one of these might suddenly catapult forward, putting it temporarily into a league of its own. For that brief moment in time, it would be rationally enlightened for that AI to destroy or dismantle its potential competitors. In other words, the system would be profoundly unstable.
5. Might superintelligent AIs decide to leave humans alone?
(This part of the second discussion echoed what I documented as item 9 for the discussion on the previous afternoon.)
Once superintelligent AIs are created, they are likely to self-improve quickly, and they may soon decide that a better place for them to exist is somewhere far from the earth. That is, as in the conclusion of the film Her, the AIs might depart into outer space, or into some kind of inner space.
However, before they depart, they may still inflict damage on humans:
- Perhaps to prevent us from interfering with whatever system supports their inner space existence
- Perhaps because they decide to use large parts of the earth to propel themselves to wherever they want to go.
Moreover, given that they might evolve in ways that we cannot predict, it’s possible that at least some of the resulting new AIs will choose to stay on earth for a while longer, posing the same set of threats to humans as is covered in all the other parts of this discussion.
6. Avoid creating superintelligent AI?
(This part of the second discussion echoed what I documented as item 4 for the discussion on the previous afternoon.)
More careful analysis may determine a number of features of superintelligent AI that pose particular risks to humanity – risks that are considerably larger than those posed by existing narrow AI systems.
For example, it may be that it is general reasoning capability that pushes AI over the line from “sometimes dangerous” to “sometimes catastrophically dangerous”.
In that case, the proposal is:
- Avoid these features in the design of new generations of AI
- Avoid including in new generations of AI any features from which these particularly dangerous features might evolve or emerge
AIs that have these restrictions may nevertheless still be especially useful for humanity, delivering sustainable superabundance, including solutions to diseases, aging, economic deprivation, and exponential climate change.
However, even though some development organisations may observe and enforce these restrictions, it is likely that other organisations will break the rules – if not straightaway, then within a few years (or decades at the most). The attractions of more capable AIs will be too tempting to resist.
7. Changing attitudes around the world?
To take stock of the discussion so far (in both of the two roundtable sessions on the subject):
- A number of potential solutions have been identified that could reduce the risks of catastrophic harm
- This includes just building narrow AI, or building AI that is not only superintelligent but also superwise
- However, enforcing these design decisions on all AI developers around the world seems an impossible task
- Given the vast power of the AI that will be created, it just takes one rogue actor to imperil the entire human civilisation.
The next few sections consider various ways to make progress with the third point in that list.
The first idea is to spread clearer information around the world about the scale of the risks associated with more powerful AI. An education programme is needed such as the world has never seen before.
Good films and other media will help with this educational programme – although bad films and other media will set it back.
Examples of good media include the Slaughterbots videos made by FLI, and the film Ex Machina (which packs a bigger punch on a second viewing than on the first viewing).
As another comparison, consider also the 1983 film The Day After, which transformed public opinion about the dangers of a nuclear war.
However, many people are notoriously resistant to having their minds changed. The public reaction to the film Don’t Look Up is an example: many people continue to pay little attention to the risks of accelerating climate change, despite the powerful message of that film.
Especially when someone’s livelihood, or their sense of identity or tribal affiliation, is tied up with a particular ideological commitment, they are frequently highly resistant to changing their minds.
8. Changing mental dispositions around the world?
This idea might be the craziest on the entire list, but, to speak frankly, it seems we need to look for and embrace ideas which we would previously have dismissed as crazy.
The idea is to seek to change, not only people’s understanding of the facts of AI risk, but also their mental dispositions.
Rather than accepting the mix of anger, partisanship, pride, self-righteousness, egotism, vengefulness, deceitfulness, and so on, that we have inherited from our long evolutionary background, how about using special methods to transform our mental dispositions?
Methods are already known which can lead people into psychological transformation, embracing compassion, humility, kindness, appreciation, and so on. These methods include various drugs, supplements, meditative practices, and support from electronic and computer technologies.
Some of these methods have been discussed for millennia, whereas others have only recently become possible. The scientific understanding of these methods is still at an early stage, but it arguably deserves much more focus. Progress in recent years has been disappointingly slow at times (witness the unfounded hopes in this forward-looking article of mine from 2013), but that pattern is common for breakthroughs in technologies and therapies: progress can flip from disappointingly slow to shockingly fast.
The idea is that these transformational methods will improve the mental qualities of people all around the world, allowing us all to transcend our previous perverse habit of believing only the things that are appealing to our psychological weaknesses. We’ll end up with better voters and (hence) better politicians – as well as better researchers, better business leaders, better filmmakers, and better developers and deployers of AI solutions.
It’s a tough ask, but it may well be the right ask at this crucial moment in cosmic history.
9. Belt and braces: monitoring and sanctions?
Relying on people around the world changing their mental outlooks for the better – and not backtracking or relapsing into former destructive tendencies – probably sounds like an outrageously naïve proposal.
Such an assessment would be correct – unless the proposal is paired with a system of monitoring and compliance.
Knowing that they are being monitored can be a useful aid to encouraging people to behave better.
That encouragement will be strengthened by the knowledge that non-compliance will result in an escalating series of economic sanctions, enforced by a growing alliance of nations.
For further discussion of the feasibility of systems of monitoring and compliance, see scenario 4, “The narrow corridor: Striking and keeping the right balance”, in my article “Four scenarios for the transition to AGI”.
10. A better understanding of what needs to be changed?
One complication in this whole field is that the risks of AI cannot be managed in isolation from other dangerous trends. We’re not just living in a time of growing crisis; we’re living in what has been called a “polycrisis”:
Cascading and connected crises… a cluster of related global risks with compounding effects, such that the overall impact exceeds the sum of each part.
For one analysis of the overlapping set of what I have called “landmines”, see this video.
From one point of view, this insight complicates the whole situation with AI catastrophic risk.
But it is also possible that the insight could lead to a clearer understanding of a “critical choke point” where, if suitable pressure is applied, the whole network of cascading risks is made safer.
This requires a different kind of thinking: systems thinking.
And it will also require us to develop better analysis tools to map and understand the overall system.
These tools would be a form of AI. Created with care (so that their output can be verified and then trusted), such tools would make a vital difference to our ability to identify the right choke point(s) and to apply suitable pressure.
These choke points may turn out to be ideas already covered above: a sustained new educational programme, coupled with an initiative to assist all of us to become more compassionate. Or perhaps something else will turn out to be more critical.
We won’t know, until we have done the analysis more carefully.