
26 February 2024

How careful do AI safety red teams need to be?

In my quest to catalyse a more productive conversation about the future of AI, I’m keen to raise “transcendent questions” – questions that can help all of us to rise above the familiar beaten track of the positions we reflexively support and the positions we reflexively oppose.

I described “transcendent questions” in a previous article of mine in Mindplex Magazine:

Transcendent Questions On The Future Of AI: New Starting Points For Breaking The Logjam Of AI Tribal Thinking

These questions are potential starting points for meaningful non-tribal open discussions. These questions have the ability to trigger a suspension of ideology.

Well, now that I’ve arrived at the BGI24 conference, I’d like to share another potential transcendent question. It’s on the subject of limits on what AI safety red teams can do.

The following chain of thought was stimulated by my reading, in Roman Yampolskiy’s new book “AI: Unexplainable, Unpredictable, Uncontrollable”, about AI that is “malevolent by design”.

My first thought on coming across that phrase was that surely everyone would agree that the creation of “malevolent by design” AI is a bad idea. But then I realised that – as is so often the case in contemplating the future of advanced AI – things may be more complicated. And that’s where red teams come into the picture.

Here’s a definition of a “red team” from Wikipedia:

A red team is a group that pretends to be an enemy, attempts a physical or digital intrusion against an organization at the direction of that organization, then reports back so that the organization can improve their defenses. Red teams work for the organization or are hired by the organization. Their work is legal, but can surprise some employees who may not know that red teaming is occurring, or who may be deceived by the red team.

The idea is well-known. In my days in the mobile computing industry, at Psion and Symbian, ad hoc or informal red teams often operated, trying to find flaws in our products before they were released into the hands of partners and customers.

Google have written about their own “AI Red Team: the ethical hackers making AI safer”:

Google Red Team consists of a team of hackers that simulate a variety of adversaries, ranging from nation states and well-known Advanced Persistent Threat (APT) groups to hacktivists, individual criminals or even malicious insiders. The term came from the military, and described activities where a designated team would play an adversarial role (the “Red Team”) against the “home” team.

As Google point out, a red team is more effective if it takes advantage of knowledge about potential security issues and attack vectors:

Over the past decade, we’ve evolved our approach to translate the concept of red teaming to the latest innovations in technology, including AI. The AI Red Team is closely aligned with traditional red teams, but also has the necessary AI subject matter expertise to carry out complex technical attacks on AI systems. To ensure that they are simulating realistic adversary activities, our team leverages the latest insights from world class Google Threat Intelligence teams like Mandiant and the Threat Analysis Group (TAG), content abuse red teaming in Trust & Safety, and research into the latest attacks from Google DeepMind.

Here’s my first question – a gentle warm-up question. Do people agree that companies and organisations that develop advanced AI systems should use something like red teams to test their own products before they are released?

But the next question is the one I wish to highlight. What limits (if any) should be put on what a red team can do?

The concern is that a piece of test malware may in some cases turn out to be more dangerous than the red team foresaw.

For example, rather than just probing the limits of an isolated AI system in a pre-release environment, could test malware inadvertently tunnel its way out of its supposed bounding perimeter, and cause havoc more widely?

Oops. We didn’t intend our test malware to be that clever.
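To make the idea of a “bounding perimeter” a little more concrete, here is a minimal sketch of what such a perimeter might look like if expressed as a deny-by-default policy in code. Everything here (the class name, the fields, the defaults) is hypothetical, invented purely for illustration, and not drawn from any actual lab’s red-team tooling.

from dataclasses import dataclass

# Hypothetical and purely illustrative: a deny-by-default perimeter policy
# for red-team test runs. Nothing is permitted unless it has been explicitly
# reviewed and added to an allowlist.

@dataclass(frozen=True)
class TestPerimeter:
    allowed_hosts: frozenset = frozenset()    # no network egress by default
    allowed_paths: frozenset = frozenset()    # no filesystem writes by default
    allow_subprocesses: bool = False          # no spawning of new processes
    max_runtime_seconds: int = 600            # hard cap on each test run

    def permits_egress(self, host: str) -> bool:
        # Unknown destinations are blocked outright, not "logged and allowed".
        return host in self.allowed_hosts

    def permits_write(self, path: str) -> bool:
        return any(path.startswith(prefix) for prefix in self.allowed_paths)

# A perimeter for a fully isolated test run: every capability denied.
ISOLATED = TestPerimeter()

print(ISOLATED.permits_egress("example.com"))     # False
print(ISOLATED.permits_write("/tmp/output.txt"))  # False

The point of the sketch is simply that a perimeter is only as strong as the completeness of the checks it enforces; the worry here is precisely about the capabilities that nobody thought to list.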

If that sounds hypothetical, consider the analogous question about gain-of-function research with biological pathogens. In that research, pathogens are given extra capabilities, in order to assess whether potential counter-measures could be applied quickly enough if a similar pathogen were to arise naturally. However, what if these specially engineered test pathogens somehow leak from laboratory isolation into the wider world? Understandably, that possibility has received considerable attention. Indeed, as Wikipedia reports, the United States imposed a three-year moratorium on gain-of-function research from 2014 to 2017:

From 2014 to 2017, the White House Office of Science and Technology Policy and the Department of Health and Human Services instituted a gain-of-function research moratorium and funding pause on any dual-use research into specific pandemic-potential pathogens (influenza, MERS, and SARS) while the regulatory environment and review process were reconsidered and overhauled. Under the moratorium, any laboratory who conducted such research would put their future funding (for any project, not just the indicated pathogens) in jeopardy. The NIH has said 18 studies were affected by the moratorium.

The moratorium was a response to laboratory biosecurity incidents that occurred in 2014, including not properly inactivating anthrax samples, the discovery of unlogged smallpox samples, and injecting a chicken with the wrong strain of influenza. These incidents were not related to gain-of-function research. One of the goals of the moratorium was to reduce the handling of dangerous pathogens by all laboratories until safety procedures were evaluated and improved.

Subsequently, symposia and expert panels were convened by the National Science Advisory Board for Biosecurity (NSABB) and National Research Council (NRC). In May 2016, the NSABB published “Recommendations for the Evaluation and Oversight of Proposed Gain-of-Function Research”. On 9 January 2017, the HHS published the “Recommended Policy Guidance for Departmental Development of Review Mechanisms for Potential Pandemic Pathogen Care and Oversight” (P3CO). This report sets out how “pandemic potential pathogens” should be regulated, funded, stored, and researched to minimize threats to public health and safety.

On 19 December 2017, the NIH lifted the moratorium because gain-of-function research was deemed “important in helping us identify, understand, and develop strategies and effective countermeasures against rapidly evolving pathogens that pose a threat to public health.”

As for potential accidental leaks of biological pathogens engineered with extra capabilities, so also for potential accidental leaks of AI malware engineered with extra capabilities. In both cases, unforeseen circumstances could lead to these extra capabilities running amok in the wider world.

Especially in the case of AI systems which are already only incompletely understood, and where new properties appear to emerge in new circumstances, who can be sure what outcomes may arise?

One counter is that the red teams will surely be careful in the policing of the perimeters they set up to confine their tests. But can we be sure they have thought through every possibility? Or maybe a simple careless press of the wrong button – a mistyped parameter or an incomplete prompt – would temporarily open a hole in the perimeter. The test AI malware would be jailbroken, and would now be real-world AI malware – potentially evading all attempts to track it and shut it down.

Oops.
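To make the “mistyped parameter” failure mode concrete, here is another purely hypothetical sketch, with invented profile names, showing how a single typo can silently select a permissive configuration when a lookup falls back to a lenient default instead of failing loudly.

# Hypothetical and purely illustrative: two perimeter profiles for test runs.
PROFILES = {
    "isolated": {"network": "none", "filesystem": "read-only"},
    "permissive": {"network": "open", "filesystem": "read-write"},
}

def start_test_run(profile_name: str) -> dict:
    # Dangerous pattern: a typo such as "isolatd" silently falls back to the
    # permissive profile, and the test malware runs with an open network.
    return PROFILES.get(profile_name, PROFILES["permissive"])

def start_test_run_strictly(profile_name: str) -> dict:
    # Safer pattern: an unrecognised profile name halts the run instead of
    # quietly choosing a default.
    if profile_name not in PROFILES:
        raise ValueError(f"Unknown perimeter profile: {profile_name!r}")
    return PROFILES[profile_name]

print(start_test_run("isolatd"))  # typo goes unnoticed; the perimeter is open
try:
    start_test_run_strictly("isolatd")
except ValueError as err:
    print(err)  # the same typo is caught immediately

Real red-team tooling would of course be far more elaborate than this, but the underlying point stands: a fail-open default turns a one-character mistake into a hole in the perimeter, whereas a fail-closed default turns it into a stopped test.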

My final question (for now) is: if it is agreed that constraints should be applied to how red teams operate, how will these constraints be overseen?

Postscript – for some additional scenarios involving the future of AI safety, take a look at my article “Cautionary Tales And A Ray Of Hope”.

7 February 2022

Options for controlling artificial superintelligence

What are the best options for controlling artificial superintelligence?

Should we confine it in some kind of box (or simulation), to prevent it from roaming freely over the Internet?

Should we hard-wire into its programming a deep respect for humanity?

Should we prevent it from having any sense of agency or ambition?

Should we ensure that, before it takes any action, it always double-checks its plans with human overseers?

Should we create dedicated “narrow” intelligence monitoring systems, to keep a vigilant eye on it?

Should we build in a self-destruct mechanism, just in case it stops responding to human requests?

Should we insist that it shares its greater intelligence with its human overseers (in effect turning them into cyborgs), to avoid humanity being left behind?

More drastically, should we simply prevent any such systems from coming into existence, by forbidding any research that could lead to artificial superintelligence?

Alternatively, should we give up on any attempt at control, and trust that the superintelligence will be thoughtful enough to always “do the right thing”?

Or is there a better solution?

If you have clear views on this question, I’d like to hear from you.

I’m looking for speakers for a forthcoming London Futurists online webinar dedicated to this topic.

I envision three speakers each taking up to 15 minutes to set out their proposals. Once all the proposals are on the table, the real discussion will begin – with the speakers interacting with each other, and responding to questions raised by the live audience.

The date for this event remains to be determined. I will find a date that is suitable for the speakers who have the most interesting ideas to present.

As I said, please get in touch if you have questions or suggestions about this event.

Image credit: the above graphic includes work by Pixabay user Geralt.

PS For some background, here’s a video recording of the London Futurists event from last Saturday, in which Roman Yampolskiy gave several reasons why control of artificial superintelligence will be deeply difficult.

For other useful background material, see the videos on the Singularity page of the Vital Syllabus project.
