Varieties of Artificial Moral Agency and the New Control Problem
Machine ethics is concerned with ensuring that artificially intelligent machines (AIs) act morally. One famous issue in the field, the control problem, concerns how to ensure human control over AI, as out-of-control AIs might pose existential risks, such as exterminating or enslaving us (Yampolskiy, 2020). A second, related issue — the alignment problem — is concerned more broadly with ensuring that AI goals are suitably aligned with our values (Gabriel, 2020). This paper presents a new trilemma with respect to resolving these problems. Section 1 outlines three possible types of artificial moral agents (AMAs):
Inhuman AMAs: AIs programmed to learn or execute moral rules or principles without understanding them in anything like the way that we do.
Better-Human AMAs: AIs programmed to learn, execute, and understand moral rules or principles somewhat like we do, but correcting for various sources of human moral error.
Human-Like AMAs: AIs programmed to understand and apply moral values in broadly the same way that we do, with a human-like moral psychology.
Sections 2-4 then argue that each type of AMA generates unique control and alignment problems that have not been fully appreciated. Section 2 argues that Inhuman AMAs are likely to behave in inhumane ways that pose serious existential risks. Section 3 then contends that Better-Human AMAs run a serious risk of magnifying some sources of human moral error by reducing or eliminating others. Section 4 then argues that Human-Like AMAs would not only likely reproduce human moral failures, but also plausibly be highly intelligent, conscious beings with interests and wills of their own who should therefore be entitled to moral rights and freedoms similar to our own (Schwitzgebel & Garza, 2020). This generates what I call the New Control Problem: ensuring that humans and Human-Like AMAs exert a morally appropriate amount of control over each other. Finally, Section 5 argues that resolving the New Control Problem would, at a minimum, plausibly require ensuring what Hume and Rawls term “circumstances of justice” between humans and Human-Like AMAs. But, I argue, there are grounds for thinking this will be profoundly difficult to achieve — indeed, far more difficult than the already-formidable problem of ensuring justice between humans — given the vast capability differences we can expect to exist between humans and Human-Like AMAs. I thus conclude on a skeptical note. Different approaches to developing “safe, ethical AI” generate subtly different control and alignment problems that we do not currently know how to adequately resolve, and which may or may not be surmountable. To determine whether they are, and if so how, AI ethicists and developers must pursue further careful work on the problems this paper presents.
Copyright (c) 2022 Marcus Arvan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.