Mary Wollstonecraft Shelley’s Frankenstein; or, The Modern Prometheus [Shelley, 1818] is the first true science fiction novel. It can be read as a morality tale, as signaled by Shelley’s alternate title, The Modern Prometheus. According to ancient Greek mythology, Prometheus stole fire from the gods and gave it to humanity. Zeus punished that theft of technology and knowledge by sentencing Prometheus to eternal torment. Dr. Frankenstein’s creature attempted to assimilate into human society by learning human customs and language, but humankind rejected him and misinterpreted his genuine acts of kindness. That rejection, and the loneliness of having no companion, led him to choose revenge. Frankenstein’s monster has now come to symbolize unbridled, uncontrolled technology turning against humans.
Concerns about the control of technology are increasingly urgent as AI transforms our world. Discussions of so-called artificial general intelligence (AGI) envisage systems that outperform humans on a wide range of tasks, unlike so-called “narrow” AI, in which systems are developed and trained for specific tasks. Some believe that AGI may lead to a singularity, when an AGI bootstraps itself into a superintelligence that could dominate humans [Good, 1965]. Or, as Bostrom [2014] hypothesized, an imagined AGI system, given a goal that includes maximizing the number of paperclips in the universe, could consume every resource available to it, including those required by humans. This seemingly absurd thought experiment purports to show that an apparently innocuous AGI, lacking common sense, could pose an existential threat to humanity if its goals are misspecified or otherwise not aligned with the long-term survival of humans and the natural environment. This safety concern has come to be known as the alignment problem [Christian, 2020].
A more immediate threat is that AI systems, such as self-driving cars and lethal autonomous weapons, may make life-or-death decisions without meaningful human oversight. Less dramatically, AI systems may make harmful, even if not life-threatening, value-laden decisions impinging on human welfare, such as deciding who should get a mortgage or a job offer. This has given rise to a focus on autonomy and human control. How can designers create human-centred AI or human-compatible AI? Can human values be instilled in AI systems? These questions are examined by Russell [2019], Marcus and Davis [2019], and Shneiderman [2022]. One proposed technique for incorporating human values is Reinforcement Learning from Human Feedback (RLHF) [Knox and Stone, 2009]. RLHF is the framework for a key module of ChatGPT [OpenAI, 2022].
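To make the idea concrete, the core of RLHF is a reward model fitted to human preference judgments: given pairs of responses where a human marked one as preferred, the model is trained so that preferred responses score higher, and that learned reward then guides reinforcement learning of the policy. The following is a minimal sketch of the reward-modelling step only, under simplifying assumptions: a hypothetical linear reward over toy feature vectors and synthetic preferences, rather than a neural network over text.

```python
# Minimal sketch of the reward-modelling step of RLHF.
# Assumptions: a linear reward over hand-crafted features and synthetic
# preference pairs; real systems use large neural models over text.
import numpy as np

rng = np.random.default_rng(0)

def reward(w, x):
    """Scalar reward assigned to a response with feature vector x."""
    return w @ x

def preference_loss_grad(w, x_preferred, x_rejected):
    """Gradient of -log sigmoid(r_preferred - r_rejected) (Bradley-Terry loss)."""
    diff = reward(w, x_preferred) - reward(w, x_rejected)
    p = 1.0 / (1.0 + np.exp(-diff))        # P(preferred beats rejected)
    return -(1.0 - p) * (x_preferred - x_rejected)

# Synthetic data: the "human" prefers responses with a larger first feature.
dim = 3
pairs = []
for _ in range(200):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    pairs.append((a, b) if a[0] > b[0] else (b, a))

w = np.zeros(dim)
for _ in range(50):                        # simple batch gradient descent
    grad = sum(preference_loss_grad(w, xp, xr) for xp, xr in pairs) / len(pairs)
    w -= 0.5 * grad

print("learned reward weights:", np.round(w, 2))  # weight on feature 0 dominates
```

In a full RLHF pipeline, this learned reward would stand in for a hand-specified reward function when fine-tuning the policy with reinforcement learning, which is how human preferences are (imperfectly) incorporated.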
Increasingly, especially in high-stakes applications, human decision-makers are assisted by semi-autonomous agents; this combination is known as human-in-the-loop. As shown in Chapter 2, intelligent systems are often structured as a hierarchy of controllers, with the lower levels operating very quickly, on short time horizons, while the higher levels operate slowly, on longer time horizons and more symbolic data. Human interaction with hierarchically structured systems typically occurs at the higher levels. Human drivers cannot meaningfully modify the anti-lock braking systems on a car in real time, but they can provide high-level navigation preferences or directions. Humans can steer or brake to avoid accidents, but only if they are paying attention; however, as vehicles become more automated, the driver may well be distracted, or asleep, and unable to redirect their attention in time.
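As an illustration of where the human sits in such a hierarchy, the following sketch uses hypothetical names and a toy proportional controller (it is not the book's controller code): the human interacts with the slow upper layer by setting a target, while the fast lower layer issues many low-level commands for each human interaction.

```python
# Sketch of a two-layer controller hierarchy with a human at the top layer.
# Hypothetical names and toy dynamics, for illustration only.
from dataclasses import dataclass

@dataclass
class HighLevelController:
    """Slow loop: accepts human preferences and outputs a setpoint."""
    target: float = 0.0          # e.g., desired lane-centre offset in metres

    def set_human_preference(self, target: float) -> None:
        self.target = target     # humans interact here, at the higher level

@dataclass
class LowLevelController:
    """Fast loop: proportional control toward the current setpoint."""
    gain: float = 0.5

    def command(self, measurement: float, setpoint: float) -> float:
        return self.gain * (setpoint - measurement)

high, low = HighLevelController(), LowLevelController()
high.set_human_preference(1.0)   # one high-level human direction
position = 0.0
for step in range(10):           # the fast loop runs many times per human input
    position += low.command(position, high.target)
print(round(position, 3))        # converges toward the human-chosen target
```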
The concept of attention in neural networks is inspired by the concept of human attention. Concepts directly related to human attention include vigilance, the state of keeping careful watch for possible danger, and salience, the quality of being particularly noticeable or important. Designing AI systems so that humans can meaningfully interact with them requires designers who understand the economic, social, psychological, and ethical roles of vigilance, salience, and attention. Early research on human attention and vigilance is reported by N. H. Mackworth [1948] and J. F. Mackworth [1970]. Mole [2010] presents a philosophical theory of attention. The papers collected in Archer [2022] show how issues concerning salience, attention, and ethics intersect.
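For readers unfamiliar with the neural-network notion referred to above, attention there is a concrete computation: each query gives more weight to the keys it finds most salient. The following is a minimal numpy sketch of single-head scaled dot-product attention; the random matrices and shapes are illustrative assumptions, not part of any particular system.

```python
# Toy single-head scaled dot-product attention (illustrative sketch).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query is most influenced by the keys it attends to most strongly."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(2, 4))   # 2 queries
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
output, weights = scaled_dot_product_attention(Q, K, V)
print(np.round(weights, 2))   # each row sums to 1: how strongly each query attends
```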
Designers of interactive AI systems must be well versed in the principles and practices of both human–computer interaction (HCI) [Rogers et al., 2023] and AI. Good design can go a long way toward creating trustworthy AI systems. For example, the “Guidelines for Human–AI Interaction” by Amershi et al. [2019] recommend strategies such as having the system do less when it is uncertain, to reduce the costs and consequences of incorrect predictions.
Assistive technology for disabled and aging populations is being pioneered by many researchers and companies. Assisted cognition, including memory prompts, is one application. Assisted perception and assisted action, in the form of smart wheelchairs, companions for older people, and nurses’ assistants in long-term care facilities, are beneficial technologies. Assistive technology systems are described by Pollack [2005], Liu et al. [2006], and Yang and Mackworth [2007]. Semi-autonomous smart wheelchairs are discussed by Mihailidis et al. [2007] and Viswanathan et al. [2011]. However, Sharkey [2008] and Shneiderman [2022] warn of some dangers of relying upon robotic assistants as companions for the elderly and the very young. As with autonomous vehicles, researchers must ask cogent questions about the development and use of their creations. Researchers and developers of assistive technology, and other AI applications, should be aware of the dictum of the disability rights movement presented by Charlton [1998], “Nothing about us without us.”
A plethora of concepts are used to evaluate AI systems from a human perspective, including transparency, interpretability, explainability, fairness, safety, accountability, and trustworthiness. They are useful concepts, but they have multiple, overlapping, shifting, and contested meanings. Transparency typically refers to the complete ecosystem surrounding an AI application, including the description of the training data, the testing and certification of the application, and user privacy concerns. But transparency is also used to describe an AI system whose outcomes can be interpreted or explained, where humans can understand the models used and the reasons behind a particular decision. Black-box AI systems, based, say, on deep learning, are not transparent in that sense. Systems that have some understanding of how the world works, using causal models, may be better able to provide explanations. See, for example, the treatment of explainable human–AI interaction from a planning perspective by Sreedharan et al. [2022]. Enhancements in explainability may make an application more trustworthy, as Russell [2019] suggests.
Enhanced transparency, interpretability, and fairness may also improve trustworthiness. Interpretability is useful for developers to evaluate, debug, and mitigate issues. However, the evidence that it is always useful for end users is less convincing. Understanding the reasons behind predictions and actions is the subject of explainable AI. It might seem obvious that it is better if a system can explain its conclusions. However, a system that can explain an incorrect conclusion, particularly if the explanation is approximate, might do more harm than good. Bansal et al. [2021] show that “Explanations increased the chance that humans will accept the AI’s recommendation, regardless of its correctness.”
As discussed in Section 7.7, models built by computational systems are open to probing in ways that humans are not. Probing and testing cannot, however, cover all rare events, or corner cases, in real-world domains. Verification of systems, proving that their behaviors must always satisfy a formal specification that includes explicit safety and goal constraints, could make them more trusted [Mackworth and Zhang, 2003]. Semi-autonomous systems that interact and collaborate with humans on an ongoing basis can become more trusted if they prove to be reliable; however, that trust may prove to be misplaced for corner cases. The role of explicit utilities in open and accountable group decision making is described in Section 12.6. Concerns about real-world deployment of reinforcement learning are outlined in Section 13.10. Trust has to be earned.