Develop algorithms that make decisions aligned with human experts

Military operations – from combat to medical triage to disaster relief – require complex, rapid decision-making in dynamic situations where there is often no single right answer. Two seasoned military leaders presented with the same battlefield scenario, for example, may make different tactical decisions when confronted with difficult options. As AI systems become more advanced partners to humans, it is essential to build appropriate human confidence in AI’s ability to make sound decisions. Capturing the key characteristics underlying expert human decision-making in dynamic contexts, and representing that data computationally in algorithmic decision-makers, can be a critical part of ensuring that algorithms make reliable choices under difficult circumstances.

DARPA has announced the In the Moment (ITM) program, which aims to quantify how well algorithms align with trusted human decision-makers in difficult domains where there is no agreed-upon right answer, and to evaluate and build trusted algorithmic decision-makers for critical Department of Defense (DoD) operations.

“ITM is different from typical AI development approaches that require human agreement on the right outcomes,” said Matt Turek, ITM program manager. “The lack of a right answer in difficult scenarios prevents us from using conventional AI assessment techniques, which implicitly require human agreement to create ground-truth data.”

To illustrate, self-driving car algorithms can be trained against ground truth for right and wrong driving responses, because road signs and rules of the road do not change. A feasible approach in such settings is to hard-code the risk values into the simulation environment used to train the self-driving car algorithms.
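As a concrete illustration of that hard-coding approach, here is a minimal Python sketch; the hazard names and penalty values are hypothetical, chosen only to show how fixed, scenario-independent risk costs could be baked into a training simulator.

```python
# Hypothetical fixed risk table for a driving simulator. Because road
# signs and rules of the road do not change, these costs can be written
# once and reused unchanged across every training scenario.
DRIVING_RISK_COSTS = {
    "run_red_light": 1.0,      # always a serious violation
    "cross_center_line": 0.8,
    "exceed_speed_limit": 0.5,
    "normal_driving": 0.0,
}

def risk_of(action: str) -> float:
    """Look up the fixed, scenario-independent cost of an action."""
    return DRIVING_RISK_COSTS[action]
```

It is exactly this fixed table that has no analogue in combat, where the cost of the same action shifts with context and commander’s intent.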

“Building in uniform risk values won’t work from a DoD perspective because combat situations change rapidly and commander’s intent changes from scenario to scenario,” Turek said. “The DoD needs rigorous, quantifiable, and scalable approaches to evaluating and building algorithmic systems for difficult decision-making when objective ground truth is not available. Difficult decisions are those in which trusted decision-makers disagree, no right answer exists, and uncertainty, time pressure, and conflicting values create significant decision challenges.”

ITM draws inspiration from the field of medical imaging analysis, where techniques have been developed to evaluate systems even when trained experts disagree on the ground truth. For example, the boundary of an organ or pathology may be blurred or disputed between radiologists. To compensate for the absence of a true boundary, an algorithmically drawn boundary is compared against the distribution of boundaries drawn by human experts. If the algorithm’s boundary lies within that distribution over many trials, the algorithm is said to be comparable to human performance.
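That comparison can be made concrete. The Python sketch below assumes boundaries are stored as binary masks and uses the standard Dice overlap as the agreement measure; the pass criterion – that the algorithm must agree with the experts at least as well as the experts agree with one another in the worst case – is one illustrative reading of “lies within the distribution,” not ITM’s actual metric.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between two binary masks (1.0 = identical)."""
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def within_expert_distribution(algo_mask, expert_masks) -> bool:
    """Score the algorithm against each expert, and the experts against
    one another; pass if the algorithm's mean agreement with the experts
    is no worse than the lowest inter-expert agreement. Requires at
    least two expert masks."""
    algo_scores = [dice(algo_mask, e) for e in expert_masks]
    expert_scores = [dice(e1, e2)
                     for i, e1 in enumerate(expert_masks)
                     for e2 in expert_masks[i + 1:]]
    return float(np.mean(algo_scores)) >= min(expert_scores)
```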

“Building on insights from medical imaging, ITM will develop a quantitative framework to assess algorithmic decision-making in very challenging areas,” Turek said. “We will create realistic and challenging decision-making scenarios that elicit responses from trusted humans to capture a distribution of key decision-maker attributes. Next, we will subject a decision-making algorithm to the same challenging scenarios and map its responses to the reference distribution to compare it to trusted human decision-makers.”
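One simplified way to realize that mapping, assuming each decision-maker’s scenario responses can be summarized as a vector of attribute scores (e.g., risk tolerance): test whether the algorithm’s vector falls inside the trusted-human distribution. The z-score test below is an illustrative assumption, not ITM’s published method.

```python
import numpy as np

def fits_reference_distribution(algo_attrs: np.ndarray,
                                human_attrs: np.ndarray,
                                z_max: float = 2.0) -> bool:
    """human_attrs: (n_humans, n_attributes) scores from trusted humans.
    The algorithm maps into the reference distribution if each of its
    attribute scores lies within z_max standard deviations of the
    trusted-human mean for that attribute."""
    mu = human_attrs.mean(axis=0)
    sigma = human_attrs.std(axis=0)
    z = np.abs(algo_attrs - mu) / sigma   # assumes sigma > 0 everywhere
    return bool((z <= z_max).all())
```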

The program comprises four technical areas. The first develops decision-maker characterization techniques that identify and quantify key attributes of decision-making in difficult domains. The second produces a quantitative alignment score between a human decision-maker and an algorithm, in order to predict end-user trust. The third is responsible for designing and executing the program’s evaluation. The final technical area is responsible for policy and practice integration: providing legal, moral, and ethical expertise to the program; supporting the development of future DoD policy and concepts of operations (CONOPS); overseeing the development of an ethical operations process (DevEthOps); and conducting outreach to the broader policy community.
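For the second technical area, an alignment score could in principle be as simple as a distance-based similarity between the attribute profiles sketched above. The function below is an illustrative stand-in, not the program’s scoring method.

```python
import numpy as np

def alignment_score(human_attrs: np.ndarray, algo_attrs: np.ndarray) -> float:
    """Map the distance between two attribute profiles to (0, 1]:
    1.0 means identical profiles; the score decays toward 0 as the
    algorithm's profile drifts away from the human's."""
    return 1.0 / (1.0 + float(np.linalg.norm(human_attrs - algo_attrs)))
```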

ITM is a 3.5-year program comprising two phases, with a potential third phase dedicated to maturing the technology with a transition partner. The first phase lasts 24 months and focuses on small-unit triage as the decision-making scenario. The second phase lasts 18 months and increases the complexity of the decision-making by focusing on mass-casualty events.

To evaluate the complete ITM process, multiple human and algorithmic decision-makers will be presented with scenarios from medical triage (Phase 1) or mass-casualty events (Phase 2). The algorithmic decision-makers will include an aligned decision-maker with knowledge of the key human decision-making attributes and a baseline decision-maker with no knowledge of those attributes. A human triage professional will also be included as an experimental control.

“We’re going to collect the decisions, the responses from each of those decision-makers, and present them in blinded fashion to multiple triage professionals,” Turek said. “Those triage professionals won’t know whether a response comes from the aligned algorithm, the baseline algorithm, or a human. And one question we might ask them is which decision-maker they would delegate to, giving us a measure of their willingness to trust those particular decision-makers.”
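A rough Python sketch of that blinded protocol follows; the data structures, the judge callable, and the shuffling step are all assumptions made for illustration, not ITM’s actual study design.

```python
import random
from collections import Counter

def blinded_delegation_trial(responses: dict[str, str], judge) -> str:
    """responses maps hidden source labels ('aligned', 'baseline',
    'human') to their decisions for one scenario. The judge sees only
    the decisions, in random order, and returns the index of the one
    they would delegate to; we record which source won their trust."""
    sources = list(responses)
    random.shuffle(sources)            # blind the presentation order
    choice = judge([responses[s] for s in sources])
    return sources[choice]

def trust_tally(winners: list[str]) -> Counter:
    """Tally delegations across many scenarios and judges, yielding a
    willingness-to-trust count per decision maker."""
    return Counter(winners)
```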

Sharon D. Cole