How bias can seep into healthcare algorithms and data

This story originally appeared in our July/August 2022 issue under the title “Ghosts in the Machine.”


If a heart attack is not documented, did it really happen? For an artificial intelligence program, the answer may very well be “no.” Each year, an estimated 170,000 people in the United States experience asymptomatic, or “silent,” heart attacks. During these events, patients likely have no idea that a blockage is preventing blood from flowing or that vital tissue is dying. They won’t feel any chest pain, dizziness or difficulty breathing. They don’t turn beet red or crumple to the floor. Instead, they may just feel a little tired, or have no symptoms at all. But even if the patient is unaware of what has happened, the underlying damage can be severe and long-lasting: people who suffer silent heart attacks are at higher risk of coronary heart disease and stroke, and are more likely to die within 10 years.

But if a doctor doesn’t diagnose this attack, it won’t be included in the patient’s electronic health records. This omission can have dangerous consequences. AI systems are trained on health records, sifting through troves of data to study how doctors treated past patients and make predictions that can inform decisions about future care. “That’s what makes a lot of medical AI very difficult,” says Ziad Obermeyer, an associate professor at the University of California, Berkeley, who studies machine learning, medicine, and health policy. “We hardly ever observe the thing that really matters to us.”


The problem lies in the data – or rather, in what isn’t in the data. Electronic health records capture only what doctors and nurses notice. If they can’t see a problem, even one as serious as a heart attack, the AI can’t see it either. Likewise, physicians may unwittingly encode their own racial, gender, or socioeconomic biases into the system. This can lead to algorithms that prioritize certain demographics over others, entrench inequities, and fail to deliver on the promise that AI can help provide better care.

One such problem is that medical records only contain information about patients who have access to the medical system and can afford to see a doctor. “Datasets that do not sufficiently represent certain groups — whether racial groups, gender for certain diseases, or rare diseases themselves — can produce algorithms that are biased against those groups,” says Curtis Langlotz, a radiologist and director of the Center for Artificial Intelligence in Medicine and Imaging at Stanford University.

Beyond that, diagnoses can reflect a doctor’s preconceptions and ideas – about, say, what might be behind a patient’s chronic pain – as much as they reflect the reality of what is actually happening. “The dirty secret of a lot of AI tools is that a lot of things that look like biological variables that we predict are actually just someone’s opinion,” Obermeyer says. That means that instead of helping physicians make better decisions, these tools often perpetuate the very inequities they are meant to help avoid.

Decoding bias

When scientists train algorithms to drive a car, they know what is happening on the road. There is no debating whether there is a stop sign, a school zone or a pedestrian ahead. But in medicine, ground truth is often what the doctor says, not what actually happened. A chest X-ray may count as evidence of pneumonia because that’s what a doctor diagnosed and wrote in the health record, not because it’s necessarily the correct diagnosis. “These proxies are often skewed by financial, racial, and gender bias, and all sorts of other things that are social in nature,” Obermeyer says.

In a 2019 study, Obermeyer and his colleagues examined an algorithm developed by the health services company Optum. Hospitals use similar algorithms to predict which patients will need the most care, estimating the needs of more than 200 million people a year. But there is no simple variable for who will get sicker. Instead of predicting concrete health needs, Optum’s algorithm predicted which patients were likely to cost more, the logic being that the sickest people need more care and will therefore be more expensive to treat. For a variety of reasons, including income, access to care, and poor treatment from doctors, Black patients spend less on health care, on average, than their white counterparts. As a result, the study’s authors found, using cost as a proxy for health led the algorithm to systematically underestimate Black patients’ health needs.
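The mechanism is easy to reproduce on synthetic data. The sketch below is only an illustration, not Optum’s model or data: the groups, numbers, and features are invented. It trains a simple regression to predict cost and flags the highest-scoring patients for extra care; because one group incurs lower costs for the same underlying need, it ends up underrepresented among the flagged patients even though its needs are identical.

```python
# A rough, synthetic illustration of cost-as-proxy bias; not Optum's model or data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000

# Two groups with identical distributions of underlying health need,
# but group B generates lower costs for the same level of need
# (less access to care, less spent on its members' treatment, etc.).
group_b = rng.random(n) < 0.5
need = rng.gamma(shape=2.0, scale=1.0, size=n)              # "true" health need
cost = need * np.where(group_b, 0.6, 1.0) + rng.normal(0, 0.1, n)

# Features that genuinely track need (think diagnoses, lab values) plus noise.
features = np.column_stack([need + rng.normal(0, 0.3, n), rng.normal(0, 1, n)])

# The training label is cost, the proxy, rather than need itself.
model = LinearRegression().fit(features, cost)
risk_score = model.predict(features)

# Flag the top 10 percent of scores for a high-risk care program.
flagged = risk_score >= np.quantile(risk_score, 0.90)

print("Group B share of all patients:     %.2f" % group_b.mean())
print("Group B share of flagged patients: %.2f" % group_b[flagged].mean())
print("Mean true need among flagged, group A vs. B: %.2f vs. %.2f" % (
    need[flagged & ~group_b].mean(), need[flagged & group_b].mean()))
```

In this toy version, group B makes up roughly half of all patients but only a small fraction of those flagged, and the group B patients who do get flagged are, on average, sicker than their flagged group A counterparts – the same pattern the study reported for Black patients.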

Instead of reflecting reality, the algorithm mimicked racial biases and embedded them further into the healthcare system. “How can we make the algorithms do better than us?” asks Obermeyer. “And not just reflect our biases and mistakes?”

Moreover, determining the truth of a situation – whether a doctor erred because of poor judgment, racism or sexism, or simply got lucky – is not always straightforward, says Rayid Ghani, a professor in the machine learning department at Carnegie Mellon University. If a doctor runs a test and finds that a patient has diabetes, did they do a good job? Yes, they diagnosed the disease. But maybe they should have tested the patient sooner, or treated the patient’s rising blood sugar months earlier, before diabetes developed.

If that same test comes back negative, the calculus becomes even harder. Should the doctor have ordered the test in the first place, or was it a waste of resources? “You can only measure a late diagnosis if an early diagnosis didn’t happen,” says Ghani. Decisions about which tests to perform (or which patient complaints to take seriously) often end up reflecting clinicians’ biases rather than the best possible medical care. And if medical records encode those biases as facts, those biases will be replicated in the AI systems that learn from them, no matter how good the technology.

“If the AI uses the same data to train, it will have some of these inherent biases,” adds Ghani, “not because that’s what AI is, but because that’s what humans are, unfortunately.”

Fighting inequities

Used deliberately, however, this flaw in AI could be a powerful tool, says Kadija Ferryman, an anthropologist at Johns Hopkins University who studies bias in medicine. She points to a 2020 study in which AI is used as a resource to assess what the data show: a kind of diagnostic for bias. If an algorithm is less accurate for women and people with public insurance, for example, that is an indication that care is not being provided equitably. “Instead of AI being the end, AI is almost kind of the starting point to help us really understand bias in clinical spaces,” she says.
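One simple version of that diagnostic is to score a model separately for each patient group. The sketch below is a generic illustration of the idea, not code from the study; the column names, the toy data, and the helper function are assumptions made up for the example.

```python
# A minimal sketch of a subgroup accuracy audit; names and data are illustrative.
import pandas as pd
from sklearn.metrics import accuracy_score

def audit_by_subgroup(df: pd.DataFrame, group_cols: list[str]) -> pd.DataFrame:
    """Report a model's accuracy for each subgroup, given true and predicted labels."""
    rows = []
    for col in group_cols:
        for value, subset in df.groupby(col):
            rows.append({
                "group": f"{col}={value}",
                "n": len(subset),
                "accuracy": accuracy_score(subset["y_true"], subset["y_pred"]),
            })
    return pd.DataFrame(rows)

# Example usage with a toy evaluation set.
eval_df = pd.DataFrame({
    "sex":       ["F", "F", "M", "M", "F", "M", "F", "M"],
    "insurance": ["public", "private", "private", "public",
                  "public", "private", "public", "private"],
    "y_true":    [1, 0, 1, 1, 0, 0, 1, 0],
    "y_pred":    [0, 0, 1, 1, 1, 0, 0, 0],
})
print(audit_by_subgroup(eval_df, ["sex", "insurance"]))
# Large accuracy gaps between subgroups are a signal that the underlying
# data, or the care it records, treats those groups differently.
```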

In a 2021 study in Nature Medicine, researchers described an algorithm they developed to examine racial bias in the diagnosis of arthritic knee pain. Historically, Black and low-income patients have been much less likely to be referred for surgery, even though they often report much higher levels of pain than white patients. Doctors tended to attribute the phenomenon to psychological factors such as stress or social isolation, rather than to physiological causes. So instead of relying on radiologists’ diagnoses to predict the severity of a patient’s knee pain, the researchers trained the AI with a dataset that included knee X-rays and patients’ own descriptions of their discomfort.

Not only did the AI predict who was in pain more accurately than doctors did, it also showed that Black patients’ pain was not psychosomatic. Instead, the AI revealed that the problem lay in what radiologists think diseased knees should look like. Because our understanding of arthritis is rooted in research conducted almost exclusively on white populations, doctors may not recognize features of diseased knees that are more prevalent in Black patients.
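The key move is a change of training label: the model learns from what patients report rather than from what radiologists grade. The sketch below illustrates that swap on synthetic data; the arrays, the invented relationship between “images” and pain, and the simple ridge model are stand-ins, not the study’s actual images or architecture.

```python
# A synthetic illustration of swapping the training label; not the study's code.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, pixels = 2_000, 256                     # stand-in for small knee X-ray images

xrays = rng.normal(size=(n, pixels))
# Pretend the radiologist's grade depends on a narrow set of "textbook" features,
# while reported pain also reflects image features that textbook grading ignores.
radiologist_grade = xrays[:, :32].mean(axis=1)
patient_pain = xrays[:, :128].mean(axis=1)

def pain_correlation(labels: np.ndarray) -> float:
    """Train on `labels`; return correlation with held-out patient-reported pain."""
    idx_train, idx_test = train_test_split(np.arange(n), random_state=0)
    model = Ridge(alpha=1.0).fit(xrays[idx_train], labels[idx_train])
    preds = model.predict(xrays[idx_test])
    return float(np.corrcoef(preds, patient_pain[idx_test])[0, 1])

print("Trained on radiologist grades:   r = %.2f" % pain_correlation(radiologist_grade))
print("Trained on patient pain reports: r = %.2f" % pain_correlation(patient_pain))
```

Trained on the grades, the toy model tracks only the features radiologists already look for; trained on patients’ reports, it picks up the additional signal in the images – roughly the gap the real algorithm exposed in Black patients’ knees.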

It’s much harder to design AI systems, like the knee pain algorithm, that check or correct doctors’ biases rather than just mimic them – and doing so will take far more monitoring and testing than currently exists. But Obermeyer notes that, in some ways, correcting the biases in AI can happen much faster than correcting the biases in our systems, and in ourselves, that helped create these problems in the first place.

And building AIs that account for bias could be a promising step toward solving larger systemic problems. To change how a machine works, after all, all you need is a few keystrokes; changing the way people think takes a lot more than that.


An early prototype of Watson, seen here in 2011, was originally the size of a master bedroom. (Credit: Clockready/Wikimedia Commons)

IBM’s failed revolution

In 2011, IBM’s Watson computer annihilated its human competitors on the quiz show Jeopardy! Ken Jennings, the show’s highest-earning player of all time, lost by more than $50,000. “I, for one, welcome our new computer overlords,” he wrote on his response card in the final round.

But Watson’s reign was short-lived. One of the earliest – and most high-profile – attempts to use artificial intelligence in healthcare, Watson is now one of medical AI’s biggest failures. IBM spent billions to build a vast repository of patient information, insurance claims and medical images that Watson Health could (supposedly) mine to suggest new treatments, match patients to clinical trials and discover new drugs.

Despite Watson’s impressive database and all of IBM’s bluster, doctors complained that it rarely made helpful recommendations. The AI did not take into account regional differences in patient populations, access to care, or treatment protocols. Because its cancer data came exclusively from one hospital, for example, Watson for Oncology simply reflected the preferences and biases of the doctors practicing there.

In January 2022, IBM finally dismantled Watson Health, selling its most valuable data and analytics assets to the investment firm Francisco Partners. That failure hasn’t deterred other data giants like Google and Amazon from promoting their own AIs, promising systems that can do everything from transcribing notes to predicting kidney failure. For big tech companies experimenting with medical AI, the machine-powered doctor is still very much in.

Sharon D. Cole