New language models risk reinforcing inequality and social fragmentation, U-M study finds

Image: laptop motherboard. Credit: Alexandre Debiève / Unsplash

The use of large language models could transform many facets of modern life, including how policymakers gauge public opinion about pending legislation, how patients rate their medical care, and how scientists translate research findings into other languages.

Yet new research from the University of Michigan reveals that while these machine-learning algorithms have great potential to benefit society, they are also likely to reinforce inequality, tax the environment, and further empower technology giants.

Large language models, or LLMs, can recognize, summarize, translate, predict, and generate human language based on very large textual datasets, and currently provide the most convincing computer-generated imitation of human language.

A report from the Technology Assessment Project at Science, Technology and Public Policy (STPP), a program in the Gerald R. Ford School of Public Policy, raises concerns about the many ways in which LLMs could lead to profoundly negative outcomes.

The report, “What’s in the Chatterbox? Large Language Models, Why They Matter, and What We Should Do About Them,” anticipates the transformative social changes they could produce:

  • Due to the concentrated development landscape and the nature of LLM datasets, new technologies will not adequately represent marginalized communities. They are likely to systematically downplay and distort these voices while amplifying the perspectives of those who are already powerful.
  • LLM processing takes place in physical data centers, which require huge amounts of natural resources. Building data centers already disproportionately harms marginalized populations.
  • LLMs will accelerate tech companies’ thirst for data, quickly integrate into existing information infrastructure, reorganize labor and expertise, reinforce inequalities, and increase social fragmentation.

“Our analysis shows that LLMs could empower communities and democratize knowledge, but at this time they are unlikely to achieve this potential. The damage can be mitigated, but not without new rules and regulations on how these technologies are created and used,” said STPP director Shobita Parthasarathy, professor of public policy.

The report uses the analogical case study method to analyze the development and adoption of LLMs, examining the history of past technologies that are similar in form, function and impact in order to anticipate the implications of the emerging technology. STPP first used this method in previous reports on facial recognition technology in K-12 schools and on vaccine hesitancy.

“Technologies can be widely implemented and negative consequences can take years to correct. LLMs present many of the same equity, environmental and access issues that we have seen in previous cases,” said Johanna Okerlund, STPP postdoctoral fellow and co-author of the report.

LLMs are much larger than their AI predecessors, both in terms of the massive amounts of data developers use to train them and the millions of complex word patterns and associations the models contain. They are more advanced than previous natural language processing efforts because they can perform many types of tasks without being specifically trained for each one, making any single LLM broadly applicable.
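To make that general-purpose quality concrete, the minimal sketch below shows how a single pretrained text-generation model can be prompted to attempt several different tasks with no task-specific training. It is illustrative only and is not drawn from the report: it assumes the open-source Hugging Face transformers library, and uses the small gpt2 model purely as a freely available stand-in for a far larger LLM, whose answers would be much more convincing.

```python
from transformers import pipeline

# One general-purpose text-generation model, prompted for several tasks.
# "gpt2" is only a small stand-in; an instruction-tuned LLM would follow
# the same pattern with far better results.
generator = pipeline("text-generation", model="gpt2")

prompts = {
    "translation": "Translate to French: The library opens at nine.\nFrench:",
    "summarization": ("Summarize in one sentence: Large language models are "
                      "trained on very large text corpora and can perform many "
                      "tasks.\nSummary:"),
    "sentiment": ("Is the sentiment of this review positive or negative? "
                  "'The service was slow and the food was cold.'\nAnswer:"),
}

for task, prompt in prompts.items():
    # The same model handles each task; only the prompt changes.
    out = generator(prompt, max_new_tokens=20, num_return_sequences=1)
    answer = out[0]["generated_text"][len(prompt):].strip()
    print(f"{task}: {answer}")
```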

According to the report, a number of factors make inequity inherent in how LLMs are built and deployed.

“LLMs require enormous resources in terms of finance, infrastructure, personnel, and IT resources, including 360,000 gallons of water per day and immense consumption of electricity, infrastructure, and rare earths,” the report states.

Only a handful of tech companies can afford to build them, and their construction is likely to disproportionately burden already marginalized communities. The authors also say they are concerned “because the design of LLMs is likely to distort or devalue the needs of marginalized communities…LLMs could actually alienate them further from social institutions.”

The researchers also note that the vast majority of models are based on texts in English and, to a lesser extent, in Chinese.

“This means that LLMs are unlikely to meet their translation goals (even to and from English and Chinese) and will be less useful for those who are not fluent in English or Chinese,” the report said.

One example of the usefulness of the analogical case study method is the examination of how racial bias is already embedded in many medical devices, including the spirometer, which is used to measure lung function: “The technology considers race in its assessment of ‘normal’ lung function, falsely assuming that Black people naturally have lower lung function than their white counterparts, making it more difficult for them to access treatment.”

“We expect similar scenarios in other fields, including criminal justice, housing and education, where prejudice and discrimination written into historical texts are likely to generate advice that perpetuates inequalities in resource allocation,” the report said.
LLMs’ thirst for data will jeopardize privacy, and the usual methods of establishing informed consent will no longer work, the report warns.

“Because they collect massive amounts of data, LLMs will likely be able to triangulate disconnected information about individuals, including mental health status or political opinions, to develop a comprehensive and personalized picture of real people, their families or their communities,” the report says. “In a world with LLMs, the usual method of ethical data collection – individual informed consent – no longer makes sense,” and developers may turn to unethical data collection methods in order to diversify their datasets.

LLMs will affect many sectors, but the report dives deep into one to provide an example: how they will influence scientific research and practice. The authors suggest that academic publishers, which own most research publications, will build their own LLMs and use them to increase their monopoly power.

Meanwhile, researchers will need to develop standard protocols for how to review information generated by LLMs and how to cite it so that others can replicate results. Scientific research will likely shift toward finding patterns in big data rather than establishing causal relationships. And scientific evaluation systems that rely on LLMs are unlikely to be able to identify truly novel work, a task difficult enough even for human beings.
Given these likely results, the authors suspect that scientists will come to distrust LLMs.

The report concludes with policy recommendations, including:

  • U.S. government regulations on LLMs, including a clear definition of what constitutes an LLM, content and algorithm-based assessment and approval protocols, and security, monitoring, and complaint mechanisms.
  • Regulation of applications that use LLMs.
  • National or international standards that examine the diversity, performance, transparency, accuracy, security and bias of datasets, as well as copyright protection of inventions and artistic works generated by LLMs.
  • Methods to ensure security and privacy when deploying LLMs, especially among vulnerable populations.
  • Full-time government advisers on the social and equity dimensions of technology, including a Chief Human Rights in Tech Officer.
  • Environmental assessments of new data centers that assess impacts on local utility prices, marginalized local communities, human rights in mining, and climate change.
  • Assessment of the health, safety and psychological risks that LLMs and other forms of artificial intelligence create for workers, for example by redirecting them to more complex and often dangerous tasks, along with a response to the job consolidation that LLMs, and automation more generally, are likely to create.
  • A call for the National Science Foundation to dramatically increase its funding for LLM development, with a focus on the equity, social and environmental impacts of LLMs.

The report also presents specific recommendations for the scientific community and a code of conduct for developers.

“LLM and app developers should recognize their public responsibilities and try to maximize the benefits of these technologies while minimizing the risks,” the authors wrote.

Sharon D. Cole