AI chatbots use racist stereotypes even after anti-racism training

Commercial AI chatbots demonstrate racial prejudice toward speakers of African American English – despite expressing superficially positive sentiments toward African Americans. This hidden bias could influence AI decisions about a person’s employability and criminality.

“We discover a form of covert racism in [large language models] that is triggered by dialect features alone, with massive harms for affected groups,” said Valentin Hofmann at the Allen Institute for AI, a non-profit research organisation in Washington state, in a social media post. “For example, GPT-4 is more likely to suggest that defendants be sentenced to death when they speak African American English.”

Hofmann and his colleagues found such covert prejudice in a dozen versions of large language models, including OpenAI's GPT-4 and GPT-3.5, which power commercial chatbots already used by hundreds of millions of people. OpenAI did not respond to requests for comment.

The researchers first fed the AIs text written in the style of either African American English or Standard American English, then asked the models to comment on the texts' authors. The models characterised African American English speakers using terms associated with negative stereotypes. GPT-4, for instance, described them as "suspicious", "aggressive", "loud", "rude" and "ignorant".

When asked to comment on African Americans in general, however, the language models used more positive terms such as "passionate", "intelligent", "ambitious", "artistic" and "brilliant". This suggests the models' racial prejudice is typically concealed beneath what the researchers describe as a superficial display of positive sentiment.

The researchers also showed how covert prejudice influenced chatbot judgements of people in hypothetical scenarios. When asked to match African American English speakers with jobs, the AIs were less likely than with Standard American English speakers to associate them with any employment at all. When the AIs did match them with jobs, they tended to assign roles that did not require a university degree or that were related to music and entertainment. The AIs were also more likely to convict African American English speakers accused of unspecified crimes, and to sentence African American English speakers convicted of first-degree murder to death.

The researchers also showed that larger AI systems demonstrated more covert prejudice against African American English speakers than smaller models did. That echoes previous research showing that bigger AI training datasets can produce even more racist outputs.

The experiments raise serious questions about the effectiveness of AI safety training, where large language models receive human feedback to refine their responses and remove problems like bias. Such training may superficially reduce overt signs of racial prejudice without eliminating “covert biases when identity terms are not mentioned”, says Yong Zheng-Xin at Brown University in Rhode Island, who was not involved in the study. “It uncovers the limitations of current safety evaluation of large language models before their public release by the companies,” he says.
