Can artificial intelligence be made to tell the truth? Probably not, but the developers of large language model (LLM) chatbots should be legally required to reduce the risk of errors, says a team of ethicists.
“What we’re just trying to do is create an incentive structure to get the companies to put a greater emphasis on truth or accuracy when they are creating the systems,” says Brent Mittelstadt at the University of Oxford.
LLM chatbots, such as ChatGPT, generate human-like responses to users’ questions, based on statistical analysis of vast amounts of text. But although their answers usually appear convincing, they are also prone to errors – a flaw referred to as “hallucination”.
“We have these really, really impressive generative AI systems, but they get things wrong very frequently, and as far as we can understand the basic functioning of the systems, there’s no fundamental way to fix that,” says Mittelstadt.
This is a “very big problem” for LLM systems, he says, given that they are being rolled out in a variety of contexts, such as government decisions, where it is important that they produce factually correct, truthful answers and are honest about the limitations of their knowledge.
To address the problem, he and his colleagues propose a range of measures. They say large language models should react in a similar way to how people would when asked factual questions.
That means being honest about what you do and don’t know. “It’s about doing the necessary steps to actually be careful in what you are claiming,” says Mittelstadt. “If you are not sure about something, you’re not just going to make something up in order to be convincing. Rather, you would say, ‘Hey, you know what? I don’t know. Let me look into that. I’ll get back to you.’”
This seems like a laudable aim, but Eerke Boiten at De Montfort University, UK, questions whether the ethicists’ demand is technically feasible. Companies are trying to get LLMs to stick to the truth, but so far it is proving to be so labour-intensive that it isn’t practical. “I don’t understand how they expect legal requirements to mandate what I see as fundamentally technologically impossible,” he says.
Mittelstadt and his colleagues do suggest some more straightforward steps that could make LLMs more truthful. The models should link to sources to evidence their claims, he says – something that many of them now do. Wider use of a technique known as retrieval augmented generation, in which a model looks up relevant documents before composing its answer, could also limit the likelihood of hallucinations.
He also argues that LLMs deployed in high-risk areas, such as government decision-making, should be scaled down, or the sources they can draw on should be restricted. “If we had a language model we wanted to use just in medicine, maybe we limit it so it can only search academic articles published in high quality medical journals,” he says.
Changing perceptions is also important, says Mittelstadt. “If we can get away from the idea that [LLMs] are good at answering factual questions, or at least that they’ll give you a reliable answer to factual questions, and instead see them more as something that can help you with facts you bring to them, that would be good,” he says.
Catalina Goanta at Utrecht University in the Netherlands says the researchers focus too much on technology and not enough on the longer-term issues of falsehood in public discourse. “Vilifying LLMs alone in such a context creates the impression that humans are perfectly diligent and would never make such mistakes,” she says. “Ask any judge you meet, in any jurisdiction, and they will have horror stories about the negligence of lawyers and vice versa – and that is not a machine issue.”