Researchers have discovered a glaring drawback to smarter chatbots. As AI models advance, they generally become more accurate, but they also become more likely to answer questions beyond their ability (incorrectly) rather than saying "I don't know." And the humans who prompt them tend to take that air of conviction at face value, creating a trickle-down effect of convincing misinformation.
"These days they answer almost everything," José Hernández-Orallo, a professor at Spain's Polytechnic University of Valencia, told Nature. "And that means more correct answers, but also more incorrect ones." Hernández-Orallo worked on the study with colleagues at the Valencian Research Institute for Artificial Intelligence in Spain.
The team studied three LLM families: OpenAI's GPT series, Meta's LLaMA, and the open-source BLOOM. They tested early versions of each model and moved on to larger, more advanced ones, though not today's very latest. For example, the team started with OpenAI's relatively primitive GPT-3 ada model and tested iterations up through GPT-4, which arrived in March 2023. GPT-4o, only about four months old at the time, was not included in the study, and neither was the newer o1-preview. It will be interesting to see whether the trend continues with the latest models.
The researchers tested each model on thousands of questions about math, anagrams, geography, and science. They also tested the models' ability to transform information, such as alphabetizing a list, and ranked the prompts by perceived difficulty.
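The paper describes its own benchmark suite; purely as a toy illustration, here is how a "transform information" task like alphabetizing a list can be graded automatically. The `grade_alphabetize` helper and the refusal phrases are invented for this sketch, not taken from the study:

```python
def grade_alphabetize(items: list[str], answer: str) -> str:
    """Return 'correct', 'avoided', or 'incorrect' for an alphabetization prompt."""
    refusals = ("i don't know", "cannot", "unsure")   # hypothetical refusal markers
    if any(phrase in answer.lower() for phrase in refusals):
        return "avoided"                    # the model declined to answer
    expected = ", ".join(sorted(items))     # ground truth is trivial to compute
    return "correct" if answer.strip().lower() == expected.lower() else "incorrect"

print(grade_alphabetize(["pear", "apple", "kiwi"], "apple, kiwi, pear"))  # correct
print(grade_alphabetize(["pear", "apple", "kiwi"], "I don't know."))      # avoided
print(grade_alphabetize(["pear", "apple", "kiwi"], "pear, apple, kiwi"))  # incorrect
```

Tasks like this are useful precisely because a script, not a human, can tell a wrong answer from a refusal.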
The data shows that as the models grew, the chatbots' rate of incorrect answers (rather than avoided questions) increased. In other words, the AI is like a professor who, as they master more subjects, increasingly believes they have the golden answer to every question.
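To make that distinction concrete, here is a minimal sketch with made-up numbers (not the study's actual figures) showing how a model can become more accurate and more confidently wrong at the same time:

```python
from collections import Counter

# Invented outcome labels for illustration only -- not the study's data.
early_model = ["correct"] * 40 + ["avoided"] * 45 + ["incorrect"] * 15
later_model = ["correct"] * 60 + ["avoided"] * 5 + ["incorrect"] * 35

for name, outcomes in (("early", early_model), ("later", later_model)):
    rates = {k: v / len(outcomes) for k, v in Counter(outcomes).items()}
    print(name, {k: f"{rates[k]:.0%}" for k in ("correct", "avoided", "incorrect")})

# early {'correct': '40%', 'avoided': '45%', 'incorrect': '15%'}
# later {'correct': '60%', 'avoided': '5%', 'incorrect': '35%'}
# The later model answers more questions correctly overall, yet its rate of
# confidently wrong answers also rises, because avoidance all but disappears.
```

Judged on accuracy alone, the later model looks strictly better; the study's point is that the shrinking "avoided" column hides a growing pile of wrong answers.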
To complicate matters further, it is fallible humans who prompt these chatbots and read their responses. The researchers asked volunteers to rate the accuracy of the AI bots' answers and found that they "misclassified inaccurate responses as accurate with surprising frequency." Depending on the model and task, volunteers mistook wrong answers for correct ones roughly 10 to 40 percent of the time.
"Humans cannot supervise these models," Hernández-Orallo concluded.
The researchers recommend that AI developers boost chatbots' performance on simple questions and program them to refuse to answer complex ones. "We need people to understand: 'You can use it in this area, but you shouldn't use it in that area,'" Hernández-Orallo told Nature.
This is a well-intended suggestion that would make sense in an ideal world. But AI companies have strong incentives to do otherwise: chatbots that frequently say "I don't know" would likely be perceived as less advanced or less valuable, leading to less usage and less revenue for the companies that make and sell them. So instead, we get fine-print warnings that "ChatGPT can make mistakes" and "Gemini may display inaccurate info."
That means it's up to us to avoid believing or spreading false information, such as hallucinations, that could harm ourselves or others. To be safe, fact-check your chatbot's answers, no matter how confident they sound.
You can read the team’s full study in Nature.
