
Language models struggle to answer questions without paraphrasing training data


The task of long-form question answering (LFQA) involves retrieving documents relevant to a given question and using them to generate a paragraph-length answer to that question. While many machine learning models have recently been proposed for LFQA, the task remains challenging, as a recent paper coauthored by researchers at the University of Massachusetts Amherst and Google demonstrates.
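
To make the retrieve-then-generate setup concrete, here is a minimal sketch that pairs a simple TF-IDF retriever with an off-the-shelf seq2seq model from Hugging Face Transformers. The tiny corpus, the example question, and the "t5-small" checkpoint are illustrative placeholders, not the system described in the paper.

```python
# Minimal retrieve-then-generate sketch of an LFQA pipeline (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

corpus = [
    "Volcanoes erupt when pressure from molten rock builds up underground.",
    "Magma rises through cracks in the crust until it reaches the surface.",
    "Earthquakes are caused by sudden movement along geological faults.",
]
question = "Why do volcanoes erupt?"

# Step 1: retrieve the documents most relevant to the question.
vectorizer = TfidfVectorizer().fit(corpus + [question])
scores = cosine_similarity(
    vectorizer.transform([question]), vectorizer.transform(corpus)
)[0]
top_docs = [corpus[i] for i in scores.argsort()[::-1][:2]]

# Step 2: condition a seq2seq model on the question plus retrieved evidence
# to generate a paragraph-length answer.
generator = pipeline("text2text-generation", model="t5-small")
prompt = "question: " + question + " context: " + " ".join(top_docs)
print(generator(prompt, max_length=128)[0]["generated_text"])
```

Production LFQA systems typically replace the TF-IDF step with a learned dense retriever and use a far larger generator, but the overall pipeline shape is the same.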

The researchers developed an LFQA system that achieves state-of-the-art performance on a popular dataset. But they found that even the best LFQA models, including theirs, don’t always answer in a way that’s grounded in — or demonstrates an understanding of — the documents they retrieve.

Large language models like OpenAI’s GPT-3 and Google’s GShard learn to write humanlike text by internalizing billions of examples from the public web. Drawing on sources like ebooks, Wikipedia, and social media platforms like Reddit, they make inferences to complete sentences and even whole paragraphs. But studies demonstrate the pitfalls of this training approach. Open-domain question-answering models — models theoretically capable of responding to novel questions with novel answers — often simply memorize answers found in the data on which they’re trained, depending on the dataset. Because of this, language models can also be prompted to reveal sensitive, private information when fed certain words and phrases.

In this most recent study, the coauthors evaluated their LFQA model on ELI5, a dataset of open-ended questions and paragraph-length answers collected from the Reddit forum “Explain Like I’m Five.” They found significant overlap between the data used to train and test the model: as many as 81% of test questions appear in paraphrased form in the training set. The researchers say this reveals issues with the ELI5 dataset and its evaluation, in addition to the model itself.
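
As a rough illustration of the kind of overlap the researchers describe, one can flag test questions whose nearest training question is highly similar. The sketch below uses TF-IDF cosine similarity and an arbitrary 0.6 threshold; both are placeholder choices for illustration, not the paper’s actual method.

```python
# Toy train/test overlap check: flag held-out questions that look like
# paraphrases of training questions. The similarity measure and the 0.6
# threshold are arbitrary choices for this sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_questions = [
    "Why is the sky blue?",
    "How do airplanes stay in the air?",
    "Why do we dream when we sleep?",
]
test_questions = [
    "What makes the sky look blue?",          # paraphrase of a training question
    "How does a microwave oven heat food?",   # genuinely novel question
]

vec = TfidfVectorizer().fit(train_questions + test_questions)
sims = cosine_similarity(vec.transform(test_questions), vec.transform(train_questions))

for q, row in zip(test_questions, sims):
    best = row.argmax()
    overlap = row[best] >= 0.6
    print(f"{q!r} -> nearest train question {train_questions[best]!r} "
          f"(similarity {row[best]:.2f}, overlap={overlap})")
```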

“[Our] in-depth analysis reveals [shortcomings] not only with our model, but also with the ELI5 dataset and evaluation metrics. We hope that the community works towards solving these issues so that we can climb the right hills and make meaningful progress,” they wrote in the paper.

Memorization isn’t the only challenge large language models struggle with. Recent research shows that even state-of-the-art models struggle to answer the bulk of math problems correctly. For example, a paper published by researchers at the University of California, Berkeley finds that large language models including OpenAI’s GPT-3 can complete only 2.9% to 6.9% of the problems in a dataset of more than 12,500. OpenAI itself notes that its flagship language model, GPT-3, places words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.” A paper by Stanford University Ph.D. candidate and Gradio founder Abubakar Abid detailed the anti-Muslim tendencies of text generated by GPT-3. And the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 could reliably generate “informational” and “influential” text that might “radicalize individuals into violent far-right extremist ideologies and behaviors.”

Among others, leading AI researcher Timnit Gebru has questioned the wisdom of building large language models by examining who benefits from them and who’s disadvantaged. A paper coauthored by Gebru earlier this year spotlights the impact of large language models’ carbon footprint on marginalized communities and their tendency to perpetuate abusive language, hate speech, microaggressions, stereotypes, and other dehumanizing language aimed at specific groups of people.


By VentureBeat
