
Google researchers boost speech recognition accuracy with more datasets

April 15, 2021


What if the key to improving speech recognition accuracy is simply mixing all available speech datasets together to train one large AI model? That’s the hypothesis behind a recent study published by a team of researchers affiliated with Google Research and Google Brain. They claim an AI model named SpeechStew that was trained on a range of speech corpora achieves state-of-the-art or near-state-of-the-art results on a variety of speech recognition benchmarks.

Training models on more data is difficult because collecting and annotating new data is expensive, particularly in the speech domain. Training large models is also costly and impractical for many members of the AI community.

Dataset solution

In pursuit of a solution, the Google researchers combined all available labeled and unlabeled speech recognition data curated by the community over the years. They drew on AMI, a dataset containing about 100 hours of meeting recordings, as well as corpora that include Switchboard (approximately 2,000 hours of telephone calls), Broadcast News (50 hours of television news), Librispeech (960 hours of audiobooks), and Mozilla’s crowdsourced Common Voice. Their combined dataset had over 5,000 hours of speech, none of which was adjusted from its original form.
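If, as "none of which was adjusted" suggests, the corpora are simply pooled without reweighting, each one contributes to training in proportion to its size. The sketch below illustrates that under stated assumptions: it pools one placeholder entry per hour of audio from the four corpora whose sizes the article quotes and shuffles them into a single stream. The placeholder IDs and the one-entry-per-hour granularity are hypothetical, Common Voice is left out because its size is not given here, and this is not SpeechStew's actual data pipeline.

```python
import random
from collections import Counter

# Approximate sizes quoted in the article, in hours. Common Voice is omitted
# because the article does not give its size.
CORPUS_HOURS = {
    "AMI": 100,
    "Switchboard": 2000,
    "Broadcast News": 50,
    "Librispeech": 960,
}


def pool_corpora(corpus_hours, seed=0):
    """Pool one placeholder entry per hour from every corpus, then shuffle.

    No corpus is reweighted or rebalanced, so each one appears in the
    training stream in proportion to its size.
    """
    pooled = [
        (name, f"{name}-{i}")  # hypothetical utterance ID, for illustration only
        for name, hours in corpus_hours.items()
        for i in range(hours)
    ]
    random.Random(seed).shuffle(pooled)
    return pooled


def mixture_shares(corpus_hours):
    """Fraction of the pooled data contributed by each corpus."""
    total = sum(corpus_hours.values())
    return {name: hours / total for name, hours in corpus_hours.items()}


if __name__ == "__main__":
    stream = pool_corpora(CORPUS_HOURS)
    # Corpus mix of the first 32 entries, i.e. one hypothetical training batch.
    print(Counter(name for name, _ in stream[:32]))
    for name, share in mixture_shares(CORPUS_HOURS).items():
        print(f"{name}: {share:.1%}")
```

With these four figures (3,110 hours in total), Switchboard alone supplies about 64% of the pooled data, which shows how an unweighted mix leans toward its largest corpora.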

With the assembled dataset, the researchers used Google Cloud TPUs to train SpeechStew, a model with more than 100 million parameters. In machine learning, parameters are the internal values, such as weights, that a model learns from the data during training. The researchers also trained a 1-billion-parameter model, but its performance degraded.
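To make "parameters" concrete: in a plain fully connected network, each layer that maps n_in inputs to n_out outputs learns an n_in × n_out weight matrix plus n_out bias values, and the model's parameter count is the total of those entries. The sketch below counts them for hypothetical layer sizes chosen only for illustration; it does not describe SpeechStew's architecture, which the article does not detail.

```python
def dense_param_count(layer_sizes):
    """Count the learnable parameters in a stack of fully connected layers.

    layer_sizes lists the width of each layer, input first. Every pair of
    consecutive widths (n_in, n_out) contributes n_in * n_out weights plus
    n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical layer widths, for illustration only.
sizes = [80, 512, 512, 1024]
print(dense_param_count(sizes))  # 829440
```

Even this small stack has roughly 830,000 parameters; models with more than 100 million parameters need far wider or deeper networks, which is why training them calls for hardware such as TPUs.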

Once the team had a general-purpose SpeechStew model, they tested it on a number of benchmarks and found that it not only matched or outperformed previously developed models but also demonstrated an ability to adapt to challenging new tasks. Using CHiME-6, a 40-hour dataset of conversations in homes recorded by distant microphones, the researchers fine-tuned SpeechStew to achieve accuracy in line with a much more sophisticated model.

Transfer learning entails transferring knowledge from one domain to a different domain with less data, and it has shown promise in many subfields of AI. By taking a model like SpeechStew that’s designed to understand generic speech and refining it at the margins, it’s possible, for example, to adapt it to speech in different accents and environments.
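The idea behind fine-tuning can be shown at toy scale: fit a one-variable linear model on a large "source" dataset, then start from those learned parameters when training on a small, related "target" dataset, and compare against training from scratch with the same short budget. The sketch below does this under stated assumptions; the synthetic tasks, learning rate, and step counts are hypothetical, and a linear regression is only a stand-in for a speech model, not how SpeechStew or its CHiME-6 fine-tuning is implemented.

```python
import random


def make_data(n, slope, intercept, rng, noise=0.1):
    """Noisy samples of y = slope * x + intercept, with x drawn from [-1, 1]."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + intercept + rng.gauss(0.0, noise)))
    return data


def mse(params, data):
    """Mean squared error of the line (w, b) on the given data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(params, data, lr, steps):
    """Full-batch gradient descent on mean squared error."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


rng = random.Random(0)
# Hypothetical toy tasks: a large source task and a small, related target
# task whose intercept is shifted.
source = make_data(1000, slope=2.0, intercept=1.0, rng=rng)
target_train = make_data(20, slope=2.0, intercept=1.5, rng=rng)
target_test = make_data(200, slope=2.0, intercept=1.5, rng=rng)

# 1. Pretrain on the large source dataset.
pretrained = gradient_descent((0.0, 0.0), source, lr=0.1, steps=500)

# 2. Fine-tune the pretrained parameters on the small target set...
finetuned = gradient_descent(pretrained, target_train, lr=0.1, steps=10)
# ...and compare with training from scratch on the same data and budget.
scratch = gradient_descent((0.0, 0.0), target_train, lr=0.1, steps=10)

print(f"fine-tuned test MSE:   {mse(finetuned, target_test):.4f}")
print(f"from-scratch test MSE: {mse(scratch, target_test):.4f}")
```

Because the pretrained slope already fits the target task, fine-tuning only has to correct the shifted intercept, while the from-scratch model must learn both values in ten steps; in this setup the fine-tuned model should reach a clearly lower test error.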

Future applications

When VentureBeat asked via email how speech models like SpeechStew might be used in production, for example in consumer devices or cloud APIs, the researchers declined to speculate. But they envision the models serving as general-purpose representations that are transferable to any number of downstream speech recognition tasks.

“This simple technique of fine-tuning a general-purpose model to new downstream speech recognition tasks is simple, practical, yet shockingly effective,” the researchers said. “It is important to realize that the distribution of other sources of data does not perfectly match the dataset of interest. But as long as there is some common representation needed to solve both tasks, we can hope to achieve improved results by combining both datasets.”


By VentureBeat