Hugging Face Showcases How Test-Time Compute Scaling Can Help SLMs Outperform Larger AI Models

December 24, 2024

Hugging Face shared a new case study last week showcasing how small language models (SLMs) can outperform larger models. In the post, the platform’s researchers claimed that, instead of increasing the training time of artificial intelligence (AI) models, focusing on test-time compute can improve their results. Test-time compute scaling is an inference strategy that lets AI models spend more time on a problem, using approaches such as self-refinement and searching against a verifier to improve their responses.

How Test-Time Compute Scaling Works

Hugging Face highlighted that the traditional approach to improving the capabilities of an AI model is often resource-intensive and extremely expensive. Typically, a technique dubbed train-time compute scaling is used, where pretraining data and algorithms are used to improve the way a foundation model breaks down a query and arrives at a solution.

Alternatively, the researchers claimed that focusing on test-time compute scaling, a technique where AI models are allowed to spend more time solving a problem and to correct themselves, can show similar results.

Highlighting the example of OpenAI’s o1 reasoning-focused model, which uses test-time compute, the researchers stated that this technique can let AI models display enhanced capabilities despite making no changes to the training data or pretraining methods. However, there was one problem. Since most reasoning models are closed, there is no way to know the strategies that are being used.

The researchers used a study by Google DeepMind and reverse engineering techniques to unravel how exactly LLM developers can scale test-time compute in the post-training phase. As per the case study, just increasing the processing time does not show significant improvement in outputs for complex queries.

Instead, the researchers recommend using a self-refinement algorithm that allows AI models to assess their own responses over subsequent iterations and to identify and correct potential errors. Additionally, using a verifier that models can search against can further improve the responses. Such verifiers can be a learned reward model or hard-coded heuristics.
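As a rough illustration of the self-refinement idea described above, the loop below asks a model for an answer, passes it to a verifier, and feeds the verifier's feedback into the next attempt. This is a minimal sketch, not the Hugging Face researchers' implementation: `self_refine`, `toy_generate` and `toy_verify` are hypothetical names, the "model" is a canned stand-in, and the verifier is a hard-coded heuristic rather than a learned reward model.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions for illustration, not from the post):
# a generator takes the prompt plus optional feedback and returns an answer;
# a verifier takes an answer and returns (is_acceptable, feedback).
Generate = Callable[[str, Optional[str]], str]
Verify = Callable[[str], Tuple[bool, str]]


def self_refine(generate: Generate, verify: Verify, prompt: str,
                max_rounds: int = 4) -> Tuple[str, int]:
    """Retry until the verifier accepts the answer or the budget runs out.

    Returns the last answer and the number of rounds that were used.
    """
    feedback: Optional[str] = None
    answer = ""
    for round_no in range(1, max_rounds + 1):
        answer = generate(prompt, feedback)
        ok, feedback = verify(answer)
        if ok:
            return answer, round_no
    return answer, max_rounds


def toy_generate(prompt: str, feedback: Optional[str]) -> str:
    # Stand-in for a language model: wrong on the first try,
    # corrected once it receives any feedback.
    return "398" if feedback is None else "408"


def toy_verify(answer: str) -> Tuple[bool, str]:
    # Hard-coded heuristic verifier for the single prompt "17 * 24".
    if int(answer) == 17 * 24:
        return True, ""
    return False, "The product is wrong; recompute it."


if __name__ == "__main__":
    print(self_refine(toy_generate, toy_verify, "17 * 24"))
```

The `max_rounds` parameter is the test-time compute budget here: a larger value gives the model more chances to correct itself, at the cost of more inference calls.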

More advanced techniques include a best-of-N approach, where a model generates multiple responses per problem and each response is assigned a score to judge which one is best suited. Such approaches can be paired with a reward model. Beam search, which builds a solution step by step and assigns a score to each step, is another strategy highlighted by the researchers.
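The two search strategies can be sketched in a few lines each. The code below is an illustrative toy under stated assumptions, not the code used in the case study: `best_of_n`, `beam_search` and all `toy_*` helpers are hypothetical names, the sampler replays canned outputs instead of calling a model, and the scoring functions are hand-written stand-ins for a reward model.

```python
import itertools
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def best_of_n(sample: Callable[[str], str], score: Callable[[str], float],
              prompt: str, n: int) -> str:
    """Draw n candidate answers and keep the one the scorer rates highest."""
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=score)


def beam_search(start: S, expand: Callable[[S], List[S]],
                score: Callable[[S], float], beam_width: int, depth: int) -> S:
    """Grow partial solutions step by step, keeping the top beam_width each step."""
    beam = [start]
    for _ in range(depth):
        candidates = [nxt for state in beam for nxt in expand(state)]
        candidates.sort(key=score, reverse=True)  # stable: ties keep their order
        beam = candidates[:beam_width]
    return beam[0]


# --- Toy stand-ins (assumptions for illustration only) ---

def make_toy_sampler() -> Callable[[str], str]:
    # Replays canned "model outputs" for the equation 2x + 1 = 9.
    outputs = itertools.cycle(["x = 5", "x = 4", "x = 3", "x = 4"])
    return lambda prompt: next(outputs)


def toy_reward(candidate: str) -> float:
    # Scores an answer by how closely it satisfies 2x + 1 = 9.
    x = int(candidate.split("=")[1])
    return -abs(2 * x + 1 - 9)


State = Tuple[int, Tuple[str, ...]]  # (current value, steps taken so far)


def toy_expand(state: State) -> List[State]:
    # Each reasoning "step" applies one of three operations to the value.
    value, steps = state
    return [(value + 1, steps + ("+1",)),
            (value + 3, steps + ("+3",)),
            (value * 2, steps + ("*2",))]


def toy_step_score(state: State) -> float:
    # Per-step score: closeness of the running value to the target 10.
    return -abs(10 - state[0])


if __name__ == "__main__":
    print(best_of_n(make_toy_sampler(), toy_reward, "2x + 1 = 9", n=4))
    print(beam_search((1, ()), toy_expand, toy_step_score, beam_width=2, depth=3))
```

In this toy setup, a single sample returns the incorrect "x = 5" while four samples recover "x = 4", and a beam of width 1 (greedy search) ends on 9 while a width of 2 reaches the target 10. Both `n` and `beam_width` act as knobs for how much compute is spent at inference time.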

By using the above-mentioned strategies, the Hugging Face researchers were able to make the Llama 3B SLM outperform Llama 70B, a much larger model, on the MATH-500 benchmark.
