
The economics of artificial intelligence have been a hot topic of late, with startup DeepSeek AI claiming eye-opening economies of scale in deploying GPU chips.
Two can play that game. On Wednesday, Google announced its latest open-source large language model, Gemma 3, came close to achieving the accuracy of DeepSeek’s R1 with a fraction of the estimated computing power.
Using “Elo” scores, a common measurement system used to rank chess players and athletes, Google claims Gemma 3 comes within 98% of the score of DeepSeek’s R1: 1338 versus 1363.
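For context, Elo ratings translate into expected head-to-head win rates via a standard formula. A minimal sketch below plugs in the two figures Google cites (the ratings are from the article; the calculation itself is illustrative, not Google's):

```python
# Elo expected score: probability that model A "wins" a head-to-head preference vote.
# Ratings below are the figures Google cites for Gemma 3 and DeepSeek R1.
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Standard Elo formula: expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

gemma3, r1 = 1338, 1363
print(f"Expected win rate of Gemma 3 vs R1: {elo_expected(gemma3, r1):.1%}")
# ~46.4% -- a 25-point Elo gap amounts to nearly even odds in pairwise comparisons.
```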
Also: First Gemini, now Gemma: Google’s new, open AI models target developers
That means R1 remains superior to Gemma 3. However, Google estimates it would take 32 of Nvidia’s mainstream “H100” GPU chips to achieve R1’s score, whereas Gemma 3 needs only one H100 GPU.
Google’s balance of compute and Elo score is a “sweet spot,” the company claims.
In a blog post, Google bills the new program as “the most capable model you can run on a single GPU or TPU,” referring to the company’s custom AI chip, the “tensor processing unit.”
“Gemma 3 delivers state-of-the-art performance for its size, outperforming Llama-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations on LMArena’s leaderboard,” the blog post relates, referring to the Elo scores.
“This helps you to create engaging user experiences that can fit on a single GPU or TPU host.”
Google’s model also tops the Elo score of Meta’s Llama 3, which Google estimates would require 16 GPUs. (Note that the numbers of H100 chips used by the competition are Google’s estimates; DeepSeek AI has only disclosed an example of using 1,814 of Nvidia’s less-powerful H800 GPUs to serve answers with R1.)
More detailed information is provided in a developer blog post on Hugging Face, where the Gemma 3 repository is hosted.
The Gemma 3 models, intended for on-device usage rather than data centers, have a vastly smaller number of parameters, or neural “weights,” than R1 and other open-source models. Generally speaking, the greater the number of parameters, the more computing power is required.
Also: What is DeepSeek AI? Is it safe? Here’s everything you need to know
The Gemma 3 models come in parameter counts of 1 billion, 4 billion, 12 billion, and 27 billion, quite small by today’s standards. In contrast, R1 has a parameter count of 671 billion, of which it selectively activates only 37 billion at a time by turning off parts of the network.
The main enhancement making such efficiency possible is a widely used AI technique called distillation, whereby a smaller model such as Gemma 3 is trained to mimic the output of a larger, more capable model, transferring some of that larger model’s abilities into far fewer parameters.
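As a rough illustration of how distillation works in general (a generic sketch, not Google's actual training pipeline), a small "student" model is trained to match the softened output distribution of a large "teacher" model:

```python
# Minimal knowledge-distillation sketch (illustrative only, not Google's code).
# The student is trained to match the teacher's softened token distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Random logits stand in for real model outputs over a vocabulary.
student_logits = torch.randn(8, 32000)   # (batch, vocab) from the small model
teacher_logits = torch.randn(8, 32000)   # (batch, vocab) from the large model
print(distillation_loss(student_logits, teacher_logits))
```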
The distilled model is also refined with three different quality-control measures: Reinforcement Learning from Human Feedback (RLHF), the approach widely used to shape the output of large language models to be helpful and inoffensive; Reinforcement Learning from Machine Feedback (RLMF); and Reinforcement Learning from Execution Feedback (RLEF), which Google says improve the model’s math and coding capabilities, respectively.
Also: The best AI for coding (and what not to use – including DeepSeek R1)
A Google developer blog post details those approaches, and a separate post describes techniques used to optimize the smallest version, the 1 billion model, for mobile devices. These include four common AI engineering techniques: quantization, updating the “key-value” cache layouts, improved loading time of certain variables, and “GPU weight sharing.”
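As a loose illustration of the first of those techniques, quantization, weights are stored at lower precision and rescaled at compute time. The sketch below is generic int8 quantization, not the specific scheme used for Gemma 3:

```python
# Toy int8 weight quantization sketch (illustrative, not the actual Gemma 3 code).
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 values plus a per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max error:", np.abs(w - dequantize(q, scale)).max())
# int8 storage cuts weight memory to a quarter of float32, at a small accuracy cost.
```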
Beyond the Elo scores, the company also compares Gemma 3 to the prior Gemma 2 and to its closed-source Gemini models on benchmark tests such as the LiveCodeBench programming task. Gemma 3 generally falls below the accuracy of Gemini 1.5 and Gemini 2.0, but Google calls the results noteworthy, stating that Gemma 3 is “showing competitive performance compared to closed Gemini models.”
Gemini models are much larger in parameter count than Gemma.
The main advance of Gemma 3 over Gemma 2 is a longer “context window,” the number of input tokens that can be held in memory for the model to work on at any given time.
Gemma 2’s window was only 8,000 tokens, whereas Gemma 3’s is 128,000, which counts as a “long” context window, better suited for working on whole papers or books. (Gemini and other closed-source models are still much more capable, with a context window of 2 million tokens for Gemini 2.0 Pro.)
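In practice, applications have to keep their prompts within that window. A minimal sketch of enforcing such a limit follows (the token counts come from the article; the function and its parameters are purely illustrative):

```python
# Illustrative only: trimming input tokens to fit a model's context window.
MAX_CONTEXT_TOKENS = 128_000   # Gemma 3's window; Gemma 2's was 8_000

def fit_to_context(tokens: list[int], reserve_for_output: int = 2_000) -> list[int]:
    """Keep the most recent tokens that fit, leaving room for the model's reply."""
    budget = MAX_CONTEXT_TOKENS - reserve_for_output
    return tokens[-budget:] if len(tokens) > budget else tokens
```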
Gemma 3 is also multi-modal, which Gemma 2 was not. This means it can handle image inputs along with text to serve up replies to queries such as, “What is in this photo?”
And, last, Gemma 3 supports over 140 languages, rather than the English-only support of Gemma 2.
Also: What does a long context window mean for an AI model, like Gemini?
A variety of other interesting features are buried in the fine print.
For example, a well-known issue with all large language models is that they may memorize portions of their training data sets, which can lead to leaked information and privacy violations if the models are probed with malicious techniques.
Google’s researchers tested for information leakage by sampling training data and seeing how much could be directly extracted from Gemma 3 versus its other models. “We find that Gemma 3 models memorize long-form text at a much lower rate than prior models,” they note, which theoretically means the model is less vulnerable to information leakage.
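Conceptually, such a memorization test samples prefixes from the training data and checks how often the model reproduces the continuation verbatim. The rough sketch below uses a hypothetical `model_generate` function and crude word-level splitting; it is not Google's evaluation code:

```python
# Rough sketch of a memorization probe (hypothetical generate function, not Google's code).
def memorization_rate(model_generate, training_samples, prefix_len=50, suffix_len=50):
    """Fraction of sampled documents whose continuation the model reproduces verbatim."""
    hits = 0
    for text in training_samples:
        tokens = text.split()   # crude word-level "tokens" for illustration only
        if len(tokens) < prefix_len + suffix_len:
            continue
        prefix = " ".join(tokens[:prefix_len])
        true_suffix = " ".join(tokens[prefix_len:prefix_len + suffix_len])
        if model_generate(prefix, max_words=suffix_len).strip() == true_suffix:
            hits += 1
    return hits / max(1, len(training_samples))
```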
Those wishing for more technical detail can read the Gemma 3 technical paper.