
The economics of artificial intelligence have been a hot topic of late, with startup DeepSeek AI claiming eye-opening economies of scale in deploying GPU chips.
Two can play that game. On Wednesday, Google announced its latest open-source large language model, Gemma 3, came close to achieving the accuracy of DeepSeek’s R1 with a fraction of the estimated computing power.
Using “Elo” scores, a common measurement system used to rank chess players and athletes, Google claims Gemma 3 comes within 98% of the score of DeepSeek’s R1: 1338 versus 1363.
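For context, Elo ratings translate into expected head-to-head win rates via a standard formula. A minimal sketch below plugs in the two figures Google cites (the ratings are from the article; the calculation itself is illustrative, not Google's):

```python
# Elo expected score: probability that model A "wins" a head-to-head preference vote.
# Ratings below are the figures Google cites for Gemma 3 and DeepSeek R1.
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Standard Elo formula: expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

gemma3, r1 = 1338, 1363
print(f"Expected win rate of Gemma 3 vs R1: {elo_expected(gemma3, r1):.1%}")
# ~46.4% -- a 25-point Elo gap amounts to nearly even odds in pairwise comparisons.
```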
Also: First Gemini, now Gemma: Google’s new, open AI models target developers
That means R1 remains superior to Gemma 3. However, Google estimates it would take 32 of Nvidia’s mainstream “H100” GPU chips to achieve R1’s score, whereas Gemma 3 needs only one H100 GPU.
Google’s balance of compute and Elo score is a “sweet spot,” the company claims.
In a blog post, Google bills the new program as “the most capable model you can run on a single GPU or TPU,” referring to the company’s custom AI chip, the “tensor processing unit.”
“Gemma 3 delivers state-of-the-art performance for its size, outperforming Llama-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations on LMArena’s leaderboard,” the blog post relates, referring to the Elo scores.
“This helps you to create engaging user experiences that can fit on a single GPU or TPU host.”
Google’s model also tops the Elo score of Meta’s Llama 3, which Google estimates would require 16 GPUs. (Note that the numbers of H100 chips used by the competition are Google’s estimates; DeepSeek AI has only disclosed an example of using 1,814 of Nvidia’s less-powerful H800 GPUs to serve answers with R1.)
More detailed information is provided in a developer blog post on Hugging Face, where the Gemma 3 repository is hosted.
The Gemma 3 models, intended for on-device usage rather than data centers, have a vastly smaller number of parameters, or neural “weights,” than R1 and other open-source models. Generally speaking, the greater the number of parameters, the more computing power is required.
Also: What is DeepSeek AI? Is it safe? Here’s everything you need to know
The Gemma 3 models come in parameter counts of 1 billion, 4 billion, 12 billion, and 27 billion, quite small by today’s standards. In contrast, R1 has a parameter count of 671 billion, of which it selectively activates only 37 billion at a time by turning off parts of the network.
The main enhancement making such efficiency possible is a widely used AI technique called distillation, whereby a smaller model such as Gemma 3 is trained to mimic the output of a larger, more capable model, transferring some of that larger model’s abilities into far fewer parameters.
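As a rough illustration of how distillation works in general (a generic sketch, not Google's actual training pipeline), a small "student" model is trained to match the softened output distribution of a large "teacher" model:

```python
# Minimal knowledge-distillation sketch (illustrative only, not Google's code).
# The student is trained to match the teacher's softened token distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Random logits stand in for real model outputs over a vocabulary.
student_logits = torch.randn(8, 32000)   # (batch, vocab) from the small model
teacher_logits = torch.randn(8, 32000)   # (batch, vocab) from the large model
print(distillation_loss(student_logits, teacher_logits))
```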
The distilled model is also refined with three different quality-control measures: Reinforcement Learning from Human Feedback (RLHF), the approach widely used to shape the output of large language models to be helpful and inoffensive; Reinforcement Learning from Machine Feedback (RLMF); and Reinforcement Learning from Execution Feedback (RLEF), which Google says improve the model’s math and coding capabilities, respectively.
Also: The best AI for coding (and what not to use – including DeepSeek R1)
A Google developer blog post details those approaches, and a separate post describes techniques used to optimize the smallest version, the 1 billion model, for mobile devices. These include four common AI engineering techniques: quantization, updating the “key-value” cache layouts, improved loading time of certain variables, and “GPU weight sharing.”
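As a loose illustration of the first of those techniques, quantization, weights are stored at lower precision and rescaled at compute time. The sketch below is generic int8 quantization, not the specific scheme used for Gemma 3:

```python
# Toy int8 weight quantization sketch (illustrative, not the actual Gemma 3 code).
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 values plus a per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max error:", np.abs(w - dequantize(q, scale)).max())
# int8 storage cuts weight memory to a quarter of float32, at a small accuracy cost.
```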
Beyond the Elo scores, the company also compares Gemma 3 to the prior Gemma 2 and to its closed-source Gemini models on benchmark tests such as the LiveCodeBench programming task. Gemma 3 generally falls below the accuracy of Gemini 1.5 and Gemini 2.0, but Google calls the results noteworthy, stating that Gemma 3 is “showing competitive performance compared to closed Gemini models.”
Gemini models are much larger in parameter count than Gemma.
The main advance of Gemma 3 over Gemma 2 is a longer “context window,” the number of input tokens that can be held in memory for the model to work on at any given time.
Gemma 2’s window was only 8,000 tokens, whereas Gemma 3’s is 128,000, which counts as a “long” context window, better suited for working on whole papers or books. (Gemini and other closed-source models are still much more capable, with a context window of 2 million tokens for Gemini 2.0 Pro.)
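In practice, applications have to keep their prompts within that window. A minimal sketch of enforcing such a limit follows (the token counts come from the article; the function and its parameters are purely illustrative):

```python
# Illustrative only: trimming input tokens to fit a model's context window.
MAX_CONTEXT_TOKENS = 128_000   # Gemma 3's window; Gemma 2's was 8_000

def fit_to_context(tokens: list[int], reserve_for_output: int = 2_000) -> list[int]:
    """Keep the most recent tokens that fit, leaving room for the model's reply."""
    budget = MAX_CONTEXT_TOKENS - reserve_for_output
    return tokens[-budget:] if len(tokens) > budget else tokens
```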
Gemma 3 is also multi-modal, which Gemma 2 was not. This means it can handle image inputs along with text to serve up replies to queries such as, “What is in this photo?”
And, last, Gemma 3 supports over 140 languages, rather than the English-only support of Gemma 2.
Also: What does a long context window mean for an AI model, like Gemini?
A variety of other interesting features are buried in the fine print.
For example, a well-known issue with all large language models is that they may memorize portions of their training data sets, which can lead to leaked information and privacy violations if the models are probed with malicious techniques.
Google’s researchers tested for information leakage by sampling training data and seeing how much could be directly extracted from Gemma 3 versus its other models. “We find that Gemma 3 models memorize long-form text at a much lower rate than prior models,” they note, which theoretically means the model is less vulnerable to information leakage.
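Conceptually, such a memorization test samples prefixes from the training data and checks how often the model reproduces the continuation verbatim. The rough sketch below uses a hypothetical `model_generate` function and crude word-level splitting; it is not Google's evaluation code:

```python
# Rough sketch of a memorization probe (hypothetical generate function, not Google's code).
def memorization_rate(model_generate, training_samples, prefix_len=50, suffix_len=50):
    """Fraction of sampled documents whose continuation the model reproduces verbatim."""
    hits = 0
    for text in training_samples:
        tokens = text.split()   # crude word-level "tokens" for illustration only
        if len(tokens) < prefix_len + suffix_len:
            continue
        prefix = " ".join(tokens[:prefix_len])
        true_suffix = " ".join(tokens[prefix_len:prefix_len + suffix_len])
        if model_generate(prefix, max_words=suffix_len).strip() == true_suffix:
            hits += 1
    return hits / max(1, len(training_samples))
```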
Those wishing for more technical detail can read the Gemma 3 technical paper.