TECH & OTHER NEWS

Google’s new Gemini models achieve ‘near-perfect recall’

August 28, 2024

gettyimages-1988737599 — NurPhoto/Getty Images

Google is expanding Gemini 1.5, its most powerful family of AI models, with new variants.

On Tuesday, Google AI Studio product lead Logan Kilpatrick shared on X that the company had released three new experimental versions of Gemini: a smaller model, Gemini 1.5 Flash-8B; a “stronger” Gemini 1.5 Pro model; and the “significantly improved” Gemini 1.5 Flash.

Today, we are rolling out three experimental models:
– A new smaller variant, Gemini 1.5 Flash-8B
– A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
– A significantly improved Gemini 1.5 Flash model
Try them on https://t.co/fBrh6UGKz7, details in 🧵

— Logan Kilpatrick (@OfficialLoganK) August 27, 2024

Kilpatrick explained that Google is “releasing experimental models to gather feedback and get our latest updates into the hands of developers.”

Also: Google’s Gemini just got two popular ChatGPT Plus features – and one is free to use

The first model, 1.5 Flash-8B, is an eight-billion-parameter version of the new 1.5 Flash model, and can be used for “everything from high volume multimodal use cases to long context summarization tasks,” Kilpatrick noted in the X thread.

The new version of Pro includes improvements to math, complex prompts, and coding, while the new Flash now performs better on certain internal benchmarks. Kilpatrick said that Gemini 1.5 Pro Exp 0827 (for its release day, August 27th) will replace the last-released model, 0801. Starting September 3rd, 0801 will reroute in Gemini API to the 0827 model.

Also: Want to try GPT-4o mini? 3 ways to access the smarter, cheaper AI model – and 2 are free

Shortly after their release, the latest Gemini 1.5 Pro was ranked #2 and Flash #6 overall in the Chatbot Arena, keeping them essentially neck-and-neck with GPT-4o and GPT-4o mini, respectively. The two models beat Claude 3.5 Sonnet, Grok 2, Grok 2 mini, and Llama 3.1.

Chatbot Arena update⚡!
The latest Gemini (Pro/Flash/Flash-9b) results are now live, with over 20K community votes!
Highlights:
– New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
– New Gemini-1.5-Pro (0827) shows strong gains in coding, math over… https://t.co/6j6EiSyy41 pic.twitter.com/D3XpU0Xiw2

— lmsys.org (@lmsysorg) August 27, 2024

The experimental models join the Gemini 1.5 family, which is intended to handle very long context windows. In a technical report from earlier this month, the DeepMind team called their capabilities “unprecedented among contemporary large language models (LLMs),” stating that Gemini 1.5 can process multimodal inputs such as “entire collections of documents, multiple hours of video, and almost five days long of audio.”

The team added that these new releases continue the models’ trend of “near-perfect retrieval (>99%) up to at least 10M tokens,” compared with Claude 3.0’s 200,000 and GPT-4 Turbo’s 128,000 tokens, respectively.

The report also mentioned potential use cases, touting Gemini 1.5’s ability to help professionals save up to 75% of their time on tasks across 10 job categories, among some surprising “frontier” skills: “When given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content,” the report notes.

Also: ChatGPT vs. Microsoft Copilot vs. Gemini: Which is the best AI chatbot?

Reception of the experimental models, however, has been mixed, with some users lauding Google’s speedy releases while others, unimpressed, asked for the release of Gemini 2.0 instead. When asked by one X user about benchmarks, Kilpatrick replied that the company plans to launch a version for production use “in the coming weeks, hopefully, that will come with evals!”

Users can test all three models for free in Google AI Studio and the Gemini API today.

Artificial Intelligence

Source Link

Artificial Intelligence

LEAVE A REPLY Cancel reply

TECH NEWS

TOP STORIES

Cyber Security