TECH & OTHER NEWS

Google’s Gemini 2.0 AI promises to be faster and smarter via agentic advances

December 11, 2024

Google has long been obsessed with speed. Whether it’s the time it takes to return a search result or the time it takes to bring a product to market, Google has always been in a rush. This approach has largely benefited the company. Faster, more comprehensive search results pushed Google to the top of the market.

But fast product releases have resulted in a long history of public betas and failed or discontinued products. There’s even a website called Killed by Google that catalogs all of Google’s failures. While that list is shockingly extensive, the company has also launched winners like Gmail and Adsense. These products helped skyrocket the company way beyond search.

Also: AI is moving undercover at work in 2025, according to Deloitte’s Tech Trends report

So, you can imagine how frustrated Google’s management has been over the last year or so when the AI revolution seemed to leave the company in the dust. While Google has invested in AI technologies for years, ChatGPT just blasted through and achieved chatbot domination in a very short time.

Google responded, of course. Its Gemini generative AI tool, introduced at the end of 2023, has been embedded at the top of the Google SERP (search engine results page). In a blog post today, Google and Alphabet CEO Sundar Pichai reports, “Our Al Overviews now reach 1 bilion people, enabling them to ask entirely new types of questions — quickly becoming one of our most popular Search features ever.”

But, as I reported based on my own testing, Google’s AI failed pretty hard, both at coding and even at its own awareness of its own capabilities.

Yet Pichai, in that same blog post, contends that “Since last December when we launched Gemini 1.0, millions of developers have used Google AI Studio and Vertex AI to build with Gemini.”

Also: OpenAI rolls out Canvas to all ChatGPT users – and it’s a powerful productivity tool

I’m sure that’s true, and it probably means that Google’s AI is suitable for certain development tasks — and not others. Because Google is so Python-centric, I’d bet that most of those developers were focusing on Python-related projects.

In other words, there’s been room for improvement. It’s quite possible that improvement has just happened. Google today is announcing Gemini 2.0, along with a raft of developer-related improvements.

Gemini 2.0

The Gemini 2.0 announcement comes to us through a blog post by Demis Hassabis and Koray Kavukcuoglu, CEO and CTO of Google DeepMind, respectively. The top-level headline says that Google 2.0 is “our new Al model for the agentic era.”

We’ll come back to the agentic bit in a minute because first we need to discuss the Gemini 2.0 model. Technically, Gemini 2.0 is a family of models, and what’s being announced today is an experimental version of Gemini 2.0 Flash. Google describes it as “our workhorse model with low latency and enhanced performance at the cutting edge of our technology, at scale.”

Also: Why Google’s legal troubles could hasten Firefox’s slide into irrelevance

That’s going to take some unpacking.

The Gemini Flash models are not chatbots. They power chatbots and many other applications. Essentially, the Flash designation means that the model is intended for developer use.

A key component of the announcement goes back to our speed theme. Gemini 2.0 Flash outperforms Gemini 1.5 Flash by two to one, according to Hassabis and Kavukcuoglu.

Earlier versions of Gemini Flash supported multimodal inputs like images, video, and audio. Gemini 2.0 Flash supports multimodal output, such as “natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google search, code execution, as well as third-party user-defined functions.”

Steerable text-to-speech, by the way, is the idea that you can specify things like voice customizations (male or female, for example), the style of speech (i.e., formal, friendly, etc.), speech speed and candence, and possibly language.

Also: Is this the end of Google? This new AI tool isn’t just competing, it’s winning

Developers can use Gemini 2.0 Flash starting now. It comes in the form of an experimental model that can be accessed using the Google API in Google AI Studio and Vertex AI. Multimodal input and text output is available to all developers, but text-to-speech and image generation features are only available to Google’s early-access partners.

Non-developers can also play with Gemini 2.0 via the Gemini AI assistant, both in desktop and mobile versions. This “chat optimized” version of 2.0 Flash can be chosen in the model drop-down menu, where “users can experience an even more helpful Gemini assistant.”

Agentic AI ambitions

So, now let’s get back to the whole agentic thing. Google describes agentic as providing a user interface with “action-capabilities.” Pichai, in his blog post, say agentic AI “can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”

Also: Agentic AI is the top strategic technology trend for 2025

I’m glad he added “with your supervision” because the idea of an AI that understands the world around you and can think multiple steps ahead is the plot behind so many science fiction stories I’ve read over the years — and they never ended well for the human protagonists.

Gemini 2.0 has a laundry list of improvements including:

Multimodal reasoning: ability to understand and process information from different input types, like pictures, videos, sounds, and text
Long context understanding: ability to participate in conversations, rather than just answering one-off questions, the ability to keep track of what’s been discussed or processed and work from that history.
Complex instruction following and planning: ability to follow a set of steps, or come up with a set of steps to meet a specific goal.
Compositional function-calling: at the coding level, the ability to combine multiple functions and APIs to accomplish a task.
Native tool use: ability to integrate and access services like Google search as part of the API’s capabilities.
Improved latency: faster response time, making interactions more seamless, and helping to feed Google’s overall speed addiction.

Taken together, these improvements help set up Gemini 2.0 for agentic activities.

Google’s Project Astra illustrates just how all of these capabilities come together. Project Astra is a prototype AI assistant that integrates real-world information into its responses and results. Think of it as a virtual assistant, where both the location and the assistant are virtual.

Also: 25 AI tips to boost your programming productivity with ChatGPT

Tasks Astra might be asked to perform include recommending a restaurant or developing an itinerary. But unlike a chatbot AI, the assistant is expected to combine multiple tools, like Google Maps and Search, make decisions based on the user’s existing knowledge, and even take the initiative if, say, there’s road construction en route to a possible destination. In that case, the AI might recommend a different route or, if time is constrained, perhaps even a different destination.

Project Mariner is another ambitious Google research project, although I find it a bit more scary as well. Mariner works with what’s on your browser screen, essentially reading what you’re reading, and then responding or taking action based on some criteria.

Mariner is expected to interpret pixel content as well as text, code, images, and forms, and — with some serious guard rails, one would hope — take on real world tasks. Right now, Google admits that Mariner is doing fairly well, but isn’t always accurate and can sometimes be somewhat slow.

Jules: Journey to the center of the codebase

Jules is an experimental agent for developers. This one also seems scary to me, so it may well be that I’m just not quite ready to let AIs run loose on their own. Jules is an agent that integrates into GitHub workflows and is expected to manage and debug code.

According to today’s blog post by Shrestha Basu Mallick, Group Product Manager of the Gemini API and Kathy Korevec, Director of Product at Google Labs, “You can offload Python and Javascript coding tasks to Jules.”

Also: Gen AI could speed up coding, but businesses should still consider risks

They go on to say, “Working asynchronously and integrated with your GitHub workflow, Jules handles bug fixes and other time-consuming tasks while you focus on what you actually want to build. Jules creates comprehensive, multi-step plans to address issues, efficiently modifies multiple files, and even prepares pull requests to land fixes directly back into GitHub.”

I can definitely see how Jules could foster an increase in productivity, but it also makes me uncomfortable. I’ve occasionally delegated my code to human coders and gotten back stuff that could only be described as, “Holy crap, what were you thinking?”

I’m concerned about getting back similarly problematic work from artificial coders. Giving an Al the ability to go in and change my code seems risky. If something goes wrong, finding what was changed and reverting it, even with tools like Git and other version control tools, seems like a big step.

I’ve had to undo work from underperforming human coders. It was not fun. I understand the benefits of automated coding. I certainly don’t love debugging and fixing my own code, but giving up that level of control is daunting, at least to me.

Also: Gen AI gives software developers surge in productivity – but it’s not for everyone

That said, if Google is willing to trust its own code base to Gemini 2.0 and Jules, who am I to judge? The company is certainly eating its own dog food, and that counts for a lot.

Avoiding Skynet

Google seems to firmly believe that AI can help make its products more helpful in a wide range of applications. But the company also seems to get the obvious concerns, stating, “We recognize the responsibility it entails, and the many questions Al agents open up for safety and security.”

Hassabis and Kavukcuoglu say that they’re “taking an exploratory and gradual approach to development, conducting research on multiple prototypes, iteratively implementing safety training, working with trusted testers and external experts and performing extensive risk assessments and safety and assurance evaluations.”

Also: 4 ways to turn generative AI experiments into real business value

They give a number of examples of the risk management steps they’re taking, including:

Working with their internal Responsibility and Safety Committee to understand risks.
Google is using Gemini 2.0 itself to help Google’s AI systems grow with safety in mind by using its own advanced reasoning to self-improve and mitigate risks. It’s a bit like having the wolf guard the henhouse, but it makes sense as one aspect of protection.
Google is working on privacy controls for Project Astra and to make sure the agents don’t take unintended actions. ‘Cause that would be baaad.
With Mariner (the screen-reading agent), Google is working on making sure the model prioritizes instructions from users rather than what might be third-party attempts to inject malicious prompts as part of the web page content.

Google states, “We firmly believe that the only way to build Al is to be responsible from the start and we’ll continue to prioritize making safety and responsibility a key element of our model development process as we advance our models and agents.”

This is good. AI has enormous potential to be a boon to productivity but is also incredibly risky. While there’s no guarantee BigTech won’t accidentally create our own Forbin Project Colossus, or a cranky Hal-9000, at least Google is aware of the risks and is paying attention.

So, what do you think about all of these Google announcements? Are you excited for Gemini 2.0? Do you think you might use a public version of Project Astra or Mariner? Are you currently using Gemini as your AI chatbot, or do you prefer another one? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.

Source Link

Gemini 2.0

Agentic AI ambitions

Jules: Journey to the center of the codebase

Avoiding Skynet

LEAVE A REPLY Cancel reply

TECH NEWS

TOP STORIES

Cyber Security