
GPT-4 Turbo reclaims ‘best AI model’ crown from Anthropic’s Claude 3

April 15, 2024


OpenAI has been on an update hot streak, making the latest GPT-4 Turbo available to developers and paid ChatGPT subscribers last week. When launching the model, OpenAI said the new GPT-4 Turbo boasts several improvements over its predecessor, and users are finding that to be true.


On Thursday, the updated version of GPT-4 Turbo, gpt-4-turbo-2024-04-09, reclaimed the number one spot on the Large Model Systems Organization (LMSYS) Chatbot Arena, a crowdsourced open platform where users can evaluate large language models (LLMs).

🔥Exciting news — GPT-4-Turbo has just reclaimed the No. 1 spot on the Arena leaderboard again! Woah!
We collect over 8K user votes from diverse domains and observe its strong coding & reasoning capability over others. Hats off to @OpenAI for this incredible launch!
To offer… pic.twitter.com/IxbN2Q9ecJ

— lmsys.org (@lmsysorg) April 11, 2024

In the Chatbot Arena, users can chat with two LLMs side by side and compare their responses to each other without knowing the identity of each model.

After viewing the responses, users can continue chatting until they feel comfortable voting on which model won, whether it is a tie, or whether both are bad.

Those results are then used to rank the 82 LLMs in the Chatbot Arena on the leaderboard, which includes all of the most popular LLMs on the market such as Gemini Pro, the Claude 3 family of LLMs, and Mistral-Large-2402.
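How pairwise votes become a ranking is not spelled out here. One common way to turn head-to-head results into a leaderboard is an Elo-style rating, and the sketch below illustrates that general idea only. It is not a description of the exact method LMSYS runs. The model names, the starting rating of 1000, the K-factor of 32, and the choice to score "both bad" like a tie are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, outcome, k=32.0):
    """Apply one vote. outcome is 'a', 'b', 'tie', or 'both_bad'.

    Assumption: 'both_bad' is scored like a tie (half a point each).
    """
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5, "both_bad": 0.5}[outcome]
    exp_a = expected_score(ratings[model_a], ratings[model_b])
    delta = k * (score_a - exp_a)
    ratings[model_a] += delta  # winner gains what the loser gives up
    ratings[model_b] -= delta


def rank_models(votes, initial=1000.0, k=32.0):
    """Replay votes in order and return (model, rating) pairs, best first."""
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome in votes:
        update_elo(ratings, model_a, model_b, outcome, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical votes: (model shown as A, model shown as B, user verdict)
    sample_votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-z", "a"),
        ("model-y", "model-z", "tie"),
    ]
    for name, rating in rank_models(sample_votes):
        print(f"{name}: {rating:.1f}")
```

Because each vote moves the same number of points from one model to the other, the total rating pool stays constant, and a model climbs only by winning comparisons users judged blind.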

As of the latest Chatbot Arena update on April 13, the updated version of GPT-4 Turbo holds the lead in the overall, coding, and English categories.


This means that less than a month after overtaking GPT-4 Turbo in the Chatbot Arena, Anthropic’s Claude 3 Opus has been pushed into second place in the overall category, followed by GPT-4-1106-preview, an older version of GPT-4 Turbo, in third place.

These results could be attributed to gpt-4-turbo-2024-04-09’s improved coding, math, logical reasoning, and writing capabilities, which OpenAI demonstrated with higher scores on a series of benchmarks used to test the proficiency of AI models.

Interested in comparing gpt-4-turbo-2024-04-09’s performance against other LLMs for yourself? You can visit the Chatbot Arena website and click on the Arena (side-by-side) option to select which models you want to compare.


Note that because you know the identity of the models in the side-by-side option, you will not be able to vote there. If you want your vote to count toward the leaderboard, use the Arena (battle) option, which pits random, anonymous models against each other.

If you’d rather skip the testing and jump straight into using gpt-4-turbo-2024-04-09 in ChatGPT, all you have to do is become a ChatGPT Plus subscriber, which costs $20 per month.
