Google Gemini AI sparks debate among influencers over evaluation criteria and GPT-4 comparisons, finds GlobalData

Smitarani Tripathy, Social Media Analyst at GlobalData, comments: “Beyond the discussions of Gemini AI’s capabilities, its evaluations were another key topic of interest among influencers. Influencers perceived Gemini Ultra as falling short of GPT-4 in standard 5-shot evaluations, with Gemini only surpassing GPT-4 when utilizing the CoT@32 methodology.

“The influencers express concerns about the practicality of CoT@32 in real-world scenarios and emphasize GPT-4’s continued superiority. They have also highlighted the importance of the MMLU benchmark and advocate for more transparent evaluations through API endpoints or model weights rather than relying on blog posts.

“Influencers are skeptical regarding Gemini’s capabilities and are urging practical assessments before forming definitive opinions. They have emphasized a preference for direct 5-shot vs. 5-shot comparisons to ensure a more straightforward evaluation.”

Below are a few popular influencer opinions captured by GlobalData’s Social Media Analytics Platform:

  1. Saurabh Kumar, Co-Founder at Adora:

“For me, using uncertainty-routed chain-of-thought guided evaluation to claim a better MMLU score was kinda incomplete. It was clear, as shown in the paper, that on greedy and on CoT@32 analysis, GPT-4 beats Gemini. Uncertainty-routed CoT is new and probably could be a better way to judge the performance, but there was very little explanation in the paper as to why this technique doesn’t benefit GPT-4.”

  2. Harry Surden, Professor of Law, University of Colorado Law School:

“It is only when Gemini Ultra uses ‘CoT@32’ – which is likely something like running 32 parallel Chain-of-Thought chains and selecting the best answers among them – that Gemini Ultra surpasses GPT-4. This is disappointing, as Gemini Ultra, a newer model, should win on 5-shot itself.” (An illustrative sketch of this sampling-and-voting approach appears after the list below.)

  3. Shital Shah, Principal Research Engineer at Microsoft:

“Gemini does win against GPT-4 with CoT@32 but not on 5-shot. This likely indicates that Gemini is inherently more powerful, but somehow, without proper prompting, that capability doesn’t get exposed. Maybe GPT-4 still has better IFT? Still, this is a super exciting milestone!”

  4. Bindu Reddy, CEO of Abacus.AI:

“Digging deeper into the MMLU Gemini beat – Gemini doesn’t really beat GPT-4 on this key benchmark. The Gemini MMLU beat is specifically at CoT@32. GPT-4 still beats Gemini for the standard 5-shot – 86.4% vs. 83.7%….”

  5. Ethan Mollick, Professor at The Wharton School:

“We really don’t know anything about Gemini Ultra. Does it beat GPT-4 for real? If so, why by such a small amount? Two options: 1) Gemini represents the best effort by Google, and the failure to crush GPT-4 shows the limits of LLMs approaching; 2) Google’s goal was just to beat GPT-4.”

  6. Brett Winton, Investment Advisor at ARK Invest:

“Why is prompt engineering relevant? … But they’re comparing prompt-engineered Gemini vs. GPT-4 without! Like-for-like reality: Gemini: 83.7%, GPT-4: 86.4%.”
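For readers unfamiliar with the evaluation setting the influencers are debating, the “uncertainty-routed CoT@32” method can be pictured roughly as follows. The sketch below is a minimal illustration only, assuming a generic stand-in for the model call (`sample_cot_answer` is a hypothetical placeholder, not any vendor’s API); the routing rule – majority-vote over 32 sampled chain-of-thought answers when consensus clears a tuned threshold, otherwise fall back to the greedy answer – follows the general description given in the quotes above.

```python
# Illustrative sketch of an "uncertainty-routed CoT@k" evaluation loop.
# The model call is a hypothetical stand-in; only the routing logic is shown.
from collections import Counter
import random


def sample_cot_answer(question: str, temperature: float) -> str:
    """Hypothetical stand-in for one chain-of-thought completion.

    A real harness would prompt an LLM with chain-of-thought instructions
    and parse the final multiple-choice letter out of its reasoning.
    """
    if temperature == 0.0:
        return "B"  # toy greedy (temperature-0) answer
    # Toy sampled answer: leans toward "B" but is noisy.
    return random.choices(["A", "B", "C", "D"], weights=[1, 5, 2, 2])[0]


def uncertainty_routed_cot(question: str, k: int = 32, threshold: float = 0.6) -> str:
    """Sample k CoT answers; use the majority vote only when consensus is high.

    If the most common answer's share of the k samples clears `threshold`
    (a value that would be tuned on a validation split), return it;
    otherwise fall back to the single greedy answer.
    """
    samples = [sample_cot_answer(question, temperature=0.7) for _ in range(k)]
    top_answer, top_count = Counter(samples).most_common(1)[0]
    if top_count / k >= threshold:
        return top_answer  # confident consensus: trust the vote
    return sample_cot_answer(question, temperature=0.0)  # uncertain: greedy fallback


if __name__ == "__main__":
    print(uncertainty_routed_cot("Example MMLU-style question ...", k=32))
```

Under this reading, the influencers’ objection is that the plain greedy 5-shot setting – the fallback path above – is where GPT-4 still leads, so the headline Gemini number depends on the sampling-and-routing machinery rather than on the base model alone.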
