DeepMind Technologies Limited, doing business as Google DeepMind, has recently released its latest generative AI model, “Gemini 1.0”, in three sizes for different tasks: Ultra, Pro and Nano. The launch has triggered intense debate among influencers on the social media platform X regarding its performance relative to GPT-4, observes the Social Media Analytics Platform of GlobalData, a leading data and analytics company.
Smitarani Tripathy, Social Media Analyst at GlobalData, comments: “Beyond the discussions of Gemini AI’s capabilities, the evaluation methodology was another key topic of interest among influencers. Influencers perceived Gemini Ultra as falling short of GPT-4 in standard 5-shot evaluations, with Gemini surpassing GPT-4 only when utilizing the CoT@32 methodology.
“The influencers express concerns about the practicality of CoT@32 in real-world scenarios and emphasize GPT-4’s continued superiority. They also highlight the importance of the MMLU benchmark and advocate for more transparent evaluations through API endpoints or model weights rather than relying on blog posts.
“Influencers are skeptical about Gemini’s capabilities and are urging practical assessments before forming definitive opinions. They emphasize a preference for direct 5-shot vs. 5-shot comparisons to ensure a more straightforward evaluation.”
Below are a few popular influencer opinions captured by GlobalData’s Social Media Analytics Platform:
“For me, using uncertainty-routed chain-of-thought guided evaluation to claim a better MMLU score was kinda incomplete. It was clear, as shown in the paper, that on greedy and on CoT@32 analysis, GPT-4 beats Gemini. Uncertainty-routed CoT is new and could probably be a better way to judge performance, but there was very little explanation in the paper as to why this technique doesn’t benefit GPT-4.”
“It is only when Gemini Ultra uses ‘CoT@32’ – which is likely something like running 32 parallel Chain-of-Thought chains and selecting the best answer among them – that Gemini Ultra surpasses GPT-4. This is disappointing, as Gemini Ultra, a newer model, should win on 5-shot itself.”
“Gemini does win against GPT-4 with CoT@32 but not on 5-shot. This likely indicates that Gemini is inherently more powerful, but somehow, without proper prompting, that capability doesn’t get exposed. Maybe GPT-4 still has better IFT? Still, this is a super exciting milestone!”
“Digging deeper into the MMLU Gemini beat – Gemini doesn’t really beat GPT-4 on this key benchmark. The Gemini MMLU beat is specifically at CoT@32. GPT-4 still beats Gemini on the standard 5-shot – 86.4% vs. 83.7%….”
“We really don’t know anything about Gemini Ultra. Does it beat GPT-4 for real? If so, why by such a small amount? Two options: 1) Gemini represents the best effort by Google, and the failure to crush GPT-4 shows the limits of LLMs are approaching; 2) Google’s goal was just to beat GPT-4.”
“Why is prompt engineering relevant: … But they’re comparing prompt-engineered Gemini vs. GPT-4 without! Like-for-like reality: Gemini: 83.7%, GPT-4: 86.4%.”
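One of the opinions above reads CoT@32 as sampling 32 chain-of-thought responses and selecting among their answers, and the first quote refers to an uncertainty-routed variant that falls back to a greedy answer when the chains disagree. The minimal Python sketch below illustrates that reading only; it is not Google’s published evaluation code, and the sample_cot_answer callable, the consensus threshold value, and the greedy fallback are all assumptions introduced here for illustration.

```python
from collections import Counter
from typing import Callable, List

def cot_at_k(
    question: str,
    sample_cot_answer: Callable[[str], str],  # hypothetical: samples one chain-of-thought completion and returns its final answer
    greedy_answer: str,                       # answer from a single greedy (temperature-0) pass
    k: int = 32,
    consensus_threshold: float = 0.5,         # assumed value; the actual threshold is not public
) -> str:
    # Sample k chain-of-thought answers to the same question.
    answers: List[str] = [sample_cot_answer(question) for _ in range(k)]

    # Find the most common answer and how many chains voted for it.
    top_answer, votes = Counter(answers).most_common(1)[0]

    # "Uncertainty-routed" reading: keep the consensus answer only if enough
    # chains agree; otherwise fall back to the single greedy answer.
    if votes / k >= consensus_threshold:
        return top_answer
    return greedy_answer
```

Under this reading, a model whose sampled chains rarely converge on the same answer would gain little from CoT@32, which is one way to interpret the quoted complaint that the technique appears to help Gemini more than it helps GPT-4.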