Google drops new Gemini model, and it goes straight to the top of the LLM leaderboard

Google is constantly updating Gemini, releasing new versions of its AI model family every few weeks. The latest is so good it went straight to the top of the LMArena Chatbot Arena leaderboard, toppling the latest version of OpenAI's GPT-4o.


Previously known as the LMSys arena, it is a platform that lets AI labs pit their best models against one another in a blind head-to-head. Users vote but don't know which model is which until after they've voted.


The new model from Google DeepMind has the catchy name Gemini-Exp-1114. It has matched the latest version of GPT-4o and exceeded the capabilities of the o1-preview reasoning model from OpenAI.


The top 5 models in the arena are all versions of OpenAI or Google models. The first model on the leaderboard not made by either of those companies is xAI's Grok 2.


The success of this new model comes as Google finally releases a Gemini app for iPhone, which beat the ChatGPT app in our Gemini vs. ChatGPT 7-round face-off.


How well does the new model work?

The latest Gemini model seems to perform particularly well at math and vision tasks, which makes sense as they are areas in which all Gemini models excel.


Gemini-Exp-1114 isn't currently available in the Gemini app or website. You can only access it by signing up for a free Google AI Studio account (the platform aimed at developers wanting to try new ideas).


I'm also not sure whether this is a version of Gemini 1.5 or whether it's an early insight into Gemini 2, expected next month. If it is the latter, then the improvement over the previous generation might not be as extreme as some expected.


However, it is doing well in technical and creative areas according to benchmarks. This would tie into the idea that it's going to be useful for reasoning and managing agents. It came first in math, solving hard problems, creative writing and vision.


Unlike other benchmarks, the Chatbot Arena is based on human perceptions of performance and output quality, rather than rigid testing against data.


Whether this is just a new version of Gemini 1.5 Pro or an early insight into the capabilities of Gemini 2, it's going to be an interesting few months in AI land.