
Elon Musk’s xAI Launches Grok 3, Dethroning OpenAI on Key AI Benchmarks
- by winbuzzer.com
- Feb 18, 2025

‘Think’ Button For AI Reasoning and Deep Search
A standout feature in Grok 3 is its “Think” button, which allows users to request a more detailed and analytical response by giving the AI additional processing time. The goal is to improve reasoning accuracy and enhance the model’s ability to tackle complex tasks.
The button enables advanced chain-of-thought reasoning which, like OpenAI’s o1 and o3 models and DeepSeek R1, aims to deliver results based on more complex, step-by-step thinking.
Grok 3 also introduces its own AI-driven research feature, Deep Search, similar to OpenAI’s Deep Research and Google Gemini’s Deep Research. The tool lets Grok 3 pull and synthesize real-time information, making it a competitor to both of those products and to Perplexity AI, which has also just launched its own deep research implementation.
Andrej Karpathy, former director of AI at Tesla and one of Grok 3’s early testers, found that with ‘Think’ mode enabled, the model successfully estimated the training FLOPs required for OpenAI’s GPT-2, a task on which even OpenAI’s most powerful thinking model, o1-pro, failed. Karpathy noted, “Grok 3 with Thinking solves it great, while o1 pro (GPT thinking model) fails.”
For real-time research, Deep Search gives Grok 3 an edge over many models, but accuracy issues put it behind OpenAI’s Deep Research and Perplexity AI. Karpathy says Grok 3 generates “hallucinated URLs” and avoids citing X unless explicitly asked to, which limits its effectiveness as a research tool.
In terms of reasoning, Grok 3’s new ‘Think’ mode allows it to match OpenAI’s o1-pro on some logic-heavy tasks. However, it still struggles with spatial reasoning, as its failure on the tic-tac-toe board generation test showed. This places it behind GPT-4o, which has been noted for its advanced logic capabilities.
Creativity remains another weak point. Anthropic’s Claude has been widely praised for its natural and engaging writing style, while Grok 3 still produces responses that feel formulaic.
In another test, Grok 3 correctly generated a Settlers of Catan board setup, a challenge that many AI models struggle with. However, when asked to generate tricky tic-tac-toe boards, the model failed and produced nonsensical layouts. Karpathy observed, “It solved a few tic tac toe boards I gave it with a pretty nice/clean chain of thought… but failed on generating tricky ones.”
Karpathy summarized his first impressions in a post on X:
“I was given early access to Grok 3 earlier today, making me I think one of the first few who could run a quick vibe check.
Thinking
✅ First, Grok 3 clearly has an around state of the art thinking model ("Think" button) and did great out of the box on my Settler's of Catan…”