After using Gemini 3, GPT-5 feels like a disaster
Gemini 3 is insane and Google is winning again.
Gemini 3 is a fantastic model.
It sees patterns, breaks down multimodal data, and gives the kind of strategic insight that finance dreams of.
Right after Google released Gemini 3, Alphabet’s stock jumped 3%. The shares are up 55% this year.
The precision of Gemini 3 is unlike anything in any other LLM, and nothing else comes close. It answers questions that don’t exist on the public web, pulls insights from obscure corners of knowledge, and somehow still surfaces sources for verification.
Even Sam Altman says it’s a great model.
Yes, hallucinations still exist, but the rate is dropping to a point where it’s genuinely surprising. The model retrieves numbers and statistics that appear in maybe three non-public articles, and it does it with confidence and accuracy.
I don’t know if it’s tapping into Google Docs, internal academic libraries, or something else entirely, but combined with compute that no competitor has, Google clearly has the richest base material to build next-generation models.
When it comes to data, Google wins.
Gemini 3 even set a record score on the RadLE benchmark.
Every new model wants to become board-certified in every field.
When an LLM can reason at its best, CFOs can spend time on strategy and not reconciliation.
Read on.
Gemini 3, Gemini 3 Pro Preview, and Gemini 3 Deep Think
Google didn’t roll these out quietly. It embedded them into:
Search
YouTube
Workspace
Gemini app
Vertex AI
And even a new agentic coding environment called Antigravity.
On Humanity’s Last Exam, it more than doubles the scores of older models.
On ARC-AGI-2, the test that reveals whether a model can generalize, it blows past GPT-5.1, Claude 4.5, and every other frontier model on the market.
On scientific knowledge, it hits 93.8%.
On video understanding, it processes an entire YouTube video, supplied as a link, frame by frame.
You can drop in a 20-minute video, and it answers questions about objects on screen, text you never mentioned, and events that happen for half a second. No one else has this. Nothing comes close.
And Deep Think pushes it even further. More tokens spent thinking means more coherence, more structure, and more accuracy. This model behaves like someone who actually understands the problem.
Then there’s the part that matters for finance: long-horizon planning.
Gemini 3’s performance on VendingBench, a simulation of running a small business for a full year, shows the real breakthrough. It doesn’t fall apart. It finishes the year stronger than every competitor.
That’s the first concrete proof that an AI system can manage a dynamic economic environment without collapsing into nonsense.
Gemini 3 Pro shows consistent, compounding growth, finishing above $5,000.
Claude Sonnet 4.5 trails behind, plateauing earlier.
GPT-5.1 barely grows, stuck in low-performance mode.
Gemini 2.5 Pro essentially flatlines and even loses money over time.
For CFOs, this is about long-horizon reasoning.
The ability to make decisions today that still make sense months later.
This benchmark maps directly to the real work of finance:
Forecasting
Capital allocation
Budget adherence
Margin protection
Inventory planning
Scenario management
Strategic decision-making
These are all long-horizon problems.
And Gemini 3 is the first model that holds the logic together across time.
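To make “long-horizon” concrete, here is a toy Python sketch of a year-long small-business simulation. It is not the benchmark itself and it does not involve any model. Every name and number in it (`simulate_year`, the prices, the daily fee, the demand, the starting cash) is a made-up assumption, chosen only to show how small daily decisions compound over 365 days.

```python
def simulate_year(reorder_qty, days=365, start_cash=500.0):
    """Toy one-product business. All parameters are hypothetical.

    Each day: restock if the shelf is empty, sell up to `demand` units,
    then pay a fixed daily fee. Returns end-of-year net worth.
    """
    unit_cost, unit_price = 1.0, 2.0   # assumed buy and sell prices
    daily_fee, demand = 2.0, 10        # assumed fixed cost and daily demand
    cash, stock = start_cash, 0
    for _ in range(days):
        if stock == 0 and reorder_qty > 0:
            # Never order more than current cash can pay for.
            qty = max(min(reorder_qty, int(cash // unit_cost)), 0)
            stock += qty
            cash -= qty * unit_cost
        sold = min(stock, demand)
        stock -= sold
        cash += sold * unit_price
        cash -= daily_fee
    return cash + stock * unit_cost    # leftover stock valued at cost


planner = simulate_year(reorder_qty=70)  # keeps the shelf stocked all year
idler = simulate_year(reorder_qty=0)     # never restocks, only pays fees
print(planner, idler)
```

Under these invented numbers, the policy that keeps restocking ends the year at 3,420 while the one that never restocks ends at -230. The gap comes entirely from repeating a sensible decision every week, which is the kind of behavior a long-horizon benchmark is trying to measure.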
Box.com ran multi-step reasoning tasks on unstructured documents.
The kind of workflows analysts drown in. Gemini 3 outperformed Gemini 2.5 Pro by 22 points and crushed every other model on extraction and reasoning. Financial services showed the smallest gap, but only because the baseline was already high.
Gemini 3 understands your documents. It knows how to stitch together information from PDFs, Excel sheets, screenshots, audio, video, and raw text into a single, coherent answer.
If your work involves financial statements, contracts, audit evidence, compliance packets, or multi-system exports, this matters more than any benchmark on social media.
CFOs can find information from 1,000 invoices with one prompt, with a higher level of accuracy than ever before.
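As a rough sketch of what “one prompt over many invoices” could look like, the Python below packs invoice records and a question into a single prompt string. The field names (`id`, `vendor`, `amount`), the function names, and the prompt wording are all assumptions for illustration. No model is called here. A deterministic total is included so that whatever a model answers can be cross-checked against the raw data.

```python
import json
from collections import defaultdict


def build_invoice_prompt(invoices, question):
    """Pack invoice records and one question into a single prompt string."""
    records = "\n".join(json.dumps(inv, sort_keys=True) for inv in invoices)
    return (
        "Below are invoice records, one JSON object per line.\n"
        "Answer using only these records and cite the invoice ids you used.\n\n"
        f"{records}\n\n"
        f"Question: {question}"
    )


def vendor_totals(invoices):
    """Deterministic per-vendor totals, used to sanity-check a model's answer."""
    totals = defaultdict(float)
    for inv in invoices:
        totals[inv["vendor"]] += inv["amount"]
    return dict(totals)


# Hypothetical sample data; a real export would have far more rows and fields.
invoices = [
    {"id": "INV-001", "vendor": "Acme", "amount": 100.0},
    {"id": "INV-002", "vendor": "Globex", "amount": 75.5},
    {"id": "INV-003", "vendor": "Acme", "amount": 250.0},
]

prompt = build_invoice_prompt(invoices, "Which vendor did we pay the most?")
print(prompt)
print(vendor_totals(invoices))
```

Since hallucinations still exist, pairing the prompt with a plain arithmetic check like `vendor_totals` is one simple way to verify an answer before it reaches a board slide.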
The Bottom Line
You can also tell Gemini 3 to clean your inbox.
Gemini 3 can get things done in everyday life. It combines deeper reasoning with improved, more consistent tool use. It can simply act on your behalf.
So if you want to learn about a new topic, you can give it academic papers, long video lectures, or tutorials, and it can generate code for interactive flashcards, visualizations, or other formats to help you master the material.
Gemini 3 sees the world the way you do.
If you look at how far we’ve come, you’ll be surprised at how fast we’re progressing.
At this point, forget about the AI Bubble. If it bursts, it’ll take down everyone.
No company is immune to that, not even Google. But never let the unknown get in the way of building a better future.
Next Thursday, I’ll put Gemini 3 to the test on real finance workflows and see how far the model actually goes inside a CFO environment.
But it’s only for paid subscribers. You might be thinking: why pay? So far we’ve been very strict about sponsors, because I don’t want to spoil the quality of our letters and the reading experience we’re maintaining.
28,000+ CFOs and finance leaders from the world’s biggest companies are reading this, and I want to keep finance ahead of all the noise and give a clear picture of what’s happening and where the most ROI is.
Upgrade to paid and enjoy this great piece I wrote last Thursday.
How to write better financial prompts with ChatGPT 5.1 and build board slides in seconds
The highest-leverage skill in the age of AI is your ability to write better prompts.
And that’s all for today.
See you on Thursday!
Whenever you’re ready, there are 2 ways I can help you:
If you’re building an AI-powered CFO tech startup, I’d love to hear more and explore if it’s a fit for our investment portfolio.
I’m Wouter Born. A CFOTech investor, advisor, and founder of finstory.ai
Find me on LinkedIn