✨ Gemini 2.5 Pro vs. Claude 3.7 Sonnet Coding Comparison 🔥
Google just launched a new model on March 26th, which they claim is the best at coding, reasoning, and pretty much everything else. 🥴 But what I mostly care about is how it compares against the best model available, Claude 3.7 Sonnet, which itself was released at the end of February.
Let’s compare these two models on coding and see whether I need to switch my favorite coding model or whether Claude 3.7 Sonnet still holds up. 😮💨
TL;DR
If you want to jump straight to the conclusion: between these two top coding models, I’d say go for Gemini 2.5 Pro, based on both our tests and the published benchmarks. That said, Claude 3.7 Sonnet is not far behind.
Just an article ago, Claude 3.7 Sonnet was the answer to every model comparison, and I thought this would remain the same for quite some time. But here you go, Gemini 2.5 Pro takes the lead. It feels like we’ve officially entered the AI era. 🫠
Brief on Gemini 2.5 Pro
Gemini 2.5 Pro, currently an experimental thinking model, became the talk of the town within a week of its release. Everyone's talking about it on Twitter (X) and YouTube. It's trending everywhere, and I mean everywhere.
It also went straight to #1 on the LMArena leaderboard. What does that mean? It means this model is outranking all the other models there, not just in coding but also in math, science, image understanding, and more.
Gemini 2.5 Pro comes with a 1 million token context window, with a 2 million token context window coming soon. 🤯
You can check out other folks like Theo-t3 talking about this model to get a bit more insight into it:
It is said to be the best coding model to date, scoring about 63.8% on SWE-bench, slightly higher than our previous top coding model, Claude 3.7 Sonnet, at about 62.3%.
Here's a quick demo Google shared of the model building a dinosaur game.
Here are its benchmarks on reasoning, mathematics, and science. They suggest the model isn't just suited to coding but to pretty much everything else too. In other words, they're pitching it as an all-rounder. 🤷♂️
This is all cool, but in this article I'll mainly put the coding claim to the test and see how the model performs compared to Claude 3.7 Sonnet.
Coding Problems
💁 Let’s compare these two models on coding. We’ll run four tests in total, covering web dev, animation, and a tough LeetCode question.
1. Flight Simulator
Prompt: Create a simple flight simulator using JavaScript. The simulator should feature a basic plane that can take off from a flat runway. The plane's movement should be controlled with simple keyboard inputs (e.g., arrow keys or WASD). Additionally, generate a basic cityscape using blocky structures, similar to Minecraft.
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
I got exactly what I asked for, with everything working, from the plane's movement to the basic Minecraft-style block buildings. I can't really complain about anything here. 10/10 for this one. 🔥
Response from Claude 3.7 Sonnet
You can find the code it generated here: Link
Here’s the output of the program:
I can see some issues with this one. The plane is clearly facing sideways, and I'm not sure why. On top of that, once it took off it was simply out of control and flew well outside the city. Basically, we didn't get a fully working flight simulator here.
Summary:
It's fair to say Gemini 2.5 Pro nailed this one, and in one shot. The issues in Claude 3.7 Sonnet's code aren't that hard to fix, but we still didn't get the expected output, and it's nowhere close to what Gemini 2.5 Pro gave us.
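For anyone curious about the core logic this prompt asks for, here's a minimal, framework-free sketch of a plane update step and a blocky city generator. It is my own sketch, not either model's output. The key bindings, tuning constants, runway layout, and the "forward is -Z" convention (the default viewing direction of a Three.js camera) are all assumptions.

```javascript
// Assumed tuning constants for this sketch.
const TURN_RATE = 1.0;     // radians per second
const PITCH_RATE = 0.8;    // radians per second
const ACCEL = 10;          // speed units per second squared
const TAKEOFF_SPEED = 30;  // minimum speed before the plane can climb
const MAX_PITCH = Math.PI / 4;

// Returns the next plane state given the set of keys currently held down.
// Heading 0 points along -Z; the ground is at y = 0.
function stepPlane(state, keys, dt) {
  let { x, y, z, heading, pitch, speed } = state;

  if (keys.has("w")) speed += ACCEL * dt;                     // throttle up
  if (keys.has("s")) speed = Math.max(0, speed - ACCEL * dt); // throttle down
  if (keys.has("a")) heading += TURN_RATE * dt;               // yaw left
  if (keys.has("d")) heading -= TURN_RATE * dt;               // yaw right
  if (keys.has("ArrowUp")) pitch += PITCH_RATE * dt;          // nose up
  if (keys.has("ArrowDown")) pitch -= PITCH_RATE * dt;        // nose down
  pitch = Math.min(MAX_PITCH, Math.max(-MAX_PITCH, pitch));

  // On the runway and too slow to fly: keep the nose level.
  if (y <= 0 && speed < TAKEOFF_SPEED) pitch = 0;

  const horizontal = speed * Math.cos(pitch);
  x -= Math.sin(heading) * horizontal * dt;
  z -= Math.cos(heading) * horizontal * dt;
  y = Math.max(0, y + speed * Math.sin(pitch) * dt); // never sink below the ground

  return { x, y, z, heading, pitch, speed };
}

// Blocky, Minecraft-style city: a grid of boxes with random integer heights.
// Lots with i === 0 (x = 0) stay empty so the plane has a clear runway.
function generateCity(gridSize, spacing, maxHeight, rng = Math.random) {
  const buildings = [];
  for (let i = 0; i < gridSize; i++) {
    for (let j = 0; j < gridSize; j++) {
      if (i === 0) continue;     // runway strip
      if (rng() < 0.3) continue; // roughly 30% empty lots
      const height = 1 + Math.floor(rng() * maxHeight); // 1..maxHeight blocks
      buildings.push({ x: i * spacing, z: -j * spacing, height });
    }
  }
  return buildings;
}
```

In a real page you'd keep a `Set` of pressed keys updated from `keydown`/`keyup` listeners, call `stepPlane` inside a `requestAnimationFrame` loop, and copy the returned position and rotation onto the plane mesh. Keeping the update a pure function makes it easy to test without a browser.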
2. Rubik’s Cube Solver
This is one of the toughest questions for LLMs. I’ve tried it with many other LLMs, and none of them got it right. Let’s see how these two models handle it.
Prompt: Build a simple 3D Rubik’s Cube visualizer and solver in JavaScript using Three.js. The cube should build a 3x3 Rubik’s Cube with standard colors. Have a scramble button that randomly scrambles the cube. Include a solve function that animates the solution step by step. Allow basic mouse controls to rotate the view.
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
It's really impressive that it pulled off something this hard in one shot. This really shows how powerful this model is.
Response from Claude 3.7 Sonnet
You can find the code it generated here: Link
Here’s the output of the program:
Again, I'm somewhat disappointed: it fell into the same trap as some other LLMs, getting the colors wrong and completely failing to solve the cube. I tried nudging it toward the answer, but that didn't really help.
Summary:
Here again, Gemini 2.5 Pro takes the lead, and the best part is that it did it all in one shot. Claude 3.7 Sonnet was really disappointing here: it couldn't get this one right despite being one of the finest coding models out there.
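For context on what the prompt asks for: a scramble is just a random sequence of face turns in standard notation (U, D, L, R, F, B, with ' for counter-clockwise and 2 for a half turn). A simple shortcut for the "solve" button, and I'm not claiming either model used it, is to record the scramble and replay its inverse. Here's a minimal sketch of that bookkeeping; the function names are hypothetical, and the actual 3D turning of cubies in Three.js is left out.

```javascript
// Standard face-turn notation: U, D, L, R, F, B; "'" = counter-clockwise, "2" = half turn.
const FACES = ["U", "D", "L", "R", "F", "B"];
const SUFFIXES = ["", "'", "2"];

// Random scramble that never turns the same face twice in a row
// (otherwise moves like "R R'" would simply cancel out).
// rng must be a real (pseudo)random source; a constant rng would loop forever.
function randomScramble(length, rng = Math.random) {
  const moves = [];
  let lastFace = null;
  while (moves.length < length) {
    const face = FACES[Math.floor(rng() * FACES.length)];
    if (face === lastFace) continue;
    const suffix = SUFFIXES[Math.floor(rng() * SUFFIXES.length)];
    moves.push(face + suffix);
    lastFace = face;
  }
  return moves;
}

// Inverse of one move: clockwise <-> counter-clockwise; half turns are self-inverse.
function invertMove(move) {
  if (move.endsWith("2")) return move;
  if (move.endsWith("'")) return move.slice(0, -1);
  return move + "'";
}

// Undo a sequence by applying the inverted moves in reverse order.
function inverseSequence(moves) {
  return moves.slice().reverse().map(invertMove);
}

const scramble = randomScramble(20);
console.log(scramble.join(" "));
console.log(inverseSequence(scramble).join(" "));
```

Note that this only undoes scrambles the app made itself. Solving an arbitrary cube state (say, one twisted by hand) needs a real algorithm, such as a layer-by-layer method or Kociemba's two-phase algorithm, which is a big part of why this prompt is so hard.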
3. Ball Bouncing Inside a Spinning 4D Tesseract
Prompt: Create a simple JavaScript script that visualizes a ball bouncing inside a rotating 4D tesseract. When the ball collides with a side, highlight that side to indicate the impact.
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
I can't spot a single issue in the output. The ball and the collision physics work perfectly, and even the collision-side highlighting I asked for works. This free model seems insane for coding. 🔥
Response from Claude 3.7 Sonnet
You can find the code it generated here: Link
Here’s the output of the program:
Wow, finally, Claude 3.7 Sonnet got one right. It also added colors to each side, but who asked for that? 🤷♂️ Still, I can't complain much here, as the main functionality works just fine.
Summary:
No clear winner this time: both models got it right and implemented everything I asked for. I wouldn't say I like Claude 3.7 Sonnet's output more, but it definitely put in more work than Gemini 2.5 Pro.
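The trickiest part of this prompt is the 4D math, so here's a minimal sketch of one way to do it (my own, not either model's output): build the tesseract's 16 vertices and 32 edges, rotate them in a 4D plane, and perspective-project them to 3D for rendering. The ball physics are an assumption on my part: the ball bounces in the tesseract's own unrotated frame, where each of the 8 cubic cells is simply "a coordinate equals ±1", so the rotation is purely visual.

```javascript
// 16 vertices: every combination of -1/+1 in the four coordinates (x, y, z, w).
function tesseractVertices() {
  const verts = [];
  for (let i = 0; i < 16; i++) {
    verts.push([0, 1, 2, 3].map((bit) => ((i >> bit) & 1 ? 1 : -1)));
  }
  return verts;
}

// 32 edges: vertex indices differing in exactly one bit,
// i.e. vertices that differ in exactly one coordinate.
function tesseractEdges() {
  const edges = [];
  for (let a = 0; a < 16; a++) {
    for (let bit = 0; bit < 4; bit++) {
      const b = a ^ (1 << bit);
      if (a < b) edges.push([a, b]);
    }
  }
  return edges;
}

// Rotate a 4D point in the plane spanned by axes i and j (e.g. 0 and 3 = the XW plane).
function rotate4D(v, i, j, theta) {
  const out = v.slice();
  const c = Math.cos(theta);
  const s = Math.sin(theta);
  out[i] = c * v[i] - s * v[j];
  out[j] = s * v[i] + c * v[j];
  return out;
}

// Perspective-project 4D -> 3D. viewerDistance must exceed the largest |w|,
// which is 2 for these vertices (their length is 2 and rotation preserves it).
function project4Dto3D([x, y, z, w], viewerDistance = 3) {
  const k = viewerDistance / (viewerDistance - w);
  return [x * k, y * k, z * k];
}

// Indices of the 8 vertices of the cubic cell where coordinate[axis] === sign:
// the "side" to highlight when the ball hits it.
function cellVertices(axis, sign) {
  return tesseractVertices()
    .map((v, idx) => [v, idx])
    .filter(([v]) => v[axis] === sign)
    .map(([, idx]) => idx);
}

// Move the ball in the tesseract's unrotated frame, where the walls are the
// 8 cells at coordinate = +/-1. Reflects the velocity and reports each hit.
function stepBall(ball, dt) {
  const hits = [];
  const limit = 1 - ball.radius;
  for (let k = 0; k < 4; k++) {
    ball.pos[k] += ball.vel[k] * dt;
    if (ball.pos[k] > limit) {
      ball.pos[k] = limit;
      ball.vel[k] = -Math.abs(ball.vel[k]);
      hits.push({ axis: k, sign: 1 });
    } else if (ball.pos[k] < -limit) {
      ball.pos[k] = -limit;
      ball.vel[k] = Math.abs(ball.vel[k]);
      hits.push({ axis: k, sign: -1 });
    }
  }
  return hits;
}
```

Each frame, you'd call `stepBall`, highlight the edges of any cell it reports, rotate every vertex (and the ball's position) with `rotate4D`, project with `project4Dto3D`, and then let an ordinary 3D camera, such as a Three.js `PerspectiveCamera`, handle the final drop to 2D.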
4. LeetCode Problem
For this one, let’s do a quick LeetCode check to see how these models handle a tricky LeetCode question with an acceptance rate of just 14.9%: Maximum Value Sum by Placing 3 Rooks.
Claude 3.7 Sonnet is known to be super good at solving LeetCode questions. If you want to see how Claude 3.7 Sonnet compares to top models like Grok 3 and o3-mini-high, check out this blog post:


Claude 3.7 Sonnet vs. Grok 3 vs. o3-mini-high: Coding comparison
Shrijal Acharya for Composio ・ Feb 27
Prompt:
You are given a m x n 2D array board representing a chessboard, where board[i][j] represents the value of the cell (i, j).
Rooks in the same row or column attack each other. You need to place three rooks on the chessboard such that the rooks do not attack each other.
Return the maximum sum of the cell values on which the rooks are placed.
Example 1:
Input: board = [[-3,1,1,1],[-3,1,-3,1],[-3,2,1,1]]
Output: 4
Explanation:
We can place the rooks in the cells (0, 2), (1, 3), and (2, 1) for a sum of 1 + 1 + 2 = 4.
Example 2:
Input: board = [[1,2,3],[4,5,6],[7,8,9]]
Output: 15
Explanation:
We can place the rooks in the cells (0, 0), (1, 1), and (2, 2) for a sum of 1 + 5 + 9 = 15.
Example 3:
Input: board = [[1,1,1],[1,1,1],[1,1,1]]
Output: 3
Explanation:
We can place the rooks in the cells (0, 2), (1, 1), and (2, 0) for a sum of 1 + 1 + 1 = 3.
Constraints:
3 <= m == board.length <= 100
3 <= n == board[i].length <= 100
-10^9 <= board[i][j] <= 10^9
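For reference, here's one approach that fits these constraints, written in JavaScript to match the rest of the article. It's my own sketch, not either model's output. The key observation: for any fixed three rows, each rook's best usable cell is among its row's top three values, because the other two rooks can block at most two columns. So we keep each row's top three cells and try every triple of rows.

```javascript
// Maximum Value Sum by Placing 3 Rooks: a sketch, not either model's code.
// For any fixed three rows, each row's best usable cell is among that row's
// top three values, because the other two rooks block at most two columns.
// Time: O(m * n log n) to pick candidates, plus O(m^3 * 27) over row triples.
function maximumValueSum(board) {
  const m = board.length;

  // top[r] holds the three largest [value, col] pairs of row r (n >= 3).
  const top = board.map((row) =>
    row
      .map((value, col) => [value, col])
      .sort((a, b) => b[0] - a[0])
      .slice(0, 3)
  );

  let best = -Infinity;
  for (let i = 0; i < m; i++) {
    for (let j = i + 1; j < m; j++) {
      for (let k = j + 1; k < m; k++) {
        // Try every combination of candidates whose columns are all distinct.
        for (const [vi, ci] of top[i]) {
          for (const [vj, cj] of top[j]) {
            if (cj === ci) continue;
            for (const [vk, ck] of top[k]) {
              if (ck === ci || ck === cj) continue;
              best = Math.max(best, vi + vj + vk);
            }
          }
        }
      }
    }
  }
  return best;
}

console.log(maximumValueSum([[-3, 1, 1, 1], [-3, 1, -3, 1], [-3, 2, 1, 1]])); // 4
console.log(maximumValueSum([[1, 2, 3], [4, 5, 6], [7, 8, 9]]));              // 15
console.log(maximumValueSum([[1, 1, 1], [1, 1, 1], [1, 1, 1]]));              // 3
```

Sums stay within ±3 × 10^9, well inside the range JavaScript numbers represent exactly, so no `BigInt` is needed.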
Response from Gemini 2.5 Pro
💁 I have high hopes for this model, given how easily it answered all three coding questions so far.
You can find the code it generated here: Link
It took quite some time to answer this one, though, and the code it wrote is pretty hard to make sense of. I think it made the solution more complicated than it needed to be. Still, the main thing we’re checking is whether it can get it right.
And as expected, it solved this tough LeetCode question in one shot as well. This is one of the questions I got stuck on when learning DSA, so I’m not sure I’m happy that it got it right in one shot. 😮💨
Response from Claude 3.7 Sonnet
💁 I expect this model to crush this one, since in all my other coding tests, Claude 3.7 Sonnet has answered every LeetCode question correctly.
You can find the code it generated here: Link
Its code was correct but hit TLE (Time Limit Exceeded). That said, on code simplicity, this model's code is simpler and easier to understand.
Summary:
Gemini 2.5 Pro got the answer right and wrote code within the expected time complexity, while Claude 3.7 Sonnet's solution hit TLE. On code simplicity alone, though, Claude 3.7 Sonnet's code looks better.
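For a sense of scale on why time complexity matters here (I didn't dig into exactly why Claude's code timed out, so this is just the arithmetic): at the maximum size of 100 x 100, the board has 10^4 cells. Checking every set of three cells means on the order of 10^11 combinations, whereas keeping only each row's top three cells and enumerating triples of rows, with at most 3^3 = 27 candidate combinations per triple, stays in the low millions:

```latex
\binom{10^4}{3} = \frac{10^4 \cdot 9999 \cdot 9998}{6} = 166{,}616{,}670{,}000 \approx 1.67 \times 10^{11}
\qquad \text{vs.} \qquad
\binom{100}{3} \cdot 3^3 = 161{,}700 \cdot 27 = 4{,}365{,}900
```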
Conclusion
For me, Gemini 2.5 Pro is the winner. We’ve compared two models that are said to be the best at coding. The big difference I see in the model stats is that Gemini 2.5 Pro has a much larger context window (1M tokens vs. Claude 3.7 Sonnet's 200K), and let's not forget that it's still an experimental model with improvements on the way.
Imagine how good this model will be once it gets a 2M token context window. 😵
Google has been killing it recently with solid models: first Gemma 3 27B, a super lightweight model with unbelievable results, and now this beast, Gemini 2.5 Pro.
If you’d like to take a look at the Gemma 3 27B model comparison, here you go:


🔥 Gemma 3 27B vs. QwQ 32B vs. Deepseek R1 comparison ✅
Shrijal Acharya for Composio ・ Mar 20
What do you think about Gemini 2.5 Pro? Let me know your thoughts in the comments! 👇