Google has updated its reasoning mode, Gemini 3 Deep Think. This tool is designed to tackle complex problems in science and engineering.
In tests, the model outperformed OpenAI's GPT-5.2 and Anthropic's Claude Opus 4.6 on several benchmarks, including the ARC-AGI-2 visual puzzles, MMMU-Pro for assessing multimodal capabilities, and Humanity's Last Exam. It also reached an Elo rating of 3455 in competitive programming on Codeforces.
"We updated Gemini 3 Deep Think in close collaboration with scientists and researchers to address complex scientific challenges, where tasks often lack clear boundaries or a single correct solution, and the data provided is incomplete," the company stated in its blog.
Gemini 3 Deep Think demonstrates advanced capabilities in mathematics and programming, and excels in natural sciences, including chemistry and physics. The updated mode solves problems at the level of gold medalists in international competitions.
On CMT-Benchmark, the model scored 50.5%, a result Google cites as evidence of its proficiency in advanced theoretical physics.
"In addition to its advanced performance, Deep Think is focused on practical applications: it helps researchers interpret complex data and engineers model physical systems using code," Google noted.
The new Deep Think is available in the Gemini app for Google AI Ultra subscribers and the Gemini API for select developers.
AI Mathematician from DeepMind
Google's DeepMind division has introduced the AI agent Aletheia. This model set a new record in the IMO-ProofBench Advanced benchmark, solving 91.9% of the problems. This test is considered one of the most challenging in mathematics.
The neural network is built on the Gemini Deep Think framework. It features a verification module that identifies errors in draft solutions and initiates an iterative process for refinement.
A key feature of the agent is its ability to recognize when a problem is unsolvable, significantly saving researchers' time.
Aletheia uses Google Search to navigate complex scientific literature, which reduces the risk of citing nonexistent references or making computational errors.
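For readers who want a concrete picture, the draft, check, and refine cycle described above can be sketched in a few lines of Python. This is only an illustration and not DeepMind's implementation: the names `solve_with_verification`, `Verdict`, `toy_generate`, and `toy_verify` are hypothetical, and the toy functions merely stand in for a language model and a verification module.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool       # True if the verifier found no errors in the draft
    feedback: str  # description of the flaw, passed to the next attempt


def solve_with_verification(
    problem: str,
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str, str], Verdict],
    max_rounds: int = 3,
) -> Optional[str]:
    """Draft a solution, check it, and retry with feedback.

    Returns None when no draft passes verification within max_rounds,
    i.e. the agent reports that it could not solve the problem.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = generate(problem, feedback)
        verdict = verify(problem, draft)
        if verdict.ok:
            return draft
        feedback = verdict.feedback  # refine the next draft using the critique
    return None


# Hypothetical stand-ins for the model and the verifier (assumptions).
def toy_generate(problem: str, feedback: Optional[str]) -> str:
    return "2 + 2 = 5" if feedback is None else "2 + 2 = 4"


def toy_verify(problem: str, draft: str) -> Verdict:
    if draft.endswith("= 4"):
        return Verdict(True, "")
    return Verdict(False, "arithmetic error in the final step")


if __name__ == "__main__":
    print(solve_with_verification("What is 2 + 2?", toy_generate, toy_verify))
```

In this toy run the first draft fails verification and the second passes. With `max_rounds=1` the function returns `None`, which corresponds to the agent admitting that it found no verified solution.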
Among the model's achievements are:
- fully autonomous generation of a scientific paper that calculates structure constants in arithmetic geometry;
- a collaborative proof of bounds for systems of interacting particles known as independent sets;
- autonomous solution of four problems from the Erdős list, one of which was previously considered open.
DeepMind emphasized that Aletheia's success confirms the relevance of scaling laws: in proof-based mathematics, quality continues to improve through the effective application of agents.
Breakthrough in Medicine
DeepMind's subsidiary, Isomorphic Labs, has introduced IsoDDE, an engine for drug design. In complex tests, it was twice as accurate as AlphaFold 3 in its predictions.
AlphaFold was a significant breakthrough: it could predict the three-dimensional structures of proteins and their interactions with other molecules. IsoDDE goes further:
- the model predicts binding affinity more accurately than traditional methods;
- the engine can identify hidden structures ("pockets") in proteins where drugs may bind;
- it supports a wide range of complex molecules, including antibodies and large biological structures.
"IsoDDE offers a scalable foundation for AI drug design, providing the prediction accuracy necessary to work with new biological systems with unprecedented reliability," the company stated in its blog.
As a reminder, in July 2022 the AlphaFold algorithm predicted the structures of over 200 million proteins. This covers nearly all known proteins found in plants, bacteria, and animals.
