Microsoft Fara1.5 AI Surpasses OpenAI and Google

Microsoft's Fara1.5 AI model surpasses OpenAI and Google in web browsing tasks, achieving a 72% score on the Online-Mind2Web benchmark.

Summary

Fara1.5-27B achieved a score of 72% on the Online-Mind2Web benchmark, surpassing OpenAI Operator's 58.3% and Gemini 2.5's 57.3%.
The model comes in 4 billion, 9 billion, and 27 billion parameter versions, all fine-tuned from Qwen3.5.
The 9 billion parameter version is currently accessible on Azure AI Foundry, with the 4 billion and 27 billion versions expected soon.

Imagine instructing your computer to search for vacation rentals, compare various websites, fill out booking details, and finalize a reservation—all while you make yourself a coffee. This is the vision of "computer use agents," which are AIs that can navigate and interact with web pages just like a human, without needing special plugins.

OpenAI was among the first to ship this concept as a product with Operator, which launched in January 2025 at a monthly fee of $200 but was discontinued in August after being integrated into ChatGPT Agent. Google also has its own version, Gemini 2.5 Computer Use. Both are proprietary, cloud-based options that are costly to run.

This week, Microsoft Research unveiled a compact model named Fara1.5, which has outperformed both competitors on key benchmarks.

The Fara1.5 series consists of three models with 4 billion, 9 billion, and 27 billion parameters. All three are built on the Qwen3.5 base model, which Microsoft fine-tuned for web tasks, and all model weights are publicly available. (In AI, the parameter count is a rough indicator of a model's capacity: more parameters generally mean greater capability, at the cost of more compute.)

To achieve this success, Microsoft re-evaluated its entire development methodology. "We began with a fundamental question: What is required for a small model to excel in agentic tasks?" stated the AI Frontiers team in their report. "The solution involved rethinking data generation, training objectives, model architecture, and orchestration collectively rather than separately."

Benchmark Results

The Online-Mind2Web benchmark is the main yardstick for Microsoft's claims here. It measures how well an AI agent completes 300 varied, real-world tasks across 136 widely used websites, such as product comparisons, form completions, and service bookings. The score is the percentage of tasks completed correctly on the live web.
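As a rough illustration of that scoring rule, the sketch below computes a success rate from a list of per-task outcomes. It assumes each task is simply judged pass or fail and that every task counts equally; the `success_rate` function and the sample outcomes are hypothetical and are not taken from the benchmark's own tooling.

```python
def success_rate(outcomes):
    """Return the percentage of tasks judged successful.

    `outcomes` is a list of booleans, one per task (assumed pass/fail).
    """
    if not outcomes:
        raise ValueError("no task outcomes to score")
    passed = sum(1 for ok in outcomes if ok)
    return 100.0 * passed / len(outcomes)


# Hypothetical run: 216 of 300 tasks judged successful.
outcomes = [True] * 216 + [False] * 84
print(f"{success_rate(outcomes):.1f}%")
```

Under these assumptions, a 72% score on a 300-task benchmark corresponds to 216 completed tasks.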

Fara1.5-27B achieved a score of 72%. OpenAI's Operator managed 58.3%, while Google's Gemini 2.5 scored 57.3%. Yutori's Navigator n1, the leading proprietary alternative, recorded 64.7%. Even Fara1.5-9B, the intermediate model, scored 63.4%, outperforming both OpenAI and Google.

Open-source competitors also fell short. Alibaba's GUI-Owl-1.5, with 8 billion parameters, scored 48.6%. AI2's MolmoWeb reached 35.3%. Microsoft's previous model, Fara-7B, scored only 34.1%, which means the similarly sized Fara1.5-9B (63.4%) nearly doubles its predecessor's score.
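The "nearly double" comparison can be checked directly from the figures quoted above. The sketch below ranks the reported Online-Mind2Web scores and computes the ratio between Fara1.5-9B and Fara-7B, assuming those are the two models being compared "at a similar scale"; the dictionary labels are just shorthand for this example.

```python
# Online-Mind2Web scores (percent) as quoted in this article.
scores = {
    "Fara1.5-27B": 72.0,
    "Yutori Navigator n1": 64.7,
    "Fara1.5-9B": 63.4,
    "OpenAI Operator": 58.3,
    "Google Gemini 2.5": 57.3,
    "Alibaba GUI-Owl-1.5 (8B)": 48.6,
    "AI2 MolmoWeb": 35.3,
    "Fara-7B": 34.1,
}

# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name:<26} {score:5.1f}%")

# Assumed comparison: the 9B model against its 7B predecessor.
ratio = scores["Fara1.5-9B"] / scores["Fara-7B"]
print(f"Fara1.5-9B vs Fara-7B: {ratio:.2f}x")
```

The ratio comes out at about 1.86, which is consistent with "nearly double."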

In another benchmark, WebVoyager, which evaluates task success on the live web in a similar manner, Fara1.5-27B achieved 88.6%, slightly ahead of OpenAI Operator's 87.0% and outperforming H Company's 30-billion-parameter Holo2 at 83.0%.

Training Methodology

The key to Fara1.5's success lies in its training pipeline. Microsoft employed a system called FaraGen1.5 to create the training data. Notably, they used GPT-5.4—OpenAI's model—as a "teacher agent" to illustrate how to execute browser tasks. These demonstrations formed the training data for Fara1.5, effectively leveraging OpenAI's advanced model to train a competing open-source one.

Additionally, Microsoft developed six simulated, fully functional replicas of actual websites—such as email clients, calendars, and marketplaces—allowing the model to practice tasks that typically require logins or irreversible actions (like sending an email or booking a flight) without involving real accounts. This method, known as synthetic domain training, significantly enhances Fara1.5's ability to manage "gated" tasks compared to prior models.

Each model is designed to pause and request confirmation before executing irreversible actions. "Striking a balance between strong safeguards, like Critical Points, and smooth user experiences is essential," explained Yash Lara, Senior PM Lead at Microsoft Research, in an interview with VentureBeat. "Implementing a user interface, such as Microsoft Research's Magentic-UI, is crucial for providing users with opportunities to intervene when necessary, while also minimizing approval fatigue."

This is particularly important because OpenAI was clear about the potential risks when it introduced ChatGPT Agent. "When you log ChatGPT agent into websites or activate connectors, it will gain access to sensitive information from those sources, including emails, files, or account details," the company stated.

Fara1.5 operates through MagenticLite, a secured browser environment that monitors every action and allows users to stop the agent at any time.
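The pause-and-confirm behavior described above can be pictured as a simple gate in the agent's action loop. The following is a minimal sketch of the general idea only, not Fara1.5's, Magentic-UI's, or MagenticLite's actual code: the `Action` class, the `irreversible` flag, and the `run_with_confirmation` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical structure)."""
    description: str
    irreversible: bool = False


def run_with_confirmation(
    actions: List[Action],
    confirm: Callable[[Action], bool],
    execute: Callable[[Action], None],
) -> List[str]:
    """Execute actions in order, asking the user before irreversible ones.

    Stops at the first irreversible action the user declines and returns
    the descriptions of the actions that were actually executed.
    """
    executed = []
    for action in actions:
        if action.irreversible and not confirm(action):
            break  # the user declined, so the agent halts here
        execute(action)
        executed.append(action.description)
    return executed


# Hypothetical booking flow in which the user declines the final step.
plan = [
    Action("open rental site"),
    Action("fill in booking form"),
    Action("submit payment", irreversible=True),
]
done = run_with_confirmation(
    plan,
    confirm=lambda action: False,  # stand-in for a real user prompt
    execute=lambda action: None,   # stand-in for a real browser action
)
print(done)
```

Gating only the irreversible steps is one way to limit approval fatigue: routine navigation proceeds unprompted, and the user is interrupted only where a mistake could not be undone.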

The competition in browser AI is intensifying—with Google's Gemini in Chrome, Perplexity's Comet, and Anthropic's Claude for Chrome. Fara1.5's advantage is its openness: it features public weights, open inference code available on GitHub, and can run on user-controlled hardware. The 9 billion parameter model is currently available on Azure AI Foundry, with the 4 billion and 27 billion versions coming soon. Microsoft has plans to broaden Fara1.5's capabilities beyond web browsing to include desktop and enterprise applications in the future.
