Summary
- GLM-5.2 trails Claude Opus 4.8 by less than a point on the FrontierSWE benchmark (74.4 vs. 75.1) and beats GPT-5.5. The model is released under an MIT license and has no regional restrictions.
- It was trained entirely on Huawei Ascend chips, with no NVIDIA hardware.
- Unsloth AI has released 2-bit GGUF quantizations that shrink the model from 1.51TB to 238GB, but running it still takes 256GB of RAM or VRAM.
Z.ai launched GLM-5.2 on June 16, showcasing significant improvements over its predecessor, GLM-5.1.
The Beijing-based company has been on the U.S. Entity List since January 2025 and appears to be benefiting from growing unease over U.S. AI policy. The recent ban on Anthropic's Fable, combined with this release, sent Z.ai's stock up 90% to a record high.
GLM-5.2's benchmark results back up the excitement.
On the FrontierSWE benchmark, which assesses an AI's ability to handle complex technical tasks over extended periods, GLM-5.2 scored 74.4, just behind Claude Opus 4.8's 75.1 and ahead of GPT-5.5's 72.6. On SWE-bench Pro, which measures the pass rate for autonomously resolving real-world GitHub issues, GLM-5.2 scored 62.1, ahead of both GPT-5.5 (58.6) and GLM-5.1 (58.4).
That puts it at the top of the open-source models on the Artificial Analysis Intelligence Index, which aggregates results from nine evaluations into a single measure of overall model quality. According to OpenRouter's benchmarks, it is comparable to the now-restricted Claude Fable 5.
The hardware behind GLM-5.2 adds another layer of interest. It was trained exclusively on Huawei Ascend chips, without any NVIDIA components. Emad Mostaque, the founder of Stability AI, estimated total training costs at around $25 million, with 80% going to post-training, which would make it cheap relative to comparable models.
Earlier this year, Decrypt reported that Z.ai had been training image models on Huawei's Ascend Atlas servers without using any American chips. GLM-5.2 builds on that infrastructure: it is a 744-billion-parameter mixture-of-experts model with a genuine 1-million-token context window (five times GLM-5.1's 200K limit) and an MIT license, which keeps governmental controls from restricting access.
Tokens are the chunks of text a model reads and generates, while parameters are the internal values that determine how a model processes input and generates responses.
Target Audience and Pricing
The context window is a significant operational change for developers. Tasks like whole-repo navigation, multi-file refactoring, and complex pipelines that previously had to be split into chunks can now run in a single call. API pricing is $1.40 per million input tokens and $4.40 per million output tokens, well below Claude Opus 4.8's $5 and $25 per million. The Coding Plan costs around $18 a month and integrates with Claude Code, Cline, Kilo Code, and other popular environments.
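To make the price gap concrete, here is a minimal sketch of the arithmetic using the per-million-token rates quoted above. The workload (an 800,000-token prompt with a 20,000-token reply) is a hypothetical chosen for illustration, and the calculation ignores whether each model would accept a prompt that large, as well as any caching discounts or long-context surcharges.

```python
# Per-million-token prices in USD, as quoted in the article.
PRICES = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one API call at the quoted list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload (an assumption, not from the article):
# one whole-repo prompt and a moderately long reply.
INPUT_TOKENS = 800_000
OUTPUT_TOKENS = 20_000

glm_cost = call_cost("GLM-5.2", INPUT_TOKENS, OUTPUT_TOKENS)
opus_cost = call_cost("Claude Opus 4.8", INPUT_TOKENS, OUTPUT_TOKENS)

print(f"GLM-5.2:         ${glm_cost:.3f}")
print(f"Claude Opus 4.8: ${opus_cost:.3f}")
print(f"Ratio:           {opus_cost / glm_cost:.1f}x")
```

Under these assumptions the call comes to roughly $1.21 on GLM-5.2 and $4.50 on Opus 4.8. The ratio widens as the share of output tokens grows, since the output price gap is larger than the input price gap.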
Local deployment remains a feasible option. Unsloth AI has introduced 2-bit GGUF quantizations that compress the model from 1.51TB down to 238GB while retaining approximately 82% accuracy.
However, caution is advised. It still requires 256GB of unified memory or a compatible combination of RAM and VRAM, which limits it to high-end machines such as an M4 Ultra Mac Studio or a workstation with a mid-range GPU and 256GB of system RAM using mixture-of-experts offloading. It is expensive, but it is at least an option for anyone who wants to buy the hardware and run the model themselves.
In a quick test, we asked GLM-5.2 to build a game combining typing mechanics with shooting elements. The user interface was not the most polished (other models produced more refined ones), but the variety of scenarios, enemy types, and boss encounters it generated throughout the gameplay was impressive.
It produced a wider range of game states than any other model we tested on the same zero-shot task.
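The file sizes quoted above can be turned into a rough bits-per-weight figure, which helps explain why a "2-bit" file still needs a 256GB machine. This is a back-of-the-envelope sketch only: it assumes decimal units (1GB = 10^9 bytes), treats the whole file as weights, and uses the 744-billion parameter count from the article.

```python
# Figures quoted in the article; decimal units are an assumption.
PARAMS = 744e9          # 744 billion parameters
FULL_BYTES = 1.51e12    # 1.51TB original weights
QUANT_BYTES = 238e9     # 238GB 2-bit GGUF quantization
MEMORY_BYTES = 256e9    # 256GB of RAM/VRAM or unified memory


def bits_per_weight(size_bytes: float, params: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return size_bytes * 8 / params


full_bpw = bits_per_weight(FULL_BYTES, PARAMS)
quant_bpw = bits_per_weight(QUANT_BYTES, PARAMS)

# Memory left over once the quantized weights are loaded.
headroom_gb = (MEMORY_BYTES - QUANT_BYTES) / 1e9

print(f"Original:  {full_bpw:.2f} bits per weight")
print(f"Quantized: {quant_bpw:.2f} bits per weight")
print(f"Headroom on a 256GB machine: {headroom_gb:.0f}GB")
```

The original works out to about 16 bits per weight, which would be consistent with 16-bit storage, and the quantized file to about 2.6 bits rather than exactly 2. The extra fraction would fit a scheme that keeps some tensors at higher precision, though the article does not say how the quantization is built. The roughly 18GB of headroom has to cover the context cache, activations, and the operating system, so long contexts are likely to be tight.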
If you're interested in trying it out, it's available on our Itch.io profile.
That variety points to GLM-5.2's strongest economic advantage. For workflows that need diverse outputs, particularly multi-shot generation and agentic pipelines, open-source-level pricing is compelling. However, on sustained high-difficulty tasks such as SWE-Marathon, where it scores 13.0 against Opus 4.8's 26.0, the gap to the closed frontier remains large: 13 points, or half of Opus 4.8's score.
The open-source weights are now on Hugging Face under the MIT license, alongside the quantized versions. Subscribers to the GLM Coding Plan can switch to the model string GLM-5.2, and the model can also be tested for free on Z.ai with some usage limits.
