GLM 5.2: The Open-Source Model Taking On GPT-5.5 (June 2026)

Quick Answer:

GLM 5.2 is Z.ai's open-weights flagship, and it is the strongest open-source model yet for coding. It is a Mixture-of-Experts model with roughly 753 billion total parameters (about 40 billion active per token), a usable 1-million-token context window, and weights published on Hugging Face under the permissive MIT licence. On Z.ai's own benchmarks it scores 62.1 on SWE-bench Pro and 74.4 on FrontierSWE, beating GPT-5.5 and closing in on Claude Opus 4.8, whilst its hosted API costs roughly a sixth of the closed frontier models. You can run it locally for free or call the API at £1.10 ($1.40) per million input tokens.

For two years the assumption held: the best model is closed, American, and rented by the token. GLM 5.2 is the clearest sign yet that the assumption is breaking.

This is a full breakdown of Z.ai's June 2026 release: what the model actually is, the real benchmark numbers from the official model card, the MIT licence that lets you download it, how to run it on your own hardware, and exactly where it stands against GPT-5.5, Claude Opus 4.8, Kimi and DeepSeek.

Executive Summary

In mid-June 2026, Z.ai, the Beijing organisation formerly known as Zhipu AI, released GLM 5.2 and put the weights on Hugging Face under the MIT licence. That combination, frontier-class coding scores and an unrestricted open licence, is what made the launch the most-discussed AI story of the month. An open model that beats GPT-5.5 on real software-engineering benchmarks, at roughly a sixth of the cost, is not supposed to exist yet. It does.

Open weights: MIT licence, no regional limits, downloadable from Hugging Face, FP8 variant included.
Architecture: Mixture-of-Experts, roughly 753B total parameters, around 40B active per token.
Context: a usable 1-million-token window, up to about 131,072 output tokens per response.
Headline scores: 62.1 on SWE-bench Pro, 74.4 on FrontierSWE (Dominance), 91.2 on GPQA-Diamond, 99.2 on AIME 2026.
Price: API at £1.10 ($1.40) per million input tokens and £3.50 ($4.40) per million output tokens.

The short version: GLM 5.2 will not dethrone Claude Opus 4.8 on the very hardest coding suites, but it gets close enough that for most teams the price gap and the freedom to self-host matter more than the last few benchmark points.

What GLM 5.2 Is

GLM 5.2 is the latest model in Z.ai's GLM (General Language Model) family, the direct successor to GLM 5.1, which itself had briefly topped open-weights coding leaderboards earlier in 2026. Where many labs build a general chat assistant first and bolt on coding later, Z.ai has explicitly positioned GLM 5.2 the other way round. The official framing is "built for long-horizon tasks": autonomous, multi-step software engineering where the model has to hold a goal across dozens of tool calls without losing the plot.

That focus shows up everywhere. The model ships with two thinking-effort levels, High and Max, so you can trade latency for depth. It is tuned for agentic harnesses and works inside coding tools like Claude Code and other agent frameworks, not just a chat box. And the whole release is organised around the one-million-token context window, because long-horizon work on real codebases means feeding the model an entire repository at once.

The part that turned a strong release into a landmark one is the licence. Z.ai did not gate GLM 5.2 behind an API. It published the actual weights, the numbers that make the model work, on Hugging Face under the MIT licence. That is about as permissive as open licences get: you can download, modify, fine-tune, redistribute and run the model commercially with essentially no strings attached.

Architecture and Training

GLM 5.2 is a Mixture-of-Experts (MoE) model. Reporting and the model card put it at roughly 753 billion total parameters with about 40 billion active per token. In plain terms, the model is enormous in total knowledge but only lights up a small slice of its parameters for any given token, which is how an open model this large can be served at reasonable cost. That sparse design is the standard recipe for frontier open models in 2026, and it is what lets GLM 5.2 compete with dense closed systems on price.

The headline architectural trick is something Z.ai calls IndexShare. The model card describes it as reusing "the same indexer across every four sparse attention layers", which reduces per-token compute (FLOPs) by about 2.9 times at a 1-million-token context length. This matters because a long context window is only useful if the model can actually attend across it without the compute cost exploding. IndexShare is the engineering that turns a one-million-token window from a spec-sheet number into something usable in practice.

Z.ai also published an FP8 variant on Hugging Face. FP8 is a reduced-precision format that roughly halves the memory needed compared with the full-precision weights, lowering the hardware bar for anyone wanting to self-host. As with most labs, Z.ai keeps the precise training data mix and the exact expert count proprietary, but the broad shape (huge sparse MoE, long context, coding-first post-training) is clear and consistent across the official materials.

Benchmarks: The Real Numbers

Z.ai initially launched GLM 5.2 without benchmark numbers, then published a full comparison table on the model card pitting it against GLM 5.1, Qwen3.7-Max, MiniMax M3, DeepSeek-V4-Pro, Claude Opus 4.8, GPT-5.5 and Gemini 3.1 Pro. The figures below are Z.ai's own reported results.

GLM 5.2 benchmark comparison across reasoning, coding and agentic suites. Source: Z.ai official GLM-5.2 model card and GitHub resources.

The coding numbers are where GLM 5.2 makes its case:

SWE-bench Pro: GLM 5.2 62.1, GPT-5.5 58.6, GLM 5.1 58.4, DeepSeek-V4-Pro 55.4, Claude Opus 4.8 69.2.
FrontierSWE (Dominance): GLM 5.2 74.4, GPT-5.5 72.6, Claude Opus 4.8 75.1, DeepSeek-V4-Pro 29.0.
Terminal Bench 2.1 (Terminus-2): GLM 5.2 81.0, GPT-5.5 84, Claude Opus 4.8 85, GLM 5.1 63.5.
ProgramBench: GLM 5.2 63.7, GPT-5.5 70.8, Claude Opus 4.8 71.9, DeepSeek-V4-Pro 47.8.
NL2Repo: GLM 5.2 48.9, GPT-5.5 50.7, Claude Opus 4.8 69.7.

The pattern is consistent: GLM 5.2 edges past GPT-5.5 on the longest-horizon coding tasks (SWE-bench Pro, FrontierSWE) and sits a clear step behind Claude Opus 4.8 on most of them. On reasoning and maths it is genuinely competitive at the top:

GPQA-Diamond: GLM 5.2 91.2, GPT-5.5 93.6, Claude Opus 4.8 93.6, Gemini 3.1 Pro 94.3.
AIME 2026: GLM 5.2 99.2, GPT-5.5 98.3, Gemini 3.1 Pro 98.2, Claude Opus 4.8 95.7.
IMOAnswerBench: GLM 5.2 91.0, Qwen3.7-Max 90, DeepSeek-V4-Pro 89.8, Claude Opus 4.8 83.5.
HLE (Humanity's Last Exam): GLM 5.2 40.5 / 54.7 with tools, against GPT-5.5 41.4 / 52.2.
MCP-Atlas (agentic tool use): GLM 5.2 76.8, GPT-5.5 75.3, Claude Opus 4.8 77.8.

Notably, GLM 5.2 leads the field outright on AIME 2026 and IMOAnswerBench, and it scores 51 on the independent Artificial Analysis Intelligence Index, the top open-weights result reported at launch. Two honest caveats apply, as always. First, these are the vendor's own numbers, run on the vendor's harness, and agentic benchmarks are notoriously sensitive to the scaffolding around them; treat them as directional until independent runs land. Second, GLM 5.2 trails Opus 4.8 on the hardest single-shot coding suites (NL2Repo, ProgramBench, SWE-Marathon), so "beats GPT-5.5" is accurate whilst "beats everything" is not.

Open Weights and Licence

The licence is the headline. GLM 5.2's weights are published on Hugging Face under the MIT licence, described in the model card as "no regional limits, technical access without borders". MIT is one of the most permissive licences in existence. You can use the model commercially, modify it, fine-tune it on your own data, redistribute your version, and embed it in a product, all without paying Z.ai a penny and without a usage cap.

That is a sharp contrast to the closed frontier. GPT-5.5 and Claude Opus 4.8 are API-only: you rent capability by the token and you never hold the weights. With GLM 5.2 you can pull the model down, run it on infrastructure you control, and keep your data on your own machines. For regulated industries, sovereignty-conscious governments and anyone wary of sending sensitive code to a third-party API, that is a decisive advantage, and it is the same argument driving interest in other open coding models like Kimi K2.7 Code and DeepSeek.

One genuine caveat, widely flagged in coverage: the licence covers the weights, not the hosted service. If you call Z.ai's API rather than self-hosting, your prompts travel to servers in China and are subject to that jurisdiction. The open weights are the answer to that concern. If data residency matters, download the model and run it yourself. The MIT licence exists precisely so you can.

How to Run It

Running it locally

Because the weights are open, GLM 5.2 runs on the standard open-model inference stack. The model card lists support for SGLang, vLLM, Transformers, KTransformers and Unsloth, plus Ascend NPU platforms. The quickest path with vLLM is essentially two lines:

pip install vllm vllm serve "zai-org/GLM-5.2"

SGLang is just as simple (python3 -m sglang.launch_server --model-path "zai-org/GLM-5.2"). The honest catch is hardware. A 753B-class MoE is not a laptop model. At full precision it needs a multi-GPU server with a lot of high-bandwidth memory, or an NPU cluster. This is where the FP8 variant matters: it roughly halves the memory footprint, and community quantisations push the bar lower still, but you should still budget for serious kit if you want full-precision quality. For most individuals and small teams, a rented GPU instance or one of the third-party hosts (several inference providers added GLM 5.2 within days of launch) is more practical than buying hardware.

Using the API

If self-hosting is overkill, Z.ai offers a hosted API and a chatbot. The API is OpenAI-compatible, and the model also slots into agentic tools such as Claude Code, so you can point an existing coding workflow at GLM 5.2 with minimal changes. The GLM Coding Plan subscription tiers (Lite, Pro, Max, Team) bundle generous usage for a flat monthly fee, starting around £14 ($18) per month, which is how most developers will trial it before deciding whether to self-host.

Pricing

The pay-as-you-go API pricing is the other half of the story, and it is aggressive. Z.ai prices GLM 5.2 at £1.10 ($1.40) per million input tokens and £3.50 ($4.40) per million output tokens, with cached input as low as around £0.20 ($0.26) per million. That is roughly a sixth of what the closed frontier models charge for comparable coding capability, which is exactly the framing the launch coverage seized on: frontier-adjacent coding for a fraction of the price.

For comparison, premium closed flagships in mid-2026 sit at around £8 ($10) per million input and £40 ($50) per million output. GLM 5.2 undercuts that by close to an order of magnitude on output, the dimension that dominates the bill for generation-heavy agentic coding. Add the option to self-host for free, and the cost argument is the single strongest part of the GLM 5.2 pitch.

The subscription route is cheaper still for steady use. The GLM Coding Plan starts around £14 ($18) per month for the Lite tier, with Pro near £57 ($72) and Max near £127 ($160), each bundling a usage allowance aimed at developers running the model inside a coding agent all day.

How It Compares

Versus GPT-5.5. On the long-horizon coding suites that GLM 5.2 was built for, it wins: 62.1 to 58.6 on SWE-bench Pro, 74.4 to 72.6 on FrontierSWE. On general reasoning the two trade blows (GPT-5.5 leads GPQA-Diamond, GLM 5.2 leads AIME 2026). Factor in the open weights and the price, and GLM 5.2 is the better value for coding-heavy work, whilst GPT-5.5 keeps the edge in polish, ecosystem and multimodal breadth.

Versus Claude Opus 4.8. This is the harder comparison. Opus 4.8 still leads on most coding benchmarks (69.2 vs 62.1 on SWE-bench Pro, 71.9 vs 63.7 on ProgramBench) and on agentic reliability. GLM 5.2 is the value play: close enough on capability that the gap is narrow, miles ahead on cost and openness. If you need the absolute best coding model and budget is no object, Opus 4.8 remains the pick. If you want frontier-adjacent results you can self-host cheaply, GLM 5.2 is compelling.

Versus Kimi and DeepSeek. Within the open-weights field, GLM 5.2 has pulled ahead. It tops DeepSeek-V4-Pro across the board on the model card (62.1 vs 55.4 on SWE-bench Pro, 74.4 vs 29.0 on FrontierSWE) and posts the leading open-weights score on the Artificial Analysis Intelligence Index. Kimi K2.7 Code remains a strong open competitor with its own coding focus, but on the published numbers GLM 5.2 is the new open-weights leader for software engineering. For the wider picture across closed and open models, see our roundup of Claude vs ChatGPT vs Gemini vs Grok.

Limitations

Vendor benchmarks: the headline numbers are Z.ai's own, run on Z.ai's harness. Independent re-runs may move them, in either direction.
Still behind Opus 4.8 on hard coding: on NL2Repo, ProgramBench and SWE-Marathon the gap to Claude Opus 4.8 is real, not cosmetic.
Hardware bar: "free to run" assumes you have multi-GPU infrastructure. Self-hosting a 753B MoE at full precision is not casual, even with the FP8 build.
API data residency: calling the hosted API sends prompts to servers in China. Self-host if data sovereignty is a requirement.
Coding-first focus: the model is optimised for agentic engineering. For multimodal, creative-writing or general chat tasks the closed flagships may still feel more rounded.
Early ecosystem: tooling, quantisations and fine-tunes are maturing fast but were days old at launch, so expect rough edges in third-party integrations.

Who Should Use It

Use GLM 5.2 if you do a lot of agentic or long-horizon coding and you care about cost or control. Teams running coding agents all day will feel the price difference immediately. Organisations with data-residency or sovereignty requirements get a frontier-adjacent model they can run entirely on their own infrastructure. And anyone building on top of an LLM who wants the freedom to fine-tune and ship without a per-token bill or a vendor lock-in now has a genuinely capable option.

Stick with a closed flagship if you need the single best coding model regardless of price (Claude Opus 4.8), the broadest multimodal and ecosystem support (GPT-5.5, Gemini 3.1 Pro), or you simply have no appetite to manage inference infrastructure and the hosted-API data path is a dealbreaker. As always with agentic deployments, run the model with scoped permissions, human checkpoints on irreversible actions, and logging you actually review.

Frequently Asked Questions

What is GLM 5.2?

GLM 5.2 is Z.ai's (Zhipu AI's) flagship open-weights LLM from June 2026: a Mixture-of-Experts model with roughly 753B total parameters, about 40B active per token, a 1-million-token context window and a coding-first design. The weights are on Hugging Face under the MIT licence, so you can download and self-host it for free.

Is GLM 5.2 really open source and free?

Yes. The weights are MIT-licensed with no regional restrictions and an FP8 variant for cheaper inference. You can run it locally with no licence fee; the only costs are your hardware, or the hosted API at £1.10 ($1.40) per million input tokens if you prefer not to self-host.

How does GLM 5.2 compare to GPT-5.5 and Claude Opus 4.8?

It scores 62.1 on SWE-bench Pro, beating GPT-5.5 (58.6) and trailing Claude Opus 4.8 (69.2), and 74.4 on FrontierSWE, ahead of GPT-5.5 and almost level with Opus 4.8. It leads on AIME 2026 (99.2). Broadly it matches or beats GPT-5.5 on coding and maths whilst staying a notch behind Opus 4.8 on the hardest suites, at a fraction of the cost.

What is the context window of GLM 5.2?

One million tokens, with up to roughly 131,072 output tokens per response. Z.ai's IndexShare optimisation cuts per-token compute by about 2.9 times at full length, which makes the long context usable rather than nominal.

How do you run GLM 5.2 locally?

Serve it with vLLM, SGLang, Transformers, KTransformers or Unsloth from the Hugging Face weights, using the FP8 variant to save memory. A 753B MoE still needs multi-GPU or NPU infrastructure at full precision, so many people use the FP8 build, a quantised community build, or the hosted API.

The Bottom Line

GLM 5.2 does not beat every model at everything, and the launch coverage that said it did was running ahead of the numbers. What it does do is more interesting: it puts frontier-adjacent coding capability into a model you can download, fine-tune and run yourself, under one of the most permissive licences going, at roughly a sixth of the price of the closed competition. On long-horizon coding it beats GPT-5.5 outright and gets within touching distance of Claude Opus 4.8.

For the open-source field, it is a clear new leader over DeepSeek and a strong rival to Kimi. For everyone else, it is the moment the "closed models are simply better" argument got a lot harder to make. The closed flagships still hold the very top of the coding charts, but GLM 5.2 has made that lead expensive to insist on.

Free Guide

Get the free guide: Claude vs ChatGPT, Gemini & Grok

A 20-page playbook covering everything you need to choose and use the big four AI models in 2026, full cost and feature comparisons, what each is best (and worst) at, and how-tos for images, vectors, building a website, Claude Code and more.

Last updated: June 2026. This review draws on Z.ai's official GLM-5.2 blog post and Hugging Face model card; benchmark figures are Z.ai's own reported results and may be refined as independent benchmarks land.

GLM 5.2: The Open-Source Model Taking On GPT-5.5