Build an Autonomous Bug-Fixing Agent with GLM-5.2

Introduction
On June 16, 2026, ZAI released GLM-5.2, an open-weight model under the MIT license with a 1M-token context. It posts 62.1 on SWE-bench Pro and 81.0 on Terminal-Bench 2.1, the highest open-source scores on both. You will use it to build an agent that takes a failing pytest test, reads the code, proposes a patch, applies it, and re-runs the test until it passes. The whole thing runs against GLM-5.2 through an OpenAI-compatible endpoint, so you can self-host it or call a hosted one without changing the agent code.
What GLM-5.2 gives an agent that a chat model does not
A bug-fixing agent is not a single prompt. It is a loop that needs three things from the model: reliable tool calling, a context window big enough to hold the failing test plus the files around it, and enough coding strength that its patches usually run. GLM-5.2 has all three. The MIT license matters here too, because an agent that edits your codebase is exactly the kind of thing you want running on hardware you control, not shipping your source to an endpoint you do not.
The weights live on Hugging Face under the zai-org organization. You can download and serve them, or hit a hosted GLM-5.2 endpoint while you prototype. The agent code below does not know or care which one it is talking to.
What an autonomous bug-fixing loop actually is
The loop is the same one every coding agent runs, under the academic name ReAct: reason, act, observe, repeat. The model reasons about why a test fails, acts by calling a tool (read a file, write a patch, run the tests), observes the result, and decides the next move. There is no magic. The intelligence is in the model's choices; the structure is a while loop with a list of tools and a stopping rule.
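The reason-act-observe structure can be sketched without any model at all. The snippet below is a minimal illustration, not the agent built later in this article: `scripted_policy` is a hypothetical stand-in for the model that replays a fixed plan, and `toy_tools` holds toy functions invented for the demo.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An action is (tool_name, argument); None means the policy has nothing left to try.
Action = Optional[Tuple[str, str]]


def react_loop(policy: Callable[[List[str]], Action],
               tools: Dict[str, Callable[[str], str]],
               is_done: Callable[[str], bool],
               max_steps: int = 5) -> Tuple[bool, List[str]]:
    """Run reason -> act -> observe until is_done(observation) or the step cap."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(observations)      # reason: choose the next move
        if action is None:
            return False, observations     # the policy gave up
        name, arg = action
        result = tools[name](arg)          # act: call the chosen tool
        observations.append(result)        # observe: record what came back
        if is_done(result):                # stopping rule
            return True, observations
    return False, observations             # step cap reached


def scripted_policy(observations: List[str]) -> Action:
    """Hypothetical stand-in for the model: replay a fixed two-step plan."""
    plan = [("read", "parser.py"), ("test", "")]
    return plan[len(observations)] if len(observations) < len(plan) else None


toy_tools = {
    "read": lambda path: f"contents of {path}",
    "test": lambda _: "1 passed",
}

done, trace = react_loop(scripted_policy, toy_tools, lambda obs: "passed" in obs)
print(done, trace)
```

Swapping `scripted_policy` for a call to a model, and the toy functions for real tools, gives the shape of the agent built in the rest of this article.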
Point an OpenAI-compatible client at GLM-5.2
GLM-5.2 speaks the OpenAI chat-completions protocol, so the standard openai Python SDK talks to it directly. Set the base URL to your GLM endpoint and you are done.

    # client.py
    import os

    from openai import OpenAI

    # Self-hosted: run `vllm serve zai-org/GLM-5.2 --port 8000` first,
    # then base_url = "http://localhost:8000/v1". Automatic tool calling in
    # vLLM also needs --enable-auto-tool-choice and a --tool-call-parser that
    # matches the model; check the vLLM docs for the right parser name.
    # Hosted: point base_url at your GLM-5.2 provider.
    client = OpenAI(
        base_url=os.environ.get("GLM_BASE_URL", "http://localhost:8000/v1"),
        api_key=os.environ.get("GLM_API_KEY", "not-needed-for-local"),
    )
    MODEL = "zai-org/GLM-5.2"

The base URL (plus an API key for a hosted provider) is the only setting that changes between self-hosting on vLLM and calling a managed GLM-5.2. Everything downstream is the same.
Define the tools the agent is allowed to use
The agent can only do what you let it. Give it three tools: read a file, write a file, and run the test suite. Each is a plain Python function plus a JSON schema the model sees.
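
    # tools.py
    import subprocess
    import sys
    from pathlib import Path

    ROOT = Path("workspace").resolve()  # file tools never read or write outside this dir


    def read_file(path: str) -> str:
        """Return the text of a file inside the workspace."""
        target = (ROOT / path).resolve()
        target.relative_to(ROOT)  # raises ValueError if path escapes ROOT
        return target.read_text()


    def write_file(path: str, content: str) -> str:
        """Overwrite a file inside the workspace and report what was written."""
        target = (ROOT / path).resolve()
        target.relative_to(ROOT)  # same containment check as read_file
        target.write_text(content)
        return f"wrote {len(content)} chars to {path}"


    def run_tests() -> str:
        """Run pytest in the workspace and return the tail of its output."""
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "-x", "-q"],  # same interpreter as the agent
            cwd=ROOT, capture_output=True, text=True, timeout=120,
        )
        return (result.stdout + result.stderr)[-4000:]  # tail is where failures live

Notice target.relative_to(ROOT). It raises ValueError the moment a resolved path lands outside the workspace, for example by climbing out with ../. An agent that can write files is an agent that can write the wrong files, and the sandbox boundary is not a nicety. Note that the check guards only the file tools; run_tests still executes whatever code the agent wrote.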
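To see the containment check in isolation, here is a small sketch that uses a temporary directory in place of workspace/. The helper name `safe_resolve` is an assumption made for illustration; the logic is the same resolve-then-relative_to pair used in the tools.

```python
import tempfile
from pathlib import Path


def safe_resolve(root: Path, path: str) -> Path:
    """Resolve path under root; raise ValueError if it lands outside root."""
    target = (root / path).resolve()
    target.relative_to(root)  # ValueError when target is not inside root
    return target


blocked = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()

    # A path that stays inside the root is returned unchanged.
    inside = safe_resolve(root, "pkg/module.py")
    print("allowed:", inside.name)

    # Relative climbs and absolute paths are both rejected.
    for attempt in ("../secrets.txt", "/etc/passwd", "pkg/../../escape.py"):
        try:
            safe_resolve(root, attempt)
        except ValueError:
            blocked.append(attempt)
            print("blocked:", attempt)
```

Resolving before checking is what makes this work: a plain string test for `..` would miss absolute paths and symlinks that point outside the root.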
Describe the tools to GLM-5.2
The model needs the schema, in OpenAI tool format, to know what it can call.

    # schemas.py
    TOOLS = [
        {"type": "function", "function": {
            "name": "read_file",
            "description": "Read a file from the workspace.",
            "parameters": {"type": "object", "properties": {
                "path": {"type": "string"}}, "required": ["path"]}}},
        {"type": "function", "function": {
            "name": "write_file",
            "description": "Overwrite a file in the workspace with new content.",
            "parameters": {"type": "object", "properties": {
                "path": {"type": "string"}, "content": {"type": "string"}},
                "required": ["path", "content"]}}},
        {"type": "function", "function": {
            "name": "run_tests",
            "description": "Run the pytest suite and return the output.",
            "parameters": {"type": "object", "properties": {}}}},
    ]

The descriptions are short on purpose. The model does not need prose; it needs to know the name, the arguments, and what comes back.
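Because each schema lists its required arguments, the same data can be used to check a tool call before dispatching it. The following is a rough sketch under assumptions: `missing_args` is a hypothetical helper, it checks only the required keys (not types or the rest of JSON Schema), and one schema is inlined so the snippet runs on its own.

```python
import json
from typing import Dict, List

# Inlined copy of one schema in the OpenAI tool format, so the sketch is self-contained.
TOOLS = [
    {"type": "function", "function": {
        "name": "write_file",
        "description": "Overwrite a file in the workspace with new content.",
        "parameters": {"type": "object", "properties": {
            "path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"]}}},
]

# Index the schemas by tool name for quick lookup.
SCHEMAS: Dict[str, dict] = {t["function"]["name"]: t["function"] for t in TOOLS}


def missing_args(tool_name: str, raw_arguments: str) -> List[str]:
    """Return required argument names absent from a JSON-encoded tool call."""
    required = SCHEMAS[tool_name]["parameters"].get("required", [])
    supplied = json.loads(raw_arguments or "{}")
    return [name for name in required if name not in supplied]


incomplete = missing_args("write_file", '{"path": "parser.py"}')
complete = missing_args("write_file", '{"path": "parser.py", "content": "x = 1"}')
print(incomplete)
print(complete)
```

A non-empty result could be sent back to the model as the tool output, which gives it a chance to retry with the missing argument.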
Build the agent loop
Here is the ReAct loop. The model talks, you execute the tool it asked for, you feed the result back, and you repeat until the tests pass or you hit the cap.

    # agent.py
    import json

    from client import client, MODEL
    from schemas import TOOLS
    import tools

    DISPATCH = {"read_file": tools.read_file,
                "write_file": tools.write_file,
                "run_tests": tools.run_tests}


    def fix_bug(task: str, max_steps: int = 12) -> bool:
        messages = [
            {"role": "system", "content":
                "You fix failing tests. Read the code, write a patch, run the "
                "tests. Repeat until they pass. Change as little as possible."},
            {"role": "user", "content": task},
        ]
        for step in range(max_steps):
            reply = client.chat.completions.create(
                model=MODEL, messages=messages, tools=TOOLS, temperature=0,
            )
            msg = reply.choices[0].message
            messages.append(msg)
            if not msg.tool_calls:
                return False  # model gave up or just chatted
            for call in msg.tool_calls:
                try:
                    args = json.loads(call.function.arguments or "{}")
                    output = DISPATCH[call.function.name](**args)
                except Exception as exc:  # bad JSON, unknown tool, path escape, timeout
                    output = f"tool error: {exc!r}"
                messages.append({"role": "tool",
                                 "tool_call_id": call.id, "content": output})
                if (call.function.name == "run_tests" and "passed" in output
                        and "failed" not in output and "error" not in output):
                    return True  # green, we are done
        return False

The stop condition is the part to get right. The loop ends when run_tests reports passed with no failed or error in its output, or when it burns through max_steps. Without that step cap, a model that cannot solve the bug will keep editing forever, and you will keep paying for it. Tool failures (bad JSON, an unknown tool name, a path outside the workspace, a timeout) go back to the model as text instead of crashing the loop.
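Matching on words in pytest's output is simple but brittle. One possible alternative, sketched here under assumptions, is to use the process exit code, since pytest documents exit code 0 as meaning all collected tests passed. The function name `run_and_check` and its `command` parameter are hypothetical, and the demo runs tiny Python one-liners in place of pytest so it works without a test suite.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_check(command: List[str], cwd: str = ".",
                  timeout: int = 120) -> Tuple[bool, str]:
    """Run a command; return (exit code was 0, tail of combined output)."""
    result = subprocess.run(command, cwd=cwd, capture_output=True,
                            text=True, timeout=timeout)
    output = (result.stdout + result.stderr)[-4000:]
    return result.returncode == 0, output


# In the agent the command would be [sys.executable, "-m", "pytest", "-x", "-q"].
# Stand-in commands keep this demo runnable anywhere.
ok, out = run_and_check([sys.executable, "-c", "print('1 passed')"])
bad, err = run_and_check(
    [sys.executable, "-c", "import sys; print('1 failed'); sys.exit(1)"])
print(ok, out.strip())
print(bad, err.strip())
```

With this shape the loop would stop on the boolean and still hand the text to the model, so an error during collection or fixture setup cannot be mistaken for success.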
GLM-5.2 self-hosted on vLLM vs a hosted GLM-5.2 endpoint
| Dimension | Self-host on vLLM | Hosted GLM-5.2 API |
|---|---|---|
| Source privacy | Code never leaves your box | Code is sent to a provider |
| Setup | GPU, weights download, vLLM | An API key |
| Cost shape | Fixed hardware cost | Per-token |
| Latency | Yours to tune | Provider-dependent |
| Best for | Agents editing private repos | Prototyping, low volume |
The agent code is identical either way. That is the whole point of pinning to the OpenAI-compatible protocol: the deployment decision stays a deployment decision and never leaks into your logic.
What this agent does not do
It does not understand your codebase the way a senior engineer does. It fixes the test in front of it, which means a model that makes the test pass by deleting the assertion has technically succeeded and actually failed you. It carries no concept of regressions beyond the suite you give it, so a thin test set lets it break things it was never asked to check. It does not review its own diff for security, and a patch that hardcodes a credential will sail straight through a passing test. Run it on a branch, never on a dirty working tree, and read every diff before you merge. The loop is autonomous; your judgment is not optional.
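One cheap defence against the delete-the-assertion failure mode is to refuse writes to test files altogether. The sketch below assumes a naming convention (test_*.py, *_test.py, conftest.py); `is_protected`, `guarded_write`, and `fake_write` are hypothetical names, and the guard wraps any write function that takes a path and content.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Callable, Dict

# Assumed convention for files the agent must not modify.
PROTECTED_PATTERNS = ("test_*.py", "*_test.py", "conftest.py")


def is_protected(path: str) -> bool:
    """True if the file name matches one of the protected patterns."""
    name = PurePosixPath(path.replace("\\", "/")).name
    return any(fnmatch(name, pattern) for pattern in PROTECTED_PATTERNS)


def guarded_write(write: Callable[[str, str], str],
                  path: str, content: str) -> str:
    """Call write(path, content) unless path is a protected test file."""
    if is_protected(path):
        return f"refused: {path} is a test file and is read-only for the agent"
    return write(path, content)


written: Dict[str, str] = {}


def fake_write(path: str, content: str) -> str:
    """In-memory stand-in for write_file, used only in this demo."""
    written[path] = content
    return f"wrote {len(content)} chars to {path}"


print(guarded_write(fake_write, "parser.py", "x = 1\n"))
print(guarded_write(fake_write, "tests/test_parser.py", "assert True\n"))
```

Returning the refusal as a string, not raising, means the model sees it as an ordinary tool result and can change course. It does not replace reading the diff: the agent can still game a test from the source side.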
The full working example

    # run.py -> python run.py
    import json
    import subprocess
    import sys
    from pathlib import Path

    from openai import OpenAI

    ROOT = Path("workspace").resolve()
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")
    MODEL = "zai-org/GLM-5.2"


    def read_file(path):
        t = (ROOT / path).resolve()
        t.relative_to(ROOT)  # raises ValueError if path escapes ROOT
        return t.read_text()


    def write_file(path, content):
        t = (ROOT / path).resolve()
        t.relative_to(ROOT)
        t.write_text(content)
        return f"wrote {path}"


    def run_tests():
        r = subprocess.run([sys.executable, "-m", "pytest", "-x", "-q"], cwd=ROOT,
                           capture_output=True, text=True, timeout=120)
        return (r.stdout + r.stderr)[-4000:]


    DISPATCH = {"read_file": read_file, "write_file": write_file,
                "run_tests": run_tests}

    TOOLS = [
        {"type": "function", "function": {"name": "read_file",
            "description": "Read a workspace file.", "parameters": {"type": "object",
            "properties": {"path": {"type": "string"}}, "required": ["path"]}}},
        {"type": "function", "function": {"name": "write_file",
            "description": "Overwrite a workspace file.", "parameters": {"type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"]}}},
        {"type": "function", "function": {"name": "run_tests",
            "description": "Run pytest.", "parameters": {"type": "object",
            "properties": {}}}},
    ]


    def fix_bug(task, max_steps=12):
        messages = [
            {"role": "system", "content": "You fix failing tests. Read, patch, "
                "run, repeat until green. Change as little as possible."},
            {"role": "user", "content": task}]
        for _ in range(max_steps):
            msg = client.chat.completions.create(
                model=MODEL, messages=messages, tools=TOOLS,
                temperature=0).choices[0].message
            messages.append(msg)
            if not msg.tool_calls:
                return False  # model gave up or just chatted
            for call in msg.tool_calls:
                try:
                    args = json.loads(call.function.arguments or "{}")
                    out = DISPATCH[call.function.name](**args)
                except Exception as exc:  # report tool failures to the model
                    out = f"tool error: {exc!r}"
                messages.append({"role": "tool", "tool_call_id": call.id,
                                 "content": out})
                if (call.function.name == "run_tests" and "passed" in out
                        and "failed" not in out and "error" not in out):
                    return True  # green, we are done
        return False


    if __name__ == "__main__":
        ok = fix_bug("The test in test_parser.py fails. Find and fix the bug.")
        print("FIXED" if ok else "COULD NOT FIX")

Start vLLM with vllm serve zai-org/GLM-5.2 --port 8000 (plus --enable-auto-tool-choice and a matching --tool-call-parser so tool calls are parsed), drop a broken module and a failing test_parser.py into workspace/, and run python run.py. The agent reads, patches, and re-runs until the suite is green or it gives up after twelve steps.
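If you want something to try the agent on, the helper below writes a deliberately broken module and a failing test into a workspace directory. Everything in it is made up for illustration: the module name kvparser.py, the function `parse_pairs`, and the bug itself (splitting on every = instead of only the first).

```python
from pathlib import Path

# A buggy module: values that contain "=" make the tuple unpacking fail.
BUGGY_MODULE = '''\
def parse_pairs(text):
    """Parse lines of key=value into a dict."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split("=")  # bug: breaks when the value contains "="
        pairs[key.strip()] = value.strip()
    return pairs
'''

# A test that defines "done": it fails until the split is limited to one "=".
FAILING_TEST = '''\
from kvparser import parse_pairs


def test_value_may_contain_equals():
    assert parse_pairs("url=a=b") == {"url": "a=b"}
'''


def make_workspace(root: str = "workspace") -> Path:
    """Create root (if needed) and write the buggy module plus its test."""
    workspace = Path(root)
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / "kvparser.py").write_text(BUGGY_MODULE)
    (workspace / "test_parser.py").write_text(FAILING_TEST)
    return workspace
```

Calling `make_workspace()` from the directory that holds run.py creates workspace/ with both files. The smallest fix the agent could make is `line.split("=", 1)`.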
When to reach for this
This pattern earns its place on well-tested code with narrow, reproducible failures: a flaky parser, an off-by-one, a regression a test already caught. It is a poor fit for a vague "make the app better," because the agent only knows what the suite tells it. Point GLM-5.2 at a bug with a test that defines done, and the loop closes it while you read something else.