Building Durable AI Agents with Cloudflare Project Think

On June 16, 2026, Cloudflare shipped Agents SDK v0.16.1 and with it Project Think, a durable runtime for long-running agents.

June 22, 2026 · 7 min read · 12 sections · By Ahmed Abdullah

Introduction

On June 16, 2026, Cloudflare shipped Agents SDK v0.16.1 and with it Project Think, a durable runtime for long-running agents. The headline primitive is the fiber: a unit of agent work that checkpoints its own progress to a local SQLite database and resumes from the last checkpoint after a crash. This tutorial builds a research agent that survives a worker restart mid-run instead of starting the whole job over.

If you have ever shipped an agent that ran for two minutes, hit a deploy, and lost everything it had done, this is the part you were missing.

What durable execution actually means

Durable execution means the progress of a long task is persisted as the task runs, not after it finishes. A normal request is stateless. If the process dies at 80%, the 80% is gone, and the only honest thing the system can do is run all of it again.

Durable execution flips that. The runtime records where the work got to, in storage that outlives the process. When the process comes back, it reads the last recorded point and continues. The unit of work becomes resumable rather than disposable. That is the whole idea, and everything in Project Think is an implementation detail underneath it.
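
To make the idea concrete, here is a minimal sketch of checkpoint-and-resume with no SDK involved. Everything in it is hypothetical: durableStore is an in-memory Map standing in for storage that outlives the process, and runJob and crashAt exist only to simulate a crash. It is an illustration of the concept, not how Project Think is implemented.

```typescript
// Hypothetical stand-in for storage that outlives the process.
// In Project Think this role is played by the agent's SQLite database.
type Checkpoint = { nextStep: number; results: string[] };
const durableStore = new Map<string, Checkpoint>();

// Counts how many steps actually executed across all attempts.
let stepsExecuted = 0;

// Runs `steps` in order, recording progress after each one.
// `crashAt` simulates the process dying just before that step index runs.
function runJob(jobId: string, steps: string[], crashAt?: number): string[] {
  // Read the last recorded point, or start from zero on a first run.
  const saved = durableStore.get(jobId) ?? { nextStep: 0, results: [] };
  const results = [...saved.results];

  for (let i = saved.nextStep; i < steps.length; i++) {
    if (i === crashAt) {
      throw new Error(`simulated crash before step ${i}`);
    }
    results.push(`done:${steps[i]}`);
    stepsExecuted++;
    // Persist progress as the task runs, not after it finishes.
    durableStore.set(jobId, { nextStep: i + 1, results: [...results] });
  }
  return results;
}

const steps = ["a", "b", "c", "d", "e"];

// First attempt dies at 80%, before the fifth step.
try {
  runJob("job-1", steps, 4);
} catch {
  // The process is gone, but the checkpoint is not.
}

// Second attempt resumes from the checkpoint and runs only the last step.
const finalResults = runJob("job-1", steps);
console.log(finalResults.length, stepsExecuted); // 5 5
```

Five steps ran in total across both attempts. Without the store, the second attempt would have started at zero and the count would have been nine.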

What a fiber is in the Agents SDK

In Project Think, a fiber is a durable invocation that can checkpoint its own progress. You wrap a block of agent work in runFiber(name, callback). Inside the callback you call ctx.stash(data) to persist intermediate state to the agent's co-located SQLite database. If the worker restarts mid-fiber, the SDK recovers the fiber, fires onFiberRecovered(ctx), and hands you back the stashed state so you can resume from the last checkpoint instead of the beginning.

The Agents SDK runs on Cloudflare Workers backed by Durable Objects, which is why each agent gets its own SQLite instance sitting next to its code. The fiber writes there. No external queue, no separate database to provision.

Scaffold a Think agent

Start with the base class. Think<Env> is the opinionated harness from @cloudflare/think; the one method you must override is getModel(). Install the current packages first.

bash

npm i agents@latest @cloudflare/think@latest ai @cloudflare/shell zod workers-ai-provider

typescript

// src/agent.ts
import { Think } from "@cloudflare/think";
import { createWorkersAI } from "workers-ai-provider";

interface Env {
  AI: Ai; // Workers AI binding
}

export class ResearchAgent extends Think<Env> {
  // The one required override: which model the agent talks to.
  getModel() {
    const workersai = createWorkersAI({ binding: this.env.AI });
    return workersai("@cf/meta/llama-3.3-70b-instruct");
  }
}

That is a complete, deployable agent with a chat lifecycle already wired in. It is also not yet durable. A long task inside it is still disposable. We fix that next.

Run the work inside a fiber

Wrap the long task in runFiber. The first argument names the fiber so the runtime can find it again after a restart. Everything inside the callback is now a recoverable unit of work.

typescript

async research(topic: string) {
  return this.runFiber("research", async (ctx) => {
    const subtopics = await this.plan(topic);
    // Progress lives only in this in-memory array; nothing is checkpointed yet.
    const findings: string[] = [];
    for (let i = 0; i < subtopics.length; i++) {
      const answer = await this.callModel(subtopics[i]);
      findings.push(answer);
    }
    return findings;
  });
}

This runs. It is also still fragile, because the work inside the loop holds its progress only in the findings array in memory. A restart at subtopic four throws away subtopics one through three. The fiber wrapper alone does not save you. The checkpoint does.

Checkpoint progress with ctx.stash

ctx.stash writes the current state to SQLite before the next expensive step. Call it after each unit of progress you would hate to repeat. Here every completed subtopic is stashed with the index, so the recovered run knows exactly how far it got.

typescript

async research(topic: string) {
  return this.runFiber("research", async (ctx) => {
    const subtopics = await this.plan(topic);
    const findings: string[] = [];
    for (let i = 0; i < subtopics.length; i++) {
      const answer = await this.callModel(subtopics[i]);
      findings.push(answer);
      // persist after each completed unit, before the next model call
      ctx.stash({ topic, step: i + 1, findings });
    }
    return findings;
  });
}

The stash is the load-bearing line. The model call above it is the expensive part; the stash below it is the receipt. After the loop has run four times, SQLite holds four findings and step: 4. That record is what a recovery reads.

Recover after a crash with onFiberRecovered

When the worker restarts mid-fiber, the runtime re-enters through onFiberRecovered. You read the stashed state and resume from step rather than from zero. Recovery in v0.16.1 runs with backoff, so a worker that keeps dying does not hot-loop.

typescript

async onFiberRecovered(ctx) {
  // Whatever the last ctx.stash call wrote before the crash.
  const state = ctx.stashed as {
    topic: string;
    step: number;
    findings: string[];
  };
  // Assumes plan() returns the same list it returned before the crash.
  const subtopics = await this.plan(state.topic);
  const findings = state.findings;
  // resume at the step after the last checkpoint
  for (let i = state.step; i < subtopics.length; i++) {
    findings.push(await this.callModel(subtopics[i]));
    ctx.stash({ topic: state.topic, step: i + 1, findings });
  }
  return findings;
}

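The recovered run repeats nothing that was checkpointed. It re-plans the subtopics, then picks up at the first index it never finished. Three completed subtopics cost three callModel calls total across the crash, not six. Two caveats: a call that finished but crashed before its stash runs again, and the resume index only lines up if plan returns the same list both times. The plan in this tutorial is itself a model call, so it is neither free nor guaranteed deterministic; stashing the subtopics alongside the findings removes that dependency.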
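
Here is a minimal sketch of a checkpoint that carries its own plan, so a recovered run never has to call plan again. It is plain TypeScript with no SDK calls: ResearchState and continueResearch are hypothetical names, callModel and save are injected stand-ins for the model call and ctx.stash, and both are synchronous here only to keep the example short.

```typescript
// Hypothetical checkpoint shape: the plan itself is part of the saved state.
interface ResearchState {
  topic: string;
  subtopics: string[]; // saved once, so recovery never has to re-plan
  step: number;        // index of the next subtopic to run
  findings: string[];
}

// Continues a run from whatever state it is given. A fresh run and a
// recovered run share this loop; they differ only in the starting state.
function continueResearch(
  state: ResearchState,
  callModel: (subtopic: string) => string,
  save: (state: ResearchState) => void,
): string[] {
  const findings = [...state.findings];
  for (let i = state.step; i < state.subtopics.length; i++) {
    findings.push(callModel(state.subtopics[i]));
    // Checkpoint after each completed subtopic, plan included.
    save({ ...state, step: i + 1, findings: [...findings] });
  }
  return findings;
}

// Simulated recovery: three subtopics were finished before the crash.
const recovered: ResearchState = {
  topic: "durable agents",
  subtopics: ["fibers", "checkpoints", "recovery", "idempotency"],
  step: 3,
  findings: ["f1", "f2", "f3"],
};

const modelCalls: string[] = [];
const saves: ResearchState[] = [];

const result = continueResearch(
  recovered,
  (s) => {
    modelCalls.push(s);
    return `finding on ${s}`;
  },
  (s) => {
    saves.push(s);
  },
);

console.log(result.length, modelCalls); // 4 [ 'idempotency' ]
```

Because the loop takes its starting state as an argument, the first run can call it with step 0 and an empty findings array, and the recovery path can call it with the stashed state. That also removes the duplicated loop between the two methods.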

Keep long jobs alive past the limits

Workers reclaim idle execution. For a fiber that legitimately runs for minutes, wrap the long stretch in keepAliveWhile so the runtime does not collect it mid-task.

typescript

async deepResearch(topic: string) {
  return this.runFiber("deep-research", async (ctx) => {
    return this.keepAliveWhile(async () => {
      const subtopics = await this.plan(topic);
      const findings: string[] = [];
      for (let i = 0; i < subtopics.length; i++) {
        findings.push(await this.callModel(subtopics[i]));
        ctx.stash({ topic, step: i + 1, findings });
      }
      return findings;
    });
  });
}

A plain Worker request vs a Project Think fiber

Property	Plain Worker request	Project Think fiber
State on crash at 80%	lost, rerun from 0	recovered from last ctx.stash
Where progress lives	in-memory only	co-located SQLite (Durable Object)
Restart behavior	new invocation, no memory	onFiberRecovered with backoff
Long-task survival	subject to idle reclaim	held by keepAliveWhile
Extra infra to set up	a queue and a database	none

What this does not do

A fiber checkpoints state, not side effects. If a step already sent an email, posted a PR, or charged a card before the crash, recovery will not un-send it, and naive resume logic will do it twice. Idempotency is still your job, not the runtime's. The honest version of durable execution is this: it guarantees your state survives, and it guarantees nothing about whether the outside world can tell the difference between your first attempt and your second.
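
One common way to handle this is an idempotency key: derive a stable key from the work itself and refuse to repeat an effect whose key has already been recorded. The sketch below is a plain TypeScript illustration under assumed names (completedEffects, once, sendReport, and outbox are all hypothetical), with an in-memory Set standing in for durable storage. It is not an SDK feature.

```typescript
// Hypothetical ledger of side effects that already happened. In a real agent
// this would live in durable storage next to the checkpoint.
const completedEffects = new Set<string>();

// Stands in for "the outside world" (an inbox, a payment API, a repo).
const outbox: string[] = [];

// Runs `effect` at most once per key, even if the caller retries.
// Returns true if the effect ran, false if it was skipped.
function once(key: string, effect: () => void): boolean {
  if (completedEffects.has(key)) {
    return false; // already done on an earlier attempt
  }
  effect();
  completedEffects.add(key);
  return true;
}

// Derive the key from the work, not from the attempt, so a retry of the
// same step maps to the same key.
function sendReport(runId: string, step: number, body: string): boolean {
  return once(`${runId}:email:${step}`, () => {
    outbox.push(body);
  });
}

const firstAttempt = sendReport("run-42", 3, "findings so far");
const afterRecovery = sendReport("run-42", 3, "findings so far");

console.log(firstAttempt, afterRecovery, outbox.length); // true false 1
```

A local ledger still leaves a small window: a crash after the effect runs but before the key is recorded will repeat the effect. Where the downstream service accepts an idempotency key of its own, passing the same key through closes that window on the receiving side.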

There is also no magic in the granularity. You only recover to the last ctx.stash you wrote. Stash too rarely and recovery is cheap to write but expensive to run. Stash on every token and you have built a slow agent that is very good at remembering how slow it is.
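
The trade-off is easy to put numbers on. The sketch below compares the worst-case work repeated after one crash against the total time spent writing checkpoints, for different stash intervals. The function names and every figure (40 steps, 3,000 ms per model call, 5 ms per stash) are illustrative assumptions, not measurements of the SDK.

```typescript
// Worst-case work repeated after one crash when stashing every `interval`
// steps: a crash just before the next stash repeats all steps since the last.
function worstCaseRedoMs(interval: number, stepMs: number): number {
  return interval * stepMs;
}

// Total time spent writing checkpoints over one crash-free run.
function stashOverheadMs(totalSteps: number, interval: number, stashMs: number): number {
  return Math.floor(totalSteps / interval) * stashMs;
}

// Illustrative numbers only.
const totalSteps = 40;
const stepMs = 3000; // one model call
const stashMs = 5;   // one checkpoint write

const table = [1, 10, 40].map((interval) => ({
  interval,
  redoMs: worstCaseRedoMs(interval, stepMs),
  overheadMs: stashOverheadMs(totalSteps, interval, stashMs),
}));

console.log(table);
// interval 1:  redo 3000 ms,   overhead 200 ms
// interval 10: redo 30000 ms,  overhead 20 ms
// interval 40: redo 120000 ms, overhead 5 ms
```

With these assumed costs, stashing after every model call is the obvious choice: 200 ms of overhead buys a worst case of one repeated call. The balance only shifts when a step is cheap relative to the write, which is the per-token case.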

The full working agent

typescript

// src/agent.ts
import { Think } from "@cloudflare/think";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

interface Env {
  AI: Ai; // Workers AI binding
}

export class ResearchAgent extends Think<Env> {
  getModel() {
    const workersai = createWorkersAI({ binding: this.env.AI });
    return workersai("@cf/meta/llama-3.3-70b-instruct");
  }

  // Ask the model for up to four subtopics, one per line.
  // generateText is the AI SDK's high-level call and returns { text }.
  async plan(topic: string): Promise<string[]> {
    const { text } = await generateText({
      model: this.getModel(),
      prompt: `List 4 research subtopics for "${topic}". One per line, no numbering.`,
    });
    return text.split("\n").map((s) => s.trim()).filter(Boolean).slice(0, 4);
  }

  // One expensive unit of work: a paragraph of findings for one subtopic.
  async callModel(subtopic: string): Promise<string> {
    const { text } = await generateText({
      model: this.getModel(),
      prompt: `Write one tight paragraph of findings on: ${subtopic}`,
    });
    return text.trim();
  }

  async research(topic: string) {
    return this.runFiber("research", async (ctx) => {
      return this.keepAliveWhile(async () => {
        const subtopics = await this.plan(topic);
        const findings: string[] = [];
        for (let i = 0; i < subtopics.length; i++) {
          findings.push(await this.callModel(subtopics[i]));
          // checkpoint after each completed subtopic
          ctx.stash({ topic, step: i + 1, findings });
        }
        return findings;
      });
    });
  }

  async onFiberRecovered(ctx) {
    const state = ctx.stashed as {
      topic: string;
      step: number;
      findings: string[];
    };
    // Assumes plan() returns the same list it returned before the crash.
    const subtopics = await this.plan(state.topic);
    const findings = state.findings;
    // resume at the step after the last checkpoint
    for (let i = state.step; i < subtopics.length; i++) {
      findings.push(await this.callModel(subtopics[i]));
      ctx.stash({ topic: state.topic, step: i + 1, findings });
    }
    return findings;
  }
}

Deploy it with wrangler deploy, bind Workers AI in wrangler.toml, and trigger research() from a fetch handler. Kill the worker mid-run and watch it pick up where it left off.

When to reach for it

Use fibers when the cost of redoing work is real: multi-step research, batch processing, anything where a single run spans more model calls than you want to pay for twice. For a request that finishes in 200 milliseconds, this is overhead you do not need; a plain handler is the right tool and a fiber is a costume. The line is simple. If losing the run halfway through would make you wince, the work wanted to be durable before it crashed, and now it can be.
