feat: HTTP-proxy LangGraph checkpoint API #146

Open
danielmillerp wants to merge 1 commit into main from dm/langgraph-setup

danielmillerp (Collaborator) commented Feb 8, 2026

What this does

This PR adds backend support for LangGraph checkpoint persistence — the mechanism LangGraph uses to save and restore agent state between messages (conversation history, channel values, pending writes, etc.).

Why we need this

LangGraph agents need to persist their state (checkpoints) to a database. The built-in approach (AsyncPostgresSaver) has each agent connect directly to Postgres with its own connection pool. This doesn't scale — as we spin up more LangGraph agent pods, we'd hit connection limits quickly. This is the same problem we already solved for Temporal: instead of agents talking to the DB directly, they go through the backend API, which uses a shared connection pool.

Why Postgres (not MongoDB)

Even though agent state currently lives in MongoDB, we chose Postgres for checkpoint storage. There have been reliability concerns around MongoDB recently, and a migration to Postgres is possible in the future, so putting new storage in Postgres now avoids adding more data to migrate later. The checkpoint tables are independent and don't conflict with existing MongoDB state storage.

How it works

The pattern mirrors what we do with Temporal. The agent doesn't know about the database — it talks to the backend API, and the backend handles the DB operations.

Agent (SDK HttpCheckpointSaver)  →  Backend API (/checkpoints/*)  →  Postgres
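
As a rough illustration of the agent side of this flow, the sketch below builds the POST request a saver method might send to the backend. The endpoint names are the ones listed in this PR. The base URL, the `build_request` helper, and the payload fields are hypothetical and are not taken from the SDK's HttpCheckpointSaver.

```python
import json
import urllib.request

# Endpoint names come from this PR's description; the keys are the saver
# methods they are described as corresponding to.
ENDPOINTS = {
    "get_tuple": "get-tuple",
    "put": "put",
    "put_writes": "put-writes",
    "list": "list",
    "delete_thread": "delete-thread",
}


def build_request(base_url: str, method_name: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for one saver method."""
    url = f"{base_url.rstrip('/')}/checkpoints/{ENDPOINTS[method_name]}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL and payload fields, for illustration only.
req = build_request("http://backend.example", "get_tuple", {"thread_id": "t1"})
print(req.get_method(), req.full_url)
```

The point of the pattern is that the agent holds only an HTTP client, so it never opens a database connection of its own.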

On the backend side, we:

  1. Created 4 new Postgres tables via ORM models + Alembic migration (checkpoints, checkpoint_blobs, checkpoint_writes, checkpoint_migrations) — these mirror the schema that LangGraph's own AsyncPostgresSaver uses
  2. Built a repository layer that reimplements the same SQL operations from AsyncPostgresSaver using our SQLAlchemy patterns (composite primary keys, JSONB metadata, upserts via ON CONFLICT)
  3. Exposed 5 POST endpoints under /checkpoints (get-tuple, put, put-writes, list, delete-thread) — one for each method on LangGraph's BaseCheckpointSaver
  4. Added 19 integration tests for the repository layer, running against a real Postgres via testcontainers — covering round-trip storage, blob versioning, pending writes (upsert vs skip), metadata filtering, pagination, thread/namespace isolation, and deletion
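
The "upsert vs skip" behavior for pending writes can be sketched with two `ON CONFLICT` variants against a composite primary key. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for Postgres, a simplified `checkpoint_writes` table with fewer columns than the real schema, and a hypothetical `put_write` helper. It is not the repository code from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE checkpoint_writes (
        thread_id     TEXT    NOT NULL,
        checkpoint_id TEXT    NOT NULL,
        task_id       TEXT    NOT NULL,
        idx           INTEGER NOT NULL,
        channel       TEXT    NOT NULL,
        blob          BLOB    NOT NULL,
        PRIMARY KEY (thread_id, checkpoint_id, task_id, idx)
    )
    """
)

# Upsert: a conflicting row is overwritten with the new values.
UPSERT = """
INSERT INTO checkpoint_writes VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (thread_id, checkpoint_id, task_id, idx)
DO UPDATE SET channel = excluded.channel, blob = excluded.blob
"""

# Skip: a conflicting row is left untouched.
INSERT_OR_SKIP = """
INSERT INTO checkpoint_writes VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (thread_id, checkpoint_id, task_id, idx) DO NOTHING
"""


def put_write(row: tuple, overwrite: bool) -> None:
    """Store one pending write, overwriting or skipping on key conflict."""
    conn.execute(UPSERT if overwrite else INSERT_OR_SKIP, row)


def read_blob(thread_id: str, checkpoint_id: str, task_id: str, idx: int) -> bytes:
    """Fetch the stored blob for one composite key."""
    cur = conn.execute(
        "SELECT blob FROM checkpoint_writes "
        "WHERE thread_id = ? AND checkpoint_id = ? AND task_id = ? AND idx = ?",
        (thread_id, checkpoint_id, task_id, idx),
    )
    return cur.fetchone()[0]


put_write(("t1", "c1", "task-a", 0, "messages", b"first"), overwrite=False)
put_write(("t1", "c1", "task-a", 0, "messages", b"second"), overwrite=False)  # skipped
put_write(("t1", "c1", "task-a", 0, "messages", b"third"), overwrite=True)    # replaces
```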

Binary blob data (serialized Python objects) is base64-encoded for JSON transport. The actual serialization/deserialization stays in the SDK — the backend just stores and retrieves raw JSONB + bytes.
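
A minimal sketch of the base64 transport step, using only the standard library. The `encode_blob` and `decode_blob` helpers and the payload field names are hypothetical; the real request shape is whatever this PR's Pydantic schemas define.

```python
import base64
import json


def encode_blob(blob: bytes) -> str:
    """Encode raw bytes as ASCII text so they can sit inside a JSON body."""
    return base64.b64encode(blob).decode("ascii")


def decode_blob(text: str) -> bytes:
    """Reverse encode_blob on the receiving side."""
    return base64.b64decode(text)


# Hypothetical payload: opaque bytes produced by the SDK's serializer.
raw = b"\x80\x04\x95\x00binary-state"
payload = json.dumps({"channel": "messages", "blob": encode_blob(raw)})

# The receiver recovers the exact bytes without interpreting them.
restored = decode_blob(json.loads(payload)["blob"])
assert restored == raw
```

Because neither side of the transport inspects the bytes, the serialization format can stay entirely in the SDK.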

Companion PRs

Test plan

  • 19 integration tests passing against real Postgres (testcontainers)
  • Manually tested end-to-end: ran the backend with a LangGraph agent, sent messages, and confirmed that the checkpoint was stored and conversation history was restored
  • Verified ruff lint + format pass

🤖 Generated with Claude Code

Agents no longer need a direct Postgres connection for LangGraph
checkpointing. Instead, checkpoint operations are proxied through
5 new backend endpoints under /checkpoints (get-tuple, put,
put-writes, list, delete-thread). Binary blob data is base64-encoded
for JSON transport.

Includes ORM models for the 4 checkpoint tables, Alembic migration,
repository with composite-PK queries, use case layer, Pydantic
schemas, and FastAPI routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>