riverlog total size cap, perf and allocation optimizations#1157

Merged
bgentry merged 5 commits into master from bg/riverlog-log-cap-perf on Feb 25, 2026

Conversation

@bgentry
Contributor

@bgentry bgentry commented Feb 25, 2026

A user reported high transient memory usage and slow append cycles when using riverlog when the total size of its jsonb array grew large (especially in snooze-heavy jobs). It turns out we were doing a lot of highly inefficient stuff in here with lots of low-hanging fruit for optimization. To start, I added a benchmark to validate the perf troubles and potential solutions.

First, the biggest user-facing change is the new total history cap: MiddlewareConfig.MaxTotalBytes (default 8 MB). When appending a new river:log entry would exceed that limit, the middleware now prunes oldest entries first while always keeping the newest entry. That directly addresses unbounded log-history growth for long-lived/snoozed jobs.

Internally, the implementation was first refactored from its original struct-unmarshal/append logic to json.RawMessage; however, this turned out to still be quite slow and allocation-heavy, so I did another pass leveraging gjson/sjson, which was far better (results below).

A final round of optimization improved things further with a one-pass capped-array builder that marshals only the new entry, computes exactly how much of the existing suffix can be kept, and emits the final JSON array once instead of running append-then-repeated-delete loops. It also truncates log bytes before converting to string, so oversized buffers are not copied in full first.

Coverage was expanded with focused unit tests around append/prune semantics (missing key, non-array metadata, minimal pruning, newest-entry retention when over cap, and no-cap behavior).

In the end, we dramatically reduced the duration of these operations as well as the number of allocations, at the cost of a small increase in the total memory overhead per operation.

| Stage | Change | 256KB ns/op | 256KB B/op | 256KB allocs/op | 2MB ns/op | 2MB B/op | 2MB allocs/op |
|---|---|---|---|---|---|---|---|
| Baseline (23921c2) | Original append path + benchmark harness | 1,331,725 | 703,634 | 294 | 8,348,696 | 6,441,894 | 2,100 |
| Interim (ecb718b) | Internal []json.RawMessage path | 1,562,610 (+17.3%) | 674,228 (-4.2%) | 292 (-0.7%) | 12,557,009 (+50.4%) | 6,433,251 (-0.1%) | 2,097 (-0.1%) |
| Interim (c87f47e) | gjson/sjson append+prune | 627,022 (-52.9%) | 1,166,917 (+65.8%) | 24 (-91.8%) | 4,397,872 (-47.3%) | 6,480,502 (+0.6%) | 30 (-98.6%) |
| Final (5896050) | One-pass suffix selection + single-build output | 163,320 (-87.7%) | 884,066 (+25.6%) | 31 (-89.5%) | 1,207,252 (-85.5%) | 7,006,968 (+8.8%) | 46 (-97.8%) |

Large `river:log` metadata can drive significant allocation spikes while
an attempt appends one more log entry. Before changing behavior, we need
an inexpensive way to measure the hot path and compare alternatives.

Add focused benchmarks in riverlog that exercise middleware append
behavior with prebuilt 256 KB and 2 MB metadata payloads. The benchmark
stays in-process (no DB setup), reports allocations, and is skipped under
testing.Short() so quick CI-oriented runs remain fast.
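
A minimal version of such a harness might look like this (illustrative names and a simplified append cycle, not the actual riverlog benchmark, which also guards the expensive cases behind `testing.Short()`):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"testing"
)

// benchmarkAppend prebuilds a log history containing one payload of roughly
// payloadSize bytes, then measures a decode-append-encode cycle per
// iteration, reporting allocations.
func benchmarkAppend(b *testing.B, payloadSize int) {
	entry, _ := json.Marshal(map[string]string{"log": strings.Repeat("x", payloadSize)})
	history := []byte("[" + string(entry) + "]")

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var arr []json.RawMessage
		_ = json.Unmarshal(history, &arr)
		arr = append(arr, json.RawMessage(entry))
		out, _ := json.Marshal(arr)
		_ = out
	}
}

func main() {
	// testing.Benchmark lets the harness run outside `go test` for illustration.
	res := testing.Benchmark(func(b *testing.B) { benchmarkAppend(b, 256<<10) })
	fmt.Println(res.String(), res.MemString())
}
```
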
@bgentry bgentry force-pushed the bg/riverlog-log-cap-perf branch from 5896050 to 6da39e2 on February 25, 2026 02:00
Snoozed jobs can run repeatedly without consuming attempts, so river:log
entries may grow without bound over long-running jobs. Large metadata
payloads then amplify memory use on each future append.

Add a new `MiddlewareConfig` field `MaxTotalBytes` to bound the total
serialized `river:log` payload. When the new attempt is appended and the
result exceeds the cap, oldest entries are dropped first while always
retaining the newest entry.

The middleware now defaults `MaxTotalBytes` to 8 MB. New tests verify
that oldest entries are pruned when the cap is reached, a tiny cap still
retains the newest entry, and the default is applied.

Appending a new river:log entry previously decoded the whole historical
array into []logAttempt and then re-encoded it. For large metadata this
creates substantial transient allocations for strings and slices.

Switch middleware internals to decode river:log as []json.RawMessage.
Each run now marshals only the newest entry and re-serializes the raw
entry array while preserving the existing total-cap pruning behavior.
This removes per-entry struct decoding from the append path.
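
The shape of that change can be sketched like so (hypothetical `appendRaw` helper, stdlib only):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// appendRaw decodes the history only as opaque raw messages, so existing
// entries are never unmarshaled into per-entry structs; only the newest
// entry is marshaled before the array is re-serialized.
func appendRaw(history []byte, newest any) ([]byte, error) {
	var entries []json.RawMessage
	if len(history) > 0 {
		if err := json.Unmarshal(history, &entries); err != nil {
			return nil, err
		}
	}
	entryJSON, err := json.Marshal(newest)
	if err != nil {
		return nil, err
	}
	return json.Marshal(append(entries, json.RawMessage(entryJSON)))
}

func main() {
	out, err := appendRaw([]byte(`[{"attempt":1}]`), map[string]int{"attempt": 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // the old entry passes through untouched
}
```
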

The raw-message iteration still rebuilt log arrays through json
marshaling, which left append performance significantly worse than the
baseline implementation used on master.

Switch riverlog history to gjson/sjson so operations stay on raw JSON
arrays. The middleware reads only `river:log` with `gjson`, appends with
`sjson.SetRawBytes` using `-1`, and enforces `MaxTotalBytes` by deleting
oldest entries with `sjson.DeleteBytes("0")`.

Comparison for `BenchmarkMiddlewareWorkAppend` (`-benchtime=100ms`):

    case: 256 KB
    impl           ns/op      B/op  allocs/op
    baseline      976052    645856        291
    rawmessage   1605106    698233        293
    sjson         614433   1167265         25

    case: 2 MB
    impl           ns/op      B/op  allocs/op
    baseline     7762435   5920954       2096
    rawmessage  12516198   6651551       2098
    sjson        4485274   6482082         32

The pruning test changed only to avoid a brittle exact-byte boundary.
`MaxTotalBytes` is now set slightly above the two-entry payload so the
test consistently verifies the intended behavior: appending a third
entry drops the oldest and keeps the newest two.
@bgentry bgentry force-pushed the bg/riverlog-log-cap-perf branch from 6da39e2 to bb18bd1 on February 25, 2026 02:14
Appending to large river:log histories still did extra work in the sjson
version: it built an appended array first, then repeatedly deleted index 0
while over cap. In high-volume histories this caused multiple full-array
rewrites before reaching the final payload.

Rework append/prune to compute the kept suffix first and then build the
final array once. The new flow marshals only the new entry, derives bounds
for existing array elements, decides how many oldest entries to drop to fit
MaxTotalBytes, and emits a single final JSON array. It also truncates log
bytes before converting to string so oversized log buffers are not copied
in full.

Benchmark comparison for `BenchmarkMiddlewareWorkAppend`
(`-benchtime=100ms`, same machine):

    case: 256 KB
    impl            ns/op      B/op  allocs/op
    before         821178   1167325         25
    after          177355    884454         31

    case: 2 MB
    impl            ns/op      B/op  allocs/op
    before        4854185   6479606         29
    after         1247496   7006818         46

Added focused unit coverage for appendLogDataWithCap, including missing-key
behavior, non-array metadata errors, minimal oldest-first pruning, keeping
newest entry when over cap, and no-cap behavior.
@bgentry bgentry force-pushed the bg/riverlog-log-cap-perf branch from bb18bd1 to 020e1db on February 25, 2026 02:20
@bgentry bgentry requested a review from brandur February 25, 2026 02:21
Contributor

@brandur brandur left a comment

Looks great man! Really nice to see the implementation get swapped out like this for something more optimal while the API stays 99% the same and is non-breaking.


### Changed

- `riverlog.Middleware` now supports `MiddlewareConfig.MaxTotalBytes` (default 8 MB) to cap total persisted `river:log` history per job. When the cap is exceeded, oldest log entries are dropped first while retaining the newest entry. Values over 64 MB are clamped to 64 MB. [PR #1157](https://github.com/riverqueue/river/pull/1157).
Contributor Author

@brandur I should have called this out earlier, does this 8MB default seem reasonable to you? Do you feel it should be called out as "breaking" or is "changed" sufficient?

We can tweak it in a follow up before release if needed.

Contributor

Yeah, I saw it in there so I can't complain. I could see it potentially on the low side, but it really depends on the use case. How did you pick 8 MB anyway?

Just re-reading this, we might want to update this comment on MaxSizeBytes and call out that you'll likely hit MaxTotalBytes way before you hit something like 50 MB:

	// Be careful with this number because the maximum total log size is equal
	// to maximum number of attempts multiplied by this number (each attempt's
	// logs are kept separately). For example, 25 * 2 MB = 50 MB maximum
	// theoretical log size. Log data goes into metadata which is a JSONB field,
	// and JSONB fields have a maximum size of 255 MB, so any number larger than
	// 255 divided by maximum number of attempts may cause serious operational
	// problems.

@bgentry bgentry merged commit 9be9aff into master Feb 25, 2026
24 of 25 checks passed
@bgentry bgentry deleted the bg/riverlog-log-cap-perf branch February 25, 2026 04:16
@Sroose

Sroose commented Feb 25, 2026

Hi,

Thank you for the very quick response!

We tested the changes from this PR against our real-world case (a long-lived snoozed job with ~6000 log entries, ~17MB of river:log in metadata, snoozing every 2 minutes).
Unfortunately we see no meaningful improvement.

Numbers

We ran go tool pprof heap profiles after 7 snooze iterations (~12 minutes) on both v0.28.0 and a build with this PR applied.

inuse_space:

| | v0.28.0 | PR #1157 |
|---|---|---|
| iobufpool (pgx sync.Pool) | 112 MB (50%) | 128 MB (58%) |
| pgtype.encodePlan...Encode | 58.5 MB (26%) | 66.5 MB (30%) |
| bytes.growSlice | 16 MB | 0.5 MB |
| json.Marshal | 16 MB | - |
| riverlog.buildAppendedArray | - | 8 MB |
| Total | 224 MB | 219 MB |

alloc_space: 3,300 MB (old) vs 3,412 MB (new) - similar with the fix.

The json.Unmarshal / json.Marshal allocations inside riverlog are gone, which is good. But they've been replaced by gjson.ParseBytes (129 MB) + gjson.getBytes (129 MB) + buildAppendedArray (112 MB), so the net effect within alloc_space is roughly a wash.

More importantly, the two dominant allocators - iobufpool and pgtype.Encode - are completely unchanged. Together they account for ~88% of inuse in both old and new.

Why

The PR optimises step 3 of what's essentially a 7-step amplification chain per snooze cycle:

  1. JobGetAvailable fetches the job, including the full metadata column → pgx reads ~17 MB from the wire into iobufpool
  2. pgtype.Scan copies the metadata bytes out of the wire buffer → another ~17 MB
  3. riverlog parses the river:log array → this is what the PR fixes ✅
  4. riverlog builds the new capped array → buildAppendedArray, still ~8 MB (MaxTotalBytes default)
  5. job_executor.reportResult calls json.Marshal(metadataUpdates) on the full update map (containing the log array), then sjson.SetBytes to add the snooze counter → copies the whole blob again
  6. pgx encodes the UPDATE Bind message with metadata_updates → pgproto3.(*Bind).Encode at ~112 MB in alloc_space
  7. The UPDATE query has RETURNING ... metadata → pgx reads the full merged metadata back from the wire → another iobufpool allocation

Steps 1, 2, 5, 6 and 7 are untouched. The iobufpool sync.Pool retains peak-allocated buffers - once it allocates 17 MB for the first read it never shrinks, since the job keeps reusing the pool every 2 minutes.

Even after MaxTotalBytes trims the log down to 8 MB, that's still a large blob being round-tripped through pgx on every cycle. And the pool buffers sized for the original 17 MB read stick around.

Full pprof output

v0.28.0 (old)

    Type: inuse_space - 224.22MB total
         112MB 49.95%  iobufpool.init.0.func1
       58.53MB 26.10%  pgtype.encodePlanJSONCodecEitherFormatByteSlice.Encode
          16MB  7.14%  bytes.growSlice
       15.98MB  7.13%  encoding/json.Marshal

    Type: alloc_space - 3300.24MB total
      337.80MB 10.24%  pgtype.scanPlanAnyTextToBytes.Scan
         304MB  9.21%  iobufpool.init.0.func1
      225.25MB  6.83%  bytes.growSlice
      225.13MB  6.82%  encoding/json.Marshal
      192.44MB  5.83%  pgtype.encodePlanJSONCodecEitherFormatByteSlice.Encode
      132.23MB  4.01%  json.(*decodeState).literalStore
      114.70MB  3.48%  json.unquoteBytes
      111.81MB  3.39%  sjson.appendRawPaths
      111.80MB  3.39%  pgproto3.(*Bind).Encode

PR #1157 (new)

    Type: inuse_space - 219.09MB total
         128MB 58.42%  iobufpool.init.0.func1
       66.48MB 30.35%  pgtype.encodePlanJSONCodecEitherFormatByteSlice.Encode
           8MB  3.65%  riverlog.buildAppendedArray
        0.51MB  0.23%  bytes.growSlice

    Type: alloc_space - 3412MB total
      371.58MB 10.89%  pgtype.scanPlanAnyTextToBytes.Scan
         336MB  9.85%  iobufpool.init.0.func1
      180.88MB  5.30%  pgtype.encodePlanJSONCodecEitherFormatByteSlice.Encode
      128.79MB  3.77%  gjson.ParseBytes
      128.79MB  3.77%  gjson.getBytes
      124.17MB  3.64%  bytes.growSlice
      113.28MB  3.32%  encoding/json.Marshal
         112MB  3.28%  riverlog.buildAppendedArray
      111.78MB  3.28%  sjson.appendRawPaths
      111.76MB  3.28%  pgproto3.(*Bind).Encode

We have currently applied a workaround for our case (which is less logging and running less frequently), so this is not a blocker. But it would be nice to see the memory amplification being resolved eventually.

@bertpersyn-nxh

Thanks @Sroose for the detailed output :)

@bgentry and @brandur, thanks for the quick response and fix.

For users of the riverqueue Go SDK that do not want to use the logs at runtime (opt-out?), are there any Postgres/jsonb strategies that append (concatenate) and trim logs server-side when updating the river_job row? This could essentially remove the whole memory issue on the client side.

A (quick and non-production-ready) example:

-- 1. Example table
CREATE TEMP TABLE job_monitor (
    id serial PRIMARY KEY,
    metadata jsonb DEFAULT '{"logs": []}'::jsonb
);

-- 2. Seed data
INSERT INTO job_monitor (metadata)
VALUES ('{"status": "running", "logs": [{"m": "Start process", "t": 1700000000}]}');

-- 3. "Atomic Append & Rotate" update
UPDATE job_monitor
SET metadata = jsonb_set(
    metadata,
    '{logs}',
    (
        SELECT jsonb_agg(elem ORDER BY idx)  -- preserve chronological order (oldest -> newest) in stored array
        FROM (
            SELECT
                elem,
                idx,
                sum(length(elem::text)) OVER (ORDER BY idx DESC) AS total_bytes
            FROM unnest(
                array_append(
                    ARRAY(SELECT jsonb_array_elements(metadata->'logs')),
                    jsonb_build_object('m', 'Some text and some other text and some more text' || random(), 't', CURRENT_TIMESTAMP)::jsonb
                )
            ) WITH ORDINALITY AS t(elem, idx)
        ) sub
        WHERE total_bytes < 500
    )
)
WHERE id = 1;

SELECT
    pg_column_size(metadata) AS storage_size_bytes,
    jsonb_pretty(metadata) AS content
FROM job_monitor;

Execute step 3 a couple of times and the logs should rotate, keeping the stored array at roughly 500 bytes of data.

@bgentry
Contributor Author

bgentry commented Feb 25, 2026

@Sroose @bertpersyn-nxh I made another optimization in #1159 that lowers memory overhead for any snooze metadata updates when the metadata payload is large.

Beyond this, I'm not sure how much we can improve things without a significant re-architecture where we either don't store all these things in metadata or at least don't have to read/write them fully on every job update where some fields need adjustment. Most of the remaining overhead is in pgx itself from the sheer cost of writing those byte slices. Let me know if you have other ideas that seem workable!

4 participants