riverlog total size cap, perf and allocation optimizations #1157
Conversation
Large `river:log` metadata can drive significant allocation spikes when an attempt appends one more log entry. Before changing behavior, we need an inexpensive way to measure the hot path and compare alternatives. Add focused benchmarks in riverlog that exercise middleware append behavior with prebuilt 256 KB and 2 MB metadata payloads. The benchmark stays in-process (no DB setup), reports allocations, and is skipped when `testing.Short()` is set so it remains safe for quick CI-oriented runs.
Snoozed jobs can run repeatedly without consuming attempts, so `river:log` entries may grow without bound over long-running jobs. Large metadata payloads then amplify memory use on each future append. Add a new `MiddlewareConfig` field `MaxTotalBytes` to bound the total serialized `river:log` payload. When the new attempt is appended and the result exceeds the cap, the oldest entries are dropped first while always retaining the newest entry. The middleware now defaults `MaxTotalBytes` to 8 MB. New tests verify that the oldest entries are pruned when the cap is reached, that a tiny cap still retains the newest entry, and that the default is applied.
Appending a new `river:log` entry previously decoded the whole historical array into `[]logAttempt` and then re-encoded it. For large metadata this created substantial transient allocations for strings and slices. Switch the middleware internals to decode `river:log` as `[]json.RawMessage`. Each run now marshals only the newest entry and re-serializes the raw entry array while preserving the existing total-cap pruning behavior. This removes per-entry struct decoding from the append path.
The raw-message iteration still rebuilt log arrays through JSON marshaling, which left append performance significantly worse than the baseline implementation used on master.

Switch riverlog history to gjson/sjson so operations stay on raw JSON arrays. The middleware reads only `river:log` with `gjson`, appends with `sjson.SetRawBytes` using `-1`, and enforces `MaxTotalBytes` by deleting oldest entries with `sjson.DeleteBytes("0")`.
Comparison for `BenchmarkMiddlewareWorkAppend` (`-benchtime=100ms`):

case: 256 KB

impl        ns/op     B/op      allocs/op
baseline      976052    645856  291
rawmessage   1605106    698233  293
sjson         614433   1167265   25

case: 2 MB

impl        ns/op      B/op      allocs/op
baseline      7762435   5920954  2096
rawmessage   12516198   6651551  2098
sjson         4485274   6482082    32
The pruning test changed only to avoid a brittle exact-byte boundary. `MaxTotalBytes` is now set slightly above the two-entry payload so the test consistently verifies the intended behavior: appending a third entry drops the oldest and keeps the newest two.
Appending to large `river:log` histories still did extra work in the sjson version: it built an appended array first, then repeatedly deleted index 0 while over cap. In high-volume histories this caused multiple full-array rewrites before reaching the final payload.

Rework append/prune to compute the kept suffix first and then build the final array once. The new flow marshals only the new entry, derives bounds for existing array elements, decides how many oldest entries to drop to fit `MaxTotalBytes`, and emits a single final JSON array. It also truncates log bytes before converting to string so oversized log buffers are not copied in full.

Benchmark comparison for `BenchmarkMiddlewareWorkAppend` (`-benchtime=100ms`, same machine):

case: 256 KB

impl    ns/op    B/op      allocs/op
before   821178  1167325   25
after    177355   884454   31

case: 2 MB

impl    ns/op     B/op      allocs/op
before  4854185   6479606   29
after   1247496   7006818   46

Added focused unit coverage for `appendLogDataWithCap`, including missing-key behavior, non-array metadata errors, minimal oldest-first pruning, keeping the newest entry when over cap, and no-cap behavior.
brandur left a comment
Looks great man! Really nice to see the implementation get swapped out like this for something more optimal while the API stays 99% the same and is non-breaking.
### Changed

- `riverlog.Middleware` now supports `MiddlewareConfig.MaxTotalBytes` (default 8 MB) to cap total persisted `river:log` history per job. When the cap is exceeded, oldest log entries are dropped first while retaining the newest entry. Values over 64 MB are clamped to 64 MB. [PR #1157](https://github.com/riverqueue/river/pull/1157).
@brandur I should have called this out earlier, does this 8MB default seem reasonable to you? Do you feel it should be called out as "breaking" or is "changed" sufficient?
We can tweak it in a follow up before release if needed.
Yeah, I saw it in there so I can't complain. I could see it potentially on the low side, but it really depends on the use case. How did you pick 8 MB anyway?
Just re-reading this, we might want to update this comment on `MaxSizeBytes` and call out that you'll likely hit `MaxTotalBytes` way before you hit something like 50 MB:
// Be careful with this number because the maximum total log size is equal
// to maximum number of attempts multiplied by this number (each attempt's
// logs are kept separately). For example, 25 * 2 MB = 50 MB maximum
// theoretical log size. Log data goes into metadata which is a JSONB field,
// and JSONB fields have a maximum size of 255 MB, so any number larger than
// 255 divided by maximum number of attempts may cause serious operational
// problems.
Hi, thank you for the very quick response! We tested the changes from this PR against our real-world case (a long-lived snoozed job with ~6000 log entries, ~17 MB).

Numbers

We ran inuse_space and alloc_space profiles. alloc_space: 3,300 MB (old) vs 3,412 MB (new), so similar with the fix. More importantly, the two dominant allocators are unchanged.

Why

The PR optimises step 3 of what's essentially a 7-step amplification chain per snooze cycle; steps 1, 2, 5, 6 and 7 are untouched.

Full pprof output was captured for both v0.28.0 (old) and PR #1157 (new).

We have currently applied a workaround for our case (less logging and running less frequently), so this is not a blocker. But it would be nice to see the memory amplification resolved eventually.
Thanks @Sroose for the detailed output :) @bgentry and @brandur, thanks for the quick response and fix. For users of the riverqueue Go SDK that do not want to use the logs at runtime (opt-out?), are there any Postgres/jsonb strategies that append logs (concat) and trim when updating the river_job row? This could essentially remove the whole memory issue on the client side. A (quick and non-production-ready) example: execute step 3 a couple of times and the logs should rotate, so it stays at ~500 bytes of data.
@Sroose @bertpersyn-nxh I made another optimization in #1159 that lowers memory overhead for any snooze metadata updates when the metadata payload is large. Beyond this, I'm not sure how much we can improve things without a significant re-architecture where we either don't store all these things in metadata or at least don't have to read/write them fully on every job update where some fields need adjustment. Most of the remaining overhead is in pgx itself from the sheer cost of writing those byte slices. Let me know if you have other ideas that seem workable!
A user reported high transient memory usage and slow append cycles when using `riverlog` when the total size of its jsonb array grew large (especially in snooze-heavy jobs). It turns out we were doing a lot of highly inefficient stuff in here with lots of low-hanging fruit for optimization. To start, I added a benchmark to validate the perf troubles and potential solutions.

First, the biggest user-facing change is the new total history cap: `MiddlewareConfig.MaxTotalBytes` (default 8 MB). When appending a new `river:log` entry would exceed that limit, the middleware now prunes oldest entries first while always keeping the newest entry. That directly addresses unbounded log-history growth for long-lived/snoozed jobs.

Internally, the implementation was first refactored from its original struct-unmarshal/append logic to `json.RawMessage`; however, this turned out to still be quite slow and allocation-heavy, so I did another pass leveraging `gjson`/`sjson`, which was far better (results below). A final round of optimization improved this significantly more using a one-pass capped-array builder which marshals only the new entry, computes exactly how much of the existing suffix can be kept, and emits the final JSON array once instead of append-then-repeated-delete loops. It also truncates log bytes before converting to string, so oversized buffers are not copied in full first.

Coverage was expanded with focused unit tests around append/prune semantics (missing key, non-array metadata, minimal pruning, newest-entry retention when over cap, and no-cap behavior).

In the end, we dramatically reduced the duration of these operations as well as the number of allocations, at the cost of a small increase in the total memory overhead per operation.