brandur commented on Jul 5, 2025:

As described in #972, it may be possible for huge numbers of snoozes to
blow out a job row's `attempted_by` array as a job is locked over and
over again. Multiplied across many jobs, this can produce vast
quantities of data sent over the network.

Here, put a ratchet on `attempted_by` so that if the array becomes
larger than 100 elements, we knock the oldest entry off in favor of the
most recent client and the 99 freshest existing entries.

Unfortunately, the implementation isn't particularly clean in either
Postgres or SQLite. In Postgres it would've been cleaner if
`attempted_by` were stored in reverse order with the newest client at
the front, because the built-in array functions are friendlier to that
layout. Since it's not, we have to do something a little hackier
involving a `CASE` statement instead.

SQLite is even worse. It has no array functions at all, which doesn't
help, but moreover every strategy I tried ended up blocked by a sqlc
SQLite bug. After trying everything I could think of, I ended up having
to extract the piece that does the array truncation into a SQL template
string to get this over the line. This workaround could be removed in
the future if any one of a number of outstanding sqlc bugs is fixed
(e.g. [1]).

[1] sqlc-dev/sqlc#3610
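
To give a flavor of the truncation piece: assuming `attempted_by` is stored as a JSON array in SQLite, the templated fragment might look roughly like the sketch below. This is an illustration of the general approach, not the PR's literal SQL:

    -- append the new client, then keep only the trailing @max_attempted_by entries
    attempted_by = (
        SELECT json_group_array(value)
        FROM (
            SELECT key, value
            FROM json_each(json_insert(river_job.attempted_by, '$[#]', @attempted_by))
            ORDER BY key
            LIMIT @max_attempted_by
            OFFSET max(json_array_length(river_job.attempted_by) + 1 - @max_attempted_by, 0)
        )
    )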

brandur force-pushed the brandur-max-attempted-by-length branch 10 times, most recently from e8c7aec to 55a1f73 on July 5, 2025 at 21:03.

    attempted_by = CASE WHEN array_length(river_job.attempted_by, 1) >= @max_attempted_by::int
                        THEN array_append(
                            river_job.attempted_by[array_length(river_job.attempted_by, 1) + 2 - @max_attempted_by:],
                            @attempted_by::text
                        )
                        ELSE array_append(river_job.attempted_by, @attempted_by::text)
                        END

brandur (author) commented:
@bgentry I don't love how gross this got. Feel free to take a shot at golfing it down a bit.

It's unfortunate that new entries were appended instead of prepended, because if they were prepended, we could do this much more nicely, like:

    attempted_by = trim_array(
        array_prepend(@attempted_by::text, river_job.attempted_by),
        -- clamp at zero since trim_array rejects a negative trim count
        greatest(array_length(river_job.attempted_by, 1) + 1 - @max_attempted_by, 0)
    )

Not worth trying to change it at this point though.

We could also use more of a table-based approach like the SQLite version, but I left this as array functions since I suspect they're more performant.
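
For illustration only (this isn't code from the PR), a table-based Postgres version might unnest the array, keep the newest entries, and re-aggregate:

    attempted_by = (
        SELECT array_agg(client ORDER BY ord)
        FROM unnest(
            array_append(coalesce(river_job.attempted_by, '{}'::text[]), @attempted_by::text)
        ) WITH ORDINALITY AS t(client, ord)
        -- keep only the trailing @max_attempted_by entries
        WHERE ord > cardinality(coalesce(river_job.attempted_by, '{}'::text[])) + 1 - @max_attempted_by::int
    )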

bgentry commented:
This might be just a little cleaner than the CASE?

diff --git a/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go b/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go
index df48a3a..9025b95 100644
--- a/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go
+++ b/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go
@@ -281,18 +281,14 @@ SET
     state = 'running',
     attempt = river_job.attempt + 1,
     attempted_at = coalesce($1::timestamptz, now()),
-    attempted_by = CASE WHEN array_length(river_job.attempted_by, 1) >= $2::int
-                        THEN array_append(
-                            -- +2 instead of +1 because in one of those history
-                            -- making mistakes that's likely in aggregate cost
-                            -- humanity >$10B in bugs and lost productivity by
-                            -- now, like strings, Postgres array indexing start
-                            -- at 1 instead of 0.
-                            river_job.attempted_by[array_length(river_job.attempted_by, 1) + 2 - $2:],
-                            $3::text
-                        )
-                        ELSE array_append(river_job.attempted_by, $3::text)
-                        END
+    attempted_by = (
+        coalesce(river_job.attempted_by, '{}'::text[])
+        || $2::text
+        -- +2 instead of +1 because, in one of those history-making mistakes
+        -- that's likely cost humanity >$10B in bugs and lost productivity by
+        -- now, Postgres array indexing, like string indexing, starts at 1
+        -- instead of 0.
+    )[ GREATEST(cardinality(river_job.attempted_by) + 2 - $3::int, 1) : ]
 FROM
     locked_jobs
 WHERE
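
To make the slice arithmetic concrete: with the cap $3 = 100 and an array already at 100 entries, the appended array has 101 elements and the lower bound is GREATEST(100 + 2 - 100, 1) = 2, so the slice keeps elements 2 through 101 and the oldest falls off. With only 5 entries, the bound clamps to 1 and the whole appended array is kept.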

brandur (author) commented:

I modified the existing one to flatten it out a bit more, but TBH I have trouble reading this new version, so I'm going to largely leave it as the original.

brandur requested a review from bgentry on July 5, 2025 at 21:15.
bgentry left a review:
One idea to improve but otherwise LGTM!

brandur force-pushed the brandur-max-attempted-by-length branch from 55a1f73 to 78c9baa on July 7, 2025 at 23:06.
brandur force-pushed the brandur-max-attempted-by-length branch from d7d9cc0 to 90ee8dc on July 7, 2025 at 23:10.
brandur merged commit bec6982 into master on July 7, 2025; 10 checks passed.
brandur deleted the brandur-max-attempted-by-length branch on July 7, 2025 at 23:15.