Add maximum bound for attempted_by array
#974
Conversation
e8c7aec to 55a1f73
        @attempted_by::text
    )
    ELSE array_append(river_job.attempted_by, @attempted_by::text)
END
@bgentry I don't love how gross this got. Feel free to try and take a shot at golfing it down a bit.
It's unfortunate that new entries were appended instead of prepended, because if they had been prepended, we could do this much more nicely, like:
attempted_by = trim_array(
    array_prepend(@attempted_by::text, river_job.attempted_by),
    GREATEST(array_length(river_job.attempted_by, 1) + 1 - @max_attempted_by, 0)
)
Not worth trying to change it at this point though.
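For reference, a standalone sketch of that prepend-plus-trim_array shape, with hypothetical values and a cap of 3 instead of 100 (trim_array requires Postgres 14+, and errors on a negative trim count, hence the GREATEST guard):

-- Prepend the newest client, then trim the oldest entries off the end so
-- at most 3 remain. GREATEST clamps the trim count at 0 for short arrays,
-- since trim_array errors on negative counts.
SELECT trim_array(
    array_prepend('client_4', ARRAY['client_3', 'client_2', 'client_1']),
    GREATEST(cardinality(ARRAY['client_3', 'client_2', 'client_1']) + 1 - 3, 0)
);
-- => {client_4,client_3,client_2}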
We could also use more of a table-based approach like the SQLite version does (sketched below), but I left this as array functions since I suspect they're more performant.
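A minimal sketch of what such a table-based truncation might look like in Postgres, again with hypothetical values and a cap of 3 (an illustration only, not the PR's code):

-- Unnest the array with ordinality, keep only the trailing 3 elements,
-- and reaggregate in the original order.
SELECT array_agg(elem ORDER BY ord)
FROM unnest(ARRAY['a', 'b', 'c', 'd']) WITH ORDINALITY AS t(elem, ord)
WHERE ord > cardinality(ARRAY['a', 'b', 'c', 'd']) - 3;
-- => {b,c,d}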
This might be just a little cleaner than the CASE?
diff --git a/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go b/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go
index df48a3a..9025b95 100644
--- a/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go
+++ b/riverdriver/riverdatabasesql/internal/dbsqlc/river_job.sql.go
@@ -281,18 +281,14 @@ SET
state = 'running',
attempt = river_job.attempt + 1,
attempted_at = coalesce($1::timestamptz, now()),
- attempted_by = CASE WHEN array_length(river_job.attempted_by, 1) >= $2::int
- THEN array_append(
- -- +2 instead of +1 because in one of those history
- -- making mistakes that's likely in aggregate cost
- -- humanity >$10B in bugs and lost productivity by
- -- now, like strings, Postgres array indexing start
- -- at 1 instead of 0.
- river_job.attempted_by[array_length(river_job.attempted_by, 1) + 2 - $2:],
- $3::text
- )
- ELSE array_append(river_job.attempted_by, $3::text)
- END
+ attempted_by = (
+ coalesce(river_job.attempted_by, '{}'::text[])
+ || $2::text
+ -- +2 instead of +1 because in one of those history making mistakes
+ -- that's likely in aggregate cost humanity >$10B in bugs and lost
+ -- productivity by now, like strings, Postgres array indexing start
+ -- at 1 instead of 0.
+ )[ GREATEST(cardinality(river_job.attempted_by) + 2 - $3::int, 1) : ]
FROM
locked_jobs
 WHERE
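As a standalone sanity check of the concat-then-slice shape in that suggestion, with hypothetical values and a cap of 3:

-- Append the new client, then slice off all but the trailing 3 elements.
-- GREATEST clamps the slice's lower bound at 1 because Postgres arrays
-- are 1-indexed.
SELECT (coalesce(ARRAY['a', 'b', 'c'], '{}'::text[]) || 'd'::text)
    [GREATEST(cardinality(ARRAY['a', 'b', 'c']) + 2 - 3, 1):];
-- => {b,c,d}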
I modified the existing one to flatten it out a bit more, but TBH I have trouble reading this new version, so I'm going to largely leave it as the original.
bgentry left a comment:
One idea to improve but otherwise LGTM!
55a1f73 to 78c9baa
As described by #972, it may be possible for huge numbers of snoozes to blow out a job row's `attempted_by` array as a job is locked over and over again. Multiplied by many jobs, this can produce vast quantities of data that gets sent over the network.

Here, put in a ratchet on `attempted_by` so that if the array becomes larger than 100 elements, we knock the oldest one off in favor of the most recent client and the freshest 99.

Unfortunately the implementation isn't particularly clean in either Postgres or SQLite. In Postgres it would've been cleaner if we'd had `attempted_by` in reverse order so the new client was at the front, because the built-in array functions would be friendlier to that layout, but because it's not, we have to do something a little hackier involving a `CASE` statement instead.

SQLite is even worse. SQLite has no array functions at all, which doesn't help, but moreover every strategy I tried ended up blocked by a sqlc SQLite bug, so after trying everything I could think of, I ended up having to extract the piece that does the array truncation into a SQL template string to get this over the line. This could be removed in the future if any one of a number of outstanding sqlc bugs are fixed (e.g. [1]).

[1] sqlc-dev/sqlc#3610
d7d9cc0 to 90ee8dc