Conversation
…eRow Add get_metadata to avoid full payload reads for HEAD requests, tombstone checks, and delete operations. BigTable override uses ColumnQualifierRegexFilter, GCS skips the alt=media download. Add delete_and_detect using BigTable's CheckAndMutateRow to atomically delete and detect tombstones in a single RPC. Co-Authored-By: Claude <noreply@anthropic.com>
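A minimal sketch of how a metadata-only read could sit next to the full read on the `Backend` trait; the `Object`/`ObjectMetadata` shapes and signatures below are illustrative assumptions, not the repo's actual definitions:

```rust
use async_trait::async_trait;

/// Hypothetical shapes; the real crate's types will differ.
pub struct ObjectMetadata {
    pub size: u64,
    pub is_redirect_tombstone: bool,
}

pub struct Object {
    pub metadata: ObjectMetadata,
    pub payload: Vec<u8>,
}

#[async_trait]
pub trait Backend {
    /// Full read: metadata plus payload.
    async fn get_object(&self, key: &str) -> anyhow::Result<Option<Object>>;

    /// Metadata-only read for HEAD requests, tombstone checks, and
    /// deletes, so backends can skip transferring the payload
    /// (column-level read on BigTable, no `alt=media` download on GCS).
    async fn get_metadata(&self, key: &str) -> anyhow::Result<Option<ObjectMetadata>>;
}
```

A default `get_metadata` could fall back to `get_object` and drop the payload, letting each backend override it with a cheaper read.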
The BigTable `delete_and_detect` used a payload-column predicate (`ColumnQualifierRegexFilter(^p$)`) to distinguish tombstones from real objects. Since `put_row` always writes a `p` cell — even for tombstones with empty bytes — the predicate matched tombstones as real objects, causing the service layer to skip the long-term delete and orphan data in GCS. Switch to a Chain filter on the metadata column value to detect `is_redirect_tombstone:true`. Also rename `delete_and_detect` to `delete_and_check_tombstone` with a binary `TombstoneCheckResponse`, restore TTI bump in `get_metadata`, and add an integration test with BigTable + GCS that verifies no orphans after delete. Co-Authored-By: Claude <noreply@anthropic.com>
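A sketch of the kind of Chain predicate this describes, assuming the generated `google.bigtable.v2` bindings are in scope as `v2` and that the metadata column qualifier is `m` (both assumptions):

```rust
/// Sketch: predicate that matches only rows whose metadata cell carries
/// the tombstone marker (column qualifier "m" is assumed).
fn tombstone_predicate() -> v2::RowFilter {
    v2::RowFilter {
        filter: Some(v2::row_filter::Filter::Chain(v2::row_filter::Chain {
            filters: vec![
                // Restrict the check to the metadata column...
                v2::RowFilter {
                    filter: Some(v2::row_filter::Filter::ColumnQualifierRegexFilter(
                        b"m".to_vec(),
                    )),
                },
                // ...and require the tombstone marker in its value.
                v2::RowFilter {
                    filter: Some(v2::row_filter::Filter::ValueRegexFilter(
                        b".*\"is_redirect_tombstone\":true.*".to_vec(),
                    )),
                },
            ],
        })),
    }
}
```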
Move TTI bumping into shared helper methods so both `get_object` and `get_metadata` consistently extend idle time. BigTable skips the bump for now since it requires rewriting the full row including payload. Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace three duplicated retry loops with a single generic with_retry function. Unifies metrics and log levels across read, mutate, and check_and_mutate operations. Co-Authored-By: Claude <noreply@anthropic.com>
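A minimal sketch of what such a generic retry helper can look like; the attempt count, backoff, and logging below are assumptions, not the crate's actual implementation:

```rust
use std::time::Duration;

/// Retry an async operation a fixed number of times with linear backoff.
/// `op_name` exists only so metrics and warn-level logs can be tagged in
/// one place for read, mutate, and check_and_mutate alike.
async fn with_retry<T, E, Fut, F>(op_name: &str, max_attempts: u32, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    let mut attempt = 0;
    loop {
        attempt += 1;
        match op().await {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_attempts => {
                eprintln!("{op_name}: attempt {attempt} failed: {err}, retrying");
                tokio::time::sleep(Duration::from_millis(100 * attempt as u64)).await;
            }
            Err(err) => return Err(err),
        }
    }
}
```

Call sites then become something like `with_retry("read_row", 3, || client.read_row(request.clone())).await`, so all three operations share one loop.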
Cursor Bugbot has reviewed your changes and found 3 potential issues.
```rust
filter: Some(v2::row_filter::Filter::ValueRegexFilter(
    // RE2 full-match: .* anchors required since the
    // regex must match the entire cell value.
    b".*\"is_redirect_tombstone\":true.*".to_vec(),
)),
```
Insane edge case, but somebody could store custom metadata that contains this text. If we can't deserialize metadata in these filters, then this field name should include characters that we don't allow in custom metadata keys/values, or we should put our name in the string or something.
Excellent catch!
A better way would be to use protobuf instead of JSON, because we could then read fields inside the column directly - I believe this is even possible for predicates. However, that would require a migration at this point, and in the meantime we need a solution for this anyway. So we'll stick with JSON and try to harden it.
I addressed this now with a somewhat dirty workaround: we lock the tombstone marker to the first position in the serialized metadata, so we can match it at the start of the value.
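A sketch of what that anchoring can look like; the struct, field order, and regex below are illustrative assumptions. If the marker is always serialized as the first JSON field, the predicate can require it at the very start of the cell value instead of anywhere inside it, so user-controlled custom metadata can never spoof it:

```rust
use serde::Serialize;
use std::collections::BTreeMap;

/// Field order matters: serde serializes struct fields in declaration
/// order, so the marker always lands at the start of the serialized
/// metadata (sketch; the real metadata struct differs).
#[derive(Serialize)]
struct StoredMetadata {
    is_redirect_tombstone: bool,
    // User-controlled values follow and can never precede the marker.
    custom: BTreeMap<String, String>,
}

/// Start-anchored predicate value: the cell must begin with the marker.
/// RE2 full-match semantics, so a trailing .* covers the rest.
const TOMBSTONE_VALUE_REGEX: &[u8] = b"\\{\"is_redirect_tombstone\":true.*";
```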
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ent, S3 size Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* origin/main:
  feat(gcs): Introduce retries (#279)
  build(deps): bump cryptography from 46.0.2 to 46.0.5 (#299)
  ref(service): Add metadata API, fix delete orphans, simplify BigTable backend (#298)
  ref(server): Add MeteredBody extractor and wrap_stream util (#293)
  meta(claude): Add default permissions for claude (#297)
  docs(clients): Restructure Rust and Python client READMEs (#294)
  ci: Add working directory to changelog-preview workflow (#295)
  feat(types): Add origin as built-in metadata field (#292)
  fix(metrics): Exclude health check endpoints from request metrics (#290)
  fix(service): Add backend tags to delete timing metric (#291)
  meta(ai): Add AGENTS file (#288)
  feat(killswitches): Add service filtering with x-downstream-service header (#287)
  build(deps): bump time from 0.3.44 to 0.3.47 (#285)
  meta(git): Ignore claude local settings (#286)

# Conflicts:
#   clients/rust/README.md
Add `get_metadata` to the `Backend` trait so that HEAD requests, tombstone checks, and delete operations can avoid full payload reads. BigTable uses `ColumnQualifierRegexFilter` for column-level reads; GCS skips `alt=media`.

Tombstone-safe delete sequence
When deleting an object that lives on the long-term backend, the high-volume backend holds a redirect tombstone. The delete sequence ensures the tombstone is only removed after the long-term object is gone, so data is never orphaned.
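A minimal sketch of that ordering at the service layer; the trait, method names, and return type below are assumptions, not the repo's actual API:

```rust
use async_trait::async_trait;

/// Minimal stand-ins for the two backends (hypothetical API).
#[async_trait]
trait DeleteBackend {
    /// Unconditional delete of the entry.
    async fn delete(&self, key: &str) -> anyhow::Result<()>;
    /// Delete the row only if it is not a redirect tombstone; returns
    /// whether a tombstone was found (and therefore left in place).
    async fn delete_non_tombstone(&self, key: &str) -> anyhow::Result<bool>;
}

/// Tombstone-safe ordering: the tombstone is removed only after the
/// long-term object is gone, so a mid-sequence failure never orphans data.
async fn delete_object(
    key: &str,
    high_volume: &dyn DeleteBackend,
    long_term: &dyn DeleteBackend,
) -> anyhow::Result<()> {
    // 1. Conditional delete on the high-volume backend; a tombstone is
    //    detected and preserved in the same call.
    if high_volume.delete_non_tombstone(key).await? {
        // 2. The object lives on the long-term backend: delete it there first.
        long_term.delete(key).await?;
        // 3. Only now remove the redirect tombstone.
        high_volume.delete(key).await?;
    }
    Ok(())
}
```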
BigTable conditional delete
BigTable implements `delete_non_tombstone` as a single `CheckAndMutateRow` RPC. A regex predicate on the metadata column detects `is_redirect_tombstone:true`:

- Predicate does not match (regular object): `DeleteFromRow` removes the row.
- Predicate matches (redirect tombstone): the row is left in place so the long-term object can be deleted first.

This keeps the common-case delete at 1 RPC while preserving the tombstone for the two-phase long-term cleanup.
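A sketch of the request shape this implies, again with the generated `google.bigtable.v2` bindings in scope as `v2`; the table name, row-key handling, and the `tombstone_predicate()` helper (the Chain filter sketched earlier) are assumptions:

```rust
/// Sketch: conditional delete that preserves redirect tombstones.
fn delete_non_tombstone_request(table_name: &str, row_key: &[u8]) -> v2::CheckAndMutateRowRequest {
    v2::CheckAndMutateRowRequest {
        table_name: table_name.to_string(),
        row_key: row_key.to_vec(),
        // Predicate matches => the row is a redirect tombstone:
        // apply no mutations and leave it in place.
        predicate_filter: Some(tombstone_predicate()),
        true_mutations: vec![],
        // Predicate does not match => regular object: delete the row.
        false_mutations: vec![v2::Mutation {
            mutation: Some(v2::mutation::Mutation::DeleteFromRow(
                v2::mutation::DeleteFromRow {},
            )),
        }],
        ..Default::default()
    }
}
```

The response's `predicate_matched` flag then tells the service layer whether a tombstone was found and the two-phase long-term cleanup still has to run.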
Additional changes
- TTI bumping on `get_object` and `get_metadata` calls. BigTable conditionally fetches the payload column and rewrites the row only when a bump is needed (~1/day due to debounce), keeping the common case at 1 RPC. The bump is best-effort: failures don't fail the read.
- Shared helpers (`RowData`, `read_row`, `with_retry`) in the BigTable backend that eliminate duplicated retry loops, cell parsing, and expiry checking.
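As a rough illustration of the debounced bump decision, here is a sketch with an assumed one-day window and assumed time handling; the real logic may differ:

```rust
use std::time::{Duration, SystemTime};

/// Sketch: only rewrite the row when extending the expiry actually moves
/// it forward by more than the debounce window (assumed to be one day).
fn needs_tti_bump(current_expiry: SystemTime, tti: Duration, now: SystemTime) -> bool {
    const DEBOUNCE: Duration = Duration::from_secs(24 * 60 * 60);
    let desired_expiry = now + tti;
    match desired_expiry.duration_since(current_expiry) {
        // Expiry would move forward by more than a day: worth the rewrite.
        Ok(delta) => delta > DEBOUNCE,
        // Stored expiry is already at or beyond the desired one: no bump.
        Err(_) => false,
    }
}
```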