Skip to content

Conversation

@supervacuus
Copy link
Collaborator

@supervacuus supervacuus commented Jan 19, 2026

📜 Description

This implements the second aspect of the tombstone integration: merging tombstone events with crash events coming from sentry-native. It does this in the following way:

  • No changes to the OutboxSender, but instead let TombstoneIntegration own the outbox folder before the OutboxSender operates on it (i.e., we rely on integration instantiation order and the fact that all async integration work is done on a single-threaded executor service).
  • NativeEventCollector implements the duplication wrt OutboxSender, but specifically for handling envelopes that contain native crashes
  • The upside is that we don't have to touch non-Android-specific code without making it more generic. If it turns out we want to extend OutboxSender later, that change wouldn't be a considerable effort.
  • Introduces a correlation id that allows us to map exactly between native SDK events and tombstones (vs correlating via timestamp). This requires changes to sentry-native, so for now it's only exposed but not used.
  • TombstonePolicy searches, matches, and merges events
  • merging is essentially: take the native SDK event and replace debugMeta, exception, and threads
  • Right now, these events will go through the same enrichment as the tombstones, which might or might not be the right choice

This is primarily an end-to-end implementation to discuss the trade-offs listed.

💡 Motivation and Context

This is the last essential step before releasing the tombstone integration to a wider audience. It maps directly to
Implement merge mechanism with the Native SDK envelope

💚 How did you test it?

  • only manually up to now
  • Once the direction is agreed on, will add unit+integration tests

📝 Checklist

  • I added GH Issue ID & Linear ID
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

  • decide on this approach or extending OutboxSender and surrounding infra more generically (we could introduce a callback for the core of the OutboxSender decision and set up integrations that use it accordingly)
  • decide merging strategy
  • decide on enrichment (maybe we should use a more default EventProcessor for merge events?)
  • Related to the above:
    • Should correlation only be attempted for the latest tombstone?
    • How should historical tombstones be matched? (Since we have all the actual data from the moment of the event in the native envelope, they will likely be more accurate for that moment in time vs backfilling, but we don't track the same amount of context information)
  • Add correlation id to the Native SDK

@github-actions
Copy link
Contributor

github-actions bot commented Jan 19, 2026

Semver Impact of This PR

🟡 Minor (new features)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (distribution) Add install_groups support by runningcode in #5062
  • Merge tombstone and native sdk events by supervacuus in #5037

Bug Fixes 🐛

  • Establish native exception mechanisms by supervacuus in #5052

Internal Changes 🔧

Deps

  • Bump urllib3 from 2.6.0 to 2.6.3 in the pip group across 1 directory by dependabot in #5003
  • Update Native SDK to v0.12.4 by github-actions in #5061
  • Bump getsentry/github-workflows/.github/workflows/updater.yml from 2 to 3 by dependabot in #4884
  • Bump actions/cache from 4 to 5 by dependabot in #4997
  • Bump github/codeql-action from 4.31.10 to 4.31.11 by dependabot in #5057
  • Bump getsentry/craft from 2.19.0 to 2.20.0 by dependabot in #5058

Other

  • (android) Update targetSdk to API 36 (Android 16) by markushi in #5016
  • (ci) Write permission for statuses in changelog preview by supervacuus in #5053
  • (samples) Convert main screen to Jetpack Compose by markushi in #5017

🤖 This preview updates automatically when you update the PR.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 19, 2026

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against e6f6788

@supervacuus
Copy link
Collaborator Author

@markushi and @romtsn, can you have a first look so we can discuss trade-offs? Thx!

@github-actions
Copy link
Contributor

github-actions bot commented Jan 19, 2026

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 312.68 ms 359.04 ms 46.36 ms
Size 1.58 MiB 2.19 MiB 623.75 KiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
fc5ccaf 322.49 ms 405.25 ms 82.76 ms
e59e22a 329.74 ms 383.31 ms 53.57 ms
ae7fed0 293.84 ms 380.22 ms 86.38 ms
fcec2f2 314.96 ms 373.66 ms 58.70 ms
dba088c 320.59 ms 361.29 ms 40.70 ms
b03edbb 314.90 ms 350.22 ms 35.33 ms
ee747ae 358.21 ms 389.41 ms 31.20 ms
b6702b0 395.86 ms 409.98 ms 14.12 ms
fcec2f2 328.91 ms 387.75 ms 58.84 ms
dcc6bbf 382.58 ms 462.13 ms 79.54 ms

App size

Revision Plain With Sentry Diff
fc5ccaf 1.58 MiB 2.13 MiB 557.54 KiB
e59e22a 1.58 MiB 2.20 MiB 635.34 KiB
ae7fed0 1.58 MiB 2.12 MiB 551.77 KiB
fcec2f2 1.58 MiB 2.12 MiB 551.50 KiB
dba088c 1.58 MiB 2.13 MiB 558.99 KiB
b03edbb 1.58 MiB 2.13 MiB 557.32 KiB
ee747ae 1.58 MiB 2.10 MiB 530.95 KiB
b6702b0 1.58 MiB 2.12 MiB 551.79 KiB
fcec2f2 1.58 MiB 2.12 MiB 551.50 KiB
dcc6bbf 1.58 MiB 2.12 MiB 553.10 KiB

Previous results on branch: feat/tombstone_native_sdk_merge

Startup times

Revision Plain With Sentry Diff
d091211 325.47 ms 371.58 ms 46.11 ms
6ad3f8e 313.98 ms 362.27 ms 48.29 ms
14ebc28 320.09 ms 398.66 ms 78.57 ms
dc79daa 333.69 ms 382.49 ms 48.80 ms
7aef08b 286.72 ms 361.82 ms 75.09 ms

App size

Revision Plain With Sentry Diff
d091211 1.58 MiB 2.19 MiB 622.38 KiB
6ad3f8e 1.58 MiB 2.19 MiB 622.25 KiB
14ebc28 1.58 MiB 2.20 MiB 637.07 KiB
dc79daa 1.58 MiB 2.19 MiB 622.86 KiB
7aef08b 1.58 MiB 2.19 MiB 622.25 KiB

Copy link
Member

@markushi markushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking quite promising already, I think this is the right track. Left a few comments for further discussion.

*/
private static void setNativeCrashCorrelationId(
final @NotNull Context context, final @NotNull SentryAndroidOptions options) {
final String correlationId = UUID.randomUUID().toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As anyone could call this API to store information, it might be a good idea to ensure it's our data we're reading back later. I guess a simple prefix could work e.g. sentry-app-run-<uuid>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, and this is only meant as a proposal that survived from 2022:

  • processStateSummary is global and generally owned by the app
  • If we write into this, we should first check if there is already something in there
  • Even if nothing was in there, any later code can overwrite it without us knowing anything about it
  • The maximum length is 128 bytes, and writing into it is free-form, so there is no way for binary stability except if we communicate that Sentry owns this if you use the SDK (not sure if that is a good idea).
  • If we prefix it with something generic like sentry-app-run-, it shows that this might be used beyond native crash correlation (it will be accessible from any ApplicationExitInfo).

I implemented this because we considered it a stable correlation mechanism back then, but I think timestamp correlation is stable enough and doesn't depend on app-owned storage.

It's likely a good idea to drop this altogether, and if we introduce it, maybe together with other ApplicationExitInfo handlers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 fully agree, I think the timestamp correlation is already good enough!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in b832e7c

.getLogger()
.log(SentryLevel.DEBUG, "Scanning %d files in outbox for native events.", files.length);

for (final File file : files) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried of ending up with too many envelopes in memory here and potentially running into OOMs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me which code in particular is likely to cause references to spill? I'm not entirely sure. My assumption was that there would typically be very few native events (on average, 1 after each native crash), since they are usually sent on the next restart (and we only collect native events and their envelopes). So I would guess that, while iterating over and deserializing all envelopes, each iteration should leave the discarded ones ready for GC, right?

My bigger worry was the attachments that we pass around with the envelopes (yes, they are currently not attached to the tombstone events). I am fine with not passing around the envelope for the event, but we must ensure it is loaded at some point. Was this your concern? We could store only the envelope path in NativeEventData and only load it once we have a matching tombstone. Or did I completely misunderstand?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see we extractNativeEventFromFile() parses any envelope, so there could be transactions, or multiple crashes, e.g. if the device was offline. Or e.g. the app continuously crashes (in native code) during startup, a fix get released and now all unset envelopes pile up.

So I would just be a bit more defensive here, e.g. drop transactions, and have a sane maximum for nativeEvents.size()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I would just be a bit more defensive here, e.g. drop transactions, and have a sane maximum for nativeEvents.size()

I would love to be defensive; I would like to discard envelopes based solely on the filename (or other metadata), rather than having to read them. But even with reading them i am not sure about the concrete consequences of your proposal:

  • In extractNativeEventFromFile(), when iterating over the envelope items, we drop everything that isn't an event; isn't that what you mean by "drop transactions"?
    for (final SentryEnvelopeItem item : envelope.getItems()) {
    if (!SentryItemType.Event.equals(item.getHeader().getType())) {
    continue;
    }
  • RE: sane maximum:
    • If we limit the native events that we collect for matching, should the other events be discarded (since we assume sending a pure tombstone in their place) or allow a duplicate?
    • How do we know which events are the most significant/recent? Should we rely on disk timestamps?
    • We would, in any case, read more envelopes at least until decoding and reaching the maximum collection size, than we actually collect. So if heap pressure during parsing/decoding until those pieces are GCed is the worry, then the collection threshold is still happening too late.
    • If otoh, passing around the event + envelope is the worry, then we could reduce the NativeEventData to just timestamp and File and re-read the data only once there is an actual match.

The true cost is, of course, that we have to read, decode all envelopes, and deserialize every event, to verify that it is a native event and read its timestamp.

Copy link
Collaborator Author

@supervacuus supervacuus Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The true cost is, of course, that we have to read, decode all envelopes, and deserialize every event, to verify that it is a native event and read its timestamp.

To be clear: We can make this part of the NativeEventCollector stream the envelopes (we only need item headers to skip anything but events) and the event (using a JsonObjectReader on an InputStream that is bound by the payload size), avoiding loading the entire envelope and hydrating the full object graph of each event until we actually have a match with a tombstone (at which point we would just use the current implementation extractNativeEventFromFile()).

I can't work on this today, but tomorrow you can have a basic implementation. That would eliminate the worry about loading a bunch of items into memory that we will never use.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants