-
-
Notifications
You must be signed in to change notification settings - Fork 465
feat: merge tombstone and native sdk events #5037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Semver Impact of This PR🟡 Minor (new features) 📋 Changelog PreviewThis is how your changes will appear in the changelog. New Features ✨
Bug Fixes 🐛
Internal Changes 🔧Deps
Other
🤖 This preview updates automatically when you update the PR. |
|
Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| fc5ccaf | 322.49 ms | 405.25 ms | 82.76 ms |
| e59e22a | 329.74 ms | 383.31 ms | 53.57 ms |
| ae7fed0 | 293.84 ms | 380.22 ms | 86.38 ms |
| fcec2f2 | 314.96 ms | 373.66 ms | 58.70 ms |
| dba088c | 320.59 ms | 361.29 ms | 40.70 ms |
| b03edbb | 314.90 ms | 350.22 ms | 35.33 ms |
| ee747ae | 358.21 ms | 389.41 ms | 31.20 ms |
| b6702b0 | 395.86 ms | 409.98 ms | 14.12 ms |
| fcec2f2 | 328.91 ms | 387.75 ms | 58.84 ms |
| dcc6bbf | 382.58 ms | 462.13 ms | 79.54 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| fc5ccaf | 1.58 MiB | 2.13 MiB | 557.54 KiB |
| e59e22a | 1.58 MiB | 2.20 MiB | 635.34 KiB |
| ae7fed0 | 1.58 MiB | 2.12 MiB | 551.77 KiB |
| fcec2f2 | 1.58 MiB | 2.12 MiB | 551.50 KiB |
| dba088c | 1.58 MiB | 2.13 MiB | 558.99 KiB |
| b03edbb | 1.58 MiB | 2.13 MiB | 557.32 KiB |
| ee747ae | 1.58 MiB | 2.10 MiB | 530.95 KiB |
| b6702b0 | 1.58 MiB | 2.12 MiB | 551.79 KiB |
| fcec2f2 | 1.58 MiB | 2.12 MiB | 551.50 KiB |
| dcc6bbf | 1.58 MiB | 2.12 MiB | 553.10 KiB |
Previous results on branch: feat/tombstone_native_sdk_merge
Startup times
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| d091211 | 325.47 ms | 371.58 ms | 46.11 ms |
| 6ad3f8e | 313.98 ms | 362.27 ms | 48.29 ms |
| 14ebc28 | 320.09 ms | 398.66 ms | 78.57 ms |
| dc79daa | 333.69 ms | 382.49 ms | 48.80 ms |
| 7aef08b | 286.72 ms | 361.82 ms | 75.09 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| d091211 | 1.58 MiB | 2.19 MiB | 622.38 KiB |
| 6ad3f8e | 1.58 MiB | 2.19 MiB | 622.25 KiB |
| 14ebc28 | 1.58 MiB | 2.20 MiB | 637.07 KiB |
| dc79daa | 1.58 MiB | 2.19 MiB | 622.86 KiB |
| 7aef08b | 1.58 MiB | 2.19 MiB | 622.25 KiB |
markushi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking quite promising already, I think this is the right track. Left a few comments for further discussion.
| */ | ||
| private static void setNativeCrashCorrelationId( | ||
| final @NotNull Context context, final @NotNull SentryAndroidOptions options) { | ||
| final String correlationId = UUID.randomUUID().toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As anyone could call this API to store information, it might be a good idea to ensure it's our data we're reading back later. I guess a simple prefix could work e.g. sentry-app-run-<uuid>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, and this is only meant as a proposal that survived from 2022:
processStateSummaryis global and generally owned by the app- If we write into this, we should first check if there is already something in there
- Even if nothing was in there, any later code can overwrite it without us knowing anything about it
- The maximum length is 128 bytes, and writing into it is free-form, so there is no way for binary stability except if we communicate that Sentry owns this if you use the SDK (not sure if that is a good idea).
- If we prefix it with something generic like
sentry-app-run-, it shows that this might be used beyond native crash correlation (it will be accessible from anyApplicationExitInfo).
I implemented this because we considered it a stable correlation mechanism back then, but I think timestamp correlation is stable enough and doesn't depend on app-owned storage.
It's likely a good idea to drop this altogether, and if we introduce it, maybe together with other ApplicationExitInfo handlers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 fully agree, I think the timestamp correlation is already good enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in b832e7c
sentry-android-core/src/main/java/io/sentry/android/core/TombstoneIntegration.java
Outdated
Show resolved
Hide resolved
| .getLogger() | ||
| .log(SentryLevel.DEBUG, "Scanning %d files in outbox for native events.", files.length); | ||
|
|
||
| for (final File file : files) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried of ending up with too many envelopes in memory here and potentially running into OOMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me which code in particular is likely to cause references to spill? I'm not entirely sure. My assumption was that there would typically be very few native events (on average, 1 after each native crash), since they are usually sent on the next restart (and we only collect native events and their envelopes). So I would guess that, while iterating over and deserializing all envelopes, each iteration should leave the discarded ones ready for GC, right?
My bigger worry was the attachments that we pass around with the envelopes (yes, they are currently not attached to the tombstone events). I am fine with not passing around the envelope for the event, but we must ensure it is loaded at some point. Was this your concern? We could store only the envelope path in NativeEventData and only load it once we have a matching tombstone. Or did I completely misunderstand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see we extractNativeEventFromFile() parses any envelope, so there could be transactions, or multiple crashes, e.g. if the device was offline. Or e.g. the app continuously crashes (in native code) during startup, a fix get released and now all unset envelopes pile up.
So I would just be a bit more defensive here, e.g. drop transactions, and have a sane maximum for nativeEvents.size()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I would just be a bit more defensive here, e.g. drop transactions, and have a sane maximum for
nativeEvents.size()
I would love to be defensive; I would like to discard envelopes based solely on the filename (or other metadata), rather than having to read them. But even with reading them i am not sure about the concrete consequences of your proposal:
- In
extractNativeEventFromFile(), when iterating over the envelope items, we drop everything that isn't an event; isn't that what you mean by "drop transactions"?
sentry-java/sentry-android-core/src/main/java/io/sentry/android/core/NativeEventCollector.java
Lines 208 to 211 in b832e7c
for (final SentryEnvelopeItem item : envelope.getItems()) { if (!SentryItemType.Event.equals(item.getHeader().getType())) { continue; } - RE: sane maximum:
- If we limit the native events that we collect for matching, should the other events be discarded (since we assume sending a pure tombstone in their place) or allow a duplicate?
- How do we know which events are the most significant/recent? Should we rely on disk timestamps?
- We would, in any case, read more envelopes at least until decoding and reaching the maximum collection size, than we actually collect. So if heap pressure during parsing/decoding until those pieces are GCed is the worry, then the collection threshold is still happening too late.
- If otoh, passing around the event + envelope is the worry, then we could reduce the
NativeEventDatato just timestamp and File and re-read the data only once there is an actual match.
The true cost is, of course, that we have to read, decode all envelopes, and deserialize every event, to verify that it is a native event and read its timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The true cost is, of course, that we have to read, decode all envelopes, and deserialize every event, to verify that it is a native event and read its timestamp.
To be clear: We can make this part of the NativeEventCollector stream the envelopes (we only need item headers to skip anything but events) and the event (using a JsonObjectReader on an InputStream that is bound by the payload size), avoiding loading the entire envelope and hydrating the full object graph of each event until we actually have a match with a tombstone (at which point we would just use the current implementation extractNativeEventFromFile()).
I can't work on this today, but tomorrow you can have a basic implementation. That would eliminate the worry about loading a bunch of items into memory that we will never use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📜 Description
This implements the second aspect of the tombstone integration: merging tombstone events with crash events coming from sentry-native. It does this in the following way:
OutboxSender, but instead letTombstoneIntegrationown the outbox folder before theOutboxSenderoperates on it (i.e., we rely on integration instantiation order and the fact that all async integration work is done on a single-threaded executor service).NativeEventCollectorimplements the duplication wrtOutboxSender, but specifically for handling envelopes that contain native crashesOutboxSenderlater, that change wouldn't be a considerable effort.sentry-native, so for now it's only exposed but not used.TombstonePolicysearches, matches, and merges eventsdebugMeta,exception, andthreadsThis is primarily an end-to-end implementation to discuss the trade-offs listed.
💡 Motivation and Context
This is the last essential step before releasing the tombstone integration to a wider audience. It maps directly to
Implement merge mechanism with the Native SDK envelope
💚 How did you test it?
📝 Checklist
sendDefaultPIIis enabled.🔮 Next steps
OutboxSenderand surrounding infra more generically (we could introduce a callback for the core of theOutboxSenderdecision and set up integrations that use it accordingly)EventProcessorfor merge events?)