-
Notifications
You must be signed in to change notification settings - Fork 16.5k
Fix deadline alert hashing bug #61702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix deadline alert hashing bug #61702
Conversation
When I implemented UUID reuse to reduce DAG re-serialization, the code was too overzealous. I reused UUIDs if the DAG previously had ANY deadlines, which caused two bugs: 1. Multiple deadline alerts failed silently (no DeadlineAlert records created) 2. Deadline property changes (interval/reference/callback) were ignored We now compare existing DeadlineAlert records and check if any definitions changed. New logic: - Reuse UUIDs when ALL deadlines match (preserves hash, avoids unnecessary writes and re-serialization) - Generate new UUIDs when any deadline changes (updates the hash and creates new records)
|
This seems right but I kind of wonder if there’s a more straightforward design to this. Maybe by not using a UUID field on the model? |
| deadline_data = deadline_alert.get(Encoding.VAR, deadline_alert) | ||
| # Create a temporary alert for comparison | ||
| temp_alert = DeadlineAlertModel( | ||
| id="temp", # id is required for the object but isn't included in the __eq__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have multiple DeadlineAlertModels with the same property but different IDs?
DeadlineAlertModel(id=1, reference="the same"...)
DeadlineAlertModel(id=2, reference="the same"...)
If it's possible, it looks incorrect to use __eq__ to me. Comparing those attributes directly might seem lengthy, but it is probably more accurate and less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that can happen. Two different Dags can reuse the same DeadlineAlert object, for example:
my_deadline = DeadlineAlert(stuff)
with DAG(id="dag1", deadline=my_deadline,...) as dag1:
....
with DAG(id="dag2", deadline=my_deadline,...) as dag2:
....
would result in two DeadlineAlertModel entries identical other than their unique id and serialized_dag_id fields. I think the custom __eq__ method is the right call here. The alternative to just compare those three fields between the two objects would just be duplicating that function, unless I'm missing something that you are seeing?
Or perhaps I misunderstood and the point was that the custom __eq__ was what is causing confusion because someone might assume it's using object equivalence instead of just checking those three fields? Maybe I need a comment in the code to call out that it's a custom __eq__ and not the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps I misunderstood and the point was that the custom eq was what is causing confusion because someone might assume it's using object equivalence
Yes, this is how I feel and why I'm confused 🤔
Maybe I need a comment in the code to call out that it's a custom eq and not the default?
I feel like creating a new function would be less confusing.
When I implemented UUID reuse to reduce DAG re-serialization, my strategy was too overzealous. I blindly reused UUIDs if the DAG previously had ANY deadlines, which caused two bugs:
We now compare existing DeadlineAlert records and check if any definitions changed.
New logic:
Also improved the relevant unit tests which would have caught this to prevent regression in the future.
Was generative AI tooling used to co-author this PR?
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.