Skip to content

Conversation

@ferruzzi
Copy link
Contributor

When I implemented UUID reuse to reduce DAG re-serialization, my strategy was too overzealous. I blindly reused UUIDs if the DAG previously had ANY deadlines, which caused two bugs:

  1. If the Dag had multiple DeadlineAlerts, they failed silently (no new DeadlineAlert records were created)
  2. Deadline property changes (interval/reference/callback) were ignored if you edited the Dag once it was initially parsed

We now compare existing DeadlineAlert records and check if any definitions changed.

New logic:

  • Reuse UUIDs when ALL deadlines match (preserves hash, avoids unnecessary writes and re-serialization)
  • Generate new UUIDs when any deadline changes (updates the hash and creates new records)

Also improved the relevant unit tests which would have caught this to prevent regression in the future.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

When I implemented UUID reuse to reduce DAG re-serialization, the code was too overzealous.  I reused UUIDs if the DAG previously had ANY deadlines, which caused two bugs:

1. Multiple deadline alerts failed silently (no DeadlineAlert records created)
2. Deadline property changes (interval/reference/callback) were ignored

We now compare existing DeadlineAlert records and check if any definitions changed.

New logic:
- Reuse UUIDs when ALL deadlines match (preserves hash, avoids unnecessary writes and re-serialization)
- Generate new UUIDs when any deadline changes (updates the hash and creates new records)
@uranusjr
Copy link
Member

This seems right but I kind of wonder if there’s a more straightforward design to this. Maybe by not using a UUID field on the model?

deadline_data = deadline_alert.get(Encoding.VAR, deadline_alert)
# Create a temporary alert for comparison
temp_alert = DeadlineAlertModel(
id="temp", # id is required for the object but isn't included in the __eq__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have multiple DeadlineAlertModels with the same property but different IDs?

DeadlineAlertModel(id=1, reference="the same"...)
DeadlineAlertModel(id=2, reference="the same"...)

If it's possible, it looks incorrect to use __eq__ to me. Comparing those attributes directly might seem lengthy, but it is probably more accurate and less confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can happen. Two different Dags can reuse the same DeadlineAlert object, for example:

my_deadline = DeadlineAlert(stuff)

with DAG(id="dag1", deadline=my_deadline,...) as dag1:
   ....
   
with DAG(id="dag2", deadline=my_deadline,...) as dag2:
   ....

would result in two DeadlineAlertModel entries identical other than their unique id and serialized_dag_id fields. I think the custom __eq__ method is the right call here. The alternative to just compare those three fields between the two objects would just be duplicating that function, unless I'm missing something that you are seeing?

Or perhaps I misunderstood and the point was that the custom __eq__ was what is causing confusion because someone might assume it's using object equivalence instead of just checking those three fields? Maybe I need a comment in the code to call out that it's a custom __eq__ and not the default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps I misunderstood and the point was that the custom eq was what is causing confusion because someone might assume it's using object equivalence

Yes, this is how I feel and why I'm confused 🤔

Maybe I need a comment in the code to call out that it's a custom eq and not the default?

I feel like creating a new function would be less confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants