Skip to content

Conversation

@GaneshPatil7517
Copy link
Contributor

@GaneshPatil7517 GaneshPatil7517 commented Jan 17, 2026

Which issue does this PR close?

Fixes #19768

Rationale for this change

Currently, ArrayAgg always produces an output with data type:

DataType::List(Arc::new(Field::new_list_field(
    arg_types[0].clone(),
    true,
)))

Copilot AI review requested due to automatic review settings January 17, 2026 11:29
@github-actions github-actions bot added the functions Changes to functions implementation label Jan 17, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to preserve the input field's nullability in the ArrayAgg aggregate function's return field. When the input field is non-nullable, the result field would be marked as non-nullable; when the input is nullable, the result field would remain nullable.

Changes:

  • Added a return_field method override to the ArrayAgg implementation that sets the result field's nullability to match the input field's nullability
  • Added a unit test to verify that the schema metadata correctly reflects the input field's nullability

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 668ce8c to b833445 Compare January 17, 2026 12:03
@github-actions github-actions bot added the core Core DataFusion crate label Jan 17, 2026
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 114f5bc to 2b2a41b Compare January 17, 2026 12:17
@github-actions github-actions bot added the spark label Jan 17, 2026
- Override return_field() to preserve input field nullability in list elements

- Add input_nullable field to all ArrayAgg accumulators

- List itself remains nullable (NULL for empty groups)

- Inner elements nullability now matches input field nullability
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 024785b to 24e57cb Compare January 17, 2026 17:47
- Separate state() from evaluate() in all three accumulators
- state() always uses nullable=true for inner elements (matches state_fields())
- evaluate() uses input_nullable to preserve nullability for final output
- Fix clippy: use std::slice::from_ref() instead of .clone() on FieldRef
- Fix array_sort return_type() to preserve input inner field nullability
- Fix clippy: use Arc::clone() instead of .clone() on FieldRef
- Update unnest_with_redundant_columns snapshot for non-nullable inner
- Make return_type() return error since return_field() is implemented
- Simplify comments in return_field()
- Revert array_sort changes (unrelated to original issue)
- Revert spark/collect.rs changes (UDAF return_field not fixed)
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from 84c066f to 402927a Compare January 19, 2026 12:05
@GaneshPatil7517
Copy link
Contributor Author

ok ill update this all

@Jefffrey
Copy link
Contributor

@GaneshPatil7517 Please refrain from closing a discussion when no action has been taken or no comment was left. My comments regarding the state related functions don't seem to have been addressed.

- Update state_fields() to preserve input nullability (same as return_field)
- Simplify state() to just call evaluate() since they now share the same nullability
- Remove duplicate handling code in all three accumulators
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from fce0b8b to f7c3206 Compare January 20, 2026 05:11
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 20, 2026
array_sort must preserve the input list's inner field nullability to match
the type returned by array_agg which now preserves input nullability.
@GaneshPatil7517 GaneshPatil7517 force-pushed the feature/arrayagg-preserve-nullability branch from e744abe to 4d0118d Compare January 20, 2026 05:52
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaneshPatil7517 I strongly urge you to do a self-review on PRs; I'm seeing an unrelated file be accidentally committed, as well as the array-sort changes coming back without any reason given

}

fn state(&mut self) -> Result<Vec<ScalarValue>> {
if !self.is_input_pre_ordered {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this line?

DataType::Null => Ok(DataType::Null),
DataType::List(field) => {
Ok(DataType::new_list(field.data_type().clone(), true))
// Preserve the inner field's nullability from input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these changes coming back again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate documentation Improvements or additions to documentation functions Changes to functions implementation spark

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance ArrayAgg to preserve input field nullability

2 participants