@kumarUjjawal
Contributor

Which issue does this PR close?

Rationale for this change

When using DataFrameWriteOptions::with_single_file_output(true), the setting was being ignored if the output path didn't have a file extension. For example:

df.write_parquet(
    "/path/to/output",
    DataFrameWriteOptions::new().with_single_file_output(true),
    None,
).await?;

This would create a directory /path/to/output/ containing the output files instead of a single file at /path/to/output.

This happened because the demuxer used a heuristic based solely on file extension, ignoring the explicit user setting.
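To make the failure mode concrete, here is a minimal sketch of an extension-only heuristic. The function name `looks_like_single_file` and the trailing-slash check are assumptions for illustration; this is not the actual demuxer code, only a stand-in that reproduces the behavior described above using `std::path::Path`.

```rust
use std::path::Path;

/// Hypothetical stand-in for an extension-only heuristic.
/// Returns true when the path would be treated as a single output file.
fn looks_like_single_file(path: &str) -> bool {
    // Assumption: a trailing slash always marks a directory.
    if path.ends_with('/') {
        return false;
    }
    // Otherwise the decision depends only on whether an extension is present,
    // so any explicit user setting never enters the picture.
    Path::new(path).extension().is_some()
}
```

Under this sketch, `/path/to/output.parquet` is treated as a single file, while `/path/to/output` is treated as a directory because it has no extension, which matches the reported symptom.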

What changes are included in this PR?

  • Added single_file_output: Option<bool> to FileSinkConfig
  • Added test test_single_file_output_without_extension to verify the fix

Are these changes tested?

  • New unit test test_single_file_output_without_extension tests the fixed behavior
  • All sqllogictests pass

Are there any user-facing changes?

FileSinkConfig now has a new required field single_file_output: Option<bool>. Any code that constructs FileSinkConfig directly will need to add this field. Set it to None to preserve the existing heuristic behavior.
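The intended semantics of the new field can be sketched as follows. `SinkConfigSketch` and `writes_single_file` are hypothetical names that model only the new field, not the real FileSinkConfig. Treating `Some(false)` as "always write a directory" is an assumption; the description above only confirms that an explicit `true` is honored and that `None` keeps the heuristic.

```rust
use std::path::Path;

/// Simplified, hypothetical stand-in for FileSinkConfig.
/// Only the new `single_file_output` field is modeled here.
struct SinkConfigSketch {
    single_file_output: Option<bool>,
}

impl SinkConfigSketch {
    /// Decide whether output goes to a single file at `path`.
    fn writes_single_file(&self, path: &str) -> bool {
        match self.single_file_output {
            // An explicit user setting takes precedence over the path shape.
            // (Assumption: Some(false) forces directory output.)
            Some(explicit) => explicit,
            // No explicit setting: fall back to the extension-based heuristic.
            None => !path.ends_with('/') && Path::new(path).extension().is_some(),
        }
    }
}
```

With `single_file_output: Some(true)`, a path such as `/path/to/output` resolves to a single file even though it has no extension; with `None`, the same path still resolves to a directory, as before.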

@github-actions bot added the labels core (Core DataFusion crate), catalog (Related to the catalog crate), proto (Related to proto crate), and datasource (Changes to the datasource crate) on Jan 21, 2026.

Development

Successfully merging this pull request may close these issues.

Regression: DataFrameWriteOptions::with_single_file_output produces a directory