@cht42 cht42 commented Jan 17, 2026

Which issue does this PR close?

Rationale for this change

Currently, combining DataFusion's default features with Spark features is awkward because:

  1. Expression planners must be registered before calling with_default_features().build() to take precedence
  2. UDFs must be registered after the state is built (if using register_all)

This requires splitting the setup into multiple phases, which is verbose and error-prone.

What changes are included in this PR?

  • Added SessionStateBuilderSpark extension trait in datafusion-spark that provides with_spark_features() method to register both the Spark expression planner (with correct precedence) and all Spark UDFs in one call
  • Added core feature flag to datafusion-spark with datafusion as an optional dependency (this avoids having datafusion-core depend on datafusion-spark)
  • Updated datafusion-spark crate documentation with usage example
  • Simplified test context setup in datafusion-sqllogictest to use the new extension trait
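
The extension-trait approach in the list above can be illustrated with a small, self-contained sketch. It does not use DataFusion's real types: `Builder`, `BuilderSparkExt`, `spark_builder`, and the string names are hypothetical stand-ins, and inserting the Spark planner at the front of the list is an assumption made here to model "correct precedence", not a description of how the PR implements it. In a real workspace the struct and the trait would live in separate crates; they share one file here only so the example compiles on its own.

```rust
/// Stand-in for the builder defined in the core crate. Hypothetical and far
/// simpler than DataFusion's real `SessionStateBuilder`.
#[derive(Debug, Default)]
pub struct Builder {
    /// Planner names in lookup order: earlier entries take precedence.
    pub planners: Vec<&'static str>,
    /// Registered function names.
    pub functions: Vec<&'static str>,
}

impl Builder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends the built-in planner and function after anything already present.
    pub fn with_default_features(mut self) -> Self {
        self.planners.push("default_planner");
        self.functions.push("default_fn");
        self
    }
}

/// Extension trait that would live in the downstream (Spark) crate, so the
/// core crate never needs to depend on it.
pub trait BuilderSparkExt {
    fn with_spark_features(self) -> Self;
}

impl BuilderSparkExt for Builder {
    fn with_spark_features(mut self) -> Self {
        // Assumption of this sketch: put the Spark planner first so it wins
        // no matter where this call sits in the builder chain.
        self.planners.insert(0, "spark_planner");
        self.functions.push("spark_fn");
        self
    }
}

/// One-call setup: no separate "before build" and "after build" phases.
pub fn spark_builder() -> Builder {
    Builder::new().with_default_features().with_spark_features()
}
```

Because the trait is defined next to the Spark code and implemented on the builder type, callers only need to bring the trait into scope to get the extra method, and the dependency points from the Spark crate to core rather than the other way around.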

Are these changes tested?

Yes. There is a unit test in datafusion-spark/src/session_state.rs, and the existing Spark SQLLogicTest suite validates that all Spark functions work correctly. The test context in datafusion-sqllogictest now uses the SessionStateBuilderSpark extension trait, so it serves as both a usage example and an integration test.

Are there any user-facing changes?

Yes, this adds a new public API: SessionStateBuilderSpark extension trait (behind the core feature flag in datafusion-spark).
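
Since the trait sits behind the core feature flag, a downstream crate would need to enable that feature on datafusion-spark. A sketch of what that might look like in a consumer's Cargo.toml; the version requirement is a placeholder and is not taken from this PR:

```toml
[dependencies]
# Placeholder version: pin to a release that includes this change.
datafusion-spark = { version = "*", features = ["core"] }
```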

@cht42 cht42 changed the title feat: Add with_spark_features to SessionStateBuilder feat: Add with_spark_features to SessionStateBuilder Jan 17, 2026
@github-actions github-actions bot added the development-process, core, sqllogictest, and spark labels Jan 17, 2026
@cht42 cht42 mentioned this pull request Jan 17, 2026
@cht42 cht42 changed the title from "feat: Add with_spark_features to SessionStateBuilder" to "feat(core): Add with_spark_features to SessionStateBuilder" Jan 17, 2026

@Jefffrey Jefffrey left a comment

Maybe it's better to introduce a new trait (e.g. SessionStateBuilderSparkExt, though with a better name) in datafusion-spark containing the new with_spark_features method, and impl it on SessionStateBuilder, to avoid having core depend on datafusion-spark

@github-actions github-actions bot removed the core label Jan 17, 2026
@github-actions github-actions bot removed the development-process label Jan 17, 2026

cht42 commented Jan 17, 2026

> Maybe it's better to introduce a new trait (e.g. SessionStateBuilderSparkExt, though with a better name) in datafusion-spark containing the new with_spark_features method, and impl it on SessionStateBuilder, to avoid having core depend on datafusion-spark

Sounds good, updated the code to use that approach.

@cht42 cht42 changed the title from "feat(core): Add with_spark_features to SessionStateBuilder" to "feat(spark): Add SessionStateBuilderSpark to datafusion-spark" Jan 17, 2026
@cht42 cht42 changed the title feat(spark): Add SessionStateBuilderSpark to datafusion-spark feat(spark): Add SessionStateBuilderSpark to datafusion-spark Jan 17, 2026

@Jefffrey Jefffrey left a comment

fyi @comphead

//! ```
//!
//! Then use the extension trait:
//! ```ignore
Contributor

Would prefer to avoid `ignore` here if possible

Contributor Author

fixed it

Labels

spark, sqllogictest

Development

Successfully merging this pull request may close these issues.

[datafusion-spark] Add method to register udf and expr planner in one go

2 participants