-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat(spark): Add SessionStateBuilderSpark to datafusion-spark
#19865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
with_spark_features to SessionStateBuilder
with_spark_features to SessionStateBuilderwith_spark_features to SessionStateBuilder
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe its better to introduce a new trait (e.g. SessionStateBuilderSparkExt, though with a better name) to datafusion-spark containing the new with_spark_features method and impl this onto SessionStateBuilder to avoid needing having core depend on datafusion-spark
Souds good, updated the code to use that approach |
with_spark_features to SessionStateBuilderSessionStateBuilderSpark to datafusion-spark
SessionStateBuilderSpark to datafusion-sparkSessionStateBuilderSpark to datafusion-spark
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi @comphead
datafusion/spark/src/lib.rs
Outdated
| //! ``` | ||
| //! | ||
| //! Then use the extension trait: | ||
| //! ```ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer to avoid ignore here if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it
Which issue does this PR close?
Rationale for this change
Currently, combining DataFusion's default features with Spark features is awkward because:
with_default_features().build()to take precedenceregister_all)This requires splitting the setup into multiple phases, which is verbose and error-prone.
What changes are included in this PR?
SessionStateBuilderSparkextension trait indatafusion-sparkthat provideswith_spark_features()method to register both the Spark expression planner (with correct precedence) and all Spark UDFs in one callcorefeature flag todatafusion-sparkwithdatafusionas an optional dependency (this avoids havingdatafusion-coredepend ondatafusion-spark)datafusion-sparkcrate documentation with usage exampledatafusion-sqllogictestto use the new extension traitAre these changes tested?
Yes, there is a unit test in
datafusion-spark/src/session_state.rsplus the existing Spark SQLLogicTest suite validates that all Spark functions work correctly. The test context in datafusion-sqllogictest now uses theSessionStateBuilderSparkextension trait, serving as both a usage example and integration test.Are there any user-facing changes?
Yes, this adds a new public API:
SessionStateBuilderSparkextension trait (behind thecorefeature flag indatafusion-spark).