[Test]: Add parallel concat_batches and use in HashJoin #19864
This commit introduces `parallel_concat_batches`, a function that concatenates a slice of `RecordBatch`es by processing each column in a separate Tokio task.

This parallel implementation is now used within the `collect_left_input` function in `HashJoinExec` to accelerate the build-side preparation. This can significantly improve performance for joins where the build side consists of many batches that need to be concatenated, particularly when the tables are wide.

The `try_create_array_map` function was also made `async` to accommodate the new asynchronous concatenation function.
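The column-wise idea described above can be illustrated with a minimal, standard-library-only sketch. This is not the PR's code. The `Batch` type is a hypothetical stand-in for Arrow's `RecordBatch` (one `Vec<i64>` per column), and scoped OS threads are swapped in for the Tokio tasks the PR uses so that the example needs no external crates. The function name is reused purely for readability.

```rust
use std::thread;

/// Hypothetical stand-in for an Arrow `RecordBatch`: one `Vec<i64>` per column.
/// (Assumption for illustration; the PR operates on real Arrow arrays.)
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

/// Concatenate `batches` column by column, using one scoped thread per column.
/// The PR spawns Tokio tasks instead; scoped threads keep this sketch std-only.
fn parallel_concat_batches(batches: &[Batch]) -> Result<Batch, String> {
    // An empty input yields an empty batch.
    let Some(first) = batches.first() else {
        return Ok(Batch { columns: Vec::new() });
    };

    // Every batch must share the same column count (a stand-in for a schema check).
    let num_columns = first.columns.len();
    if batches.iter().any(|b| b.columns.len() != num_columns) {
        return Err("all batches must have the same number of columns".to_string());
    }

    let columns = thread::scope(|scope| {
        // Spawn all column workers first so they run concurrently.
        let handles: Vec<_> = (0..num_columns)
            .map(|col| {
                scope.spawn(move || {
                    // Pre-size the output, then append this column from each batch in order.
                    let total: usize = batches.iter().map(|b| b.columns[col].len()).sum();
                    let mut out = Vec::with_capacity(total);
                    for batch in batches {
                        out.extend_from_slice(&batch.columns[col]);
                    }
                    out
                })
            })
            .collect();

        // Join in column order so the output keeps the original column layout.
        handles
            .into_iter()
            .map(|h| h.join().map_err(|_| "column task panicked".to_string()))
            .collect::<Result<Vec<_>, _>>()
    })?;

    Ok(Batch { columns })
}
```

Because each column is independent, the per-column work needs no locking, and the potential gain grows with the number of columns, which is consistent with the description's note about wide tables. Whether it pays off in practice depends on batch sizes and task-spawn overhead.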
Run benchmark tpch_mem

🤖: Benchmark completed

Run benchmark tpcds

🤖: Benchmark completed

Run benchmark tpch_mem

Run benchmark tpch

🤖: Benchmark completed

🤖: Benchmark completed

Looks like a nice improvement on this query.

Run benchmark tpch_mem tpcds tpch

🤖: Benchmark completed

🤖: Benchmark completed

🤖: Benchmark completed

run benchmark tpch_mem

🤖: Benchmark completed
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?